About Erik
Erik is a Staff Developer Advocate at MongoDB, specializing in Search.
Erik is a self-proclaimed huge search/findability nerd: “If you can’t find it, it doesn’t exist” (or might as well not exist). He co-authored Lucene in Action, the definitive guide to using the Lucene library. As a Lucene and open source enthusiast, Erik is a member of the Apache Software Foundation and a committer on the Lucene and Solr projects. He began his search career building findability solutions for 19th century literature and art, created Project Blacklight, and then co-founded Lucidworks. At Lucidworks, Erik helped develop many production search systems, helped build out a successful enterprise search platform deployed at some of the biggest organizations, and built the initial versions of, and successfully deployed, solutions for typeahead and rules. For the past couple of years, he has been enthusiastically advocating for developers interested in, and building on, MongoDB’s Atlas Search platform.
Where to find Erik on the web:
- On LinkedIn: https://www.linkedin.com/in/erikhatcher/
- Erik's MongoDB presence: https://mdb.link/erik
Let’s start from the beginning — how did you get involved in the search tech industry?
As a fresh Java developer looking for ways to streamline a build system, I discovered a tool called Ant. Ant opened my eyes to the open-source world through the Apache Software Foundation. I immersed myself in the community and the code, turning users' questions into learning and teaching opportunities by researching and publicly answering everything I could. This community presence led a publisher, Manning, to approach me and encourage me to write a book on Ant. Rather than go it alone, I recruited a co-author (shout out, Steve!). In the book, we built an example application to manage a personal photo collection, which of course warrants a search engine. The Apache ecosystem led me to Lucene, a powerful and easy-to-use Java library, which we incorporated into our book project. I became enamored with Lucene, actively joined its community, and knew that this technology also deserved a book. Again, I recruited a co-author, and Otis joined the effort. I then began working at the University of Virginia, building serendipitous discovery tools for 19th century artifacts, which led to the emergence of Project Blacklight. From there, we founded Lucidworks and helped some of the biggest and most interesting organizations in the world.
Tell us about your current role and what you’re working on these days.
I am a Staff Developer Advocate at MongoDB, specializing in Search. As a developer advocate, I spend my time writing content and helping developers incorporate lexical and vector search techniques into their applications built on MongoDB Atlas.
Could you describe a ‘favorite failure’—a setback that ultimately led to an important lesson or breakthrough in your work?
What’s in a name? Everything! I developed a simple search user interface embedded in Solr. Solr’s HTTP endpoint paths begin with /solr, and the interface was powered by a templating framework named Velocity. I wanted to be clever and riff off the “light” meme of the name Solr, incorporating the “speed” meme of Velocity. The constant c, the speed of light in the famous equation E=mc^2, stands for celeritas. So I added a /solr/itas endpoint (get it? celeritas->solr-itas) to Solr, committed it back to the project, and later learned my mistake. Folks didn’t see it as the clever wordplay I intended; I was approached at a conference and informed that it sounded like a disease: Solr-itis! Oops, that was embarrassing. I then renamed it to /solr/browse, and it went into remission from its ailment.
Lesson learned: naming things should be taken seriously and scrutinized from various perspectives. A rose by any other name would smell as sweet, but it could sound ridiculous if read or pronounced differently than intended.
What are some of the biggest misconceptions about search that you often encounter?
People underestimate how sophisticated lexical search can be. My job is to educate developers who have no prior search experience, and they are generally shocked by the capabilities and considerations involved, such as language(s), synonyms, partial matching, and relevancy ranking and tuning. It’s a daunting learning curve, yet the reward is immense when done right.
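To make a few of those considerations concrete, here is a toy Python sketch, not a description of how Lucene or Atlas Search work internally. It lowercases and tokenizes text, expands a hypothetical synonym table, treats the last query term as a prefix for partial matching, and ranks documents by how many query terms they match. Real engines use language-aware analyzers and relevance scores such as BM25 rather than simple match counts; the function names and the SYNONYMS table are assumptions for illustration.

```python
import re
from collections import defaultdict

# Hypothetical synonym table; real systems configure synonyms per language/domain.
SYNONYMS = {"tv": ["television"], "television": ["tv"]}


def analyze(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs):
    """Build an inverted index: term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Rank doc ids by how many query terms (or their synonyms) they match.

    The last query term is also treated as a prefix, mimicking
    typeahead-style partial matching. Ties are broken by doc id.
    """
    terms = analyze(query)
    scores = defaultdict(int)
    for i, term in enumerate(terms):
        variants = {term, *SYNONYMS.get(term, [])}
        matched = set()
        for v in variants:
            matched |= index.get(v, set())
        if i == len(terms) - 1:
            # Partial matching: any indexed term starting with the last query term.
            for indexed_term, ids in index.items():
                if indexed_term.startswith(term):
                    matched |= ids
        for doc_id in matched:
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, with documents 1 "Samsung Television 55 inch", 2 "TV stand, oak", and 3 "Oak bookshelf", the query "tv oa" ranks document 2 first, since it matches both tv and the prefix oa, followed by documents 1 (via the synonym television) and 3 (via the prefix).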
How do you envision AI and machine learning impacting search relevance and data insights over the next 2-3 years?
We are sure to see two sides emerge: semantic capabilities continuing to develop at an unprecedented pace, and stark reminders that the sophisticated nuances of lexical search are crucially important. AI-Powered Search, or is it Search-Powered AI? Yes, it is!
Can you share an example of a particularly challenging production issue you’ve encountered in your work with search technologies, and the process you used to resolve it?
One challenge with all search engines is “eventual consistency”. If a product record is modified in the database, how quickly will those changes appear in user searches? Sometimes that’s just not fast enough, and I have encountered customers unhappy with that delay in cases where a search engine wasn’t the right fit for the particular problem. I wrote about this in an article titled "When NOT to use Atlas Search".
Are there any open-source tools or projects—beyond Elasticsearch and OpenSearch—that have significantly influenced your work?
While I initially built directly on Lucene, I migrated my work to Solr when it was contributed to the Apache Software Foundation. Solr is a server-based system wrapped around Lucene, much like Elasticsearch and OpenSearch. Respect to all Lucene-based systems, as they are scalable ways to harness the Lucene gem inside. The bulk of my search career, at Lucidworks, was built on Solr. These days, Atlas Search is where I’m at: a great document database backed by the awesome Lucene library! (Our roadmap has it being open sourced and packaged into MongoDB Community Edition before the end of the year.)
Is there a log error/alert that terrifies/annoys you in particular?
I look at things differently. To me, error messages are learning opportunities!
What is a golden tip for optimizing search performance that you’ve picked up in your years of experience?
Keep the end goal in mind, measure, analyze, adjust, repeat.
If you're building something from scratch - what does your ideal search tech stack look like?
There’s professional bias here, and as a pragmatic programmer I know there’s always an “it depends” aspect to technology decisions: the answer depends on the concrete environment and the application being built. But, of course, Atlas Search is my starting point. It’s hosted, free to start and build with, and offers a very straightforward mapping from data to search engine.
What is the most unexpected or unconventional way you’ve seen search technologies applied?
Finding art that matches a color scheme - an oldie but a goodie.
Give us a spicy take/controversial opinion on something related to Search
Lexical search will stay relevant, and become even more important within semantic search. Find what I asked for, not just what you think I meant.
Can you suggest a lesser-known book, blog, or resource that would be valuable to others in the community?
Ambient Findability, by Peter Morville. Discoverability involves the user interface and serendipity too, not just vectors and TF/IDF.
Anything else you want to share? Feel free to tell us about a product or project you’re working on or anything else that you think the search community will find valuable
Right now I’m immersed in hybrid search techniques: how to combine various methods of indexing and finding content to get the best of all worlds.
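One widely used way to combine lexical and vector results is reciprocal rank fusion (RRF). The Python sketch below illustrates that general technique; it is not necessarily the approach Erik or Atlas Search uses. The function name and the k=60 default (the constant commonly used since the original RRF paper) are assumptions for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of doc ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank is 1-based. Larger k flattens the advantage of top ranks.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Because it fuses ranks rather than raw scores, RRF sidesteps the problem that lexical relevance scores and vector similarity scores are not on comparable scales. For example, fusing a lexical ranking ["a", "b", "c"] with a vector ranking ["c", "a", "d"] yields ["a", "c", "b", "d"]: documents found by both methods rise to the top.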