Top Voices in Search Tech: Doug Turnbull


The "Top Voices in Search-Tech" initiative is a carefully curated showcase of the most impactful and influential search-tech professionals from around the world that you can connect with and learn from.


About Doug

Doug Turnbull is an expert in search technology and relevance engineering, currently serving as Principal Engineer at Daydream, where he builds hybrid search systems combining lexical and vector retrieval, and develops LLM-driven quality programs for e-commerce search. Previously, he led machine-learning-driven search initiatives at Reddit, significantly improving search relevance through Learning to Rank methods. Doug also advanced e-commerce search at Shopify and served as CTO at OpenSource Connections. He co-authored the influential book Relevant Search (Manning, 2016) and created popular open-source tools, including Quepid and the Elasticsearch Learning to Rank plugin. He regularly speaks at industry conferences, making search relevance accessible to engineers.


Let’s start from the beginning — how did you get involved in the search tech industry?

Mostly through happenstance. I moved to Charlottesville, Virginia, from the DC area. It’s a couple of hours away and a smaller college town. In 2012, I was doing remote work for my DC-based company, which was miserable. I wanted to get to know the Charlottesville tech community, so when a neighborhood block party was coming up, I decided to wear a nerdy t-shirt and see who would connect with me. My shirt said something like, “My code never has bugs—it always has features.” That’s when Eric Pugh, who’s long been active in the Solr community, saw me and said, “Hey, we have a consulting firm.” We started chatting, and one thing led to another. I went from being a C and C++ developer to working at that consulting firm, where I needed to be the search expert—something that often happens at these companies. I was a young guy, thrown into the deep end on search projects a lot, and I didn’t always get it right, but I learned a ton working at OpenSource Connections and had a great time coming up the learning curve in search.

Back then, people built applications with Solr and Elasticsearch by just throwing data in and seeing results come back—there wasn’t much sophistication in ranking or anything. I gradually carved out a niche: whenever someone complained that the search results weren’t accurate, I’d dive in to figure out what was going on. That became its own interesting problem and a journey that I’m still on today.

Tell us about your current role and what you’re working on these days.

I’m currently a search engineer at Daydream, a fashion startup, and I also do training and consulting—so I mix it up. For me, working with a variety of companies through training and consulting is the best professional development I could have. Daydream is interesting because fashion search comes with very particular specs: jeans without rips, not stonewashed, knee-length or calf-length—it’s very, very specific. At Daydream, all of this happens through a chatbot, essentially an AI stylist. We have conversations with users about their profiles and preferences, then show them products based on what they like. Integrating an LLM in a RAG system has been challenging in some ways—because it opens up far more possibilities than a normal search app—and easier in others, thanks to the LLM’s assistance.

This summer, I’m also offering some training on what I call “Cheating at Search With LLMs”: trying to rethink the search stack with LLMs instead of the traditional model of throwing queries and data into Elasticsearch, with an API around it.

What are some of the biggest misconceptions about search that you often encounter?

There are a lot of misconceptions, and they shift every couple of years. One early misconception — which I don’t think holds anymore — was, “I just throw my data into Elasticsearch and it’ll work like Google.” With the rise of RAG and vector databases, I don’t think people think this way anymore. Another big misconception is underestimating how custom each search experience needs to be. People recognize, for example, that e-commerce, enterprise, and job search are all different, but even within e-commerce there’s a huge gap between searching for books, matching a specific part number as a purchasing agent, or fashion search. And within fashion itself, there’s a big difference between the RAG-ish work that I’m doing and using a search bar to search for something fashion-related.

So I think people dramatically underestimate how much customization goes into crafting search relevance and ranking for their particular use case.

How do you envision AI and machine learning impacting search relevance and data insights over the next 2-3 years?

In the last year or so, local open-source LLMs have become much easier to run and far more capable—without needing a massive seven-billion-parameter model. I think we’ll move into a space where, even at query time, query understanding is handled by an LLM.

I recently wrote a blog post titled “All Search is Structured Now” because if you’re searching through an LLM, it can do what used to be a months-long project—NLP query understanding and entity extraction—automatically. Now it just works: if it sees attributes like color or material, it pulls those into a structured query, which you can then translate into an Elasticsearch query or a vector-database lookup. That, to me, is huge.
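To make that concrete, here is a minimal sketch of the kind of LLM-driven query understanding described above (not Doug’s actual pipeline). The prompt, the hypothetical `call_llm` wrapper, and the field names are assumptions for illustration; only the bool/filter structure at the end is standard Elasticsearch DSL.

```python
# A minimal sketch of LLM-driven query understanding; names and prompt are illustrative.
import json

STRUCTURED_QUERY_PROMPT = """Extract search attributes from the user's query.
Return JSON with keys: "text" (free-text remainder), "color", "material".
Use null for attributes that are not present.
Query: {query}"""

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in whatever provider or local model you use."""
    raise NotImplementedError

def understand_query(user_query: str) -> dict:
    """Ask the LLM to turn a free-text query into structured attributes."""
    raw = call_llm(STRUCTURED_QUERY_PROMPT.format(query=user_query))
    return json.loads(raw)

def to_elasticsearch(parsed: dict) -> dict:
    """Translate the structured query into an Elasticsearch bool query."""
    filters = [
        {"term": {field: parsed[field]}}
        for field in ("color", "material")
        if parsed.get(field)
    ]
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": parsed.get("text") or ""}}],
                "filter": filters,
            }
        }
    }

# e.g. "red linen midi dress" might parse to
# {"text": "midi dress", "color": "red", "material": "linen"}
# and become a match on "midi dress" with term filters on color and material.
```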

As recently as a couple of years ago, an “ML Engineer” was someone who trained models. Today, an ML engineer is someone who puts LLMs to work for classification or similar tasks as a first pass. They might train custom models later, but the default approach has shifted to: “I have an LLM—can I use it to solve the 80% problem?”

Can you share an example of a particularly challenging production issue you’ve encountered in your work with search technologies, and the process you used to resolve it?

During my time at Reddit, we were deploying the Solr Learning to Rank (LTR) plugin. We used shadow traffic—sending real requests to the cluster and discarding the responses—but Solr 7’s LTR performance was abysmal: latency spiked, garbage collection was terrible, and even a small amount of shadow LTR traffic caused the cluster to collapse. That was probably the second time I completely broke Reddit search.

It triggered a full-on incident. Thankfully, we had an amazing infrastructure engineer named Chris Fournier who helped us spin up a separate cluster solely for LTR experiments so we could better investigate the issue.

So, even though I still evaluate changes offline using metrics like NDCG, I think it’s important to push features into production—behind feature flags—and test them with real user traffic. Having that dedicated cluster let us reproduce and diagnose the issue under genuine load. So yeah, lesson learned: whenever possible, test in production.
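As an illustration of the shadow-traffic pattern described here, the following is a hedged sketch, not Reddit’s actual code. The feature flag, the cluster clients, and the metrics hook are all hypothetical names.

```python
# A sketch of shadow traffic: duplicate real queries to an experimental LTR cluster,
# time the shadow response, and discard it so users only ever see the primary result.
import time
from concurrent.futures import ThreadPoolExecutor

SHADOW_LTR_ENABLED = True            # hypothetical feature flag
_shadow_pool = ThreadPoolExecutor(max_workers=4)

def search_primary(query: dict) -> dict:
    """Hypothetical call to the production search cluster."""
    return {"hits": []}

def search_ltr_experiment(query: dict) -> dict:
    """Hypothetical call to the separate LTR experiment cluster."""
    return {"hits": []}

def record_shadow_latency(seconds: float) -> None:
    """Hypothetical metrics hook; in practice, emit to your monitoring system."""
    print(f"shadow LTR latency: {seconds:.3f}s")

def _shadow_search(query: dict) -> None:
    start = time.monotonic()
    try:
        search_ltr_experiment(query)  # response is discarded
    finally:
        record_shadow_latency(time.monotonic() - start)

def handle_search(query: dict) -> dict:
    # Fire-and-forget the shadow request; return only the primary cluster's results.
    if SHADOW_LTR_ENABLED:
        _shadow_pool.submit(_shadow_search, query)
    return search_primary(query)
```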

Are there any open-source tools or projects—beyond Elasticsearch and OpenSearch—that have significantly influenced your work?

Some obvious tools that come to mind are Lucene, OpenSearch, Elasticsearch, and Solr. Beyond that, I remember, 12 years ago, trying to do machine learning in Hadoop—now the full ML stack is open source, which has been phenomenal for my work with tools like PyTorch, TensorFlow, and Hugging Face. When it comes to training models, for example, there used to be a library called RankLib with academic-style code that wasn’t really maintainable long term. Today, the tooling for this kind of work is just incredible.

What is a golden tip for optimizing search performance that you’ve picked up in your years of experience?

When you’re building a search application, you need feedback—and there are different kinds. You can over-index on quantitative signals like NDCG or A/B-test metrics, gathering labels and watching your ranking score tick up or down. But if you focus too much on those aggregate numbers, you lose sight of the experience for a single query. Sure, n = 1—but that one often represents a pain point shared by many users.

Some issues might be one-offs you can safely ignore, but others reveal important patterns. As you scale to large user populations, it’s easy to become obsessed with moving an A/B metric by 0.5% and undervalue sitting down with a user to walk through the search experience and collect qualitative feedback on what feels weird or unsatisfying. That balance—between quantitative metrics and real-world user insight—is the golden tip I’ve picked up over the years.
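For readers less familiar with the metric mentioned here, below is a minimal NDCG@k computation using one common formulation (linear gain); the relevance grades in the example are invented purely for illustration.

```python
# Minimal NDCG@k with the common "linear gain" form: rel / log2(rank + 2).
import math

def dcg(relevances: list[float]) -> float:
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(relevances: list[float], k: int) -> float:
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Graded judgments for one query's top five results (0 = irrelevant, 3 = perfect):
print(ndcg_at_k([3, 2, 0, 1, 0], k=5))  # close to 1.0; only one result is out of place
```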

What is the most unexpected or unconventional way you’ve seen search technologies applied?

I’ve seen search used for what I’d call a “fuzzy join” between two databases. Imagine indexing names from a US state records system and a federal one: search can help you understand that “Turnbull, Douglas” who lived in Maryland is the same person as “Doug Turnbull” who now lives in Virginia. I’ve seen this approach work pretty effectively at large scale to link data across disparate systems.
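As a rough illustration of that fuzzy-join idea, here is a sketch assuming the 8.x elasticsearch-py client; the index name, field names, and records are invented, and the matching strategy would vary a lot by data set.

```python
# A sketch of a "fuzzy join": index records from one system, then use full-text
# matching to surface likely duplicates from another. Index/field names are invented.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def candidate_matches(record: dict, size: int = 5) -> list[dict]:
    """Find likely matches for a federal record among indexed state records."""
    query = {
        "bool": {
            "should": [
                {"match": {"full_name": {"query": record["full_name"], "fuzziness": "AUTO"}}},
                {"match": {"last_known_city": record.get("city", "")}},
            ],
            "minimum_should_match": 1,
        }
    }
    resp = es.search(index="state-records", query=query, size=size)
    return [hit["_source"] for hit in resp["hits"]["hits"]]

# e.g. candidate_matches({"full_name": "Turnbull, Douglas", "city": "Baltimore"})
# might surface the state record indexed under "Doug Turnbull".
```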

NDCG is overrated.

I actually wrote an entire blog post on this, and we touched on it earlier in this conversation.

First of all, it’s very hard to get high-quality labels, and most teams aren’t great at it. Folks dive deep into perfecting metrics—whether via clickstream data or having humans label everything—but you end up obsessing over scores instead of real impact. There are other ways to think about quality: measure how much your search results shift after a change to gauge risk—you may not know if it’s good or bad risk, but at least you know the magnitude. Pair that with qualitative feedback, especially if you’re a startup. NDCG and complex rating infrastructures can be hugely overrated—like scaling out massive infrastructure too soon. Often it’s smarter to stay simple: use a few basic metrics or even simple acceptance tests (e.g., “If I search for category X, do I get items in that category?”). That low-hanging fruit can deliver real value, especially when you’re early-stage and search isn’t your core area of expertise.
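As one way to picture those simple acceptance tests, here is a sketch using pytest; the queries, categories, and the `run_search` wrapper are hypothetical stand-ins for your own search API.

```python
# Sketch of simple category acceptance tests: "if I search for X, do I get items in X?"
import pytest

CATEGORY_QUERIES = [
    ("running shoes", "footwear"),
    ("denim jacket", "outerwear"),
]

def run_search(query: str) -> list[dict]:
    """Hypothetical stub; replace with a call to your search endpoint."""
    raise NotImplementedError

@pytest.mark.parametrize("query,expected_category", CATEGORY_QUERIES)
def test_category_queries_return_that_category(query, expected_category):
    hits = run_search(query)
    assert hits, f"no results for {query!r}"
    top_categories = {hit["category"] for hit in hits[:5]}
    assert expected_category in top_categories
```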

Can you suggest a lesser-known book, blog, or resource that would be valuable to others in the community?

There’s a great podcast called How AI Is Built by Nicolay Gerold. He did an entire season talking to search experts on AI and RAG topics, and I’d highly recommend it—you’ll get a variety of perspectives. There’s no single “right” way to do search; many approaches can work, and this podcast does a fantastic job showcasing that.

Anything else you want to share? Feel free to tell us about a product or project you’re working on, or anything else that you think the search community will find valuable.

Everyone is invited to check out my blog: www.softwaredoug.com.

And this summer I’m offering a course called “Cheat at Search With LLMs,” and I really encourage people to join. I’m rethinking the search stack, and I want people to watch as I either embarrass myself or we all learn something together, because I think that’s the best way to learn.
