Top Voices in Search Tech: Eric Pugh

The "Top Voices in Search-Tech" initiative is a carefully curated showcase of the most impactful and influential search-tech professionals from around the world that you can connect with and learn from.

About Eric

Eric Pugh is the co-founder of OpenSource Connections, where he helps clients build their own search teams and improve their search maturity through project leadership and trusted advisory. He is an active maintainer on the OpenSearch Search Relevance and Documentation projects, and recently became an OpenSearch Ambassador. Eric is dedicated to expanding the search expertise and collaboration within the OpenSearch community.

Beyond OpenSearch, Eric has been involved in the open source world as a developer, committer and speaker for the past twenty years. He is a member of the Apache Software Foundation and believes in the power of sharing knowledge as a community, from co-authoring the first book on Apache Solr to being one of the instigators of the Haystack Conference series.

OpenSource Connections’ mission to empower the world’s search teams comes directly from Eric’s belief in the open source software movement and in the importance of educating people to succeed with it, so that they truly own their technology.

When not thinking about search, Eric likes to get his hands dirty by building furniture. His next project is a reproduction Danish modern couch, using just hand tools!

Let’s start from the beginning — how did you get involved in the search tech industry?

I think, like a lot of people, I kind of backed into it. Search seems to attract very different personalities. You’ve got people who go the information retrieval route through school, but then there’s a whole bunch of folks who studied physics, philosophy, languages—you name it—and they just discover search along the way.

For me, it started many years ago. My company was a professional services firm, and at the time, we were working with some organizations in the intelligence community on an Army intelligence application called Pathfinder. Back then, if you were building a big application, you used Oracle databases as your backend—of course you did, it was the default choice.

This was during the global war on terror and the hunt for Osama bin Laden. The challenge they faced was that they were collecting massive amounts of reports and data from the field, and everyone spelled names differently. Even acronyms varied—OBL, UBL, and so many others. They were trying to match and cross-reference all of this data using SQL and Oracle tools. Oracle even had a product at the time—nicknamed the brain—that relied heavily on regular expressions for matching. Regex is powerful, but only up to a point.

My colleague, Scott Stults, who I still work with today, had the idea to try this up-and-coming technology called Lucene, a library for indexing data, to see if it could make better correlations than a mountain of regex patterns. It became a showdown: Lucene vs. the Oracle product. Lucene crushed it. That was our very first search project.
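To give a feel for why Lucene won that showdown, here is a toy illustration in Python (not the actual Pathfinder code): a regex has to anticipate every spelling in advance, while similarity scoring, the idea behind Lucene's fuzzy matching, tolerates variants it has never seen.

```python
import difflib
import re

# Spelling variants of the same name, as they might appear in field reports.
candidates = ["Usama bin Ladin", "Osama bin Laden", "Osama Bin-Laden", "UBL", "OBL"]

# The regex approach: every variant must be anticipated in the pattern.
pattern = re.compile(r"[OU]sama\s+bin[\s-]+Lad[ei]n", re.IGNORECASE)
print([c for c in candidates if pattern.search(c)])
# Misses "UBL" and "OBL" entirely, and breaks on any unanticipated spelling.

# The fuzzy approach: score similarity instead of demanding an exact pattern,
# roughly analogous to Lucene's edit-distance-based fuzzy queries.
query = "Osama bin Laden"
for c in candidates:
    score = difflib.SequenceMatcher(None, query.lower(), c.lower()).ratio()
    print(f"{c!r}: {score:.2f}")
```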

Around that time, I’d already been involved in open source for a while. I had met Erik Hatcher, a big name in the Lucene world, a couple of years earlier through Apache Ant (an old build tool) after contributing some code examples to his book. After this first project, I got a bit more exposure to Lucene and kept hearing about its potential. Eventually, Erik introduced me to this brand-new project called Apache Solr. He said, “This wraps Lucene in a way that makes it easy to talk to and use for building search applications. You should check it out.”

At that time, my company was a general open-source shop. We did Drupal, some .NET, and all sorts of projects. But gradually, we started taking on more and more search work. That’s when another important connection came along: David Smiley, who I’d met on a different job. David got a book contract to write the very first book on Apache Solr. His wife was expecting their first child, and she said, “You need to finish this book before the baby arrives!” Originally, I was going to write a short contribution to the book. Instead, I got promoted to co-author to help David finish it on time. That changed everything. Co-authoring the very first Solr book opened a ton of doors.

I’ll never forget going to one of the earliest Lucene Revolution conferences, in Boston. Walking into that event felt surreal—everyone knew my name because of the book. It was the first time in my career I felt like a rock star. And it was at that moment I realized something: the market was speaking to us. On the plane ride home, I had my own Jerry Maguire moment. I wrote a manifesto to my two co-founders, Scott Stults and Jason Hull, saying: “We shouldn’t be a general open-source company anymore. The market wants us to be a search company.”

That was our big pivot. We said goodbye to Drupal and other general development work and focused entirely on search. It wasn’t part of some carefully calculated strategy—it was simply following where the work was, what the community wanted, and, honestly, being a little bit lucky by meeting the right people at the right time.

So that’s how I ended up here. Totally unplanned, but I’ve been in search ever since.

Tell us about your current role and what you’re working on these days.

I’m one of the co-founders of OpenSource Connections. For a while, I led the company as CEO, but over time I realized what I really love is working directly with clients and their teams. Someone once described consulting as “professional ADD,” and that resonated with me. I love the constant variety—each new project brings new challenges, and I get to reuse what I’ve learned before in fresh ways.

For the past couple of years, I’ve been focused on consulting, working closely with some of our largest clients on transformational projects. Much of my recent work has been centered around evaluations and discovery—helping organizations understand where they stand today and where they need to go to improve their search systems. Over the last 18 months, a big part of my time has been spent collaborating with the OpenSearch community, particularly with Stavros Macrakis, the product manager for search at AWS. Together, we’ve been focused on one of the most critical, yet historically overlooked, aspects of search: evaluation.

In search, there are two sides to think about: cost and value. Cost is about infrastructure—how many servers you need, replication strategies, uptime, and now vector storage. I’m really focused on the value side, which is about outcomes: are we accomplishing the goals of our users and businesses? In e-commerce, that might mean helping customers find and buy products. In a marketplace, it could be enabling users to sell their unwanted goods. In enterprise search, it’s about helping employees find the right information to get their work done, like finding and submitting a form to request a new laptop. Without evaluation, you can’t measure whether your search system is actually delivering that value.

What I’ve always found odd is that search engines have never treated evaluation as a first-class feature. It’s always been something bolted on, an afterthought. No open-source search engine has ever led with evaluation as its defining capability. They tend to focus on scale and infrastructure—the cost side—without making it easy to track and improve value. This gap is what we’ve been working to address.

We started with an initiative called User Behavior Insights (UBI). Evaluation depends on data, so UBI provides a standardized way to capture what happens after a user performs a search. Did they click the first result or the third? Did they filter or refine their query? Did they leave without interacting? Most importantly, did they reach a conversion event, like checking out in an e-commerce site, downloading a document, or completing a task in an enterprise system? By tying searches to these real-world outcomes, we can finally measure success in a meaningful way.
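To make that concrete, here is a minimal sketch of what UBI-style event capture might look like. The field names approximate the published UBI schema but are illustrative; check the current spec before relying on them.

```python
import json
import time
import uuid

# A query_id ties every downstream event back to one specific search.
query_id = str(uuid.uuid4())

search_event = {
    "query_id": query_id,
    "user_query": "danish modern couch",
    "timestamp": time.time(),
}

# Later, as the user interacts with the results:
click_event = {
    "action_name": "click",          # what the user did
    "query_id": query_id,            # which search it came from
    "event_attributes": {
        "object": {"object_id": "sku-4412", "position": {"ordinal": 3}},
    },
    "timestamp": time.time(),
}

conversion_event = {
    "action_name": "purchase",       # the outcome we actually care about
    "query_id": query_id,
    "event_attributes": {"object": {"object_id": "sku-4412"}},
    "timestamp": time.time(),
}

# In a real deployment these would be indexed (e.g., into a ubi_events index)
# so that queries and outcomes can be joined for evaluation.
print(json.dumps([search_event, click_event, conversion_event], indent=2))
```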

In 2025, our efforts took a huge leap forward with the release of the Search Relevance Workbench, built directly into OpenSearch. This tool takes the data gathered through UBI and allows teams to evaluate specific search configurations and algorithms right inside the platform. It supports classic offline evaluation techniques—query sets, relevance judgments, and information retrieval metrics like NDCG and precision—but it’s all integrated into OpenSearch itself.
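For readers who haven't worked with these metrics, here is a small from-scratch sketch of NDCG and precision at k over graded judgments. The Workbench computes these for you; this just shows the underlying idea.

```python
import math

# Judgments: graded relevance (0 = irrelevant .. 3 = perfect) per document.
judgments = {"d1": 3, "d2": 0, "d3": 2, "d4": 1, "d5": 3}

# The ranking one search configuration returned for a test query:
ranking = ["d2", "d1", "d5", "d4", "d3"]

def dcg(gains):
    # Discounted cumulative gain: later positions contribute less.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranking, judgments, k):
    gains = [judgments.get(doc, 0) for doc in ranking[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

def precision_at_k(ranking, judgments, k, threshold=2):
    # Fraction of the top k judged at or above the relevance threshold.
    hits = sum(1 for doc in ranking[:k] if judgments.get(doc, 0) >= threshold)
    return hits / k

print(f"NDCG@5:      {ndcg_at_k(ranking, judgments, 5):.3f}")
print(f"Precision@5: {precision_at_k(ranking, judgments, 5):.3f}")
```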

For the first time, search teams don’t need external tools or custom-built systems to measure search quality. Evaluation becomes a seamless part of the workflow, enabling organizations to focus not just on infrastructure costs, but on delivering real value to their users and businesses.

Could you describe a ‘favorite failure’—a setback that ultimately led to an important lesson or breakthrough in your work?

I think one of my biggest failures was not embracing the power of machine learning early enough and fast enough. I confess I took statistics twice in college, so maybe I was just having too good a time.

When machine learning came along, it was really the key to unlocking the power of all the data that we had been pouring into our search engines. Back in the “big data” era, the focus was on scale — getting massive amounts of data into the search engine. That was the space where I felt very comfortable. But machine learning was a whole new domain. It required more math, more scientific thinking, and was less about pure engineering and ops. Because of that, I didn’t embrace it when it first appeared, and I’ve always felt like I’ve been trailing in that space ever since.

I absolutely recognize the power and value of machine learning today, but I don’t think I’ve contributed as much to how we use it in search as I could have if I had leaned in earlier. And honestly, I’m still frustrated by how poorly our search engines integrate with machine learning workflows. Data scientists want to work in Python. If they need to scale, they turn to PySpark. Yet search engines don’t fit naturally into those environments.

It should be easy. A search result set is essentially a data frame, right? But that’s not how search engines are built today. Data scientists often end up pulling data out of the search engine entirely just so they can process it using the tools they’re comfortable with. That’s a missed opportunity. There are so many powerful capabilities in search engines that could make their jobs easier — but we never met them where they are.
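Here is a sketch of the impedance mismatch I mean, with a placeholder host and index name: a data scientist who just wants a DataFrame has to unwrap the engine's nested response format by hand.

```python
import pandas as pd
import requests

# Query an OpenSearch-style endpoint; "products" is a hypothetical index.
resp = requests.get(
    "http://localhost:9200/products/_search",
    json={"query": {"match": {"title": "couch"}}, "size": 100},
)
hits = resp.json()["hits"]["hits"]

# Flatten the nested hit structure into the tabular form pandas expects.
df = pd.DataFrame(
    [{"_id": h["_id"], "_score": h["_score"], **h["_source"]} for h in hits]
)
print(df.head())
```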

If I had embraced machine learning earlier, maybe I could have pushed harder for search engines to grab more of the machine learning mindshare. Instead, today, most data scientists treat search as just another data source to extract from, rather than a powerful partner in solving their problems. It’s a lesson for me — adapt early and bridge the gap between communities before those worlds drift too far apart.

What are some of the biggest misconceptions about search that you often encounter?

I think it’s fascinating how search is everywhere. Pretty much any application you look at — if you peel back the layers — there’s search at its core. You see all these buzzy AI startups today, or before that it was machine learning startups, doing really cool predictive stuff. But when you dig deep into what they’ve built, you almost always find a search engine underneath it all.

Search is so foundational now that I think it should almost be treated like a computing utility, not just its own niche domain. Any interaction you have on the internet likely has a search engine somewhere behind it: when you type into an autocomplete field, when you search for a file on your laptop, when you navigate almost any website — there’s search powering those experiences.

And yet, most people don’t realize this. Their entire concept of “search” is usually just Google. They don’t think about all the invisible places where search engines are working behind the scenes to connect them with the right information. It’s everywhere — just hidden in plain sight.

How do you envision AI and machine learning impacting search relevance and data insights over the next 2-3 years?

I like to think of AI as our biggest “frenemy.” A frenemy is both a friend and a rival — and I believe AI can play either role.

On one hand, I’ve seen enterprise search teams struggle over the last few years because all the innovation budget has been pulled away from search and poured into AI initiatives. There’s even a joke: if you want your project funded, just call it “AI.” Ironically, though, what’s the first thing most of these brand-new AI teams build? A retrieval-augmented generation (RAG) system. And what’s at the heart of RAG? Search.

So these AI teams often end up reinventing the wheel — tackling challenges like evaluation, messy content, and ranking in uncertain environments — the same problems the search world has been solving for over 40 years. It’s fascinating to watch them rediscover the fundamentals of information retrieval while believing they’re doing something entirely new. In reality, their “cutting-edge” work is often just the next generation of search applications.

This is why I say AI is both a friend and a rival. The friend part is that we’re now at a point where we can deliver amazingly powerful results — systems that don’t just return documents, but actually make decisions and drive real actions. Imagine a job platform that doesn’t just help you search for your next great role, but recommends the perfect opportunity proactively. We’re not fully there yet, but we’re heading in that direction.

But there’s also a dark side. These same powerful systems can go horribly wrong. My favorite example is from a grocery chain in New Zealand that built a RAG-based recipe recommender using customer purchase data. It ended up recommending a “light, odorless cocktail” — which turned out to be chlorine gas. So while these systems can be transformational, they also require rigorous oversight and evaluation to prevent dangerous or embarrassing mistakes.

And that’s where I see the biggest overlap between the AI and search worlds. In order to succeed, AI teams are now adopting our language and methods — classic search metrics like precision, recall, NDCG, and judgments. These concepts weren’t historically part of the machine learning world, where the focus was on labeled datasets and training/testing splits. Now, as AI systems grow more complex and more intertwined with search, they’re embracing these information retrieval fundamentals.

Ultimately, that’s exciting to me. It means that as AI evolves, evaluation will become even more central, not just for search relevance, but for the broader impact these systems have on users and businesses.

Are there any open-source tools or projects—beyond Elasticsearch and OpenSearch—that have significantly influenced your work?

One open-source project that’s always intrigued me is Vespa. It’s been around as long as Lucene and was originally incubated at Yahoo. Vespa took a very ambitious approach from the start — not just positioning itself as a search index, but as a data processing and serving engine. It was one of the first platforms to make tensors and other early machine learning/AI concepts first-class citizens, and it also built in feedback loops for gathering data right inside the product. While I don’t know if Vespa itself will be the “product of the future,” I see it as a strong pointer to where the future might go, which makes it fascinating to watch.

Another one that stands out is Algolia, but for a completely different reason. Algolia’s success is a powerful reminder that ease of use is a value all its own. Too often, search lives in this sort of “priesthood” of experts with deep tribal knowledge — people who know the arcane syntax and secret tricks for query tuning and indexing strategies. Algolia flipped that on its head by making search radically simple to set up and manage. Their growth really drives home the point that search needs to be more accessible if we want it to evolve and reach more users.

Then there’s Tantivy, which I love because it’s like a wake-up call to Lucene. Tantivy is built in Rust, while Lucene is built in Java, and they share many similarities in how they’re structured. What excites me is that Tantivy has pushed innovation in ways that highlight Java’s limitations — especially around things like leveraging GPUs, handling massive amounts of RAM, and taking advantage of new hardware like SSDs. It’s also been cool to see Java itself innovate recently, with JDK updates now coming out every six months. Honestly, I never thought I’d be complimenting Oracle, but they’ve been surprisingly good stewards of the Java language. That’s important because Lucene’s future is tied to Java’s evolution, and it needs to keep up to compete with high-performance Rust-based engines like Tantivy or any new projects that emerge from completely different computing paradigms.

Lastly, I’ve been intrigued by Postgres. It’s the classic “does-everything” database, and I sometimes joke that if you stick with databases long enough, you’ll eventually end up back at Postgres. It now supports JSON structures, vectors, and of course, traditional relational data. There are also newer efforts like ParadeDB, which bring BM25-style full-text search capabilities directly into Postgres. That’s important because it’s a reminder to dedicated search engines: if they want to justify the operational overhead, ETL pipelines, and complexity they introduce, they need to deliver significantly more value than what developers can get “for free” by just embedding search into their primary data store.

What is the most unexpected or unconventional way you’ve seen search technologies applied?

One of the most unexpected ways I’ve seen search applied was when we turned a search engine into… a message queue.

I was working with a client in the intelligence community where we had an extremely restricted software environment. Getting approval for any new tool was a massive headache — layers of vetting, security reviews, bureaucracy. But one thing we did already have approved was Apache Solr.

We had all these data files that needed to be processed. There were scripts to handle the processing and ways to distribute them, but we didn’t have a way to track which files were “in progress” and which were done. So, we got creative: we used Solr itself as a queueing system.

Each file was stored as a document, and we used a field in Solr to track its status. Our scripts would check Solr, grab a file marked as “available,” update its status to “in progress,” and move on. It worked beautifully at the scale we needed. We even managed the order and flow by tracking our own data structures. It was kind of hacky, but it was surprisingly effective.
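Today you could build that claim-a-file step on Solr's atomic updates and optimistic concurrency (the original system predates some of these features). Here is a rough sketch; the core name and fields are hypothetical.

```python
import requests

SOLR = "http://localhost:8983/solr/files"  # hypothetical core name

def claim_next_file(worker_id):
    """Try to claim one 'available' file; return its id, or None."""
    # Find a candidate document that nobody is working on yet.
    r = requests.get(f"{SOLR}/select", params={
        "q": "status:available", "rows": 1, "fl": "id,_version_",
    })
    docs = r.json()["response"]["docs"]
    if not docs:
        return None

    doc = docs[0]
    # Atomic update with optimistic concurrency: if another worker claimed
    # the doc first, _version_ won't match and Solr rejects the update (409).
    update = [{
        "id": doc["id"],
        "_version_": doc["_version_"],
        "status": {"set": "in_progress"},
        "worker": {"set": worker_id},
    }]
    r = requests.post(f"{SOLR}/update?commit=true", json=update)
    return doc["id"] if r.status_code == 200 else None
```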

The second oddball use case I’ve been part of? Using a distributed search engine as a distributed blob store. We had to store and access large amounts of data, and with Solr’s replication and distribution, it worked better than you’d expect. If a node went down, no big deal — the data was already replicated elsewhere.

So yeah, I’ve used a search engine to power a queue and even to store blobs, which I’m pretty sure isn’t what the Solr creators had in mind. But when you have limited tools, you get creative — and sometimes search really can do it all.

What’s a hot take you have about the search industry?

I think my hot take is this: it’s incredibly difficult to sell a search engine. It’s very hard to build a sustainable commercial business around a generic search platform unless you’re solving a very specific problem for a well-defined audience.

When I look back at the history of search, there have been plenty of players that looked promising for a while—Inktomi, Autonomy, AltaVista, FAST, and others. But very few of them have had lasting commercial success. The companies that have thrived tend to be those that wrapped search in a solution tailored to a particular domain.

For example, Elastic has been hugely successful, but a big part of that success comes from its focus on observability as a core use case. Similarly, Algolia succeeded by building an ultra-easy, super-scalable solution targeted at developers who want to embed search into their applications without the complexity. In contrast, “search as a service” or “search as a platform” on its own is an extremely tough market. Open source has been tremendously disruptive here, essentially gutting the space for pure commercial vendors. I don’t see that dynamic changing anytime soon.

If you want to make a serious go of it in search, you need an angle. You need to take the powerful core concept of search and wrap it with something else—whether that’s a killer use case, a simplified experience, or a set of features that directly solve a pressing business problem. Without that, it’s very difficult to stand out or to build a truly sustainable business.

What is a golden tip for optimizing search performance or a piece of career advice that you would give your younger self?

I think the advice I’d want to give to my younger self — though I also know my younger self would probably never have taken it — would be to really focus on understanding the math and lower-level data structures at a much deeper, more intuitive level.

As I’ve moved more into metrics and evaluation work later in my career, I find myself constantly relearning concepts on the fly, like how to calculate things such as NDCG. Because I never built a solid foundational math background in information retrieval, I’m often working things out in the moment, rather than having that deep, instinctive understanding to draw on. It makes it harder to fully grasp why certain algorithms behave the way they do, or to deeply understand why specific evaluation metrics mean what they mean.

Instead of figuring these things out for myself, I often need someone else to explain, “This is what this metric means, Eric,” and then it clicks. But I don’t get there on my own. If I had built that foundation earlier, it would have made me a much stronger search practitioner and allowed me to engage with the field at a deeper, more technical level.

That said, knowing my younger self — my strengths, weaknesses, and interests — I also know I probably wouldn’t have taken that advice at the time. So while it’s the advice I’d give, I’m realistic enough to admit it probably wouldn’t have changed my path all that much!

Can you suggest a lesser-known book, blog, or resource that would be valuable to others in the community?

Absolutely! I totally can, because I almost wore the t-shirt today. So, one of my good friends, Nick Zadrozny, runs Bonsai, which is a hosted Solr, OpenSearch, and Elasticsearch company. A couple of years ago, they created both a t-shirt (one of my absolute favorites) and a companion website, https://storyofsearch.com/.

They only ever wrote Chapter One, titled “Humans and Books,” but it’s fantastic. It’s a wonderfully simple, interactive walkthrough that explains what TF-IDF is and breaks down the core concepts of search in a really engaging way. It only takes about three minutes to go through, with clear visualizations that help you understand the fundamentals: what an index is, what tokens are, and how term frequency works.
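In the same spirit, here is a toy version of those fundamentals in a few lines of Python: tokenize some documents, build an inverted index, and weigh a term by TF-IDF.

```python
import math
from collections import Counter

docs = {
    "doc1": "the cat sat on the mat",
    "doc2": "the dog chased the cat",
    "doc3": "dogs and cats make good pets",
}

# Tokenization: split text into terms (real analyzers do much more).
tokens = {doc_id: text.split() for doc_id, text in docs.items()}

# Inverted index: term -> the set of documents that contain it.
index = {}
for doc_id, terms in tokens.items():
    for term in terms:
        index.setdefault(term, set()).add(doc_id)

def tf_idf(term, doc_id):
    tf = Counter(tokens[doc_id])[term]             # how often it appears here
    df = len(index.get(term, ()))                  # how many docs contain it
    idf = math.log(len(docs) / df) if df else 0.0  # rarer terms weigh more
    return tf * idf

print(tf_idf("cat", "doc1"))  # "cat" appears in two docs -> modest weight
print(tf_idf("mat", "doc1"))  # "mat" appears only here -> higher weight
```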

Given that I’ve admitted I never really mastered the deep math behind information retrieval, this is exactly the kind of approachable, visual explanation that makes it all click. It’s the resource I always point people to when they want to start understanding search concepts in a fun, accessible way. If you’re curious about search and want to grasp the basics quickly, storyofsearch.com is a hidden gem.

Anything else you want to share? Feel free to tell us about a product or project you’re working on, or anything else that you think the search community will find valuable.

Twice a year, we run the Haystack Conference, which is a conference by practitioners, for practitioners. It’s one of my favorite events because it brings together people who are deep in the weeds of search, relevance, and now AI-powered search systems.

I’m especially excited about the upcoming EU edition, happening September 23rd–25th in Berlin. What’s great about Haystack is not just the live event but also the incredible archive we’ve built over the years.

If you go to haystackconf.com and click on “Past Talks,” you’ll find seven years’ worth of talks from search professionals around the world. These cover everything from search relevance and machine learning to RAG systems, vector search, and the latest AI advancements. It’s an incredible free resource for anyone in the search community, whether you’re looking for inspiration, practical techniques, or lessons learned from real-world projects.

The conference itself is always a great opportunity to connect with like-minded practitioners, share successes and failures, and see what others are building. If you can’t make it to Berlin this time, definitely check out the talks online — it’s like a living history of the evolution of search.
