Top Voices in Search Tech: David Tippett


The "Top Voices in Search-Tech" initiative is a carefully curated showcase of the most impactful and influential search-tech professionals from around the world that you can connect with and learn from.


About David

David Tippett builds search and data systems, drawing on a decade of experience in data engineering, DevOps, and software engineering. As the former lead developer advocate for OpenSearch at AWS, he played a key role in growing its open-source community. Now at GitHub, he works across all aspects of search, from infrastructure to relevance.


Let’s start from the beginning — how did you get involved in the search tech industry?

I actually wanted to get involved in search tech while I was in college—the urge was there. But for some reason, I just couldn’t figure out what technology people were using to build search applications. I kept wondering, “What does Google do? How does Google do that?” And, of course, Google doesn’t exactly reveal how they do search.

It wasn’t until a couple of years after college that I discovered Elasticsearch at one of the companies I worked for. I thought, “Hey, let’s see if there’s a better way for us to do this.” We were doing people matching—someone would write their name, and then we’d match them in our database—and it was awful. It was basically stored procedures, and it was terrible. That’s when I realized, “Oh, this is a use case for search.”

I didn’t end up implementing it, which really bummed me out. But from there, I started learning more about Elasticsearch and how it worked. I did several different roles—Python, DevOps, data engineering—so I saw Elasticsearch and later Open Distro coming up over and over in the observability space. That’s really how I got my foot in the door with search.

I wasn’t actually hired for my first search job because I knew search. I was hired because I could write well, make videos, and I was really familiar with observability. That first real job in the search industry for me was as a senior developer advocate for OpenSearch. Until then, I hadn’t truly been doing search work, but then I landed this job with OpenSearch. I think they brought me in because of my observability experience, but for me, it was perfect—I wanted to do search.

I started talking to a lot of different product managers, asking, “How do I learn more about search? Where do I go? What do I do?” That’s when they introduced me to the Haystack conference, where search engineers dive into really deep search tech. The first time I went, it was totally overwhelming, but I got plugged in and kept up with the community. And that’s really what got the ball rolling with search for me.

Tell us about your current role and what you’re working on these days.

My current role is a search engineer for GitHub, and it’s funny — when I was interviewing, I expected it to be way more search-focused. But once I got here, I realized it’s much more about building a platform that’s super scalable, robust, and flexible enough to handle our varied traffic and workflows. A lot of GitHub’s search traffic comes from different directions: people scraping our APIs to get data out, or product managers, engineers, and security researchers each using search in their own ways. So lately, I’ve been making sure our platform is stable and performs how we want it to. Eventually, I want to move into building a better overall search experience for GitHub and figuring out what that looks like.

As for the tech stack, we run Elasticsearch for what I call “text search,” meaning anywhere you have text content—issues, PRs, that kind of thing. Then we have our own in-house code search engine, which is basically a custom solution that lets us parse code and search it effectively at scale. The biggest challenge is handling the constant flow of new commits without storing a completely new copy of a repository every time. The team behind it pulled off some real wizardry to index just the changed elements, and it’s really fascinating, deep engineering.

Could you describe a ‘favorite failure’—a setback that ultimately led to an important lesson or breakthrough in your work?

I think my favorite failure was back when I was working at OpenSearch (and I’m going to pick on them a bit here because they did some bad UX stuff). This was around the time vector search was just coming out and everyone was still figuring it out. I was ingesting documents into OpenSearch and embedding them as vectors, but I couldn’t get the results I expected. I dug and dug, thinking it was some super technical issue.

Then, on a call with one of our engineers, I realized I was trying to put documents that were really long into a single vector. After 512 tokens, the model was just cutting off and ignoring the rest of my document. I was mortified because I’m the guy teaching people how to do vector search, and here I was missing something so basic. It was a great reminder that everybody makes mistakes—even the so-called “experts.” So the lesson is: always question everything.
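For readers who hit the same wall, here is a minimal sketch of the fix: chunk long documents before embedding them, so nothing past the model’s input limit is silently dropped. It assumes the sentence-transformers library; the model name, file name, and chunk sizes are illustrative, not what OpenSearch or David actually used.

```python
# Minimal sketch: embed overlapping chunks instead of one truncated vector.
# Assumes sentence-transformers; chunk size/overlap are illustrative choices.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # silently truncates past its max_seq_length

def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows small enough to embed whole."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)] or [text]

long_doc = open("really_long_issue.txt").read()  # hypothetical input
chunks = chunk_words(long_doc)
vectors = model.encode(chunks)  # one vector per chunk, nothing dropped
# Index each (chunk, vector) pair and roll matches up to the parent document
# at query time, rather than squeezing the whole document into one vector.
```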

What are some of the biggest misconceptions about search that you often encounter?

I think people assume search just works. But search isn’t a database, and a lot of people treat it like one. They say, “Oh, I’ve written a query. I want to find documents that match.” But it’s not that simple—there’s so much more going on behind the scenes.

For example, the recency of issues can be really important if you want good results. You might need to boost documents that are more recent or more popular. There’s a lot of work involved in figuring out what matters to a particular user and then tuning the search toward that. But search isn’t that advanced yet. Ours is still pretty simple because doing this well at scale is really hard.
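As a hedged illustration of that kind of tuning, this is roughly what boosting recent, popular documents can look like with Elasticsearch’s function_score query, written here as a Python request body. The field names (updated_at, comment_count) and numbers are hypothetical, not GitHub’s actual schema.

```python
# Sketch of a recency/popularity-aware query in Elasticsearch's query DSL.
# Field names are hypothetical; tune scales and factors against real traffic.
search_body = {
    "query": {
        "function_score": {
            "query": {"match": {"title": "flaky test timeout"}},
            "functions": [
                # Halve this function's contribution for docs ~30 days old.
                {"gauss": {"updated_at": {"origin": "now", "scale": "30d", "decay": 0.5}}},
                # Nudge well-commented docs upward, dampened with log1p.
                {"field_value_factor": {"field": "comment_count",
                                        "modifier": "log1p",
                                        "factor": 0.5,
                                        "missing": 0}},
            ],
            "score_mode": "multiply",   # how the functions combine with each other
            "boost_mode": "multiply",   # how they combine with the text score
        }
    }
}
```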

How do you envision AI and machine learning impacting search relevance and data insights over the next 2-3 years?

Like most new innovations, AI speeds things up. For instance, I can share the terms people need to know, and they can pick them up so much faster. I think that accelerated learning aspect will be one of the biggest changes we’ll see.

But I’m pretty pessimistic about AI actually making search better. Everyone thinks vector search, semantic search, or AI will magically improve their results, and I don’t believe that’s true. In fact, I expect things to get worse before people realize you can’t just throw AI at search without doing some real data engineering. You have to understand what’s important to your users—relevance, recency, specific factors they care about. Most companies aren’t doing that right now.

So they’ll try to incorporate it, and maybe they’ll see a little improvement at first. But a lot of people will still miss the bigger picture and have to adjust.

Can you share an example of a particularly challenging production issue you’ve encountered in your work with search technologies, and the process you used to resolve it?

There are plenty to choose from. But one in particular that we still struggle with at GitHub is badly written queries. A few poorly formed queries can tank the entire cluster, and it’s really hard to identify them in advance. What does “bad” even mean in this context? How do you know before it happens?

As an industry, search doesn’t really have a solid solution for this yet. If you’re dealing with 10,000 or 50,000 queries per second, and just two or three problematic ones can bring down the cluster, how do you find them? That’s a huge challenge.

So far, we’ve added a lot of observability, rate limiting, and additional criteria to lessen the impact on our clusters. But I still don’t have a definitive answer to this issue.
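For context, here is a sketch of the blunt guardrails Elasticsearch does offer today; none of them identify a bad query in advance, they just cap the damage. The index name, query, and thresholds are illustrative.

```python
# Guardrails against runaway queries: deny expensive query types cluster-wide,
# and give individual searches a time/collection budget. Values illustrative.
import requests

ES = "http://localhost:9200"

# Reject inherently expensive queries (leading wildcards, scripts, etc.).
requests.put(f"{ES}/_cluster/settings", json={
    "persistent": {"search.allow_expensive_queries": False}
})

# Per-request budgets: a best-effort timeout plus a hard per-shard doc cap.
resp = requests.post(f"{ES}/issues/_search", json={
    "timeout": "2s",
    "terminate_after": 100_000,
    "query": {"query_string": {"query": "user supplied text here"}},
})
print(resp.json().get("timed_out"), resp.json().get("terminated_early"))
```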

Are there any open-source tools or projects—beyond Elasticsearch and OpenSearch—that have significantly influenced your work?

I’m going to name two. The first is Haystack by Deepset. It was the first Python library I used to get into vector search, and I really appreciate the engineering they’ve put into it. Haystack is fantastic.

The second one—coming out of Open Source Connections, and somewhat related to OpenSearch—is the one I’m most excited about: User Behavior Insights (UBI). It’s basically an open-source schema for capturing user behavior data, the queries people run and the events that follow them, in a form people can share. Once you’re collecting this data, you can use it for things like learning to rank, without having to build an entire engineering organization to gather it all. They’re creating standards and standard tools around it, which I think is really exciting.
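The core idea, sketched here with field names paraphrased from the UBI spec (check the published schema for the canonical ones): log every query under an ID, then log user events that reference that ID, so the two can be joined later.

```python
# UBI-style records: a query and a later click share a query_id, so relevance
# work can join them. Field names approximate the spec; verify before use.
import uuid
from datetime import datetime, timezone

def now() -> str:
    return datetime.now(timezone.utc).isoformat()

query_id = str(uuid.uuid4())

query_record = {
    "query_id": query_id,
    "user_query": "rails csrf token mismatch",
    "timestamp": now(),
}

click_event = {
    "query_id": query_id,  # the join key back to the query
    "action_name": "click",
    "event_attributes": {"object_id": "issues/12345", "position": 3},
    "timestamp": now(),
}
# Ship both to your analytics store; enough of these pairs gives you
# click-through metrics or training data for learning to rank.
```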

Is there a log error/alert that terrifies/annoys you in particular?

I think it comes back to that production area I was talking about earlier: “CPU utilization high.” I see that alert on my cluster, and I’m just like, not this again.

I have to dive in and spend hours trying to figure out which query it was this time that took down the cluster. So while it’s not exactly an error, it’s one of those alerts that’s very hard to track down.
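One way to shorten those hours, sketched under the assumption of a plain Elasticsearch cluster reachable at localhost: turn on search slow logs so offending queries leave a trail, and use the tasks API to see, and if necessary cancel, whatever is running right now. The index name and thresholds are illustrative.

```python
# Leave a trail for the next "CPU utilization high" hunt. Values illustrative.
import requests

ES = "http://localhost:9200"

# Log any query or fetch phase slower than these thresholds to the slow log.
requests.put(f"{ES}/issues/_settings", json={
    "index.search.slowlog.threshold.query.warn": "5s",
    "index.search.slowlog.threshold.query.info": "1s",
    "index.search.slowlog.threshold.fetch.warn": "1s",
})

# See long-running searches live; a task can then be cancelled with
# POST /_tasks/<task_id>/_cancel.
tasks = requests.get(f"{ES}/_tasks", params={
    "actions": "*search*", "detailed": "true"
}).json()
for node in tasks["nodes"].values():
    for task_id, task in node["tasks"].items():
        print(task_id, task["running_time_in_nanos"] / 1e9, "seconds")
```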

What is a golden tip for optimizing search performance that you’ve picked up in your years of experience?

Go back to the basics.

I think a lot of people don’t realize there’s already so much information out there on optimizing queries and other advanced tuning, but it almost always comes down to whether you’re following established best practices. For example, consider memory usage in OpenSearch or Elasticsearch: the long-standing guidance is to give the JVM heap no more than about half of total system memory, and to keep it below roughly 32GB so the JVM can still use compressed object pointers. Check the current docs for the exact numbers, but the point is, if you step back and make sure all your fundamentals are in order, you’ll likely see far more performance gains than you’d expect.
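As a quick way to audit that particular fundamental, here is a hedged sketch that compares each node’s configured heap against its RAM and the compressed-oops ceiling via the nodes stats API (the host and thresholds are assumptions, not authoritative limits).

```python
# Sanity-check heap sizing on every node: heap <= ~50% of RAM and < ~32GB
# (the approximate compressed-oops threshold). Host is illustrative.
import requests

stats = requests.get("http://localhost:9200/_nodes/stats/jvm,os").json()
for node in stats["nodes"].values():
    heap_gb = node["jvm"]["mem"]["heap_max_in_bytes"] / 2**30
    ram_gb = node["os"]["mem"]["total_in_bytes"] / 2**30
    ok = heap_gb <= ram_gb * 0.5 and heap_gb < 32
    print(f"{node['name']}: heap={heap_gb:.1f}GB ram={ram_gb:.1f}GB "
          f"{'OK' if ok else 'CHECK'}")
```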

What is the most unexpected or unconventional way you’ve seen search technologies applied?

There’s a great use case I saw where someone was doing data analysis in OpenSearch by creating a snapshot of the index, then using a tool to send that snapshot to customers. The customers could load it up and view the data because, oddly enough, many of them didn’t have access to Excel. Since OpenSearch is free and open source, they built an entire Excel-like workflow around it, just shipping indexes back and forth. I found it hilarious but also ingenious.

What’s a hot take you have about search technology?

Vector search isn’t that good.

Everyone’s amazed and dazzled by it right now, but from my tests and many others, you can’t survive without keyword search too. Vector search is phenomenally expensive and really tough to scale. At GitHub’s size, we’re still not there yet.

So here’s my hot take: fix your keyword search first before you jump into vectors. Vectors won’t necessarily make things better—they’ll just make it weirder and harder to explain why you got the results you did.

Can you suggest a lesser-known book, blog, or resource that would be valuable to others in the community?

I’d recommend checking out Doug Turnbull’s blog to start. He’s everywhere, and he’s fantastic—his books are great, too.

A second recommendation I have is connected to my ‘hot take’: to be a better search engineer, you need to be much better at data systems than most people realize. In that context, there’s a book called Designing Data-Intensive Applications by Martin Kleppmann. It’s massive, and I don’t recommend it lightly, but it’s by far the best book I’ve ever read, whether you’re a search engineer or just a regular engineer. It covers so many fundamentals you need to build these kinds of systems.

Anything else you want to share? Feel free to tell us about a product or project you’re working on, or anything else you think the search community will find valuable.

I want to emphasize how important it is to seek out experts. I spent so much time trying to figure out search on my own, and if I’d just looked around for where people congregate, I could have gotten into search engineering years earlier.

It’s funny, I work at GitHub now, and I only have three years of search experience. That’s partly because there are so few people in the search industry who truly know what they’re doing—it’s a tough field to break into. So definitely find those people so you can connect with them and learn from them.
