Top Voices in Search Tech: Alessandro Benedetti


The "Top Voices in Search-Tech" initiative is a carefully curated showcase of the most impactful and influential search-tech professionals from around the world that you can connect with and learn from.



About Alessandro

Alessandro Benedetti is an Apache Lucene/Solr committer and chair of the Apache Solr PMC.
Director and R&D Software Engineer at Sease Ltd.

His focus is on R&D in Information Retrieval, Information Extraction, Natural Language Processing, and Machine Learning.
He firmly believes in Open Source as a way to build a bridge between Academia and Industry and facilitate the progress of applied research.
Experience with a great variety of clients has taught him to be a proficient and professional consultant.

Alessandro is a passionate R&D software engineer at heart, continuously applying the latest trends in Information Retrieval and AI/Machine Learning to solve interesting search problems.
He’s been studying Machine Learning applications such as Learning To Rank and Natural Language Techniques for years, and more recently he’s been exploring Generative AI technologies and approaches such as Large Language Models and Retrieval Augmented Generation to improve the search user experience.
He’s curious and actively coding and experimenting every week.

When he isn't on clients' projects, he is actively contributing to the open-source community and presenting the applications of leading-edge techniques in real-world scenarios at meet-ups and conferences such as ECIR, Search Solutions, the Lucene/Solr Revolution, Community Over Code (ex ApacheCon), Haystack, FOSDEM, Berlin Buzzwords, and the Open Source Summit.


Let’s start from the beginning — how did you get involved in the search tech industry?

I actually started back in university. My bachelor’s thesis in 2007 was on information retrieval, and for my master’s thesis, I decided to dive deeper into the field. I developed and studied a system that crawled the web, extracted information, recognized structures and entities across web pages, and allowed users to search across those entities. At the time, I was experimenting with what we might now call early machine learning techniques. That was really the starting point of my career in search.

From 2007 to 2009, I focused on that project. After finishing my degree, I stayed at the university for about seven months, continuing to work on it, and then I transitioned into the industry.

My first role was at a consulting company focused on open-source technologies. Around 2010, I began working with Apache Lucene and Apache Solr—my first exposure to industry-grade, open-source search software. It was love at first sight! Initially, my work involved reading the code to understand it and customizing setups for clients across Italy who were using Lucene and Solr. While supporting customers, I was also learning and becoming more and more passionate about the technology. That’s when I started contributing to the projects as well.

Over the years, as I moved through different roles and companies, my contributions and engagement with the community grew. In 2013, I moved to London, and I’ve never left the field of search since then. I kept improving my knowledge, participating in mailing lists, collaborating with other contributors, and strengthening my involvement with the community of committers.

Fast forward to around 2020, I became an Apache Lucene and Solr committer, which was a major milestone for me. It marked the culmination of many years of hands-on experience, contributions, and dedication to the search ecosystem.

Tell us about your current role and what you’re working on these days.

Since 2016, I’ve been directing a company called Sease that specializes in open-source search and machine learning integrations. As a director, I wear many hats. I supervise my team, but I’m also still very hands-on.

On the client side, I personally take part in consulting engagements, supervise projects, and work directly with companies to customize and develop search solutions. This can include building plugins, coding contributions, providing audits, and supporting organizations in better using open-source search technologies. Many of these are long-term partnerships, where we help companies with their entire information retrieval mission—starting from technical design all the way through to full implementation.

A big part of our work is pure open source. We reinvest the money we earn from clients (and sometimes from sponsors) into open-source contributions. This could mean writing code for Apache Solr, supporting the community mailing lists, improving documentation, or even creating entirely new projects related to Lucene, Solr, Elasticsearch, or OpenSearch under the umbrella of our company. This work is purely voluntary and driven by our passion for the community.

We also do research, publishing papers at academic conferences, and we actively evangelize within the industry. We speak at conferences to share what we’ve built, exchange ideas with others, and help advance the field of search.

So, in short, my role spans everything from software engineer to director, community advocate, and researcher. My goal is to bridge cutting-edge search technology with open collaboration and real-world impact.

Could you describe a ‘favorite failure’—a setback that ultimately led to an important lesson or breakthrough in your work?

Probably the biggest failure I’ve experienced happened around 2015, before I had started my current company. At the time, I was a search team leader at another organization, and we were tasked with designing and developing our first integration of machine learning into search ranking—what we now call learning to rank.

The main problem was misalignment between the business and the technical team. The search team and the business had very different expectations. From the team’s perspective, we were doing groundbreaking R&D, experimenting with a cutting-edge technology. But the business expected clear, measurable improvements in performance and, ultimately, profits.

We spent about three months working on the project, building an initial integration. Technically, the team was proud of what we achieved—we’d been early adopters of a brand-new approach. But when we delivered the first iteration, the business didn’t see the impact they were expecting. For them, it wasn’t meaningful, and ultimately, the project was shut down after that first phase.

It was painful on a personal level. Some people even had to leave the company as a result of the project being dismissed, which felt like a big personal failure for me as a leader.

Looking back, the failure wasn’t about the technology. It was about communication and expectation-setting.

We should have spent much more time upfront with the business stakeholders, clearly explaining the risks and uncertainties of being early adopters. Machine learning was still new to search at that time, and there were no guarantees that three months of work would deliver a significant, immediate bump in profit or search quality.

Since then, I’ve become very careful when leading ML projects. I never assume that others fully understand the complexities involved. Even if it feels repetitive, I make sure to explain again and again that machine learning is not a silver bullet. Different layers of a company—technical teams, leadership, and business stakeholders—often have very different levels of understanding.

By setting expectations clearly from the start, you avoid painful surprises later on. That failure taught me to bridge those gaps, and it has shaped how I approach every search and machine learning project since.

What are some of the biggest misconceptions about search that you often encounter?

The biggest misconception I see when working in the search world is the belief that search is simple.

As users, we’re so accustomed to the seamless experience provided by large web search engines like Google or Bing. Nowadays, many people even use language models like ChatGPT to perform search tasks. The problem is that this polished experience hides the enormous complexity behind the scenes—thousands of hours of work by highly skilled specialists, working on every detail you never see.

From the user’s perspective, search feels effortless: you type something, and the right results appear. So when organizations decide to build a custom search engine for their own data, they often approach it with the same mindset: “How hard can it be? We’ll just grab some open-source software, plug it in, and we’re done.”

While open-source tools like Apache Solr, Elasticsearch, or OpenSearch are incredibly powerful and flexible, that flexibility comes at a cost. To truly leverage these systems—customizing them, extending their capabilities, and tailoring them to your specific use case—requires deep expertise and significant effort.

Most people underestimate this because they only ever see the finished product as users. Our brains tend to skip over the complexity when we don’t need to know how it works. But when you’re the one building the system, that hidden complexity suddenly becomes your problem.

In short: search is not simple—and achieving Google-level quality takes far more work than most people expect.

How do you envision AI and machine learning impacting search relevance and data insights over the next 2-3 years?

This topic is very close to my heart, not only because of my career but also because I’ve written a book on this subject, which will be released in October.

I believe large language models (LLMs) represent both the present and the future of search. They won’t replace search engines outright, but they will play a crucial role in augmenting and enhancing them. These models are powerful tools for generating text, and in the context of search, this ability to generate text unlocks entirely new possibilities. They can be used to produce answers to questions, generate summaries, or enrich the underlying data that powers search systems.

Traditionally, there were many things search practitioners wanted to do but struggled to implement. For instance, pre-processing documents to add more context or rephrasing queries to better capture user intent were challenging tasks. LLMs now make these processes far more achievable. They also open the door to improving relevance judgments, which have long been a difficult problem in search. In the past, teams debated whether to rely on business experts for ratings or to infer them from click data. Today, we have a third option: using language models to evaluate documents and queries, providing valuable data to fine-tune ranking algorithms.
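The "third option" described above, using a language model as a relevance judge, can be sketched in a few lines. Everything here is an assumption for illustration: the prompt wording, the 0–3 grading scale, and the `judge_with_llm` stub, which fakes a grade with keyword overlap so the sketch runs offline. A real system would send the prompt to an actual LLM API and parse its reply.

```python
# Illustrative "LLM-as-judge" relevance labelling sketch.
# The prompt template, 0-3 scale, and judge_with_llm stub are all
# hypothetical -- swap the stub for a real LLM call in practice.

PROMPT_TEMPLATE = (
    "Rate how relevant the document is to the query on a 0-3 scale\n"
    "(0 = irrelevant, 3 = perfectly relevant). Reply with the number only.\n"
    "Query: {query}\n"
    "Document: {document}\n"
)

def build_judgement_prompt(query: str, document: str) -> str:
    """Fill the grading prompt for one (query, document) pair."""
    return PROMPT_TEMPLATE.format(query=query, document=document)

def judge_with_llm(prompt: str) -> int:
    """Stub standing in for a language-model call.

    Fakes a grade with trivial word overlap so the sketch runs offline;
    a real implementation would call an LLM and parse its numeric reply.
    """
    query_line = next(l for l in prompt.splitlines() if l.startswith("Query: "))
    doc_line = next(l for l in prompt.splitlines() if l.startswith("Document: "))
    query_terms = set(query_line[len("Query: "):].lower().split())
    doc_terms = set(doc_line[len("Document: "):].lower().split())
    return min(len(query_terms & doc_terms), 3)
```

Judgements produced this way can then be fed into the same fine-tuning pipelines that previously depended on expert ratings or click data.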

Another exciting application is retrieval-augmented generation (RAG). In this approach, a search engine retrieves relevant documents, and then the LLM uses that information to generate a more natural, comprehensive answer for the user. From the user’s perspective, this creates a smarter, more intuitive search experience. Behind the scenes, it’s a seamless collaboration between traditional search technology and advanced language models.
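The retrieve-then-generate flow described above can be reduced to a minimal sketch. The tiny in-memory corpus, the word-overlap retriever, and the prompt format are assumptions made for illustration; a real pipeline would retrieve from a search engine such as Apache Solr and send the assembled prompt to an LLM.

```python
# Minimal RAG pipeline sketch: retrieve relevant documents, then build
# the prompt an LLM would use to generate a grounded answer.
# Corpus, retriever, and prompt format are illustrative assumptions.

CORPUS = {
    "doc1": "Apache Solr supports dense vector search via KNN queries.",
    "doc2": "Learning to rank re-orders results with a trained model.",
    "doc3": "Retrieval-augmented generation grounds LLM answers in documents.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Score documents by word overlap with the query; return top-k ids."""
    q = set(query.lower().split())
    scored = sorted(
        CORPUS,
        key=lambda d: len(q & set(CORPUS[d].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query: str) -> str:
    """Assemble the LLM prompt: retrieved context first, then the question."""
    context = "\n".join(CORPUS[d] for d in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The generation step itself is just a call to a language model with the assembled prompt, which is what makes the collaboration between the search engine and the LLM invisible to the end user.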

Over the next two to five years, I see LLMs becoming deeply integrated into the search stack. They will act as a companion to modern search engines—pre-processing and enriching data, post-processing and refining queries, and bridging the gap between structured search and natural language understanding. Much of this will be invisible to the end user, who will simply experience better, more relevant results.

In short, LLMs won’t replace search engines. Instead, they will quietly empower them, enabling features that were once incredibly difficult—or even impossible—to achieve.

Are there any open-source tools or projects—beyond Elasticsearch and OpenSearch—that have significantly influenced your work?

In general, I’d say the most impactful group of open-source technologies that have helped me are libraries. There are countless libraries that solve different aspects of natural language search. For example, spaCy is a great tool for NLP tasks, and there are many other specialized libraries designed to address very specific problems. I won’t list them all because there are so many, but these libraries have had a huge influence on my work, as they provide the building blocks for solving complex search challenges.

On a more foundational level, classic open-source technologies like Spring have also been incredibly important. They make it easier to build web applications, REST APIs, and other essential components. While they may seem like “boilerplate” tools, they significantly simplify and accelerate development.

Beyond libraries, there are also open-source tools focused specifically on search quality evaluation. For example, the team at OpenSource Connections has contributed a tool called Quepid. It’s a front-end application that allows users to rate search results for specific queries, helping teams measure and improve search quality. We’ve used Quepid extensively across different projects and have even contributed to it ourselves. It has also inspired ideas for improving other tools and projects we’ve developed, such as our own relevance evaluation tool, the Rated Ranking Evaluator.
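To make the rated-evaluation idea concrete, here is a sketch of the kind of metric such tools compute from human (or LLM) relevance judgements: NDCG over one ranked result list. The formula is standard; the example ratings in the test are made up for illustration.

```python
# NDCG sketch: graded relevance, discounted by rank position, normalised
# by the best achievable ordering. This is the standard metric definition,
# shown here only to illustrate what rated-evaluation tools measure.
import math

def dcg(ratings: list[int]) -> float:
    """Discounted cumulative gain: sum of ratings discounted by log2(rank+1)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(ratings))

def ndcg(ratings: list[int]) -> float:
    """DCG normalised by the DCG of the ideal (sorted-descending) ordering."""
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal else 0.0
```

A perfectly ordered result list scores 1.0; any misordering of graded results scores strictly less, which is what gives teams a single number to track across ranking changes.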

Of course, there are thousands of open-source libraries and tools out there that have influenced my work in some way. But if I had to highlight just a few, spaCy, Spring, and Quepid stand out as technologies that have had a particularly meaningful impact on the search projects I’ve been involved with.

What is a golden tip for optimizing search performance or a piece of career advice that you would give your younger self?

I think I’ve been quite lucky in my career. While I never had formal mentors, I’ve always had people around me who were deeply involved in the open-source community. For example, in my first job, I had the privilege of working with Tommaso, who was an Apache Lucene and Solr committer. Later, I had the chance to work with other incredibly talented individuals who were fundamental to my growth.

Because of this, I don’t feel like there’s one revolutionary piece of advice that would have completely changed my path if I’d heard it earlier. Instead, what I’d like to share is a message for younger people who are just entering the field.

Focus on your passion. Work on what you truly enjoy and let that guide you. Engage as much as possible with people—because people are the key to growth. Meeting the right people, exchanging ideas, and having open conversations will help you learn and expand your perspective.

Be open to critique and to code reviews, and never take feedback personally. Don’t be overly protective of your code—share it, explore other people’s work, and contribute without hesitation. In the open-source world, people are generally very welcoming to newcomers. Most are happy to help new developers and experts grow, so there’s no need to be shy.

The more you engage with others, the more you’ll learn—not just about technology, but about what you truly enjoy. Don’t focus too much on external motivators like job titles, higher salaries, or jumping companies just for financial reasons. Instead, prioritize spending your time in a meaningful way: learning as much as you can, contributing to the community, and building strong relationships.

In the end, growth as a person, as a software engineer, and especially in the open-source world, comes from passion, curiosity, and meaningful connections with others.

Can you suggest a lesser-known book, blog, or resource that would be valuable to others in the community?

Absolutely. One of the most important books I’ve ever read in my career isn’t a technical book at all—it’s How to Win Friends and Influence People by Dale Carnegie.

At first glance, it might seem unrelated to search or technology, but it completely transformed the way I communicate and build relationships, both professionally and personally. The book dives deep into the fundamentals of human interaction—how to engage with people, how to frame conversations thoughtfully, and even how to write better emails. It teaches you to step back and think from the other person’s perspective, avoiding impulsive or emotional reactions that can derail communication.

Carnegie uses hundreds of real-life examples—historic business deals, personal relationships, and everyday situations—to illustrate his points. For me, this was eye-opening. It helped me recognize and correct classic mistakes I didn’t even realize I was making.

Reading this book made me more open to critique and far less attached to “the way I’ve always done things.” It encouraged me to continuously evolve and improve, especially in how I engage with my team, my peers, and the wider community.

While it’s not a technical manual, I’d consider it essential reading for anyone in tech. After all, success in our field isn’t just about writing great code—it’s about collaborating effectively, building trust, and influencing others in positive ways. This book gave me the tools to do exactly that, and I still return to its lessons even today.

Yes, absolutely—I have a very strong opinion about how the open source model is being handled today. I’ve been part of the open source world for many years, and I deeply believe in it. I see it as the best way to build technology that truly benefits everyone: a way for people across the world to collaborate, improve systems, and create tools that move the entire industry forward.

But at the same time, I’ve grown increasingly frustrated by misinformation and misuse of the term “open source.”

Take open source language models, for example. In recent years, as large proprietary models like OpenAI’s GPT have dominated the space, many companies have rushed to brand their products as “open source” to position themselves as the “good guys.”

Here’s the problem: in many cases, these so-called “open source” models are not really open at all.

Companies release only the final trained model, but they don’t share the training data, algorithms, or the processes behind it. They slap the “open source” label on it purely for marketing purposes, while keeping the most important parts completely closed. This practice has even earned a name: “open washing.”

It’s similar to greenwashing in sustainability, where companies loudly advertise their “zero carbon footprint” while quietly doing very little to actually reduce their environmental impact. In this case, companies are capitalizing on the trust and goodwill that the open source community has built over decades—without truly embracing the open source ethos.

For instance, even Meta’s LLaMA model, which was widely celebrated as open source, wasn’t truly open in the beginning. Very little was actually released to the public beyond the bare minimum needed to make headlines.

And the issue goes deeper.

Everybody wants to use open source, but very few organizations want to invest in it.

As a small company, my team and I make a huge number of contributions—often using our own funds or relying on small sponsors. Meanwhile, massive corporations benefit from open source projects every single day but often contribute little to nothing back.

There are exceptions, of course, and I don’t want to paint every company with the same brush. But the imbalance is stark. The result is that a lot of the real innovation and heavy lifting is done by small teams or individual contributors who rarely get the recognition—or financial support—that they deserve.

My take: Open source remains one of the most powerful forces for good in technology. But we need to call out open washing when we see it and demand accountability from companies that profit from open source without giving back. Otherwise, the very principles that make open source so transformative will erode—and we risk turning it into just another marketing gimmick.

Anything else you want to share? Feel free to tell us about a product or project you’re working on, or anything else that you think the search community will find valuable.

We’re currently working—thanks to the sponsorship of some large corporations—on a number of AI-related features in Apache Solr. Our focus is on enriching Solr with vector search improvements and large language model (LLM) integrations. These features are coming soon and will be part of Solr 10, so keep an eye out for future releases.

I’m also preparing to publish a book with Springer, titled “How Large Language Models Can Help Your Search Project.”

The book explores how LLMs are impacting the search world and provides practical guidance on how to leverage them to improve search systems. It covers the history of AI and LLMs, dives into open-source applications, and explains how these technologies can be applied to real-world search projects.

There are chapters dedicated to tools like Apache Solr, OpenSearch, Elasticsearch, Vespa, and others. The book also examines the spectrum between commercial and open-source models, explaining how to evaluate “how open” a model really is.

My goal is for this to serve as a technical guide for practitioners—search engineers, data scientists, and software developers—who already have a solid background in search but want to understand how LLMs can enhance their systems. It’s packed with design patterns, implementation strategies, and practical tips to help teams take their search projects to the next level.
