About Dmitry
Dmitry has been focusing on search engines since 2010 with Apache Lucene and Solr, and since 2020 with Elasticsearch and OpenSearch. He was responsible for building the search team and search technology powering the AlphaSense product, which today is used by thousands of reputable banks, hedge funds and companies in almost every industry vertical around the world. At Silo.AI Dmitry led a team of NLP researchers and search, frontend and QA engineers working on search at web scale, interacting with Product Management, Engineering and Data teams on a daily basis.
Dmitry has worked on the open source projects Luke and Quepid, and co-founded a few startups in the text analytics, edtech and team engagement spaces. He is the founder and host of the Vector Podcast. Having established himself as an independent researcher in vector search, Dmitry began working on Muves, a multilingual and multimodal search engine, together with his co-founders.
Most recently, he's been helping to improve multilingual map search at TomTom, and co-taught a course on GenAI and LLMs at the University of Helsinki. In his free time he enjoys reading, cycling and blogging about AI and Search. Dmitry holds a PhD in Applied Mathematics and a Master's in Computer Science.
Where to find Dmitry on the web:
Let’s start from the beginning — how did you get involved in the search tech industry?
As a student I got fascinated by the chicken-and-egg problem of finding information, where users need to already know something about what they are searching for. This later led to my PhD studies on a semantic analyzer with application to machine translation, and most recently to researching vector search.
Tell us about your current role and what you’re working on these days.
At TomTom I'm a Senior Product Manager, focusing on improving map search. Our search engine is multi-country, multilingual and used by lots of clients, in Automotive and Enterprise segments. At Muves I build software that analyzes complex information, like scientific papers and market data.
Could you describe a ‘favorite failure’—a setback that ultimately led to an important lesson or breakthrough in your work?
One setback came at the outset of my career: I was fired from a job during the probation period. On one hand it was a good outcome, because I did not like the job; on the other hand it taught me to evaluate more carefully what I'm getting into. That covers what the team will be like and how I fit culturally and professionally, but also what the overarching goal for the role is. If the hiring manager, the company or I am not clear on any one of these, it is usually a big sign saying "go look elsewhere".
What are some of the biggest misconceptions about search that you often encounter?
Some people treat search engine development as a software product. It is not; it is a data science product, and because of this you need to structure the teams and processes to cater to it and maximize the experimentation rate. The best search teams don't know a lot more than others, but they iterate fast and learn quicker than others. Another misconception I've seen is that putting a generic engineer on the search domain will work. It usually does not, because one needs the passion and energy to overcome the setbacks in the experiments, build intuition for what to try next and share the excitement with the rest of the team. And of course be ready to keep learning all the time :)
How do you envision AI and machine learning impacting search relevance and data insights over the next 2-3 years?
I'm glad to see semantics making its way into the core search engine design, through vector search capabilities. But I also want to see more open-minded prototyping, using LLMs as a helping hand. Establishing previously unimagined connections in the data can be a game-changer. My advice is to start coding with LLMs, impress yourself, learn what they are good at and what they are not, and become a better engineer.
Can you share an example of a particularly challenging production issue you’ve encountered in your work with search technologies, and the process you used to resolve it?
It was related to local time for each user on Earth. We had some documents that would get published "in the future" for users on the US West Coast, while the East was already a day ahead. And since you typically ask for the last X days, you would not see them. Solving this issue required figuring out the latest local time on the planet, which turns out to be in the Line Islands, Kiribati (UTC+14). It is also exciting that the maximum difference in time on our planet is 26 hours, not 24 :)
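The date-window idea in this answer can be sketched roughly as follows. This is an illustration under stated assumptions, not the fix that was actually shipped: the function name `last_days_window` is hypothetical, and the UTC-12 lower bound is inferred from the 26-hour figure mentioned above.

```python
from datetime import datetime, timedelta, timezone

# Widest UTC offsets: UTC+14 (Line Islands, Kiribati) and, as an assumption
# derived from the 26-hour spread, UTC-12 on the other side.
LATEST_OFFSET = timezone(timedelta(hours=14))
EARLIEST_OFFSET = timezone(timedelta(hours=-12))


def last_days_window(now_utc: datetime, days: int):
    """Return (start_date, end_date) covering "last `days` days" for any user on Earth.

    Hypothetical helper: the end date is "today" in the most advanced time zone,
    so documents dated "tomorrow" from a US West Coast point of view are included.
    """
    latest_today = now_utc.astimezone(LATEST_OFFSET).date()
    earliest_today = now_utc.astimezone(EARLIEST_OFFSET).date()
    return earliest_today - timedelta(days=days), latest_today


# Example: at 11:00 UTC on 1 January it is already 2 January at UTC+14
# and still 31 December at UTC-12.
now = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
start, end = last_days_window(now, days=7)
spread = LATEST_OFFSET.utcoffset(None) - EARLIEST_OFFSET.utcoffset(None)
```

Using fixed offsets rather than named time zones keeps the sketch free of any time zone database dependency; a production filter would likely work on timestamps rather than calendar dates.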
Are there any open-source tools or projects—beyond Elasticsearch and OpenSearch—that have significantly influenced your work?
Apache Lucene / Solr, Quepid, luke, hnswlib, big-ann-benchmarks, hello-ltr, lightfm
Is there a log error/alert that terrifies/annoys you in particular?
"500 Internal Server Error"
What is a golden tip for optimizing search performance that you’ve picked up in your years of experience?
Set up a repeatable measurement pipeline that reflects the expectations of your users. Build an experimental pipeline that lets experiments flow and get measured.
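As a rough sketch of what a repeatable measurement step could look like, the snippet below scores a search function against a fixed set of relevance judgments using precision@k. Everything here is hypothetical: the judgment list, the `fake_search` stand-in and the choice of metric are assumptions for illustration, not a description of any specific team's pipeline.

```python
from statistics import mean


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are judged relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


def evaluate(search_fn, judgments, k=5):
    """Run every judged query through search_fn and average precision@k.

    search_fn: callable taking a query string, returning a ranked list of doc ids.
    judgments: dict mapping query -> set of relevant doc ids.
    """
    per_query = {
        query: precision_at_k(search_fn(query), relevant, k)
        for query, relevant in judgments.items()
    }
    return mean(per_query.values()), per_query


# Hypothetical judgments and a stand-in for a call to the real search engine.
judgments = {"a": {"1", "2"}, "b": {"3"}}
canned_results = {"a": ["1", "9", "2"], "b": ["8", "7", "3"]}


def fake_search(query):
    return canned_results[query]


overall, per_query = evaluate(fake_search, judgments, k=2)
```

Rerunning `evaluate` with the same judgments after each ranking change gives a number that can be compared across experiments; the judgments themselves should come from what real users expect to see.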
What is the most unexpected or unconventional way you’ve seen search technologies applied?
I once used Lucene to speed up in-memory data matching; it was miles faster than a regular SQL database.
If you're building something from scratch - what does your ideal search tech stack look like?
Python and a search engine that can be brought up from code. Better if it works without Docker!
Give us a spicy take/controversial opinion on something related to Search
A few search vendors go too deep into marketing themselves instead of showing how their engines enable real, uncommon use cases and, most importantly, where things won't work.
Can you suggest a lesser-known book, blog, or resource that would be valuable to others in the community?
"97 Things Every Programmer Should Know" and "The Practice of Programming" (Kernighan and Pike)
Anything else you want to share? Feel free to tell us about a product or project you’re working on or anything else that you think the search community will find valuable
In my spare time I'm working on the Muves Papers product, which I find really useful for those wanting to stay on top of research. Under the hood it combines a search engine and an LLM, and it saves me (and others who have tried it) a lot of time. It has goodies like a vector-search-based online recommendation feature, which works really well. Hit me up if you want to be an early tester!