Hybrid Search: Combining Sparse and Deep Embeddings in Elasticsearch

Recently, search has gained significant attention due to the ongoing development of advanced embedding models, including those supported by sentence-transformers, SPLADE++ (sparse), OpenAI embeddings, and GTR-3B (T5 fine-tuned for retrieval). These state-of-the-art models are capable of augmenting existing search capabilities, enabling more efficient and accurate information retrieval by comprehending and representing textual data in a semantically rich manner.

Traditionally, information retrieval systems have relied on Bag of Words (BOW) techniques, where each document or query is transformed into distinct word buckets. These buckets are then used to build sparse frequency vectors using methods like TF-IDF or Okapi BM25. The primary goal of these techniques is to find exact term matches in a document collection, and their efficient use of inverted indexes also makes them fast. However, these approaches may not always be ideal. For instance, when searching for "wetland," users may also be interested in documents containing terms like "pantanal," "rainforest," "amazon," or "forest." TF-IDF and BM25 might not perform well in such scenarios. Deep semantic search could potentially address this issue, but it typically requires domain-specific fine-tuning, which is often challenging due to the scarcity of available data.

Exact term matching may not find documents containing relevant but non-identical words. What if we could add related words to a document before indexing, even if they don't exist in the original text? With recent advancements in NLP, a new class of models, such as SPLADE++, can augment traditional term-based search by generating relevant terms along with their importance weights for a given document. These models offer several advantages over dense approaches, such as efficient use of inverted indexes, explicit lexical matching, and interpretability. Additionally, they appear to generalize better on out-of-domain data.

SPLADE can address the shortcomings of traditional algorithms like BM25, but dense embeddings may still be necessary in certain cases. For instance, when searching for blue jeans on an e-commerce search engine, we would want to search both images and text. We can perform two searches: 1) image-based search using dense image embeddings, and 2) text-based search using BM25 or SPLADE, or both. We can assign different weights to the ranking scores of each search, combine the results, and sort them by the weighted ranking score. Alternatively, we might have a collection of passages presenting various arguments, and we want to find passages that agree with a specific argument. Text-based search could struggle in this scenario, but domain-specific dense embeddings might come to our rescue if we have adequate data.

Each of the search techniques mentioned above offers unique advantages and drawbacks. Ideally, we would like to develop an approach that combines them in any configuration based on our specific use case. Fortunately, we can achieve this with Elasticsearch. In this guide, we will walk you through the steps to build a hybrid search using Elasticsearch.

SPLADE++ with FastAPI

You have the option to integrate SPLADE directly into your existing indexing service, but it's recommended to serve it from a separate service running on a GPU for near real-time response times.
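Below is a minimal sketch of such a service. It assumes the `naver/splade-cocondenser-ensembledistil` checkpoint from Hugging Face; any SPLADE++ masked-LM checkpoint should work the same way.

```python
# sparse_service.py -- a minimal SPLADE++ service sketch.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumption: any public SPLADE++ masked-LM checkpoint works here.
MODEL_ID = "naver/splade-cocondenser-ensembledistil"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID).to(device).eval()

app = FastAPI()

class SparseRequest(BaseModel):
    text: str

@app.post("/sparse")
def sparse(req: SparseRequest):
    tokens = tokenizer(req.text, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():
        logits = model(**tokens).logits  # shape: (1, seq_len, 30522)
    # SPLADE pooling: log-saturated ReLU, max-pooled over the sequence.
    weights = torch.max(
        torch.log1p(torch.relu(logits)) * tokens.attention_mask.unsqueeze(-1),
        dim=1,
    ).values.squeeze(0)  # shape: (30522,); mostly zeros
    idx = torch.nonzero(weights, as_tuple=True)[0]
    # Return only the non-zero entries as a token -> weight map.
    return {
        tokenizer.convert_ids_to_tokens(int(i)): round(weights[i].item(), 4)
        for i in idx
    }
```

You can run it with `uvicorn sparse_service:app` and put it behind your usual load balancer.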

After setting up the service, you can call the /sparse endpoint to obtain sparse embeddings for both your query string and your documents. Each embedding has a dimension of 30,522 (the BERT vocabulary size), but since it's a sparse embedding, most of the values are zero, and the response includes only the non-zero entries.
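For the query "blue jeans", for instance, a response might look like this (the expansion terms and weights below are illustrative):

```json
{
  "blue": 2.1332,
  "jeans": 1.9467,
  "denim": 1.4269,
  "pants": 0.8763,
  "wear": 0.5419,
  "color": 0.3742
}
```

Note how "denim" and "pants" appear even though they are not in the query; this is the term expansion discussed above.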

With a basic T4 GPU, the SPLADE service can achieve a throughput of up to 50 QPS while maintaining an average latency of 40ms.

Dense embeddings for Text

You can use OpenAI's text-embedding-ada-002 to generate 1536-dimensional textual embeddings in the following manner:
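A minimal sketch, assuming the official `openai` Python client (v1.x interface) and an `OPENAI_API_KEY` in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_text(text: str) -> list:
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=text,
    )
    return response.data[0].embedding  # 1536 floats, already L2-normalized

embedding = embed_text("blue denim jeans")
print(len(embedding))  # 1536
```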

Dense embeddings for Images

Numerous options exist for generating dense image embeddings. It's advisable to use a pre-trained model fine-tuned for your specific use case. If none are available, you can fine-tune your own. For this demonstration, we'll use OpenAI's CLIP to generate image embeddings, although these might not be ideal for your particular use case. It's recommended to host this model on a server, preferably on a GPU, similar to what we did with SPLADE (a basic T4 would work well).
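Here is a sketch using the `openai/clip-vit-base-patch32` checkpoint from Hugging Face; the checkpoint choice and its 512-dimensional output are assumptions for this demo. It also includes the text-side helper we will need later to query image vectors with a text query.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

CHECKPOINT = "openai/clip-vit-base-patch32"  # assumption: swap in what fits your data
device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained(CHECKPOINT).to(device).eval()
processor = CLIPProcessor.from_pretrained(CHECKPOINT)

def embed_image(path: str) -> list:
    """512-dimensional image embedding for indexing."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        return model.get_image_features(**inputs).squeeze(0).tolist()

def embed_text_for_images(text: str) -> list:
    """Embed a text query into the same space, for text-to-image search."""
    inputs = processor(text=[text], return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        return model.get_text_features(**inputs).squeeze(0).tolist()
```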

In this tutorial, we'll demonstrate an e-commerce search implementation using various embedding generation methods we've discussed so far, such as sparse textual, dense textual, and dense visual embeddings. Employing dense textual embeddings might be overkill for e-commerce search, and they could potentially do more harm than good. A combination of sparse textual and dense visual embeddings should be sufficient for 99% of cases. However, we're including all methods for demonstration purposes.

Elasticsearch Indexing

For our e-commerce search, we can define the following index mapping for our `products` index.
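A sketch of the mapping, assuming Elasticsearch 8.x and the official Python client; the field names and extra metadata fields are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="products",
    mappings={
        "properties": {
            "title": {"type": "text"},
            "brand": {"type": "keyword"},
            # SPLADE token weights; rank_features stores sparse term->weight maps.
            "sparse_text": {"type": "rank_features"},
            # ada-002 vectors are normalized, so dot_product is safe here.
            "dense_text": {"type": "dense_vector", "dims": 1536,
                           "index": True, "similarity": "dot_product"},
            # CLIP ViT-B/32 vectors are not normalized, so use cosine.
            "dense_visual": {"type": "dense_vector", "dims": 512,
                             "index": True, "similarity": "cosine"},
        }
    },
)
```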

Once we have defined the mapping for the `products` index, we can ingest each product into it.
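A sketch of the ingestion step for a single product; it reuses the `embed_text` and `embed_image` helpers from above, and the SPLADE service URL and image path scheme are placeholders:

```python
import requests
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

product = {"id": "p-1", "title": "Blue denim jeans", "brand": "Banana Republic"}

doc = {
    "title": product["title"],
    "brand": product["brand"],
    # Token -> weight map from the SPLADE service; matches the rank_features field.
    "sparse_text": requests.post(
        "http://localhost:8001/sparse", json={"text": product["title"]}
    ).json(),
    "dense_text": embed_text(product["title"]),                  # helper from above
    "dense_visual": embed_image(f"images/{product['id']}.jpg"),  # helper from above
}

es.index(index="products", id=product["id"], document=doc)
```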

Elasticsearch Retrieval

After ingesting all the products, we can proceed with querying them. However, there are a few factors to consider before crafting a query. Although we'll write a single query, Elasticsearch will perform multiple sub-queries behind the scenes: one for dense-text, one for sparse-text, and one for dense-visual. Before combining the results, Elasticsearch can apply a boost factor ranging from 0 to 1 to each sub-query. For instance, our final rank score could be:

\[\text{Score}(\text{q}) = 0.1 \times \text{dense\_text\_q} + 0.3 \times \text{dense\_visual\_q} + 0.6 \times \text{sparse\_text\_q}\]

However, this assumes that all scores from different sub-queries fall within the same range, which is not the case. `dense_text` embeddings generated using OpenAI are normalized, resulting in similarity scores in the range [0, 1]. `sparse_text` embeddings generated using SPLADE are not normalized, so their score range is unbounded, and the same applies to `dense_visual` embeddings. This means an equal boost factor of 0.33 per sub-query won't give all three an equal effect on search results. Instead, we must devise a boost factor for each sub-query based on our specific use case, through trial and error or benchmarking, whichever works for you. Additionally, you can min-max normalize each sub-query's scores into the range [0, 1] before combining them, as sketched below.
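Min-max normalization over one sub-query's result scores is just a rescale into [0, 1]:

```python
def min_max_normalize(scores):
    """Rescale one sub-query's scores into [0, 1] before applying boosts."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # degenerate case: all hits scored identically
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

# e.g. raw SPLADE scores for 4 hits become comparable [0, 1] values
print(min_max_normalize([12.4, 9.1, 7.7, 7.7]))  # ~ [1.0, 0.298, 0.0, 0.0]
```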

You might wonder if normalizing all embeddings before indexing them would resolve this issue. However, normalizing embeddings might do more harm than good because each of the models discussed so far was trained using dot-product, so using cosine similarity wouldn't guarantee the same results. This problem is not specific to Elasticsearch but is common across retrieval systems. Even when using Pinecone, you would still need to determine an appropriate alpha value (Pinecone's version of boost) based on your use case. Despite these challenges, implementing hybrid search using Elasticsearch or even Pinecone remains feasible if the correct boost factor values are employed in the convex combination.

Now let's query our `products` index to search for blue denim jeans by Banana Republic, keeping only the top 10 results.
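Here is a sketch of the hybrid query, assuming Elasticsearch 8.7+ (which allows multiple `knn` clauses in one search request) and the 0.6/0.3/0.1 boost split from the formula above; the service URL and helper names carry over from the earlier sketches:

```python
import requests
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query_text = "blue denim jeans"
brand_filter = {"term": {"brand": "Banana Republic"}}

# Sparse query expansion from the SPLADE service (token -> weight).
sparse_query = requests.post(
    "http://localhost:8001/sparse", json={"text": query_text}
).json()

response = es.search(
    index="products",
    size=10,
    query={
        "bool": {
            "filter": [brand_filter],
            # One rank_feature clause per expanded token; the SPLADE weight
            # scales each clause and 0.6 is the overall sparse-text boost.
            "should": [
                {"rank_feature": {"field": f"sparse_text.{token}",
                                  "boost": 0.6 * weight}}
                for token, weight in sparse_query.items()
            ],
        }
    },
    knn=[
        {"field": "dense_text", "query_vector": embed_text(query_text),
         "k": 10, "num_candidates": 100, "boost": 0.1, "filter": brand_filter},
        {"field": "dense_visual", "query_vector": embed_text_for_images(query_text),
         "k": 10, "num_candidates": 100, "boost": 0.3, "filter": brand_filter},
    ],
)

for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```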

You can try any combination of boost multipliers based on your specific use case.

Thank you for reading, and I hope you found this useful. If you have any questions, you can email me or follow me on Twitter.