Hybrid Search for RAG: Combining BM25 and Vector Search for Precision and Recall

Vector search excels at semantic similarity: "days off" matches "paid vacation." But it struggles with precise identifiers such as product codes, part numbers, legal reference numbers, and error codes. Hybrid search addresses this by pairing vector search with BM25 keyword matching, so results reflect both meaning and exact terms.

The Precision Problem

Imagine querying "RX-7000 thermal threshold" in a product catalog. Pure vector search may return RX-5000 results because the two products are described in nearly identical language, so their chunks sit close together in embedding space. The embedding captures "thermal" and "threshold" but blurs the distinction between RX-7000 and RX-5000. For technical documentation, legal compliance, or product catalogs, that blur is unacceptable.

BM25 + Vector: Semantic and Keyword

Hybrid search uses two signals:

Dense (vector) embeddings: Capture meaning. Same embedding model you use today.
Sparse (BM25) representations: Capture exact keyword matches. "RX-7000" only matches "RX-7000."

At ingestion, you generate both dense and sparse vectors for each chunk. At query time, you send both. The vector database (e.g., Pinecone with hybrid indexes) combines them to return results that satisfy both semantic relevance and keyword precision.
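To make the keyword side concrete, here is a minimal BM25 scorer in plain Python. It is an illustrative sketch, not the sparse encoder any particular vector database uses: the tokenizer, the `k1` and `b` defaults, and the sample documents are all assumptions chosen for the example. It shows that the RX-7000 chunk outscores the RX-5000 chunk because only it contains the exact token.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: keep hyphenated identifiers such as "RX-7000" as one token.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


class BM25:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_freqs = [Counter(toks) for toks in self.doc_tokens]
        self.avg_len = sum(len(t) for t in self.doc_tokens) / len(docs)
        n = len(docs)
        # Document frequency: in how many documents each term appears.
        df = Counter(term for toks in self.doc_tokens for term in set(toks))
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, doc_index):
        freqs = self.doc_freqs[doc_index]
        length = len(self.doc_tokens[doc_index])
        total = 0.0
        for term in tokenize(query):
            tf = freqs.get(term, 0)
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf[term] * tf * (self.k1 + 1) / norm
        return total


docs = [
    "RX-7000 thermal threshold is 95 C under sustained load",
    "RX-5000 thermal threshold is 85 C under sustained load",
    "Paid vacation policy for full-time employees",
]
bm25 = BM25(docs)
scores = [bm25.score("RX-7000 thermal threshold", i) for i in range(len(docs))]
```

The first document scores highest, the second scores lower because it matches only "thermal" and "threshold", and the third scores zero. A dense embedding would be generated alongside this for each chunk.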

When to Use Hybrid Search

Hybrid is ideal for:

Product catalogs: SKUs, model numbers, part numbers
Technical documentation: Error codes, API versions, command syntax
Legal and compliance: Reference numbers, regulation IDs, case citations
Any corpus where exact identifiers matter: Miss one character and you get the wrong answer.

Retrieval + Reranking: The Two-Phase Approach

A robust hybrid setup uses two phases:

Phase 1 (Recall): Send raw dense + sparse vectors to the index. The combined score brings relevant documents into the candidate set, but ordering may be imperfect.
Phase 2 (Ranking): Rerank candidates with a cross-encoder that reads query and document text together. This produces accurate relevance scores and fixes the final ordering.
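The two phases can be sketched as a small function that takes the retriever and the pair scorer as parameters. Everything named here is hypothetical: `fake_retrieve` stands in for a hybrid index query, and `toy_score` is a simple token-overlap count standing in for a cross-encoder, used only so the example runs without a model. In a real setup the scorer would be a cross-encoder model that reads the query and document text together.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, str]  # (doc_id, text)


def retrieve_then_rerank(
    query: str,
    retrieve: Callable[[str, int], List[Candidate]],
    score_pair: Callable[[str, str], float],
    recall_k: int = 50,
    final_k: int = 5,
) -> List[str]:
    # Phase 1 (recall): pull a generous candidate set; order is not trusted.
    candidates = retrieve(query, recall_k)
    # Phase 2 (ranking): score each (query, text) pair and sort by that score.
    scored = [(score_pair(query, text), doc_id) for doc_id, text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:final_k]]


# Hypothetical first-pass results, deliberately in an imperfect order.
CANDIDATES = [
    ("rx5000-thermal", "RX-5000 thermal threshold 85 C"),
    ("rx7000-thermal", "RX-7000 thermal threshold 95 C"),
    ("rx7000-warranty", "RX-7000 warranty terms"),
]


def fake_retrieve(query: str, k: int) -> List[Candidate]:
    return CANDIDATES[:k]


def toy_score(query: str, text: str) -> float:
    # Stand-in for a cross-encoder: count of shared whitespace tokens.
    return len(set(query.lower().split()) & set(text.lower().split()))


top = retrieve_then_rerank(
    "RX-7000 thermal threshold", fake_retrieve, toy_score, final_k=2
)
```

Here the RX-5000 chunk arrives first from retrieval, and the second phase moves the RX-7000 chunk to the top.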

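Alpha scaling (a weighted combination of dense and sparse scores) is often unreliable because the two scores live on different scales: dense similarity is typically bounded, while BM25 scores are unbounded and vary with query length and corpus statistics. A weight that works for one query can over- or under-weight keywords for the next. Retrieval followed by reranking is the recommended approach.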
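A small worked example shows the scale problem. The blend below is the usual convex combination, and the scores are made-up values chosen to resemble a bounded dense similarity and an unbounded BM25 score. They are assumptions for illustration, not measurements.

```python
def alpha_blend(dense_score, sparse_score, alpha):
    """Weighted combination: alpha=1.0 is pure dense, alpha=0.0 is pure sparse."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * dense_score + (1.0 - alpha) * sparse_score


# Hypothetical raw scores: (dense similarity, BM25 score).
doc_x = (0.90, 2.0)  # strong semantic match, weak keyword match
doc_y = (0.40, 9.0)  # weak semantic match, strong keyword match

# At an "even" alpha of 0.5 the larger-scaled BM25 score dominates the blend.
x_even = alpha_blend(*doc_x, alpha=0.5)
y_even = alpha_blend(*doc_y, alpha=0.5)

# Dense similarity only decides the order once alpha is pushed close to 1.
x_high = alpha_blend(*doc_x, alpha=0.95)
y_high = alpha_blend(*doc_y, alpha=0.95)
```

With these numbers, `doc_y` ranks first at alpha 0.5 and `doc_x` ranks first at alpha 0.95. The alpha that balances the two signals depends on the raw score ranges, which shift from query to query.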

Index Requirements

Hybrid search requires a specific index configuration:

Metric: dotproduct (in Pinecone, indexes using cosine or euclidean don't support sparse vectors)
Index type: Hybrid (dense + sparse)
Existing indexes: Cannot be converted. You must create new hybrid indexes and re-ingest.
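The requirements above can be expressed as a small pre-flight check to run before ingestion. This is a sketch: the dictionary keys `metric` and `type` are hypothetical field names chosen to mirror the list above, not the schema of any specific client library.

```python
def hybrid_readiness(index):
    """Return the reasons an index description cannot serve hybrid queries."""
    problems = []
    metric = index.get("metric")
    if metric != "dotproduct":
        problems.append(f"metric is {metric!r}; hybrid needs 'dotproduct'")
    if index.get("type") != "hybrid":
        problems.append(
            "index is not a hybrid (dense + sparse) index; "
            "create a new one and re-ingest"
        )
    return problems


ok = hybrid_readiness({"metric": "dotproduct", "type": "hybrid"})
bad = hybrid_readiness({"metric": "cosine", "type": "dense"})
```

An empty list means the index description meets both requirements. Any entries mean a new hybrid index is needed.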

ShinRAG: Hybrid Search Built In

ShinRAG supports hybrid indexes end-to-end. Create a hybrid index, ingest with dense (OpenAI, Cohere, Voyage) and sparse (Pinecone Inference) embeddings, and query via agents or pipeline nodes. Both semantic and keyword signals are used automatically.

Build RAG Systems with Hybrid Search

Create hybrid indexes, ingest your product catalogs and technical docs, and get both semantic and keyword precision. No custom infrastructure required.
