HyDE (Hypothetical Document Embeddings) Implementation for RAG

We design and deploy artificial intelligence systems: from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering and MLOps to make AI work not in the lab, but in real business.
Showing 1 of 1 servicesAll 1566 services
HyDE (Hypothetical Document Embeddings) Implementation for RAG
Medium
from 1 business day to 3 business days
FAQ
AI Development Areas
AI Solution Development Stages
Latest works
  • image_website-b2b-advance_0.png
    B2B ADVANCE company website development
    1212
  • image_web-applications_feedme_466_0.webp
    Development of a web application for FEEDME
    1161
  • image_websites_belfingroup_462_0.webp
    Website development for BELFINGROUP
    852
  • image_ecommerce_furnoro_435_0.webp
    Development of an online store for the company FURNORO
    1041
  • image_logo-advance_0.png
    B2B Advance company logo design
    561
  • image_crm_enviok_479_0.webp
    Development of a web application for Enviok
    822

Implementing HyDE (Hypothetical Document Embeddings) for RAG

HyDE is a retrieval improvement technique proposed by Gao et al. (2022). Instead of searching for documents by embedding the query directly, LLM generates a hypothetical answer to the question, and the search is performed by the embedding of that answer. The hypothetical answer resides in the "documents" space rather than "questions" space, so its embedding better matches actual documents.

Why HyDE Works

Embedding space asymmetry: questions and answers are different distributions in vector space. The embedding of question "what is the statute of limitations for employment disputes" falls in the queries region, not the region of documents with answers. HyDE generates text similar to documents in the corpus.

Standard RAG:
Query → Embedding(query) → search → documents

HyDE:
Query → LLM → Hypothetical_answer → Embedding(answer) → search → documents

HyDE Implementation

from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.retrievers import ParentDocumentRetriever
import asyncio

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

hyde_prompt = ChatPromptTemplate.from_template("""
Please write a passage to support/answer the question
Question: {query}
Passage:""")

async def hyde_retriever(query: str, base_retriever, top_k: int = 5):
    # Step 1: Generate hypothetical document
    hypothetical_docs = await asyncio.gather(
        *[asyncio.to_thread(
            lambda: llm.invoke(hyde_prompt.format(query=query)).content
        ) for _ in range(3)]
    )
    
    # Step 2: Embed each hypothetical document and retrieve
    all_docs = []
    for hypo_doc in hypothetical_docs:
        docs = await asyncio.to_thread(
            lambda: base_retriever.invoke(hypo_doc)
        )
        all_docs.extend(docs)
    
    # Step 3: Deduplicate and return top-k
    seen = set()
    unique_docs = []
    for doc in all_docs:
        if doc.metadata.get("id") not in seen:
            unique_docs.append(doc)
            seen.add(doc.metadata.get("id"))
    
    return unique_docs[:top_k]

Multi-Query HyDE

Combine HyDE with multi-query approach for better coverage:

class MultiQueryHyDE:
    def __init__(self, llm, embeddings, base_retriever):
        self.llm = llm
        self.embeddings = embeddings
        self.retriever = base_retriever
    
    def generate_hypothetical_answers(self, query: str, num=3) -> list:
        """Generate multiple hypothetical answers"""
        prompt = f"""Generate {num} plausible hypothetical answers to: {query}
Each answer should be a complete, standalone paragraph."""
        
        response = self.llm.invoke(prompt)
        answers = response.content.split("\n\n")
        return [a.strip() for a in answers if a.strip()]
    
    def retrieve(self, query: str, top_k: int = 5) -> list:
        hypothetical_answers = self.generate_hypothetical_answers(query)
        
        all_docs = []
        for answer in hypothetical_answers:
            docs = self.retriever.invoke(answer)
            all_docs.extend(docs)
        
        # Dedup and return
        seen = set()
        result = []
        for doc in all_docs:
            doc_id = doc.metadata.get("id")
            if doc_id not in seen:
                result.append(doc)
                seen.add(doc_id)
        
        return result[:top_k]

Practical Case: Technical Documentation Search

Task: assistant for 50,000 technical articles (avg 2500 words).

Results:

Metric Standard RAG HyDE
Context Recall 0.62 0.81
MRR@5 0.58 0.74
P@1 0.34 0.52
Latency 800ms 1200ms (+HyDE generation)

HyDE improved recall by 31% with acceptable latency increase.

Timeline

  • Implement HyDE retriever: 2–3 days
  • Test and tune prompt: 2–3 days
  • Compare vs baseline: 1–2 days
  • Total: 1 week