Multi-Query RAG Implementation for Improved Retrieval Quality

We design and deploy artificial intelligence systems: from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering and MLOps to make AI work not in the lab, but in real business.

8+Years of workmore info 900+Completed projectsmore info 100+In house employeesmore info 19+Partnersmore info

Offered services

Showing 1 of 1 servicesAll 1566 services

Medium

from 1 business day to 3 business days

FAQ

AI Development Areas

Discuss your AI project

Free consultation — we'll show you how AI can solve your challenge

Get a quote

We'll estimate the budget and timeline for your AI project

AI Solution Development Stages

Latest works

B2B ADVANCE company website development
1240
Development of a web application for FEEDME
1167
Website development for BELFINGROUP
867
Development of an online store for the company FURNORO
1084
B2B Advance company logo design
563
Development of a web application for Enviok
829

Show more works

Implementing Multi-Query RAG for Improved Retrieval Quality

Multi-Query RAG is a retrieval improvement technique where the original query is automatically paraphrased in several ways, each variant is run through search, and results are combined. This reduces dependency of answer quality on specific query wording and improves retrieval completeness.

Problem Multi-Query Solves

Vector embedding models are sensitive to query formulation. The same question asked differently can yield different top-K results:

"How to take vacation?" → finds articles about applications
"Procedure for getting annual leave" → finds regulation section
"Rules for providing vacation days" → finds HR policy

Multi-Query combines all three and gets more complete context.

Implementation with LangChain

from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Qdrant

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

base_retriever = Qdrant.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name="documents",
    url="http://localhost:6333",
).as_retriever(search_kwargs={"k": 5})

retriever = MultiQueryRetriever.from_llm(
    retriever=base_retriever,
    llm=llm
)

results = retriever.invoke("How to file a vacation request?")

Custom Multi-Query with Prompt Control

from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import BaseOutputParser
import json

class LineListOutputParser(BaseOutputParser):
    def parse(self, text: str) -> list:
        lines = text.strip().split("\n")
        return [line.strip() for line in lines if line.strip()]

multi_query_prompt = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate 3
different versions of the given user question to retrieve relevant documents from a vector database.
Provide these alternative questions separated by newlines.

Original question: {question}

Alternative questions:"""
)

def multi_query_retrieval(query: str, retriever, llm) -> list:
    # Generate queries
    queries = llm.invoke(multi_query_prompt.format(question=query))
    query_list = [queries.content] + [query]
    
    # Retrieve from each
    unique_docs = set()
    all_docs = []
    
    for q in query_list:
        docs = retriever.invoke(q)
        for doc in docs:
            doc_id = doc.metadata.get("id")
            if doc_id not in unique_docs:
                unique_docs.add(doc_id)
                all_docs.append(doc)
    
    return all_docs

Practical Case: HR Documentation Assistant

Task: assistant for HR policies covering vacation, sick leave, business travel, bonuses.

Results:

Metric	Standard RAG	Multi-Query RAG
Context Recall	0.71	0.89
Answer Completeness	0.68	0.87
User Satisfaction	0.74	0.91
Latency	400ms	1.1s (3× queries in parallel)

Multi-Query improved context recall by 25% while maintaining acceptable latency.

Timeline

Implement multi-query generator: 1–2 days
Test and optimize prompt: 1–2 days
Evaluation and tuning: 2–3 days
Total: 1 week