Implementing Multi-Index RAG (Merging Multiple Sources)
Multi-Index RAG is an architecture in which search runs across several separate indexes (vector stores or collections) and the results are merged into a single context for the LLM. It is necessary when working with heterogeneous data sources that require different indexing strategies, or when data must be isolated by domain.
When Multi-Index Is Needed
- Different data types: a structured FAQ (short answers) and long regulations require different chunk sizes and retrieval strategies.
- Different domains: legal documentation, technical documentation, and product descriptions occupy weakly overlapping semantic spaces, so separate indexes give more precise retrieval.
- Different sources: Confluence, SharePoint, Notion, GitHub: each requires its own parser and carries source-specific metadata.
- Security isolation: data from different departments is stored in separate indexes with access control.
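These differences typically surface first in ingestion settings. A minimal sketch of per-index chunking configuration (the names and values here are illustrative, not from the case study below):

```python
# Illustrative per-index ingestion settings (hypothetical names/values):
# each source gets its own chunking strategy instead of one global config.
INDEX_CONFIGS = {
    "faq":    {"chunk_size": 256,  "chunk_overlap": 0,   "splitter": "per_answer"},
    "legal":  {"chunk_size": 1024, "chunk_overlap": 128, "splitter": "by_section"},
    "github": {"chunk_size": 512,  "chunk_overlap": 64,  "splitter": "by_code_block"},
}

DEFAULT_CONFIG = {"chunk_size": 512, "chunk_overlap": 64, "splitter": "recursive"}

def get_chunking(source: str) -> dict:
    """Return the chunking settings for a source, falling back to a default."""
    return INDEX_CONFIGS.get(source, DEFAULT_CONFIG)
```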
Multi-Index RAG Architecture
```python
import asyncio
import json

from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.schema import Document


class MultiIndexRAG:
    def __init__(self, embeddings, llm):
        self.embeddings = embeddings
        self.llm = llm
        # Each entry holds a retriever plus a description used by the router
        self.indexes: dict[str, dict] = {}
        self.router_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    def add_index(self, name: str, collection: str, description: str):
        """Register an index with a description for the router."""
        self.indexes[name] = {
            "retriever": Qdrant.from_existing_collection(
                embedding=self.embeddings,
                collection_name=collection,
                url="http://localhost:6333",
            ).as_retriever(search_kwargs={"k": 5}),
            "description": description,
        }

    def route_query(self, query: str) -> list[str]:
        """LLM router determines which indexes are relevant."""
        index_descriptions = "\n".join(
            f"- {name}: {info['description']}"
            for name, info in self.indexes.items()
        )
        response = self.router_llm.invoke(f"""
Determine which of the following indexes to search to answer the query.
Return a JSON list of index names.

Available indexes:
{index_descriptions}

Query: {query}
Response (JSON list):""")
        try:
            return json.loads(response.content)
        except (json.JSONDecodeError, TypeError):
            return list(self.indexes.keys())  # Fallback: search all indexes

    async def _search_index(self, index_name: str, query: str) -> tuple[str, list]:
        """Asynchronous search in a single index."""
        retriever = self.indexes[index_name]["retriever"]
        docs = await asyncio.to_thread(retriever.invoke, query)
        return index_name, docs

    async def retrieve(self, query: str) -> dict[str, list]:
        """Parallel search across the relevant indexes."""
        relevant_indexes = self.route_query(query)
        tasks = [
            self._search_index(idx, query)
            for idx in relevant_indexes
            if idx in self.indexes
        ]
        results = await asyncio.gather(*tasks)
        return dict(results)

    def build_context(self, search_results: dict[str, list]) -> str:
        """Assemble a context string from multiple indexes."""
        context_parts = []
        for index_name, docs in search_results.items():
            if docs:
                context_parts.append(f"## Source: {index_name}\n")
                for doc in docs:
                    context_parts.append(f"- {doc.page_content}\n")
        return "\n".join(context_parts)
```
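To see what the merge step produces without a running Qdrant instance, here is a stand-alone sketch of the `build_context` logic; the `Doc` dataclass is a minimal stand-in for LangChain's `Document`, used only so the snippet runs in isolation:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Minimal stand-in for langchain's Document, for illustration only."""
    page_content: str

def build_context(search_results: dict[str, list]) -> str:
    """Same merge logic as MultiIndexRAG.build_context."""
    context_parts = []
    for index_name, docs in search_results.items():
        if docs:
            context_parts.append(f"## Source: {index_name}\n")
            for doc in docs:
                context_parts.append(f"- {doc.page_content}\n")
    return "\n".join(context_parts)

results = {
    "hr": [Doc("Vacation is 28 calendar days per year.")],
    "it": [],  # empty result sets are skipped entirely
    "faq": [Doc("The office is open from 9:00 to 18:00.")],
}
context = build_context(results)
```

Each index contributes its own `## Source:` section, so the LLM can attribute statements to a source; indexes that returned nothing are omitted.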
Index Setup for Corporate Knowledge Base
```python
rag = MultiIndexRAG(
    embeddings=OpenAIEmbeddings(model="text-embedding-3-small"),
    llm=ChatOpenAI(model="gpt-4o", temperature=0),
)

rag.add_index(
    name="legal",
    collection="legal_contracts",
    description="Contracts, agreements, legal opinions",
)
rag.add_index(
    name="hr",
    collection="hr_policies",
    description="HR policies: vacations, travel, hiring, termination",
)
rag.add_index(
    name="it",
    collection="it_procedures",
    description="IT procedures: access, equipment, information security",
)
rag.add_index(
    name="finance",
    collection="finance_regulations",
    description="Finance regulations: budget, procurement, advance reports",
)
rag.add_index(
    name="faq",
    collection="general_faq",
    description="General frequently asked questions from employees",
)
```
Reranking Merged Results
After collecting results from several indexes, it is important to merge and rerank them:
```python
from flashrank import Ranker, RerankRequest

ranker = Ranker(model_name="ms-marco-MiniLM-L-12-v2")

def rerank_multi_index_results(
    query: str,
    search_results: dict[str, list[Document]],
    top_n: int = 6,
) -> list[Document]:
    """Merge and rerank results from different indexes."""
    # Collect all documents into a single list
    all_docs = []
    for docs in search_results.values():
        all_docs.extend(docs)
    if not all_docs:
        return []

    # Rerank against the original query; "id" keeps the link back to all_docs
    passages = [{"id": i, "text": doc.page_content} for i, doc in enumerate(all_docs)]
    rerank_req = RerankRequest(query=query, passages=passages)
    ranked = ranker.rerank(rerank_req)
    return [all_docs[r["id"]] for r in ranked[:top_n]]
```
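One practical detail of the merge step: when a document is mirrored in several sources, identical chunks can come back from multiple indexes and waste reranker slots. A hedged sketch of a dedup pass to run before building the passages list (`merge_and_dedupe` and the `Doc` stand-in are illustrative, not part of flashrank):

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Minimal stand-in for langchain's Document, for illustration only."""
    page_content: str

def merge_and_dedupe(search_results: dict[str, list]) -> list:
    """Flatten per-index results, dropping chunks with identical content."""
    seen: set[str] = set()
    merged = []
    for docs in search_results.values():
        for doc in docs:
            key = doc.page_content.strip().lower()
            if key not in seen:
                seen.add(key)
                merged.append(doc)
    return merged

results = {
    "hr":  [Doc("Vacation is 28 days."), Doc("Remote work policy.")],
    "faq": [Doc("Vacation is 28 days.")],  # same chunk from another index
}
merged = merge_and_dedupe(results)
```

Exact string matching only catches literal mirrors; fuzzier near-duplicate detection is a separate design decision.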
Practical Case: Corporate Assistant from 5 Sources
Sources: Confluence (5200 pages), SharePoint (3800 documents), JIRA (task export), GitHub (wiki, README), internal CRM documentation.
Problem with a monolithic index: different content types have different optimal chunk sizes. GitHub READMEs are best indexed function by function (code blocks plus descriptions), Confluence pages by sections, and CRM documentation by individual answers.
Multi-Index configuration:
- 5 separate Qdrant collections
- LLM-router on GPT-4o-mini (~15ms overhead)
- Parallel search (async) reduces latency from 5×T to 1.2×T
Results:
- Context Recall: 0.71 (monolithic) → 0.88 (multi-index)
- Precision@5: 0.74 → 0.86
- Latency P95: 1.2s → 1.5s (parallel vs sequential +250ms)
- Routing accuracy (correct index set): 91%
Failure cases: 9% of queries are routed to the wrong index set, mostly cross-domain questions. Solution: when router confidence is low, search all indexes and cut results off at a score threshold.
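That fallback can be sketched as follows; `search_with_score` is a hypothetical callable standing in for a per-index scored search (such as Qdrant's `similarity_search_with_score`), and the threshold value is illustrative:

```python
def fallback_retrieve(
    query: str,
    index_names: list[str],
    search_with_score,          # hypothetical: (index_name, query) -> list[(doc, score)]
    score_threshold: float = 0.55,
) -> dict[str, list]:
    """Low router confidence: query every index, keep only high-scoring hits."""
    results = {}
    for name in index_names:
        hits = [doc for doc, score in search_with_score(name, query)
                if score >= score_threshold]
        if hits:
            results[name] = hits
    return results

# Stub search function for demonstration only
def fake_search(name, query):
    data = {"hr": [("vacation doc", 0.82), ("old memo", 0.31)],
            "it": [("vpn guide", 0.12)]}
    return data.get(name, [])

out = fallback_retrieve("vacation days", ["hr", "it"], fake_search)
```

The score cutoff replaces the router as the filter: irrelevant indexes simply contribute no hits above the threshold.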
Federated Search with Access Control
```python
def retrieve_with_permissions(
    rag: MultiIndexRAG,
    query: str,
    user_id: str,
    permission_service,
) -> dict[str, list]:
    """Search only the indexes the user is allowed to access."""
    allowed_indexes = set(permission_service.get_allowed_indexes(user_id))
    # Intersect the router's choice with the user's permissions
    relevant_indexes = [
        idx for idx in rag.route_query(query)
        if idx in allowed_indexes
    ]
    return {
        idx: rag.indexes[idx]["retriever"].invoke(query)
        for idx in relevant_indexes
    }
```
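The `permission_service` interface here is assumed rather than taken from a concrete library; a minimal stand-in showing the expected contract:

```python
class StaticPermissionService:
    """Hypothetical ACL service: maps user ids to the index names they may search."""

    def __init__(self, acl: dict[str, set]):
        self.acl = acl

    def get_allowed_indexes(self, user_id: str) -> set:
        # Unknown users get no access rather than full access
        return self.acl.get(user_id, set())

perms = StaticPermissionService({
    "alice": {"hr", "faq"},
    "bob":   {"hr", "it", "finance", "legal", "faq"},
})

routed = ["legal", "hr"]   # e.g. the output of route_query
allowed = perms.get_allowed_indexes("alice")
visible = [idx for idx in routed if idx in allowed]
```

Filtering after routing means the router never needs to know about permissions; the intersection is applied just before retrieval.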
Timelines
- Designing Multi-Index architecture: 1 week
- Developing ingestion pipelines (5 sources): 3–4 weeks
- LLM-router and integration: 1 week
- Reranking and evaluation: 1 week
- Total: 6–8 weeks