Implementing Multi-Index RAG (Merging Multiple Sources)

Multi-Index RAG is an architecture in which retrieval runs across multiple separate indexes (vector stores or collections) and the results are merged into a single context for the LLM. It is needed when working with heterogeneous data sources that require different indexing strategies, or when data must be isolated by domain.

When Multi-Index Is Needed

Different data types: structured FAQ (short answers) and long regulations require different chunk sizes and retrieval strategies.

Different domains: legal documentation, technical documentation, product descriptions—semantic spaces weakly intersect, separate indexes give more precise retrieval.

Different sources: Confluence, SharePoint, Notion, GitHub—each requires its own parser and has specific metadata.

Security isolation: data from different departments is stored in separate indexes with access control.
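The split above usually implies different ingestion settings per index. A minimal sketch of such a configuration follows; the index names, chunk sizes, and split strategies here are illustrative assumptions, not settings from the original project:

```python
# Hypothetical per-index ingestion settings: each source gets its own
# chunking strategy, matching the "different data types" point above.
INDEX_CONFIG = {
    "faq":   {"chunk_size": 256,  "chunk_overlap": 0,   "split_by": "qa_pair"},
    "legal": {"chunk_size": 1024, "chunk_overlap": 128, "split_by": "section"},
    "it":    {"chunk_size": 512,  "chunk_overlap": 64,  "split_by": "procedure"},
}

def chunk_params(index_name: str) -> dict:
    """Return chunking parameters for an index, falling back to defaults."""
    default = {"chunk_size": 512, "chunk_overlap": 64, "split_by": "section"}
    return {**default, **INDEX_CONFIG.get(index_name, {})}
```

Keeping these parameters per index is what makes the separate-collections approach pay off: each ingestion pipeline reads its own entry instead of sharing one compromise chunk size.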

Multi-Index RAG Architecture

from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.schema import Document
import asyncio

class MultiIndexRAG:
    def __init__(self, embeddings, llm):
        self.embeddings = embeddings
        self.llm = llm
        self.indexes: dict[str, dict] = {}  # name -> {"retriever", "description"}
        self.router_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    def add_index(self, name: str, collection: str, description: str):
        """Register index with description for router"""
        self.indexes[name] = {
            "retriever": Qdrant.from_existing_collection(
                embeddings=self.embeddings,
                collection_name=collection,
                url="http://localhost:6333",
            ).as_retriever(search_kwargs={"k": 5}),
            "description": description,
        }

    def route_query(self, query: str) -> list[str]:
        """LLM-router determines relevant indexes"""
        index_descriptions = "\n".join([
            f"- {name}: {info['description']}"
            for name, info in self.indexes.items()
        ])

        response = self.router_llm.invoke(f"""
Determine which of the following indexes to search to answer the query.
Return JSON-list of index names.

Available indexes:
{index_descriptions}

Query: {query}

Response (JSON list):""")

        import json
        try:
            return json.loads(response.content)
        except (json.JSONDecodeError, TypeError):
            return list(self.indexes.keys())  # Fallback: search all indexes

    async def _search_index(self, index_name: str, query: str) -> tuple[str, list]:
        """Asynchronous search in one index"""
        retriever = self.indexes[index_name]["retriever"]
        docs = await asyncio.to_thread(retriever.invoke, query)
        return index_name, docs

    async def retrieve(self, query: str) -> dict[str, list]:
        """Parallel search across relevant indexes"""
        relevant_indexes = self.route_query(query)

        tasks = [
            self._search_index(idx, query)
            for idx in relevant_indexes
            if idx in self.indexes
        ]

        results = await asyncio.gather(*tasks)
        return dict(results)

    def build_context(self, search_results: dict[str, list]) -> str:
        """Assemble context from multiple indexes"""
        context_parts = []
        for index_name, docs in search_results.items():
            if docs:
                context_parts.append(f"## Source: {index_name}\n")
                for doc in docs:
                    context_parts.append(f"- {doc.page_content}\n")

        return "\n".join(context_parts)
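To illustrate what build_context produces, here is a standalone sketch of the same merging logic using a stand-in for LangChain's Document class, so it runs without the library:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Stand-in for langchain.schema.Document (only page_content is used)."""
    page_content: str

def build_context(search_results: dict[str, list]) -> str:
    """Same merging logic as MultiIndexRAG.build_context above."""
    context_parts = []
    for index_name, docs in search_results.items():
        if docs:
            context_parts.append(f"## Source: {index_name}\n")
            for doc in docs:
                context_parts.append(f"- {doc.page_content}\n")
    return "\n".join(context_parts)

ctx = build_context({
    "hr": [Doc("Vacation requests go through the HR portal.")],
    "finance": [],  # empty result sets are skipped entirely
})
```

Labeling each block with its source index lets the LLM (and later, citations) attribute facts to the right system.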

Index Setup for Corporate Knowledge Base

rag = MultiIndexRAG(
    embeddings=OpenAIEmbeddings(model="text-embedding-3-small"),
    llm=ChatOpenAI(model="gpt-4o", temperature=0),
)

rag.add_index(
    name="legal",
    collection="legal_contracts",
    description="Contracts, agreements, legal opinions",
)
rag.add_index(
    name="hr",
    collection="hr_policies",
    description="HR policies: vacations, travel, hiring, termination",
)
rag.add_index(
    name="it",
    collection="it_procedures",
    description="IT procedures: access, equipment, information security",
)
rag.add_index(
    name="finance",
    collection="finance_regulations",
    description="Finance regulations: budget, procurement, advance reports",
)
rag.add_index(
    name="faq",
    collection="general_faq",
    description="General frequently asked questions from employees",
)

Reranking Merged Results

After collecting results from multiple indexes, it's important to merge, deduplicate, and rerank them against the original query:

from flashrank import Ranker, RerankRequest

ranker = Ranker(model_name="ms-marco-MiniLM-L-12-v2")

def rerank_multi_index_results(
    query: str,
    search_results: dict[str, list[Document]],
    top_n: int = 6,
) -> list[Document]:
    """Merges and reranks results from different indexes"""

    # Collect all documents
    all_docs = []
    for docs in search_results.values():
        all_docs.extend(docs)

    if not all_docs:
        return []

    # Reranking
    passages = [{"id": i, "text": doc.page_content} for i, doc in enumerate(all_docs)]
    rerank_req = RerankRequest(query=query, passages=passages)
    ranked = ranker.rerank(rerank_req)

    return [all_docs[r["id"]] for r in ranked[:top_n]]
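Where a cross-encoder reranker is too heavy, a lightweight alternative is reciprocal rank fusion (RRF), which merges per-index ranked lists using only rank positions. This is a sketch of the standard RRF formula, not part of the original pipeline:

```python
def rrf_merge(ranked_lists: dict[str, list[str]], k: int = 60,
              top_n: int = 6) -> list[str]:
    """Merge per-index ranked lists with Reciprocal Rank Fusion.

    Each document's score is the sum of 1 / (k + rank) over every
    list it appears in, so documents found by several indexes rise.
    """
    scores: dict[str, float] = {}
    for docs in ranked_lists.values():
        for rank, doc in enumerate(docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

merged = rrf_merge({
    "hr": ["doc_a", "doc_b"],
    "faq": ["doc_b", "doc_c"],
})
# doc_b appears in both lists, so it ranks first
```

RRF needs no model and no scores from the retrievers, which also sidesteps the problem of incomparable similarity scores across collections.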

Practical Case: Corporate Assistant from 5 Sources

Sources: Confluence (5200 pages), SharePoint (3800 documents), JIRA (task export), GitHub (wiki, README), internal CRM documentation.

The problem with a monolithic index: different content types have different optimal chunk sizes. GitHub READMEs are best indexed by functional blocks (a code snippet plus its description), Confluence pages by sections, and CRM documentation by individual answers.

Multi-Index configuration:

  • 5 separate Qdrant collections
  • LLM-router on GPT-4o-mini (~15ms overhead)
  • Parallel search (async) reduces latency from 5×T to 1.2×T
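The parallelism claim can be illustrated with a toy benchmark: five simulated index searches of 50 ms each take roughly the sum of their latencies sequentially, but roughly one latency when gathered concurrently. The timings here are synthetic, not the project's measurements:

```python
import asyncio
import time

async def fake_search(index: str, delay: float = 0.05) -> str:
    """Simulates one index query taking `delay` seconds."""
    await asyncio.sleep(delay)
    return index

async def sequential(indexes: list[str]) -> list[str]:
    # One query at a time: total latency is the sum of all delays
    return [await fake_search(i) for i in indexes]

async def parallel(indexes: list[str]) -> list[str]:
    # All queries in flight at once: total latency is ~one delay
    return list(await asyncio.gather(*(fake_search(i) for i in indexes)))

indexes = ["legal", "hr", "it", "finance", "faq"]

t0 = time.perf_counter()
asyncio.run(sequential(indexes))
seq_t = time.perf_counter() - t0

t0 = time.perf_counter()
asyncio.run(parallel(indexes))
par_t = time.perf_counter() - t0
```

In practice the speedup is bounded by the slowest index, which is why the observed factor is 1.2×T rather than an ideal 1×T.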

Results:

  • Context Recall: 0.71 (monolithic) → 0.88 (multi-index)
  • Precision@5: 0.74 → 0.86
  • Latency P95: 1.2s → 1.5s (parallel vs sequential +250ms)
  • Routing accuracy (correct index set): 91%

Failure cases: 9% of queries are routed to the wrong index set, mostly cross-domain questions. Mitigation: when router confidence is low, search all indexes and cut off results below a score threshold.
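That fallback can be sketched as two small helpers; the confidence field and the threshold values are illustrative assumptions rather than the project's actual settings:

```python
def route_with_fallback(
    routed: list[str],
    confidence: float,
    all_indexes: list[str],
    min_confidence: float = 0.7,  # assumed cutoff, tune per router
) -> list[str]:
    """Fall back to searching every index when the router is unsure."""
    if confidence < min_confidence or not routed:
        return all_indexes
    return routed

def apply_score_threshold(
    scored_docs: list[tuple[str, float]],
    threshold: float = 0.4,  # assumed similarity cutoff
) -> list[str]:
    """Keep only documents whose retrieval score clears the threshold."""
    return [doc for doc, score in scored_docs if score >= threshold]
```

The threshold step matters because the all-indexes fallback would otherwise flood the context with near-miss chunks from irrelevant domains.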

Federated Search with Access Control

def retrieve_with_permissions(
    query: str,
    user_id: str,
    permission_service,
) -> dict[str, list]:
    """Search only the indexes the user is allowed to access."""
    allowed_indexes = permission_service.get_allowed_indexes(user_id)
    # route_query and search stand for the routing and retrieval
    # steps of MultiIndexRAG above
    relevant_indexes = [
        idx for idx in route_query(query)
        if idx in allowed_indexes
    ]
    return {idx: search(idx, query) for idx in relevant_indexes}
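A minimal in-memory stand-in for permission_service makes the contract concrete; a real deployment would back this with an ACL store or identity provider, and the class and user names here are illustrative assumptions:

```python
class InMemoryPermissionService:
    """Maps user ids to the set of index names they may search."""

    def __init__(self, acl: dict[str, set[str]]):
        self._acl = acl

    def get_allowed_indexes(self, user_id: str) -> set[str]:
        # Unknown users get an empty set: deny by default
        return self._acl.get(user_id, set())

perms = InMemoryPermissionService({
    "alice": {"hr", "faq"},
    "bob": {"legal", "finance", "faq"},
})
```

Filtering happens before retrieval, so restricted documents never enter the context, which is a stronger guarantee than post-filtering the LLM's answer.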

Timelines

  • Designing Multi-Index architecture: 1 week
  • Developing ingestion pipelines (5 sources): 3–4 weeks
  • LLM-router and integration: 1 week
  • Reranking and evaluation: 1 week
  • Total: 6–8 weeks