Implementing Self-Query RAG with Metadata

Self-Query RAG is a technique in which an LLM analyzes the user's query and, in addition to the vector search, automatically constructs structured filters over metadata. Instead of searching by semantics alone, the system precisely filters documents by date, type, author, department, and other attributes.

Problem Without Self-Query

Without Self-Query, the query "security policies issued in 2024" is matched against all documents by the semantics of "security", with no filtering by year, so the user gets mixed results from different periods. With Self-Query, the LLM extracts the filter date >= 2024-01-01 AND doc_type = "security_policy" and applies it alongside the vector search.
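The split described above can be sketched without any LLM or vector database: the query is decomposed into a semantic part plus structured attributes, and the filter is applied together with semantic ranking. Everything here is a toy illustration; the "parsed query" is hard-coded and keyword overlap stands in for embedding similarity.

```python
# Toy illustration of Self-Query: structured filtering + semantic ranking.
# No LLM or vector DB involved; the parsed query is hard-coded and
# "semantic" similarity is faked with keyword overlap.

DOCS = [
    {"text": "Password rotation security policy", "doc_type": "security_policy", "year": 2024},
    {"text": "Security awareness training policy", "doc_type": "security_policy", "year": 2021},
    {"text": "Office seating procedure", "doc_type": "procedure", "year": 2024},
]

def self_query(semantic_query: str, filters: dict) -> list[dict]:
    # 1. Structured filtering on metadata (the part the LLM would extract)
    candidates = [
        d for d in DOCS
        if all(d.get(k) == v for k, v in filters.items())
    ]
    # 2. Semantic ranking: naive keyword overlap stands in for
    #    cosine similarity over embeddings.
    words = set(semantic_query.lower().split())
    return sorted(
        candidates,
        key=lambda d: len(words & set(d["text"].lower().split())),
        reverse=True,
    )

# "security policies issued in 2024" -> semantic part + structured part
results = self_query("security policies", {"doc_type": "security_policy", "year": 2024})
```

Only the 2024 security policy survives the metadata filter; the 2021 policy is excluded before ranking ever happens, which is exactly the behavior missing from plain vector search.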

Implementation via LangChain SelfQueryRetriever

from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Qdrant

# Metadata description for LLM
metadata_field_info = [
    AttributeInfo(
        name="doc_type",
        description="Document type: contract, regulation, policy, faq, procedure",
        type="string",
    ),
    AttributeInfo(
        name="department",
        description="Department or subdivision: hr, legal, finance, it, security",
        type="string",
    ),
    AttributeInfo(
        name="year",
        description="Document publication year",
        type="integer",
    ),
    AttributeInfo(
        name="status",
        description="Document status: active, archived, draft",
        type="string",
    ),
    AttributeInfo(
        name="author",
        description="Author or responsible party for document",
        type="string",
    ),
]

document_content_description = "Company corporate documentation: regulations, policies, contracts, procedures"

llm = ChatOpenAI(model="gpt-4o", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# `vectorstore` is assumed to be a pre-built Qdrant store whose documents
# carry the metadata fields described above (e.g. via Qdrant.from_documents)
retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents=document_content_description,
    metadata_field_info=metadata_field_info,
    enable_limit=True,  # Allows the LLM to extract a result limit from the query
    verbose=True,
)

Self-Query Examples

# Example 1: Filter by year and type
result = retriever.invoke(
    "What security policies were active in 2023?"
)
# LLM generates filter: {"doc_type": "policy", "department": "security", "year": 2023, "status": "active"}
# Then executes vector search with this filter

# Example 2: Filter by department
result = retriever.invoke(
    "Show me HR department regulations"
)
# Filter: {"doc_type": "regulation", "department": "hr"}

# Example 3: No filter (pure vector search)
result = retriever.invoke(
    "How to prepare for an audit?"
)
# LLM doesn't extract structured filters — pure semantic search

Custom Self-Query Implementation Without LangChain

from pydantic import BaseModel, Field
from typing import Optional
from openai import OpenAI

class SearchFilter(BaseModel):
    semantic_query: str = Field(description="Pure semantic part for vector search")
    doc_type: Optional[str] = Field(default=None, description="Document type")
    department: Optional[str] = Field(default=None, description="Department")
    year_from: Optional[int] = Field(default=None, description="Year from (inclusive)")
    year_to: Optional[int] = Field(default=None, description="Year to (inclusive)")
    status: Optional[str] = Field(default=None, description="Status: active/archived")

def parse_query_to_filter(user_query: str, client: OpenAI) -> SearchFilter:
    response = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[{
            "role": "system",
            "content": "Extract structured filters from user query for document search."
        }, {
            "role": "user",
            "content": user_query
        }],
        response_format=SearchFilter,
        temperature=0,
    )
    return response.choices[0].message.parsed

def self_query_search(user_query: str, vectorstore, client: OpenAI, top_k: int = 5) -> list:
    filter_obj = parse_query_to_filter(user_query, client)

    # Translate the extracted attributes into a Qdrant filter
    qdrant_filter = build_qdrant_filter(filter_obj)

    return vectorstore.similarity_search(
        filter_obj.semantic_query,
        k=top_k,
        filter=qdrant_filter,
    )
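The build_qdrant_filter helper is referenced but not shown; one possible sketch is below, producing a filter in Qdrant's JSON shape. In production you would more likely build typed qdrant_client.models.Filter objects, and the exact payload keys depend on how your metadata is stored. The SearchFilter model is repeated here only so the sketch is self-contained.

```python
from typing import Any, Optional
from pydantic import BaseModel

# Mirrors the SearchFilter model defined earlier (repeated for self-containment).
class SearchFilter(BaseModel):
    semantic_query: str
    doc_type: Optional[str] = None
    department: Optional[str] = None
    year_from: Optional[int] = None
    year_to: Optional[int] = None
    status: Optional[str] = None

def build_qdrant_filter(f: SearchFilter) -> Optional[dict[str, Any]]:
    """Translate extracted attributes into Qdrant's JSON filter shape."""
    must: list[dict[str, Any]] = []
    # Exact-match conditions on string attributes
    for key in ("doc_type", "department", "status"):
        value = getattr(f, key)
        if value is not None:
            must.append({"key": key, "match": {"value": value}})
    # Inclusive year range
    year_range: dict[str, int] = {}
    if f.year_from is not None:
        year_range["gte"] = f.year_from
    if f.year_to is not None:
        year_range["lte"] = f.year_to
    if year_range:
        must.append({"key": "year", "range": year_range})
    # No extracted attributes -> no filter, i.e. pure vector search
    return {"must": must} if must else None
```

Returning None when nothing was extracted keeps Example 3 above working: queries without structured attributes degrade to plain semantic search.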

Practical Case: Corporate Knowledge Base with Metadata

Task: Search assistant for 15,000 internal documents with metadata (type, department, year, status, author).

Before Self-Query: 42% of queries returned archived documents instead of current ones.

After Self-Query:

  • Archived documents in results for "current" queries: 42% → 3%
  • Precision@5: 0.68 → 0.89
  • User satisfaction: +31%

Failure cases: the LLM sometimes misinterprets filter parameters on ambiguous queries. Solution: add a confidence threshold and fall back to pure semantic search when confidence is low.
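One way to implement that fallback is to have the extraction model also emit a confidence score for the filters it produced and route low-confidence queries to unfiltered search. The ScoredFilter shape, confidence field, and threshold below are illustrative additions, not part of the SearchFilter model shown earlier.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative: the LLM is asked to also report a 0-1 confidence score
# for the filters it extracted (an extension of the earlier SearchFilter).
@dataclass
class ScoredFilter:
    semantic_query: str
    filters: dict = field(default_factory=dict)
    confidence: float = 0.0

CONFIDENCE_THRESHOLD = 0.7  # tuned on a held-out set of labeled queries

def effective_filters(parsed: ScoredFilter) -> Optional[dict]:
    """Return metadata filters to apply, or None to fall back to
    pure semantic search when the extraction looks unreliable."""
    if parsed.filters and parsed.confidence >= CONFIDENCE_THRESHOLD:
        return parsed.filters
    return None
```

The retriever then passes filter=effective_filters(parsed) to similarity_search, so an ambiguous query degrades gracefully to plain vector search instead of silently excluding relevant documents.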

Timelines

  • Labeling document metadata: 1–3 weeks (depends on data availability)
  • Implementing Self-Query Retriever: 3–5 days
  • Testing and prompt tuning: 3–5 days
  • Total: 2–5 weeks