Implementing Self-Query RAG with Metadata
Self-Query RAG is a technique in which an LLM analyzes the user's query and automatically constructs structured metadata filters in addition to running a vector search. Instead of searching by semantics alone, the system precisely filters documents by date, type, author, department, and other attributes.
Problem Without Self-Query
Without Self-Query, the query "security policies issued in 2024" searches all documents by the semantics of "security" without filtering by year, so the user gets mixed results from different periods. With Self-Query, the LLM extracts the filter date >= 2024-01-01 AND doc_type = "security_policy" and applies it together with the vector search.
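As a toy illustration of the idea (the documents and scores below are made up, and a hard-coded "score" field stands in for real embedding similarity), the metadata filter simply narrows the candidate set before semantic ranking:

```python
# Toy illustration: metadata filtering narrows candidates before semantic ranking.
# Documents and scores are fabricated; a real system would compute embedding
# similarity instead of reading a hard-coded "score" field.
docs = [
    {"title": "Security policy 2024", "doc_type": "security_policy", "year": 2024, "score": 0.91},
    {"title": "Security policy 2019", "doc_type": "security_policy", "year": 2019, "score": 0.91},
    {"title": "Office security memo", "doc_type": "memo", "year": 2024, "score": 0.84},
]

def filtered_search(docs, metadata_filter, top_k=5):
    """Keep only documents matching every metadata condition, then rank by score."""
    candidates = [
        d for d in docs
        if all(d.get(key) == value for key, value in metadata_filter.items())
    ]
    return sorted(candidates, key=lambda d: d["score"], reverse=True)[:top_k]

# Filter as the LLM might extract it for "security policies issued in 2024"
results = filtered_search(docs, {"doc_type": "security_policy", "year": 2024})
print([d["title"] for d in results])  # ['Security policy 2024']
```

Both same-score security policies match on semantics, but only the 2024 document survives the metadata filter.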
Implementation via LangChain SelfQueryRetriever
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Qdrant

# Metadata description for the LLM
metadata_field_info = [
    AttributeInfo(
        name="doc_type",
        description="Document type: contract, regulation, policy, faq, procedure",
        type="string",
    ),
    AttributeInfo(
        name="department",
        description="Department or subdivision: hr, legal, finance, it, security",
        type="string",
    ),
    AttributeInfo(
        name="year",
        description="Document publication year",
        type="integer",
    ),
    AttributeInfo(
        name="status",
        description="Document status: active, archived, draft",
        type="string",
    ),
    AttributeInfo(
        name="author",
        description="Author or responsible party for the document",
        type="string",
    ),
]

document_content_description = "Company corporate documentation: regulations, policies, contracts, procedures"

llm = ChatOpenAI(model="gpt-4o", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# vectorstore is a Qdrant store built beforehand; each document's metadata
# must contain the fields described in metadata_field_info
retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents=document_content_description,
    metadata_field_info=metadata_field_info,
    enable_limit=True,  # allows the LLM to specify a result limit in the query
    verbose=True,
)
Self-Query Examples
# Example 1: filter by year and type
result = retriever.invoke(
    "What security policies were active in 2023?"
)
# The LLM generates the filter:
# {"doc_type": "policy", "department": "security", "year": 2023, "status": "active"}
# and then runs the vector search with this filter applied

# Example 2: filter by department
result = retriever.invoke(
    "Show me HR department regulations"
)
# Filter: {"doc_type": "regulation", "department": "hr"}

# Example 3: no filter (pure vector search)
result = retriever.invoke(
    "How to prepare for an audit?"
)
# The LLM extracts no structured filters, so only semantic search runs
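The filter dictionaries shown in the comments above still have to be translated into the vector store's native filter syntax; LangChain does this internally via a store-specific translator, but in a custom pipeline you do it yourself. A minimal sketch for Qdrant's JSON payload-filter format, covering equality matches only (range and negation conditions are left out):

```python
def to_qdrant_filter(flt: dict) -> dict:
    """Convert a flat {field: value} filter into Qdrant's JSON filter format."""
    return {
        "must": [
            {"key": key, "match": {"value": value}}
            for key, value in flt.items()
        ]
    }

print(to_qdrant_filter({"doc_type": "regulation", "department": "hr"}))
# {'must': [{'key': 'doc_type', 'match': {'value': 'regulation'}},
#           {'key': 'department', 'match': {'value': 'hr'}}]}
```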
Custom Self-Query Implementation Without LangChain
from typing import Optional

from openai import OpenAI
from pydantic import BaseModel, Field
from qdrant_client.models import FieldCondition, Filter, MatchValue, Range

openai_client = OpenAI()

class SearchFilter(BaseModel):
    semantic_query: str = Field(description="Pure semantic part for vector search")
    doc_type: Optional[str] = Field(default=None, description="Document type")
    department: Optional[str] = Field(default=None, description="Department")
    year_from: Optional[int] = Field(default=None, description="Year from (inclusive)")
    year_to: Optional[int] = Field(default=None, description="Year to (inclusive)")
    status: Optional[str] = Field(default=None, description="Status: active/archived")

def parse_query_to_filter(user_query: str, client: OpenAI) -> SearchFilter:
    response = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Extract structured filters from the user query for document search."},
            {"role": "user", "content": user_query},
        ],
        response_format=SearchFilter,
        temperature=0,
    )
    return response.choices[0].message.parsed

def build_qdrant_filter(f: SearchFilter) -> Optional[Filter]:
    """Translate the parsed filter into a Qdrant payload filter."""
    must = []
    if f.doc_type:
        must.append(FieldCondition(key="doc_type", match=MatchValue(value=f.doc_type)))
    if f.department:
        must.append(FieldCondition(key="department", match=MatchValue(value=f.department)))
    if f.status:
        must.append(FieldCondition(key="status", match=MatchValue(value=f.status)))
    if f.year_from is not None or f.year_to is not None:
        must.append(FieldCondition(key="year", range=Range(gte=f.year_from, lte=f.year_to)))
    return Filter(must=must) if must else None

def self_query_search(user_query: str, vectorstore, top_k: int = 5) -> list:
    filter_obj = parse_query_to_filter(user_query, openai_client)
    qdrant_filter = build_qdrant_filter(filter_obj)
    return vectorstore.similarity_search(
        filter_obj.semantic_query,
        k=top_k,
        filter=qdrant_filter,
    )
Practical Case: Corporate Knowledge Base with Metadata
Task: Search assistant for 15,000 internal documents with metadata (type, department, year, status, author).
Before Self-Query: 42% of queries returned archived documents instead of current ones.
After Self-Query:
- Archived documents in results for "current" queries: 42% → 3%
- Precision@5: 0.68 → 0.89
- User satisfaction: +31%
Failure cases: the LLM sometimes misinterprets filter parameters on ambiguous queries. Solution: add a confidence threshold and fall back to pure semantic search when confidence is low.
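The fallback can be sketched as a thin wrapper around the parser and the search call. The confidence value and the callable signatures here are hypothetical: the parser from the custom implementation above would need to be extended to report a confidence score (e.g. by asking the model for one, or from log-probabilities).

```python
from typing import Callable

MIN_CONFIDENCE = 0.7  # threshold chosen for illustration

def search_with_fallback(
    user_query: str,
    parse_fn: Callable,   # returns (filter_dict, confidence); may raise on parse failure
    search_fn: Callable,  # search_fn(query, metadata_filter) -> list of documents
) -> list:
    """Use the structured filter only when extraction succeeded with high confidence."""
    try:
        metadata_filter, confidence = parse_fn(user_query)
    except Exception:
        metadata_filter, confidence = None, 0.0
    if metadata_filter and confidence >= MIN_CONFIDENCE:
        return search_fn(user_query, metadata_filter)
    # Low confidence or parse failure: fall back to pure semantic search
    return search_fn(user_query, None)

# Stub parser and search functions for demonstration only
parse = lambda q: ({"doc_type": "policy"}, 0.9) if "policy" in q else (None, 0.2)
search = lambda q, f: [("filtered" if f else "semantic", q)]

print(search_with_fallback("security policy 2024", parse, search))  # [('filtered', 'security policy 2024')]
print(search_with_fallback("how to prepare?", parse, search))       # [('semantic', 'how to prepare?')]
```

The wrapper degrades gracefully: a bad filter extraction costs only the precision gain, never the result set itself.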
Timelines
- Annotating documents with metadata: 1–3 weeks (depends on data availability)
- Implementing Self-Query Retriever: 3–5 days
- Testing and prompt tuning: 3–5 days
- Total: 2–5 weeks