RAG Development with Elasticsearch Vector Database (kNN)
Since version 8.0, Elasticsearch has supported native k-nearest-neighbor (kNN) search on dense vectors (the `dense_vector` field type). For teams already running Elasticsearch as a search engine, this is the most natural path to RAG: no new infrastructure is needed. Native integration of BM25 full-text search and vector search makes ES a strong choice for hybrid retrieval.
Creating an Index with dense_vector Field
```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Create the index with an explicit mapping
index_config = {
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "russian",  # Built-in Russian stemming (Snowball)
            },
            "source": {"type": "keyword"},
            "doc_type": {"type": "keyword"},
            "page": {"type": "integer"},
            "date": {"type": "date"},
            "embedding": {
                "type": "dense_vector",
                "dims": 1536,
                "index": True,
                "similarity": "cosine",
                # HNSW parameters
                "index_options": {
                    "type": "hnsw",
                    "m": 16,
                    "ef_construction": 100,
                },
            },
        }
    },
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
    },
}

es.indices.create(index="knowledge_base", body=index_config)
```
Indexing Documents
```python
from openai import OpenAI
from elasticsearch.helpers import bulk

openai_client = OpenAI()

def generate_actions(chunks: list):
    texts = [c["text"] for c in chunks]
    # One batched embeddings call for all chunk texts
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    embeddings = [e.embedding for e in response.data]
    for chunk, embedding in zip(chunks, embeddings):
        yield {
            "_index": "knowledge_base",
            "_source": {
                "content": chunk["text"],
                "source": chunk["source"],
                "doc_type": chunk["doc_type"],
                "page": chunk.get("page", 0),
                "embedding": embedding,
            },
        }

# Bulk-load all chunks
bulk(es, generate_actions(document_chunks))
```
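Note that `generate_actions` sends every chunk text in a single embeddings request, which breaks once the corpus exceeds the API's per-request input cap. A minimal batching helper (the batch size of 256 is an assumption to tune against your provider's limits, not an API constant):

```python
def batched(items: list, batch_size: int):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]

# Hypothetical usage with the generate_actions helper above:
# for batch in batched(document_chunks, 256):
#     bulk(es, generate_actions(batch))
print(list(batched(["a", "b", "c", "d", "e"], 2)))  # → [['a', 'b'], ['c', 'd'], ['e']]
```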
Hybrid Search: BM25 + kNN
Elasticsearch supports hybrid search via `knn` + `query` in a single request, fused with Reciprocal Rank Fusion (RRF):
```python
def hybrid_search_es(
    query: str,
    doc_type_filter: str | None = None,
    top_k: int = 5,
) -> list:
    query_embedding = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    ).data[0].embedding

    # Optional metadata filter, applied to both legs of the search
    filter_clause = []
    if doc_type_filter:
        filter_clause.append({"term": {"doc_type": doc_type_filter}})

    # Hybrid: BM25 (query) + kNN, fused via RRF
    body = {
        "query": {
            "bool": {
                "must": {
                    "match": {
                        "content": {
                            "query": query,
                            "analyzer": "russian",
                        }
                    }
                },
                "filter": filter_clause,
            }
        },
        "knn": {
            "field": "embedding",
            "query_vector": query_embedding,
            "k": top_k * 3,  # Extended candidate set for fusion
            "num_candidates": 100,
            "filter": filter_clause,
        },
        "rank": {
            "rrf": {
                "window_size": 50,
                "rank_constant": 20,
            }
        },
        "size": top_k,
        "_source": ["content", "source", "doc_type"],
    }

    response = es.search(index="knowledge_base", body=body)
    return [
        {
            "text": hit["_source"]["content"],
            "source": hit["_source"]["source"],
            "score": hit["_score"],
        }
        for hit in response["hits"]["hits"]
    ]
```
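The `rank.rrf` clause is a relatively recent addition (technical preview in Elasticsearch 8.8). On clusters where it is unavailable, the same fusion can be approximated client-side by running the BM25 and kNN legs as two separate requests and merging the ranked hit lists with the standard RRF formula, score(d) = Σ 1/(rank_constant + rank). A sketch; the document IDs are hypothetical, and `rank_constant=20` mirrors the request above:

```python
def rrf_fuse(result_lists: list[list[str]], rank_constant: int = 20, top_k: int = 5) -> list[str]:
    """Merge several ranked ID lists via Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical ranked ID lists from separate BM25 and kNN requests:
bm25_ids = ["d1", "d2", "d3", "d4"]
knn_ids = ["d3", "d1", "d5"]
print(rrf_fuse([bm25_ids, knn_ids], rank_constant=20, top_k=3))  # → ['d1', 'd3', 'd2']
```

Documents ranked highly in both lists (here `d1` and `d3`) float to the top, which is exactly the behavior the server-side `rrf` clause provides.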
Advantage: Russian Morphology Out of the Box
Elasticsearch's `russian` analyzer stems Russian words via the Snowball algorithm. This is critical for the BM25 leg of hybrid search: a query for "договором" will also match documents containing "договор", "договоры", or "договорам".
```python
# Morphological analysis test
es.indices.analyze(
    index="knowledge_base",
    body={"analyzer": "russian", "text": "договором аренды"},
)
# tokens: ["договор", "аренд"] (stemmed forms)
```
Practical Case Study: Migrating Existing Elasticsearch to RAG
Context: a company uses ES 8.x as a search engine over 500K documents. Task: add RAG on top without changing the infrastructure.
Steps:
- Add an `embedding` field (`dense_vector`, `dims=1536`) to the existing mapping
- Batch-vectorize the existing documents (2 days, 500K × $0.02/1M = $10)
- Reindex with the new field (6 hours)
- Add RRF fusion to search queries
- Build the RAG layer on top of ES retrieval
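The first step does not require recreating the index: Elasticsearch allows adding new fields to a live mapping via the put-mapping API, and existing documents simply lack the field until they are updated with their vectors. A sketch under the assumption that the index is named `knowledge_base` as in the examples above:

```python
# Mapping fragment adding only the new vector field; existing fields are untouched
new_field = {
    "properties": {
        "embedding": {
            "type": "dense_vector",
            "dims": 1536,
            "index": True,
            "similarity": "cosine",
        }
    }
}

# Applied against a live cluster (commented out here):
# es.indices.put_mapping(index="knowledge_base", body=new_field)
```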
Results (vs pure BM25):
- NDCG@5: 0.64 → 0.81
- Recall@10: 0.71 → 0.88
- Latency P95: 85ms → 140ms (hybrid)
- Faithfulness (RAGAS): 0.76 → 0.91
Moving from pure BM25 to hybrid kNN+BM25 yielded roughly +27% NDCG@5 with no change to the infrastructure.
Timeline
- Adding vector field + reindexing: 2–5 days
- Developing hybrid search queries: 3–5 days
- RAG pipeline and evaluation: 1–2 weeks
- Total: 2–4 weeks