RAG Development with Elasticsearch (kNN) Vector Database

We design and deploy artificial intelligence systems: from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering and MLOps to make AI work not in the lab, but in real business.


Since version 8.x, Elasticsearch supports native k-nearest-neighbor (kNN) search over dense vectors (the dense_vector field type). For teams already running Elasticsearch as a search engine, this is the most natural path to RAG: no new infrastructure is needed. Native integration of BM25 full-text and vector search makes ES a strong choice for hybrid retrieval.

Creating an Index with dense_vector Field

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Creating index with mapping
index_config = {
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "russian",  # Native Russian morphology support
            },
            "source": {"type": "keyword"},
            "doc_type": {"type": "keyword"},
            "page": {"type": "integer"},
            "date": {"type": "date"},
            "embedding": {
                "type": "dense_vector",
                "dims": 1536,
                "index": True,
                "similarity": "cosine",
                # HNSW parameters
                "index_options": {
                    "type": "hnsw",
                    "m": 16,
                    "ef_construction": 100,
                }
            }
        }
    },
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
    }
}

# In the 8.x Python client, pass mappings/settings as keyword arguments
# (the body= parameter is deprecated)
es.indices.create(
    index="knowledge_base",
    mappings=index_config["mappings"],
    settings=index_config["settings"],
)

Indexing Documents

from openai import OpenAI
from elasticsearch.helpers import bulk

openai_client = OpenAI()

def generate_actions(chunks: list):
    texts = [c["text"] for c in chunks]
    # Batch embeddings
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    embeddings = [e.embedding for e in response.data]

    for chunk, embedding in zip(chunks, embeddings):
        yield {
            "_index": "knowledge_base",
            "_source": {
                "content": chunk["text"],
                "source": chunk["source"],
                "doc_type": chunk["doc_type"],
                "page": chunk.get("page", 0),
                "embedding": embedding,
            }
        }

# Batch loading
bulk(es, generate_actions(document_chunks))
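As written, generate_actions sends every chunk to the embeddings endpoint in a single request, which breaks on large corpora: the OpenAI embeddings API caps the number of inputs per request (2,048 at the time of writing). A small batching helper (the name batched is ours, not a library function):

```python
def batched(items: list, batch_size: int = 2048):
    """Yield fixed-size slices so each embeddings request stays under the cap."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]

# Usage sketch: embed and index one batch per request
# for batch in batched(document_chunks):
#     bulk(es, generate_actions(batch))
```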

Hybrid Search: BM25 + kNN

Elasticsearch supports hybrid search by combining a top-level knn clause with a query clause in a single request; since 8.8, the two result sets can be fused with reciprocal rank fusion (rrf):

def hybrid_search_es(
    query: str,
    doc_type_filter: str | None = None,
    top_k: int = 5,
) -> list:
    query_embedding = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=query
    ).data[0].embedding

    # Filter clause
    filter_clause = []
    if doc_type_filter:
        filter_clause.append({"term": {"doc_type": doc_type_filter}})

    # Hybrid: kNN + BM25 via RRF
    body = {
        "query": {
            "bool": {
                "must": {
                    "match": {
                        "content": {
                            "query": query,
                            "analyzer": "russian"
                        }
                    }
                },
                "filter": filter_clause,
            }
        },
        "knn": {
            "field": "embedding",
            "query_vector": query_embedding,
            "k": top_k * 3,  # Extended set for fusion
            "num_candidates": 100,
            "filter": filter_clause,
        },
        "rank": {
            "rrf": {
                "window_size": 50,
                "rank_constant": 20,
            }
        },
        "size": top_k,
        "_source": ["content", "source", "doc_type"],
    }

    response = es.search(index="knowledge_base", body=body)
    return [
        {
            "text": hit["_source"]["content"],
            "source": hit["_source"]["source"],
            "score": hit["_score"],
        }
        for hit in response["hits"]["hits"]
    ]
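On clusters older than 8.8, where the rank.rrf clause is not available, the same fusion can be done client-side from two separate result lists. A minimal sketch over ranked document ids (the helper name rrf_fuse and the id lists are illustrative, not part of the Elasticsearch API):

```python
def rrf_fuse(
    bm25_ids: list,
    knn_ids: list,
    k: int = 20,
    top_n: int = 5,
) -> list:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)).

    Documents appearing high in either list accumulate a larger score;
    k dampens the influence of any single ranking (cf. rank_constant above).
    """
    scores: dict = {}
    for ranking in (bm25_ids, knn_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Here the two input lists would come from separate BM25 and kNN requests; documents found by both retrievers naturally rise to the top.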

Advantage: Russian Morphology Out of the Box

Elasticsearch with the russian analyzer supports Russian word stemming via Snowball. This is critical for the BM25 part of hybrid search — a query for "договором" will find documents with "договор", "договоры", "договорам".

# Morphological analysis test (keyword arguments; body= is deprecated)
es.indices.analyze(
    index="knowledge_base",
    analyzer="russian",
    text="договором аренды",
)
# tokens: ["договор", "аренд"] — stemmed forms

Practical Case Study: Migrating Existing Elasticsearch to RAG

Context: Company uses ES 8.x as a search engine for 500K documents. Task: Add RAG on top without changing infrastructure.

Steps:

  1. Add embedding field (dense_vector, dims=1536) to existing mapping
  2. Batch vectorize existing documents (2 days; at roughly 1K tokens per document, 500K docs ≈ 500M tokens × $0.02/1M tokens ≈ $10)
  3. Reindex with new field (6 hours)
  4. Add RRF fusion to search queries
  5. RAG layer on top of ES retrieval
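Step 1 does not require creating a new index: a dense_vector field can be added to an existing mapping with the put mapping API. A sketch of the mapping fragment (the index name "documents" is a placeholder for the existing index):

```python
def embedding_mapping(dims: int = 1536) -> dict:
    """Mapping fragment that adds a dense_vector field to an existing index."""
    return {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": dims,           # must match the embedding model's output size
                "index": True,
                "similarity": "cosine",
            }
        }
    }

# Against a live cluster (not executed here):
# es.indices.put_mapping(
#     index="documents",
#     properties=embedding_mapping()["properties"],
# )
```

Existing documents keep working unchanged; the new field is then populated by the batch vectorization in step 2.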

Results (vs pure BM25):

  • NDCG@5: 0.64 → 0.81
  • Recall@10: 0.71 → 0.88
  • Latency P95: 85ms → 140ms (hybrid)
  • Faithfulness (RAGAS): 0.76 → 0.91

The transition from pure BM25 to hybrid kNN+BM25 gave a ~27% relative improvement in NDCG@5 without any change to the infrastructure.
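The NDCG@5 figures above follow the standard definition: DCG over the top-k graded relevances, normalized by the DCG of the ideal ordering. A minimal sketch for reproducing the metric on your own relevance judgments:

```python
import math


def dcg(relevances: list) -> float:
    """Discounted cumulative gain: rel_i / log2(i + 1), positions starting at 1."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg_at_k(relevances: list, k: int = 5) -> float:
    """NDCG@k: DCG of the ranking as returned, divided by the ideal DCG."""
    ideal = sorted(relevances, reverse=True)[:k]
    if dcg(ideal) == 0:
        return 0.0
    return dcg(relevances[:k]) / dcg(ideal)
```

Averaging ndcg_at_k over a query set with graded relevance labels yields numbers comparable to the 0.64 → 0.81 figures reported above.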

Timeline

  • Adding vector field + reindexing: 2–5 days
  • Developing hybrid search queries: 3–5 days
  • RAG pipeline and evaluation: 1–2 weeks
  • Total: 2–4 weeks