Semantic Search over Text Documents Implementation

We design and deploy artificial intelligence systems, from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work in real business, not just in the lab.
Implementation of Semantic Search Over Text Documents

Semantic search understands the meaning of a query, not just keywords. The query "how to boost team motivation" finds documents about "personnel management methods" that don't contain a single word from the query. This is fundamentally different from BM25/TF-IDF.

Semantic Search Architecture

Bi-encoder (main working mode): queries and documents are encoded independently into a shared vector space, so document vectors can be precomputed offline. Search then reduces to finding the nearest vectors via ANN (Approximate Nearest Neighbor).

Cross-encoder (reranking): takes a query+document pair and outputs a relevance score. It must score every candidate pair, so cost grows linearly with the number of candidates (versus a sublinear ANN lookup), but it is noticeably more accurate. Used to rerank the top-K results from the bi-encoder.

Combining bi-encoder (retrieve) + cross-encoder (rerank) is the standard for production systems.
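The two-stage flow can be sketched framework-agnostically. The function below and its score-callback names are illustrative, not a fixed API: in practice, retrieve_scores would wrap bi-encoder cosine similarity and rerank_scores would wrap cross_encoder.predict.

```python
from typing import Callable, Sequence

def retrieve_then_rerank(
    query: str,
    documents: Sequence[str],
    retrieve_scores: Callable[[str, Sequence[str]], Sequence[float]],  # cheap scorer
    rerank_scores: Callable[[str, Sequence[str]], Sequence[float]],    # accurate scorer
    top_k: int = 50,
    top_n: int = 5,
) -> list:
    # Stage 1: score all documents with the cheap retriever, keep top_k candidates
    coarse = retrieve_scores(query, documents)
    candidates = sorted(range(len(documents)), key=lambda i: coarse[i], reverse=True)[:top_k]
    # Stage 2: rescore only the candidates with the expensive reranker
    fine = rerank_scores(query, [documents[i] for i in candidates])
    reranked = sorted(zip(candidates, fine), key=lambda p: p[1], reverse=True)
    return [documents[i] for i, _ in reranked[:top_n]]
```

The key property: the expensive scorer only ever sees top_k documents, so its linear cost is bounded regardless of corpus size.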

Models for Russian Language

from sentence_transformers import SentenceTransformer, CrossEncoder

# Bi-encoder (produces 312-dimensional embeddings)
bi_encoder = SentenceTransformer("cointegrated/rubert-tiny2")
# For better quality: "sbert-base-ru-mean-tokens"

# Cross-encoder
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # English
# For Russian: "DiTy/cross-encoder-russian-msmarco"

Vector Store and Index

Qdrant — recommended for production:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient("localhost", port=6333)
client.create_collection(
    collection_name="documents",
    # size must match the bi-encoder's embedding dimension (312 for rubert-tiny2)
    vectors_config=VectorParams(size=312, distance=Distance.COSINE),
)

# Indexing
embeddings = bi_encoder.encode(documents, batch_size=64, show_progress_bar=True)
client.upload_points("documents", [
    PointStruct(id=i, vector=emb.tolist(), payload={"text": doc})
    for i, (emb, doc) in enumerate(zip(embeddings, documents))
])

FAISS — for in-memory indexes, fast, no external service required:

import faiss
import numpy as np

embeddings = np.asarray(embeddings, dtype="float32")  # FAISS expects float32
faiss.normalize_L2(embeddings)  # in-place L2 normalization
index = faiss.IndexFlatIP(312)  # inner product == cosine similarity after normalization
index.add(embeddings)
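For intuition, what IndexFlatIP computes after normalization is plain cosine similarity. A NumPy equivalent of the same lookup (illustrative only: it scans every vector and is no substitute for FAISS at scale):

```python
import numpy as np

def cosine_top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 5):
    # L2-normalize both sides so the inner product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q                      # one dot product per document
    top = np.argsort(-scores)[:k]       # indices of the k highest scores
    return top, scores[top]
```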

Hybrid Search

Semantic + BM25 — better than either alone:

# BM25 component (Elasticsearch or rank_bm25)
from rank_bm25 import BM25Okapi
bm25 = BM25Okapi([doc.split() for doc in corpus])

# Semantic component
from sklearn.metrics.pairwise import cosine_similarity
semantic_scores = cosine_similarity([query_emb], doc_embeddings)[0]

# RRF (Reciprocal Rank Fusion)
def rrf(bm25_ranks, semantic_ranks, k=60):
    # Inputs: ranked lists of document ids, best first
    scores = {}
    for rank, idx in enumerate(bm25_ranks):
        scores[idx] = scores.get(idx, 0) + 1/(k + rank)
    for rank, idx in enumerate(semantic_ranks):
        scores[idx] = scores.get(idx, 0) + 1/(k + rank)
    return sorted(scores, key=scores.get, reverse=True)
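A worked example of the fusion, inlining the same arithmetic with the default k=60: suppose BM25 ranks documents [2, 0, 1] and the semantic model ranks them [1, 2, 0].

```python
k = 60
scores = {}
for rank, idx in enumerate([2, 0, 1]):   # BM25 ranking, best first
    scores[idx] = scores.get(idx, 0) + 1 / (k + rank)
for rank, idx in enumerate([1, 2, 0]):   # semantic ranking, best first
    scores[idx] = scores.get(idx, 0) + 1 / (k + rank)
fused = sorted(scores, key=scores.get, reverse=True)
# Document 2 wins: first for BM25 and second for the semantic model,
# so its two reciprocal-rank contributions (1/60 + 1/61) are the largest sum
```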

Query Expansion and Preprocessing

Search quality depends on query processing:

  • Spell correction: users make typos
  • Synonym expansion: "DMS" → "voluntary medical insurance"
  • Query rewriting via LLM: "where to buy laptop" → "notebook purchase online store"
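The second bullet can be sketched as a naive dictionary lookup. The SYNONYMS table and expand_query are illustrative; production systems typically use curated thesauri or an LLM rewrite step instead.

```python
# Illustrative synonym table: query term -> expansions to append
SYNONYMS = {
    "dms": ["voluntary medical insurance"],
    "laptop": ["notebook"],
}

def expand_query(query: str) -> str:
    # Append known synonyms so either surface form can match in the index
    terms = query.lower().split()
    extra = [syn for t in terms for syn in SYNONYMS.get(t, [])]
    return " ".join(terms + extra)
```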

Quality Metrics

  • NDCG@10 (Normalized Discounted Cumulative Gain): rewards placing relevant documents near the top of the first 10 results
  • MAP (Mean Average Precision): precision averaged over all relevant positions, then averaged over queries
  • MRR (Mean Reciprocal Rank): 1/rank of the first relevant result, averaged over queries

Evaluation requires a set of queries with relevance labels (qrels). Such a set can be bootstrapped automatically: GPT-4o generates a question for each document, and that document serves as the "gold" answer.
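Given such labels, MRR is only a few lines. A minimal sketch; the dict shapes here are an assumption for illustration, not a fixed format:

```python
def mean_reciprocal_rank(results: dict, qrels: dict) -> float:
    """results: query -> ranked list of doc ids (best first);
    qrels: query -> set of relevant doc ids."""
    total = 0.0
    for query, ranked in results.items():
        for pos, doc_id in enumerate(ranked, start=1):
            if doc_id in qrels.get(query, set()):
                total += 1.0 / pos  # only the first relevant hit counts
                break
    return total / len(results)
```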

Performance

Qdrant with HNSW index: < 10ms per query on 1M vectors. FAISS IndexIVFFlat: < 5ms on 10M vectors. Bottleneck is usually query embedding generation, not the search itself.
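Since query embedding usually dominates latency, caching repeated queries is an easy win. A sketch with functools.lru_cache; make_cached_encoder is illustrative, and the encode callback would wrap something like bi_encoder.encode in practice:

```python
from functools import lru_cache
from typing import Callable, Sequence

def make_cached_encoder(encode: Callable[[str], Sequence[float]], maxsize: int = 10_000):
    """Wrap an encode(query) -> vector function with an LRU cache.
    `encode` is a stand-in for e.g. lambda q: bi_encoder.encode(q).tolist()."""
    @lru_cache(maxsize=maxsize)
    def cached(query: str) -> tuple:
        # Tuples are hashable and immutable, so results can be cached safely
        return tuple(encode(query))
    return cached
```

Repeated queries then skip the model entirely; only cache misses pay the embedding cost.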