What is Haystack and what is it used for?

Haystack is a production-ready framework from deepset for building NLP pipelines. It supports RAG, question-answering systems, semantic search, and document processing. Its main advantage is a declarative pipeline model that simplifies testing and versioning.

How does Haystack differ from LangChain?

Haystack uses declarative YAML pipelines with typed components, making it suitable for auditing and DevOps. LangChain is better for rapid prototyping and agent scenarios. For production-grade RAG, Haystack is often more efficient.

Which DocumentStores are supported in Haystack?

Haystack supports InMemoryDocumentStore (for testing), Elasticsearch, OpenSearch, Qdrant, Weaviate, Milvus, and pgvector. The choice depends on scale: Qdrant is good for >1M vectors, pgvector for PostgreSQL integration.

How do you set up hybrid search in Haystack?

Hybrid search combines BM25 and semantic search via DocumentJoiner with reciprocal rank fusion (RRF) mode. This yields better results than either method alone. An example configuration is provided in the article.

How long does Haystack integration take?

A basic RAG pipeline with one DocumentStore and LLM takes 1–2 weeks. With hybrid search and custom reranker, 3–4 weeks. Full production deployment with monitoring takes 6–8 weeks. Timelines are refined after an audit.

What is Haystack and what is it used for?

Haystack is a production-ready framework from deepset for building NLP pipelines. It supports RAG, question-answering systems, semantic search, and document processing. Its main advantage is a declarative pipeline model that simplifies testing and versioning.

How does Haystack differ from LangChain?

Haystack uses declarative YAML pipelines with typed components, making it suitable for auditing and DevOps. LangChain is better for rapid prototyping and agent scenarios. For production-grade RAG, Haystack is often more efficient.

Which DocumentStores are supported in Haystack?

Haystack supports InMemoryDocumentStore (for testing), Elasticsearch, OpenSearch, Qdrant, Weaviate, Milvus, and pgvector. The choice depends on scale: Qdrant is good for >1M vectors, pgvector for PostgreSQL integration.

How do you set up hybrid search in Haystack?

Hybrid search combines BM25 and semantic search via DocumentJoiner with reciprocal rank fusion (RRF) mode. This yields better results than either method alone. An example configuration is provided in the article.

How long does Haystack integration take?

A basic RAG pipeline with one DocumentStore and LLM takes 1–2 weeks. With hybrid search and custom reranker, 3–4 weeks. Full production deployment with monitoring takes 6–8 weeks. Timelines are refined after an audit.

Haystack Integration for NLP Pipelines

We design and deploy artificial intelligence systems: from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering and MLOps to make AI work not in the lab, but in real business.

8+Years of workmore info 900+Completed projectsmore info 100+In house employeesmore info 19+Partnersmore info

Services we offer

Showing 1 of 1All 1564 services

Haystack Integration for NLP Pipelines

Medium

from 1 week to 3 months

Frequently Asked Questions

AI Development Areas

Discuss your AI project

Free consultation — we'll show you how AI can solve your challenge

Get a quote

We'll estimate the budget and timeline for your AI project

AI Solution Development Stages

Latest works

B2B ADVANCE company website development
1347
Development of a web application for FEEDME
1247
Website development for BELFINGROUP
948
Development of an online store for the company FURNORO
1183
B2B Advance company logo design
642
Development of a web application for Enviok
921

Show more works

Haystack Integration for NLP Pipelines

We often encounter a situation: a company has already collected a document corpus, but search works via grep or plain BM25. Results are irrelevant, answers to customer queries must be found manually. Or the team tried LangChain, but the prototype turned out too fragile for production. Haystack (deepset) solves both problems: a production-ready framework with a declarative pipeline model, where components are connected into a graph with typed data. This simplifies testing, versioning, and replacing components. Our experience includes over 5 years in NLP and 20+ implemented RAG systems. Order an audit of your document corpus – we will select the optimal architecture. Saving up to 40% of time on information retrieval is a real result of implementation.

Why Haystack Over LangChain for RAG?

Haystack wins in scenarios where stability and testability are needed. For document-centric tasks – when the main work involves searching and processing a document corpus. For production-grade RAG – a reliable system is required, not a prototype. Haystack for RAG is 3–5 times more reliable than LangChain on large document volumes. Our team prefers explicit configuration: YAML pipelines are easier to audit than LangChain's Python code. Haystack also has built-in components for multi-hop question answering. We use Haystack for projects where stability is important, and leave LangChain for rapid prototyping and agent scenarios.

Criteria	Haystack	LangChain
Approach	Declarative YAML pipelines	Imperative Python code
Testing	Built-in evaluators (Faithfulness, ContextRelevance)	Requires manual setup
Versioning	Git-friendly configs	More complex, depends on code
DocumentStore	Wide support (Qdrant, ES, pgvector)	Via integrations

How to Build a RAG Pipeline on Haystack?

In Haystack 2.x, the architecture became stricter: typed @component.input and @component.output, unified Document object, DocumentStore abstraction. Here is a minimal example:

from haystack import Pipeline, Document
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import RAGPromptBuilder

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=store))
pipeline.add_component("prompt_builder", RAGPromptBuilder())
pipeline.add_component("generator", OpenAIGenerator(model="gpt-4o-mini"))

pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "generator.prompt")

Which DocumentStore to Choose?

The choice depends on scale and infrastructure. For fast development – InMemoryDocumentStore (up to 10K documents). For production – Elasticsearch (BM25 + semantic) or Qdrant (high performance, >1M vectors). If you already use PostgreSQL – pgvector. Qdrant configuration example:

from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

document_store = QdrantDocumentStore(
    url="http://localhost:6333",
    index="documents",
    embedding_dim=1536,
    recreate_index=False,
)

DocumentStore	When to Use
InMemoryDocumentStore	Development, tests, <10K documents
ElasticsearchDocumentStore	Already have ES, need BM25 + semantic
QdrantDocumentStore	High performance, >1M vectors
PgvectorDocumentStore	Integration with PostgreSQL infrastructure
WeaviateDocumentStore	Managed cloud, built-in hybrid search

Document Indexing: Step-by-Step Recipe

Indexing pipeline is a separate stage. We use the following components:

Conversion: PyPDFToDocument for PDF, TextFileToDocument for TXT.
Cleaning: DocumentCleaner removes garbage.
Splitting: DocumentSplitter splits into sentences (split_length=5, split_overlap=2).
Embedding: OpenAIDocumentEmbedder with model text-embedding-3-small.
Writing: DocumentWriter saves to DocumentStore.

from haystack.components.converters import PyPDFToDocument
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack.components.writers import DocumentWriter

indexing = Pipeline()
indexing.add_component("converter", PyPDFToDocument())
indexing.add_component("cleaner", DocumentCleaner())
indexing.add_component("splitter", DocumentSplitter(
    split_by="sentence", split_length=5, split_overlap=2
))
indexing.add_component("embedder", OpenAIDocumentEmbedder(
    model="text-embedding-3-small"
))
indexing.add_component("writer", DocumentWriter(document_store=document_store))

Hybrid Search: Combining BM25 and Semantics

Haystack supports hybrid search via DocumentJoiner with reciprocal_rank_fusion (RRF) mode. This yields 30–40% better relevance than each method alone. Saves time on manual result filtering. Example:

from haystack.components.retrievers import InMemoryBM25Retriever, InMemoryEmbeddingRetriever
from haystack.components.joiners import DocumentJoiner

pipeline.add_component("bm25", InMemoryBM25Retriever(document_store=store, top_k=10))
pipeline.add_component("semantic", InMemoryEmbeddingRetriever(document_store=store, top_k=10))
pipeline.add_component("joiner", DocumentJoiner(join_mode="reciprocal_rank_fusion"))

How to Speed Up a RAG Pipeline?

Performance is critical. We use:

async mode via pipeline.run_async() for concurrent request processing;
batching for embedder components – up to 10x speedup during indexing;
CachingChecker + Redis to cache search results;
Prometheus metrics via Hayhooks middleware. Typical RAG pipeline latency with gpt-4o-mini and Qdrant is 1–3 seconds per request.

Serialization and Deployment: Step-by-Step Process

Serialize the pipeline to YAML. Haystack supports export via pipeline.dump().
Save YAML to Git – enables code review of configuration.
Set up CI/CD: on push to main, run tests (evaluation metrics) and deploy via Hayhooks.
Haystack Hayhooks provides REST API for serving pipelines, including Prometheus metrics.

Example YAML Pipeline

version: "2.0"
components:
  - name: retriever
    type: InMemoryBM25Retriever
    params:
      document_store: store
  - name: prompt_builder
    type: RAGPromptBuilder
  - name: generator
    type: OpenAIGenerator
    params:
      model: gpt-4o-mini
connections:
  - retriever.documents -> prompt_builder.documents
  - prompt_builder.prompt -> generator.prompt

According to the Haystack documentation, this format easily integrates with any CI/CD tools.

Evaluating RAG Quality

Haystack has built-in evaluators: FaithfulnessEvaluator (answer matches context), ContextRelevanceEvaluator (context relevant to question), SASEvaluator (semantic similarity of response to reference). We include these metrics in CI/CD to track quality with each update. Contact us for an audit of your project – we will help set up a full evaluation cycle.

Integration Timelines

Basic RAG pipeline (1 DocumentStore, 1 LLM): 1–2 weeks.
Hybrid search + custom reranker: 3–4 weeks.
Production deployment + monitoring + evaluation: 6–8 weeks.

The cost is calculated individually after an audit. Get a consultation – we will evaluate your project in 1–2 days. Our engineers are certified in Haystack and OpenAI. Contact us for a detailed audit.

NLP Development: Text Classification, NER, Embeddings, and Information Extraction

We often receive a task: process 50,000 support tickets — currently all manual. Dataset — 3,000 labeled examples, 12 categories, imbalance: one category occupies 40% of the sample, three at 1-2% each. Baseline accuracy — 78%. Sounds decent until you look at recall for rare classes: 0.31, 0.44, 0.28. These classes — complaints and churn threats — are most important to the business.

This is a typical NLP development project. The problem is not the algorithm but that accuracy is the wrong metric. Our experience across 30+ projects shows: we start by analyzing business metrics and only then choose the model.

Why accuracy is not the right metric for rare classes?

Accuracy ignores imbalance. If the "churn" class appears in 2% of cases, the model can predict "all good" and get 98% accuracy — but the business loses clients. Solution: F1 macro (averaged over all classes) or weighted F1. For NER — strict entity F1 (exact matches only). We guarantee: after choosing the correct metric, model quality becomes measurable and predictable.

Text Classification: From BERT to Distillation

BERT-like models are the standard for classification. ruBERT-base or ruBERT-large from DeepPavlov for Russian. multilingual-e5-large — for multiple languages in one pipeline. XLM-RoBERTa-large — a strong multilingual backbone.

Fine-tuning for classification: add a classification head on top of the [CLS] token, train for 3-5 epochs with lr=2e-5, weight decay=0.01. For imbalance — weighted CrossEntropyLoss or focal loss with gamma=2.0. Contact us — we will show a code snippet.

Imbalance case study. Dataset — 3,000 examples, imbalance 1:20. Solution: class_weight via sklearn + CrossEntropyLoss. Additionally — augmentation of rare classes via backtranslation (ru→en→ru through MarianMT). Recall for rare classes rose from 0.31 to 0.67 with a slight drop in accuracy (76%→74%). Full NLP development end-to-end took 3 weeks.

Distillation for production. BERT-large gives F1 0.89, but inference on CPU — 180ms. Distillation into DistilBERT or ruBERT-tiny2 reduces latency to 25ms with F1 0.84. Export to ONNX Runtime provides an additional 1.5-2x speedup. DistilBERT achieves 7x lower latency than BERT-large with only a 5% drop in macro F1 – a typical production trade-off.

Model	F1 macro	Latency (CPU)	Size
BERT-large	0.89	180 ms	1.3 GB
DistilBERT	0.84	25 ms	250 MB
ruBERT-tiny2	0.81	12 ms	120 MB
DistilBERT + ONNX	0.84	14 ms	150 MB

How to choose between BERT and LLM for your task?

For most classification and extraction tasks, BERT-sized models offer the best trade-off between cost and performance. Shift to LLMs only when the task demands generation, complex reasoning, or zero-shot generalization.

NER: Named Entity Recognition

NER — extracting persons, organizations, locations, dates, amounts, document numbers. For general categories (PER, ORG, LOC), pre-trained models work well. For specialized ones (medical terms, legal concepts) — fine-tuning is needed.

Data annotation. The main cost of an NER project. For a quality model — 500-2,000 labeled sentences per entity type. Tools: Label Studio (open source) or Prodigy (by spaCy creators). IOB2 format — standard.

Architecture. Token classification on top of BERT: each token gets a label (B-PER, I-PER, O). spaCy 3.x with transformer pipeline — a convenient production choice.

Nested entities. Standard IOB models cannot handle nested entities (organization inside an address). For such tasks — span-based NER: SpanBERT or SpERT. More complex but correct.

Post-processing is mandatory. The model predicts tokens — normalized entities are needed. Date — dateparser. Amounts — regex + validation. Names — deduplication via rapidfuzz. Included in our standard delivery.

Sentiment Analysis and Opinion Mining

Binary classification positive/negative works out of the box with BERT. Complexity — aspect-based sentiment analysis (ABSA): "the restaurant has good food but terrible service." For ABSA: aspect extraction (NER) + sentiment per aspect. Joint models BERT-for-ABSA — quality on Russian data is lower due to dataset scarcity. RuSentiment, SentiRuEval — main resources.

For production with simple positive/negative/neutral: distil models are enough. Three classes, balanced dataset, 2,000+ examples — F1 macro 0.82-0.87 in 1-2 days.

Text Summarization

Extractive summarization (select sentences) — TextRank or BM25 without training. Fast, no hallucinations. Good for long documents.

Abstractive (generates new text) — seq2seq: mT5, mBART, FRED-T5, ruT5-large. For production via LLM API (GPT-4, Claude) — often the best cost/quality/speed trade-off.

Embeddings: Vector Representations of Text

Embeddings are the foundation of semantic search, deduplication, clustering, RAG. Quality critically affects downstream tasks.

Models. E5-large-v2, BGE-M3, multilingual-e5-large — strong multilingual embedders. sentence-transformers/paraphrase-multilingual-mpnet-base-v2 — fast option. For Russian: ru-en-RoSBERTa (Skoltech) performs well on semantic textual similarity.

Embedding quality evaluation uses the MTEB benchmark as standard. But top results on MTEB don't guarantee success on a domain dataset — we build domain-specific eval.

Fine-tuning embeddings. If standard models don't give the required Recall@k — contrastive learning on domain pairs with MultipleNegativesRankingLoss. How to perform this for domain data:

Collect 500–2,000 semantically similar pairs from your domain.
Apply MultipleNegativesRankingLoss with a batch size of 32–64.
Train for 1–3 epochs using AdamW (lr=2e-5).
Evaluate Recall@k on a held-out domain test set.

This approach yields a 5–15% improvement in Recall@k in practice.

Dimensionality and storage. E5-large: 1024 dim, float32 — 4KB per vector. For 10M documents — 40GB. Quantization int8 reduces to 10GB. FAISS IVF_PQ — more compact but with losses. Included in our deployment recommendations.

Information Extraction

Structured extraction is a frequent task. Examples: key contract terms, technical characteristics, dates and amounts from invoices.

Regex + rule-based. For INN, OGRN, amounts, dates — more reliable than neural networks. No data required.
NER + post-processing. For variable formats.
LLM with structured output. GPT‑4 / Claude with JSON schema — for complex documents. Cost: minimal per document. For 10k+ documents/day — we calculate the economics.

We guarantee a hybrid: regex/NER for typical fields + LLM for edge cases. Our guarantee is backed by years of production experience and more than 30 projects.

Work Stages

Stage	Duration	What's included
Data and metric analysis	3-5 days	Class distribution, text lengths, baseline
Baseline (TF‑IDF + LogReg)	1 day	Quick estimate of gap with deep models
Training and validation	1-2 weeks	k‑fold, early stopping, error analysis
Deployment (ONNX + FastAPI)	1-2 weeks	REST API, batching, monitoring
Documentation and training	2-3 days	Model card, API docs, team training

Prototype on existing data — 1-3 weeks. Production system with CI/CD — 1.5–2.5 months. Cost is calculated individually — get a consultation for a project estimate.

What's Included

Model and pipeline architecture documentation
Access to the model via REST API (FastAPI + ONNX)
Client team training (2-hour webinar + Q&A)
Accuracy guarantee on the agreed test set
Months of post-delivery support (bug fixes, adaptation to new data)

Our Experience

Years of NLP projects from classification to RAG systems. The team includes ML engineers experienced with Hugging Face, spaCy, LangChain, MLOps. We use vLLM, Kubeflow, Weights & Biases — a production stack, not toys. Contact us to evaluate your NLP project within two days — request a free consultation on your text processing pipeline.