Elasticsearch Performance Optimization (shards, replicas, refresh_interval)
A slow Elasticsearch cluster is almost always a configuration problem, not a hardware one. Too many shards degrade performance more reliably than weak CPUs, and an overly aggressive refresh makes indexing 3–5x slower than necessary. Elasticsearch optimization is primarily about correct design, and only then about hardware tuning.
Shards: the main optimization point
Each shard is a separate Lucene index instance with its own file descriptors, JVM objects, and heap overhead. A cluster with 5 nodes holding 500 small indexes with 50 shards each = 25,000 shards = cluster crawls.
Rule of thumb: one shard should hold 10–50 GB of data. Smaller, and per-shard overhead dominates the data itself; larger, and shards become difficult to rebalance when a node is added.
Maximum shards per 1 GB heap: ~20 shards. With 16 GB heap = no more than 320 shards per node.
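These two rules of thumb can be turned into a quick sizing calculation. A minimal sketch; the 30 GB target is just the midpoint of the 10–50 GB band, and the helper names are illustrative:

```python
def recommended_shard_count(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Primary shard count that keeps each shard inside the 10-50 GB band."""
    return max(1, round(index_size_gb / target_shard_gb))

def node_shard_budget(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Rule of thumb: ~20 shards per 1 GB of JVM heap."""
    return int(heap_gb * shards_per_gb)

print(recommended_shard_count(300))  # 300 GB of data -> 10 primary shards
print(node_shard_budget(16))         # 16 GB heap -> at most 320 shards
```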
Check shard statistics:
# Shard size and distribution
curl -u elastic:pw "http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,docs,store,node"
# How many shards per node and how full the disks are
curl -u elastic:pw "http://localhost:9200/_cat/allocation?v&h=node,shards,disk.used,disk.percent"
# Heap pressure per node
curl -u elastic:pw "http://localhost:9200/_cat/nodes?v&h=name,heap.percent"
Reduce number of shards via shrink API:
# First disable writes and move all shard copies to one node
PUT /products/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "es-node-01",
    "index.blocks.write": true
  }
}
# Shrink to 1 shard (the target shard count must be a factor of the source count)
POST /products/_shrink/products_shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}
refresh_interval
By default Elasticsearch refreshes every second — it flushes the in-memory buffer into a new Lucene segment and makes the documents searchable. Every refresh costs file operations, segment creation, and I/O.
For real-time search (chat, notifications) — keep 1s.
For analytics, logs, ETL — increase to 30s–300s:
PUT /logs-*/_settings
{
  "index.refresh_interval": "60s"
}
When bulk-loading data — disable temporarily:
PUT /products/_settings
{
  "index.refresh_interval": "-1"
}
# Load data...
PUT /products/_settings
{
  "index.refresh_interval": "1s"
}
POST /products/_refresh
Typical indexing speedup with refresh_interval: -1 versus the default 1s is 3–5x.
Merge Policy and forcemerge
Lucene periodically merges small segments into larger ones. Merging reclaims the space held by deleted documents and speeds up search (fewer segments to visit per query). It runs in the background by default, but it creates I/O load.
For read-only indexes (archive data, completed rolling indexes) — force merge to 1 segment:
POST /logs-2024.01.01/_forcemerge?max_num_segments=1
After forcemerge, searching the index is significantly faster, and size decreases 20–40% by removing tombstone records for deleted docs.
Don't run forcemerge on actively indexed indexes — creates huge IO load.
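Candidate selection can be automated by looking at the tombstone ratio. A sketch over the parsed rows of _cat/indices?format=json; the 20% threshold is an arbitrary assumption, and only indexes that are no longer written to should actually be merged:

```python
def forcemerge_candidates(rows, min_deleted_ratio=0.2):
    """Indexes whose share of deleted-doc tombstones exceeds the threshold.
    `rows` is parsed _cat/indices?format=json&h=index,docs.count,docs.deleted."""
    out = []
    for r in rows:
        live = int(r.get("docs.count") or 0)
        dead = int(r.get("docs.deleted") or 0)
        total = live + dead
        if total and dead / total >= min_deleted_ratio:
            out.append(r["index"])
    return out
```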
Replicas during indexing
A replica is a copy of a shard on another node, kept in sync on every write: each document is indexed into the primary and all replica shards. With 2 replicas, every document is written three times — tripled disk load during indexing.
When bulk-loading data into a new index:
// 1. Set 0 replicas during load
PUT /products/_settings
{ "index.number_of_replicas": 0 }
// 2. Load data
// ...
// 3. Restore replicas
PUT /products/_settings
{ "index.number_of_replicas": 1 }
Speed gain — 2–3x with 1 replica, 3–4x with 2 replicas.
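The two knobs — replicas and refresh_interval — are usually toggled together around a bulk load. A sketch of a context manager that does this, assuming the elasticsearch-py 8.x keyword API; the restore values are parameters because the right ones depend on the index:

```python
from contextlib import contextmanager

@contextmanager
def bulk_load_mode(es, index, restore_refresh="1s", restore_replicas=1):
    """Drop replicas and disable refresh for the duration of a bulk load,
    then restore both and force a refresh."""
    es.indices.put_settings(index=index, settings={
        "index.refresh_interval": "-1",
        "index.number_of_replicas": 0,
    })
    try:
        yield
    finally:
        es.indices.put_settings(index=index, settings={
            "index.refresh_interval": restore_refresh,
            "index.number_of_replicas": restore_replicas,
        })
        es.indices.refresh(index=index)

# Usage:
#   with bulk_load_mode(es, "products"):
#       ...  # run the bulk load
```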
Bulk API
Indexing one document at a time is an anti-pattern. Batch size depends on document size: target 5–15 MB batches.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

es = Elasticsearch([...])

def generate_actions(data):
    for item in data:
        yield {
            "_index": "products",
            "_id": item["id"],
            "_source": item,
        }

# Parallel bulk with multiple threads
success, errors = 0, 0
for ok, info in parallel_bulk(
    es,
    generate_actions(data),
    thread_count=4,
    chunk_size=500,
    max_chunk_bytes=10 * 1024 * 1024,  # 10 MB
    raise_on_error=False,
):
    if ok:
        success += 1
    else:
        errors += 1
        print(f"Error: {info}")
Query optimization
Filter vs. query: use filter wherever relevance scoring is not needed. Filter clauses skip scoring entirely and are cached at the shard level.
// Slow (scored, not cached)
{
  "query": {
    "term": { "is_active": true }
  }
}
// Fast (no scoring, cacheable)
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "is_active": true } }
      ]
    }
  }
}
Wildcard and regexp queries are expensive, especially with a leading wildcard (*term). Avoid them, or index the data so they aren't needed — for example with edge N-gram analysis.
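For prefix-style lookups, the wildcard can often be replaced by indexing the prefixes up front. A sketch of an edge_ngram setup — the index and analyzer names and the 2–15 gram range are illustrative, not prescribed:

```json
PUT /products_autocomplete
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "edge_tok": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "edge_analyzer": {
          "type": "custom",
          "tokenizer": "edge_tok",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "edge_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
```

A plain match query on title then hits the precomputed prefixes instead of expanding a wildcard at query time.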
Deep pagination (from: 10000) is expensive: every shard must collect and sort from + size documents before the coordinating node merges them. Use search_after instead:
{
  "query": { "match_all": {} },
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ],
  "search_after": ["2024-01-15T10:00:00", "abc123"],
  "size": 20
}
Monitoring query performance
Profile API — detailed query execution breakdown:
POST /products/_search
{
  "profile": true,
  "query": {
    "match": { "title": "laptop" }
  }
}
The response breaks execution down per shard — query building, scoring, collection — which helps locate the bottleneck.
Hot Threads API — what the JVM is busy with:
curl -u elastic:pw "http://localhost:9200/_nodes/hot_threads"
Heap and GC
When heap usage exceeds 85%, aggressive GC kicks in and requests start lagging. Signs: "OutOfMemoryError: GC overhead limit exceeded" in the logs and a sharp throughput drop.
Check GC statistics:
curl -u elastic:pw "http://localhost:9200/_nodes/stats/jvm?pretty" | \
jq '.nodes[] | {name: .name, heap_used_percent: .jvm.mem.heap_used_percent, gc_young: .jvm.gc.collectors.young.collection_time_in_millis}'
G1GC (the JVM default since JDK 9, and the only mainstream option since CMS was removed in JDK 14) is the right choice for ES. In jvm.options:
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
Timeline
Audit of existing cluster configuration with recommendations — 1 working day. Optimization of sharding, refresh_interval, bulk indexing — 2–3 days. Deep query optimization with profiling on real load — additionally 1–2 days.