Elasticsearch Performance Optimization (shards, replicas, refresh_interval)

Slow Elasticsearch is almost always the result of misconfiguration, not insufficient hardware. Excess shards kill performance more reliably than weak CPUs do. An overly frequent refresh makes indexing 3–5x slower than necessary. Elasticsearch optimization is first and foremost correct design; hardware tuning comes second.

Shards: the main optimization point

Each shard is a separate Lucene index instance with its own file descriptors, JVM objects, and heap overhead. A 5-node cluster holding 500 small indexes with 50 shards each = 25,000 shards = the cluster crawls.

Rule of thumb: 1 shard = 10–50 GB of data. Smaller, and the shards are too small (overhead dominates the data); larger, and they become difficult to rebalance when a node is added.

Maximum shards per 1 GB of heap: ~20. With a 16 GB heap, no more than 320 shards per node.
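The two rules above can be combined into a quick capacity check. A hypothetical helper (the function name and the 30 GB target are illustrative, not an Elasticsearch API):

```python
def shard_budget(heap_gb, data_gb, target_shard_gb=30):
    # ~20 shards per 1 GB of heap is the per-node ceiling;
    # aim for shard sizes in the middle of the 10-50 GB range.
    max_shards_per_node = int(heap_gb * 20)
    recommended_shards = max(1, round(data_gb / target_shard_gb))
    return max_shards_per_node, recommended_shards
```

For a node with a 16 GB heap and a 300 GB index this gives a ceiling of 320 shards per node and about 10 primary shards for the index.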

Check shard statistics:

# Shard size and distribution
curl -u elastic:pw "http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,docs,store,node"

# How many shards per node (shards and disk.used columns)
curl -u elastic:pw "http://localhost:9200/_cat/allocation?v"

Reduce number of shards via shrink API:

# First disable writes and move all shards to one node
PUT /products/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "es-node-01",
    "index.blocks.write": true
  }
}

# Shrink to 1 shard
POST /products/_shrink/products_shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}

refresh_interval

By default, Elasticsearch performs a refresh every second: it turns the in-memory buffer into a new Lucene segment and makes the documents searchable. Each refresh costs file operations, segment creation, and I/O load.

For real-time search (chat, notifications) — keep 1s.

For analytics, logs, ETL — increase to 30s–300s:

PUT /logs-*/_settings
{
  "index.refresh_interval": "60s"
}

When bulk-loading data — disable temporarily:

PUT /products/_settings
{
  "index.refresh_interval": "-1"
}

# Load data...

PUT /products/_settings
{
  "index.refresh_interval": "1s"
}

POST /products/_refresh

Indexing throughput with refresh_interval: -1 is typically 3–5x higher than with the default 1s.
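The disable/restore sequence above is easy to forget halfway through; with the Python client it can be wrapped in a context manager. A minimal sketch assuming the elasticsearch-py 8.x keyword API (the index name and restore interval are illustrative):

```python
from contextlib import contextmanager

@contextmanager
def refresh_disabled(es, index, restore="1s"):
    # Disable refresh for the duration of a bulk load,
    # then restore the interval and force one refresh
    # so the loaded documents become searchable.
    es.indices.put_settings(index=index, settings={"index": {"refresh_interval": "-1"}})
    try:
        yield
    finally:
        es.indices.put_settings(index=index, settings={"index": {"refresh_interval": restore}})
        es.indices.refresh(index=index)
```

Usage: `with refresh_disabled(es, "products"): load_data()` — the settings are restored even if the load raises.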

Merge Policy and forcemerge

Lucene periodically merges small segments into larger ones. A merge frees the space held by deleted documents and speeds up search (fewer segments = fewer iterations). By default this happens in the background but creates I/O load.

For read-only indexes (archive data, completed rolling indexes) — force merge to 1 segment:

POST /logs-2024.01.01/_forcemerge?max_num_segments=1

After forcemerge, searching the index is noticeably faster, and its size shrinks by 20–40% as tombstone records for deleted docs are removed.

Don't run forcemerge on actively indexed indexes: it creates a huge I/O load.

Replicas during indexing

A replica is a synchronous copy of a shard on another node. Every written document is indexed into the primary and all replica shards, so with 2 replicas the disk load during indexing triples.

When bulk-loading data into a new index:

// 1. Set 0 replicas during load
PUT /products/_settings
{ "index.number_of_replicas": 0 }

// 2. Load data
// ...

// 3. Restore replicas
PUT /products/_settings
{ "index.number_of_replicas": 1 }

Speed gain — 2–3x with 1 replica, 3–4x with 2 replicas.

Bulk API

Indexing one document at a time is an anti-pattern. Batch size depends on document size: target 5–15 MB batches.

from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

es = Elasticsearch([...])

def generate_actions(data):
    for item in data:
        yield {
            "_index": "products",
            "_id": item["id"],
            "_source": item,
        }

# Parallel bulk with multiple threads
success, errors = 0, 0
for ok, info in parallel_bulk(
    es,
    generate_actions(data),
    thread_count=4,
    chunk_size=500,
    max_chunk_bytes=10 * 1024 * 1024,  # 10 MB
    raise_on_error=False
):
    if ok:
        success += 1
    else:
        errors += 1
        print(f"Error: {info}")
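The chunk_size above is hard-coded; it can instead be derived from the average document size. A hypothetical helper targeting the 5–15 MB window (the name and the 5,000-doc cap are illustrative):

```python
def chunk_size_for(avg_doc_bytes, target_batch_bytes=10 * 1024 * 1024, cap=5000):
    # Aim for ~10 MB bulk requests, but never more than `cap`
    # documents per batch, and at least one document.
    return max(1, min(cap, target_batch_bytes // avg_doc_bytes))
```

With ~10 KB documents this yields batches of 1,024 docs; with tiny documents the cap kicks in, and with huge documents it degrades gracefully to one doc per request.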

Query optimization

Filter vs. query: use filter clauses wherever you don't need relevance scoring. Filters are cached at the shard level and skip scoring entirely.

// Slow (scoring + no cache)
{
  "query": {
    "term": { "is_active": true }
  }
}

// Fast (no scoring + cached)
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "is_active": true } }
      ]
    }
  }
}

Wildcard and regexp queries are expensive operations, especially with a leading wildcard (*term). Avoid them, or replace them with edge n-gram analysis at index time.
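One common replacement is an edge_ngram analyzer applied at index time. A minimal sketch (the index name, field, and gram sizes are illustrative):

```
PUT /products_autocomplete
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "edge_tok": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "edge_analyzer": {
          "tokenizer": "edge_tok",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "edge_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
```

A plain match query on title then finds prefixes without the cost of a wildcard scan; the price is a larger index, since every prefix is stored as a token.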

Deep pagination: from: 10000 is expensive. Each shard must collect from + size documents, and the coordinating node then sorts them all. Use search_after instead:

{
  "query": { "match_all": {} },
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ],
  "search_after": ["2024-01-15T10:00:00", "abc123"],
  "size": 20
}
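Chaining pages is then a matter of copying the sort values of the last hit into the next request. A hypothetical helper (not part of any client library):

```python
def next_page_body(base_body, last_hit):
    # Copy the previous request body and seed search_after
    # with the sort values of the last hit on the page.
    body = dict(base_body)
    body["search_after"] = last_hit["sort"]
    return body
```

The sort values come back on every hit in hits.hits[n].sort, so no server-side state and no growing offset are needed.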

Monitoring query performance

Profile API — detailed query execution breakdown:

POST /products/_search
{
  "profile": true,
  "query": {
    "match": { "title": "laptop" }
  }
}

The response contains a per-shard breakdown: time spent building the query, scoring, and collecting documents. It helps locate the bottleneck.

Hot Threads API shows what the JVM threads are busy with:

curl -u elastic:pw "http://localhost:9200/_nodes/hot_threads"

Heap and GC

When heap usage stays above 85%, aggressive GC kicks in and requests start lagging. Signs: "GC overhead limit exceeded" errors in the logs and a sharp drop in throughput.

Check GC statistics:

curl -u elastic:pw "http://localhost:9200/_nodes/stats/jvm?pretty" | \
  jq '.nodes[] | {name: .name, heap_used_percent: .jvm.mem.heap_used_percent, gc_young: .jvm.gc.collectors.young.collection_time_in_millis}'

G1GC (which Elasticsearch uses by default on JDK 14+) is the best choice. In jvm.options:

-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30

Timeline

An audit of the existing cluster configuration with recommendations takes 1 business day. Optimizing sharding, refresh_interval, and bulk indexing takes 2–3 days. Deep query optimization with profiling under real load adds another 1–2 days.