Optimizing Elasticsearch Re-indexing for 1C-Bitrix
On a project with 800,000 SKUs, the scheduled re-indexing from 1C took 14 hours. During that time, a queue of changes accumulated, search returned stale data, and the 1C import competed with re-indexing for resources. The goal: reduce a full re-index to 1–2 hours without stopping search.
Why Re-indexing Is Slow
Typical causes of slow re-indexing:
Sequential indexing via individual requests. Each document is sent as a separate PUT /bitrix_catalog/_doc/{id}. HTTP overhead and a TCP connection per request add up: a 200 ms round trip × 800,000 documents = 160,000 seconds, roughly 44 hours of pure network latency.
Batch size in the Bulk API is too small. A batch size of 10 documents instead of the optimal 200–1,000.
Refresh after every batch. Forcing a POST /bitrix_catalog/_refresh after each batch is a performance killer. Refresh creates a new Lucene segment and blocks indexing for tens of milliseconds.
Incorrect refresh_interval during re-indexing. The default 1 second generates hundreds of thousands of small segments, and ES spends resources merging them.
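The arithmetic behind the first two causes is worth making explicit. A back-of-envelope sketch (Python, using the figures from the text; it counts only network round trips and ignores server-side indexing time):

```python
# Back-of-envelope: per-document requests vs the Bulk API, latency only.
# 800,000 documents, ~200 ms HTTP round trip (figures from the text).
DOCS = 800_000
RTT = 0.2        # seconds per HTTP round trip
BULK_SIZE = 500  # documents per bulk request

per_doc_total = DOCS * RTT              # one request per document
bulk_total = (DOCS // BULK_SIZE) * RTT  # one request per batch

print(f"per-document: {per_doc_total / 3600:.1f} h")    # 44.4 h
print(f"bulk x{BULK_SIZE}: {bulk_total / 60:.1f} min")  # 5.3 min
```

Even before any server-side tuning, batching alone removes the dominant cost.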
Zero-Downtime Re-indexing Strategy
The key technique is indexing into a new index rather than overwriting the active one:
Current: bitrix_catalog_v1 <-- alias bitrix_catalog
New: bitrix_catalog_v2 <-- index here
After: switch alias to v2
// Create alias during initial setup
POST /_aliases
{
"actions": [
{ "add": { "index": "bitrix_catalog_v1", "alias": "bitrix_catalog" } }
]
}
Bitrix works with the bitrix_catalog alias. While re-indexing proceeds into v2, search continues to work through v1. After completion, atomically switch the alias.
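Before and after the switch, you can verify which physical index the alias currently resolves to:

```
GET /_alias/bitrix_catalog
```

The response lists the indices that carry the alias (initially bitrix_catalog_v1), which makes a useful sanity check in deployment scripts.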
Configuring Elasticsearch for Fast Indexing
Before re-indexing, temporarily change settings via the API:
PUT /bitrix_catalog_v2/_settings
{
"index": {
"refresh_interval": "-1",
"number_of_replicas": 0,
"translog.durability": "async",
"translog.sync_interval": "30s"
}
}
refresh_interval: -1 completely disables automatic refresh — documents are not available in search until an explicit refresh, but indexing speeds up 3–5x.
number_of_replicas: 0 disables replication during indexing. ES does not spend time copying shards.
translog.durability: async — the translog is flushed to disk on a timer rather than on every operation. The risk of losing the last 30 seconds of data on failure is acceptable for re-indexing, but not for production data.
After completion, restore settings:
PUT /bitrix_catalog_v2/_settings
{
"index": {
"refresh_interval": "5s",
"number_of_replicas": 1,
"translog.durability": "request"
}
}
Then force merge to reduce segment count:
POST /bitrix_catalog_v2/_forcemerge?max_num_segments=1
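Since automatic refresh was disabled for the duration of the bulk load, it is also worth forcing one explicit refresh before switching the alias, so that every indexed document is guaranteed to be visible to search:

```
POST /bitrix_catalog_v2/_refresh
```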
Optimizing the PHP Indexer
Determine the Bulk API batch size empirically. Target 5–15 MB per request:
$batchSize = 500; // documents per bulk request (target 5–15 MB of payload)
$bulk = [];
foreach ($products as $product) {
    // Each document contributes two NDJSON entries: the action line and the source line
    $bulk[] = ['index' => ['_index' => 'bitrix_catalog_v2', '_id' => $product['ID']]];
    $bulk[] = buildDocument($product);
    if (count($bulk) >= $batchSize * 2) { // hence * 2
        $response = $client->bulk(['body' => $bulk]);
        if (!empty($response['errors'])) {
            // Handle per-item failures here: the client does not throw on them,
            // it reports them in $response['items']
        }
        $bulk = [];
        // Do NOT call _refresh here
    }
}
if (!empty($bulk)) {
    $client->bulk(['body' => $bulk]);
}
Parallel indexing via multiple processes — split the catalog by ID ranges:
# Process 1: IDs 1 - 200000
php index_products.php --from=1 --to=200000 &
# Process 2: IDs 200001 - 400000
php index_products.php --from=200001 --to=400000 &
# Process 3: IDs 400001 - 600000
php index_products.php --from=400001 --to=600000 &
# ...continue the ranges until the catalog's maximum ID is covered
wait
echo "Indexing complete"
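The ID ranges can be computed instead of hard-coded. A small sketch (Python, illustrative; `id_ranges` is not part of the project code):

```python
def id_ranges(max_id, workers):
    """Split [1, max_id] into `workers` contiguous, non-overlapping ranges."""
    step = max_id // workers
    ranges, start = [], 1
    for i in range(workers):
        # The last worker absorbs the remainder when max_id % workers != 0
        end = max_id if i == workers - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# e.g. four workers over an 800k catalog:
print(id_ranges(800_000, 4))
# [(1, 200000), (200001, 400000), (400001, 600000), (600001, 800000)]
```

Note that splitting by ID ranges assumes IDs are roughly evenly distributed; large gaps (e.g. after mass deletions) will leave some workers underloaded.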
Three parallel processes on a 3-node cluster give a near-linear speedup; the bottleneck shifts to MySQL read throughput rather than Elasticsearch.
Switching the Alias
After indexing completes, atomically switch the alias:
POST /_aliases
{
"actions": [
{ "remove": { "index": "bitrix_catalog_v1", "alias": "bitrix_catalog" } },
{ "add": { "index": "bitrix_catalog_v2", "alias": "bitrix_catalog" } }
]
}
The operation is atomic: there is no moment between removing the old alias and adding the new one when the alias does not exist. The switch takes milliseconds, and search is never interrupted.
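Once the switch is verified and search is serving results from v2, the old index can be deleted to reclaim disk space. Keeping it around for a while first gives a quick rollback path (switch the alias back to v1):

```
DELETE /bitrix_catalog_v1
```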
Result
On a catalog of 800K SKUs after optimization: full re-index takes 1 hour 20 minutes (was 14 hours), incremental updates process 200–300 documents per second via Bulk API instead of 15–20 via individual requests.