Configuring Elasticsearch Analyzers for 1C-Bitrix Search
Searching for "notebook" finds nothing when the product is named "Notebooks ASUS". "Dizel" doesn't find "Diesel". User enters "phone samsung" — zero results, though the catalog is full of Samsung smartphones. This is the problem of Elasticsearch analyzers: without proper tokenization and normalization, the index and search query speak different languages.
How Bitrix Connects to Elasticsearch
The search module (class Bitrix\Search\Elastic) uses the elasticsearch-php client. Connection settings are stored in the b_option table (module search, keys elastic_*). By default the search index is named bitrix_search_[site_id]. Mapping and analyzer settings are passed via the Elasticsearch API when the index is created.
Current analyzer settings can be viewed:
curl -s http://localhost:9200/bitrix_search_s1/_settings | python3 -m json.tool
curl -s http://localhost:9200/bitrix_search_s1/_mappings | python3 -m json.tool
Anatomy of an Analyzer
An analyzer in Elasticsearch is a chain of three components:
- Character filter — text preprocessing (removing HTML, replacing characters)
- Tokenizer — splitting into tokens (by spaces, n-grams, edge n-grams)
- Token filters — token transformation (lowercase, stemming, synonyms, transliteration)
For a Russian-language Bitrix catalog you need, at minimum: Russian stemming (the russian Snowball stemmer), lowercasing, and ASCII folding (to normalize Latin diacritics in brand names).
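To see what a chain produces before committing to an index layout, the _analyze API can run an analyzer ad hoc. A quick sketch using Elasticsearch's built-in russian analyzer (host and port are assumptions):

```shell
# Ad-hoc analysis with the built-in "russian" analyzer
# (assumes Elasticsearch is reachable at localhost:9200)
curl -s -X POST "http://localhost:9200/_analyze" \
  -H "Content-Type: application/json" \
  -d '{"analyzer": "russian", "text": "Ноутбуки для работы"}'
# The stop word "для" should be dropped and the nouns reduced to stems.
```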
Creating an Index with Proper Analyzers
Bitrix allows overriding mapping through search module settings. But it's more reliable to create the index manually with needed analyzers before reindexing:
# Caution: this deletes the existing index together with its documents
curl -X DELETE http://localhost:9200/bitrix_search_s1

curl -X PUT http://localhost:9200/bitrix_search_s1 \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "analysis": {
        "filter": {
          "russian_stop": {
            "type": "stop",
            "stopwords": "_russian_"
          },
          "russian_stemmer": {
            "type": "stemmer",
            "language": "russian"
          },
          "edge_ngram_filter": {
            "type": "edge_ngram",
            "min_gram": 2,
            "max_gram": 20
          }
        },
        "analyzer": {
          "bitrix_russian": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "russian_stop", "russian_stemmer", "asciifolding"]
          },
          "bitrix_autocomplete": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "edge_ngram_filter"]
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "body": {
          "type": "text",
          "analyzer": "bitrix_russian",
          "search_analyzer": "bitrix_russian"
        },
        "title": {
          "type": "text",
          "analyzer": "bitrix_russian",
          "fields": {
            "autocomplete": {
              "type": "text",
              "analyzer": "bitrix_autocomplete",
              "search_analyzer": "bitrix_russian"
            }
          }
        }
      }
    }
  }'
ASCII Folding and Transliteration
The asciifolding token filter at the end of the chain handles Latin diacritics: it transforms é → e, ü → u. It does not transliterate between Cyrillic and Latin, so "самсунг" still won't match "Samsung". For that you need a custom char_filter of type mapping:

"char_filter": {
  "translit_filter": {
    "type": "mapping",
    "mappings": [
      "Самсунг => Samsung",
      "Эппл => Apple",
      "Асус => Asus"
    ]
  }
}
The list of mappings for popular brands needs to be created manually for your specific catalog — there's no automatic solution.
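Note that defining the char_filter on its own is not enough: it must also be referenced from the analyzer definition, otherwise it never runs. A sketch of how the bitrix_russian analyzer from above would pick it up:

```json
"analyzer": {
  "bitrix_russian": {
    "type": "custom",
    "char_filter": ["translit_filter"],
    "tokenizer": "standard",
    "filter": ["lowercase", "russian_stop", "russian_stemmer", "asciifolding"]
  }
}
```

Char filters run before the tokenizer, so the mapping sees the raw text, including its original casing.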
Testing the Analyzer
After creating the index, verify how the analyzer tokenizes real queries:
# How a word is indexed
curl -s -X POST "http://localhost:9200/bitrix_search_s1/_analyze" \
  -H "Content-Type: application/json" \
  -d '{"analyzer": "bitrix_russian", "text": "Ноутбуки ASUS i5"}'
# Expected tokens: ["ноутбук", "asus", "i5"]
# Stemming "ноутбуки" -> "ноутбук" means the query "ноутбук" matches the document
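It's also worth checking the autocomplete chain, where edge n-grams, not stems, are expected (index and analyzer names as defined above):

```shell
# How the autocomplete field tokenizes input
curl -s -X POST "http://localhost:9200/bitrix_search_s1/_analyze" \
  -H "Content-Type: application/json" \
  -d '{"analyzer": "bitrix_autocomplete", "text": "Samsung"}'
# With min_gram 2, edge_ngram should emit prefixes: "sa", "sam", "sams", ...
```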
Reindexing After Analyzer Change
Changing the analyzer requires full reindexing — existing documents are indexed with the old analyzer.
In the Bitrix admin panel: "Search" → "Reindex". For large sites it's better to run it via a CLI agent or cron:
php -f /var/www/bitrix/bitrix/modules/search/tools/reindex.php
Or via API:
\Bitrix\Search\Elastic::reindexAll();
Full reindexing of a typical 50,000-product catalog takes 15–45 minutes. The old index works during this — Bitrix supports blue/green reindexing via temporary index with subsequent alias switch.
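The same pattern can also be driven by hand at the Elasticsearch level. A sketch with illustrative index names (bitrix_search_s1_v1/_v2 are assumptions; the alias plays the role of the index name the application queries):

```shell
# 1. Create bitrix_search_s1_v2 with the new analyzers (as in the PUT above),
#    then copy documents into it with the Reindex API:
curl -s -X POST "http://localhost:9200/_reindex" \
  -H "Content-Type: application/json" \
  -d '{"source": {"index": "bitrix_search_s1_v1"}, "dest": {"index": "bitrix_search_s1_v2"}}'

# 2. Atomically repoint the alias so searches never see a half-built index:
curl -s -X POST "http://localhost:9200/_aliases" \
  -H "Content-Type: application/json" \
  -d '{"actions": [
        {"remove": {"index": "bitrix_search_s1_v1", "alias": "bitrix_search_s1"}},
        {"add":    {"index": "bitrix_search_s1_v2", "alias": "bitrix_search_s1"}}
      ]}'
```

The _aliases call applies both actions in a single atomic step, which is what makes the switch safe.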