Configuring Elasticsearch Analyzers for 1C-Bitrix Search

Searching for "ноутбук" finds nothing when the product is named "Ноутбуки ASUS". "Дизель" doesn't find "Diesel". A user enters "телефон самсунг" and gets zero results, though the catalog is full of Samsung smartphones. This is an analyzer problem: without proper tokenization and normalization, the index and the search query speak different languages.

How Bitrix Connects to Elasticsearch

The search module (class Bitrix\Search\Elastic) uses the elasticsearch-php client. Connection settings are stored in the b_option table under the search group, in keys named elastic_*. By default the search index is named bitrix_search_[site_id]. Mapping and analyzer settings are passed to the Elasticsearch API when the index is created.

To view the current analyzer settings and field mappings of the index:

curl -s http://localhost:9200/bitrix_search_s1/_settings | python3 -m json.tool
curl -s http://localhost:9200/bitrix_search_s1/_mappings | python3 -m json.tool
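
On a large index the settings output is long, and usually only the analyzer definitions matter. The sketch below pulls them out of a parsed _settings response. It assumes the standard response shape (index name, then settings, index, analysis); the helper name list_analyzers and the sample response are illustrative and not part of Bitrix or Elasticsearch.

```python
import json

def list_analyzers(settings_response, index_name):
    """Return the sorted names of analyzers defined in the index settings."""
    index_settings = (
        settings_response.get(index_name, {})
        .get("settings", {})
        .get("index", {})
    )
    return sorted(index_settings.get("analysis", {}).get("analyzer", {}))

# Hand-written, trimmed sample of a _settings response (not captured output).
sample = json.loads("""
{
  "bitrix_search_s1": {
    "settings": {
      "index": {
        "analysis": {
          "analyzer": {
            "bitrix_russian": {"type": "custom", "tokenizer": "standard"},
            "bitrix_autocomplete": {"type": "custom", "tokenizer": "standard"}
          }
        }
      }
    }
  }
}
""")

print(list_analyzers(sample, "bitrix_search_s1"))
# ['bitrix_autocomplete', 'bitrix_russian']
```

An empty list means the index defines no custom analyzers, so its text fields fall back to the default standard analyzer.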

Anatomy of an Analyzer

An analyzer in Elasticsearch is a chain of three components:

  1. Character filter — text preprocessing (removing HTML, replacing characters)
  2. Tokenizer — splitting into tokens (by spaces, n-grams, edge n-grams)
  3. Token filters — token transformation (lowercase, stemming, synonyms, transliteration)
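
The three stages above can be imitated in a few lines of Python to make the order of operations concrete. This is a toy model, not Elasticsearch code: the regular expressions only approximate the html_strip character filter and the standard tokenizer, and all function names are made up for illustration.

```python
import re

def strip_html(text):
    # Character filter: runs on the raw string before tokenization.
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    # Tokenizer: split into runs of letters and digits
    # (a rough stand-in for the "standard" tokenizer).
    return re.findall(r"\w+", text)

def lowercase(tokens):
    # Token filter: transforms each token after tokenization.
    return [token.lower() for token in tokens]

def analyze(text, char_filters, tokenizer, token_filters):
    """Apply char filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

print(analyze("<b>Ноутбуки</b> ASUS i5", [strip_html], tokenize, [lowercase]))
# ['ноутбуки', 'asus', 'i5']
```

The point of the model is the ordering: character filters see the text before it is split or lowercased, which matters for any mapping-based replacement.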

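For a Russian-language Bitrix catalog you need at minimum lowercasing, Russian stemming (the Snowball-based russian stemmer) and ASCII folding, which normalizes accented Latin characters. Transliteration between Cyrillic and Latin spellings is a separate task and is not covered by ASCII folding.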

Creating an Index with Proper Analyzers

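Bitrix allows overriding the mapping through the search module settings, but it is more reliable to create the index manually with the required analyzers before reindexing. Note that the DELETE request below drops the existing index together with all of its documents: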

curl -X DELETE http://localhost:9200/bitrix_search_s1

curl -X PUT http://localhost:9200/bitrix_search_s1 \
  -H "Content-Type: application/json" \
  -d '{
  "settings": {
    "analysis": {
      "filter": {
        "russian_stop": {
          "type": "stop",
          "stopwords": "_russian_"
        },
        "russian_stemmer": {
          "type": "stemmer",
          "language": "russian"
        },
        "edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "bitrix_russian": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "russian_stop",
            "russian_stemmer",
            "asciifolding"
          ]
        },
        "bitrix_autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "edge_ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "bitrix_russian",
        "search_analyzer": "bitrix_russian"
      },
      "title": {
        "type": "text",
        "analyzer": "bitrix_russian",
        "fields": {
          "autocomplete": {
            "type": "text",
            "analyzer": "bitrix_autocomplete",
            "search_analyzer": "bitrix_russian"
          }
        }
      }
    }
  }
}'
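
To see what the edge_ngram_filter above puts into the title.autocomplete field, the following sketch reproduces its prefix generation for a single token. It is a simplified model of the filter with the same min_gram and max_gram values; the function name edge_ngrams is illustrative.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of token with lengths from min_gram to max_gram."""
    longest = min(len(token), max_gram)
    return [token[:length] for length in range(min_gram, longest + 1)]

print(edge_ngrams("asus"))     # ['as', 'asu', 'asus']
print(edge_ngrams("ноутбук"))  # ['но', 'ноу', 'ноут', 'ноутб', 'ноутбу', 'ноутбук']
print(edge_ngrams("a"))        # []
```

Every prefix is stored as a separate term, so the index grows with max_gram. In this model a token shorter than min_gram produces no terms at all, which is why single-character input finds nothing in autocomplete.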

ASCII Folding and Transliteration

The asciifolding filter in the chain folds accented Latin characters to their ASCII equivalents: é → e, ü → u. It does not touch Cyrillic, so for real transliteration such as "Самсунг" → "Samsung" you need a custom char_filter of type mapping:

"char_filter": {
  "translit_filter": {
    "type": "mapping",
    "mappings": [
      "Самсунг => Samsung",
      "Эпл => Apple",
      "Асус => Asus"
    ]
  }
}

The mapping list for popular brands has to be compiled manually for your specific catalog; there is no automatic solution. The filter takes effect only after it is added to the char_filter array of the analyzer definition.
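
Character filters run before the lowercase token filter, and mapping keys are matched case-sensitively, so each brand needs a rule for every spelling users actually type. The sketch below generates such rules from a small dictionary. The brand dictionary and the function name build_mappings are assumptions for illustration; the output format matches the "key => value" strings expected by the mapping char filter.

```python
import json

def build_mappings(brands):
    """Build 'cyrillic => latin' rules for lower, capitalized and upper case."""
    rules = []
    seen = set()
    for latin, spellings in brands.items():
        for spelling in spellings:
            for form in (spelling.lower(), spelling.capitalize(), spelling.upper()):
                if form not in seen:  # avoid duplicate keys
                    seen.add(form)
                    rules.append(f"{form} => {latin}")
    return rules

# Hypothetical brand list; replace with the spellings found in your search logs.
brands = {
    "Samsung": ["Самсунг"],
    "Apple": ["Эпл"],
    "Asus": ["Асус"],
}

print(json.dumps(build_mappings(brands), ensure_ascii=False, indent=2))
```

The printed array can be pasted into the mappings key of translit_filter. Keep in mind that the mapping filter replaces character sequences anywhere in the text, not only whole words, so very short keys can produce unwanted replacements.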

Testing the Analyzer

After creating the index, verify how the analyzer tokenizes real queries:

# How a word is indexed
curl -X POST "http://localhost:9200/bitrix_search_s1/_analyze" \
  -H "Content-Type: application/json" \
  -d '{"analyzer": "bitrix_russian", "text": "Ноутбуки ASUS i5"}'

# Expected tokens: ["ноутбук", "asus", "i5"]
# Stemming "ноутбуки" -> "ноутбук" lets the query "ноутбук" match the product
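
Checking tokens by eye gets tedious once there are dozens of test queries. A minimal sketch of automating the comparison is shown below. It assumes the usual _analyze response shape (a tokens array whose entries have a token field); the helper names and the sample response are hand-written for illustration, not captured from a real cluster.

```python
import json

def build_analyze_body(analyzer, text):
    """Serialize the request body for POST /<index>/_analyze."""
    return json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)

def tokens_from_response(response):
    """Extract the token strings from a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]

def check_tokens(response, expected):
    """Return (matches, actual_tokens) for one test case."""
    actual = tokens_from_response(response)
    return actual == expected, actual

# Hypothetical response for the text "Ноутбуки ASUS i5".
sample_response = {
    "tokens": [
        {"token": "ноутбук", "start_offset": 0, "end_offset": 8,
         "type": "<ALPHANUM>", "position": 0},
        {"token": "asus", "start_offset": 9, "end_offset": 13,
         "type": "<ALPHANUM>", "position": 1},
        {"token": "i5", "start_offset": 14, "end_offset": 16,
         "type": "<ALPHANUM>", "position": 2},
    ]
}

print(build_analyze_body("bitrix_russian", "Ноутбуки ASUS i5"))
print(check_tokens(sample_response, ["ноутбук", "asus", "i5"]))
# (True, ['ноутбук', 'asus', 'i5'])
```

In practice the response would come from the curl call above or from an HTTP client, and the list of expected tokens would be maintained alongside the analyzer configuration.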

Reindexing After Analyzer Change

Changing an analyzer requires a full reindex, because existing documents were tokenized with the old analyzer and are not re-analyzed automatically.

In the Bitrix admin panel, go to "Search" → "Reindex". For large sites it is better to run reindexing from the command line or via cron:

php -f /var/www/bitrix/bitrix/modules/search/tools/reindex.php

Or via API:

\Bitrix\Search\Elastic::reindexAll();

Full reindexing of a typical 50,000-product catalog takes 15–45 minutes. The old index keeps serving queries during that time: Bitrix supports blue/green reindexing, which builds a temporary index and then switches the alias to it.
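
At the Elasticsearch level, an alias switch of this kind is a single request to the _aliases endpoint containing a remove and an add action, which the cluster applies atomically. The sketch below builds that request body. The index names with _v1 and _v2 suffixes and the function name are hypothetical; how Bitrix names its temporary index is not specified here.

```python
import json

def alias_switch_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves alias to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

# Hypothetical names: the alias is what the application queries,
# the versioned indexes hold the old and the freshly built data.
body = alias_switch_actions(
    alias="bitrix_search_s1",
    old_index="bitrix_search_s1_v1",
    new_index="bitrix_search_s1_v2",
)
print(json.dumps(body, indent=2))
```

This only works if the name the application queries is an alias rather than a concrete index, since an alias cannot share a name with an existing index.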