Optimization of search engine indexing in 1C-Bitrix

Optimizing Search Indexing for the Built-in Bitrix Module

The built-in Bitrix search operates through the b_search_content, b_search_content_stem, and b_search_stem tables. On a catalog of 30,000+ products, a full re-index takes 4–12 hours, the CSearchIndex::IndexAgent agent runs continuously on the server, and search quality remains mediocre: stemming cannot handle morphology properly, and relevance does not account for popularity or sales data.

The task of optimizing search indexing is addressed at two levels: tactical (speed up and stabilize the existing mechanism) and strategic (migrate to Elasticsearch when full-text search with facets is required).

Diagnosing the Current Indexing State

Examine the state of the index tables:

-- Search table sizes
SELECT
    table_name,
    ROUND(data_length / 1024 / 1024, 2) AS data_mb,
    ROUND(index_length / 1024 / 1024, 2) AS index_mb,
    table_rows
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name LIKE 'b_search%'
ORDER BY data_length DESC;

The b_search_content_stem table on a large portal can occupy 2–5 GB. Running OPTIMIZE TABLE on it takes hours and blocks other queries — only do this during a maintenance window.

Check the progress of the current indexing:

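-- SITE_ID is stored in b_search_content_site in current search module schemas;
-- adjust the join if your version keeps it elsewhere.
SELECT scs.SITE_ID, sc.MODULE_ID, sc.PARAM1,
       COUNT(*) AS indexed_count,
       MAX(sc.DATE_CHANGE) AS last_indexed
FROM b_search_content sc
JOIN b_search_content_site scs ON scs.SEARCH_CONTENT_ID = sc.ID
GROUP BY scs.SITE_ID, sc.MODULE_ID, sc.PARAM1;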

Optimizing Indexing Parameters

In the admin interface (Settings → Search → Settings):

Minimum word length: increase from 2 to 3–4. Two-letter words pollute the index and carry no search value in the vast majority of queries.

Stop words: add prepositions, conjunctions, and particles (plus articles for content in languages that have them). For a Russian-language site, the basic list includes 50–80 words.

Indexed modules: disable indexing for modules whose content users do not need to search (forums if closed; blogs; document management). Each unnecessary module in the index increases table size and re-indexing time.
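
The effect of the first two settings can be illustrated with a small standalone sketch. This is not the Bitrix tokenizer: `filterTokens`, the sample stop-word list, and the sample phrase are hypothetical and only show how a minimum word length and a stop-word list reduce what ends up in the index.

```php
<?php
// Illustrative model of the two settings; NOT the actual Bitrix tokenizer.

/**
 * Split text into lowercase words and drop short words and stop words.
 *
 * @param string   $text      Text to be indexed.
 * @param int      $minLength Minimum word length in characters.
 * @param string[] $stopWords Lowercase stop words to exclude.
 * @return string[] Words that would remain in the index.
 */
function filterTokens(string $text, int $minLength, array $stopWords): array
{
    // \p{L} and \p{N} match letters and digits in any script (needs the /u flag).
    $words = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $stop = array_flip($stopWords);

    $kept = [];
    foreach ($words as $word) {
        if (mb_strlen($word) >= $minLength && !isset($stop[$word])) {
            $kept[] = $word;
        }
    }
    return $kept;
}

// Hypothetical sample phrase and stop-word list.
$tokens = filterTokens('A kettle for the kitchen or the office, 2 l', 3, ['for', 'the', 'and']);
print_r($tokens);
```

With a minimum length of 3, the one- and two-character tokens disappear before the stop-word list is even consulted, which is why raising the minimum length tends to shrink the stem tables noticeably.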

Number of elements per agent pass:

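// Schematic call that illustrates the step parameter; it is not a literal API
// signature, so check the CSearch class in your search module version.
$step = 50; // elements processed per agent pass
CSearch::ReIndex($moduleId, $start, $finish, $step);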

The optimal step value is 50–100 elements. With a larger step, the agent can run longer than a single execution window and be interrupted by the PHP timeout; with a smaller step, the per-launch overhead dominates.
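
As a rough sizing aid, the number of agent passes needed for one full traversal follows directly from the step size. The sketch below is plain arithmetic: `estimatePasses` is a hypothetical helper, and the 30,000-element catalog is the figure used at the start of this article.

```php
<?php
// Hypothetical helper: how many agent passes a full traversal needs at a given step.
function estimatePasses(int $totalElements, int $step): int
{
    if ($step < 1) {
        throw new InvalidArgumentException('Step must be at least 1.');
    }
    // Integer ceiling division: a final partial batch still costs one pass.
    return intdiv($totalElements + $step - 1, $step);
}

foreach ([50, 100] as $step) {
    printf("step %d: %d passes\n", $step, estimatePasses(30000, $step));
}
```

Multiplying the pass count by the interval between agent launches gives a wall-clock estimate. The interval depends on how agents are scheduled on your server, so treat any such figure as an approximation.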

Incremental vs. Full Indexing

Reserve full re-indexing (ReIndexAll) for the initial setup or for major changes to the catalog structure. In normal operation, incremental indexing via the agent should be active.

The CSearchIndex::IndexAgent agent works incrementally: it indexes only elements modified since the last indexing run. The problem arises when the modification timestamp on elements (TIMESTAMP_X in b_iblock_element, reflected as DATE_CHANGE in the search index) is updated with every 1C synchronization, even when content has not changed. In this case, the agent effectively re-indexes the entire catalog on every sync.

Solution: in the 1C import handler, compare a hash of the significant fields before and after the update, and let the timestamp change only when content has actually changed:

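// Hash only the fields that affect search results.
$oldHash = md5($oldElement['NAME'] . $oldElement['DETAIL_TEXT'] . implode(',', $oldProperties));
$newHash = md5($newElement['NAME'] . $newElement['DETAIL_TEXT'] . implode(',', $newProperties));

if ($oldHash !== $newHash) {
    // Content changed: run the normal update, which refreshes the modification timestamp
} else {
    // Content unchanged: update stock/prices without triggering re-indexing.
    // TIMESTAMP_X = TIMESTAMP_X preserves the stored timestamp; the (int) cast guards the ID.
    $DB->Query("UPDATE b_iblock_element SET TIMESTAMP_X = TIMESTAMP_X WHERE ID = " . (int)$id);
}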
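
A slightly more defensive way to build the hash is sketched below. `contentHash` is a hypothetical helper, not part of the Bitrix API. It sorts properties by code and serializes the fields with json_encode, so that neither the order in which 1C sends properties nor the boundaries between fields can change the result.

```php
<?php
/**
 * Hypothetical helper: hash of the fields that matter for search.
 *
 * @param array $element    Expected to contain NAME and DETAIL_TEXT.
 * @param array $properties Property code => value (scalar or array).
 * @return string MD5 hash of the normalized content.
 */
function contentHash(array $element, array $properties): string
{
    // Sort by property code so that import order does not affect the hash.
    ksort($properties);

    // json_encode keeps field boundaries, so "ab" + "c" and "a" + "bc" hash differently.
    $payload = json_encode(
        [
            (string)($element['NAME'] ?? ''),
            (string)($element['DETAIL_TEXT'] ?? ''),
            $properties,
        ],
        JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR
    );

    return md5($payload);
}

// Sample data is made up: same content, different property order.
$old = contentHash(['NAME' => 'Kettle', 'DETAIL_TEXT' => '2 l'], ['COLOR' => 'red', 'BRAND' => 'Acme']);
$new = contentHash(['NAME' => 'Kettle', 'DETAIL_TEXT' => '2 l'], ['BRAND' => 'Acme', 'COLOR' => 'red']);

var_dump($old === $new);
```

MD5 is adequate here because the hash only detects changes and has no security role. Storing the last hash alongside the element (for example in a service property) avoids loading the old field values on every import; where to store it is a design choice this sketch leaves open.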

Configuring MySQL FULLTEXT Indexes

The built-in Bitrix search uses MySQL FULLTEXT indexes on the b_search_content table:

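# In my.cnf ([mysqld] section) for InnoDB FULLTEXT
innodb_ft_min_token_size = 3
innodb_ft_server_stopword_table = 'mydb/my_stopwords'
ft_query_expansion_limit = 20

Changing innodb_ft_min_token_size requires a MySQL restart, after which the FULLTEXT indexes on b_search_content must be rebuilt. The MySQL documentation recommends dropping and re-creating each FULLTEXT index with ALTER TABLE (DROP INDEX, then ADD INDEX). Schedule this during a maintenance window.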

Cleaning Up Stale Index Records

On live sites, b_search_content accumulates records for deleted elements — "garbage" that increases index size and slows down search:

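-- Find records for deleted iblock elements.
-- For the iblock module, ITEM_ID holds the element ID (sections are stored as 'S<id>'),
-- while PARAM1 and PARAM2 hold the iblock type and the iblock ID.
SELECT sc.ID, sc.ITEM_ID, sc.PARAM1, sc.PARAM2
FROM b_search_content sc
LEFT JOIN b_iblock_element ie ON ie.ID = CAST(sc.ITEM_ID AS UNSIGNED)
WHERE sc.MODULE_ID = 'iblock'
  AND sc.ITEM_ID NOT LIKE 'S%'
  AND ie.ID IS NULL
LIMIT 10000;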

Delete in batches of 1,000 records via an agent or cron script — never run a single DELETE on 100,000 records in production.
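
The batching itself can be sketched independently of the database layer. In the example below, `deleteInBatches` and the `$execute` callback are hypothetical: in a real agent or cron script the callback would run the generated statement through your database connection, possibly with a short pause between batches.

```php
<?php
/**
 * Hypothetical helper: delete stale search records in fixed-size batches.
 *
 * @param int[]    $ids       IDs of stale b_search_content rows.
 * @param int      $batchSize Rows per DELETE statement.
 * @param callable $execute   Receives one SQL string and runs it.
 * @return int Number of batches issued.
 */
function deleteInBatches(array $ids, int $batchSize, callable $execute): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    // Cast to int so the generated IN (...) list cannot carry injected SQL.
    $ids = array_values(array_unique(array_map('intval', $ids)));

    $batches = 0;
    foreach (array_chunk($ids, $batchSize) as $chunk) {
        $execute('DELETE FROM b_search_content WHERE ID IN (' . implode(',', $chunk) . ')');
        $batches++;
    }
    return $batches;
}

// Dry run: collect the statements instead of executing them.
$statements = [];
$count = deleteInBatches(range(1, 2500), 1000, function (string $sql) use (&$statements): void {
    $statements[] = $sql;
});
printf("%d batches\n", $count);
```

Deleting only from b_search_content can leave related rows behind in the other b_search_* tables. Where possible, prefer the search module's own deletion method (CSearch::DeleteIndex, if available in your version), called per element inside the same batching loop.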

Monitoring Search Quality

After optimization, configure logging of search queries that return zero results:

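-- Search phrase statistics are stored in b_search_phrase in current search module
-- schemas; verify the table and column names against your installed version.
SELECT PHRASE, SITE_ID, RESULT_COUNT, TIMESTAMP_X
FROM b_search_phrase
WHERE RESULT_COUNT = 0
ORDER BY TIMESTAMP_X DESC
LIMIT 100;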

Queries with zero results are a signal to expand content, add synonyms, or migrate to morphological search.
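
Raw log rows are easier to act on once they are grouped. The following sketch assumes the phrases from the zero-result rows have already been fetched into a plain array of strings; `topFailedQueries` is a hypothetical helper and the sample phrases are made up.

```php
<?php
/**
 * Hypothetical helper: rank zero-result search phrases by frequency.
 *
 * @param string[] $phrases Raw phrases taken from zero-result log rows.
 * @param int      $limit   How many top phrases to return.
 * @return array<string,int> Normalized phrase => number of occurrences.
 */
function topFailedQueries(array $phrases, int $limit): array
{
    $counts = [];
    foreach ($phrases as $phrase) {
        // Normalize case and whitespace so "Kettle " and "kettle" count together.
        $key = mb_strtolower(trim(preg_replace('/\s+/u', ' ', $phrase) ?? ''));
        if ($key === '') {
            continue;
        }
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }

    arsort($counts); // most frequent first, keys preserved
    return array_slice($counts, 0, $limit, true);
}

print_r(topFailedQueries(
    ['Kettle ', 'kettle', 'teapot  lid', 'KETTLE', 'teapot lid', 'samovar'],
    2
));
```

The most frequent failed phrases are the best candidates for synonyms or new content, while one-off typos at the bottom of the list can usually be ignored.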

Migrating to Elasticsearch

If the built-in search cannot meet quality or performance requirements — migrate to Elasticsearch via the intec.search module or a custom integration. This is a separate service, but diagnostics and recommendations are provided as part of this engagement.

Result

Optimizing indexing parameters and the incremental indexing strategy reduces full catalog traversal time by 50–80%, brings agent load down to a level that does not impact primary requests, and improves search result quality through stop-word configuration and minimum token length tuning.