Setting up product deduplication during automatic catalog population in 1C-Bitrix

Filling a catalog from multiple sources inevitably creates duplicates. The same product — "Bosch GSR 18V-50" — comes from three suppliers with different names, SKUs, and descriptions. Without deduplication, the catalog grows, filters show duplicates, and managers spend hours on manual cleanup. Let's look at deduplication mechanisms at the Bitrix level.

Deduplication levels

1. Exact match by key. The most reliable method. If a product has a unique external identifier (EAN, GTIN, or manufacturer SKU), deduplication is trivial: look for an existing element whose XML_ID or article property (for example, PROPERTY_ARTICLE) holds that value.

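use Bitrix\Main\Loader;

Loader::includeModule('iblock');

// '=XML_ID' forces an exact comparison; without the prefix the value is treated as a mask.
$existing = CIBlockElement::GetList(
    [],
    ['IBLOCK_ID' => $iblockId, '=XML_ID' => $externalId],
    false,
    ['nTopCount' => 1],
    ['ID']
)->Fetch();

$element = new CIBlockElement();

if ($existing) {
    // Update the existing element
    $ok = $element->Update($existing['ID'], $arFields);
} else {
    // Create a new element; XML_ID is stored so that the next import finds it
    $ok = $element->Add($arFields + ['IBLOCK_ID' => $iblockId, 'XML_ID' => $externalId]);
}

if (!$ok) {
    // LAST_ERROR holds the reason the operation was rejected
    throw new RuntimeException($element->LAST_ERROR);
}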

Problem: not all sources provide stable unique identifiers. Supplier SKU ≠ manufacturer SKU. One product may have 3–5 different SKUs from different suppliers.

2. Match by field combination. If there's no unique key — search by combination: name + brand + key characteristic (volume, weight, size).

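$filter = [
    'IBLOCK_ID' => $iblockId,
    // '%' prefix: substring match against the stored element name
    '%NAME' => $normalizedName,
    // Assumes BRAND is a string property; for a list property, filter by PROPERTY_BRAND_VALUE
    'PROPERTY_BRAND' => $brand,
];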

Before comparing names, normalize them: convert to lower case, collapse extra spaces, and replace typographic characters (curly quotes, non-breaking spaces, long dashes) with their plain equivalents.

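3. Fuzzy matching. Needed when names differ between suppliers: "Bosch GSR 18V-50 Professional" vs "Шуруповёрт Bosch GSR18V50" (the first word is Russian for "screwdriver"). Use fuzzy comparison algorithms: similar_text(), Levenshtein distance (levenshtein()), trigrams. Note that the built-in similar_text() and levenshtein() compare bytes, not characters, so their scores are skewed for Cyrillic text in UTF-8.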
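
As an illustration of the trigram approach, here is a minimal sketch that is multibyte-safe. The function names trigrams() and trigramSimilarity() are hypothetical, and the choice of Jaccard similarity over trigram sets is an assumption of this sketch, not something the catalog API provides. All characters other than letters and digits are dropped first, so that "GSR 18V-50" and "GSR18V50" become comparable.

```php
<?php
declare(strict_types=1);

/**
 * Returns the set of character trigrams of a string (as array keys).
 * Assumption of this sketch: only letters and digits are kept.
 */
function trigrams(string $text): array
{
    $text = mb_strtolower($text);
    $text = preg_replace('/[^\p{L}\p{N}]+/u', '', $text);

    $set = [];
    $length = mb_strlen($text);
    for ($i = 0; $i <= $length - 3; $i++) {
        $set[mb_substr($text, $i, 3)] = true;
    }
    return $set;
}

/**
 * Jaccard similarity of two trigram sets:
 * 0.0 means no shared trigrams, 1.0 means identical sets.
 */
function trigramSimilarity(string $a, string $b): float
{
    $setA = trigrams($a);
    $setB = trigrams($b);
    if (!$setA || !$setB) {
        return 0.0; // strings shorter than three characters cannot be compared
    }

    $shared = count(array_intersect_key($setA, $setB));
    $total = count($setA) + count($setB) - $shared;

    return $shared / $total;
}
```

For the two names above the score is roughly a third, because each name carries an extra word the other lacks. Any threshold for treating a pair as a match is therefore a tuning decision that should be checked against real catalog data; comparing only the model part of the name is one possible refinement.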

Normalization before comparison

Deduplication quality directly depends on normalization. Minimal transformation set:

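  • Convert to lower case: mb_strtolower().
  • Replace special characters with spaces: parentheses, quotes, hyphens, slashes.
  • Remove stop words: "артикул" (article), "арт." (art.), "код" (code), "модель" (model).
  • Normalize spaces: collapse runs of whitespace into one space.
  • Remove unit and size indicators from the name (if they are stored in separate properties).

function normalizeName(string $name): string
{
    $name = mb_strtolower(trim($name));
    // The /u flag is required: without it « and » are matched byte by byte,
    // which corrupts Cyrillic letters that share those bytes in UTF-8.
    $name = preg_replace('/[()«»"\'\/\-]/u', ' ', $name);
    // Russian stop words, with an optional trailing dot ("арт.")
    $name = preg_replace('/\b(арт|артикул|код|модель)\b\.?/u', '', $name);
    // Collapse whitespace runs into a single space
    $name = preg_replace('/\s+/u', ' ', $name);
    return trim($name);
}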

Merge strategy

When a duplicate is found — what to do with the data? Three strategies:

Strategy          | Logic                                                        | When to use
Source priority   | Data from the highest-priority source overwrites the others  | There is one "reference" supplier
Field merging     | Empty fields are filled from an alternative source           | Different sources complement each other
Manual moderation | The duplicate is flagged and a manager decides               | Critical data, few duplicates

In practice, a combination is most common: automatic merging for non-critical fields (description, photos) and flagging for manual review when prices or key characteristics differ.
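
A minimal sketch of such a combined strategy, assuming product data is passed around as plain associative arrays. The function name mergeProduct() and the field names in the usage example are hypothetical. Empty stored fields are filled from the incoming record, while a differing value in a critical field is not overwritten and is reported for manual review.

```php
<?php
declare(strict_types=1);

/**
 * Merges an incoming record into the stored one.
 * - Empty stored fields are filled from $incoming.
 * - Filled fields are never overwritten.
 * - A differing value in a field listed in $critical is returned as a conflict.
 *
 * @return array [merged fields, conflicts for manual review]
 */
function mergeProduct(array $stored, array $incoming, array $critical): array
{
    $isEmpty = static fn($value): bool => $value === null || $value === '' || $value === [];

    $merged = $stored;
    $conflicts = [];

    foreach ($incoming as $field => $value) {
        if ($isEmpty($value)) {
            continue; // nothing to contribute
        }

        $current = $stored[$field] ?? null;

        if ($isEmpty($current)) {
            $merged[$field] = $value;
        } elseif (in_array($field, $critical, true) && $current !== $value) {
            $conflicts[$field] = ['stored' => $current, 'incoming' => $value];
        }
    }

    return [$merged, $conflicts];
}

// Usage: the description is filled in automatically, the price difference is flagged.
[$merged, $conflicts] = mergeProduct(
    ['NAME' => 'Bosch GSR 18V-50', 'DETAIL_TEXT' => '', 'PRICE' => 100],
    ['DETAIL_TEXT' => 'Cordless drill', 'PRICE' => 120],
    ['PRICE']
);
```

The comparison is strict, so 100 and "100" count as different; values should be cast to a common type before merging if sources format them differently.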

Implementation in Bitrix

The info block element's XML_ID field is the primary deduplication tool. It is indexed by default, so lookups are fast. But an element has only one XML_ID, which is not enough when several suppliers each use their own identifier for the same product.

Recommended scheme: a separate reference info block, parser_external_ids, with the following fields:

  • NAME — external identifier (supplier SKU).
  • PROPERTY_SOURCE — source (supplier name).
  • PROPERTY_ELEMENT_ID — ID of main catalog element.
  • PROPERTY_MATCH_TYPE — match type (exact, fuzzy, manual).

On import, the parser first looks up the external ID in the reference info block. If it is found, the parser updates the linked element. If not, it checks for a fuzzy match by name. If a match is found, it creates a link in the reference info block and updates the element. If no match is found, it creates a new element.
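
The lookup order can be sketched independently of the Bitrix API. In the sketch below the reference info block is replaced by an in-memory array, and the fuzzy search and element creation are passed in as callables; resolveElementId() and both callables are hypothetical names. Storing a link for a newly created element (with match type "exact") is an assumption of this sketch, added so that the next import resolves the same external ID at the first step.

```php
<?php
declare(strict_types=1);

/**
 * Resolves an incoming supplier item to a catalog element ID.
 *
 * $links      stand-in for the reference info block:
 *             ["source|externalId" => ['ELEMENT_ID' => int, 'MATCH_TYPE' => string]]
 * $findFuzzy  fn(string $name): ?int, returns a matching element ID or null
 * $createNew  fn(string $name): int, creates an element and returns its ID
 */
function resolveElementId(
    array &$links,
    string $source,
    string $externalId,
    string $name,
    callable $findFuzzy,
    callable $createNew
): int {
    $key = $source . '|' . $externalId;

    // 1. Known external ID: reuse the linked element.
    if (isset($links[$key])) {
        return $links[$key]['ELEMENT_ID'];
    }

    // 2. Unknown ID: try a fuzzy match by name.
    $elementId = $findFuzzy($name);
    $matchType = 'fuzzy';

    // 3. No match at all: create a new element.
    if ($elementId === null) {
        $elementId = $createNew($name);
        $matchType = 'exact';
    }

    // Remember the link so that the next import stops at step 1.
    $links[$key] = ['ELEMENT_ID' => $elementId, 'MATCH_TYPE' => $matchType];

    return $elementId;
}
```

The caller then updates the element with the returned ID. Keeping the match type in the link makes it possible to review all fuzzy matches later without re-running the comparison.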

Batch deduplication of existing catalog

If the catalog already contains duplicates — one-time cleanup is needed. Algorithm:

  1. Export all elements: ID, NAME, XML_ID, key properties.
  2. Normalize names.
  3. Group by normalized name + brand.
  4. In each group, select a "master record" (the most complete card, the highest ID, or the priority source).
  5. Transfer orders, links, and properties from the duplicates to the master record.
  6. Deactivate the duplicates (ACTIVE = 'N'); don't delete them.
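
Steps 2 to 4 can be sketched in plain PHP over the exported rows. The function name planDeduplication() and the row layout (ID, NAME, BRAND plus any other fields) are assumptions of this sketch. The master record is chosen as the most filled-in card, with ties going to the highest ID. The normalizer is passed in as a callable, so the normalizeName() function from the normalization section can be used.

```php
<?php
declare(strict_types=1);

/**
 * Groups exported elements by normalized name + brand and picks a master per group.
 *
 * @param array    $elements  rows like ['ID' => int, 'NAME' => string, 'BRAND' => string, ...]
 * @param callable $normalize fn(string $name): string
 * @return array   master ID => list of duplicate IDs to merge and deactivate
 */
function planDeduplication(array $elements, callable $normalize): array
{
    $groups = [];
    foreach ($elements as $element) {
        $key = $normalize($element['NAME']) . '|' . mb_strtolower(trim($element['BRAND']));
        $groups[$key][] = $element;
    }

    // Number of non-empty fields in a card.
    $filled = static fn(array $e): int =>
        count(array_filter($e, static fn($v): bool => $v !== null && $v !== ''));

    $plan = [];
    foreach ($groups as $group) {
        if (count($group) < 2) {
            continue; // no duplicates in this group
        }

        // Sort descending: most complete card first, highest ID breaks ties.
        usort($group, static fn(array $a, array $b): int =>
            [$filled($b), $b['ID']] <=> [$filled($a), $a['ID']]);

        $master = array_shift($group);
        $plan[$master['ID']] = array_column($group, 'ID');
    }

    return $plan;
}
```

The result is only a plan: transferring orders and links (step 5) and deactivating the duplicates (step 6) still have to be done through the catalog API, ideally after a manager has reviewed the groups.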

Don't delete duplicates immediately. Deactivate them and keep them for 2–4 weeks. If an error in the algorithm turns up, the elements can be restored.