Setting up data transformation during parsing for 1C-Bitrix

Data from external sources rarely matches your catalog structure: names arrive in upper case, prices are strings with a currency symbol, sizes are in inches instead of centimeters, and categories don't line up with info block sections. Without a transformation layer between the parser and the importer, data enters the catalog "as is" and breaks filters, sorting, and product card display.

Typical transformations

Text normalization:

  • Case conversion: ШУРУПОВЕРТ BOSCH GSR 18V → Шуруповёрт Bosch GSR 18V. The mb_convert_case() function with the MB_CASE_TITLE mode works for most words but breaks abbreviations (GSR becomes Gsr), so you need a whitelist of words that must not be transformed (SKUs, brands).
  • Removing extra spaces, non-breaking spaces (\xC2\xA0 in UTF-8), and zero-width characters.
  • Decoding HTML entities: &amp; → &, &quot; → ".
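
The text-normalization steps above can be combined into a single helper. This is a minimal sketch, not a Bitrix API: the function name normalizeName() and the $keepAsIs whitelist parameter are hypothetical, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: normalizeName() and $keepAsIs are illustrative names.
function normalizeName(string $raw, array $keepAsIs = []): string
{
    // Decode HTML entities such as &amp; and &quot;.
    $text = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Drop zero-width characters (ZWSP, ZWNJ, ZWJ, BOM).
    $text = preg_replace('/[\x{200B}-\x{200D}\x{FEFF}]/u', '', $text) ?? $text;
    // Collapse whitespace runs, including non-breaking spaces, into one space.
    $text = trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? $text);

    // Index the whitelist by upper-cased form for case-insensitive lookup.
    $keep = [];
    foreach ($keepAsIs as $word) {
        $keep[mb_strtoupper($word, 'UTF-8')] = $word;
    }

    $words = explode(' ', $text);
    foreach ($words as $i => $word) {
        $key = mb_strtoupper($word, 'UTF-8');
        // Whitelisted words keep their canonical spelling; the rest are title-cased.
        $words[$i] = $keep[$key] ?? mb_convert_case($word, MB_CASE_TITLE, 'UTF-8');
    }

    return implode(' ', $words);
}
```

For example, normalizeName('BOSCH  GSR 18V', ['GSR', '18V']) returns "Bosch GSR 18V". The whitelist would normally be loaded from configuration rather than passed inline.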

Price normalization:

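  • Extracting a number from a string: "1 299,00 руб." → 1299.00. The regular expression preg_replace('/[^\d,.]/', '', $price) strips everything except digits and separators; then trim the trailing period left over from "руб." and replace the comma with a period.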
  • Currency conversion by fixed rate or via Central Bank API.
  • Rounding to kopecks: round($price, 2).
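
A minimal sketch of the price steps above. The function names parsePrice() and convertPrice() are hypothetical, and the code assumes that a comma is always the decimal separator, so a value such as "1,299.00" is rejected rather than guessed at.

```php
<?php
// Hypothetical helpers; assumes the source uses a comma as the decimal separator.
function parsePrice(string $raw): ?float
{
    // Keep digits and separators only; spaces and currency text are dropped.
    $clean = preg_replace('/[^0-9,.]/', '', $raw) ?? '';
    // Trim separators left over from abbreviations such as "руб.".
    $clean = trim($clean, '.,');
    // Treat the comma as the decimal separator.
    $clean = str_replace(',', '.', $clean);

    // Anything that is still not a plain number (e.g. "1.299.00") is rejected.
    if ($clean === '' || !is_numeric($clean)) {
        return null;
    }

    return round((float) $clean, 2);
}

// Converts by a fixed rate supplied by the caller and rounds to kopecks.
function convertPrice(float $amount, float $rate): float
{
    return round($amount * $rate, 2);
}
```

Returning null instead of 0 for unparseable input lets a later validation stage decide what to do with the element.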

Characteristic normalization:

  • Units of measurement: "120 мм" → value 120, unit мм. Parsing via regex: /^([\d.,]+)\s*([а-яА-Яa-zA-Z]+)$/u.
  • Boolean values: "Да", "Yes", "+", "true", "1" → Y for "checkbox" type properties in Bitrix.
  • List values: map the external value to the XML_ID of a property variant. Keep the correspondence table in the DB or in a config array.
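
The three characteristic rules above could look roughly like this. All function names are hypothetical. The sketch returns null for values it does not recognise, because how an unchecked checkbox is stored depends on how the property is set up in your info block.

```php
<?php
// Splits "120 мм" into a numeric value and a unit; returns null if the string doesn't match.
function parseMeasure(string $raw): ?array
{
    if (!preg_match('/^([\d.,]+)\s*([а-яА-Яa-zA-Z]+)$/u', trim($raw), $m)) {
        return null;
    }

    return [
        'value' => (float) str_replace(',', '.', $m[1]),
        'unit'  => $m[2],
    ];
}

// Maps truthy source values to 'Y'; anything else yields null (assumption: no value means unchecked).
function toCheckbox(string $raw): ?string
{
    $truthy = ['да', 'yes', '+', 'true', '1'];

    return in_array(mb_strtolower(trim($raw), 'UTF-8'), $truthy, true) ? 'Y' : null;
}

// Looks up the XML_ID of a list variant; $map keys are lower-cased external values.
function mapListValue(string $raw, array $map): ?string
{
    return $map[mb_strtolower(trim($raw), 'UTF-8')] ?? null;
}
```

Unmapped list values come back as null, so the caller can log them instead of silently creating new variants.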

Category mapping

The source category structure doesn't match the info block sections. The solution is a mapping table:

$categoryMap = [
    'Электроинструмент/Дрели'          => 15,  // Info block section ID
    'Электроинструмент/Шуруповёрты'     => 16,
    'Ручной инструмент/Отвёртки'        => 22,
];

$sectionId = $categoryMap[$externalCategory] ?? DEFAULT_SECTION_ID;

For new categories missing from the mapping, put products in a "No category" section and log the event. Auto-creating sections is dangerous: a single data error is enough to fill the catalog with junk sections.
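
A sketch of the fallback-and-log behaviour described above. The function resolveSectionId() and the $unmapped counter array are hypothetical; in a real importer the counter would feed whatever logging you already use.

```php
<?php
// Returns the mapped section ID, or the fallback ID while recording the unmapped category.
function resolveSectionId(
    string $externalCategory,
    array $categoryMap,
    array &$unmapped,
    int $fallbackSectionId
): int {
    $key = trim($externalCategory);

    if (isset($categoryMap[$key])) {
        return $categoryMap[$key];
    }

    // Count occurrences so the log shows which missing mappings matter most.
    $unmapped[$key] = ($unmapped[$key] ?? 0) + 1;

    return $fallbackSectionId;
}
```

After the import run, the keys of $unmapped are the categories to add to the mapping table by hand.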

Image processing

Images from external sources require processing before upload to Bitrix:

  • Resize: the source provides a 4000×3000 image, but the catalog needs at most 1200×1200. Use \Bitrix\Main\File\Image\Imagick or GD to resize before upload.
  • Format: convert WebP to JPEG if Bitrix is configured without WebP support.
  • Watermarks: they cannot be removed without quality loss, but a watermark in a corner can be cropped out.
  • Duplicates: one product may come with five identical photos. Compare files by their md5 hash before upload.
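
Two of the steps above are easy to sketch in plain PHP without touching the Bitrix image classes: computing the target size for a resize and dropping byte-identical files. Both function names are hypothetical, and the actual resizing call is left to whichever image library you use.

```php
<?php
// Computes dimensions that fit inside a $maxSide square while keeping the aspect ratio.
function fitWithin(int $width, int $height, int $maxSide = 1200): array
{
    if ($width <= $maxSide && $height <= $maxSide) {
        return [$width, $height]; // already small enough, never upscale
    }

    $scale = $maxSide / max($width, $height);

    return [
        max(1, (int) round($width * $scale)),
        max(1, (int) round($height * $scale)),
    ];
}

// Keeps the first file for each distinct md5 hash; unreadable paths are skipped.
function uniqueByContent(array $paths): array
{
    $seen = [];
    $unique = [];

    foreach ($paths as $path) {
        $hash = is_file($path) ? md5_file($path) : false;
        if ($hash === false || isset($seen[$hash])) {
            continue;
        }
        $seen[$hash] = true;
        $unique[] = $path;
    }

    return $unique;
}
```

Note that an md5 comparison only catches byte-identical files. The same photo re-encoded or resized by the source produces a different hash and is not detected.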

Pre-import validation

Transformed data must be validated before it is written to the info block:

Field            | Rule                          | Action on violation
NAME             | Not empty, 3–255 characters   | Skip element, log
XML_ID           | Unique, not empty             | Skip (duplicate)
PRICE            | Number > 0                    | Set to 0, mark for review
SECTION_ID       | Existing section              | Place in "No category"
PREVIEW_PICTURE  | File exists, size < 10 MB     | Import without image

Validation is implemented as a separate pipeline stage between transformation and import. Elements that fail validation go to a separate parser_rejected table together with the rejection reason.
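
The rules from the table can be expressed as one validation function. This is a sketch under assumptions: validateElement() and its return shape are hypothetical, the element is a plain array keyed by field name, and writing rejected items to parser_rejected is left to the caller.

```php
<?php
// Returns ['item' => array|null, 'reject' => string|null, 'notes' => string[]].
function validateElement(array $item, array $seenXmlIds, array $sectionIds, int $noCategoryId): array
{
    // NAME: 3-255 characters, otherwise the element is skipped.
    $nameLength = mb_strlen(trim((string) ($item['NAME'] ?? '')), 'UTF-8');
    if ($nameLength < 3 || $nameLength > 255) {
        return ['item' => null, 'reject' => 'NAME length out of range', 'notes' => []];
    }

    // XML_ID: must be present and not seen before in this run.
    $xmlId = (string) ($item['XML_ID'] ?? '');
    if ($xmlId === '' || in_array($xmlId, $seenXmlIds, true)) {
        return ['item' => null, 'reject' => 'XML_ID empty or duplicate', 'notes' => []];
    }

    $notes = [];

    // PRICE: non-positive or non-numeric values are zeroed and flagged for review.
    $price = $item['PRICE'] ?? null;
    if (!is_numeric($price) || (float) $price <= 0) {
        $item['PRICE'] = 0;
        $notes[] = 'PRICE needs review';
    }

    // SECTION_ID: unknown sections fall back to "No category".
    if (!in_array($item['SECTION_ID'] ?? null, $sectionIds, true)) {
        $item['SECTION_ID'] = $noCategoryId;
        $notes[] = 'SECTION_ID replaced with fallback';
    }

    // PREVIEW_PICTURE: a missing or oversized file means import without an image.
    $picture = $item['PREVIEW_PICTURE'] ?? null;
    if ($picture !== null && (!is_file($picture) || filesize($picture) >= 10 * 1024 * 1024)) {
        $item['PREVIEW_PICTURE'] = null;
        $notes[] = 'PREVIEW_PICTURE dropped';
    }

    return ['item' => $item, 'reject' => null, 'notes' => $notes];
}
```

Hard failures (NAME, XML_ID) return early with a rejection reason, while soft failures are repaired in place and reported through notes, which mirrors the two kinds of actions in the table.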

Transformation configuration

Transformation rules should be configurable rather than hardcoded. Config format:

$transformRules = [
    'NAME' => [
        ['type' => 'trim'],
        ['type' => 'mb_title_case'],
        ['type' => 'max_length', 'value' => 255],
    ],
    'PRICE' => [
        ['type' => 'extract_number'],
        ['type' => 'multiply', 'value' => 1.2],  // 20% markup
        ['type' => 'round', 'value' => 2],
    ],
    'PROPERTY_WEIGHT' => [
        ['type' => 'extract_number'],
        ['type' => 'convert_unit', 'from' => 'kg', 'to' => 'g'],
    ],
];

Each rule type is implemented as a separate transformer function, and the rule chain for a field is applied sequentially. This allows changing the transformation logic without modifying parser code.
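
One possible way to execute such a config is a registry of transformer closures keyed by rule type. The names defaultTransformers() and applyRules() are hypothetical, the unit conversion table is a placeholder with only two factors, and PHP 7.4+ is assumed for arrow functions.

```php
<?php
// Registry of transformer functions; each receives the current value and its rule array.
function defaultTransformers(): array
{
    return [
        'trim'          => fn($v, array $rule) => trim((string) $v),
        'mb_title_case' => fn($v, array $rule) => mb_convert_case((string) $v, MB_CASE_TITLE, 'UTF-8'),
        'max_length'    => fn($v, array $rule) => mb_substr((string) $v, 0, $rule['value'], 'UTF-8'),
        // Assumes a comma is the decimal separator in the source data.
        'extract_number' => fn($v, array $rule) => (float) str_replace(
            ',',
            '.',
            trim(preg_replace('/[^0-9,.]/', '', (string) $v) ?? '', '.,')
        ),
        'multiply'      => fn($v, array $rule) => $v * $rule['value'],
        'round'         => fn($v, array $rule) => round($v, $rule['value']),
        'convert_unit'  => function ($v, array $rule) {
            $factors = ['kg>g' => 1000, 'g>kg' => 0.001]; // placeholder table
            $key = $rule['from'] . '>' . $rule['to'];
            if (!isset($factors[$key])) {
                throw new InvalidArgumentException("No conversion for {$key}");
            }
            return $v * $factors[$key];
        },
    ];
}

// Applies each field's rule chain in order; fields absent from the item are skipped.
function applyRules(array $item, array $rules, array $transformers): array
{
    foreach ($rules as $field => $chain) {
        if (!array_key_exists($field, $item)) {
            continue;
        }
        foreach ($chain as $rule) {
            $type = $rule['type'];
            if (!isset($transformers[$type])) {
                throw new InvalidArgumentException("Unknown transformer: {$type}");
            }
            $item[$field] = $transformers[$type]($item[$field], $rule);
        }
    }

    return $item;
}
```

Calling applyRules($item, $transformRules, defaultTransformers()) runs the chains from the config. Throwing on an unknown rule type surfaces config typos at the first element instead of silently skipping a step.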