Setting up data transformation during parsing for 1C-Bitrix
Data from external sources almost never matches your catalog structure. Names arrive in upper case, prices carry a currency symbol inside the string, sizes are given in inches instead of centimeters, and categories don't match the info block sections. Without a transformation layer between the parser and the importer, data enters the catalog "as is", which breaks filters, sorting, and product card display.
Typical transformations
Text normalization:
- Case conversion: `ШУРУПОВЕРТ BOSCH GSR 18V` → `Шуруповерт Bosch GSR 18V`. `mb_convert_case()` in `MB_CASE_TITLE` mode handles most words but mangles abbreviations and model codes (`GSR` becomes `Gsr`). You therefore need an exception list of tokens that must keep their spelling (SKUs, brands).
- Whitespace cleanup: collapse repeated spaces and remove non-breaking spaces (`\xC2\xA0`) and zero-width characters.
- HTML entity decoding: `&amp;` → `&`, `&quot;` → `"`.
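Here is a minimal PHP sketch of these three text steps. It assumes UTF-8 input. `normalizeName()` and the `FIXED_TOKENS` exception list are hypothetical names. The list maps an upper-cased token to the spelling it must keep. The rule that tokens containing digits (such as `18V`) are left untouched is an assumption of this sketch, not a Bitrix convention.

```php
<?php
// Hypothetical exception list: upper-cased token => spelling it must keep.
const FIXED_TOKENS = [
    'GSR' => 'GSR',
    'AEG' => 'AEG',
];

function normalizeName(string $raw, array $fixedTokens = FIXED_TOKENS): string
{
    // 1. Decode HTML entities (&amp; -> &, &quot; -> ", &nbsp; -> U+00A0).
    $s = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // 2. Remove zero-width characters (U+200B..U+200D and U+FEFF).
    $s = preg_replace('/[\x{200B}-\x{200D}\x{FEFF}]/u', '', $s);
    // 3. Turn non-breaking spaces into normal spaces, then collapse whitespace runs.
    $s = str_replace("\xC2\xA0", ' ', $s);
    $s = trim(preg_replace('/\s+/u', ' ', $s));

    $words = explode(' ', $s);
    foreach ($words as $i => $word) {
        $key = mb_strtoupper($word, 'UTF-8');
        if (isset($fixedTokens[$key])) {
            $words[$i] = $fixedTokens[$key]; // brand or model code with a fixed spelling
        } elseif (preg_match('/\d/', $word)) {
            continue;                        // assumption: tokens with digits (18V) stay as-is
        } else {
            $words[$i] = mb_convert_case($word, MB_CASE_TITLE, 'UTF-8');
        }
    }
    return implode(' ', $words);
}
```

Entities are decoded first on purpose. `&nbsp;` turns into a non-breaking space, and the next step then replaces it with a normal space.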
Price normalization:
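- Extracting a number from a string: `"1 299,00 руб."` → `1299.00`. Strip everything except digits and separators with `preg_replace('/[^\d,.]/', '', $price)`, then replace the comma with a period. Note that the period in `руб.` survives this regex and leaves a trailing separator (`1299,00.`), which has to be trimmed.
- Currency conversion at a fixed rate or via the Central Bank of Russia API.
- Rounding to kopecks: `round($price, 2)`.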
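Below is one way to make price extraction tolerate both `1 299,00` and `1,299.00`. `extractPrice()` is a hypothetical helper. It assumes that the last comma or period followed by one or two digits is the decimal separator, and that any other separators group thousands. It also trims the trailing period left over from `руб.`.

```php
<?php
/**
 * Sketch: parse a price string into a float, or null if it holds no digits.
 * Heuristic (assumption): "1,29" means 1.29, while "1,299" means 1299.
 */
function extractPrice(string $raw): ?float
{
    // Keep only digits and separators (also drops currency text and NBSP bytes).
    $s = preg_replace('/[^\d,.]/', '', $raw);
    // "руб." leaves a trailing period; drop separators at the end only.
    $s = rtrim($s, ',.');
    if ($s === '') {
        return null;
    }
    if (preg_match('/^(.*)[,.](\d{1,2})$/', $s, $m)) {
        // Last separator + 1-2 digits = decimal part; other separators are thousands.
        $number = preg_replace('/[,.]/', '', $m[1]) . '.' . $m[2];
    } else {
        // No decimal part: every separator is a thousands separator.
        $number = preg_replace('/[,.]/', '', $s);
    }
    return round((float) $number, 2);
}
```

With this heuristic, `extractPrice('1 299,00 руб.')` returns `1299.0`. A string with no digits returns `null`, which the validation stage can treat as a missing price.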
Property normalization:
- Units of measurement: `"120 мм"` → value `120`, unit `мм`. Parse with the regex `/^([\d.,]+)\s*([а-яА-Яa-zA-Z]+)$/u`.
- Boolean values: `"Да"`, `"Yes"`, `"+"`, `"true"`, `"1"` → `Y` for checkbox-type properties in Bitrix.
- List values: map each external value to the XML_ID of a property list variant. Keep the correspondence table in the database or in a config array.
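The three property rules can be sketched as small pure functions. All function names here are hypothetical. The unit pattern uses `\p{L}` (any Unicode letter) instead of an explicit range, so that `ё` and Latin letters are also accepted. The value-to-XML_ID map would in practice come from the database table or config array mentioned above.

```php
<?php
/** Splits "120 мм" into ['value' => 120.0, 'unit' => 'мм']; null if it does not match. */
function parseMeasure(string $raw): ?array
{
    $raw = trim(str_replace("\xC2\xA0", ' ', $raw)); // treat NBSP as a normal space
    if (!preg_match('/^([\d.,]+)\s*(\p{L}+)$/u', $raw, $m)) {
        return null;
    }
    return ['value' => (float) str_replace(',', '.', $m[1]), 'unit' => $m[2]];
}

/** Maps common truthy spellings to "Y"; anything else returns null (property left empty). */
function toCheckboxValue(string $raw): ?string
{
    $truthy = ['да', 'yes', '+', 'true', '1'];
    return in_array(mb_strtolower(trim($raw), 'UTF-8'), $truthy, true) ? 'Y' : null;
}

/** Looks up a list variant XML_ID; $map (lower-cased external value => XML_ID) is hypothetical. */
function toEnumXmlId(string $raw, array $map): ?string
{
    return $map[mb_strtolower(trim($raw), 'UTF-8')] ?? null;
}
```

Returning `null` for unknown values, rather than guessing, lets the pipeline log the unmapped value and extend the map later.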
Category mapping
The source's category structure doesn't match the info block sections. The solution is a mapping table:
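```php
$categoryMap = [
    'Электроинструмент/Дрели' => 15,       // Power tools/Drills -> info block section ID
    'Электроинструмент/Шуруповёрты' => 16, // Power tools/Drill drivers
    'Ручной инструмент/Отвёртки' => 22,    // Hand tools/Screwdrivers
];
// Unmapped categories fall back to the "No category" section
$sectionId = $categoryMap[$externalCategory] ?? DEFAULT_SECTION_ID;
```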
Products whose category is missing from the mapping go to a "No category" section, and the miss is logged. Auto-creating sections is dangerous: a single data error is enough to fill the catalog with junk sections.
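The following sketches the fallback-and-log behavior, assuming the `$categoryMap` format above. `resolveSection()` is hypothetical, and `$fallbackId` plays the role of `DEFAULT_SECTION_ID`. Unmapped paths are counted in `$misses` so they can be logged once per run (for example with PHP's `error_log()`) instead of once per product. Normalizing spaces around `/` is an assumption about how inconsistent the source paths are.

```php
<?php
/**
 * Resolves an external category path to a section ID. Unmapped paths return
 * $fallbackId (the "No category" section) and are counted in $misses.
 */
function resolveSection(string $externalCategory, array $categoryMap, int $fallbackId, array &$misses): int
{
    // "Электроинструмент / Дрели" and "Электроинструмент/Дрели" map to the same key.
    $key = preg_replace('/\s*\/\s*/u', '/', trim($externalCategory));
    if (isset($categoryMap[$key])) {
        return $categoryMap[$key];
    }
    $misses[$key] = ($misses[$key] ?? 0) + 1; // products per unmapped category
    return $fallbackId;
}
```

Once the run finishes, `$misses` lists every unmapped path with its product count. That is the list someone has to add to the mapping table.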
Image processing
Images from external sources need processing before they are uploaded to Bitrix:
- Resize: the source provides a 4000×3000 image, but the catalog needs at most 1200×1200. Use `\Bitrix\Main\File\Image\Imagick` or GD to resize before upload.
- Format: convert WebP to JPEG if Bitrix is configured without WebP support.
- Watermarks: they cannot be removed without quality loss, but a watermark in a corner can be cropped out.
- Duplicates: a product may come with five identical photos. Compare the files' MD5 hashes before upload. This catches byte-identical copies only, not re-encoded or resized versions.
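The resize dimensions and the MD5 duplicate check can be sketched in plain PHP. `fitWithin()` and `uniqueImages()` are hypothetical helpers. The actual resampling would still be done by Imagick or GD, and the default 1200×1200 limit comes from the example above.

```php
<?php
/** Target size that fits inside $maxW x $maxH, keeping the aspect ratio and never upscaling. */
function fitWithin(int $w, int $h, int $maxW = 1200, int $maxH = 1200): array
{
    if ($w <= 0 || $h <= 0) {
        throw new InvalidArgumentException('Image dimensions must be positive');
    }
    $scale = min($maxW / $w, $maxH / $h, 1.0); // 1.0 cap: small images stay as they are
    return [(int) round($w * $scale), (int) round($h * $scale)];
}

/** Keeps the first file for each MD5 hash; unreadable files are skipped. */
function uniqueImages(array $paths): array
{
    $seen = [];
    $unique = [];
    foreach ($paths as $path) {
        $hash = is_readable($path) ? md5_file($path) : false;
        if ($hash === false || isset($seen[$hash])) {
            continue; // unreadable file or byte-identical duplicate
        }
        $seen[$hash] = true;
        $unique[] = $path;
    }
    return $unique;
}
```

For the 4000×3000 example, `fitWithin(4000, 3000)` returns `[1200, 900]`.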
Pre-import validation
Transformed data must be validated before it is written to the info block:
| Field | Rule | Action on violation |
|---|---|---|
| NAME | Not empty, 3–255 characters | Skip element, log |
| XML_ID | Unique, not empty | Skip (duplicate) |
| PRICE | Number > 0 | Set to 0, mark for review |
| SECTION_ID | Existing section | Place in "No category" |
| PREVIEW_PICTURE | File exists, size < 10 MB | Import without image |
Validation runs as a separate pipeline stage between transformation and import. Elements that fail validation go to a separate `parser_rejected` table together with the rejection reason.
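The validation table can be expressed as a single function. `validateElement()`, the `NEEDS_REVIEW` flag, and the way existing section IDs and already-seen XML_IDs are passed in are all assumptions of this sketch. "10 MB" is read as 10 × 1024 × 1024 bytes. A rejected result carries the `reason` that would be stored in `parser_rejected`.

```php
<?php
/**
 * Applies the validation rules to one transformed element.
 * Returns ['status' => 'ok'|'rejected', 'element' => array, 'reason' => ?string].
 */
function validateElement(array $el, array $existingSections, array &$seenXmlIds, int $noCategoryId): array
{
    $len = mb_strlen(trim((string) ($el['NAME'] ?? '')), 'UTF-8');
    if ($len < 3 || $len > 255) {
        return ['status' => 'rejected', 'element' => $el, 'reason' => 'NAME must be 3-255 characters'];
    }
    $xmlId = (string) ($el['XML_ID'] ?? '');
    if ($xmlId === '' || isset($seenXmlIds[$xmlId])) {
        return ['status' => 'rejected', 'element' => $el, 'reason' => 'XML_ID empty or duplicate'];
    }
    $seenXmlIds[$xmlId] = true;

    $price = $el['PRICE'] ?? null;
    if (!is_numeric($price) || $price <= 0) {
        $el['PRICE'] = 0;
        $el['NEEDS_REVIEW'] = true; // hypothetical flag for manual review
    }
    if (!in_array($el['SECTION_ID'] ?? null, $existingSections, true)) {
        $el['SECTION_ID'] = $noCategoryId; // "No category" section
    }
    $pic = $el['PREVIEW_PICTURE'] ?? null;
    if ($pic !== null && (!is_file($pic) || filesize($pic) >= 10 * 1024 * 1024)) {
        unset($el['PREVIEW_PICTURE']); // import without image
    }
    return ['status' => 'ok', 'element' => $el, 'reason' => null];
}
```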
Transformation configuration
Transformation rules should be configurable rather than hardcoded. Example config format:
```php
$transformRules = [
    'NAME' => [
        ['type' => 'trim'],
        ['type' => 'mb_title_case'],
        ['type' => 'max_length', 'value' => 255],
    ],
    'PRICE' => [
        ['type' => 'extract_number'],
        ['type' => 'multiply', 'value' => 1.2], // 20% markup
        ['type' => 'round', 'value' => 2],
    ],
    'PROPERTY_WEIGHT' => [
        ['type' => 'extract_number'],
        ['type' => 'convert_unit', 'from' => 'kg', 'to' => 'g'],
    ],
];
```
Each rule type is backed by a separate transformer function, and a field's rule chain is applied in order. This lets you change the transformation logic without touching the parser code.
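Here is a minimal rule engine for the config format above. The transformer names match the config's `type` values, but their implementations are assumptions. `extract_number` treats spaces as thousands separators and takes the first number it finds, and `convert_unit` only knows kilograms and grams. `makeTransformers()` and `applyRules()` are hypothetical names, and arrow functions require PHP 7.4 or later.

```php
<?php
/** Registry: rule type => function(value, rule) returning the new value. */
function makeTransformers(): array
{
    return [
        'trim' => fn($v, array $r) => trim((string) $v),
        'mb_title_case' => fn($v, array $r) => mb_convert_case((string) $v, MB_CASE_TITLE, 'UTF-8'),
        'max_length' => fn($v, array $r) => mb_substr((string) $v, 0, $r['value'], 'UTF-8'),
        'extract_number' => function ($v, array $r) {
            // Assumption: spaces (incl. NBSP) group thousands; "," or "." is the decimal mark.
            $s = str_replace([' ', "\xC2\xA0"], '', (string) $v);
            return preg_match('/\d+(?:[.,]\d+)?/', $s, $m) ? (float) str_replace(',', '.', $m[0]) : 0.0;
        },
        'multiply' => fn($v, array $r) => $v * $r['value'],
        'round' => fn($v, array $r) => round($v, $r['value']),
        'convert_unit' => function ($v, array $r) {
            $grams = ['kg' => 1000.0, 'g' => 1.0]; // grams per unit; extend as needed
            return $v * $grams[$r['from']] / $grams[$r['to']];
        },
    ];
}

/** Applies each field's rule chain in order; fields absent from the item are skipped. */
function applyRules(array $item, array $rules, array $transformers): array
{
    foreach ($rules as $field => $chain) {
        if (!array_key_exists($field, $item)) {
            continue;
        }
        $value = $item[$field];
        foreach ($chain as $rule) {
            $fn = $transformers[$rule['type']] ?? null;
            if ($fn === null) {
                throw new InvalidArgumentException("Unknown transform rule: {$rule['type']}");
            }
            $value = $fn($value, $rule);
        }
        $item[$field] = $value;
    }
    return $item;
}
```

To add a transformation, you register one more closure in `makeTransformers()`. The parser itself only calls `applyRules()`, and an unknown rule type fails loudly instead of being silently ignored.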