Parser and Auto-Population Services for 1C-Bitrix

Our company develops, supports, and maintains Bitrix and Bitrix24 solutions of any complexity, from simple one-page sites to complex online stores and CRM systems with 1C and telephony integration. Our developers' expertise is confirmed by vendor certificates.

Parsers and Auto-Population for 1C-Bitrix

XMLReader, not SimpleXML — that's where working with an 800 MB supplier catalog begins. SimpleXML pulls the entire file into memory, and PHP crashes with a fatal error at the 512 MB limit. XMLReader reads as a stream, node by node, consuming 20-30 MB regardless of file size. Every auto-population project we do on Bitrix starts with this detail.
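The streaming approach is the same in any language. A minimal sketch in Python, whose `iterparse` is the closest stdlib analog of PHP's XMLReader; the `offer` element name and field layout here are illustrative, not a real feed schema:

```python
import io
import xml.etree.ElementTree as ET

def stream_offers(source):
    """Yield one offer dict at a time without loading the whole file."""
    # iterparse emits events as the file is read, so memory stays flat
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "offer":
            yield {
                "sku": elem.get("id"),
                "price": elem.findtext("price"),
            }
            elem.clear()  # free the processed subtree immediately

# Usage with an in-memory sample; a real run would pass a file path
xml = (b'<catalog><offer id="A1"><price>100</price></offer>'
       b'<offer id="A2"><price>200</price></offer></catalog>')
offers = list(stream_offers(io.BytesIO(xml)))
```

The `elem.clear()` call is the whole point: without it, processed nodes accumulate and the memory advantage over a DOM parser disappears.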

What Parsing Actually Does

  • Initial catalog population — 15,000 product cards with descriptions, specifications, photos. Manually, that's three months of a content manager's work; a parser — one week including debugging.
  • Competitor price monitoring — data collection from Ozon, Wildberries, competitor websites. A competitor dropped the price on a popular item — you find out in two hours, not two weeks.
  • Supplier aggregation — five price lists in different formats (CSV in CP1251, XML in CommerceML, Excel with merged cells) turn into a unified catalog with a common infoblock property system.
  • Product card enrichment — pulling specifications, manuals, 3D models from manufacturer websites. Without this, a product card is an SEO dead end.
  • Assortment updates — products missing from the supplier's feed are deactivated via CIBlockElement::Update($ID, ['ACTIVE' => 'N']). New ones are created. The catalog stays synchronized.
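The assortment-update logic in the last bullet reduces to a set diff between the feed and the catalog. A minimal Python sketch of that diff; field names are illustrative, and in Bitrix each SKU in the "deactivate" set would be passed to CIBlockElement::Update:

```python
def plan_sync(feed_skus, catalog):
    """Split the feed/catalog diff into create, update, and deactivate sets.

    `catalog` maps SKU -> active flag ('Y'/'N'), mirroring Bitrix's ACTIVE field.
    """
    feed = set(feed_skus)
    existing = set(catalog)
    return {
        "create": sorted(feed - existing),            # new in the feed
        "update": sorted(feed & existing),            # present in both
        "deactivate": sorted(                         # gone from the feed
            sku for sku in existing - feed if catalog[sku] == "Y"
        ),
    }

plan = plan_sync(
    ["A1", "A3"],
    {"A1": "Y", "A2": "Y", "A4": "N"},
)
```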

Sources and Tools

Static websites — PHP (Goutte, Symfony DomCrawler) or Python (Scrapy, lxml). Speed: 50-100 pages/sec. Sufficient for catalogs without JS rendering.

SPAs and dynamic websites — Puppeteer or Playwright. Infinite scroll, AJAX filters, lazy-loaded images — a headless browser handles all of it. Speed drops to 1-10 pages/sec, but there's no alternative: the data only exists after JavaScript execution.

Supplier files:

  • Excel (XLS, XLSX) — PhpSpreadsheet. Be careful with merged cells and formulas — they break automatic mapping.
  • CSV — fgetcsv() with proper encoding. Suppliers love CP1251, BOM in UTF-8, and semicolons instead of commas. All of this needs to be detected and handled.
  • XML/YML — XMLReader for large files, SimpleXML for feeds up to 50 MB.
  • CommerceML — the standard exchange format with 1C. We parse import.xml and offers.xml, mapping to the infoblock structure.
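The encoding and delimiter pitfalls above can be handled in a few lines. A hedged Python sketch: the fallback order is an assumption, and production code should verify the decoded text rather than trust the first decode that succeeds:

```python
import csv
import io

def read_supplier_csv(raw: bytes):
    """Decode a supplier CSV, handling BOM, CP1251, and semicolon delimiters."""
    # utf-8-sig strips a UTF-8 BOM if present; CP1251 is the last resort
    for encoding in ("utf-8-sig", "utf-8", "cp1251"):
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    # Let csv guess between ';' and ',' from the header line
    dialect = csv.Sniffer().sniff(text.splitlines()[0], delimiters=";,")
    return list(csv.reader(io.StringIO(text), dialect))

rows = read_supplier_csv("sku;price\nA1;100\n".encode("utf-8"))
```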

API — supplier REST endpoints, marketplace APIs (Ozon Seller API, Wildberries API). We work within rate limits and handle pagination.
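Pagination plus rate limiting follows one pattern regardless of the API. A Python sketch with a stubbed endpoint standing in for a real marketplace call; the function names and page-size convention are illustrative:

```python
import time

def fetch_all(fetch_page, per_page=100, delay=0.5):
    """Collect every item from a paginated endpoint, pausing between
    calls to stay under the supplier's rate limit."""
    items, page = [], 1
    while True:
        batch = fetch_page(page=page, limit=per_page)
        items.extend(batch)
        if len(batch) < per_page:  # a short page means we reached the end
            break
        page += 1
        time.sleep(delay)          # 0.5 s between calls ~= 2 requests/second
    return items

# Stub standing in for a real marketplace API client
data = list(range(230))
def fake_page(page, limit):
    return data[(page - 1) * limit : page * limit]

result = fetch_all(fake_page, per_page=100, delay=0)
```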

Auto-Population Pipeline

Four stages. Each can fail in its own way.

1. Collection. The parser crawls sources on a cron schedule. Raw data goes into an intermediate table — not directly into b_iblock_element. We log everything: how many pages were crawled, how many items were parsed, where we got a 403 or timeout. Without logs, debugging a parser is guesswork.

2. Normalization. This is where the main work happens:

  • Cleaning HTML tags, extra whitespace, Unicode garbage
  • Units of measurement brought to a single form: "millimeters", "millimeter", and the Cyrillic "мм" all become "mm"
  • Mapping supplier categories → Bitrix infoblock sections. One supplier has "Laptops," another has "Laptops and Tablets," a third has "Notebooks" — all go into a single section
  • Deduplication by SKU, EAN/GTIN. One product from three suppliers shouldn't appear three times
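A minimal Python sketch of these normalization steps; the mapping tables are illustrative stubs, where real ones live in the database and are far larger:

```python
import html
import re

UNIT_MAP = {"millimeters": "mm", "millimeter": "mm", "mm": "mm"}
CATEGORY_MAP = {"Laptops": "notebooks", "Laptops and Tablets": "notebooks",
                "Notebooks": "notebooks"}

def normalize(item):
    """Clean one raw parsed record before it touches the catalog."""
    name = html.unescape(re.sub(r"<[^>]+>", "", item["name"]))  # strip tags
    name = re.sub(r"\s+", " ", name).strip()                    # collapse whitespace
    return {
        "name": name,
        "unit": UNIT_MAP.get(item["unit"].strip().lower(), item["unit"]),
        "section": CATEGORY_MAP.get(item["category"], item["category"]),
        "sku": item["sku"].upper(),
    }

def dedupe(items):
    """Keep the first record per SKU: one product from three suppliers."""
    seen, out = set(), []
    for it in items:
        if it["sku"] not in seen:
            seen.add(it["sku"])
            out.append(it)
    return out

clean = normalize({"name": " <b>Laptop&nbsp;X</b> ", "unit": "Millimeters",
                   "category": "Laptops and Tablets", "sku": "a1"})
```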

3. Loading into Bitrix. Via CIBlockElement::Add() for new items, CIBlockElement::Update() for existing ones. Images: download, resize via CFile::ResizeImageGet(), convert to WebP. Properties — via CIBlockElement::SetPropertyValuesEx(). SEO meta via \Bitrix\Iblock\InheritedProperty\ElementValues. SEF URLs generated from transliterated names.

4. Updates. The key point — don't overwrite manual edits by content managers. We only update price, stock, and active status. Descriptions and photos that were manually refined are flagged with UF_MANUAL_EDIT in element properties and skipped during import. Products missing from the feed are deactivated but not deleted.
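The "don't overwrite manual edits" rule from step 4, sketched in Python; field names are simplified, and in the real system the flag is the UF_MANUAL_EDIT property mentioned above:

```python
SYNC_FIELDS = ("price", "stock", "active")  # always safe to overwrite

def build_update(current, incoming):
    """Return only the fields the import may change for one element.

    Elements flagged by a content manager keep their manually refined
    description and photos untouched.
    """
    update = {f: incoming[f] for f in SYNC_FIELDS if f in incoming}
    if not current.get("manual_edit"):
        for f in ("description", "photos"):
            if f in incoming:
                update[f] = incoming[f]
    return update
```

The import applies `build_update` per element, so a flagged card can never lose its hand-written description to a feed refresh.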

Competitor Price Monitoring

A separate subsystem with its own specifics:

  • Frequency: from once daily to every 2 hours, depending on market volatility
  • Matching: by SKU, EAN, and fuzzy name comparison via Levenshtein distance
  • Storage: a custom vendor_price_monitor table with history, not an infoblock
  • Alerts: Telegram/email when a competitor's price deviates by more than X%
  • Auto-rules: "keep the price 3% below the lowest competitor, but not below cost + 15%"

The result — a dashboard: your product vs competitors, price history, trends. The manager sees where they can raise the price without losing position and where they need to react.
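The auto-rule above ("3% below the lowest competitor, but not below cost + 15%") is a small pure function. A Python sketch; in production the thresholds would come from the rule configuration rather than defaults:

```python
def target_price(competitor_prices, cost, undercut=0.03, min_margin=0.15):
    """Keep the price just under the market, but never below the margin floor."""
    floor = cost * (1 + min_margin)
    if not competitor_prices:
        return floor  # no market data: fall back to the margin floor
    candidate = min(competitor_prices) * (1 - undercut)
    return max(candidate, floor)

# Lowest competitor 1000: 3% below is 970; the floor is 800 * 1.15 = 920
price = target_price([1000, 1200, 1500], cost=800)
# The floor wins when competitors dump below a sane margin
floor_price = target_price([900], cost=800)
```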

CSV/XML Import Module

For supplier files — a custom module with an admin panel:

  • Configurable mapping: "column B in the file → BRAND property of the infoblock"
  • Auto-detect encoding (CP1251, UTF-8, UTF-16) via mb_detect_encoding() with verification
  • Image download by URL with a queue — to avoid saturating the connection
  • Incremental updates by row hash: row changed — update; unchanged — skip
  • Cron schedule, report: created 145, updated 892, errors 3 (with details)
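The incremental-update bullet reduces to comparing row fingerprints between runs. A minimal Python sketch; `stored_hashes` is simplified to a dict, while production would keep it in a table keyed by SKU:

```python
import hashlib

def row_hash(row):
    """Stable fingerprint of one CSV row."""
    return hashlib.sha256(";".join(row).encode("utf-8")).hexdigest()

def diff_rows(rows, stored_hashes):
    """Decide per SKU whether to update or skip; return the new hash state."""
    to_update, new_state = [], {}
    for row in rows:
        sku, h = row[0], row_hash(row)
        new_state[sku] = h
        if stored_hashes.get(sku) != h:  # changed or brand-new row
            to_update.append(sku)
    return to_update, new_state

rows = [["A1", "100"], ["A2", "200"]]
state = {}
changed, state = diff_rows(rows, state)        # first run: everything changes
changed_again, state = diff_rows(rows, state)  # second run: nothing to do
```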

Large files: CSV processed in batches of 1000 rows via fgetcsv(), XML streamed via XMLReader, background execution via the Bitrix agent queue — no PHP timeouts.
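The batching pattern, sketched in Python; PHP's fgetcsv() loop is analogous, and the batch size and semicolon delimiter here are assumptions:

```python
import csv
import io

def iter_batches(fileobj, batch_size=1000):
    """Yield rows in fixed-size batches so memory stays flat on huge CSVs."""
    reader = csv.reader(fileobj, delimiter=";")
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch  # hand one batch to the importer, then keep reading
            batch = []
    if batch:
        yield batch      # the final partial batch

sample = io.StringIO("\n".join(f"SKU{i};100" for i in range(2500)))
sizes = [len(b) for b in iter_batches(sample, batch_size=1000)]
```

Each yielded batch maps to one import transaction, which is what keeps the run inside PHP's time limits when the same pattern is driven by the Bitrix agent queue.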

Legal Considerations

  • robots.txt — we respect it. Crawl-delay — we comply
  • Request frequency — 1-2 per second, no more. No need to DDoS someone else's site
  • Manufacturer content — we use it. Unique authored texts — we don't copy
  • Personal data — we don't collect

Our Process

  1. Prototype — a parser for 1-2 sources in 2-3 days. We assess data quality and potential pitfalls (Cloudflare protection, CAPTCHA, dynamic loading).
  2. Development — the full pipeline: parser → normalization → import into Bitrix → admin panel for management.
  3. Testing — we run it on the full catalog volume, checking edge cases (empty fields, broken HTML, corrupted images).
  4. Launch — set up cron, error monitoring via Telegram bot.
  5. Support — a competitor redesigned their layout? We update the CSS selectors in the parser.

Timelines

  • Parser for a single site (static HTML): 3-5 days
  • Parser for an SPA site (Puppeteer/Playwright, protection bypass): 1-2 weeks
  • CSV/XML import module for Bitrix: 1-2 weeks
  • Price monitoring system (5-10 competitors): 2-4 weeks
  • Comprehensive auto-population system: 4-8 weeks
  • Parser support and adaptation: subscription-based