Parser & Auto-filling Services for 1C-Bitrix

Our company develops, supports, and maintains Bitrix and Bitrix24 solutions of any complexity, from simple one-page sites to complex online stores and CRM systems with 1C and telephony integration. Our developers' experience is confirmed by vendor certificates.

Parsers and Auto-Population for 1C-Bitrix

XMLReader, not SimpleXML — that's where working with an 800 MB supplier catalog begins. SimpleXML pulls the entire file into memory, and PHP crashes with a fatal error at the 512 MB limit. XMLReader reads as a stream, node by node, consuming 20-30 MB regardless of file size. Every auto-population project we do on Bitrix starts with this detail.
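The streaming approach looks roughly like this. A minimal sketch: the feed structure (`<catalog>`/`<offer>` with `sku` and `price`) is a hypothetical example, but the XMLReader loop is the same for any large file.

```php
<?php
// Streaming parse of a (potentially huge) catalog feed with XMLReader.
// Memory stays flat because only one <offer> node is materialized at a time.

$xml = <<<XML
<catalog>
  <offer><sku>A-100</sku><price>1990</price></offer>
  <offer><sku>A-101</sku><price>2490</price></offer>
</catalog>
XML;

$reader = XMLReader::XML($xml); // for files: XMLReader::open($path)
$offers = [];

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'offer') {
        // Materialize just this one node as a small SimpleXML fragment.
        $node = simplexml_load_string($reader->readOuterXml());
        $offers[] = ['sku' => (string)$node->sku, 'price' => (float)$node->price];
    }
}
$reader->close();

print_r($offers);
```

The same loop runs unchanged whether the feed is 2 KB or 800 MB; only the cursor and one node are ever in memory.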

What Parsing Actually Does

  • Initial catalog population — 15,000 product cards with descriptions, specifications, photos. Manually, that's three months of a content manager's work; with a parser, one week including debugging.
  • Competitor price monitoring — data collection from Ozon, Wildberries, competitor websites. A competitor dropped the price on a popular item — you find out in two hours, not two weeks.
  • Supplier aggregation — five price lists in different formats (CSV in CP1251, XML in CommerceML, Excel with merged cells) turn into a unified catalog with a common infoblock property system.
  • Product card enrichment — pulling specifications, manuals, 3D models from manufacturer websites. Without this, a product card is an SEO dead end.
  • Assortment updates — products missing from the supplier's feed are deactivated via CIBlockElement::Update($ID, ['ACTIVE' => 'N']). New ones are created. The catalog stays synchronized.
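The deactivation step from the last bullet reduces to a set difference. A sketch under assumed data shapes (SKU-keyed arrays; the real IDs would come from `CIBlockElement::GetList`):

```php
<?php
// Hypothetical sketch: decide which catalog elements to deactivate after
// a feed import. $feedSkus comes from the supplier feed, $catalogSkus maps
// SKU => Bitrix element ID for currently active elements.

$feedSkus    = ['A-100', 'A-101', 'A-102'];
$catalogSkus = ['A-100' => 11, 'A-101' => 12, 'A-205' => 13];

// Everything in the catalog that the feed no longer mentions.
$toDeactivate = array_diff_key($catalogSkus, array_flip($feedSkus));

foreach ($toDeactivate as $sku => $elementId) {
    // Inside Bitrix this would be:
    // (new CIBlockElement())->Update($elementId, ['ACTIVE' => 'N']);
    echo "deactivate #{$elementId} ({$sku})\n";
}
```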

Sources and Tools

Static websites — PHP (Goutte, Symfony DomCrawler) or Python (Scrapy, lxml). Speed: 50-100 pages/sec. Sufficient for catalogs without JS rendering.

SPAs and dynamic websites — Puppeteer or Playwright. Infinite scroll, AJAX filters, lazy-loaded images — a headless browser handles all of it. Speed drops to 1-10 pages/sec, but there's no alternative: the data only exists after JavaScript execution.

Supplier files:

  • Excel (XLS, XLSX) — PhpSpreadsheet. Be careful with merged cells and formulas — they break automatic mapping.
  • CSV — fgetcsv() with proper encoding. Suppliers love CP1251, BOM in UTF-8, and semicolons instead of commas. All of this needs to be detected and handled.
  • XML/YML — XMLReader for large files, SimpleXML for feeds up to 50 MB.
  • CommerceML — the standard exchange format with 1C. We parse import.xml and offers.xml, mapping to the infoblock structure.
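For the CSV case above, "detected and handled" means roughly this. A minimal sketch, assuming the whole file fits in a string for the demo; the BOM/encoding heuristics are illustrative, not production-grade:

```php
<?php
// Defensive CSV ingestion: strip a UTF-8 BOM, fall back to CP1251 when the
// bytes are not valid UTF-8, and honor semicolon delimiters.

function readSupplierCsv(string $raw, string $delimiter = ';'): array {
    // Strip UTF-8 BOM if present.
    if (str_starts_with($raw, "\xEF\xBB\xBF")) {
        $raw = substr($raw, 3);
    }
    // Heuristic: invalid UTF-8 is assumed to be Windows-1251.
    if (!mb_check_encoding($raw, 'UTF-8')) {
        $raw = mb_convert_encoding($raw, 'UTF-8', 'Windows-1251');
    }
    $rows = [];
    foreach (explode("\n", trim($raw)) as $line) {
        $rows[] = str_getcsv($line, $delimiter, '"', '\\');
    }
    return $rows;
}

$raw  = "\xEF\xBB\xBFsku;price\nA-100;1990\n";
$rows = readSupplierCsv($raw);
print_r($rows); // [['sku', 'price'], ['A-100', '1990']]
```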

API — supplier REST endpoints, marketplace APIs (Ozon Seller API, Wildberries API). We work within rate limits and handle pagination.
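The pagination-plus-rate-limit pattern is API-agnostic. A sketch with an injected `$fetchPage` callable standing in for a real HTTP client; the "short page means last page" convention and the page size are assumptions that vary per API:

```php
<?php
// Generic paginated fetch with a crude rate limit between requests.

function fetchAll(callable $fetchPage, int $perPage = 2, float $delaySec = 0.0): array {
    $all = [];
    for ($page = 1; ; $page++) {
        $items = $fetchPage($page, $perPage);
        $all = array_merge($all, $items);
        if (count($items) < $perPage) { // short page => last page
            break;
        }
        if ($delaySec > 0) {
            usleep((int)($delaySec * 1_000_000)); // stay under the rate limit
        }
    }
    return $all;
}

// Fake API: 5 items served 2 per page.
$data = ['p1', 'p2', 'p3', 'p4', 'p5'];
$fetchPage = fn(int $page, int $per) => array_slice($data, ($page - 1) * $per, $per);

$items = fetchAll($fetchPage, 2);
```

Real marketplace APIs add cursors, tokens, and per-minute quotas, but the loop shape stays the same.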

Auto-Population Pipeline

Four stages. Each can fail in its own way.

1. Collection. The parser crawls sources on a cron schedule. Raw data goes into an intermediate table — not directly into b_iblock_element. We log everything: how many pages were crawled, how many items were parsed, where we got a 403 or timeout. Without logs, debugging a parser is guesswork.

2. Normalization. This is where the main work happens:

  • Cleaning HTML tags, extra whitespace, Unicode garbage
  • Units of measurement: "mm.", "millimeters," "millimeter" all normalize to "mm"
  • Mapping supplier categories → Bitrix infoblock sections. One supplier has "Laptops," another has "Laptops and Tablets," a third has "Notebooks" — all go into a single section
  • Deduplication by SKU, EAN/GTIN. One product from three suppliers shouldn't appear three times
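Two of those steps fit in a few lines. A sketch with an illustrative mapping table; the "first supplier wins" dedup policy is an assumption, real projects rank suppliers:

```php
<?php
// Normalization sketch: canonical unit spellings + dedup by SKU.

$unitMap = ['mm.' => 'mm', 'millimeters' => 'mm', 'millimeter' => 'mm'];

function normalizeUnit(string $unit, array $map): string {
    $key = mb_strtolower(trim($unit));
    return $map[$key] ?? $key;
}

function dedupeBySku(array $rows): array {
    $seen = [];
    foreach ($rows as $row) {
        $seen[$row['sku']] ??= $row; // first supplier wins
    }
    return array_values($seen);
}

$rows = [
    ['sku' => 'A-100', 'supplier' => 'S1'],
    ['sku' => 'A-100', 'supplier' => 'S2'], // same product, second supplier
    ['sku' => 'A-101', 'supplier' => 'S1'],
];

echo normalizeUnit('Millimeters', $unitMap), "\n"; // mm
print_r(dedupeBySku($rows));                       // 2 unique rows
```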

3. Loading into Bitrix. Via CIBlockElement::Add() for new items, CIBlockElement::Update() for existing ones. Images: download, resize via CFile::ResizeImageGet(), convert to WebP. Properties — via CIBlockElement::SetPropertyValuesEx(). SEO meta via \Bitrix\Iblock\InheritedProperty\ElementValues. SEF URLs generated from transliterated names.

4. Updates. The key point — don't overwrite manual edits by content managers. We only update price, stock, and active status. Descriptions and photos that were manually refined are flagged with UF_MANUAL_EDIT in element properties and skipped during import. Products missing from the feed are deactivated but not deleted.
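The "don't clobber manual edits" rule is a field filter applied before the update call. A sketch; the field names and the whitelist are assumptions to match a real infoblock schema:

```php
<?php
// Only machine-owned fields survive when an element is flagged as
// manually edited; everything passes through otherwise.

const ALWAYS_UPDATE = ['PRICE', 'STOCK', 'ACTIVE'];

function buildUpdateFields(array $feedRow, bool $manuallyEdited): array {
    if ($manuallyEdited) {
        return array_intersect_key($feedRow, array_flip(ALWAYS_UPDATE));
    }
    return $feedRow;
}

$feedRow = ['PRICE' => 1990, 'STOCK' => 7, 'ACTIVE' => 'Y', 'DETAIL_TEXT' => 'New text'];

$fields = buildUpdateFields($feedRow, true);
print_r($fields); // DETAIL_TEXT is dropped
```

The filtered array is what would then be passed to `CIBlockElement::Update()`.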

Competitor Price Monitoring

A separate subsystem with its own specifics:

  • Frequency: from once daily to every 2 hours, depending on market volatility
  • Matching: by SKU, EAN, fuzzy name comparison via Levenshtein distance
  • Storage: custom vendor_price_monitor table with history, not an infoblock
  • Alerts: Telegram/email when a competitor's price deviates by more than X%
  • Auto-rules: "keep the price 3% below the lowest competitor, but not below cost + 15%"
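The fuzzy-matching step can be sketched with PHP's built-in `levenshtein()`. The 0.85 similarity threshold is an assumption to tune per catalog, and note the byte-based caveat in the comment:

```php
<?php
// Name similarity as 1 - (edit distance / longer length).
// Caveat: levenshtein() works on bytes, so Cyrillic or other multibyte
// names need transliteration or a multibyte-aware distance first.

function nameSimilarity(string $a, string $b): float {
    $a = mb_strtolower(trim($a));
    $b = mb_strtolower(trim($b));
    $maxLen = max(strlen($a), strlen($b));
    if ($maxLen === 0) {
        return 1.0;
    }
    return 1.0 - levenshtein($a, $b) / $maxLen;
}

$ours  = 'Samsung Galaxy S24 256GB';
$their = 'samsung galaxy s24 256 gb';

echo nameSimilarity($ours, $their) >= 0.85 ? "match\n" : "no match\n";
```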

The result — a dashboard: your product vs competitors, price history, trends. The manager sees where they can raise the price without losing position and where they need to react.

CSV/XML Import Module

For supplier files — a custom module with an admin panel:

  • Configurable mapping: "column B in the file → BRAND property of the infoblock"
  • Auto-detect encoding (CP1251, UTF-8, UTF-16) via mb_detect_encoding() with verification
  • Image download by URL with a queue — to avoid saturating the connection
  • Incremental updates by row hash: row changed — update; unchanged — skip
  • Cron schedule, report: created 145, updated 892, errors 3 (with details)
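The row-hash trick from the list above fits in a dozen lines. A sketch: storage is a plain array here, where the module uses a DB table, and the `\x1F` separator is an assumption to avoid join collisions:

```php
<?php
// Classify each incoming row as create / update / skip by comparing its
// hash against the hash stored during the previous import.

function rowHash(array $row): string {
    return md5(implode("\x1F", $row)); // unit separator avoids "a,bc" == "ab,c"
}

function classifyRow(array $row, array &$stored): string {
    $hash = rowHash($row);
    $sku  = $row[0];
    if (!isset($stored[$sku]))   { $stored[$sku] = $hash; return 'create'; }
    if ($stored[$sku] !== $hash) { $stored[$sku] = $hash; return 'update'; }
    return 'skip';
}

$stored = []; // sku => hash from the previous import
echo classifyRow(['A-100', '1990'], $stored), "\n"; // create
echo classifyRow(['A-100', '1990'], $stored), "\n"; // skip
echo classifyRow(['A-100', '2190'], $stored), "\n"; // update
```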

Large files: CSV processed in batches of 1000 rows via fgetcsv(), XML streamed via XMLReader, background execution via the Bitrix agent queue — no PHP timeouts.
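The batching itself is a small generator. A sketch using an in-memory stream and a batch size of 2 to keep the demo short (the text's 1000 is just the argument):

```php
<?php
// Stream a CSV in fixed-size batches so memory stays bounded and each
// batch can be committed to the database separately.

function csvBatches($stream, int $batchSize, string $delimiter = ';'): Generator {
    $batch = [];
    while (($row = fgetcsv($stream, 0, $delimiter, '"', '\\')) !== false) {
        $batch[] = $row;
        if (count($batch) === $batchSize) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch) {
        yield $batch; // trailing partial batch
    }
}

$stream = fopen('php://memory', 'r+');
fwrite($stream, "A-100;1990\nA-101;2490\nA-102;990\n");
rewind($stream);

$batches = iterator_to_array(csvBatches($stream, 2));
// two batches: 2 rows, then 1 row
```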

Legal Considerations

  • robots.txt — we respect it. Crawl-delay — we comply
  • Request frequency — 1-2 per second, no more. No need to DDoS someone else's site
  • Manufacturer content — we use it. Unique authored texts — we don't copy
  • Personal data — we don't collect

Our Process

  1. Prototype — a parser for 1-2 sources in 2-3 days. We assess data quality and potential pitfalls (Cloudflare protection, CAPTCHA, dynamic loading).
  2. Development — the full pipeline: parser → normalization → import into Bitrix → admin panel for management.
  3. Testing — we run it on the full catalog volume, checking edge cases (empty fields, broken HTML, corrupted images).
  4. Launch — set up cron, error monitoring via Telegram bot.
  5. Support — a competitor redesigned their layout? We update the CSS selectors in the parser.

Timelines

  • Parser for a single site (static HTML): 3-5 days
  • Parser for an SPA site (Puppeteer/Playwright, protection bypass): 1-2 weeks
  • CSV/XML import module for Bitrix: 1-2 weeks
  • Price monitoring system (5-10 competitors): 2-4 weeks
  • Comprehensive auto-population system: 4-8 weeks
  • Parser support and adaptation: subscription-based