Parser Development for 1C-Bitrix: Where to Start?
XMLReader, not SimpleXML: the choice of tool determines the project's fate. SimpleXML loads the entire XML document into memory, so an 800 MB supplier file crashes PHP with a fatal error under a 512 MB memory limit. XMLReader streams the file node by node and consumes 20–30 MB, roughly 30 times less than the size of the file itself. Every parser project for Bitrix starts with decisions like this one. With over 10 years of Bitrix development and 50+ parser projects delivered, we know the pitfalls. Contact us to start your parser development today.
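To make the streaming idea concrete, here is a minimal sketch in Python, where `xml.etree.ElementTree.iterparse` plays the role that XMLReader plays in PHP. The `offer`, `name`, and `price` element names are assumptions made for the example, not a format taken from a specific supplier.

```python
import io
import xml.etree.ElementTree as ET


def iter_offers(source, tag="offer"):
    """Yield one dict per <offer> element without building the full tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue
        yield {
            "id": elem.get("id"),
            "name": elem.findtext("name"),
            "price": float(elem.findtext("price", default="0")),
        }
        # Release the element's children and text once the consumer is done
        # with it; only an empty shell stays attached to the parent.
        elem.clear()


sample = io.BytesIO(
    b"<offers>"
    b"<offer id='1'><name>Drill</name><price>10.5</price></offer>"
    b"<offer id='2'><name>Saw</name><price>3</price></offer>"
    b"</offers>"
)
for offer in iter_offers(sample):
    print(offer)
```

The generator hands over one product at a time, so memory use depends on the size of a single offer rather than on the size of the file.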
What Problems Does Parsing Solve?
- Primary catalog filling — 15,000 cards with descriptions, characteristics, photos. Manually, that's three months of content manager work; a parser takes a week with debugging.
- Competitor price monitoring — collecting data from Ozon, Wildberries, competitor sites. A competitor drops the price on a hot item — you find out in two hours, not two weeks.
- Supplier aggregation — five price lists in different formats (CSV with CP1251, XML in CommerceML, Excel with merged cells) become a single catalog with a unified property system.
- Card enrichment — pulling characteristics, instructions, 3D models from manufacturer sites. Without this, a product card is an SEO empty shell.
- Assortment update: products missing from the supplier feed are deactivated via CIBlockElement::Update($ID, ['ACTIVE' => 'N']). New ones are created. The catalog stays synchronized.
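The assortment update in the last item boils down to a set comparison between the catalog and the feed. A minimal sketch in Python, assuming products are matched by SKU (the function name and the returned keys are hypothetical):

```python
def plan_sync(catalog_skus, feed_skus):
    """Split SKUs into create / update / deactivate groups."""
    catalog, feed = set(catalog_skus), set(feed_skus)
    return {
        "create": sorted(feed - catalog),      # in the feed, not yet in the catalog
        "update": sorted(catalog & feed),      # present on both sides
        "deactivate": sorted(catalog - feed),  # vanished from the feed
    }


plan = plan_sync(["A-1", "A-2", "A-3"], ["A-2", "A-3", "A-4"])
print(plan)
```

In a real import, the "deactivate" group would be passed to the Bitrix update call mentioned above, and nothing would be deleted.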
What Tools Do We Use in Parser Development?
Static websites — PHP (Goutte, Symfony DomCrawler) or Python (Scrapy, lxml). Speed: 50–100 pages/sec. Sufficient for catalogs without JS rendering.
SPA and dynamic websites: Puppeteer or Playwright. A headless browser handles infinite scroll, AJAX filters, and lazy-loaded images. Speed drops to 1–10 pages/sec, but there is no alternative: the data exists only after JavaScript execution.
Supplier files:
- Excel (XLS, XLSX): PhpSpreadsheet. Beware of merged cells and formulas, which break automatic mapping.
- CSV: fgetcsv() with the correct encoding. Suppliers love CP1251, a BOM in UTF-8, and semicolons instead of commas. All of these need detection and handling.
- XML/YML: XMLReader for large files, SimpleXML for feeds up to 50 MB.
- CommerceML: the standard exchange format with 1C. We parse import.xml and offers.xml and map them to the information block structure.
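The CSV pitfalls listed above (CP1251, a UTF-8 BOM, semicolons) can be handled with a short detection step. The sketch below is in Python and assumes only two candidate encodings and two candidate delimiters; a production importer would likely need a longer list and stricter validation.

```python
import csv
import io


def decode_supplier_csv(raw: bytes) -> str:
    """Try UTF-8 (with or without BOM) first, fall back to CP1251."""
    try:
        return raw.decode("utf-8-sig")  # "utf-8-sig" drops a leading BOM
    except UnicodeDecodeError:
        return raw.decode("cp1251")


def read_rows(raw: bytes):
    """Return the file as a list of dicts keyed by the header row."""
    text = decode_supplier_csv(raw)
    header = text.split("\n", 1)[0]
    # Pick whichever candidate delimiter is more frequent in the header.
    delimiter = ";" if header.count(";") > header.count(",") else ","
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    return list(reader)


cp1251_file = "Артикул;Цена\r\nA-1;100\r\n".encode("cp1251")
print(read_rows(cp1251_file))
```

Trying UTF-8 first is a deliberate choice: Cyrillic text in CP1251 almost never forms valid UTF-8, so a failed decode is a reasonably strong signal for the fallback.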
API — Supplier REST endpoints, marketplace APIs (Ozon Seller API, Wildberries API). We work within rate limits, handle pagination.
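Working within rate limits while paginating usually means spacing out requests and following a cursor until it runs out. A minimal sketch in Python follows; the response shape (`items`, `next_cursor`) and the `fetch_page` callable are hypothetical and do not describe the Ozon or Wildberries APIs.

```python
import time


def fetch_all(fetch_page, min_interval=0.5, sleep=time.sleep, clock=time.monotonic):
    """Collect items from every page, keeping min_interval seconds between calls."""
    items, cursor, last_call = [], None, None
    while True:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)  # throttle so the API's rate limit is not exceeded
        last_call = clock()
        page = fetch_page(cursor)  # hypothetical callable: cursor -> dict
        items.extend(page["items"])
        cursor = page.get("next_cursor")
        if not cursor:
            return items


# Usage with an in-memory stand-in for a real API client.
fake_pages = {
    None: {"items": [1, 2], "next_cursor": "p2"},
    "p2": {"items": [3], "next_cursor": None},
}
print(fetch_all(lambda cursor: fake_pages[cursor], min_interval=0.0))
```

Passing `sleep` and `clock` as parameters keeps the throttling logic testable without real delays.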
How Is the Auto-Population Pipeline Structured?
Four stages. Each can break in its own way.
- Collection. The parser crawls sources on a cron schedule. Raw data goes to an intermediate table, not directly into b_iblock_element. Log everything: pages visited, elements parsed, where we got a 403 or a timeout. Without logs, debugging a parser is like fortune-telling.
- Normalization. The main work happens here:
  - Clean HTML tags, extra spaces, Unicode garbage
  - Units: "мм" → "mm", "millimeters" → "mm", "миллиметр" → "mm"
  - Map supplier categories to Bitrix information block sections. One supplier has "Notebooks", another "Notebooks and tablets", a third "Laptops": all go into one section
  - Deduplication by SKU, EAN/GTIN. One product from three suppliers should not appear three times
- Load into Bitrix. Via CIBlockElement::Add() for new elements, CIBlockElement::Update() for existing ones. Images: download, resize via CFile::ResizeImageGet(), convert to WebP. Properties via CIBlockElement::SetPropertyValuesEx(). SEO meta via \Bitrix\Iblock\InheritedProperty\ElementValues. SEF URLs are generated from the transliterated name.
- Update. The key point: do not overwrite the content manager's manual edits. Update only price, stock, and activity. Descriptions and photos that were edited manually are flagged with the UF_MANUAL_EDIT property and skipped during import. Products missing from the feed are deactivated, not deleted.
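The normalization and update stages can be sketched in a few small functions. This is an illustration in Python under stated assumptions: the field names (`ean`, `sku`, `manual_edit`, `description`, `photos`) are hypothetical stand-ins for the real intermediate-table columns, and the actual write to Bitrix is left out.

```python
import re

# Alias table for units; extend per supplier.
UNIT_ALIASES = {"мм": "mm", "millimeters": "mm", "миллиметр": "mm"}
AUTO_FIELDS = ("price", "stock", "active")    # always refreshed from the feed
PROTECTED_FIELDS = ("description", "photos")  # skipped after manual edits


def normalize_text(value):
    """Strip HTML tags and collapse runs of whitespace."""
    without_tags = re.sub(r"<[^>]+>", " ", value)
    return re.sub(r"\s+", " ", without_tags).strip()


def normalize_unit(unit):
    key = unit.strip().lower().rstrip(".")
    return UNIT_ALIASES.get(key, key)


def deduplicate(items):
    """Keep the first item per EAN, falling back to SKU when EAN is absent."""
    seen, unique = set(), []
    for item in items:
        if item.get("ean"):
            key = ("ean", item["ean"])
        elif item.get("sku"):
            key = ("sku", item["sku"])
        else:
            unique.append(item)  # nothing to match on, keep as is
            continue
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def build_update(existing, incoming):
    """Return only the fields the import is allowed to overwrite."""
    fields = {f: incoming[f] for f in AUTO_FIELDS if f in incoming}
    if not existing.get("manual_edit"):
        fields.update({f: incoming[f] for f in PROTECTED_FIELDS if f in incoming})
    return fields


print(normalize_text("<p>Drill  <b>X1</b></p>"), normalize_unit(" ММ "))
```

Keeping the overwrite rule in one function (`build_update` here) makes it easy to audit which fields an import may touch.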
Why Is Competitor Price Monitoring Necessary?
A separate subsystem with its own specifics:
| Parameter | How It Works |
|---|---|
| Frequency | From once a day to every 2 hours — depends on market volatility |
| Matching | By SKU, EAN, fuzzy name comparison via Levenshtein distance |
| Storage | Separate vendor_price_monitor table with history, not information blocks |
| Alerts | Telegram/email when competitor price deviation exceeds X% |
| Auto-rules | "Keep price 3% below competitor minimum, but not below cost + 15%" |
Result — dashboard: your product vs competitors, price history, trends. The manager sees where to raise price without losing position, and where to react.
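The matching, alert, and auto-rule rows of the table can be expressed as three small functions. The sketch below is in Python and reads "cost + 15%" as cost multiplied by 1.15, which is an assumption; the function names and default values are illustrative only.

```python
def levenshtein(a, b):
    """Edit distance between two strings (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = cur
    return prev[-1]


def target_price(competitor_prices, cost, undercut=0.03, min_margin=0.15):
    """3% below the competitor minimum, but never below cost plus 15%."""
    floor = cost * (1 + min_margin)
    wanted = min(competitor_prices) * (1 - undercut)
    return round(max(wanted, floor), 2)


def needs_alert(own_price, competitor_price, threshold_pct):
    """True when the competitor deviates from our price by more than X percent."""
    deviation = abs(competitor_price - own_price) / own_price * 100
    return deviation > threshold_pct


print(levenshtein("kitten", "sitting"))
print(target_price([1000, 1100], cost=700))
print(needs_alert(1000, 940, threshold_pct=5))
```

For fuzzy name matching, the raw distance is usually normalized by name length before it is compared against a cutoff, since long product names tolerate more edits than short ones.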
CSV/XML Import Module: Customization for Your Format
For supplier files — custom module with admin panel:
- Configurable mapping: "column B in file → BRAND property of information block"
- Auto-detect encoding (CP1251, UTF-8, UTF-16) via mb_detect_encoding() with validation
- Download images by URL through a queue, to avoid saturating the channel
- Incremental update by row hash: row changed — update, no — skip
- Cron schedule, report: created 145, updated 892, errors 3 (with details)
Large files: CSV is read with fgetcsv() and processed in batches of 1000 rows (about 10 times faster than handling rows one at a time), XML is streamed via XMLReader, and execution runs in the background through the Bitrix agent queue, so there are no PHP timeouts.
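Incremental update by row hash and batch processing fit together naturally: hash each row, skip the ones whose hash is already stored, and group the rest into batches. A minimal sketch in Python, assuming the first column holds the product key and that previously seen hashes are available as a dict (both are assumptions for the example):

```python
import hashlib
from itertools import islice


def row_hash(row):
    """Stable fingerprint of a row; the separator avoids ambiguous joins."""
    joined = "\x1f".join(str(cell).strip() for cell in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def changed_rows(rows, known_hashes, key_index=0):
    """Yield (key, row, hash) only for new rows or rows whose content changed."""
    for row in rows:
        key, digest = row[key_index], row_hash(row)
        if known_hashes.get(key) != digest:
            yield key, row, digest


def batched(iterable, size=1000):
    """Group any iterable into lists of at most `size` items."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


rows = [["A-1", "Drill", "100"], ["A-2", "Saw", "50"]]
known = {"A-1": row_hash(["A-1", "Drill", "100"])}
for batch in batched(changed_rows(rows, known), size=1000):
    print([key for key, _row, _digest in batch])
```

Both helpers are generators, so the file is never held in memory as a whole, and each batch can be written in a single database transaction.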
Legal Aspects to Consider
- robots.txt: respect it. Crawl-delay: comply.
- Request frequency: 1–2 per second, no more. Don't DDoS someone else's site.
- Manufacturer content: use it. Unique author texts: don't copy.
- Personal data: don't collect.
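The robots.txt and request-frequency rules can be enforced in code rather than left to convention. The sketch below uses Python's standard `urllib.robotparser`; the `catalog-parser` user agent and the 0.5-second minimum interval (two requests per second at most) are assumptions chosen to match the limits named above.

```python
from urllib.robotparser import RobotFileParser


def make_policy(robots_txt, user_agent="catalog-parser", min_interval=0.5):
    """Return (allowed, delay): a URL check and the pause between requests."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawl_delay = parser.crawl_delay(user_agent) or 0
    # Honor Crawl-delay when it is stricter than our own limit.
    delay = max(float(crawl_delay), min_interval)

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed, delay


robots = "User-agent: *\nDisallow: /admin/\nCrawl-delay: 2\n"
allowed, delay = make_policy(robots)
print(allowed("https://example.com/catalog/drills/"), delay)
```

A crawler would call `allowed(url)` before each request and sleep for `delay` seconds between requests to the same host.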
What Is Included in a Turnkey Parser Development?
| Component | Description |
|---|---|
| Prototype | Parser for 1–2 sources in 2–3 days to assess data quality |
| Main parser | Full data collection from one source (static/dynamic) |
| Bitrix import module | Normalization, loading, update, mapping admin panel |
| Price monitoring | If needed — collection and alert system (up to 10 competitors) |
| Documentation | Architecture description, selector update instructions |
| Support | 3-month guarantee for uninterrupted operation, fix for donor layout changes |
How We Work and Deadlines
- Prototype — parser for 1–2 sources in 2–3 days. Assess data quality, pitfalls (Cloudflare protection, captcha, dynamic loading).
- Development — full pipeline: parser → normalization → import into Bitrix → admin panel for management.
- Testing — run on full catalog volume, check edge cases (empty fields, malformed HTML, broken images).
- Launch — configure cron, error monitoring via Telegram bot.
- Support — competitor changed layout? Update CSS selectors in parser.
| Task | Deadlines |
|---|---|
| Single site parser (static HTML) | 3–5 days |
| SPA site parser (Puppeteer/Playwright, bypass protection) | 1–2 weeks |
| CSV/XML import module for Bitrix | 1–2 weeks |
| Price monitoring system (5–10 competitors) | 2–4 weeks |
| Comprehensive auto-population system | 4–8 weeks |
| Parser support and adaptation | by subscription |
Get in touch for a free consultation — we will analyze your data sources and propose the optimal parser architecture. Request a project assessment today and get a fixed deadline. We guarantee stable parser operation and full support throughout the usage period.







