Data Parsing from Yandex.Market for 1C-Bitrix
Yandex.Market doesn't provide a public API for bulk product export. The Partner API (Content API) gives sellers access only to their own data. This means that to fill a 1C-Bitrix catalog with Market data, parsing is the only option, with its technical limitations, legal risks, and engineering challenges.
What Exactly Gets Parsed
A product card on Yandex.Market contains:
- Title and description: text, often generated from the characteristics.
- Characteristics: structured key-value pairs (weight, dimensions, material).
- Prices and seller offers: dynamic data, updated in real time.
- Images: from 1 to 15 photos at different resolutions.
- Reviews and rating: user-generated content.
- Category: Market's own category tree, which does not match your catalog structure.
To fill a 1C-Bitrix catalog, you usually need the title, description, characteristics, and images. Parsing prices makes little sense because they change several times a day.
Technical Implementation
Yandex.Market is a single-page application. The HTML page contains minimal markup; the main data loads via internal API calls and is rendered on the client. As a result, a plain HTTP request via cURL returns an empty shell.
There are two parsing approaches:
1. Headless browser (Puppeteer, Playwright). It renders JavaScript, waits for the data to load, and extracts it from the DOM. Reliable but slow: 3–5 seconds per page. For a 10,000-product catalog, that is 8–14 hours of continuous parsing.
2. Internal API interception. Market loads data via XHR requests to internal endpoints. If you reproduce these requests with the required headers and cookies, you get JSON without rendering the page. This is 10–20x faster, but the response format changes without warning.
In practice, a combination is used: a headless browser for initial analysis and session token retrieval, and direct API requests for bulk export.
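The timing figures above follow from simple arithmetic. Here is a minimal sketch in Python (the function name `estimate_hours` is illustrative), assuming strictly sequential requests with no retries, delays, or captcha pauses:

```python
def estimate_hours(pages: int, seconds_per_page: float) -> float:
    """Wall-clock hours for a strictly sequential crawl (no retries, no pauses)."""
    return pages * seconds_per_page / 3600


PAGES = 10_000

# Headless browser: 3-5 seconds per page (figures from the text above).
headless_low = estimate_hours(PAGES, 3)
headless_high = estimate_hours(PAGES, 5)

# Direct API requests: 10-20x faster than the headless browser.
api_best = headless_low / 20
api_worst = headless_high / 10

print(f"headless:   {headless_low:.1f}-{headless_high:.1f} h")
print(f"direct API: {api_best:.1f}-{api_worst:.1f} h")
```

Any deliberate pause between requests is added on top of these numbers, so real runs are likely to take longer than the raw estimate.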
Protection Bypass
Yandex actively blocks automated requests:
- SmartCaptcha — appears after 50–200 requests from one IP.
- Fingerprinting — analyzes TLS fingerprint, headers, behavioral patterns.
- Rate limiting — strict limits on request frequency.
Stable parsing requires:
- Proxy server rotation (residential proxies, not datacenter).
- Request delay randomization (2–10 seconds).
- User-Agent and other header rotation.
- Captcha handling — via recognition services or manual queue.
Without proxy rotation, Market parsing doesn't work: a single IP gets blocked within an hour.
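The pacing and rotation items in the list above can be sketched as a small generator that decides, for each request, which proxy to use and how long to wait first. This is only an illustration in Python: the name `request_plan` and the proxy URLs are hypothetical, and the sketch only plans requests; it does not send them or handle captchas.

```python
import itertools
import random
from typing import Iterator, Optional, Sequence, Tuple


def request_plan(
    proxies: Sequence[str],
    min_delay: float = 2.0,
    max_delay: float = 10.0,
    rng: Optional[random.Random] = None,
) -> Iterator[Tuple[str, float]]:
    """Yield (proxy, delay_seconds) pairs: round-robin proxy, random delay."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    rng = rng or random.Random()
    for proxy in itertools.cycle(proxies):
        yield proxy, rng.uniform(min_delay, max_delay)


# Hypothetical proxy addresses, for illustration only.
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]

plan = request_plan(PROXIES, rng=random.Random(42))
for proxy, delay in itertools.islice(plan, 4):
    # A real crawler would sleep for `delay` and then send the request via `proxy`.
    print(f"wait {delay:.1f}s, then send via {proxy}")
```

Passing a seeded `random.Random` makes a run reproducible for debugging; in production the default unseeded generator is the more sensible choice.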
Data Mapping to Bitrix Infoblock
The Market data structure doesn't match your catalog structure, so a transformation layer is needed:
| Yandex.Market | Bitrix Infoblock | Note |
|---|---|---|
| title | NAME | Trim to 255 characters |
| description | DETAIL_TEXT | Strip HTML tags or keep them |
| specs[] | PROPERTY_* | Map by characteristic name |
| images[] | DETAIL_PICTURE + MORE_PHOTO | Download and save locally |
| categoryPath | IBLOCK_SECTION_ID | Map via correspondence table |
| modelId | XML_ID | Unique ID for deduplication |
Market characteristics are a flat list, while infoblock properties are typed fields. A mapping table is needed: "Weight, g" → PROPERTY_WEIGHT (type: number), "Color" → PROPERTY_COLOR (type: list, looked up by value).
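A transformation layer of this kind might look like the Python sketch below. The field names `title`, `description`, `specs`, and `modelId` follow the table above; everything else is an assumption made for illustration: the `{"name": ..., "value": ...}` shape of a spec entry, the `SPEC_MAP` and `ENUM_IDS` tables, the enum IDs, and the use of a comma as the decimal separator.

```python
import re
from typing import Any, Dict, List

# Hypothetical mapping table: Market characteristic name -> (property code, type).
SPEC_MAP = {
    "Weight, g": ("WEIGHT", "number"),
    "Color": ("COLOR", "list"),
}

# Hypothetical list-property values: property code -> {display value: enum ID}.
ENUM_IDS = {"COLOR": {"Black": 101, "White": 102}}


def to_number(raw: str) -> float:
    """Extract the first number from strings like '1 250,5 g' (comma = decimal)."""
    cleaned = raw.replace("\u00a0", "").replace(" ", "").replace(",", ".")
    match = re.search(r"-?\d+(?:\.\d+)?", cleaned)
    if match is None:
        raise ValueError(f"no number in {raw!r}")
    return float(match.group())


def map_product(item: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a parsed Market item into a field/property structure for import."""
    props: Dict[str, Any] = {}
    unmapped: List[str] = []
    for spec in item.get("specs", []):
        name, value = spec["name"], spec["value"]
        if name not in SPEC_MAP:
            unmapped.append(name)  # keep for review instead of dropping silently
            continue
        code, kind = SPEC_MAP[name]
        if kind == "number":
            props[code] = to_number(value)
        elif kind == "list":
            # None means the infoblock has no matching enum value yet.
            props[code] = ENUM_IDS.get(code, {}).get(value)
        else:
            props[code] = value
    return {
        "NAME": item["title"][:255],  # trim to 255 characters, as in the table
        "DETAIL_TEXT": item.get("description", ""),
        "XML_ID": str(item["modelId"]),
        "PROPERTY_VALUES": props,
        "UNMAPPED_SPECS": unmapped,
    }


item = {
    "title": "Kettle " + "x" * 300,
    "modelId": 12345,
    "specs": [
        {"name": "Weight, g", "value": "1 250,5"},
        {"name": "Color", "value": "Black"},
        {"name": "Warranty", "value": "1 year"},
    ],
}
fields = map_product(item)
```

Collecting unmapped characteristic names rather than discarding them gives a worklist for extending the mapping table as new categories are parsed.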
Loading into Bitrix
The recommended path is intermediate storage. The parser stores data in a separate table or in JSON files. A separate script then reads the intermediate data and imports it via the infoblock API:
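```php
$el = new CIBlockElement();
$elementId = $el->Add($arFields); // Add() is an instance method; returns the new ID or false
CIBlockElement::SetPropertyValuesEx($elementId, $iblockId, $propertyValues);
```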
Importing directly from the parser is dangerous: if the parser breaks midway, the catalog is left with partially filled cards.
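One way to get this guarantee on the parser side is to write each batch to a temporary file and publish it with an atomic rename only after the parser has finished. The Python sketch below assumes JSON Lines as the intermediate format; the function names and file layout are illustrative, not part of any Bitrix tooling.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, List


def write_batch(items: Iterable[Dict[str, Any]], target: Path) -> int:
    """Write items as JSON Lines to a temp file, then atomically publish it."""
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    count = 0
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            for item in items:
                tmp.write(json.dumps(item, ensure_ascii=False) + "\n")
                count += 1
        # The rename happens only after every item was written successfully.
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)  # discard the partial batch
        raise
    return count


def read_batch(path: Path) -> List[Dict[str, Any]]:
    """Load a published batch; an import script would consume these records."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

If the item source raises midway, the partial file is removed and `target` is never created, so the import script only ever sees complete batches.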
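For catalogs over 5,000 products, import speed becomes a concern. The D7 API (\Bitrix\Iblock\ElementTable) is generally faster than the old API for reads, but check \Bitrix\Iblock\ElementTable::add() against your Bitrix version before relying on it: in many releases this method is not implemented and throws an exception that points back to CIBlockElement::Add().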
Keeping Data Current
The first import is only half the task. Market data keeps changing: characteristics are updated, photos are added, products are discontinued.
Update strategies:
- Full reimport: reparse the entire catalog, compare it with the current data, and update what changed. Suitable for catalogs of up to 5,000 items.
- Incremental: parse only the categories where changes were noticed (via RSS feed or update date). Harder to implement, but saves resources.
- On trigger: update a specific product at a manager's request via the admin interface.
| Catalog Size | Update Strategy | Frequency | Estimated Time |
|---|---|---|---|
| Up to 1,000 products | Full reimport | Weekly | 2–4 hours |
| 1,000–10,000 | Incremental | Daily | 4–8 hours |
| Over 10,000 | Incremental + trigger | On schedule | 8–24 hours |
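The "compare with current data" step in these strategies can be reduced to comparing content fingerprints keyed by XML_ID. The Python sketch below assumes a fingerprint of each imported item is stored somewhere on the Bitrix side (for example in a dedicated property or a staging table); that storage, and the function names, are assumptions for illustration.

```python
import hashlib
import json
from typing import Any, Dict, List


def fingerprint(item: Dict[str, Any]) -> str:
    """Stable hash of a parsed item; key order does not affect the result."""
    payload = json.dumps(item, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_catalog(
    stored: Dict[str, str],
    parsed: Dict[str, Dict[str, Any]],
) -> Dict[str, List[str]]:
    """Compare stored fingerprints (by XML_ID) with freshly parsed items."""
    result: Dict[str, List[str]] = {
        "new": [], "changed": [], "unchanged": [], "missing": [],
    }
    for xml_id, item in parsed.items():
        if xml_id not in stored:
            result["new"].append(xml_id)
        elif stored[xml_id] != fingerprint(item):
            result["changed"].append(xml_id)
        else:
            result["unchanged"].append(xml_id)
    # Present in the catalog but absent from the fresh parse: possibly discontinued.
    result["missing"] = [xml_id for xml_id in stored if xml_id not in parsed]
    return result


old = {"title": "Kettle", "specs": {"Color": "Black"}}
stored = {"100": fingerprint(old), "200": fingerprint({"title": "Toaster"})}
parsed = {
    "100": {"title": "Kettle", "specs": {"Color": "White"}},  # color changed
    "300": {"title": "Mixer"},                                # not seen before
}
changes = diff_catalog(stored, parsed)
```

Only the `new` and `changed` IDs need to go through the import script. `missing` items are better queued for manual review than deleted automatically, since a failed parse would look the same.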
Legal Aspect
Parsing Yandex.Market violates the service's terms of use. Yandex may block the IP address or account and, in theory, claim damages. In practice, claims against parsers are rare, but using parsed descriptions and photos "as is" is a risk. It is advisable to rephrase descriptions and check image licenses.







