Parsing data from Yandex.Market for 1C-Bitrix


Yandex.Market does not provide a public API for bulk product export. The Partner API (Content API) gives sellers access only to their own data. So to fill a 1C-Bitrix catalog with Market data, parsing is the only option left, with its technical limitations, legal risks, and engineering challenges.

What Exactly Gets Parsed

A product card on Yandex.Market contains:

  • Title and description: text, often generated from the characteristics.
  • Characteristics: structured key-value pairs (weight, dimensions, material).
  • Prices and seller offers: dynamic data, updated in real time.
  • Images: from 1 to 15 photos in different resolutions.
  • Reviews and rating: user-generated content.
  • Category: the Market rubric tree, which does not match your catalog structure.

To fill a 1C-Bitrix catalog you usually need the title, description, characteristics, and images. Parsing prices makes little sense: they change several times a day.

Technical Implementation

Yandex.Market is a single-page application. The HTML page contains minimal markup; the main data is loaded through internal API calls and rendered on the client. As a result, a regular HTTP request via cURL returns an empty shell.

Two parsing approaches:

1. Headless browser (Puppeteer, Playwright). It renders JavaScript, waits for the data to load, and extracts the DOM. Reliable but slow: 3–5 seconds per page. For a catalog of 10,000 products that is 8–14 hours of continuous parsing.

2. Internal API interception. Market loads data through XHR requests to internal endpoints. If you reproduce these requests with the required headers and cookies, you get JSON without rendering the page. This is 10–20x faster, but the response format changes without warning.

In practice a combination is used: a headless browser for the initial analysis and session token retrieval, and direct API requests for the bulk export.
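The time figures above follow from simple arithmetic. A minimal sketch, assuming strictly sequential processing with no parallel workers, retries, or pauses between requests (the function name is illustrative):

```php
<?php
declare(strict_types=1);

/**
 * Estimate total parsing time in hours for a catalog.
 * Assumes one page at a time: no parallel workers, no retries.
 */
function estimateParsingHours(int $productCount, float $secondsPerPage): float
{
    return $productCount * $secondsPerPage / 3600;
}

// Headless browser at 3–5 seconds per page, 10,000 products.
$headlessMin = estimateParsingHours(10000, 3.0); // about 8.3 hours
$headlessMax = estimateParsingHours(10000, 5.0); // about 13.9 hours

// Direct API requests, taking the 10–20x speed-up quoted above at face value.
$apiBest  = $headlessMin / 20; // about 0.4 hours
$apiWorst = $headlessMax / 10; // about 1.4 hours

printf(
    "Headless: %.1f–%.1f h, direct API: %.1f–%.1f h\n",
    $headlessMin,
    $headlessMax,
    $apiBest,
    $apiWorst
);
```

Delays between requests and captcha handling push the real numbers up, so treat these as a lower bound.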

Protection Bypass

Yandex actively blocks automated requests:

  • SmartCaptcha: appears after 50–200 requests from one IP.
  • Fingerprinting: analyzes the TLS fingerprint, headers, and behavioral patterns.
  • Rate limiting: strict limits on request frequency.

Stable parsing requires:

  • Proxy server rotation (residential proxies, not datacenter ones).
  • Randomized delays between requests (2–10 seconds).
  • Rotation of the User-Agent and other headers.
  • Captcha handling, via recognition services or a manual queue.

Without proxy rotation, Market parsing does not work: a single IP is blocked within an hour.
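The delay and rotation items can be sketched roughly as follows. This assumes PHP 8.0 or newer; the class name, the proxy addresses, and the User-Agent strings are placeholders, not working values.

```php
<?php
declare(strict_types=1);

/** Random pause between requests, in microseconds (2–10 seconds by default). */
function randomDelayMicroseconds(float $minSeconds = 2.0, float $maxSeconds = 10.0): int
{
    return random_int((int) ($minSeconds * 1_000_000), (int) ($maxSeconds * 1_000_000));
}

/** Hands out proxies and User-Agent strings from fixed pools (hypothetical helper). */
final class RequestProfileRotator
{
    private int $position = 0;

    public function __construct(private array $proxies, private array $userAgents)
    {
    }

    /** Next proxy in round-robin order plus a randomly picked User-Agent. */
    public function next(): array
    {
        $proxy = $this->proxies[$this->position % count($this->proxies)];
        $this->position++;

        return [
            'proxy' => $proxy,
            'userAgent' => $this->userAgents[array_rand($this->userAgents)],
        ];
    }
}

// Placeholder pools: substitute your own proxy list and real browser User-Agent strings.
$rotator = new RequestProfileRotator(
    ['http://proxy-a.example:8080', 'http://proxy-b.example:8080'],
    ['ExampleBrowser/1.0', 'ExampleBrowser/2.0']
);

$profile = $rotator->next();
$pause = randomDelayMicroseconds();
// In a real loop: usleep($pause); then send the request with
// CURLOPT_PROXY => $profile['proxy'] and CURLOPT_USERAGENT => $profile['userAgent'].
```

Round-robin spreads the load evenly across proxies; a production setup would also need to drop proxies that start returning captchas.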

Data Mapping to Bitrix Infoblock

The Market data structure does not match your catalog structure, so a transformation layer is needed:

Yandex.Market | Bitrix Infoblock            | Note
title         | NAME                        | Trim to 255 characters
description   | DETAIL_TEXT                 | HTML → clean tags or keep
specs[]       | PROPERTY_*                  | Mapping by characteristic name
images[]      | DETAIL_PICTURE + MORE_PHOTO | Download and save locally
categoryPath  | IBLOCK_SECTION_ID           | Map via correspondence table
modelId       | XML_ID                      | Unique ID for deduplication

Market characteristics are a flat list, while infoblock properties are typed fields. A mapping table is needed: "Weight, g" → PROPERTY_WEIGHT (type: number), "Color" → PROPERTY_COLOR (type: list, matched by value).
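A transformation layer of this kind might look like the sketch below. The shape of the parsed card (keys title, description, specs, modelId, and specs as name/value pairs) is an assumption, as are the SPEC_MAP table and the function name. The output arrays are laid out for the infoblock API calls.

```php
<?php
declare(strict_types=1);

// Hypothetical correspondence table: Market characteristic name => infoblock property.
const SPEC_MAP = [
    'Weight, g' => ['code' => 'WEIGHT', 'type' => 'number'],
    'Color'     => ['code' => 'COLOR',  'type' => 'list'],
];

/**
 * Convert one parsed card into element fields and property values.
 * The input shape is an assumption, not a documented Market format.
 */
function mapCardToInfoblock(array $card, int $iblockId): array
{
    $fields = [
        'IBLOCK_ID'        => $iblockId,
        'XML_ID'           => (string) $card['modelId'],
        'NAME'             => mb_substr(trim($card['title']), 0, 255),
        'DETAIL_TEXT'      => strip_tags($card['description'] ?? ''),
        'DETAIL_TEXT_TYPE' => 'text',
    ];

    $properties = [];
    $unmapped = [];
    foreach ($card['specs'] ?? [] as $spec) {
        $rule = SPEC_MAP[$spec['name']] ?? null;
        if ($rule === null) {
            $unmapped[] = $spec['name']; // no rule yet: keep for manual review
            continue;
        }
        $properties[$rule['code']] = $rule['type'] === 'number'
            ? (float) str_replace(',', '.', $spec['value'])
            : $spec['value']; // list values still need a lookup of the enum ID
    }

    return ['fields' => $fields, 'properties' => $properties, 'unmapped' => $unmapped];
}

$result = mapCardToInfoblock([
    'modelId'     => 12345,
    'title'       => '  Sample kettle  ',
    'description' => '<p>Steel body</p>',
    'specs'       => [
        ['name' => 'Weight, g', 'value' => '850,5'],
        ['name' => 'Color', 'value' => 'Black'],
        ['name' => 'Warranty', 'value' => '1 year'],
    ],
], 7);
```

Collecting unmapped characteristic names instead of silently dropping them makes it easier to extend the mapping table after the first test run.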

Loading into Bitrix

The recommended path is intermediate storage. The parser stores data in a separate table or in JSON files. A separate script reads the intermediate data and imports it through the infoblock API:

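$element = new CIBlockElement();
$elementId = $element->Add($arFields); // Add() is an instance method; returns the new ID or false
CIBlockElement::SetPropertyValuesEx($elementId, $iblockId, $propertyValues);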

Importing directly from the parser is dangerous: if the parser breaks midway, the catalog is left with partially filled cards.
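One possible shape for the intermediate storage is a JSON Lines file: one card per line, so a crash damages at most the last line. The sketch below assumes that format; the function names and the required keys are illustrative.

```php
<?php
declare(strict_types=1);

/** Append one parsed card to the staging file as a single JSON line. */
function stageCard(string $path, array $card): void
{
    $line = json_encode($card, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
    file_put_contents($path, $line . PHP_EOL, FILE_APPEND | LOCK_EX);
}

/** Read staged cards back; broken lines and incomplete cards are skipped. */
function readStagedCards(string $path, array $requiredKeys = ['modelId', 'title']): array
{
    $cards = [];
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $card = json_decode($line, true);
        if (!is_array($card)) {
            continue; // truncated or corrupted line, e.g. the parser died mid-write
        }
        foreach ($requiredKeys as $key) {
            if (!isset($card[$key]) || $card[$key] === '') {
                continue 2; // incomplete card: leave it out of the import
            }
        }
        $cards[] = $card;
    }

    return $cards;
}

$path = tempnam(sys_get_temp_dir(), 'market_');
stageCard($path, ['modelId' => 1, 'title' => 'Complete card']);
stageCard($path, ['modelId' => 2, 'title' => '']);            // incomplete card
file_put_contents($path, '{"modelId": 3, "tit', FILE_APPEND); // simulated crash mid-write

$ready = readStagedCards($path); // only the first card survives validation
unlink($path);
```

The import script then works only with cards that passed validation, so a parser failure never reaches the live catalog.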

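For catalogs over 5,000 products, per-element import overhead becomes noticeable. \Bitrix\Iblock\ElementTable::add() from the D7 API is often suggested as a faster path, but check it on your Bitrix version first: in stock releases of the iblock module the write methods of ElementTable are blocked and return an error instead of creating the element. Where that is the case, keep CIBlockElement for writes, import in batches, and use D7 for fast reads, such as looking up existing XML_ID values.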

Keeping Data Current

The first import is half the task. Market data keeps changing: characteristics are edited, photos are added, products are discontinued.

Update strategies:

  • Full reimport: reparse the entire catalog, compare it with the current data, and update what changed. Suitable for catalogs of up to 5,000 items.
  • Incremental: parse only the categories where changes were noticed (via an RSS feed or update date). Harder to implement, but saves resources.
  • On trigger: update a specific product at a manager's request through the admin interface.

Catalog Size         | Update Strategy       | Frequency   | Estimated Time
Up to 1,000 products | Full reimport         | Weekly      | 2–4 hours
1,000–10,000         | Incremental           | Daily       | 4–8 hours
Over 10,000          | Incremental + trigger | On schedule | 8–24 hours
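
The "compare with current data" step of a full reimport can be sketched with content hashes. This assumes that a hash of each card was saved, keyed by modelId, during the previous import; the function names and the set of hashed fields are illustrative. The "deactivate" list is only meaningful after a full pass, since an incremental run does not see every product.

```php
<?php
declare(strict_types=1);

/** Stable fingerprint of the fields that matter for the catalog. */
function cardHash(array $card): string
{
    $relevant = [
        'title'       => $card['title'] ?? '',
        'description' => $card['description'] ?? '',
        'specs'       => $card['specs'] ?? [],
        'images'      => $card['images'] ?? [],
    ];

    return md5(json_encode($relevant, JSON_THROW_ON_ERROR));
}

/**
 * Compare freshly parsed cards with hashes saved at the previous import.
 * $storedHashes maps modelId => hash. Returns modelId lists per action.
 */
function planUpdate(array $parsedCards, array $storedHashes): array
{
    $plan = ['add' => [], 'update' => [], 'skip' => [], 'deactivate' => []];
    $seen = [];
    foreach ($parsedCards as $card) {
        $id = (string) $card['modelId'];
        $seen[$id] = true;
        if (!isset($storedHashes[$id])) {
            $plan['add'][] = $id;    // not in the catalog yet
        } elseif ($storedHashes[$id] !== cardHash($card)) {
            $plan['update'][] = $id; // content changed since the last import
        } else {
            $plan['skip'][] = $id;   // identical, no write needed
        }
    }
    foreach (array_keys($storedHashes) as $id) {
        if (!isset($seen[(string) $id])) {
            $plan['deactivate'][] = (string) $id; // no longer listed on Market
        }
    }

    return $plan;
}

$kettle = ['modelId' => 101, 'title' => 'Kettle', 'specs' => [['name' => 'Color', 'value' => 'Black']]];
$stored = ['101' => cardHash($kettle), '102' => 'hash-of-a-card-no-longer-listed'];

$plan = planUpdate([
    $kettle,                                      // unchanged since last import
    ['modelId' => 103, 'title' => 'New toaster'], // not imported yet
], $stored);
```

Skipping unchanged cards keeps the number of infoblock writes proportional to the number of real changes rather than to the catalog size.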

Legal Aspect

Parsing Yandex.Market violates the service terms. Yandex may block the IP address or account and, in theory, claim damages. In practice, claims against parsers are rare, but using parsed descriptions and photos "as is" is a risk. It is recommended to rephrase descriptions and check image licenses.