Parsing product descriptions for 1C-Bitrix content


Empty or duplicated descriptions add no SEO value and hurt conversion, yet writing copy by hand for 5,000 products is rarely feasible. Parsing descriptions from manufacturer or distributor websites is a practical way to populate a catalog quickly, provided the process is well organized and the text is not published as a verbatim duplicate.

Sources for descriptions

The choice of source determines both quality and legal risk:

Manufacturer's website: the most relevant content, but often protected against scraping and usually copyrighted. Use it as a basis for rewriting, not as the final text.

Aggregators (Yandex Market, OZON, Wildberries): a large volume of descriptions in a standardized format. The same legal caveats apply.

Authorized distributors: more permissive about content reuse and often actively interested in its distribution.

Manufacturer databases (Icecat, Synccentric): a legal, paid option with an API that provides clean, licensed data.

Extracting and cleaning text

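The description on the source site typically lives inside a container such as <div class="product-description">. Extract it with Symfony DomCrawler (CSS selectors in filter() require the symfony/css-selector package):

$node = $crawler->filter('.product-description'); // adjust the selector to the source site's markup
$description = $node->count() > 0 ? $node->text() : ''; // text() throws on an empty node list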

After extraction, apply these mandatory cleaning steps:

  • strip_tags() to remove HTML (or filter allowed tags)
  • Remove promotional inserts like "Buy in our store"
  • Normalize whitespace and line breaks
  • Remove competitor brand mentions

If HTML formatting must be preserved (bold, lists), extract the markup with the crawler's html() method instead of text() and pass it through HTMLPurifier.
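
A possible shape for the formatting-preserving path, assuming HTMLPurifier is installed through Composer (package ezyang/htmlpurifier). The function name sanitizeDescriptionHtml() and the list of allowed tags are illustrative choices. The strip_tags() fallback is only there so the sketch runs without the library; it keeps attributes on allowed tags and should not be treated as equivalent.

```php
<?php
/**
 * Hypothetical helper: keeps only basic formatting tags in scraped HTML.
 * Uses HTMLPurifier when it is available; otherwise falls back to
 * strip_tags(), which is NOT a safe replacement for untrusted markup.
 */
function sanitizeDescriptionHtml(string $html): string
{
    if (class_exists('HTMLPurifier')) {
        $config = HTMLPurifier_Config::createDefault();
        // Whitelist of elements; everything else is removed, inner text is kept.
        $config->set('HTML.Allowed', 'p,br,b,strong,i,em,ul,ol,li');
        // Disable the definition cache so no writable cache directory is needed.
        $config->set('Cache.DefinitionImpl', null);

        return trim((new HTMLPurifier($config))->purify($html));
    }

    // Fallback for environments without the library (assumption of this sketch).
    return trim(strip_tags($html, '<p><br><b><strong><i><em><ul><ol><li>'));
}
```

The result can be written with the text type set to html. In production the definition cache would normally stay enabled for speed.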

Writing to 1C-Bitrix fields

1C-Bitrix separates descriptions into two fields:

  • PREVIEW_TEXT — short description (for listings)
  • DETAIL_TEXT — full description (for the product card)

When parsing a long description: first paragraph → PREVIEW_TEXT, full text → DETAIL_TEXT. The text type is set via PREVIEW_TEXT_TYPE and DETAIL_TEXT_TYPE (values: text or html).
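
The split can be sketched as follows. The function name splitDescription() and the 300-character preview limit are assumptions made for illustration; the input is assumed to be cleaned plain text with paragraphs separated by newlines, and the mbstring extension is assumed to be available.

```php
<?php
/**
 * Hypothetical helper: derives the two Bitrix text fields from one
 * cleaned plain-text description.
 */
function splitDescription(string $text, int $previewLimit = 300): array
{
    $text = trim($text);

    // The first non-empty paragraph becomes the short description.
    $paragraphs = preg_split('~\n+~', $text, -1, PREG_SPLIT_NO_EMPTY);
    $preview = $paragraphs[0] ?? '';

    // Keep listings compact: cut an overlong first paragraph at a word boundary.
    if (mb_strlen($preview) > $previewLimit) {
        $cut = mb_substr($preview, 0, $previewLimit);
        $lastSpace = mb_strrpos($cut, ' ');
        $preview = rtrim($lastSpace !== false ? mb_substr($cut, 0, $lastSpace) : $cut) . '...';
    }

    return [
        'PREVIEW_TEXT' => $preview,
        'PREVIEW_TEXT_TYPE' => 'text',
        'DETAIL_TEXT' => $text,
        'DETAIL_TEXT_TYPE' => 'text',
    ];
}
```

The returned array uses the field names from this section, so it can be passed to the element update as is. For HTML descriptions both type fields would be html and the split would have to work on paragraph tags instead of newlines.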

Updating an element:

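\Bitrix\Main\Loader::includeModule('iblock'); // needed when running outside a component

$el = new CIBlockElement();
$ok = $el->Update($elementId, [
    'PREVIEW_TEXT' => $shortDesc,
    'PREVIEW_TEXT_TYPE' => 'html',
    'DETAIL_TEXT' => $fullDesc,
    'DETAIL_TEXT_TYPE' => 'html',
]);
if (!$ok) {
    // Update() returns false on failure; the reason is stored in LAST_ERROR
    error_log("Element {$elementId}: {$el->LAST_ERROR}");
}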

Handling already populated cards

Do not overwrite blindly: managers may have improved descriptions by hand. Add the following logic:

  1. If DETAIL_TEXT is empty, write the parsed text and set the DESCRIPTION_SOURCE property to parsed
  2. If DETAIL_TEXT is already populated, leave it untouched; the DESCRIPTION_SOURCE = parsed flag is set only when the parser performs the initial population
  3. On subsequent parser runs, skip elements without the DESCRIPTION_SOURCE flag (their text was written or edited manually)
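
One way to read these rules as code is a small decision function that the parser calls for every element before writing. The function name decideDescriptionAction() and its 'write'/'skip' return values are hypothetical; DESCRIPTION_SOURCE is assumed to be a string property whose value is read by the caller.

```php
<?php
/**
 * Hypothetical decision helper for a parser run.
 * $currentDetailText: DETAIL_TEXT currently stored on the element.
 * $descriptionSource: value of the DESCRIPTION_SOURCE property, or null if unset.
 * Returns 'write' or 'skip'.
 */
function decideDescriptionAction(string $currentDetailText, ?string $descriptionSource): string
{
    // Empty card (markup without visible text counts as empty): populate it.
    if (trim(strip_tags($currentDetailText)) === '') {
        return 'write';
    }

    // Text that the parser itself wrote earlier may be refreshed.
    if ($descriptionSource === 'parsed') {
        return 'write';
    }

    // Populated and no flag: the text was written or edited by a manager.
    return 'skip';
}
```

After a 'write', the parser would update the text fields and set the flag, for example with CIBlockElement::SetPropertyValuesEx($elementId, $iblockId, ['DESCRIPTION_SOURCE' => 'parsed']). If the property is a list type, the value would be an enum ID instead of a string. How the flag is cleared when a manager later edits a parsed text is outside this sketch and has to be handled separately.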

Work timeline

Phase                                                Duration
Analyzing sources, defining extraction structure     2–4 hours
Developing the description parser                    1–2 days
Text cleaning and normalization logic                4–8 hours
Infoblock integration, protecting manual edits       4–6 hours
Testing on 100–200 items                             4 hours

Total: 4–6 working days. If AI-based rewriting via GPT API is needed after parsing, add 1–2 more days for integration.