Parsing Product Descriptions to Populate 1C-Bitrix
Empty or duplicated descriptions mean zero SEO value and poor conversion. But writing copy manually for 5,000 products is not feasible. Parsing descriptions from manufacturer or distributor websites is a practical way to populate a catalog quickly — provided the process is well organized and content duplication is avoided.
Sources for descriptions
The choice of source determines both quality and legal risk:
Manufacturer's website: the most relevant content, but often protected against scraping and usually covered by copyright. Use it as a basis for rewriting, not as the final text.
Aggregators (Yandex Market, OZON, Wildberries): a large volume of descriptions in a standardized format. The same legal caveats apply.
Authorized distributors: more permissive about content reuse and often actively interested in its distribution.
Manufacturer databases (Icecat, Synccentric): a legal option with an API. It is paid, but it provides clean, licensed data.
Extracting and cleaning text
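The description on the source site typically lives inside <div class="product-description"> or a similar container. Extract it via DomCrawler; note that text() throws an InvalidArgumentException when the selector matches nothing, so check count() first on pages that may lack a description: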
$description = $crawler->filter('.product-description')->text();
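For readers who want to try the extraction step without installing Composer packages, here is a minimal sketch that swaps DomCrawler for PHP's bundled DOM extension. The function name extractDescription() is hypothetical, and the product-description class is simply carried over from the example above; a real source site will need its own selector. The function returns null when the container is missing, so the caller can skip the product.

```php
<?php
// Sketch only: uses PHP's bundled DOM extension instead of DomCrawler.
// extractDescription() is a hypothetical helper; the default class name
// is taken from the article's example and will differ per source site.

function extractDescription(string $html, string $class = 'product-description'): ?string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally: real-world markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog hints UTF-8 to libxml, which otherwise assumes ISO-8859-1.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole token, not as a substring of another class.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null; // container not found: let the caller skip this product
    }

    // textContent drops all tags and returns the concatenated text.
    return trim($nodes->item(0)->textContent);
}
```

The class name is interpolated into the XPath query as-is, so this sketch assumes it comes from your own configuration and not from untrusted input.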
After extraction, mandatory cleaning steps:
- Apply strip_tags() to remove HTML (or filter allowed tags)
- Remove promotional inserts like "Buy in our store"
- Normalize whitespace and line breaks
- Remove competitor brand mentions
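The cleaning steps above can be sketched as a single function. This is an illustration under assumptions, not a finished cleaner: cleanDescription() is a hypothetical name, the promo phrases and brand names are supplied by the caller, and the input is assumed to be valid UTF-8. Promotional phrases are removed together with the sentence that contains them, which is one possible policy among several.

```php
<?php
// Sketch only: cleanDescription() is a hypothetical helper. The promo
// phrases and competitor brands are whatever lists the caller maintains.

function cleanDescription(string $html, array $promoPhrases, array $competitorBrands): string
{
    // Turn block boundaries into line breaks before tags are dropped,
    // otherwise adjacent paragraphs would be glued together.
    $text = preg_replace('~<\s*(br|/p|/div|/li)\b[^>]*>~iu', "\n", $html);
    $text = strip_tags($text);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Drop whole sentences that contain a promotional phrase.
    foreach ($promoPhrases as $phrase) {
        $pattern = '~[^.!?\n]*' . preg_quote($phrase, '~') . '[^.!?\n]*[.!?]?~iu';
        $text = preg_replace($pattern, '', $text);
    }

    // Remove competitor brand mentions as whole words.
    foreach ($competitorBrands as $brand) {
        $text = preg_replace('~\b' . preg_quote($brand, '~') . '\b~iu', '', $text);
    }

    // Normalize whitespace: unify line endings, collapse spaces, tabs and
    // non-breaking spaces, trim each line, and cap runs of blank lines.
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    $text = preg_replace('~[ \t\x{00A0}]+~u', ' ', $text);
    $text = preg_replace('~ *\n *~u', "\n", $text);
    $text = preg_replace('~\n{3,}~u', "\n\n", $text);

    return trim($text);
}
```

Removing a brand name can leave an awkward sentence behind, so texts where a brand was stripped are worth flagging for manual review or rewriting.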
If HTML formatting must be preserved (bold, lists), use Symfony\Component\DomCrawler\Crawler::html() and filter through HTMLPurifier.
Writing to 1C-Bitrix fields
1C-Bitrix separates descriptions into two fields:
- PREVIEW_TEXT: short description (for listings)
- DETAIL_TEXT: full description (for the product card)
When parsing a long description: first paragraph → PREVIEW_TEXT, full text → DETAIL_TEXT. The text type is set via PREVIEW_TEXT_TYPE and DETAIL_TEXT_TYPE (values: text or html).
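A minimal sketch of that split, assuming the text has already been cleaned to plain text with paragraphs separated by line breaks. The function name splitDescription() is hypothetical. Because the input here is plain text, the sketch sets both type fields to text; use html if formatting was preserved.

```php
<?php
// Sketch only: splitDescription() is a hypothetical helper. It assumes
// plain-text input in which line breaks separate paragraphs.

function splitDescription(string $cleanText): array
{
    $full = trim($cleanText);
    // Split on one or more line breaks and ignore empty fragments.
    $paragraphs = preg_split('~\n+~u', $full, -1, PREG_SPLIT_NO_EMPTY);

    return [
        'PREVIEW_TEXT'      => $paragraphs[0] ?? '', // first paragraph only
        'PREVIEW_TEXT_TYPE' => 'text',
        'DETAIL_TEXT'       => $full,                // complete description
        'DETAIL_TEXT_TYPE'  => 'text',
    ];
}
```

The returned array uses the same keys as the field array accepted by CIBlockElement::Update(), so it can be passed along directly.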
Updating an element:
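$el = new CIBlockElement();
$ok = $el->Update($elementId, [
    'PREVIEW_TEXT' => $shortDesc,
    'PREVIEW_TEXT_TYPE' => 'html',
    'DETAIL_TEXT' => $fullDesc,
    'DETAIL_TEXT_TYPE' => 'html',
]);
if (!$ok) {
    // Update() returns false on failure; the reason is stored in LAST_ERROR
    error_log("Element {$elementId}: " . $el->LAST_ERROR);
}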
Handling already populated cards
Do not overwrite blindly — managers may have manually improved descriptions. Add the following logic:
- If DETAIL_TEXT is empty, write the parsed text
- When the parser populates the field, set the DESCRIPTION_SOURCE = parsed property, and do this only on initial population
- On subsequent parser runs, skip populated elements without the DESCRIPTION_SOURCE flag (meaning the text was written or edited manually)
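The rules above reduce to a small decision function that can be tested without a Bitrix installation. This is one reading of the rules, not a prescribed implementation: decidePopulation() and its return labels are hypothetical, and the sketch assumes that elements still carrying the flag may be refreshed on later runs. How the flag is cleared when a manager edits a parsed description is not specified in the text and is left out here.

```php
<?php
// Sketch only: decidePopulation() and its return labels are hypothetical.
// DESCRIPTION_SOURCE and the value "parsed" come from the rules above.

function decidePopulation(string $detailText, ?string $descriptionSource): string
{
    // Treat markup with no visible text as an empty description.
    if (trim(strip_tags($detailText)) === '') {
        return 'write_and_flag'; // empty card: write text, set DESCRIPTION_SOURCE = parsed
    }
    if ($descriptionSource === 'parsed') {
        return 'refresh';        // text came from the parser earlier (assumed safe to update)
    }
    return 'skip';               // populated without the flag: treat as manual text
}
```

In a real integration, the write_and_flag branch would call CIBlockElement::Update() and then store the flag, for example with CIBlockElement::SetPropertyValuesEx($elementId, $iblockId, ['DESCRIPTION_SOURCE' => 'parsed']), assuming DESCRIPTION_SOURCE is a string-type property.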
Work timeline
| Phase | Duration |
|---|---|
| Analyzing sources, defining extraction structure | 2–4 hours |
| Developing the description parser | 1–2 days |
| Text cleaning and normalization logic | 4–8 hours |
| Infoblock integration, protecting manual edits | 4–6 hours |
| Testing on 100–200 items | 4 hours |
Total: 4–6 working days. If AI-based rewriting via GPT API is needed after parsing, add 1–2 more days for integration.







