Auto-Filling the News Section from RSS Feeds in 1C-Bitrix

A news section without regular publications loses search rankings and audience. RSS aggregation is a quick way to establish a publication stream from relevant sources: industry media, partner press releases, manufacturer news. The task is technically straightforward, but requires a sound architecture to ensure content uniqueness and proper attribution.

Fetching and Parsing RSS

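RSS is an XML format with a standard structure. Each entry (<item>) may contain title, link, description, pubDate, author, and guid. RSS 2.0 makes all of these optional (only one of title or description is required), so the parser must tolerate missing fields. Atom feeds (<entry>) use different tag names, but the logic is the same.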

Parsing via SimpleXML:

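// Collect libxml errors instead of emitting warnings: the feed may be unreachable or malformed
libxml_use_internal_errors(true);
$rss = simplexml_load_file($feedUrl);
if ($rss === false) {
    return; // skip this feed until the next poll
}
foreach ($rss->channel->item as $item) {
    // content:encoded lives under the "content" namespace prefix
    $full = (string)$item->children('content', true)->encoded;
    $guid = (string)$item->guid;
    $this->processItem([
        'title'   => trim((string)$item->title),
        'link'    => (string)$item->link,
        'content' => $full !== '' ? $full : (string)$item->description,
        'pubDate' => strtotime((string)$item->pubDate) ?: time(), // fall back to now if the date is missing
        'guid'    => $guid !== '' ? $guid : (string)$item->link,  // guid is optional in RSS 2.0
    ]);
}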

<content:encoded> contains the full article text (if the source provides it); <description> is typically a summary.
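
Since Atom follows the same logic with different tag names, both formats can be normalized into one item shape. The sketch below is one possible way to do that. parseFeed() is a hypothetical helper, not a Bitrix API, and it assumes the Atom feed declares Atom as its default namespace, which is the common case.

```php
<?php
/**
 * Normalize an RSS 2.0 or Atom document into a flat list of items.
 * parseFeed() is a hypothetical helper name used for illustration only.
 */
function parseFeed(string $xml): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return [];
    }

    $items = [];
    if ($feed->getName() === 'feed') {
        // Atom: <entry> with <id>, <published>/<updated>, <content>/<summary>
        foreach ($feed->entry as $entry) {
            $link = '';
            foreach ($entry->link as $l) {
                $rel = (string)$l['rel'];
                if ($rel === '' || $rel === 'alternate') {
                    $link = (string)$l['href']; // Atom keeps the URL in an attribute
                    break;
                }
            }
            $date = (string)$entry->published ?: (string)$entry->updated;
            $items[] = [
                'title'   => trim((string)$entry->title),
                'link'    => $link,
                'content' => (string)$entry->content ?: (string)$entry->summary,
                'pubDate' => strtotime($date) ?: null,
                'guid'    => (string)$entry->id ?: $link,
            ];
        }
        return $items;
    }

    // RSS 2.0: <channel><item> with <guid>, <pubDate>, <description>
    foreach ($feed->channel->item as $item) {
        $full = (string)$item->children('content', true)->encoded;
        $link = (string)$item->link;
        $items[] = [
            'title'   => trim((string)$item->title),
            'link'    => $link,
            'content' => $full !== '' ? $full : (string)$item->description,
            'pubDate' => strtotime((string)$item->pubDate) ?: null,
            'guid'    => (string)$item->guid ?: $link,
        ];
    }
    return $items;
}
```

Returning null for an unparseable date leaves the fallback decision (skip the item or use the import time) to the caller.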

Record Deduplication

The same item may appear in multiple feeds or be republished. Deduplicate by guid (the unique identifier of an RSS entry):

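// Requires the iblock module: \Bitrix\Main\Loader::includeModule('iblock')
$existing = CIBlockElement::GetList(
    [],
    ['IBLOCK_ID' => NEWS_IBLOCK_ID, '=PROPERTY_RSS_GUID' => $item['guid']],
    false,
    ['nTopCount' => 1], // one match is enough
    ['ID']              // do not load fields we will not use
)->Fetch();
if ($existing) {
    continue; // already imported
}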

The RSS_GUID property is type S with IS_REQUIRED = N. An alternative for better performance: store processed GUIDs in a separate table or a Redis Set.
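
The separate-store alternative could be shaped roughly as follows. This is a minimal sketch: SeenGuidStore, InMemorySeenGuids, and filterNewItems() are hypothetical names, and the in-memory array only stands in for a Redis Set or a table with a unique key. markSeen() mirrors the "add and report whether it was new" behaviour of a set insert.

```php
<?php
/** Hypothetical contract: remembers which GUIDs have already been imported. */
interface SeenGuidStore
{
    /** Returns true if the GUID was new and has now been recorded. */
    public function markSeen(string $guid): bool;
}

/** In-memory stand-in; a Redis Set or a database table would replace it in production. */
final class InMemorySeenGuids implements SeenGuidStore
{
    /** @var array<string, true> */
    private array $seen = [];

    public function markSeen(string $guid): bool
    {
        // Hash so that long or oddly formatted GUIDs become fixed-length keys
        $key = sha1(trim($guid));
        if (isset($this->seen[$key])) {
            return false;
        }
        $this->seen[$key] = true;
        return true;
    }
}

/** Keep only items whose GUID has not been seen before, recording them as seen. */
function filterNewItems(array $items, SeenGuidStore $store): array
{
    return array_values(array_filter(
        $items,
        static fn(array $item): bool => $store->markSeen($item['guid'])
    ));
}
```

Checking and recording in a single call means a duplicate inside the same batch is also filtered out.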

Storage in the News Info Block

A standard news info block with additional properties for RSS aggregation:

  • RSS_GUID — entry GUID for deduplication
  • RSS_SOURCE — source ID or name (for attribution)
  • ORIGINAL_URL — link to the original (for the canonical tag and the "source" link)
  • AUTO_IMPORTED — auto-import flag (Y/N) to distinguish from manual publications

The publication date from RSS maps to the element's ACTIVE_FROM field. This keeps news sorted by the original publication time rather than by the import time.
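
As an illustration, the mapping from a parsed item to the element fields might look like this. buildNewsFields() and the shape of $item and $source are assumptions made for the sketch. The date format 'd.m.Y H:i:s' is also an assumption about the site settings, and AUTO_IMPORTED is treated as a plain string property.

```php
<?php
/**
 * Build the field array for a news element from a parsed feed item.
 * buildNewsFields() is a hypothetical helper; property codes follow the list above.
 */
function buildNewsFields(array $item, array $source, int $iblockId): array
{
    return [
        'IBLOCK_ID'         => $iblockId,
        'NAME'              => $item['title'],
        'ACTIVE'            => 'Y',
        // Assumed site date format; inside Bitrix, ConvertTimeStamp($ts, 'FULL') is safer
        'ACTIVE_FROM'       => date('d.m.Y H:i:s', $item['pubDate']),
        'PREVIEW_TEXT'      => $item['content'],
        'PREVIEW_TEXT_TYPE' => 'html',
        'PROPERTY_VALUES'   => [
            'RSS_GUID'      => $item['guid'],
            'RSS_SOURCE'    => $source['name'],
            'ORIGINAL_URL'  => $item['link'],
            'AUTO_IMPORTED' => 'Y', // a list-type property would need the enum ID instead
        ],
    ];
}

// Inside Bitrix the array would be passed to the element API, for example:
// $id = (new CIBlockElement())->Add(buildNewsFields($item, $source, NEWS_IBLOCK_ID));
```

Keeping the mapping in a pure function makes it testable without a Bitrix installation.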

Content Processing: Cleanup and Rewriting

Publishing RSS content unprocessed duplicates the source and invites search engine penalties. Processing options:

Minimum: publish the summary (description) with a "read more" link to the original. This is legitimate aggregation — not duplication.

Intermediate: sanitize HTML (HTMLPurifier), remove internal links to the source, rephrase the introduction and headline.

Full AI rewrite: send content:encoded to an LLM such as GPT with instructions to rewrite it in a different style. This is expensive for high-frequency feeds but justified for important content.
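
The minimum option can be sketched as below. makeExcerpt() and renderExcerpt() are hypothetical helpers, the 300-character limit is an arbitrary default, and the code assumes the mbstring extension is available.

```php
<?php
/**
 * Turn feed HTML into a plain-text excerpt cut at a word boundary.
 * makeExcerpt() is a hypothetical helper; the default length is arbitrary.
 */
function makeExcerpt(string $html, int $maxLength = 300): string
{
    // Put a space before each tag so adjacent paragraphs do not merge into one word
    $text = strip_tags(str_replace('<', ' <', $html));
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse whitespace; keep the original text if the input is not valid UTF-8
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? $text);

    if (mb_strlen($text) <= $maxLength) {
        return $text;
    }
    $cut = mb_substr($text, 0, $maxLength);
    $lastSpace = mb_strrpos($cut, ' ');
    if ($lastSpace !== false) {
        $cut = mb_substr($cut, 0, $lastSpace); // drop the partial last word
    }
    return rtrim($cut, ' ,.;:') . '...';
}

/** Wrap the excerpt with an attributed link back to the original article. */
function renderExcerpt(string $html, string $originalUrl, string $sourceName): string
{
    return '<p>' . htmlspecialchars(makeExcerpt($html)) . '</p>'
        . '<p>Source: <a href="' . htmlspecialchars($originalUrl) . '" rel="nofollow noopener">'
        . htmlspecialchars($sourceName) . '</a></p>';
}
```

The space inserted before tags can split a word that contains inline markup, which is usually acceptable for a summary. The intermediate option would use a real sanitizer such as HTMLPurifier instead of strip_tags().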

Configuring Multiple Feeds

Source configuration is stored in an RssSources Highload block:

  • UF_URL — feed URL
  • UF_NAME — source name
  • UF_IBLOCK_ID — target info block for import
  • UF_SECTION_ID — section for imported items
  • UF_ACTIVE — enabled/disabled
  • UF_INTERVAL — polling interval in minutes
  • UF_LAST_CHECK — timestamp of last check
  • UF_PROCESSING — processing type (excerpt / full / ai_rewrite)
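
With these fields, the scheduler only needs to decide which sources are due on each run. The following is a minimal sketch. isSourceDue() and selectDueSources() are hypothetical helpers, and UF_LAST_CHECK is assumed to be a Unix timestamp (a Bitrix datetime field would need converting first).

```php
<?php
/**
 * Decide whether a configured source should be polled now.
 * Field names follow the Highload block layout above; the helper itself is hypothetical.
 */
function isSourceDue(array $source, int $now): bool
{
    if (empty($source['UF_ACTIVE'])) {
        return false; // disabled sources are never polled
    }
    $lastCheck = (int)($source['UF_LAST_CHECK'] ?? 0); // 0 means "never polled"
    $intervalSeconds = max(1, (int)($source['UF_INTERVAL'] ?? 0)) * 60; // at least one minute
    return $now - $lastCheck >= $intervalSeconds;
}

/** Return the subset of sources that are due at the given moment. */
function selectDueSources(array $sources, int $now): array
{
    return array_values(array_filter(
        $sources,
        static fn(array $s): bool => isSourceDue($s, $now)
    ));
}
```

A periodic job (a Bitrix agent or cron) could call selectDueSources() with time(), import each returned feed, and then update UF_LAST_CHECK.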

Project Timeline

Phase                                                   Duration
RSS reader development with RSS 2.0 and Atom support    4–8 hours
Deduplication, storage in info block                    4–8 hours
Content processing (HTML sanitization)                  4 hours
Admin interface for source management                   4–8 hours
Scheduling, monitoring                                  2–4 hours

Total: 3–5 working days. Adding AI rewriting adds 1–2 days.