Auto-Filling the News Section from RSS Feeds in 1C-Bitrix
A news section without regular publications loses search rankings and audience. RSS aggregation is a quick way to establish a publication stream from relevant sources: industry media, partner press releases, manufacturer news. The task is technically straightforward, but requires a sound architecture to ensure content uniqueness and proper attribution.
Fetching and Parsing RSS
RSS is an XML format with a standard structure. Each entry (<item>) contains title, link, description, pubDate, and author. Atom feeds (<entry>) use different tag names, but the logic is the same.
Parsing via SimpleXML:
$rss = simplexml_load_file($feedUrl);
if ($rss === false) {
    throw new RuntimeException("Failed to load or parse feed: {$feedUrl}");
}

foreach ($rss->channel->item as $item) {
    $this->processItem([
        'title'   => (string)$item->title,
        'link'    => (string)$item->link,
        // <content:encoded> lives in a namespace, hence children('content', true)
        'content' => (string)$item->children('content', true)->encoded ?: (string)$item->description,
        'pubDate' => strtotime((string)$item->pubDate),
        'guid'    => (string)$item->guid,
    ]);
}
<content:encoded> contains the full article text (if the source provides it); <description> is typically a summary.
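For Atom feeds, the same loop changes only in the field mapping. A sketch, assuming the same `processItem()` helper as above; note that in Atom `<link>` carries its URL in an `href` attribute and `<id>` plays the role of the RSS `<guid>`:

```php
$feed = simplexml_load_file($feedUrl);
if ($feed === false) {
    throw new RuntimeException("Failed to load or parse feed: {$feedUrl}");
}

foreach ($feed->entry as $entry) {
    // Prefer the rel="alternate" link (or a link with no rel at all)
    $link = '';
    foreach ($entry->link as $linkNode) {
        $rel = (string)$linkNode['rel'];
        if ($rel === '' || $rel === 'alternate') {
            $link = (string)$linkNode['href'];
            break;
        }
    }

    $this->processItem([
        'title'   => (string)$entry->title,
        'link'    => $link,
        'content' => (string)$entry->content ?: (string)$entry->summary,
        'pubDate' => strtotime((string)($entry->published ?: $entry->updated)),
        'guid'    => (string)$entry->id, // Atom <id> is the dedup key
    ]);
}
```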
Record Deduplication
The same item may appear in multiple feeds or be republished. Deduplicate by guid (the unique identifier of an RSS entry):
$existing = CIBlockElement::GetList([], [
    'IBLOCK_ID' => NEWS_IBLOCK_ID,
    '=PROPERTY_RSS_GUID' => $item['guid'],
], false, false, ['ID'])->Fetch(); // select only ID — existence check needs nothing more

if ($existing) {
    continue; // already imported
}
The RSS_GUID property is type S with IS_REQUIRED = N. An alternative for better performance: store processed GUIDs in a separate table or a Redis Set.
Storage in the News Info Block
A standard news info block with additional properties for RSS aggregation:
- RSS_GUID — entry GUID for deduplication
- RSS_SOURCE — source ID or name (for attribution)
- ORIGINAL_URL — link to the original (for the canonical tag and the "source" link)
- AUTO_IMPORTED — auto-import flag (Y/N) to distinguish from manual publications
The publication date from RSS → ACTIVE_FROM of the element. This is important for correct news sorting.
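Putting the mapping together, element creation might look like this. A sketch: `NEWS_IBLOCK_ID` is assumed to be defined, `$sourceName` is a hypothetical variable holding the feed's name, and the property codes match the list above:

```php
$el = new CIBlockElement;
$elementId = $el->Add([
    'IBLOCK_ID'         => NEWS_IBLOCK_ID,
    'NAME'              => $item['title'],
    'ACTIVE'            => 'Y',
    // RSS pubDate becomes ACTIVE_FROM so date sorting stays correct
    'ACTIVE_FROM'       => ConvertTimeStamp($item['pubDate'], 'FULL'),
    'PREVIEW_TEXT'      => $item['content'],
    'PREVIEW_TEXT_TYPE' => 'html',
    'PROPERTY_VALUES'   => [
        'RSS_GUID'      => $item['guid'],
        'RSS_SOURCE'    => $sourceName,
        'ORIGINAL_URL'  => $item['link'],
        'AUTO_IMPORTED' => 'Y',
    ],
]);

if (!$elementId) {
    // $el->LAST_ERROR holds the reason (missing required field, validation, ...)
    AddMessage2Log('RSS import failed: ' . $el->LAST_ERROR);
}
```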
Content Processing: Cleanup and Rewriting
Publishing RSS content without processing is duplication and invites search engine penalties. Processing options:
- Minimum: publish the summary (description) with a "read more" link to the original. This is legitimate aggregation — not duplication.
- Intermediate: sanitize HTML (HTMLPurifier), remove internal links to the source, rephrase the introduction and headline.
- Full AI rewrite: send content:encoded to GPT with instructions to rewrite in a different style. Expensive for high-frequency feeds, justified for important content.
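The intermediate option can be sketched with Bitrix's built-in CBXSanitizer instead of HTMLPurifier. `$sourceHost` is an assumption — the host name of the feed being imported:

```php
function cleanImportedHtml(string $html, string $sourceHost): string
{
    // Strip scripts, styles, event handlers and other unsafe markup
    $sanitizer = new CBXSanitizer;
    $sanitizer->SetLevel(CBXSanitizer::SECURE_LEVEL_MIDDLE);
    $html = $sanitizer->SanitizeHtml($html);

    // Unwrap <a> tags that link back into the source site, keeping the anchor text
    $pattern = '~<a\b[^>]*href="[^"]*' . preg_quote($sourceHost, '~') . '[^"]*"[^>]*>(.*?)</a>~is';

    return preg_replace($pattern, '$1', $html);
}
```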
Configuring Multiple Feeds
Source configuration in a RssSources Highload block:
- UF_URL — feed URL
- UF_NAME — source name
- UF_IBLOCK_ID — target info block for import
- UF_SECTION_ID — section for imported items
- UF_ACTIVE — enabled/disabled
- UF_INTERVAL — polling interval in minutes
- UF_LAST_CHECK — timestamp of last check
- UF_PROCESSING — processing type (excerpt/full/ai_rewrite)
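A cron- or agent-driven poller can then walk the active sources and fetch only those whose interval has elapsed. A sketch using the Highload block ORM; `RSS_SOURCES_HL_ID` (the Highload block ID) and `importFeed()` (the fetch/dedupe/store pipeline from the sections above) are assumptions:

```php
use Bitrix\Main\Loader;
use Bitrix\Highloadblock\HighloadBlockTable;

function pollRssSources(): void
{
    Loader::includeModule('highloadblock');

    $entity = HighloadBlockTable::compileEntity(
        HighloadBlockTable::getById(RSS_SOURCES_HL_ID)->fetch()
    );
    $dataClass = $entity->getDataClass();

    $sources = $dataClass::getList([
        'filter' => ['UF_ACTIVE' => 1],
    ]);

    $now = time();
    while ($source = $sources->fetch()) {
        // Skip sources whose polling interval has not elapsed yet
        if ($now - (int)$source['UF_LAST_CHECK'] < $source['UF_INTERVAL'] * 60) {
            continue;
        }

        importFeed($source); // assumed: fetch, dedupe, store

        $dataClass::update($source['ID'], ['UF_LAST_CHECK' => $now]);
    }
}
```

Registering this function as a Bitrix agent with a short period (e.g. every few minutes) lets per-source intervals stay in the data rather than in the scheduler.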
Project Timeline
| Phase | Duration |
|---|---|
| RSS reader development with RSS 2.0 and Atom support | 4–8 hours |
| Deduplication, storage in info block | 4–8 hours |
| Content processing (HTML sanitization) | 4 hours |
| Admin interface for source management | 4–8 hours |
| Scheduling, monitoring | 2–4 hours |
Total: 3–5 working days. Adding AI rewriting adds 1–2 days.