Parsing news feeds for auto-filling 1C-Bitrix
A news section on a 1C-Bitrix site that updates once a month is worse than no news section at all: search engines see an abandoned resource, and users lose trust. Auto-filling via news-feed parsing solves the problem of regular content updates, but it requires a proper implementation — otherwise you get duplicates, broken layout, and uniqueness issues.
Data sources
News feeds are available in several formats:
- RSS/Atom feeds — standardized XML with title, description, link, date. Supported by most news outlets and blogs. Most reliable source.
- News aggregator APIs — NewsAPI, Mediastack, Currents API. Structured JSON, paid rates for commercial use.
- HTML pages — parsing source sites directly. Unreliable: layout changes, bot protection, legal risks.
For auto-filling Bitrix sites, RSS feeds are the optimal balance of reliability and simplicity. Start with them.
Architecture of the RSS parser
A news parser for Bitrix consists of three layers:
1. Fetcher. Retrieves RSS feeds from a list of URLs. Uses file_get_contents with context or cURL with timeouts. Each feed is parsed via SimpleXMLElement or the SimplePie library.
```php
$xml = simplexml_load_string($rssContent);
if ($xml === false) {
    return; // malformed XML — skip this feed
}
foreach ($xml->channel->item as $item) {
    $title = (string)$item->title;
    $link  = (string)$item->link;
    $date  = strtotime((string)$item->pubDate);
    $desc  = (string)$item->description;
}
```
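The fetching step itself can be sketched with cURL and explicit timeouts, so one slow feed does not stall the whole run. This is a minimal sketch; the user agent string and timeout values are assumptions.

```php
// Fetch one feed with connect/total timeouts; returns null on failure.
function fetchFeed(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 5,   // seconds to establish the connection
        CURLOPT_TIMEOUT        => 15,  // total request timeout
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => 'NewsImporter/1.0', // assumed name
    ]);
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return ($body !== false && $code === 200) ? $body : null;
}
```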
2. Processor. Cleans HTML tags from descriptions, downloads and saves images, normalizes dates, determines category by keywords or source.
3. Importer. Creates elements in the Bitrix infoblock via CIBlockElement::Add(). Checks for duplicates by XML_ID (usually article URL or GUID from feed).
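The importer step can be sketched as follows. The infoblock ID (5) and the property code SOURCE_URL are assumptions; substitute your own values.

```php
// Sketch: create a news element only if no element with the same XML_ID exists.
CModule::IncludeModule('iblock');

$xmlId = md5($link); // md5 of the article URL — stable even if the feed GUID changes

$exists = CIBlockElement::GetList(
    [],
    ['IBLOCK_ID' => 5, 'XML_ID' => $xmlId], // IBLOCK_ID is an assumption
    false,
    false,
    ['ID']
)->Fetch();

if (!$exists) {
    $el = new CIBlockElement();
    $el->Add([
        'IBLOCK_ID'       => 5,
        'XML_ID'          => $xmlId,
        'NAME'            => $title,
        'ACTIVE_FROM'     => ConvertTimeStamp($date, 'FULL'),
        'PREVIEW_TEXT'    => $desc,
        'PROPERTY_VALUES' => ['SOURCE_URL' => $link],
    ]);
}
```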
Storage in infoblock
News in Bitrix is stored in an infoblock with the standard news structure. Recommended mapping:
| RSS field | Infoblock field | Type |
|---|---|---|
| title | NAME | String |
| link | PROPERTY_SOURCE_URL | Link |
| description | PREVIEW_TEXT | HTML/text |
| content:encoded | DETAIL_TEXT | HTML |
| pubDate | ACTIVE_FROM | Date |
| guid / link | XML_ID | String (for deduplication) |
| category | IBLOCK_SECTION_ID | Section link |
| enclosure / media:content | PREVIEW_PICTURE | File |
XML_ID is a mandatory field: without it, the parser creates duplicates on every run. Use an md5 hash of the article URL as the XML_ID — this guarantees uniqueness even if the GUID in the feed changes.
Content processing
Raw HTML from RSS is unsuitable for publication. Typical problems:
- External images — image links point to the source site. If it becomes unavailable, the images disappear. Solution: download images to /upload/ during import.
- Third-party scripts and iframes — feeds may contain widgets, counters, and embedded videos. Use strip_tags() with a whitelist of allowed tags, or the HTMLPurifier library.
- Relative links — links like /article/123 without a domain. Convert them to absolute by prepending the source domain.
- Encoding — feeds may arrive in UTF-8, Windows-1251, or ISO-8859-1. Detect the encoding via mb_detect_encoding() and convert to UTF-8.
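The cleaning steps above can be sketched in one processor function. The allowed-tag list and the source-domain parameter are assumptions for illustration.

```php
// Sketch: normalize encoding, strip unwanted tags, absolutize relative links.
function cleanRssHtml(string $html, string $sourceDomain): string
{
    // 1. Normalize encoding to UTF-8 (strict detection over common candidates)
    $enc = mb_detect_encoding($html, ['UTF-8', 'Windows-1251', 'ISO-8859-1'], true);
    if ($enc !== false && $enc !== 'UTF-8') {
        $html = mb_convert_encoding($html, 'UTF-8', $enc);
    }

    // 2. Keep only whitelisted tags — drops scripts, iframes, widgets
    $html = strip_tags($html, '<p><a><b><i><ul><ol><li><br>');

    // 3. Turn relative links like /article/123 into absolute ones
    $html = preg_replace(
        '~href="/(?!/)~',
        'href="' . rtrim($sourceDomain, '/') . '/',
        $html
    );

    return $html;
}
```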
Scheduling and cron
The parser runs via cron. Frequency depends on the news type:
- Breaking news (news agencies) — every 15–30 minutes.
- Industry news — every 1–2 hours.
- Blogs and analytics — 1–2 times daily.
The cron task calls a PHP script that includes the Bitrix core:
```php
// Standard constants for non-interactive Bitrix scripts
define('NO_KEEP_STATISTIC', true);
define('NOT_CHECK_PERMISSIONS', true);

$_SERVER['DOCUMENT_ROOT'] = '/home/bitrix/www';
require $_SERVER['DOCUMENT_ROOT'] . '/bitrix/modules/main/include/prolog_before.php';
CModule::IncludeModule('iblock');
```
An alternative is a Bitrix agent (b_agent), but for long operations cron is more reliable: agents have execution-time limits and can block each other.
Deduplication and quality control
In addition to the XML_ID check, the following filters are recommended:
- Date filter — don't import news older than N days. Otherwise, when connecting a new feed, catalog fills with outdated content.
- Minimum length — discard entries with description shorter than 100 characters. This removes technical entries and announcements without content.
- Stop-words — filter news by keywords irrelevant to site topic.
- Source limit — no more than N news per day from one feed, so one active source doesn't push out the rest.
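The first three filters above can be sketched as a single pre-import check. The thresholds (7 days, 100 characters) and the stop-word list are assumptions.

```php
// Sketch: reject an item that is too old, too short, or off-topic.
function passesQualityChecks(array $item, array $stopWords, int $maxAgeDays = 7): bool
{
    // Date filter: skip news older than N days
    if ($item['date'] < strtotime("-{$maxAgeDays} days")) {
        return false;
    }

    // Minimum length: skip announcements without real content
    if (mb_strlen(trim(strip_tags($item['desc']))) < 100) {
        return false;
    }

    // Stop-words: skip items irrelevant to the site topic
    $haystack = mb_strtolower($item['title'] . ' ' . $item['desc']);
    foreach ($stopWords as $word) {
        if (mb_strpos($haystack, mb_strtolower($word)) !== false) {
            return false;
        }
    }

    return true;
}
```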
Automatic categorization
The simplest approach is a "source → infoblock section" mapping: all TechCrunch news goes to the "Technology" section, all RBK news to "Economics".
A more flexible approach is classification by keywords in the title and text, with a rules array like:
```php
$rules = [
    'Technology' => ['AI', 'blockchain', 'startup', 'app'],
    'Finance'    => ['stocks', 'rate', 'investment', 'IPO'],
];
```
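Applying such a rules array can look like this sketch; the fallback section name is an assumption.

```php
// Sketch: return the first category whose keyword occurs in the title or text.
function detectCategory(string $title, string $text, array $rules): string
{
    $haystack = mb_strtolower($title . ' ' . $text);
    foreach ($rules as $category => $keywords) {
        foreach ($keywords as $keyword) {
            if (mb_strpos($haystack, mb_strtolower($keyword)) !== false) {
                return $category;
            }
        }
    }
    return 'General'; // default section when nothing matches (assumed)
}
```

Case-insensitive substring matching keeps the rules simple, at the cost of false positives on short keywords like "app".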
For 10+ categories or serious accuracy requirements, connect an external classifier (OpenAI API, YandexGPT) or a trained model.
Legal aspects
Publishing others' news "as is" violates copyright. Acceptable options:
- Publish title + first 2–3 sentences with source link (fair dealing citation).
- Automatic rewrite via LLM (GPT, YandexGPT) — legally questionable, but used in practice.
- Use feeds with open license (Creative Commons, government sources).