ETL Process Development for 1C-Bitrix
Standard import via the 1C-Bitrix admin panel works fine for one-time catalog loads. For regular synchronization with external sources — ERP, 1C, warehouse systems, marketplaces — you need full-featured ETL processes with data transformation, error handling, and monitoring. Otherwise, after a month you'll discover that 3% of products have incorrect stock levels, and nobody knows about it.
ETL Architecture for Bitrix
ETL (Extract, Transform, Load) built on top of 1C-Bitrix revolves around several layers:
Extract — retrieving data from the source. Sources include:
- Files (CSV, XML, JSON, YML) — via FTP/SFTP or HTTP
- REST API of external systems (1C, SAP, Salesforce)
- Direct DB access (MySQL, MSSQL, PostgreSQL) via PDO
- Message queues (RabbitMQ, Kafka)
Transform — mapping data to Bitrix structure: field mapping, format normalization, validation.
Load — writing to Bitrix via D7 API or direct SQL queries for high volumes.
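The three layers can be expressed as narrow contracts that each connector implements. A sketch; the interface names below are illustrative, not Bitrix APIs:

```php
// Illustrative contracts for the three ETL layers (names are assumptions,
// not part of Bitrix). Each source connector, transformer, and loader
// implements exactly one of them, which keeps the layers swappable.
interface ExtractorInterface
{
    /** Yields raw records from the source (file, API, DB, queue). */
    public function extract(): \Generator;
}

interface TransformerInterface
{
    /** Returns one record mapped to the Bitrix structure. */
    public function transform(array $rawItem): array;
}

interface LoaderInterface
{
    /** Writes one transformed record into Bitrix (D7 API or direct SQL). */
    public function load(array $item): void;
}
```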
Product Loading: Performance
For loading products via the standard Bitrix API we use CIBlockElement and CCatalogProduct (with \Bitrix\Iblock\ElementTable for reads). For loads of 10,000+ products, a few settings are key:
```php
// Disable statistics collection and agent execution during the import run
define('STOP_STATISTICS', true);
define('NO_KEEP_STATISTIC', true);
define('NO_AGENT_STATISTIC', 'Y');
define('NO_AGENT_CHECK', true);

// Load an iblock element. The third argument of Add() ($bUpdateSearch = false)
// skips the per-element search index update; rebuild the index once at the end.
$el = new \CIBlockElement();
$result = $el->Add([
    'IBLOCK_ID' => CATALOG_IBLOCK_ID,
    'NAME'      => $item['name'],
    'CODE'      => $item['code'],
    'ACTIVE'    => 'Y',
    'PROPERTY_VALUES' => [
        'VENDOR_CODE' => $item['vendor_code'],
        'WEIGHT'      => $item['weight'],
    ],
], false, false);

if (!$result) {
    // Add() returns false on failure; the reason is in $el->LAST_ERROR
    throw new \RuntimeException($el->LAST_ERROR);
}
```
For volumes above 50,000 elements, direct calls to CIBlockElement::Add degrade due to event cascades and search index updates. Switch to direct INSERTs into the b_iblock_element, b_iblock_element_property, and b_catalog_product tables, followed by a full index rebuild.
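For the direct-SQL path, statements can be assembled in multi-row batches. A sketch under assumptions: the helper name is invented, and the column set is cut down to common b_iblock_element fields; verify against your Bitrix version's schema (and remember the property and catalog tables) before relying on it:

```php
// Hedged sketch: build a multi-row INSERT for b_iblock_element.
// Column list is an assumption reduced to common fields; check your schema.
function buildBulkInsert(int $iblockId, array $items): array
{
    $placeholders = [];
    $values = [];
    foreach ($items as $item) {
        $placeholders[] = '(?, ?, ?, ?, ?)';
        array_push($values, $iblockId, $item['name'], $item['code'], 'Y', $item['xml_id']);
    }
    $sql = 'INSERT INTO b_iblock_element (IBLOCK_ID, NAME, CODE, ACTIVE, XML_ID) VALUES '
         . implode(', ', $placeholders);
    return [$sql, $values];
}

// Usage: [$sql, $values] = buildBulkInsert(5, $chunk);
//        $pdo->prepare($sql)->execute($values);
```

Batches of 500–1,000 rows per statement are a reasonable starting point; one prepared statement per chunk avoids both per-row round trips and oversized packets.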
Incremental Synchronization
A full reload every N hours is expensive. Incremental ETL touches only the records that changed:
```php
// Record the start time of the synchronization
$syncStartTime = new \Bitrix\Main\Type\DateTime();

// Request only the items changed since the last sync from the source
$changedItems = $source->getChangedSince($this->getLastSyncTime());

// After a successful sync, update the timestamp
$this->setLastSyncTime($syncStartTime);
```
Table for storing synchronization state:
```sql
CREATE TABLE etl_sync_state (
    source_name       VARCHAR(64) PRIMARY KEY,
    last_sync_at      TIMESTAMP,
    last_sync_status  VARCHAR(16),  -- success / error / running
    records_processed INTEGER,
    errors_count      INTEGER
);
```
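The getLastSyncTime()/setLastSyncTime() pair from the snippet above can be backed by this table. A minimal sketch over PDO, assuming MySQL (Bitrix's usual DBMS); the class and property names are illustrative:

```php
// Illustrative repository over etl_sync_state (class and property names
// are assumptions; the UPSERT syntax is MySQL-specific).
class SyncStateRepository
{
    public function __construct(
        private \PDO $db,
        private string $sourceName,
    ) {}

    public function getLastSyncTime(): ?string
    {
        $stmt = $this->db->prepare(
            'SELECT last_sync_at FROM etl_sync_state WHERE source_name = ?'
        );
        $stmt->execute([$this->sourceName]);
        $value = $stmt->fetchColumn();
        return $value === false ? null : $value;
    }

    public function setLastSyncTime(string $syncedAt): void
    {
        $stmt = $this->db->prepare(
            "INSERT INTO etl_sync_state (source_name, last_sync_at, last_sync_status)
             VALUES (?, ?, 'success')
             ON DUPLICATE KEY UPDATE
                 last_sync_at = VALUES(last_sync_at),
                 last_sync_status = 'success'"
        );
        $stmt->execute([$this->sourceName, $syncedAt]);
    }
}
```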
Data Transformation
Transformation is the hardest part: each source has its own data model. Typical tasks:
Category mapping: the source may have a flat list with a parent_id field, while Bitrix uses a section tree. You need to build the tree, match by code, or create entries while storing the mapping external_id → iblock_section_id.
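The tree-building step can be sketched as a top-down walk over the flat list, so every parent exists before its children. Here `$createSection` stands in for the CIBlockSection::Add() wiring; the function name and field names are assumptions:

```php
// Sketch: order a flat category list (id, parent_id, name) so parents are
// created before children, and return the external_id => section_id mapping.
// $createSection(array $cat, ?int $parentSectionId): int is a stand-in for
// the actual CIBlockSection::Add() call.
function syncSections(array $flat, callable $createSection): array
{
    // Group categories by their external parent id (0 = root)
    $byParent = [];
    foreach ($flat as $cat) {
        $byParent[$cat['parent_id'] ?? 0][] = $cat;
    }

    $mapping = [];   // external_id => iblock_section_id
    $queue = [0];    // breadth-first walk, roots first
    while ($queue) {
        $parentExt = array_shift($queue);
        foreach ($byParent[$parentExt] ?? [] as $cat) {
            $parentSectionId = $mapping[$parentExt] ?? null;
            $mapping[$cat['id']] = $createSection($cat, $parentSectionId);
            $queue[] = $cat['id'];
        }
    }
    return $mapping;
}
```

The returned mapping is exactly what you persist to resolve external category codes to iblock_section_id on subsequent runs.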
Price normalization: the source may provide prices with or without VAT, in different currencies. You need to recalculate with exchange rates and store in the correct price type.
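A sketch of the price step, assuming a 20% VAT rate and a simple rates table keyed by currency code (both are placeholders for the example):

```php
// Sketch: normalize a source price to "VAT included, base currency" form.
// The 20% default VAT rate and the $rates table are assumptions; in a real
// pipeline the rate would come from the tax settings and the CBR/ECB feed.
function normalizePrice(
    float $price,
    string $currency,
    bool $vatIncluded,
    array $rates,
    float $vatRate = 0.20
): float {
    // Bring the price to the VAT-included form expected by the price type
    if (!$vatIncluded) {
        $price *= 1 + $vatRate;
    }
    // Convert to the base currency using the rates table (rate to base)
    $inBase = $price * ($rates[$currency] ?? 1.0);
    return round($inBase, 2);
}
```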
HTML cleanup: descriptions from 1C often contain unreadable formatting. Run them through DOMDocument and strip unwanted tags.
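One way to do the cleanup, assuming a small tag whitelist: strip_tags() drops disallowed tags, then DOMDocument re-serializes the fragment (repairing unclosed tags) and removes leftover attributes:

```php
// Sketch: sanitize a 1C description. The whitelist is an assumption;
// extend it to whatever markup your templates actually render.
function cleanDescription(string $html): string
{
    // 1. Drop everything outside the tag whitelist (content is kept)
    $html = strip_tags($html, '<p><br><ul><ol><li><b><i><strong><em>');

    // 2. Re-parse so unclosed tags get repaired; the <div> wrapper and the
    //    xml PI keep the fragment intact and UTF-8 safe
    $doc = new \DOMDocument();
    @$doc->loadHTML(
        '<?xml encoding="utf-8"?><div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    $root = $doc->getElementsByTagName('div')->item(0);

    // 3. Strip attributes (style=, class=, lang= etc.) left by 1C
    foreach (iterator_to_array($root->getElementsByTagName('*')) as $el) {
        while ($el->attributes->length > 0) {
            $el->removeAttribute($el->attributes->item(0)->name);
        }
    }

    // 4. Serialize the children back, without the wrapper div
    $out = '';
    foreach ($root->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return trim($out);
}
```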
Deduplication: if the source doesn't guarantee unique SKUs — you need logic to merge duplicates.
Row-Level Error Handling
The ETL process must not stop because of a single invalid record:
```php
foreach ($items as $item) {
    try {
        $transformed = $this->transform($item);
        $this->load($transformed);
        $this->stats->incrementSuccess();
    } catch (\Bitrix\Main\ArgumentException $e) {
        // Data validation error: log and continue
        $this->logger->warning('Validation failed', [
            'external_id' => $item['id'],
            'error' => $e->getMessage(),
        ]);
        $this->stats->incrementError($item['id'], $e->getMessage());
    } catch (\Exception $e) {
        // Unexpected error: log and continue
        $this->logger->error('Load failed', ['item' => $item['id'], 'error' => $e->getMessage()]);
        $this->stats->incrementError($item['id'], $e->getMessage());
    }
}
```
After synchronization, generate a report: how many records were created, updated, and skipped due to errors. If the error rate exceeds 5%, raise an alert.
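The threshold check itself is trivial; a sketch with the 5% figure from the text as the default:

```php
// Sketch: decide whether a finished run should trigger an alert.
// 5% matches the threshold used in the text; tune per source.
function shouldAlert(int $processed, int $errors, float $threshold = 0.05): bool
{
    return $processed > 0 && ($errors / $processed) > $threshold;
}
```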
Memory Management for Large Volumes
PHP easily runs out of memory when processing 100,000 records. Rules:
- Read data in chunks; don't load the entire file into an array
- Use generators for iterating over CSV/XML
- Explicitly call `unset()` after processing each chunk
- Flush the Bitrix ORM cache: `\Bitrix\Main\ORM\Data\DataManager::cleanCache()`
- Monitor `memory_get_usage()` and log when approaching the limit
```php
// Generator for reading a large CSV file in fixed-size chunks
function readCsvChunks(string $file, int $chunkSize = 500): \Generator
{
    $handle = fopen($file, 'r');
    $header = fgetcsv($handle);
    $chunk = [];
    while (($row = fgetcsv($handle)) !== false) {
        $chunk[] = array_combine($header, $row);
        if (count($chunk) >= $chunkSize) {
            yield $chunk;
            $chunk = [];
        }
    }
    if ($chunk) {
        yield $chunk;
    }
    fclose($handle);
}
```
Agents vs Cron vs Queue
Bitrix Agents (the b_agent table) work for small tasks (up to 1,000 records per run). They are triggered by site visits, which makes them unreliable under low traffic.
Cron — more reliable for regular synchronizations. The script runs independently of traffic:
```
*/30 * * * * php -f /var/www/bitrix/etl/sync_products.php >> /var/log/etl.log 2>&1
```
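With a 30-minute schedule, a slow sync can overlap the next run. A common guard is an exclusive flock() at the top of the script; the lock file path here is an assumption:

```php
// Sketch: prevent overlapping cron runs with a non-blocking exclusive lock.
// If the previous sync still holds the lock, exit instead of starting a second one.
$lock = fopen('/tmp/etl_sync_products.lock', 'c');
if (!flock($lock, LOCK_EX | LOCK_NB)) {
    fwrite(STDERR, "Previous sync still running, exiting\n");
    exit(0);
}

// ... run the sync ...

flock($lock, LOCK_UN);
fclose($lock);
```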
A queue (RabbitMQ, Redis Queue) fits event-driven ETL, where the source publishes change events. It allows processing high-frequency changes without data loss.
ETL Monitoring
| Metric | Source | Alert |
|---|---|---|
| Time of last successful sync | etl_sync_state | > N hours behind schedule |
| Error rate of records | Sync log | > 5% |
| Sync execution time | Log | Exceeds planned window |
| Record count discrepancy | Compare source vs Bitrix | > 1% |
Development Stages
| Stage | Contents | Timeline |
|---|---|---|
| Source analysis | Data structure, formats, schedule | 3–5 days |
| Extract connectors | Connecting to sources, retrieving data | 1 week |
| Transformation | Mapping, normalization, validation | 1–2 weeks |
| Loading into Bitrix | API or direct SQL, performance optimization | 1–2 weeks |
| Error handling and monitoring | Logging, alerts, reports | 3–5 days |
| Testing | Load tests, edge cases | 1 week |
Total: 6–12 weeks depending on the number of sources and transformation complexity.