Development of ETL processes for 1C-Bitrix

Standard import via the 1C-Bitrix admin panel works fine for one-time catalog loads. For regular synchronization with external sources — ERP, 1C, warehouse systems, marketplaces — you need full-featured ETL processes with data transformation, error handling, and monitoring. Otherwise, after a month you'll discover that 3% of products have incorrect stock levels, and nobody knows about it.

ETL Architecture for Bitrix

ETL (Extract, Transform, Load) built on top of 1C-Bitrix revolves around several layers:

Extract — retrieving data from the source. Sources include:

  • Files (CSV, XML, JSON, YML) — via FTP/SFTP or HTTP
  • REST API of external systems (1C, SAP, Salesforce)
  • Direct DB access (MySQL, MSSQL, PostgreSQL) via PDO
  • Message queues (RabbitMQ, Kafka)

Transform — mapping data to Bitrix structure: field mapping, format normalization, validation.

Load — writing to Bitrix via D7 API or direct SQL queries for high volumes.
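
These three layers map naturally onto separate classes. A minimal sketch of the pipeline skeleton (the interface and class names here are illustrative, not part of any Bitrix API):

```php
// Illustrative ETL layer contracts; names are hypothetical, not Bitrix APIs.
interface Extractor {
    /** @return iterable<array> raw records from the source */
    public function extract(): iterable;
}

interface Transformer {
    /** @return array record mapped to the Bitrix field structure */
    public function transform(array $raw): array;
}

interface Loader {
    public function load(array $record): void;
}

final class EtlPipeline {
    public function __construct(
        private Extractor $extractor,
        private Transformer $transformer,
        private Loader $loader,
    ) {}

    // Stream records through the three layers, one at a time
    public function run(): int {
        $processed = 0;
        foreach ($this->extractor->extract() as $raw) {
            $this->loader->load($this->transformer->transform($raw));
            $processed++;
        }
        return $processed;
    }
}
```

Keeping the layers behind interfaces lets you swap a CSV extractor for a REST one without touching transformation or loading code.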

Product Loading: Performance

Products are loaded through the standard Bitrix API: \Bitrix\Iblock\ElementTable and CCatalogProduct. For 10,000+ products, the key settings are:

// Disable unnecessary handlers during import
define('STOP_STATISTICS', true);
define('NO_AGENT_STATISTIC', 'Y');
define('DisableEventsCheck', true);

// Disable search index — rebuild at the end
\CSearch::DisableIndex();

// Load an iblock element
$el = new \CIBlockElement();
$result = $el->Add([
    'IBLOCK_ID' => CATALOG_IBLOCK_ID,
    'NAME' => $item['name'],
    'CODE' => $item['code'],
    'ACTIVE' => 'Y',
    'PROPERTY_VALUES' => [
        'VENDOR_CODE' => $item['vendor_code'],
        'WEIGHT' => $item['weight'],
    ],
]);
if (!$result) {
    // Add() returns false on failure; the reason is in LAST_ERROR
    throw new \RuntimeException($el->LAST_ERROR);
}

For volumes > 50,000 elements, direct calls to CIBlockElement::Add degrade due to event cascades and search index updates. Switch to direct INSERTs into b_iblock_element, b_iblock_element_property, b_catalog_product tables, followed by a full index rebuild.
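
Most of the win comes from batching many rows into one INSERT statement. A sketch of the statement builder (the helper name is illustrative; verify the exact column set of your Bitrix version's schema before writing to b_iblock_element directly):

```php
// Build a multi-row INSERT with positional placeholders.
// Illustrative helper, not a Bitrix API.
function buildMultiInsert(string $table, array $columns, int $rowCount): string {
    $cols = implode(', ', $columns);
    $row  = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $rows = implode(', ', array_fill(0, $rowCount, $row));
    return "INSERT INTO {$table} ({$cols}) VALUES {$rows}";
}

// Usage with PDO: flatten a chunk of rows into one parameter list
// $sql  = buildMultiInsert('b_iblock_element', ['IBLOCK_ID', 'NAME', 'CODE', 'ACTIVE'], count($chunk));
// $stmt = $pdo->prepare($sql);
// $stmt->execute(array_merge(...array_map('array_values', $chunk)));
```

One statement per 500–1000 rows is usually a good balance between round-trip savings and packet size limits.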

Incremental Synchronization

A full reload every N hours is expensive. Incremental ETL works only with changed records:

// Record the start time of synchronization
$syncStartTime = new \Bitrix\Main\Type\DateTime();

// Request only items changed since the last sync from the source
$changedItems = $source->getChangedSince($this->getLastSyncTime());

// After successful sync, update the timestamp
$this->setLastSyncTime($syncStartTime);

Table for storing synchronization state:

CREATE TABLE etl_sync_state (
    source_name VARCHAR(64) PRIMARY KEY,
    last_sync_at TIMESTAMP,
    last_sync_status VARCHAR(16), -- success/error/running
    records_processed INTEGER,
    errors_count INTEGER
);
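
The getLastSyncTime() / setLastSyncTime() calls from the snippet above can be backed by this table. A minimal sketch over PDO (the class name and the portable update-then-insert upsert are illustrative choices, not a Bitrix API):

```php
// Minimal state store over etl_sync_state; sketch, assumes an open PDO connection.
final class SyncState {
    public function __construct(private \PDO $db) {}

    public function getLastSyncTime(string $source): ?string {
        $stmt = $this->db->prepare(
            'SELECT last_sync_at FROM etl_sync_state WHERE source_name = ?'
        );
        $stmt->execute([$source]);
        $value = $stmt->fetchColumn();
        return $value === false ? null : $value;
    }

    public function setLastSyncTime(
        string $source, string $time, string $status, int $processed, int $errors
    ): void {
        // Portable "upsert": UPDATE first, INSERT when no row existed yet
        $stmt = $this->db->prepare(
            'UPDATE etl_sync_state
                SET last_sync_at = ?, last_sync_status = ?,
                    records_processed = ?, errors_count = ?
              WHERE source_name = ?'
        );
        $stmt->execute([$time, $status, $processed, $errors, $source]);
        if ($stmt->rowCount() === 0) {
            $this->db->prepare(
                'INSERT INTO etl_sync_state
                     (source_name, last_sync_at, last_sync_status,
                      records_processed, errors_count)
                 VALUES (?, ?, ?, ?, ?)'
            )->execute([$source, $time, $status, $processed, $errors]);
        }
    }
}
```

On MySQL you can replace the two queries with a single INSERT ... ON DUPLICATE KEY UPDATE.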

Data Transformation

Transformation is the hardest part: each source has its own data model. Typical tasks:

Category mapping: the source may have a flat list with a parent_id field, while Bitrix uses a section tree. You need to build the tree, match by code, or create entries while storing the mapping external_id → iblock_section_id.

Price normalization: the source may provide prices with or without VAT, in different currencies. You need to recalculate with exchange rates and store in the correct price type.

HTML cleanup: descriptions from 1C often contain unreadable formatting. Run them through DOMDocument and strip unwanted tags.

Deduplication: if the source doesn't guarantee unique SKUs — you need logic to merge duplicates.
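
The price normalization step, for instance, can be isolated into a pure, testable function. A sketch (the exchange rates and VAT percentage below are placeholder assumptions; take yours from configuration):

```php
// Normalize a source price to the store's base currency, VAT included.
// Sketch: the default rates and the 20% VAT are assumptions, not real config.
function normalizePrice(
    float $amount,
    string $currency,
    bool $vatIncluded,
    array $rates = ['RUB' => 1.0, 'USD' => 90.0, 'EUR' => 100.0],
    float $vatRate = 0.20
): float {
    if (!isset($rates[$currency])) {
        throw new \InvalidArgumentException("Unknown currency: {$currency}");
    }
    $price = $amount * $rates[$currency]; // convert to the base currency
    if (!$vatIncluded) {
        $price *= 1 + $vatRate;           // add VAT when the source gives net prices
    }
    return round($price, 2);
}
```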

Row-Level Error Handling

The ETL process must not stop because of a single invalid record:

foreach ($items as $item) {
    try {
        $transformed = $this->transform($item);
        $this->load($transformed);
        $this->stats->incrementSuccess();
    } catch (\Bitrix\Main\ArgumentException $e) {
        // Data validation error — log and continue
        $this->logger->warning('Validation failed', [
            'external_id' => $item['id'],
            'error' => $e->getMessage(),
        ]);
        $this->stats->incrementError($item['id'], $e->getMessage());
    } catch (\Exception $e) {
        // Unexpected error — log and continue
        $this->logger->error('Load failed', ['item' => $item['id'], 'error' => $e->getMessage()]);
        $this->stats->incrementError($item['id'], $e->getMessage());
    }
}

After synchronization — a report: how many were created, updated, skipped with errors. If errors > 5% — alert.
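
The post-sync check can be as simple as computing the error share against a threshold. A sketch of the stats collector used in the loop above (the class name is illustrative):

```php
// Illustrative sync report: decide whether to fire an alert after a run.
final class SyncReport {
    private int $success = 0;
    /** @var array<string,string> external_id => error message */
    private array $errors = [];

    public function incrementSuccess(): void {
        $this->success++;
    }

    public function incrementError(string $externalId, string $message): void {
        $this->errors[$externalId] = $message;
    }

    public function errorRate(): float {
        $total = $this->success + count($this->errors);
        return $total === 0 ? 0.0 : count($this->errors) / $total;
    }

    // Mirrors the "> 5% — alert" rule; threshold is configurable
    public function needsAlert(float $threshold = 0.05): bool {
        return $this->errorRate() > $threshold;
    }
}
```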

Memory Management for Large Volumes

PHP easily runs out of memory when processing 100,000 records. Rules:

  • Read data in chunks, don't load the entire file into an array
  • Use generators for iterating over CSV/XML
  • Explicitly call unset() after processing each chunk
  • Flush the Bitrix ORM cache: \Bitrix\Main\ORM\Data\DataManager::cleanCache()
  • Monitor memory_get_usage() — log when approaching the limit

// Generator for reading a large CSV
function readCsvChunks(string $file, int $chunkSize = 500): \Generator {
    $handle = fopen($file, 'r');
    if ($handle === false) {
        throw new \RuntimeException("Cannot open {$file}");
    }
    $header = fgetcsv($handle);
    $chunk = [];
    while (($row = fgetcsv($handle)) !== false) {
        if (count($row) !== count($header)) {
            continue; // skip malformed rows instead of failing mid-file
        }
        $chunk[] = array_combine($header, $row);
        if (count($chunk) >= $chunkSize) {
            yield $chunk;
            $chunk = [];
        }
    }
    if ($chunk) {
        yield $chunk;
    }
    fclose($handle);
}

Agents vs Cron vs Queue

Bitrix Agents (b_agent) — suitable for small tasks (up to ~1,000 records per run). By default they fire on page hits, so they are unreliable on low-traffic sites.

Cron — more reliable for regular synchronizations. The script runs independently of traffic:

*/30 * * * * php -f /var/www/bitrix/etl/sync_products.php >> /var/log/etl.log 2>&1

Queue (RabbitMQ/Redis Queue) — for event-driven ETL, when the source publishes change events. Enables processing high-frequency changes without data loss.
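
Whichever scheduler you pick, overlapping runs must be prevented: if a sync takes longer than its interval, cron will happily start a second copy. A minimal sketch using an flock-based lock (the lock file path is an assumption):

```php
// Guard against overlapping cron runs with an exclusive, non-blocking file lock.
function acquireLock(string $lockFile) {
    $handle = fopen($lockFile, 'c');
    if ($handle === false || !flock($handle, LOCK_EX | LOCK_NB)) {
        return null; // another run still holds the lock
    }
    return $handle; // keep the handle open for the lifetime of the process
}

$lock = acquireLock('/tmp/etl_sync_products.lock');
if ($lock === null) {
    fwrite(STDERR, "Previous sync still running, exiting\n");
    exit(0);
}
// ... run the sync ...
flock($lock, LOCK_UN);
fclose($lock);
```

The lock is released automatically if the process dies, so a crashed sync never blocks the next run.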

ETL Monitoring

Metric                       | Source                   | Alert condition
Time of last successful sync | etl_sync_state           | > N hours behind schedule
Error rate of records        | sync log                 | > 5%
Sync execution time          | log                      | exceeds the planned window
Record count discrepancy     | compare source vs Bitrix | > 1%

Development Stages

Stage                         | Contents                                 | Timeline
Source analysis               | Data structure, formats, schedule        | 3–5 days
Extract connectors            | Connecting to sources, retrieving data   | 1 week
Transformation                | Mapping, normalization, validation       | 1–2 weeks
Loading into Bitrix           | API or direct SQL, performance tuning    | 1–2 weeks
Error handling and monitoring | Logging, alerts, reports                 | 3–5 days
Testing                       | Load tests, edge cases                   | 1 week

Total: 6–12 weeks depending on the number of sources and transformation complexity.