
Content Scraping Protection Configuration in 1C-Bitrix

Competitors scrape your catalog: they collect prices, descriptions, specifications, and use them for price monitoring or copy them to their own site. Complete protection is impossible — if a human can see the data, a program can too. The goal is to make scraping expensive enough that it loses its economic justification.

Defense in depth

Protection is built in several layers. Each layer catches a different type of scraper.

Layer 1 — nginx rate limiting. The first line of defense; it rejects excess requests without ever invoking PHP:

# /etc/nginx/conf.d/rate-limit.conf

# Zone for IP-based limiting
limit_req_zone $binary_remote_addr zone=catalog:10m rate=20r/m;
limit_req_zone $binary_remote_addr zone=search:10m  rate=5r/m;

# Apply to catalog pages
location /catalog/ {
    limit_req zone=catalog burst=40 nodelay;
    limit_req_status 429;
    # ... other directives
}

location /search/ {
    limit_req zone=search burst=10 nodelay;
    limit_req_status 429;
}
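One caveat: these zones throttle search engine crawlers just as readily. A common nginx pattern maps known bots to an empty key, and nginx does not limit requests whose zone key is empty (the User-Agent list here is illustrative; a User-Agent can be spoofed, so strict setups also verify crawlers via reverse DNS):

```nginx
# Known crawlers get an empty key; requests with an empty key are not limited
map $http_user_agent $limit_key {
    default                       $binary_remote_addr;
    ~*(googlebot|bingbot|yandex)  "";
}

# Then key the zones on $limit_key instead of $binary_remote_addr:
# limit_req_zone $limit_key zone=catalog:10m rate=20r/m;
```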

Layer 2 — User-Agent analysis. In 1C-Bitrix via init.php:

// /local/php_interface/init.php
$blockedAgents = [
    'python-requests', 'scrapy', 'curl/', 'wget/',
    'Go-http-client', 'Java/', 'PhantomJS', 'Headless',
];

$ua = strtolower($_SERVER['HTTP_USER_AGENT'] ?? '');
foreach ($blockedAgents as $bad) {
    if (str_contains($ua, strtolower($bad))) {
        header('HTTP/1.1 403 Forbidden');
        exit;
    }
}

Primitive scrapers are filtered out immediately. Advanced ones spoof the User-Agent header, which is what the following layers are for.

Layer 3 — behavioral analysis. Real users do not request 200 catalog pages in 5 minutes. A per-IP request counter (shown here on the 1C-Bitrix managed cache; a dedicated Redis counter scales better):

namespace Local\Security;

class RateLimiter
{
    private const WINDOW   = 300;  // 5 minutes
    private const LIMIT    = 100;  // catalog requests
    private const BAN_TIME = 3600; // ban for one hour

    public static function check(string $ip): bool
    {
        // Simplified: a per-IP counter in the 1C-Bitrix managed cache
        $cache = \Bitrix\Main\Application::getInstance()->getManagedCache();
        $key   = 'ratelimit_catalog_' . md5($ip);

        $cache->read(self::WINDOW, $key); // init the entry with the window TTL
        $count = (int)$cache->get($key);

        if ($count > self::LIMIT) {
            // Log and block
            self::banIp($ip);
            return false;
        }

        $cache->set($key, $count + 1);

        return true;
    }

    private static function banIp(string $ip): void
    {
        // Add to 1C-Bitrix ban table (b_stop_list)
        \CStopList::Add([
            'SITE_ID'   => SITE_ID,
            'IP_ADDR'   => $ip,
            'ACTIVE'    => 'Y',
            'REASON'    => 'Automatic ban: suspected scraping',
        ]);
    }
}
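Nothing calls check() yet. A sketch of wiring it into init.php with the classic AddEventHandler (the URL prefix check and the 429 response are illustrative choices):

```php
// /local/php_interface/init.php
AddEventHandler('main', 'OnPageStart', function () {
    $uri = $_SERVER['REQUEST_URI'] ?? '';
    if (strpos($uri, '/catalog/') !== 0) {
        return; // limit only the catalog section
    }

    $ip = $_SERVER['REMOTE_ADDR'] ?? '';
    if ($ip !== '' && !\Local\Security\RateLimiter::check($ip)) {
        http_response_code(429);
        exit;
    }
});
```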

Layer 4 — CAPTCHA on suspicious requests. When the counter reaches 70% of the limit — display a challenge instead of content. In 1C-Bitrix this integrates via Cloudflare Turnstile or the built-in CAPTCHA mechanism.
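The soft-threshold decision itself is a few lines. A minimal sketch (the function name and the 70% ratio are illustrative, not a Bitrix API):

```php
// Hypothetical helper: decide what to do with a request, given the
// current per-IP counter and the hard limit.
function throttleDecision(int $count, int $limit): string
{
    if ($count >= $limit) {
        return 'block';     // hard limit reached: 429 or ban
    }
    if ($count >= (int)ceil($limit * 0.7)) {
        return 'challenge'; // soft threshold: show a CAPTCHA instead of content
    }
    return 'pass';
}
```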

Price protection via JavaScript

Prices are not included in the server-rendered HTML; they are loaded via AJAX after the page is displayed. Simple HTML scrapers get the page without prices:

<!-- In the product card template, instead of the price: -->
<span class="product-price js-price-loader" data-product-id="<?= $arResult['ID'] ?>">
    <span class="skeleton">----</span>
</span>

// Load prices once the DOM is ready (run this at the end of <body>
// or inside a DOMContentLoaded handler)
const priceElements = document.querySelectorAll('.js-price-loader');
if (priceElements.length) {
    const ids = [...priceElements].map(el => el.dataset.productId);

    fetch('/local/ajax/prices.php', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', 'X-Requested-With': 'XMLHttpRequest' },
        body: JSON.stringify({ ids }),
    })
    .then(r => r.json())
    .then(data => {
        priceElements.forEach(el => {
            const price = data.prices[el.dataset.productId];
            if (price) el.innerHTML = price.formatted;
        });
    });
}
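The loader above posts to /local/ajax/prices.php, which is not shown. Its core can be sketched as a pure function; buildPriceResponse and the $priceLookup callable are hypothetical names, and in a real project the lookup would go through the catalog price API (e.g. CCatalogProduct::GetOptimalPrice) for the current user's price groups:

```php
// Hypothetical core of /local/ajax/prices.php: validate the ids and build
// the response shape the front-end loader expects.
function buildPriceResponse(array $ids, callable $priceLookup): array
{
    $prices = [];
    foreach (array_filter(array_map('intval', $ids)) as $id) {
        $price = $priceLookup($id); // stand-in for the real catalog price API
        if ($price !== null) {
            $prices[$id] = ['formatted' => $price];
        }
    }
    return ['prices' => $prices];
}
```

The endpoint itself would json_decode the php://input body, call this with a real lookup, and echo json_encode() of the result with a Content-Type: application/json header.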

Headless browsers (Playwright, Puppeteer) can bypass this, but require significantly more resources per scrape — the cost grows.

Honeypot links

Hidden links in the HTML, invisible to humans (display: none) but dutifully followed by scrapers:

<a href="/honeypot/trap-page/?ref=bot" style="display:none" rel="nofollow" aria-hidden="true"><!-- noindex --></a>
// /honeypot/trap-page/index.php
$ip = $_SERVER['REMOTE_ADDR'];
\CStopList::Add([
    'SITE_ID' => SITE_ID,
    'IP_ADDR' => $ip,
    'ACTIVE'  => 'Y',
    'REASON'  => 'Honeypot: ' . $_SERVER['REQUEST_URI'],
]);
// Serve an infinite stream of junk data or 403
header('HTTP/1.1 403 Forbidden');
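One caveat: well-behaved search engine crawlers follow links too, and automatically banning them would hurt indexing. A robots.txt entry keeps honest bots out of the trap (the path matches the example above); scrapers that ignore robots.txt walk straight into it, which is the point:

```
# robots.txt
User-agent: *
Disallow: /honeypot/
```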

Image protection via X-Accel-Redirect

Images are served through PHP with validation, while nginx performs the efficient file delivery:

# nginx: internal location, reachable only via X-Accel-Redirect
location /protected-uploads/ {
    internal; # not directly accessible from outside
    alias /var/www/upload/;
}

// PHP handler, after the access checks have passed:
header('X-Accel-Redirect: /protected-uploads/' . $relativePath);
header('Content-Type: image/jpeg');
exit;
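Before issuing X-Accel-Redirect, the requested relative path must be validated, or the internal location turns into a path traversal vector. A minimal sketch of such a check (the function name and the whitelist are illustrative):

```php
// Hypothetical validator: allow only simple relative image paths,
// rejecting absolute paths and traversal sequences.
function isSafeImagePath(string $path): bool
{
    if ($path === '' || $path[0] === '/') {
        return false; // no empty or absolute paths
    }
    if (strpos($path, '..') !== false) {
        return false; // no traversal
    }
    // only word/slash/dot/dash characters and a known image extension
    return (bool)preg_match('~^[\w./-]+\.(jpe?g|png|webp)$~iD', $path);
}
```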

What to choose for a specific project

Threat                      | Solution                          | Complexity
----------------------------|-----------------------------------|-------------
Simple curl/wget scraper    | nginx rate limit + UA filter      | Low
Browser-emulating scraper   | Rate limit + behavioral analysis  | Medium
Headless browser            | JS price loading + CAPTCHA        | High
Industrial scraping         | Cloudflare Bot Management         | Requires CDN

Rate limiting + UA filter + honeypot cut off roughly 90% of unwanted scrapers and take only 1–2 days of work to set up.