Content Scraping Protection Configuration in 1C-Bitrix
Competitors scrape your catalog: they collect prices, descriptions, specifications, and use them for price monitoring or copy them to their own site. Complete protection is impossible — if a human can see the data, a program can too. The goal is to make scraping expensive enough that it loses its economic justification.
Defense in depth
Protection is built in several layers. Each layer catches a different type of scraper.
Layer 1 — nginx rate limiting. The first line of defense; it rejects excess traffic before PHP is even started:
# /etc/nginx/conf.d/rate-limit.conf
# Zone for IP-based limiting
limit_req_zone $binary_remote_addr zone=catalog:10m rate=20r/m;
limit_req_zone $binary_remote_addr zone=search:10m rate=5r/m;
# Apply to catalog pages
location /catalog/ {
limit_req zone=catalog burst=40 nodelay;
limit_req_status 429;
# ... other directives
}
location /search/ {
limit_req zone=search burst=10 nodelay;
limit_req_status 429;
}
Layer 2 — User-Agent analysis. In 1C-Bitrix via init.php:
// /local/php_interface/init.php — runs on every hit, so keep the check cheap
$blockedAgents = [
    'python-requests', 'scrapy', 'curl/', 'wget/',
    'go-http-client', 'java/', 'phantomjs', 'headless',
];
$ua = strtolower($_SERVER['HTTP_USER_AGENT'] ?? '');
foreach ($blockedAgents as $bad) {
    if (str_contains($ua, $bad)) { // str_contains() requires PHP 8+
        header('HTTP/1.1 403 Forbidden');
        exit;
    }
}
Primitive scrapers drop out immediately. Advanced ones spoof their User-Agent — use the following layers.
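The inverse problem matters just as much: an aggressive blocklist must not catch real search engines. Google and Yandex both document bot verification as a reverse DNS lookup followed by a forward confirmation. A sketch (the function name and suffix list are this example's assumptions; cache the verdict per IP, since DNS lookups are slow):

```php
/**
 * Verify that an IP claiming to be Googlebot/YandexBot really belongs
 * to the search engine: reverse DNS must resolve to the vendor's domain,
 * and the forward lookup of that host must return the same IP.
 */
function isVerifiedSearchBot(string $ip): bool
{
    $host = gethostbyaddr($ip); // reverse DNS
    if ($host === false || $host === $ip) {
        return false;           // no PTR record
    }
    $validSuffixes = [
        '.googlebot.com', '.google.com',
        '.yandex.ru', '.yandex.net', '.yandex.com',
    ];
    foreach ($validSuffixes as $suffix) {
        if (str_ends_with($host, $suffix)) {
            return gethostbyname($host) === $ip; // forward confirmation
        }
    }
    return false;
}
```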
Layer 3 — behavioral analysis. Real users do not request 200 catalog pages in 5 minutes. Request counter per IP in Redis:
namespace Local\Security;

use Bitrix\Main\Application;

class RateLimiter
{
    private const WINDOW = 300;    // counting window: 5 minutes
    private const LIMIT = 100;     // catalog requests per window
    private const BAN_TIME = 3600; // intended ban duration: one hour (expiry cleanup not shown)

    public static function check(string $ip): bool
    {
        // Simplified: the counter lives in the 1C-Bitrix managed cache,
        // which is backed by Redis/memcached when so configured
        $cache = Application::getInstance()->getManagedCache();
        $key = 'ratelimit_catalog_' . md5($ip);

        // read() binds the entry to the window TTL; get() returns the stored value
        $count = $cache->read(self::WINDOW, $key) ? (int)$cache->get($key) : 0;

        if ($count >= self::LIMIT) {
            // Log and block
            self::banIp($ip);
            return false;
        }

        $cache->set($key, $count + 1);
        return true;
    }

    private static function banIp(string $ip): void
    {
        // Add to the 1C-Bitrix ban table (b_stop_list)
        \CStopList::Add([
            'SITE_ID' => SITE_ID,
            'IP_ADDR' => $ip,
            'ACTIVE' => 'Y',
            'REASON' => 'Automatic ban: suspected scraping',
        ]);
    }
}
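Wiring the class up in init.php with the classic Bitrix event API; restricting the check to catalog URLs (and the assumption that the Local\Security namespace is autoloadable) are choices of this sketch:

```php
// /local/php_interface/init.php
AddEventHandler('main', 'OnPageStart', function () {
    $uri = $_SERVER['REQUEST_URI'] ?? '';
    if (strncmp($uri, '/catalog/', 9) !== 0) {
        return; // throttle catalog browsing only, not the whole site
    }
    $ip = $_SERVER['REMOTE_ADDR'] ?? '';
    if ($ip !== '' && !\Local\Security\RateLimiter::check($ip)) {
        header('HTTP/1.1 429 Too Many Requests');
        header('Retry-After: 300');
        exit;
    }
});
```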
Layer 4 — CAPTCHA on suspicious requests. When the counter reaches 70% of the limit — display a challenge instead of content. In 1C-Bitrix this integrates via Cloudflare Turnstile or the built-in CAPTCHA mechanism.
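With Turnstile, the widget posts a token that must be validated server-side against Cloudflare's siteverify endpoint. The endpoint URL and field names below come from Cloudflare's documentation; the rest of the wiring is a sketch:

```php
/**
 * Validate a Cloudflare Turnstile token server-side.
 * The widget submits the token in the 'cf-turnstile-response' POST field.
 */
function isTurnstileTokenValid(string $token, string $secret): bool
{
    $ch = curl_init('https://challenges.cloudflare.com/turnstile/v0/siteverify');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query([
            'secret'   => $secret,
            'response' => $token,
            'remoteip' => $_SERVER['REMOTE_ADDR'] ?? '',
        ]),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 5,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body !== false && (json_decode($body, true)['success'] ?? false) === true;
}
```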
Price protection via JavaScript
Prices are not rendered in HTML but are loaded via AJAX after the page renders. Simple HTML scrapers receive the page without prices:
// In the product card template instead of the price:
<span class="product-price js-price-loader" data-product-id="<?= $arResult['ID'] ?>">
<span class="skeleton">----</span>
</span>
// Load prices once the DOM is ready
document.addEventListener('DOMContentLoaded', () => {
    const priceElements = document.querySelectorAll('.js-price-loader');
    if (!priceElements.length) return;

    const ids = [...priceElements].map(el => el.dataset.productId);
    fetch('/local/ajax/prices.php', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', 'X-Requested-With': 'XMLHttpRequest' },
        body: JSON.stringify({ ids }),
    })
        .then(r => r.json())
        .then(data => {
            priceElements.forEach(el => {
                const price = data.prices[el.dataset.productId];
                if (price) el.innerHTML = price.formatted;
            });
        });
});
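The /local/ajax/prices.php endpoint the script calls is not part of Bitrix itself. A minimal server-side sketch using the classic catalog API (base price only; discounts, currency conversion, and access rights are omitted):

```php
// /local/ajax/prices.php — simplified sketch, base price only
require $_SERVER['DOCUMENT_ROOT'] . '/bitrix/modules/main/include/prolog_before.php';

use Bitrix\Main\Loader;

header('Content-Type: application/json');

// Only answer AJAX calls from our own frontend (trivially spoofable,
// but it filters out the laziest scrapers)
if (($_SERVER['HTTP_X_REQUESTED_WITH'] ?? '') !== 'XMLHttpRequest') {
    http_response_code(403);
    exit;
}

Loader::includeModule('catalog');
Loader::includeModule('currency');

$input = json_decode(file_get_contents('php://input'), true);
$ids = array_filter(array_map('intval', $input['ids'] ?? []));

$prices = [];
foreach ($ids as $id) {
    $base = \CPrice::GetBasePrice($id);
    if ($base) {
        $prices[$id] = [
            'value'     => (float)$base['PRICE'],
            'formatted' => \CCurrencyLang::CurrencyFormat($base['PRICE'], $base['CURRENCY']),
        ];
    }
}

echo json_encode(['prices' => $prices]);
```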
Headless browsers (Playwright, Puppeteer) can bypass this, but require significantly more resources per scrape — the cost grows.
Honeypot links
Hidden links in the HTML, invisible to humans (display: none) but followed by scrapers that harvest every href. Disallow the trap path in robots.txt so that well-behaved search engine crawlers never enter it, otherwise you risk banning Googlebot:
<a href="/honeypot/trap-page/?ref=bot" style="display:none" rel="nofollow" aria-hidden="true"><!-- noindex --></a>
// /honeypot/trap-page/index.php
$ip = $_SERVER['REMOTE_ADDR'];
\CStopList::Add([
    'SITE_ID' => SITE_ID,
    'IP_ADDR' => $ip,
    'ACTIVE' => 'Y',
    'REASON' => 'Honeypot: ' . $_SERVER['REQUEST_URI'],
]);
// Serve an infinite stream of junk data, or simply a 403
header('HTTP/1.1 403 Forbidden');
exit;
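The corresponding robots.txt rule keeps legitimate crawlers out of the trap; scrapers that ignore robots.txt (or mine it for "interesting" paths) still follow the hidden link:

```text
# robots.txt
User-agent: *
Disallow: /honeypot/
```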
Image protection via X-Accel-Redirect
Images are served through PHP with validation, while nginx performs the efficient file delivery:
location /protected-uploads/ {
internal; # not directly accessible from outside
alias /var/www/upload/;
}
// Inside the PHP gatekeeper, after all access checks have passed:
header('X-Accel-Redirect: /protected-uploads/' . $relativePath);
header('Content-Type: image/jpeg');
exit;
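The fragment above only shows the hand-off to nginx. A fuller sketch of the gatekeeper script; the URL layout, the session marker, and the fixed Content-Type are all simplifying assumptions of this example:

```php
// /local/ajax/image.php?file=... — hypothetical gatekeeper script
require $_SERVER['DOCUMENT_ROOT'] . '/bitrix/modules/main/include/prolog_before.php';

$relativePath = $_GET['file'] ?? '';

// Reject path traversal and empty requests
if ($relativePath === '' || str_contains($relativePath, '..')) {
    http_response_code(400);
    exit;
}

// Example check: the catalog page template is assumed to set this marker
// ($_SESSION['CAN_VIEW_IMAGES'] = true), so only visitors who actually
// rendered a page first may fetch images
if (empty($_SESSION['CAN_VIEW_IMAGES'])) {
    http_response_code(403);
    exit;
}

// nginx serves the file from the internal location; PHP exits immediately.
// A real handler would derive Content-Type from the file extension.
header('X-Accel-Redirect: /protected-uploads/' . $relativePath);
header('Content-Type: image/jpeg');
exit;
```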
What to choose for a specific project
| Threat | Solution | Complexity |
|---|---|---|
| Simple curl/wget scraper | nginx rate limit + UA filter | Low |
| Browser-emulating scraper | Rate limit + behavioral analysis | Medium |
| Headless browser | JS price loading + CAPTCHA | High |
| Industrial scraping | Cloudflare Bot Management | Requires CDN |
Rate limiting + UA filter + honeypot cut off roughly 90% of unwanted scrapers, and implementing them takes only 1–2 days of work.