Setting up anti-parsing protection bypass for 1C-Bitrix

A data source updated its protection, and a parser that had been working for months stopped retrieving content. Instead of HTML with prices, the page now returns a CAPTCHA, a JavaScript challenge, or an empty body. This is routine in large-scale parsing: protection systems evolve, and parsers have to adapt. Below are the main types of protection and the technical approaches to handling each.

Types of Protections and Their Indicators

JavaScript Challenge (Cloudflare, DataDome). The server responds with HTTP 503 (Cloudflare's managed challenge and DataDome typically use 403) and JS code that must run in a browser and set a clearance cookie: cf_clearance for Cloudflare, datadome for DataDome. Indicator: the body contains a <noscript> block and window._cf_chl_opt or a similar obfuscated script.

Rate Limiting. HTTP 429 or 403 after N requests within a time window. Limits can be applied per IP, per session cookie, or per fingerprint. Indicator: requests succeed for the first few minutes, then start getting blocked.

Browser Fingerprinting. The server checks the TLS fingerprint (JA3), the order of HTTP headers, and the presence of browser JavaScript APIs (navigator, canvas). Plain cURL with default settings produces a characteristic JA3 that differs from the fingerprints of real browsers.

Honeypot Links. Links hidden with CSS (display:none, visibility:hidden) that only bots follow. Requesting such a link usually results in an immediate IP ban.
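
The indicators above can be turned into a first-pass check that runs on every response. A minimal sketch, assuming PHP 8+; the function name detectProtection and its return labels are hypothetical, and the DataDome marker (captcha-delivery.com) is an assumption to verify against the actual source:

```php
<?php

/**
 * First-pass classification of a response by status code and body markers (heuristic sketch).
 * Returns one of: js_challenge, rate_limit, blocked, server_error, empty_body, ok.
 */
function detectProtection(int $status, string $body): string
{
    // '_cf_chl_opt' appears in Cloudflare's challenge script;
    // 'captcha-delivery.com' (DataDome's challenge host) is an assumption to verify.
    $jsMarkers = ['_cf_chl_opt', 'captcha-delivery.com'];

    if ($status === 403 || $status === 503) {
        foreach ($jsMarkers as $marker) {
            if (str_contains($body, $marker)) {
                return 'js_challenge';
            }
        }
    }
    if ($status === 429) {
        return 'rate_limit';
    }
    if ($status === 403) {
        // A bare 403 may be a rate limit, an IP ban or a rejected fingerprint
        return 'blocked';
    }
    if ($status >= 500) {
        return 'server_error';
    }
    if ($status === 200 && trim($body) === '') {
        return 'empty_body';
    }
    return 'ok';
}
```

A 'blocked' result calls for manual inspection, since a bare 403 does not say which of the mechanisms fired.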

Headless Browser for JavaScript Challenge

When the source requires JavaScript execution, Bitrix's HttpClient cannot help, because it does not execute JavaScript. The solution is a headless browser.

Puppeteer or Playwright runs as a separate Node.js service, and the Bitrix parser calls it over an HTTP API. The scheme:

  1. The PHP parser sends the URL to the internal service: http://localhost:3000/render?url=...
  2. The Node.js service opens the page in Chromium, waits for the JS to execute, and collects the cookies and the rendered HTML.
  3. The service returns the HTML and cookies to PHP.
  4. The PHP parser sends subsequent requests through the regular HttpClient with the obtained cookies: a passed JS challenge typically yields a cookie valid for 15-30 minutes.
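
On the PHP side, steps 1, 3 and 4 of this scheme can be sketched as follows. The JSON shape of the service's reply is hypothetical (the cookie objects mirror those returned by Puppeteer's page.cookies()), as are the function names renderUrl and buildCookieHeader; assumes PHP 8:

```php
<?php

/**
 * Asks the render service for a page (step 1) and decodes its reply (step 3).
 * Assumed reply shape (hypothetical):
 * {"html": "...", "cookies": [{"name": "cf_clearance", "value": "...", "expires": 1700000000}]}
 */
function renderUrl(string $url, string $endpoint = 'http://localhost:3000/render'): array
{
    $raw = file_get_contents($endpoint . '?url=' . rawurlencode($url));
    if ($raw === false) {
        throw new RuntimeException("Render service did not respond for {$url}");
    }
    return json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
}

/**
 * Turns the cookie list into a Cookie header value for regular HTTP requests (step 4).
 */
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $cookie) {
        $pairs[] = $cookie['name'] . '=' . $cookie['value'];
    }
    return implode('; ', $pairs);
}
```

The resulting string is sent as the Cookie header of later requests; with the Bitrix HttpClient this can be done through setHeader('Cookie', ...) or setCookies() (check the methods available in your Bitrix version).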

This way, the browser (slow and resource-intensive) is not used for every request: it obtains a "pass" once, and that pass then serves a series of regular HTTP requests.
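
A sketch of reusing one pass, assuming PHP 8: the hypothetical ClearanceCache class calls the renderer only when the cached cookies are about to expire. The renderer and the clock are injected as closures so the logic can be tested without a browser; the 15-minute fallback TTL and the 60-second safety margin are assumptions.

```php
<?php

/**
 * Caches the cookies from one headless render and reuses them until they expire.
 * $render returns ['cookies' => [...]] (e.g. a call to the render service); $now returns a Unix time.
 */
final class ClearanceCache
{
    private array $cookies = [];
    private int $validUntil = 0;

    public function __construct(
        private Closure $render,
        private Closure $now,
        private int $defaultTtl = 900,  // used when the browser reports only session cookies
        private int $safetyMargin = 60  // refresh slightly before the cookie actually expires
    ) {
    }

    public function cookies(): array
    {
        $now = ($this->now)();
        if ($now >= $this->validUntil) {
            $result = ($this->render)();
            $this->cookies = $result['cookies'] ?? [];
            // Earliest positive expiry wins; -1 (session cookie) is ignored
            $expiries = array_filter(array_column($this->cookies, 'expires'), fn($e) => $e > 0);
            $expiresAt = $expiries ? (int) min($expiries) : $now + $this->defaultTtl;
            $this->validUntil = $expiresAt - $this->safetyMargin;
        }
        return $this->cookies;
    }
}
```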

Important: the headless browser must be masked. Stock Puppeteer is detectable through navigator.webdriver = true, an empty plugin list, and characteristic window sizes. Use puppeteer-extra with its stealth plugin, or an equivalent for Playwright.

TLS Fingerprint Rotation

Rotating IP addresses is not enough to get past fingerprinting: the TLS fingerprint has to be rotated too. In PHP/cURL it can be influenced, though only partially, through these options:

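  • CURLOPT_SSLVERSION: sets the TLS version.
  • CURLOPT_SSL_CIPHER_LIST: sets the cipher list and its order, one of the inputs to the JA3 hash. The other inputs (TLS extensions, elliptic curves, point formats) are not controlled by these options.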
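
A minimal sketch of rotating these options between requests, assuming PHP 8 and a libcurl built against OpenSSL (the cipher names are OpenSSL's; other TLS backends use different names). The two cipher orderings and the function names tlsProfiles and applyTlsProfile are illustrative. They change only the cipher part of the ClientHello, not the extensions or curves that also feed JA3.

```php
<?php

/**
 * Illustrative TLS profiles: same minimum version, different cipher orderings.
 * CURL_SSLVERSION_TLSv1_2 means "TLS 1.2 or later" in current libcurl versions.
 * TLS 1.3 suites are set separately (CURLOPT_TLS13_CIPHERS) and are left at their defaults here.
 */
function tlsProfiles(): array
{
    return [
        [
            CURLOPT_SSLVERSION => CURL_SSLVERSION_TLSv1_2,
            CURLOPT_SSL_CIPHER_LIST => 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:'
                . 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384',
        ],
        [
            CURLOPT_SSLVERSION => CURL_SSLVERSION_TLSv1_2,
            CURLOPT_SSL_CIPHER_LIST => 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:'
                . 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256',
        ],
    ];
}

/**
 * Applies a profile chosen round-robin by request number and returns it.
 */
function applyTlsProfile(CurlHandle $ch, int $requestNo): array
{
    $profiles = tlsProfiles();
    $profile = $profiles[$requestNo % count($profiles)];
    if (!curl_setopt_array($ch, $profile)) {
        throw new RuntimeException('libcurl rejected the TLS options');
    }
    return $profile;
}
```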

The curl-impersonate project (a fork of cURL) emulates the TLS fingerprints of specific browsers (Chrome, Firefox, Safari). It is installed on the server in place of the standard cURL.

CAPTCHA Handling

If the source shows a CAPTCHA, the options are:

  • Recognition service (2Captcha, Anti-Captcha): the parser uploads the image, receives the answer via the API, and submits it in the form. Cost: about $2-3 per 1000 solutions. Delay: 10-30 seconds.
  • Reduce frequency: a CAPTCHA often appears as a reaction to rate limiting, so lowering the request rate and rotating proxies may eliminate it entirely.
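
The second option can be sketched as a small throttle that spaces requests with random jitter, rotates proxies round-robin, and doubles the delay after a 429 or a CAPTCHA. The class name Throttle and its default delay and jitter are hypothetical starting points to tune per source; assumes PHP 8.

```php
<?php

/**
 * Spaces requests out with random jitter and rotates proxies round-robin.
 */
final class Throttle
{
    private int $next = 0;

    public function __construct(
        private array $proxies,          // proxy URLs for CURLOPT_PROXY
        private float $baseDelay = 3.0,  // seconds between requests
        private float $jitter = 0.5      // +/-50% randomisation so the interval is not constant
    ) {
        if ($proxies === []) {
            throw new InvalidArgumentException('At least one proxy is required');
        }
        $this->proxies = array_values($proxies);
    }

    /** Seconds to wait before the next request. */
    public function delay(): float
    {
        $random = mt_rand() / mt_getrandmax(); // uniform in [0, 1]
        return $this->baseDelay * (1 + $this->jitter * (2 * $random - 1));
    }

    /** Next proxy, round-robin. */
    public function proxy(): string
    {
        return $this->proxies[$this->next++ % count($this->proxies)];
    }

    /** Call after a 429 or CAPTCHA: doubles the base delay, up to $max seconds. */
    public function backOff(float $max = 60.0): void
    {
        $this->baseDelay = min($this->baseDelay * 2, $max);
    }
}
```

Before each request the parser would call usleep((int) ($throttle->delay() * 1_000_000)) and curl_setopt($ch, CURLOPT_PROXY, $throttle->proxy()).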

Integration with 2Captcha from the PHP parser:

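// Submit via POST (base64 bodies are too long for a GET query); the reply is "OK|<task id>"
$ctx = stream_context_create(['http' => ['method' => 'POST', 'header' => 'Content-Type: application/x-www-form-urlencoded', 'content' => http_build_query(['key' => $apiKey, 'method' => 'base64', 'body' => base64_encode($captchaImage)])]]);
$taskId = substr(file_get_contents('https://2captcha.com/in.php', false, $ctx), 3);
// Waiting for the solution (polling every 5 s): "CAPCHA_NOT_READY" while pending, then "OK|<answer>" or an ERROR_* code
do { sleep(5); $result = file_get_contents("https://2captcha.com/res.php?key={$apiKey}&action=get&id={$taskId}"); } while ($result === 'CAPCHA_NOT_READY');
$answer = strncmp($result, 'OK|', 3) === 0 ? substr($result, 3) : null;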

Honeypot Protection

Before following a link, check the element's computed styles: display, visibility, opacity, and position (for example, placed outside the viewport). If the parser works with static HTML through DOMDocument in PHP, computed styles are not available, so check inline styles and class names instead. If it works through a headless browser, use getComputedStyle() to verify visibility.
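
For the DOMDocument case, a first-pass filter might look like this. It is a sketch, assuming PHP 8: it sees only inline styles, the hidden attribute, and a hypothetical list of hiding class names (Bootstrap-style d-none and similar) on the link and its ancestors; rules from external stylesheets are invisible to it. The function name visibleLinks is illustrative.

```php
<?php

/**
 * Returns hrefs of links that do not look hidden, judging only by static markup.
 */
function visibleLinks(string $html, array $hiddenClasses = ['hidden', 'd-none', 'visually-hidden']): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // display:none, visibility:hidden, opacity:0, or left/top pushed far off-screen
    $hiddenStyle = '/display\s*:\s*none|visibility\s*:\s*hidden|opacity\s*:\s*0(\.0+)?\s*(;|$)'
        . '|(?<![\w-])(left|top)\s*:\s*-\d{3,}px/i';

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $hidden = false;
        // Walk from the link up through its ancestors: a hidden parent hides the link too
        for ($node = $a; $node instanceof DOMElement; $node = $node->parentNode) {
            $classes = preg_split('/\s+/', $node->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
            if ($node->hasAttribute('hidden')
                || preg_match($hiddenStyle, $node->getAttribute('style'))
                || array_intersect($classes, $hiddenClasses)) {
                $hidden = true;
                break;
            }
        }
        if (!$hidden && $a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}
```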

What We Configure in One Day

  1. Diagnosing the type of protection on the specific source.
  2. Setting up a headless renderer (for a JS challenge) or header rotation (for fingerprinting).
  3. Integrating with the Bitrix parser: passing it the obtained cookies and HTML.
  4. Testing against the real source and fine-tuning delays.
  5. Documenting the protection's behavior for further support.