Setting up proxy rotation for a 1C-Bitrix parser
After 200-300 requests from one IP, a source typically starts returning 429 Too Many Requests or a CAPTCHA. This is standard protection: Cloudflare, DataDome, and PerimeterX all track request frequency per IP. For industrial-scale parsing, proxy rotation is the main practical way around per-IP rate limiting. Let's integrate a proxy pool into a Bitrix parser.
Types of proxy and what to choose
- Datacenter (DC) — cheap ($1-3/IP), fast, but easily identified by ASN. Good for sources without serious protection.
- Residential — real provider IPs, cost $5-15/GB traffic. Not identified as proxy. Needed for sources with Cloudflare Enterprise.
- Mobile — mobile operator IPs. Most "clean", but most expensive and slowest.
For most catalog auto-population tasks in Bitrix, a pool of 20-50 datacenter proxies is enough. Switch to residential proxies only if the source actively blocks datacenter IPs.
Rotation architecture
A Bitrix parser usually uses \Bitrix\Main\Web\HttpClient or cURL directly. In both cases the proxy is set through connection options, so the task is to select the next proxy from the pool before each request.
The pool is stored in a config file or a table:
<?php
// /local/php_interface/parser/proxy_pool.php
return [
    ['host' => '185.1.2.3', 'port' => 8080, 'user' => 'u1', 'pass' => 'p1', 'type' => 'http'],
    ['host' => '185.1.2.4', 'port' => 8080, 'user' => 'u2', 'pass' => 'p2', 'type' => 'socks5'],
    // ...
];
Rotator class:
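class ProxyRotator
{
    /** @var array<int, array> Proxy entries in the proxy_pool.php format */
    private array $pool;
    /** @var array<string, int> "host:port" => timestamp when the cooldown ends */
    private array $failed = [];
    private int $index = 0;

    public function __construct(array $pool)
    {
        // Without this the pool is never initialized and next() fails
        $this->pool = array_values($pool);
    }

    /** Returns the next proxy that is not in cooldown, or null if none is available. */
    public function next(): ?array
    {
        $attempts = count($this->pool);
        while ($attempts-- > 0) {
            $proxy = $this->pool[$this->index % count($this->pool)];
            $this->index++;
            $key = $proxy['host'] . ':' . $proxy['port'];
            if (!isset($this->failed[$key]) || $this->failed[$key] < time()) {
                return $proxy;
            }
        }
        return null; // all proxies are in cooldown
    }

    /** Excludes a proxy from rotation for $cooldownSec seconds. */
    public function markFailed(array $proxy, int $cooldownSec = 300): void
    {
        $key = $proxy['host'] . ':' . $proxy['port'];
        $this->failed[$key] = time() + $cooldownSec;
    }
}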
Rotation strategies:
- Round-robin — simplest, proxies used in sequence. Works with homogeneous pool.
- Random — random selection. Reduces pattern predictability for anti-bot systems.
- Sticky per source — one proxy pinned to one source domain for N minutes. Imitates real user, reduces block probability.
For catalog parsing, we recommend sticky per source, rotating the proxy every 50-100 requests or immediately on a 429/403 response.
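The sticky-per-source strategy could be sketched roughly as follows. This is only an illustration: the class name StickyProxySelector, its method names, and the default limits are assumptions, not part of Bitrix. Pool entries are expected in the same array shape as the pool file.

```php
<?php
/**
 * Hypothetical helper: pins one proxy to each source host and rotates it
 * after a request budget, a time limit, or an explicit release (429/403).
 */
final class StickyProxySelector
{
    private array $pool;
    private int $maxRequests;
    private int $maxSeconds;
    /** @var array<string, array{index:int, used:int, since:int}> */
    private array $sticky = [];
    private int $cursor = 0;

    public function __construct(array $pool, int $maxRequests = 50, int $maxSeconds = 600)
    {
        if ($pool === []) {
            throw new InvalidArgumentException('Proxy pool is empty');
        }
        $this->pool = array_values($pool);
        $this->maxRequests = $maxRequests;
        $this->maxSeconds = $maxSeconds;
    }

    /** Returns the proxy currently pinned to the URL's host. */
    public function forUrl(string $url, ?int $now = null): array
    {
        $now ??= time();
        $host = (string) parse_url($url, PHP_URL_HOST);
        $slot = $this->sticky[$host] ?? null;

        // Pin a new proxy if there is none yet or the current one used up its budget
        if ($slot === null
            || $slot['used'] >= $this->maxRequests
            || $now - $slot['since'] >= $this->maxSeconds
        ) {
            $slot = [
                'index' => $this->cursor++ % count($this->pool),
                'used'  => 0,
                'since' => $now,
            ];
        }
        $slot['used']++;
        $this->sticky[$host] = $slot;

        return $this->pool[$slot['index']];
    }

    /** Call on 429/403 so the next request to this host gets another proxy. */
    public function release(string $url): void
    {
        unset($this->sticky[(string) parse_url($url, PHP_URL_HOST)]);
    }
}
```

New proxies are handed out round-robin, so with a pool of one entry a released host gets the same proxy again. A real implementation would likely combine this with the cooldown logic of the rotator.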
Integration with Bitrix HttpClient
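$proxy = $rotator->next();
if ($proxy === null) {
    // next() returns null when every proxy is in cooldown
    throw new \RuntimeException('No proxy available: all are in cooldown');
}
$http = new \Bitrix\Main\Web\HttpClient();
// setProxy() takes host, port, user and password; the pool's 'type' field is not passed here
$http->setProxy($proxy['host'], $proxy['port'], $proxy['user'], $proxy['pass']);
$http->setTimeout(15);        // connection timeout, seconds
$http->setStreamTimeout(30);  // read timeout, seconds
$response = $http->get($url); // false on a network error
$status = $http->getStatus();
if ($response === false || $status === 429 || $status === 403) {
    $rotator->markFailed($proxy, 600);
    // retry with a different proxy
}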
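The retry step left as a comment above could look like a bounded loop. To keep this sketch self-contained, the proxy source and the HTTP call are injected as callables. The names fetchWithRotation, $nextProxy, $markFailed, and $fetch are hypothetical; in the parser they would wrap ProxyRotator::next(), ProxyRotator::markFailed(), and the HttpClient request.

```php
<?php
/**
 * Hypothetical retry helper.
 *
 * $nextProxy():                 ?array, the next usable proxy or null
 * $markFailed(array $proxy):    void, puts the proxy into cooldown
 * $fetch(string $url, array $proxy): array with keys
 *                               'status' (int) and 'body' (string|false)
 */
function fetchWithRotation(
    string $url,
    callable $nextProxy,
    callable $markFailed,
    callable $fetch,
    int $maxAttempts = 3
): ?string {
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $proxy = $nextProxy();
        if ($proxy === null) {
            return null; // whole pool is in cooldown, give up for now
        }

        $result = $fetch($url, $proxy);
        $blocked = in_array($result['status'], [429, 403], true);

        if ($result['body'] !== false && !$blocked) {
            return $result['body'];
        }

        // Blocked or network error: cool this proxy down and try another one
        $markFailed($proxy);
    }

    return null; // attempts exhausted
}
```

Limiting the number of attempts matters: if the source blocks the whole pool, an unbounded loop would burn through every proxy on a single URL.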
When using cURL directly, set the options CURLOPT_PROXY, CURLOPT_PROXYUSERPWD, and CURLOPT_PROXYTYPE (CURLPROXY_HTTP or CURLPROXY_SOCKS5).
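For the cURL variant, a small helper can map one pool entry to these options. This is a sketch that assumes the pool entry format shown earlier and the PHP cURL extension; buildCurlProxyOptions is a hypothetical name, and the timeouts in the usage comment simply mirror the HttpClient example.

```php
<?php
/**
 * Hypothetical helper: builds cURL proxy options from one pool entry
 * (keys: host, port, user, pass, type).
 */
function buildCurlProxyOptions(array $proxy): array
{
    $options = [
        CURLOPT_PROXY     => $proxy['host'],
        CURLOPT_PROXYPORT => (int) $proxy['port'],
        // Anything other than 'socks5' is treated as a plain HTTP proxy
        CURLOPT_PROXYTYPE => ($proxy['type'] ?? 'http') === 'socks5'
            ? CURLPROXY_SOCKS5
            : CURLPROXY_HTTP,
    ];

    // Credentials are optional: some providers authorize by client IP instead
    if (!empty($proxy['user'])) {
        $options[CURLOPT_PROXYUSERPWD] = $proxy['user'] . ':' . ($proxy['pass'] ?? '');
    }

    return $options;
}

// Usage (not executed here):
// $ch = curl_init($url);
// curl_setopt_array($ch, buildCurlProxyOptions($proxy) + [
//     CURLOPT_RETURNTRANSFER => true,
//     CURLOPT_CONNECTTIMEOUT => 15,
//     CURLOPT_TIMEOUT        => 30,
// ]);
// $body = curl_exec($ch);
// $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
```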
Monitoring pool health
Proxies die: the subscription expires, the IP gets banned, the provider shuts down. A regular health check is needed. An agent that runs hourly checks each proxy with a request to https://httpbin.org/ip and updates the proxy's status in the config (active/dead). Dead proxies are automatically excluded from rotation.
Log statistics per proxy: the number of successful requests, the number of 429/403 responses, and the average response time. This identifies "slow" proxies so they can be excluded before they fail completely.
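The per-proxy statistics could be collected with something like the class below. It is an in-memory sketch only: the ProxyStats name, its methods, and the "host:port" key are assumptions, and a real parser would persist the counters to a table or log between runs.

```php
<?php
/** Hypothetical in-memory collector of per-proxy statistics. */
final class ProxyStats
{
    /** @var array<string, array{ok:int, blocked:int, totalMs:float}> */
    private array $stats = [];

    /** Records one finished request; 429 and 403 count as blocked, the rest as ok. */
    public function record(string $proxyKey, int $status, float $elapsedMs): void
    {
        $row = $this->stats[$proxyKey] ?? ['ok' => 0, 'blocked' => 0, 'totalMs' => 0.0];
        if ($status === 429 || $status === 403) {
            $row['blocked']++;
        } else {
            $row['ok']++;
        }
        $row['totalMs'] += $elapsedMs;
        $this->stats[$proxyKey] = $row;
    }

    /** Average response time in milliseconds, or null if nothing was recorded. */
    public function averageMs(string $proxyKey): ?float
    {
        $row = $this->stats[$proxyKey] ?? null;
        return $row === null ? null : $row['totalMs'] / ($row['ok'] + $row['blocked']);
    }

    /** Share of 429/403 responses (0.0 to 1.0), or null if nothing was recorded. */
    public function blockRate(string $proxyKey): ?float
    {
        $row = $this->stats[$proxyKey] ?? null;
        return $row === null ? null : $row['blocked'] / ($row['ok'] + $row['blocked']);
    }

    /** Keys of proxies whose average response time exceeds the threshold. */
    public function slowProxies(float $thresholdMs): array
    {
        $slow = [];
        foreach (array_keys($this->stats) as $key) {
            if ($this->averageMs((string) $key) > $thresholdMs) {
                $slow[] = (string) $key;
            }
        }
        return $slow;
    }
}
```

The elapsed time can be measured around the request with microtime(true). The threshold for "slow" depends on the pool and has to be chosen from real measurements.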
Additional measures
- Request delay: a random pause of 1-5 seconds. Even with proxy rotation, a rapid-fire request rate looks suspicious.
- User-Agent rotation: a pool of 10-20 current UA strings, switched together with the proxy.
- Referer and headers: send Accept-Language, Accept-Encoding, and a Referer set to the previous page. Without them the request looks like a bot.
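These measures can be sketched as two small helpers. The function names, the header values, and the idea of choosing the User-Agent by proxy index are illustrative assumptions; the UA strings should come from a maintained list of real, current browsers.

```php
<?php
/** Random pause length in microseconds, uniformly between $minSec and $maxSec. */
function randomDelayMicros(float $minSec = 1.0, float $maxSec = 5.0): int
{
    return random_int((int) ($minSec * 1_000_000), (int) ($maxSec * 1_000_000));
}

/**
 * Builds headers for one request.
 * The User-Agent is picked by proxy index, so it changes together with the proxy.
 */
function buildHeaders(array $userAgents, int $proxyIndex, ?string $previousUrl): array
{
    $headers = [
        'User-Agent'      => $userAgents[$proxyIndex % count($userAgents)],
        'Accept-Language' => 'en-US,en;q=0.9', // assumed value, match the target audience
        'Accept-Encoding' => 'gzip, deflate',  // only advertise what the client can decode
    ];

    // The first request of a session has no previous page, so no Referer
    if ($previousUrl !== null) {
        $headers['Referer'] = $previousUrl;
    }

    return $headers;
}

// Usage (not executed here):
// usleep(randomDelayMicros());
// foreach (buildHeaders($userAgents, $proxyIndex, $previousUrl) as $name => $value) {
//     $http->setHeader($name, $value);
// }
```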
What we set up in one day
- Proxy pool file/table.
- ProxyRotator class with round-robin and cooldown for failed proxies.
- Integration with the parser's HttpClient.
- Health-check agent for the pool.
- Statistics logging per proxy.







