# Web Application Server Autoscaling Setup
Autoscaling automatically adds servers during load spikes and removes them during quiet periods. The goal is twofold: avoid paying for idle capacity, and avoid dropping requests during traffic peaks.
## Scaling metrics
Proper metric selection determines autoscaling quality:
| Metric | When to use | Typical threshold |
|---|---|---|
| CPU Utilization | CPU-intensive applications | 60–70% |
| Request Count (RPS) | Stateless HTTP services | determined by load testing |
| Memory Utilization | Memory-intensive applications | 70–80% |
| Queue Depth (SQS/RabbitMQ) | Worker processes | 100–500 messages |
| Custom metric (p95 latency) | Latency-sensitive APIs | 200–500 ms |
The CPU metric lags: the server slows down first and only then scales out. The RPS metric reacts faster. For typical web apps, combine CPU with Request Count.
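Both CPU- and RPS-based target tracking reduce to the same proportional rule: new capacity ≈ current capacity × (current metric / target), rounded up. When several policies are attached to one group, the group follows the largest capacity any of them requests. A minimal sketch of that arithmetic (the metric values are illustrative):

```python
import math

def desired_capacity(current_capacity: int, current_metric: float, target: float) -> int:
    """Target-tracking math: scale capacity proportionally to metric/target."""
    return math.ceil(current_capacity * current_metric / target)

def combined_desired(current_capacity: int, metrics: dict) -> int:
    """With several policies attached, the group follows the largest request."""
    return max(desired_capacity(current_capacity, value, target)
               for value, target in metrics.values())

# 4 instances, CPU at 90% (target 65%), 1500 RPS per target (target 1000)
print(combined_desired(4, {"cpu": (90.0, 65.0), "rps": (1500.0, 1000.0)}))  # → 6
```

The same formula also drives scale-in: when the metric drops below the target, the computed capacity shrinks, subject to the cooldowns discussed below.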
## AWS Auto Scaling Group
The most common scenario is an EC2 Auto Scaling Group behind an Application Load Balancer.
```hcl
# Terraform: Launch Template + ASG
resource "aws_launch_template" "app" {
  name_prefix   = "myapp-"
  image_id      = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"

  # Pull the latest application code and restart services on boot
  user_data = base64encode(<<-EOF
    #!/bin/bash
    cd /var/www/myapp
    git pull origin main
    systemctl restart php8.3-fpm
    systemctl reload nginx
  EOF
  )

  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.app.id]
  }

  iam_instance_profile {
    name = aws_iam_instance_profile.app.name
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "app" {
  name                      = "myapp-asg"
  vpc_zone_identifier       = aws_subnet.private[*].id
  target_group_arns         = [aws_lb_target_group.app.arn]
  health_check_type         = "ELB"
  health_check_grace_period = 300
  min_size                  = 2
  max_size                  = 20
  desired_capacity          = 2

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }

  tag {
    key                 = "Name"
    value               = "myapp-app"
    propagate_at_launch = true
  }
}
```
```hcl
# Target Tracking Policy: CPU
resource "aws_autoscaling_policy" "cpu" {
  name                   = "myapp-cpu-tracking"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  # How long a new instance needs before its metrics count toward the average.
  # Note: scale_in_cooldown/scale_out_cooldown belong to aws_appautoscaling_policy
  # (ECS/Fargate); for an ASG target-tracking policy, use the warmup instead.
  estimated_instance_warmup = 60

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 65.0
  }
}
```
## Kubernetes HPA and KEDA
Kubernetes Horizontal Pod Autoscaler works with CPU and Memory out of the box. KEDA adds external metrics: SQS, RabbitMQ, Kafka, Prometheus.
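For queue-based scaling, KEDA effectively applies a proportional rule as well: desired replicas ≈ ceil(queue length / target messages per replica), clamped to the configured replica bounds. A minimal sketch, assuming a target of 100 messages per replica (in line with the 100–500 range from the metrics table):

```python
import math

def keda_queue_replicas(queue_length: int, messages_per_replica: int,
                        min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Queue-depth scaling: one replica per `messages_per_replica` messages,
    clamped to the configured replica bounds."""
    desired = math.ceil(queue_length / messages_per_replica)
    return max(min_replicas, min(max_replicas, desired))

print(keda_queue_replicas(1250, 100))  # → 13
```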
```yaml
# HPA by CPU + Memory
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-web
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
```
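The `behavior` section caps how fast the HPA changes replica counts. With `Percent: 25` over `periodSeconds: 60`, scale-down removes at most a quarter of the current replicas per minute, so draining a large deployment happens stepwise. A rough simulation of that cap (it ignores the stabilization window, which additionally delays each step):

```python
import math

def scale_down_steps(current: int, target: int, percent: int = 25) -> list:
    """Replica count after each 60 s period, removing at most
    `percent`% of the current replicas per period."""
    steps = []
    while current > target:
        max_removal = max(1, math.floor(current * percent / 100))
        current = max(target, current - max_removal)
        steps.append(current)
    return steps

print(scale_down_steps(40, 5))  # → [30, 23, 18, 14, 11, 9, 7, 6, 5]
```

Draining from 40 to 5 replicas takes nine periods here, which is exactly the point: scale-down is deliberately slower than scale-up.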
## Scheduled Scaling
For predictable peaks (nightly mailings, scheduled sales), add scheduled actions.
```python
import boto3

client = boto3.client('application-autoscaling', region_name='eu-west-1')

# Scale up before the peak (Friday 18:00 UTC)
client.put_scheduled_action(
    ServiceNamespace='ecs',
    ResourceId='service/myapp-cluster/myapp-web',
    ScalableDimension='ecs:service:DesiredCount',
    ScheduledActionName='scale-up-friday-evening',
    Schedule='cron(0 18 ? * FRI *)',
    ScalableTargetAction={
        'MinCapacity': 10,
        'MaxCapacity': 50,
    },
)

# Scale back down after the peak (Friday 23:00 UTC),
# so the raised minimum does not stay in effect all weekend
client.put_scheduled_action(
    ServiceNamespace='ecs',
    ResourceId='service/myapp-cluster/myapp-web',
    ScalableDimension='ecs:service:DesiredCount',
    ScheduledActionName='scale-down-friday-night',
    Schedule='cron(0 23 ? * FRI *)',
    ScalableTargetAction={
        'MinCapacity': 2,
        'MaxCapacity': 50,
    },
)
```
## Graceful Shutdown
During scale-in, the instance receives a termination signal (SIGTERM). The application should finish in-flight requests before exiting.
```javascript
// Node.js Express
const express = require('express');
const app = express();
const server = app.listen(3000);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, shutting down gracefully');
  // Stop accepting new connections; let in-flight requests finish
  server.close(() => {
    console.log('HTTP server closed');
    process.exit(0);
  });
  // Force shutdown after 30 seconds
  setTimeout(() => {
    console.error('Forced shutdown');
    process.exit(1);
  }, 30000);
});
```
## Common problems
- **Thrashing**: instances are constantly added and removed because the cooldown is too short. Fix: raise the scale-in cooldown (ASG/ECS) or the `scaleDown` stabilization window (HPA) to 300–600 seconds.
- **Slow application startup**: the instance is launched, but traffic arrives before the app is ready. Fix: a health check grace period, a readiness probe, and an EC2 Warm Pool.
- **Expensive scale-in**: an instance with unfinished background tasks gets terminated. Fix: a termination lifecycle hook that drains the queue before signaling CONTINUE.
- **Wrong metric**: CPU sits at 20% while the app is slow due to I/O wait. Fix: scale on a custom metric such as p95 latency from CloudWatch or Prometheus.
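For the latency case, p95 is cheap to compute from request timings before publishing it as a custom metric. A sketch of the nearest-rank percentile and the threshold check (the 300 ms threshold and the sample values are made up; the actual metric-publishing call is omitted):

```python
import math

def p95(latencies_ms: list) -> float:
    """Nearest-rank p95: the value below which ~95% of samples fall."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]

samples = [120, 95, 340, 110, 105, 980, 130, 125, 115, 100,
           140, 135, 128, 122, 118, 112, 108, 104, 99, 101]
print(p95(samples))        # → 340
print(p95(samples) > 300)  # → True: scale out at a 300 ms threshold
```

Note how two slow requests out of twenty push p95 above the threshold while the mean stays low; this is why latency-sensitive APIs scale on percentiles rather than averages.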
## Timeline
| Configuration | Estimated time |
|---|---|
| EC2 ASG + ALB + CPU scaling | 2–3 days |
| ECS Fargate + target tracking | 1–2 days |
| Kubernetes HPA | 1 day |
| KEDA + external metrics | 2–3 days |
| Scheduled scaling + Warm Pool | +1–2 days |