Web Application Server Autoscaling Setup

Autoscaling automatically adds servers during load spikes and removes them during quiet periods. The main goal is to avoid overpaying for idle capacity and not lose requests during traffic peaks.

Scaling metrics

Proper metric selection determines autoscaling quality:

| Metric | When to use | Threshold |
| CPU Utilization | CPU-intensive applications | 60–70% |
| Request Count (RPS) | Stateless HTTP services | set from load testing |
| Memory Utilization | Memory-intensive applications | 70–80% |
| Queue Depth (SQS/RabbitMQ) | Worker processes | 100–500 messages |
| Custom metric (p95 latency) | Latency-sensitive APIs | 200–500 ms |

The CPU metric lags: the server slows down first and only then scales. The RPS metric reacts faster. For a typical web application, combine CPU and Request Count.
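The combined CPU + Request Count trigger can be sketched as a simple decision function. The threshold values below are illustrative defaults, not prescriptions; real policies live in the cloud provider's scaler, not application code:

```python
def scaling_decision(cpu_pct, rps_per_instance,
                     cpu_target=65.0, rps_target=400.0,
                     scale_in_factor=0.6):
    """Return 'out', 'in', or 'hold'.

    Scale out if EITHER metric exceeds its target (react fast);
    scale in only when BOTH are well below target (avoid thrashing).
    """
    if cpu_pct > cpu_target or rps_per_instance > rps_target:
        return "out"
    if (cpu_pct < cpu_target * scale_in_factor
            and rps_per_instance < rps_target * scale_in_factor):
        return "in"
    return "hold"
```

The asymmetry (aggressive out, conservative in) is the same idea behind short scale-out and long scale-in cooldowns later in this article.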

AWS Auto Scaling Group

The most common scenario is an EC2 Auto Scaling Group behind an Application Load Balancer.

# Terraform: Launch Template + ASG
resource "aws_launch_template" "app" {
  name_prefix   = "myapp-"
  image_id      = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"

  user_data = base64encode(<<-EOF
    #!/bin/bash
    cd /var/www/myapp
    git pull origin main
    systemctl restart php8.3-fpm
    systemctl reload nginx
  EOF
  )

  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.app.id]
  }

  iam_instance_profile {
    name = aws_iam_instance_profile.app.name
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "app" {
  name                = "myapp-asg"
  vpc_zone_identifier = aws_subnet.private[*].id
  target_group_arns   = [aws_lb_target_group.app.arn]
  health_check_type   = "ELB"
  health_check_grace_period = 300

  min_size         = 2
  max_size         = 20
  desired_capacity = 2

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }

  tag {
    key                 = "Name"
    value               = "myapp-app"
    propagate_at_launch = true
  }
}

# Target Tracking Policy: CPU
resource "aws_autoscaling_policy" "cpu" {
  name                   = "myapp-cpu-tracking"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value       = 65.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}

Kubernetes HPA and KEDA

Kubernetes Horizontal Pod Autoscaler works with CPU and Memory out of the box. KEDA adds external metrics: SQS, RabbitMQ, Kafka, Prometheus.

# HPA by CPU + Memory
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-web
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
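Under the hood, the HPA computes desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to minReplicas/maxReplicas; a minimal sketch of that formula (the real controller additionally applies a tolerance band and the behavior policies shown above):

```python
import math

def hpa_desired_replicas(current_replicas, current_value, target_value,
                         min_replicas=2, max_replicas=50):
    """Reproduce the core HPA scaling formula with min/max clamping."""
    desired = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 pods at 90% CPU against a 65% target yields ceil(4 × 90 / 65) = 6 replicas.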

Scheduled Scaling

For predictable peaks (nightly mailings, scheduled sales), add scheduled actions.

import boto3

client = boto3.client('application-autoscaling', region_name='eu-west-1')

# Scale up before peak (Friday 18:00 UTC)
client.put_scheduled_action(
    ServiceNamespace='ecs',
    ResourceId='service/myapp-cluster/myapp-web',
    ScalableDimension='ecs:service:DesiredCount',
    ScheduledActionName='scale-up-friday-evening',
    Schedule='cron(0 18 ? * FRI *)',
    ScalableTargetAction={
        'MinCapacity': 10,
        'MaxCapacity': 50,
    }
)
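A scale-up action needs a matching scale-down after the peak, or capacity stays raised indefinitely. A sketch that builds both parameter sets for put_scheduled_action (the names, cron expressions, and capacity numbers are illustrative assumptions):

```python
def scheduled_action_pair(resource_id, up_cron, down_cron,
                          peak_min=10, peak_max=50,
                          base_min=2, base_max=50):
    """Build kwargs for a matched scale-up / scale-down pair of
    put_scheduled_action calls (ECS service autoscaling)."""
    common = {
        'ServiceNamespace': 'ecs',
        'ResourceId': resource_id,
        'ScalableDimension': 'ecs:service:DesiredCount',
    }
    up = dict(common,
              ScheduledActionName='scale-up',
              Schedule=f'cron({up_cron})',
              ScalableTargetAction={'MinCapacity': peak_min,
                                    'MaxCapacity': peak_max})
    down = dict(common,
                ScheduledActionName='scale-down',
                Schedule=f'cron({down_cron})',
                ScalableTargetAction={'MinCapacity': base_min,
                                      'MaxCapacity': base_max})
    return up, down
```

Each dict can then be passed as `client.put_scheduled_action(**kwargs)`; target tracking still operates between MinCapacity and MaxCapacity, so the scheduled action only raises the floor.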

Graceful Shutdown

When scaling in, an instance receives a termination signal. The application should finish serving in-flight requests before exiting.

// Node.js Express
const server = app.listen(3000);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, shutting down gracefully');

  server.close(() => {
    console.log('HTTP server closed');
    process.exit(0);
  });

  // Force shutdown after 30 seconds
  setTimeout(() => {
    console.error('Forced shutdown');
    process.exit(1);
  }, 30000);
});

Common problems

Thrashing: instances are constantly added and removed because the cooldown is too short. Solution: increase scale_in_cooldown to 300–600 seconds.

Slow application startup: the instance is created, but traffic arrives before it is ready. Solution: health check grace period + readiness probe + Warm Pool.

Expensive scale-in: an instance is removed while it still has unfinished background tasks. Solution: a lifecycle hook that drains the queue before signaling CONTINUE.

Wrong metric: CPU sits at 20%, but the application is slow due to I/O wait. Solution: a custom metric (p95 latency via CloudWatch or Prometheus).
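Before publishing a p95 latency metric to CloudWatch or Prometheus, it has to be computed from request timing samples; a minimal sketch using the nearest-rank method:

```python
import math

def p95(latencies_ms):
    """p95 via the nearest-rank method: sort the samples and take
    the value at rank ceil(0.95 * n)."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]
```

In practice, monitoring agents compute this over a sliding window (e.g. the last 60 seconds of requests) and the scaler compares it against the 200–500 ms threshold from the metrics table.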

Timeline

| Configuration | Timeline |
| EC2 ASG + ALB + CPU scaling | 2–3 days |
| ECS Fargate + target tracking | 1–2 days |
| Kubernetes HPA | 1 day |
| KEDA + external metrics | 2–3 days |
| Scheduled scaling + Warm Pool | +1–2 days |