# Web Application Server Autoscaling Setup
Autoscaling automatically adds servers during load spikes and removes them during quiet periods. The goal is twofold: avoid paying for idle capacity, and avoid dropping requests during traffic peaks.
## Scaling metrics
Proper metric selection determines autoscaling quality:
| Metric | When to use | Typical threshold |
|---|---|---|
| CPU Utilization | CPU-intensive applications | 60–70% |
| Request Count (RPS) | Stateless HTTP services | determined by load testing |
| Memory Utilization | Memory-intensive applications | 70–80% |
| Queue Depth (SQS/RabbitMQ) | Worker processes | 100–500 messages |
| Custom metric (p95 latency) | Latency-sensitive APIs | 200–500 ms |
The CPU metric lags: the server slows down first and only then scales out. The RPS metric reacts faster. For typical web apps, combine CPU with Request Count.
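Both CPU- and RPS-based target tracking reduce to the same proportional rule: new capacity ≈ current capacity × (current metric / target), rounded up. When several policies are attached to one group, the group follows the largest capacity any of them requests. A minimal sketch of that arithmetic (the metric values are illustrative):

```python
import math

def desired_capacity(current_capacity: int, current_metric: float, target: float) -> int:
    """Target-tracking math: scale capacity proportionally to metric/target."""
    return math.ceil(current_capacity * current_metric / target)

def combined_desired(current_capacity: int, metrics: dict) -> int:
    """With several policies attached, the group follows the largest request."""
    return max(desired_capacity(current_capacity, value, target)
               for value, target in metrics.values())

# 4 instances, CPU at 90% (target 65%), 1500 RPS per target (target 1000)
print(combined_desired(4, {"cpu": (90.0, 65.0), "rps": (1500.0, 1000.0)}))  # → 6
```

The same formula also drives scale-in: when the metric drops below the target, the computed capacity shrinks, subject to the cooldowns discussed below.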
## AWS Auto Scaling Group
The most common scenario is an EC2 Auto Scaling Group behind an Application Load Balancer.
```hcl
# Terraform: Launch Template + ASG
resource "aws_launch_template" "app" {
  name_prefix   = "myapp-"
  image_id      = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"

  # Pull the latest application code and restart services on boot
  user_data = base64encode(<<-EOF
    #!/bin/bash
    cd /var/www/myapp
    git pull origin main
    systemctl restart php8.3-fpm
    systemctl reload nginx
  EOF
  )

  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.app.id]
  }

  iam_instance_profile {
    name = aws_iam_instance_profile.app.name
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "app" {
  name                      = "myapp-asg"
  vpc_zone_identifier       = aws_subnet.private[*].id
  target_group_arns         = [aws_lb_target_group.app.arn]
  health_check_type         = "ELB"
  health_check_grace_period = 300
  min_size                  = 2
  max_size                  = 20
  desired_capacity          = 2

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }

  tag {
    key                 = "Name"
    value               = "myapp-app"
    propagate_at_launch = true
  }
}
```
```hcl
# Target Tracking Policy: CPU
resource "aws_autoscaling_policy" "cpu" {
  name                   = "myapp-cpu-tracking"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  # How long a new instance needs before its metrics count toward the average.
  # Note: scale_in_cooldown/scale_out_cooldown belong to aws_appautoscaling_policy
  # (ECS/Fargate); for an ASG target-tracking policy, use the warmup instead.
  estimated_instance_warmup = 60

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 65.0
  }
}
```
## Kubernetes HPA and KEDA
Kubernetes Horizontal Pod Autoscaler works with CPU and Memory out of the box. KEDA adds external metrics: SQS, RabbitMQ, Kafka, Prometheus.
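For queue-based scaling, KEDA effectively applies a proportional rule as well: desired replicas ≈ ceil(queue length / target messages per replica), clamped to the configured replica bounds. A minimal sketch, assuming a target of 100 messages per replica (in line with the 100–500 range from the metrics table):

```python
import math

def keda_queue_replicas(queue_length: int, messages_per_replica: int,
                        min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Queue-depth scaling: one replica per `messages_per_replica` messages,
    clamped to the configured replica bounds."""
    desired = math.ceil(queue_length / messages_per_replica)
    return max(min_replicas, min(max_replicas, desired))

print(keda_queue_replicas(1250, 100))  # → 13
```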
```yaml
# HPA by CPU + Memory
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-web
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
```
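The `behavior` section caps how fast the HPA changes replica counts. With `Percent: 25` over `periodSeconds: 60`, scale-down removes at most a quarter of the current replicas per minute, so draining a large deployment happens stepwise. A rough simulation of that cap (it ignores the stabilization window, which additionally delays each step):

```python
import math

def scale_down_steps(current: int, target: int, percent: int = 25) -> list:
    """Replica count after each 60 s period, removing at most
    `percent`% of the current replicas per period."""
    steps = []
    while current > target:
        max_removal = max(1, math.floor(current * percent / 100))
        current = max(target, current - max_removal)
        steps.append(current)
    return steps

print(scale_down_steps(40, 5))  # → [30, 23, 18, 14, 11, 9, 7, 6, 5]
```

Draining from 40 to 5 replicas takes nine periods here, which is exactly the point: scale-down is deliberately slower than scale-up.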
## Scheduled Scaling
For predictable peaks (nightly mailings, scheduled sales), add scheduled actions.
```python
import boto3

client = boto3.client('application-autoscaling', region_name='eu-west-1')

# Scale up before the peak (Friday 18:00 UTC)
client.put_scheduled_action(
    ServiceNamespace='ecs',
    ResourceId='service/myapp-cluster/myapp-web',
    ScalableDimension='ecs:service:DesiredCount',
    ScheduledActionName='scale-up-friday-evening',
    Schedule='cron(0 18 ? * FRI *)',
    ScalableTargetAction={
        'MinCapacity': 10,
        'MaxCapacity': 50,
    },
)

# Scale back down after the peak (Friday 23:00 UTC),
# so the raised minimum does not stay in effect all weekend
client.put_scheduled_action(
    ServiceNamespace='ecs',
    ResourceId='service/myapp-cluster/myapp-web',
    ScalableDimension='ecs:service:DesiredCount',
    ScheduledActionName='scale-down-friday-night',
    Schedule='cron(0 23 ? * FRI *)',
    ScalableTargetAction={
        'MinCapacity': 2,
        'MaxCapacity': 50,
    },
)
```
## Graceful Shutdown
During scale-in, the instance receives a termination signal (SIGTERM). The application should finish in-flight requests before exiting.
```javascript
// Node.js Express
const express = require('express');
const app = express();
const server = app.listen(3000);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, shutting down gracefully');
  // Stop accepting new connections; let in-flight requests finish
  server.close(() => {
    console.log('HTTP server closed');
    process.exit(0);
  });
  // Force shutdown after 30 seconds
  setTimeout(() => {
    console.error('Forced shutdown');
    process.exit(1);
  }, 30000);
});
```
## Common problems
- **Thrashing**: instances are constantly added and removed because the cooldown is too short. Fix: raise the scale-in cooldown (ASG/ECS) or the `scaleDown` stabilization window (HPA) to 300–600 seconds.
- **Slow application startup**: the instance is launched, but traffic arrives before the app is ready. Fix: a health check grace period, a readiness probe, and an EC2 Warm Pool.
- **Expensive scale-in**: an instance with unfinished background tasks gets terminated. Fix: a termination lifecycle hook that drains the queue before signaling CONTINUE.
- **Wrong metric**: CPU sits at 20% while the app is slow due to I/O wait. Fix: scale on a custom metric such as p95 latency from CloudWatch or Prometheus.
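For the latency case, p95 is cheap to compute from request timings before publishing it as a custom metric. A sketch of the nearest-rank percentile and the threshold check (the 300 ms threshold and the sample values are made up; the actual metric-publishing call is omitted):

```python
import math

def p95(latencies_ms: list) -> float:
    """Nearest-rank p95: the value below which ~95% of samples fall."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]

samples = [120, 95, 340, 110, 105, 980, 130, 125, 115, 100,
           140, 135, 128, 122, 118, 112, 108, 104, 99, 101]
print(p95(samples))        # → 340
print(p95(samples) > 300)  # → True: scale out at a 300 ms threshold
```

Note how two slow requests out of twenty push p95 above the threshold while the mean stays low; this is why latency-sensitive APIs scale on percentiles rather than averages.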
## Timeline
| Configuration | Estimated time |
|---|---|
| EC2 ASG + ALB + CPU scaling | 2–3 days |
| ECS Fargate + target tracking | 1–2 days |
| Kubernetes HPA | 1 day |
| KEDA + external metrics | 2–3 days |
| Scheduled scaling + Warm Pool | +1–2 days |