Autoscaling Setup for Mobile App Servers

TRUETECH develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience publishing mobile applications in popular markets such as Google Play, the App Store, Amazon Appstore, AppGallery, and others.
Development and support of all types of mobile applications:
  • Information and entertainment: news apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
  • E-commerce: online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems
  • Business process management: CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
  • Electronic services: classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.

Complexity: Medium
Timeline: ~2-3 business days
Latest works:
  • Mobile application for FEEDME
  • Mobile application for XOOMER
  • Mobile application for RHL
  • Mobile application for ZIPPY
  • Mobile application for Affhome
  • Mobile application for the FLAVORS company

Mobile App Server Autoscaling Setup

Mobile traffic is unpredictable. A push notification to 500,000 devices, an App Store feature, or a mention in a 300k-subscriber Telegram channel—and your server handling 200 rps suddenly receives 3000 rps in 30 seconds. Without autoscaling, you get 503 errors and poor app store reviews.

Types of Autoscaling and When to Use Each

Horizontal Pod Autoscaler (HPA) in Kubernetes adds pods as load rises and removes them as it falls. The default metric is CPU utilization, but for mobile APIs p99 latency, request queue depth, or custom Prometheus metrics work better.
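
For example, with Prometheus Adapter exposing a per-pod latency metric, the HPA can target p99 latency instead of CPU. A sketch (the metric name below is hypothetical and must match whatever your prometheus-adapter rules actually expose):

```yaml
# Sketch: HPA metrics section driven by a custom per-pod metric
# via Prometheus Adapter. "http_request_duration_p99_ms" is an
# illustrative name, not a built-in metric.
metrics:
- type: Pods
  pods:
    metric:
      name: http_request_duration_p99_ms
    target:
      type: AverageValue
      averageValue: "250"   # scale out when average p99 across pods exceeds 250 ms
```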

Vertical Pod Autoscaler (VPA) adjusts pod requests/limits. Useful for JVM services where memory grows during heap warmup. However, VPA restarts pods to apply resource changes, which makes it unsuitable for stateful services.
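
A minimal VPA manifest might look like the sketch below (the target Deployment name is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: jvm-service
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jvm-service        # illustrative Deployment name
  updatePolicy:
    updateMode: "Auto"       # VPA evicts and recreates pods to apply new requests
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      maxAllowed:
        memory: 4Gi          # cap so VPA never requests more than a node can give
```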

Cluster Autoscaler adds or removes Kubernetes nodes in the cloud (AWS EC2, GCP GKE, Azure AKS). It works alongside HPA: when HPA wants 5 pods but the cluster has no free capacity, Cluster Autoscaler provisions a new node.

KEDA (Kubernetes Event-Driven Autoscaling) scales on external metrics: RabbitMQ queue length, Kafka consumer lag, Redis Streams message count. For a mobile app with push notification queues, workers scale by task count, not CPU.
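
For a push-notification worker backed by RabbitMQ, a KEDA ScaledObject could look like this sketch (the Deployment name, queue name, and thresholds are illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: push-worker
spec:
  scaleTargetRef:
    name: push-worker          # illustrative Deployment name
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
  - type: rabbitmq
    metadata:
      queueName: push-notifications
      mode: QueueLength
      value: "100"             # target ~100 queued messages per replica
      hostFromEnv: RABBITMQ_URL  # AMQP connection string from the worker's env
```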

HPA Configuration for Mobile APIs

Problem with standard CPU-based scaling: during a request spike, CPU rises, HPA decides to add a pod (15–30 seconds), the pod starts (another 10–30 seconds) and passes its readiness probe. Total: 30–60 seconds before it handles traffic. Some mobile clients receive 503s during this window.

Solutions:

  • Predictive scaling: scale out before expected peaks (push notification sent → scale immediately)
  • Scale up fast, scale down slowly: scaleUp.stabilizationWindowSeconds: 0 (instant scale-up), scaleDown.stabilizationWindowSeconds: 300 (wait 5 minutes before scaling in, to avoid thrashing)
  • minReplicas: 2: never drop below 2 pods, so rolling updates cause no downtime
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mobile-api           # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mobile-api         # the Deployment this HPA scales
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # react instantly to spikes
      policies:
      - type: Pods
        value: 4                        # add up to 4 pods per minute
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling in
      policies:
      - type: Percent
        value: 50                       # remove at most half the pods per minute
        periodSeconds: 60

Cold Start Issues with Mobile Traffic

Go and Node.js services start in 1–3 seconds, which is acceptable. JVM applications (Spring Boot) take 10–20 seconds. Serverless (Lambda) cold starts run 500 ms–3 seconds depending on runtime and package size.

For JVM: keep a minimum of 2 pods always warm. GraalVM Native Image starts in 0.1–0.3 seconds but requires reflection configuration; Spring Boot 3 + GraalVM Native is a production-ready combination.

For serverless (AWS Lambda, Google Cloud Functions): Provisioned Concurrency keeps N instances warm. More expensive, but eliminates cold starts for those instances.
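
With AWS SAM, for instance, provisioned concurrency can be declared directly on the function. A sketch (function name, runtime, and instance count are illustrative):

```yaml
# AWS SAM sketch: keep 5 Lambda instances warm via provisioned concurrency.
ApiFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: app.handler
    Runtime: nodejs20.x
    AutoPublishAlias: live               # provisioned concurrency attaches to an alias/version
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 5 # 5 instances stay warm (billed while provisioned)
```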

Real case: a news iOS app. After an editorial push, 40,000 simultaneous opens arrived within 2 minutes. A single pod on 2 vCPU handled 400 rps. With HPA keyed to 60% CPU, new pods came online only after the peak had passed. Solution: KEDA on CloudWatch metrics (SQS message count in the push notification queue), so scaling to 8 pods triggered before the traffic arrived. Zero 503 errors in the next three campaigns.
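
A KEDA trigger for that setup could look like the sketch below (queue URL, region, and threshold are illustrative; IAM access to SQS must be configured separately):

```yaml
# Sketch: KEDA SQS trigger — scale workers by queued push-campaign messages.
triggers:
- type: aws-sqs-queue
  metadata:
    queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/push-campaign  # illustrative
    queueLength: "500"      # target messages per replica
    awsRegion: us-east-1
```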

Timeline: HPA + Cluster Autoscaler setup for one service takes 2–4 days; KEDA with custom metrics plus predictive scaling, 1–2 weeks.