Implementing Fallback Logic When the AI Service Is Unavailable in a Mobile App
OpenAI returns 503 errors every few weeks, typically during peak hours or incidents. For a mobile app where the AI assistant is part of the core user flow, that means a white screen or a crash unless fallback logic is in place.
Degradation Levels
A proper fallback is not a single stub; it is a degradation cascade:
Level 1: Retry with backoff. For transient errors (429 Too Many Requests, 503, timeouts), retry with exponential backoff: three attempts spaced 1 s, 3 s, and 9 s apart. If all fail, move to Level 2.
Level 2: Provider switch. If the primary provider is OpenAI, fall back to the Anthropic Claude API or Google Gemini. Answer style varies between providers, but quality is comparable for most tasks. Keep the secondary provider's API keys in server-side config.
Level 3: Local model. For critical flows, ship a small local model (for example Phi-3.5-mini via llama.cpp, ~2.2 GB). Quality is lower than GPT-4o, but it works offline. On iOS it can run via Core ML (MLModel) or llama.swift.
Level 4: Static answers. Serve FAQs and common questions from a cache or database. The user gets a useful answer without ever knowing the AI was unavailable.
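The cascade above can be sketched in TypeScript (the implementation language is not specified in the article; `AskFn`, the provider calls, the static-answer table, and the error message are hypothetical placeholders):

```typescript
// A provider call; in a real app this wraps an HTTP request to OpenAI, Claude, etc.
type AskFn = (prompt: string) => Promise<string>;

// Level 4 source: a tiny stand-in for a cache or database of FAQ answers.
const STATIC_ANSWERS: Record<string, string> = {
  "reset password": "Open Settings, then Account, then Reset password.",
};

// Level 1: three attempts spaced 1 s / 3 s / 9 s apart (configurable for tests).
async function withRetry(
  fn: AskFn,
  prompt: string,
  delaysMs: number[] = [1000, 3000, 9000]
): Promise<string> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    if (attempt > 0) await new Promise((r) => setTimeout(r, delaysMs[attempt - 1]));
    try {
      return await fn(prompt);
    } catch (err) {
      lastErr = err; // assume transient: 429 / 503 / timeout
    }
  }
  throw lastErr;
}

// Levels 1, 2 and 4 chained; Level 3 (local model) is omitted from this sketch.
async function askWithFallback(
  prompt: string,
  primary: AskFn,
  secondary: AskFn,
  delaysMs: number[] = [1000, 3000, 9000]
): Promise<string> {
  try {
    return await withRetry(primary, prompt, delaysMs); // Level 1: retry primary
  } catch {
    try {
      return await secondary(prompt);                  // Level 2: secondary provider
    } catch {
      const key = prompt.toLowerCase().trim();         // Level 4: static answer
      return STATIC_ANSWERS[key]
        ?? "Assistant temporarily unavailable, please try again in a few minutes.";
    }
  }
}
```

The final `??` branch doubles as the user-facing message for full unavailability, so the UI never has to render a raw provider error.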
Circuit Breaker Implementation
The Circuit Breaker pattern prevents cascading load on a degrading service: it tracks failures, opens the circuit once a failure threshold is reached, and periodically attempts recovery.
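A minimal sketch of that pattern in TypeScript; the threshold and cooldown values are illustrative, not taken from the article:

```typescript
// Minimal circuit breaker: opens after N consecutive failures, then allows a
// trial call again once the cooldown has elapsed (the "half-open" probe).
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly threshold = 5,       // consecutive failures before opening
    private readonly cooldownMs = 30_000  // how long the circuit stays open
  ) {}

  private isOpen(): boolean {
    return (
      this.failures >= this.threshold &&
      Date.now() - this.openedAt < this.cooldownMs
    );
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen()) {
      // Fail fast instead of waiting out another provider timeout.
      throw new Error("circuit open: skipping call, use fallback");
    }
    try {
      const result = await fn();
      this.failures = 0;                  // success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

When the breaker throws "circuit open", the caller drops straight to the next degradation level instead of burning another timeout against a service that is already known to be down.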
UX During Degradation
Users should never see technical errors. When a static answer is served, show the normal UI without marking it as a fallback. On full unavailability, show "The assistant is temporarily unavailable, please try again in a few minutes" instead of a raw Error 503.
A degradation indicator is still useful for internal analytics: log each fallback with its level and cause.
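Logging each fallback with its level and cause could look like this; the event shape, level names, and the in-memory sink are assumptions (a real app would forward to Firebase, Amplitude, or an in-house pipeline):

```typescript
// Hypothetical analytics event for one fallback activation.
type FallbackLevel = "retry" | "secondary_provider" | "local_model" | "static_answer";

interface FallbackEvent {
  level: FallbackLevel;
  cause: string;     // e.g. "503", "429", "timeout"
  timestamp: string; // ISO 8601
}

// Stand-in for a real analytics sink.
const fallbackEvents: FallbackEvent[] = [];

function logFallback(level: FallbackLevel, cause: string): FallbackEvent {
  const event: FallbackEvent = {
    level,
    cause,
    timestamp: new Date().toISOString(),
  };
  fallbackEvents.push(event);
  return event;
}
```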
Timeline Estimates
A basic retry with backoff takes about a day. A full cascade with a circuit breaker and two providers takes 2–3 days.