AI Assistant Development in Mobile Applications Based on GPT-4/GPT-4o
GPT-4o is a multimodal model: it accepts text, images, and audio in a single API call. This changes assistant architecture compared with GPT-4-turbo: instead of separate OCR, text, and voice pipelines, there is one gpt-4o endpoint that takes an array of content types. A mobile app that ignores this loses half the model's value.
OpenAI API Integration: What Really Matters
The basic call goes through POST /v1/chat/completions. On iOS, a community OpenAI Swift package or a thin URLSession wrapper is the most convenient option; a heavy HTTP client dependency is unnecessary.
Key parameters for a mobile assistant:
let request = ChatCompletionRequest(
    model: "gpt-4o",
    messages: conversationHistory,
    stream: true,        // streaming is mandatory for UX
    maxTokens: 1024,
    temperature: 0.7
)
Streaming is not an option but a requirement. A user staring at 5–8 seconds of silence before the response closes the app. With stream: true the first token arrives in 300–500 ms and the text appears incrementally. On iOS this is implemented via URLSession with AsyncBytes, or an EventSource library for SSE.
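The streaming loop can be sketched as follows. This is a minimal sketch assuming iOS 15+: the names `tokenFromSSELine` and `streamCompletion` are illustrative, and the request is assumed to be a prepared POST to /v1/chat/completions with "stream": true; a production client would decode the full chat.completion.chunk schema with Codable rather than JSONSerialization.

```swift
import Foundation

// Extracts the text delta from one SSE frame ("data: {chunk json}").
// Returns nil for keep-alives, role-only deltas, and the "[DONE]" sentinel.
func tokenFromSSELine(_ line: String) -> String? {
    guard line.hasPrefix("data: ") else { return nil }
    let payload = String(line.dropFirst(6))
    guard payload != "[DONE]", let data = payload.data(using: .utf8),
          let chunk = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
          let choices = chunk["choices"] as? [[String: Any]],
          let delta = choices.first?["delta"] as? [String: Any]
    else { return nil }
    return delta["content"] as? String
}

// Streaming loop over URLSession.AsyncBytes.
func streamCompletion(_ request: URLRequest, onToken: (String) -> Void) async throws {
    let (bytes, response) = try await URLSession.shared.bytes(for: request)
    guard (response as? HTTPURLResponse)?.statusCode == 200 else {
        throw URLError(.badServerResponse)
    }
    for try await line in bytes.lines {
        if let token = tokenFromSSELine(line) { onToken(token) }
    }
}
```

In the UI layer, onToken appends each delta to the visible message on the main actor, which produces the incremental typing effect.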
GPT-4o multimodality. Sending an image:
let message = ChatMessage(role: .user, content: [
    .text("What's shown in this screenshot?"),
    .imageURL(base64Image: imageBase64, detail: .auto)
])
With detail: .auto the model chooses between low (85 tokens) and high (up to 1700 tokens) depending on the task. For document analysis high is better; for quick answers, low.
Context and Token Management
GPT-4o has a 128K-token context window, but sending the full dialog history with every request is a mistake that hits both cost and latency. The correct strategy is a sliding window with summarization.
When the history exceeds a threshold (e.g., 4000 tokens), the last N messages are preserved verbatim, and earlier ones are replaced with a summary generated by a separate call to gpt-4o-mini (roughly 20x cheaper). The summary is stored as a system message at the start of the history.
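The sliding-window strategy can be sketched like this. `HistoryManager`, the 4000-token budget, the keep-last-8 window, and the `summarize` placeholder are all illustrative choices, not a prescribed API; in practice `summarize` would be an async call to gpt-4o-mini.

```swift
import Foundation

enum Role { case system, user, assistant }
struct ChatMessage { let role: Role; let text: String }

// Placeholder for the cheap gpt-4o-mini summarization call (async in practice).
func summarize(previous: String?, messages: [ChatMessage]) -> String {
    ((previous ?? "") + " " + messages.map(\.text).joined(separator: " "))
        .trimmingCharacters(in: .whitespaces)
}

struct HistoryManager {
    var systemSummary: String?            // compressed summary of older turns
    var recentMessages: [ChatMessage] = []
    let tokenBudget = 4000                // threshold from the text above
    let keepLast = 8                      // messages always kept verbatim

    // Rough heuristic: ~4 characters per token.
    private func estimateTokens(_ s: String) -> Int { max(1, s.count / 4) }

    mutating func append(_ message: ChatMessage) {
        recentMessages.append(message)
        let total = recentMessages.reduce(0) { $0 + estimateTokens($1.text) }
        guard total > tokenBudget, recentMessages.count > keepLast else { return }

        // Fold everything except the last keepLast messages into the summary.
        let toCompress = Array(recentMessages.dropLast(keepLast))
        recentMessages.removeFirst(toCompress.count)
        systemSummary = summarize(previous: systemSummary, messages: toCompress)
    }

    // What actually goes into the next API request: summary first, then the live window.
    func requestMessages() -> [ChatMessage] {
        var out: [ChatMessage] = []
        if let summary = systemSummary {
            out.append(ChatMessage(role: .system, text: "Conversation so far: \(summary)"))
        }
        return out + recentMessages
    }
}
```

The key property: request size stays bounded regardless of conversation length, while the system summary preserves long-range context.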
Count tokens with tiktoken on the server, or use a heuristic: roughly 4 characters ≈ 1 token for English and 2–3 characters ≈ 1 token for Cyrillic.
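The heuristic fits in a few lines. These ratios are rules of thumb from the text above, not tiktoken-exact counts, so treat the result as an estimate for budgeting only.

```swift
import Foundation

// Heuristic token estimate: ~4 chars/token for Latin text, ~2.5 for Cyrillic.
func estimatedTokens(_ text: String) -> Int {
    let cyrillic = text.unicodeScalars
        .filter { (0x0400...0x04FF).contains($0.value) }
        .count
    let latin = text.count - cyrillic
    let estimate = Double(latin) / 4.0 + Double(cyrillic) / 2.5
    return max(1, Int(estimate.rounded(.up)))
}
```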
Error Handling and Rate Limits
The OpenAI API returns 429 Too Many Requests when a rate limit is exceeded. The mobile client needs exponential backoff with jitter:
func retryWithBackoff<T>(maxAttempts: Int = 3, operation: () async throws -> T) async throws -> T {
    var attempt = 0
    while attempt < maxAttempts {
        do {
            return try await operation()
        } catch APIError.rateLimitExceeded {
            // Jittered exponential backoff: 1–2 s, then 2–4 s, then 4–8 s, ...
            let delay = Double.random(in: 1.0...2.0) * pow(2.0, Double(attempt))
            try await Task.sleep(nanoseconds: UInt64(delay * 1_000_000_000))
            attempt += 1
        }
    }
    throw APIError.maxRetriesExceeded
}
For streaming requests, set the timeout at the read level (a per-chunk idle timeout), not for the whole request; otherwise long responses get cut off.
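On iOS this maps naturally onto URLSessionConfiguration: `timeoutIntervalForRequest` is an idle timeout that resets whenever new data arrives, so it effectively acts as the per-chunk limit. The 30 s and 600 s values below are illustrative, not prescribed.

```swift
import Foundation

let config = URLSessionConfiguration.default
config.timeoutIntervalForRequest = 30    // idle timeout: no chunk for 30 s → fail
config.timeoutIntervalForResource = 600  // total cap, generous for long streamed answers
let streamingSession = URLSession(configuration: config)
```

Use this session for the streaming call instead of URLSession.shared, which keeps the default 60-second request timeout.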
API Key Security
An OpenAI API key must not be hardcoded into a mobile app: it can be extracted from the binary in minutes. The correct scheme: the mobile client authenticates against your own backend, and the backend proxies requests to OpenAI with the key taken from environment variables. Add per-user rate limiting on top.
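The client side of the proxy scheme can be sketched as below. The `/api/chat` path, the `api.example.com` host, and bearer-token auth are assumptions about your backend; the point is that the app only ever holds a user session token, never the OpenAI key.

```swift
import Foundation

// Builds the request to your own backend; the OpenAI key never touches the client.
func makeProxyRequest(messages: [[String: String]], userToken: String) throws -> URLRequest {
    var request = URLRequest(url: URL(string: "https://api.example.com/api/chat")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(userToken)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: ["messages": messages])
    return request
}
```

The backend attaches the OpenAI key from its environment, applies per-user rate limits, and streams the upstream response back to the client unchanged.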
Implementation Process
Audit the requirements: which modalities are needed (text only, images, voice), whether a server proxy is required, and the dialog history requirements (how much to store, whether to sync between devices).
Development order: API client → streaming UI → history management → multimodality → error handling → server proxy.
Timeline Estimates
A text assistant with streaming and history takes 1–2 weeks. With images, voice, a server proxy, and context management: 3–5 weeks.