Building an AI Assistant in a Mobile Application with Claude (Anthropic)
Claude is a family of models from Anthropic with one of the largest context windows among commercially available LLMs. Claude 3.5 Sonnet supports a 200K-token context, which for a mobile assistant means the ability to load weeks of conversation history or a large document in a single request, without chunking. This changes the approach to dialog history management.
Anthropic Messages API: Structure and Specifics
The Anthropic API is structurally similar to OpenAI's, but with important differences. In Claude, the system prompt is a separate system parameter, not a system-role message in the messages array (the array only accepts user and assistant roles). Critically, stuffing system instructions into a regular message degrades instruction-following quality.
import Foundation

struct AnthropicRequest: Encodable {
    let model: String        // e.g. "claude-3-5-sonnet-20241022"
    let maxTokens: Int       // required parameter, no default
    let system: String       // system prompt goes here, not into messages
    let messages: [Message]
    let stream: Bool

    enum CodingKeys: String, CodingKey {
        case model, system, messages, stream
        case maxTokens = "max_tokens"
    }
}

struct Message: Encodable {
    let role: String         // "user" or "assistant"
    let content: String
}
In the Anthropic API, max_tokens is a required parameter with no default; omitting it returns a 400 error. This differs from OpenAI, where max_tokens is optional.
Authentication uses the x-api-key header (not Authorization: Bearer). The API is versioned via the anthropic-version: 2023-06-01 header; without it, the request fails with 400 Bad Request.
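A minimal request setup with these headers might look like this (a sketch: the endpoint is the public Messages API URL, and the key here is a placeholder; in production it would be injected by your server proxy rather than shipped in the binary):

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Placeholder key: in a real app this comes from your server proxy
let apiKey = "<server-provided>"

var request = URLRequest(url: URL(string: "https://api.anthropic.com/v1/messages")!)
request.httpMethod = "POST"
request.setValue(apiKey, forHTTPHeaderField: "x-api-key")  // not "Authorization: Bearer"
request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
request.setValue("application/json", forHTTPHeaderField: "content-type")
```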
Streaming via SSE
Claude supports streaming via Server-Sent Events, but the stream structure differs from OpenAI's: the events content_block_start, content_block_delta, content_block_stop, and message_delta each carry their own fields.
On iOS, SSE can be handled with URLSession and AsyncBytes:
let (bytes, _) = try await URLSession.shared.bytes(for: request)

for try await line in bytes.lines {
    // Payload lines look like "data: {...}"; skip "event: ..." and blank lines
    guard line.hasPrefix("data: ") else { continue }
    let jsonString = String(line.dropFirst(6))

    guard let data = jsonString.data(using: .utf8),
          let event = try? JSONDecoder().decode(StreamEvent.self, from: data)
    else { continue }

    switch event.type {
    case "content_block_delta":
        let delta = event.delta?.text ?? ""
        await MainActor.run { self.appendText(delta) }
    case "message_stop":
        return  // Anthropic ends the stream with message_stop, not OpenAI's "[DONE]"
    default:
        break   // message_start, content_block_start/stop, message_delta, ping
    }
}
It is important to handle all event types, not just content_block_delta: for example, message_delta carries the stop_reason (such as "max_tokens"), which you need in order to tell the user a response was cut off.
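The streaming loop assumes a StreamEvent type; a minimal Decodable sketch covering only the fields used in this article (the real events carry more fields) could be:

```swift
import Foundation

// Minimal event model: just the fields the streaming loop reads.
struct StreamEvent: Decodable {
    let type: String   // "content_block_delta", "message_delta", ...
    let delta: Delta?

    struct Delta: Decodable {
        let text: String?        // set in content_block_delta
        let stopReason: String?  // set in message_delta

        enum CodingKeys: String, CodingKey {
            case text
            case stopReason = "stop_reason"
        }
    }
}
```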
Large Context: Advantages and Mobile Limitations
200K tokens is roughly 150,000 words, or about 500 pages of text. For a mobile assistant this means working with a full document without a RAG pipeline: if the user attaches a contract PDF, you can pass it into the context wholesale and ask questions about it.
The downside: a large context means a long time to first token. With a 50K-token request, the first token can take 3–5 seconds even on a good connection. On mobile you need a progress indicator that appears immediately, before the first token, otherwise the user assumes the app has hung.
Cost also grows linearly with context; for apps that bill users, this is worth considering when designing the token counter UI.
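For a token counter UI, a rough cost estimate is simple arithmetic. A sketch, where the per-million-token prices are assumed defaults, not authoritative; check Anthropic's current pricing before hardcoding anything:

```swift
import Foundation

// Input and output tokens are priced differently; the defaults are assumptions.
func estimatedCostUSD(inputTokens: Int, outputTokens: Int,
                      inputPerMTok: Double = 3.0,
                      outputPerMTok: Double = 15.0) -> Double {
    Double(inputTokens) / 1_000_000 * inputPerMTok
        + Double(outputTokens) / 1_000_000 * outputPerMTok
}
```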
Vision: Image Transmission
Claude 3.5 Sonnet accepts images as base64-encoded content blocks:
let imageContent = ContentBlock(
    type: "image",
    source: ImageSource(
        type: "base64",
        mediaType: "image/jpeg",  // encoded as "media_type" in JSON
        data: imageBase64
    )
)
Limitations: at most 20 images per request, each up to 5 MB. On mobile, compress images to a reasonable size before sending: UIGraphicsImageRenderer on iOS, or BitmapFactory.Options with inSampleSize on Android.
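The resize itself is platform code (UIGraphicsImageRenderer or BitmapFactory), but the target-size math is shared. A sketch, where the 1568 px long-side cap is an assumed value, not an API requirement:

```swift
import Foundation

// Scale so the longest side fits maxDimension, never upscaling.
func targetSize(width: Double, height: Double,
                maxDimension: Double = 1568) -> (width: Double, height: Double) {
    let scale = min(1.0, maxDimension / max(width, height))
    return (width * scale, height * scale)
}
```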
Implementation Process
Key questions before starting: is document handling needed (PDF, images)? What dialog volume is expected? Is a server proxy needed (yes, always: the API key must never be stored in the app).
Implementation order: Anthropic API client → streaming UI → history management within the 200K limit → optional file handling.
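History management within the limit can start as naive trimming. A sketch: ~4 characters per token is a rough heuristic, and ChatMessage, estimateTokens, and trimmedHistory are illustrative names; a production app would use a token counting API for exact numbers:

```swift
import Foundation

struct ChatMessage {
    let role: String
    let content: String
}

// Rough heuristic: ~4 characters per token (an assumption, not an exact count)
func estimateTokens(_ text: String) -> Int {
    max(1, text.count / 4)
}

// Keep the newest messages that fit the token budget, dropping the oldest
func trimmedHistory(_ messages: [ChatMessage], budget: Int) -> [ChatMessage] {
    var kept: [ChatMessage] = []
    var used = 0
    for message in messages.reversed() {
        let cost = estimateTokens(message.content)
        if used + cost > budget { break }
        used += cost
        kept.append(message)
    }
    return Array(kept.reversed())
}
```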
Timeline Estimates
A basic text assistant takes 1–2 weeks. With document and image support and a server proxy, 3–4 weeks.