Building an AI Assistant with YandexGPT in a Mobile Application
YandexGPT is the practical choice for an AI assistant when requirements include data processing on servers in Russia, high-quality Russian language support, and integration with the Yandex ecosystem (search, maps, marketplace). For applications targeting the Russian market with strict data localization requirements, this is not just a preference—it's compliance.
Yandex Foundation Models API
YandexGPT is accessible through the Yandex Cloud Foundation Models API. Base URL: https://llm.api.cloud.yandex.net/foundationModels/v1/completion.
Authentication uses an IAM token (for user applications) or service account API key (for server proxies). IAM tokens live for 12 hours and require renewal—they are not used directly on mobile clients.
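A common pattern is for the mobile client to talk only to your own backend proxy, which holds the API key or refreshes the IAM token server-side. A minimal sketch (the proxy URL, endpoint path, and payload shape are assumptions, not part of the Yandex API):

```swift
import Foundation

// Sketch: the app never holds Yandex credentials. It calls your own
// backend proxy (URL and payload here are hypothetical), which attaches
// the IAM token or service-account API key on the server side.
struct ProxyChatRequest: Encodable {
    let messages: [[String: String]]   // e.g. [["role": "user", "text": "..."]]
}

func sendViaProxy(text: String) async throws -> Data {
    var request = URLRequest(url: URL(string: "https://api.example.com/assistant/chat")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // App-level auth (your own session token), not Yandex credentials.
    request.setValue("Bearer <user-session-token>", forHTTPHeaderField: "Authorization")
    request.httpBody = try JSONEncoder().encode(
        ProxyChatRequest(messages: [["role": "user", "text": text]])
    )
    let (data, _) = try await URLSession.shared.data(for: request)
    return data
}
```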
Request structure:
```swift
struct YandexGPTRequest: Encodable {
    let modelUri: String               // "gpt://{folder_id}/yandexgpt/latest"
    let completionOptions: CompletionOptions
    let messages: [YandexMessage]
}

struct CompletionOptions: Encodable {
    let stream: Bool
    let temperature: Double            // 0..1
    let maxTokens: String              // string, not number—API quirk
}
```
Important: maxTokens is passed as a string, not a number. This violates the principle of least surprise and periodically breaks auto-generated clients.
modelUri is constructed as gpt://{folder_id}/{model_name}/{version}. The folder_id is the Yandex Cloud folder identifier and must be stored on the server, not in the app.
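A small helper can assemble the URI from configuration delivered by the backend (the default model name here is an assumption; use whichever model your folder has access to):

```swift
// Sketch: building modelUri from server-supplied configuration.
// folderID arrives from your backend at runtime; it is never
// hard-coded into the app bundle.
func makeModelURI(folderID: String,
                  model: String = "yandexgpt-lite",
                  version: String = "latest") -> String {
    "gpt://\(folderID)/\(model)/\(version)"
}
```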
Synchronous and Asynchronous Modes
YandexGPT supports two modes:
- synchronous (/completion): wait for the full response, up to 60 seconds
- asynchronous (/completionAsync): receive an operation_id, then poll for the result
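The asynchronous mode can be consumed with a simple polling loop. A sketch (the operations endpoint URL and the response shape are assumptions to verify against the Yandex Cloud Operations API docs):

```swift
import Foundation

// Sketch of polling an async completion until it finishes.
// The operations endpoint shown here is an assumption — check the
// current Yandex Cloud Operations API reference before relying on it.
struct Operation: Decodable {
    let id: String
    let done: Bool
}

func pollOperation(id: String, iamToken: String) async throws -> Data {
    let url = URL(string: "https://operation.api.cloud.yandex.net/operations/\(id)")!
    while true {
        var request = URLRequest(url: url)
        request.setValue("Bearer \(iamToken)", forHTTPHeaderField: "Authorization")
        let (data, _) = try await URLSession.shared.data(for: request)
        let op = try JSONDecoder().decode(Operation.self, from: data)
        if op.done { return data }                        // result is embedded in the operation
        try await Task.sleep(nanoseconds: 1_000_000_000)  // wait 1 s between polls
    }
}
```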
For a mobile assistant with real-time display, use streaming mode (stream: true in a synchronous request). The server returns a chunked response with partial results. Each chunk is a complete JSON object containing the full accumulated text so far (not a delta). This matters for rendering: replace the previously displayed text with the new chunk's text, rather than appending as you would with OpenAI's delta-based streaming.
```swift
// Each chunk contains the FULL text, not a delta.
// Correct rendering:
func handleChunk(_ response: YandexCompletionResponse) {
    let fullText = response.result.alternatives.first?.message.text ?? ""
    DispatchQueue.main.async {
        self.currentMessage = fullText   // replace, not append
    }
}
```
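Wiring this up end to end, the chunked stream can be read line by line with URLSession. A sketch under the assumption that each line of the stream is one complete JSON object (reusing the YandexCompletionResponse type from the handler above):

```swift
import Foundation

// Sketch: consuming the chunked streaming response. Each decoded line
// carries the FULL accumulated text, so the UI state is replaced on
// every update rather than appended to.
func streamCompletion(request: URLRequest,
                      onUpdate: @escaping @MainActor (String) -> Void) async throws {
    let (bytes, _) = try await URLSession.shared.bytes(for: request)
    for try await line in bytes.lines {
        guard let data = line.data(using: .utf8),
              let chunk = try? JSONDecoder().decode(YandexCompletionResponse.self, from: data)
        else { continue }   // skip keep-alives or non-JSON lines
        let fullText = chunk.result.alternatives.first?.message.text ?? ""
        await onUpdate(fullText)   // replace the displayed message
    }
}
```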
YandexGPT Lite vs Pro
| Parameter | YandexGPT Lite | YandexGPT Pro |
|---|---|---|
| Response quality | Basic | Higher, especially on long instructions |
| Speed | Faster | Slower |
| Cost | Cheaper | More expensive |
| Context | 8192 tokens | 8192 tokens |
For most mobile assistant tasks (helper, FAQ, text processing), Lite is sufficient. Pro is justified for complex analytical tasks and working with long documents.
Embeddings API (/textEmbedding) is useful for semantic search in a local knowledge base—model text-search-query/latest for queries, text-search-doc/latest for documents.
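A semantic-search sketch on top of embeddings: embed documents once, embed the query at search time, and rank by cosine similarity. The request/response field names below follow the pattern of the completion API and are assumptions to verify against the /textEmbedding docs:

```swift
import Foundation

// Sketch of semantic search over a local knowledge base.
// modelUri scheme and field names are assumptions; verify against
// the current /textEmbedding documentation.
struct EmbeddingRequest: Encodable {
    let modelUri: String   // e.g. "emb://{folder_id}/text-search-query/latest"
    let text: String
}

struct EmbeddingResponse: Decodable {
    let embedding: [Double]
}

// Cosine similarity between a query vector and a document vector:
// higher means more semantically similar.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    let dot   = zip(a, b).map(*).reduce(0, +)
    let normA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let normB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    return dot / (normA * normB)
}
```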
Integration with Yandex SpeechKit
For voice input and output in a Russian-language application, use Yandex SpeechKit, which offers some of the best Russian speech quality among services available on the market. SDKs for iOS and Android are distributed through CocoaPods and Maven.
STT via WebSocket: wss://stt.api.cloud.yandex.net/speech/v3/stt:streamingRecognize (streaming recognition with partial results). TTS via REST with voice selection (alena, filipp, jane); SSML is supported.
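A TTS request sketch against the v1 REST synthesize endpoint; the endpoint and form-field names reflect the public SpeechKit docs, but treat them as assumptions to verify (notably, a folderId parameter may also be required depending on your auth scheme):

```swift
import Foundation

// Sketch of a SpeechKit TTS request. Endpoint and parameter names
// are assumptions — verify against the current SpeechKit docs.
func synthesize(text: String, voice: String = "alena", iamToken: String) async throws -> Data {
    var request = URLRequest(
        url: URL(string: "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize")!
    )
    request.httpMethod = "POST"
    request.setValue("Bearer \(iamToken)", forHTTPHeaderField: "Authorization")
    // Parameters are form-urlencoded, not JSON.
    request.setValue("application/x-www-form-urlencoded", forHTTPHeaderField: "Content-Type")
    var components = URLComponents()
    components.queryItems = [
        URLQueryItem(name: "text",  value: text),
        URLQueryItem(name: "voice", value: voice),
        URLQueryItem(name: "lang",  value: "ru-RU"),
    ]
    request.httpBody = components.percentEncodedQuery?.data(using: .utf8)
    let (data, _) = try await URLSession.shared.data(for: request)
    return data   // raw audio bytes, ready to hand to a player
}
```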
Workflow
Start: configure Yandex Cloud account, create service account, assign ai.languageModels.user role, set up server proxy for secure credentials storage.
Development: API client → streaming UI that handles full-text chunks → history management → optional SpeechKit integration.
Timeline Estimates
Text assistant with streaming—1–2 weeks. With voice via SpeechKit and server proxy—3–4 weeks.