# YandexGPT API Integration
YandexGPT is a large language model from Yandex, available through the Yandex Foundation Models service in Yandex Cloud. Key advantages for the Russian market: data is processed inside Russia (compliance with Federal Law 152-ФЗ on personal data), integration with other Yandex Cloud services, and native-level Russian language support.
## Access Setup

```python
# Required:
# 1. Yandex Cloud account with a linked billing account
# 2. A folder (folder_id)
# 3. An IAM token or a service account API key
import requests
import json

FOLDER_ID = "your-folder-id"
IAM_TOKEN = "your-iam-token"  # IAM tokens are short-lived; refresh at least every 12 hours
API_KEY = "your-api-key"      # Or a long-lived API key for a service account
                              # (the examples below authenticate with API_KEY)
```
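If you authenticate as a user account rather than a service account, the short-lived IAM token can be obtained by exchanging a Yandex OAuth token via the Yandex Cloud IAM API. A minimal sketch (error handling and token caching are left to you):

```python
import requests

IAM_TOKEN_URL = "https://iam.api.cloud.yandex.net/iam/v1/tokens"

def get_iam_token(oauth_token: str) -> str:
    """Exchange a Yandex OAuth token for a short-lived IAM token."""
    response = requests.post(
        IAM_TOKEN_URL,
        json={"yandexPassportOauthToken": oauth_token},
    )
    response.raise_for_status()
    return response.json()["iamToken"]
```

In production you would call this on a schedule (well before the 12-hour expiry) and reuse the cached token between requests.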
## Synchronous Request via REST API

```python
def yandexgpt_chat(
    prompt: str,
    model: str = "yandexgpt",
    temperature: float = 0.1,
    max_tokens: int = 2000,
) -> str:
    url = "https://llm.api.cloud.yandex.net/foundationModels/v1/completion"
    headers = {
        "Authorization": f"Api-Key {API_KEY}",
        "x-folder-id": FOLDER_ID,
    }
    body = {
        "modelUri": f"gpt://{FOLDER_ID}/{model}",
        "completionOptions": {
            "stream": False,
            "temperature": temperature,
            "maxTokens": max_tokens,
        },
        "messages": [
            {"role": "user", "text": prompt}
        ],
    }
    response = requests.post(url, headers=headers, json=body)
    response.raise_for_status()
    return response.json()["result"]["alternatives"][0]["message"]["text"]
```
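The endpoint can return 429 when quota limits are hit, as well as transient 5xx errors, so production calls are worth wrapping in a retry loop. A generic stdlib-only sketch with exponential backoff (the set of retriable status codes is an assumption; tune it to your quota policy):

```python
import time

def with_retries(call, max_retries=3, base_delay=1.0, retriable=(429, 500, 503)):
    """Run `call()`, retrying with exponential backoff when it raises an
    HTTP-style error whose response carries a retriable status code."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            status = getattr(getattr(exc, "response", None), "status_code", None)
            if status not in retriable or attempt == max_retries:
                raise  # non-retriable error, or retries exhausted
            time.sleep(base_delay * 2 ** attempt)
```

Usage: `with_retries(lambda: yandexgpt_chat("Tell me about Moscow"))` — `requests.HTTPError` from `raise_for_status()` carries `.response`, which is what the helper inspects.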
```python
# With a system prompt
def yandexgpt_with_system(system: str, user_prompt: str) -> str:
    url = "https://llm.api.cloud.yandex.net/foundationModels/v1/completion"
    body = {
        "modelUri": f"gpt://{FOLDER_ID}/yandexgpt",
        "completionOptions": {"stream": False, "temperature": 0.1, "maxTokens": 2000},
        "messages": [
            {"role": "system", "text": system},
            {"role": "user", "text": user_prompt},
        ],
    }
    response = requests.post(
        url,
        headers={"Authorization": f"Api-Key {API_KEY}", "x-folder-id": FOLDER_ID},
        json=body,
    )
    response.raise_for_status()
    return response.json()["result"]["alternatives"][0]["message"]["text"]
```
## Official SDK (yandex-cloud-ml-sdk): Sync, Async, Streaming

```python
from yandex_cloud_ml_sdk import YCloudML

sdk = YCloudML(folder_id=FOLDER_ID, auth=API_KEY)
model = sdk.models.completions("yandexgpt")

# Synchronous call
result = model.configure(temperature=0.5).run("Tell me about Moscow")

# Async call (inside an async function)
result = await model.configure(temperature=0.5).run_async("Request")

# Streaming: print partial results as they arrive
for event in model.configure(temperature=0.5).run_stream("Long request"):
    print(event.alternatives[0].text, end="")
```
## YandexGPT Models

| Model | Description | Context |
|---|---|---|
| yandexgpt | Main model, quality/speed balance | 8K |
| yandexgpt-lite | Lightweight version, faster and cheaper | 8K |
| yandexgpt-32k | Long-context variant | 32K |
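The model segment of `modelUri` can also carry a version branch — per the Yandex Cloud documentation, `/latest` (the default), `/rc`, and `/deprecated` are accepted. A small helper for building the URI explicitly (the folder id in the example is a placeholder):

```python
def model_uri(folder_id: str, model: str = "yandexgpt", version: str = "latest") -> str:
    """Build a completion modelUri like gpt://<folder_id>/yandexgpt/latest."""
    return f"gpt://{folder_id}/{model}/{version}"
```

Pinning `/latest` vs `/rc` explicitly makes model upgrades a deliberate change rather than a silent one.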
## Yandex Embeddings

```python
# text-search-doc — for indexing documents
# text-search-query — for search queries
def get_yandex_embedding(text: str, embedding_type: str = "text-search-doc") -> list[float]:
    response = requests.post(
        "https://llm.api.cloud.yandex.net/foundationModels/v1/textEmbedding",
        headers={"Authorization": f"Api-Key {API_KEY}", "x-folder-id": FOLDER_ID},
        json={
            "modelUri": f"emb://{FOLDER_ID}/{embedding_type}",
            "text": text,
        },
    )
    response.raise_for_status()
    return response.json()["embedding"]
```
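With this asymmetric pair, a typical retrieval flow embeds documents with `text-search-doc`, embeds the user's query with `text-search-query`, and ranks documents by cosine similarity. A stdlib-only sketch of the ranking step:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices sorted by similarity to the query, best first."""
    return sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
```

For anything beyond a few thousand documents you would precompute the doc vectors and move ranking into a vector store rather than scoring in Python.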
## Practical Case Study

A government enterprise was required to process all data inside Russia and needed a system for automatically answering citizens' requests. YandexGPT was chosen because:
- Data does not leave Russia (152-ФЗ compliance)
- Integration with Yandex SpeechKit for voice input
- Strong output quality in Russian
## Timeline

- Basic REST integration: 1–2 days
- SDK integration with async/streaming: 2–3 days
- Integration with other Yandex Cloud services: 1 week