Deploy LLM on Yandex Cloud
Yandex Cloud is the leading Russian cloud provider, offering GPU instances, a managed ML platform, and its own LLM family (YandexGPT). It is the natural choice for Russian companies with data-residency and import-substitution requirements.
GPU instances in Yandex Cloud
GPU platforms include Tesla V100 32GB (gpu-standard-v1/v2) and A100 80GB (gpu-standard-v3) configurations:
# Create a VM with a GPU via the YC CLI
yc compute instance create \
--name llm-server \
--zone ru-central1-a \
--platform gpu-standard-v3 \
--gpus 1 \
--memory 48GB \
--cores 14 \
--core-fraction 100 \
--image-family ubuntu-2204-lts-gpu \
--image-folder-id standard-images \
--disk-type network-ssd \
--disk-size 300GB \
--network-interface subnet-name=default,nat-ip-version=ipv4 \
--ssh-key ~/.ssh/id_rsa.pub
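After creation, the VM's public address can be pulled from `yc compute instance get llm-server --format json`. A minimal parsing sketch, assuming the snake_case JSON layout the CLI emits (field names here are an assumption worth checking against your CLI version):

```python
import json

def nat_ip(instance_json: str) -> str:
    """Extract the one-to-one NAT (public) IPv4 address from
    `yc compute instance get --format json` output."""
    instance = json.loads(instance_json)
    iface = instance["network_interfaces"][0]
    # Assumed field names; verify against your yc output.
    return iface["primary_v4_address"]["one_to_one_nat"]["address"]

# Example with a trimmed-down instance document:
sample = json.dumps({
    "network_interfaces": [{
        "primary_v4_address": {
            "address": "10.0.0.10",
            "one_to_one_nat": {"address": "51.250.1.2"},
        }
    }]
})
print(nat_ip(sample))  # 51.250.1.2
```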
vLLM configuration on YC VM
# Install after SSHing into the VM
sudo apt-get update && sudo apt-get install -y python3-pip
pip install vllm
# Download the model from Yandex Object Storage (S3-compatible)
aws s3 sync s3://my-bucket/models/mistral-7b/ /data/models/mistral-7b/ \
--endpoint-url https://storage.yandexcloud.net \
--profile yandex
# Start the server
python -m vllm.entrypoints.openai.api_server \
--model /data/models/mistral-7b/ \
--tensor-parallel-size 1 \
--max-model-len 8192 \
--max-num-seqs 128 \
--port 8000 \
--host 0.0.0.0
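Once vLLM is up, any OpenAI-compatible client works against it. A stdlib-only sketch of a chat call (the model path matches the server flags above; the host is a placeholder for your VM's address):

```python
import json
import urllib.request

def build_chat_request(prompt: str, model: str = "/data/models/mistral-7b/") -> dict:
    """Request body for vLLM's OpenAI-compatible /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send a chat completion request and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```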
Yandex Object Storage for models
import os

import boto3

# Yandex Object Storage is compatible with the S3 API
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.yandexcloud.net",
    aws_access_key_id=os.getenv("YC_ACCESS_KEY"),
    aws_secret_access_key=os.getenv("YC_SECRET_KEY"),
    region_name="ru-central1",
)

# Upload the model files
model_files = os.listdir("/local/models")
for file in model_files:
    s3.upload_file(
        Filename=f"/local/models/{file}",
        Bucket="llm-models-bucket",
        Key=f"mistral-7b/{file}",
        ExtraArgs={"StorageClass": "COLD"},  # for rarely used versions
    )
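For single-part uploads, the object's S3 ETag equals the hex MD5 of its contents, which gives a cheap post-upload integrity check (this does not hold for multipart uploads; a sketch under that assumption):

```python
import hashlib

def local_etag(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a local file, matching the S3 ETag for single-part uploads."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

# After s3.upload_file(...), compare against the stored object:
# remote = s3.head_object(Bucket="llm-models-bucket",
#                         Key="mistral-7b/config.json")["ETag"].strip('"')
# assert remote == local_etag("/local/models/config.json")
```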
YandexGPT API
To use Yandex's own models:
import requests

def call_yandexgpt(prompt: str, folder_id: str, api_key: str) -> str:
    url = "https://llm.api.cloud.yandex.net/foundationModels/v1/completion"
    payload = {
        "modelUri": f"gpt://{folder_id}/yandexgpt-lite/latest",
        "completionOptions": {
            "stream": False,
            "temperature": 0.6,
            "maxTokens": 2000,
        },
        "messages": [
            {"role": "system", "text": "You are a helpful assistant."},
            {"role": "user", "text": prompt},
        ],
    }
    response = requests.post(
        url,
        headers={
            "Authorization": f"Api-Key {api_key}",
            "x-folder-id": folder_id,
        },
        json=payload,
    )
    response.raise_for_status()
    return response.json()["result"]["alternatives"][0]["message"]["text"]
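The completion endpoint can throttle under load, so production callers usually retry. A generic retry-with-backoff wrapper (a hypothetical helper, not part of any Yandex SDK) keeps the calling code clean:

```python
import time

def with_retries(fn, attempts: int = 5, base_delay: float = 1.0,
                 retriable=(Exception,), sleep=time.sleep):
    """Call fn(), retrying on retriable errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))

# Usage:
# with_retries(lambda: call_yandexgpt(prompt, folder_id, api_key))
```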
Yandex DataSphere for ML development
DataSphere — a managed Jupyter environment with GPUs on demand:
# In a DataSphere notebook
#!g1.1  # directive requesting a V100 configuration
import torch
print(torch.cuda.get_device_name(0))  # Tesla V100-SXM2-32GB

# Train or fine-tune a model
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    fp16=True,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
)
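With the arguments above, the effective batch size is per_device_train_batch_size × gradient_accumulation_steps = 8 × 4 = 32 per GPU. A quick sanity check for planning optimizer steps per epoch:

```python
def steps_per_epoch(dataset_size: int, per_device_batch: int,
                    grad_accum: int, n_gpus: int = 1) -> int:
    """Optimizer steps per epoch given the effective batch size."""
    effective = per_device_batch * grad_accum * n_gpus
    return -(-dataset_size // effective)  # ceiling division

print(steps_per_epoch(10_000, 8, 4))  # 313 optimizer steps per epoch
```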
Load balancing via Application Load Balancer
# Create a target group from the GPU VMs
yc alb target-group create llm-targets \
--target subnet-name=default,ip-address=10.0.0.10 \
--target subnet-name=default,ip-address=10.0.0.11
# Backend group
yc alb backend-group create llm-backends \
--http-backend name=vllm-backend,port=8000,target-group-id=xxx,healthcheck-path=/health
# HTTP router
yc alb http-router create llm-router \
--virtual-host name=llm,authority=llm.company.ru \
--route name=api,path-prefix=/v1,backend-group-id=xxx
Monitoring via Yandex Monitoring
Built-in integration: VM metrics (CPU, memory, and GPU utilization via the DCGM exporter) appear in Yandex Monitoring automatically. Custom metrics can be shipped with Unified Agent:
# /etc/yandex-unified-agent/config.yml
routes:
  - input:
      plugin: prometheus_puller
      config:
        url: http://localhost:8000/metrics
        pull_period: 15s
    output:
      plugin: yc_metrics
      config:
        folder_id: xxx
        iam_token_file: /etc/yandex-unified-agent/iam_token
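Beyond scraped Prometheus metrics, custom gauges can also be pushed directly to the Monitoring ingestion API. A small payload builder, where the field names ("metrics", "type": "DGAUGE", "ts") are assumptions to verify against the Monitoring API documentation:

```python
import json
import time

def gauge_payload(name: str, value: float, labels=None) -> str:
    """JSON body for one DGAUGE metric in the (assumed) Monitoring format."""
    return json.dumps({
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "metrics": [{
            "name": name,
            "type": "DGAUGE",
            "labels": labels or {},
            "value": value,
        }],
    })

# Example: report the number of in-flight vLLM requests
# gauge_payload("vllm_requests_running", 3.0, {"host": "llm-server"})
```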