Self-hosted deployment of Stable Diffusion
Self-hosted Stable Diffusion gives you complete control over generation: custom models and LoRAs, none of the content-policy restrictions imposed by API services, and predictable costs at high volume. Beyond a few thousand images per month, self-hosting can undercut API pricing (see the TCO section below).
Deployment options
Automatic1111 WebUI is the most popular option, with a rich ecosystem of extensions:
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
# Download the model
wget -O models/Stable-diffusion/sd_xl_base_1.0.safetensors \
https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
# Launch with the API enabled
./webui.sh --api --listen --port 7860 --xformers
ComfyUI uses a more flexible, node-based workflow and is better suited to automation:
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt
python main.py --listen 0.0.0.0 --port 8188
Docker deployment
# docker-compose.yml
version: "3.8"
services:
  stable-diffusion:
    image: universonic/stable-diffusion-webui:latest
    ports:
      - "7860:7860"
    volumes:
      - ./models:/app/stable-diffusion-webui/models
      - ./outputs:/app/stable-diffusion-webui/outputs
    environment:
      - COMMANDLINE_ARGS=--api --xformers --medvram
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
Automatic1111 API client
import httpx
import base64

class SDWebUIClient:
    """Minimal async client for the Automatic1111 /sdapi/v1 REST API."""

    def __init__(self, base_url: str = "http://localhost:7860"):
        self.base_url = base_url

    async def txt2img(
        self,
        prompt: str,
        negative_prompt: str = "low quality, blurry",
        width: int = 1024,
        height: int = 1024,
        steps: int = 30,
        cfg_scale: float = 7.0,
        sampler: str = "DPM++ 2M Karras",
        seed: int = -1,
    ) -> bytes:
        payload = {
            "prompt": prompt,
            "negative_prompt": negative_prompt,
            "width": width,
            "height": height,
            "steps": steps,
            "cfg_scale": cfg_scale,
            "sampler_name": sampler,
            "seed": seed,
            "batch_size": 1,
        }
        async with httpx.AsyncClient(timeout=120) as client:
            response = await client.post(f"{self.base_url}/sdapi/v1/txt2img", json=payload)
            result = response.json()
            # Images are returned as base64-encoded PNGs
            return base64.b64decode(result["images"][0])

    async def img2img(self, init_image: bytes, prompt: str, denoising_strength: float = 0.7) -> bytes:
        payload = {
            "init_images": [base64.b64encode(init_image).decode()],
            "prompt": prompt,
            "denoising_strength": denoising_strength,
        }
        async with httpx.AsyncClient(timeout=120) as client:
            response = await client.post(f"{self.base_url}/sdapi/v1/img2img", json=payload)
            return base64.b64decode(response.json()["images"][0])

    async def get_models(self) -> list[str]:
        async with httpx.AsyncClient() as client:
            response = await client.get(f"{self.base_url}/sdapi/v1/sd-models")
            return [m["title"] for m in response.json()]

    async def switch_model(self, model_title: str) -> None:
        # Checkpoint selection is a global server option; it affects
        # all subsequent requests from every client
        async with httpx.AsyncClient(timeout=60) as client:
            await client.post(
                f"{self.base_url}/sdapi/v1/options",
                json={"sd_model_checkpoint": model_title},
            )
Scaling under load
import asyncio
import base64

from celery import Celery

# Multiple GPU workers, each bound to its own queue
app = Celery("sd_tasks", broker="redis://localhost:6379/0")
app.conf.worker_concurrency = 1         # one task at a time per GPU worker
app.conf.worker_prefetch_multiplier = 1

@app.task(queue="gpu_0")
def generate_on_gpu0(prompt: str, settings: dict) -> str:
    client = SDWebUIClient("http://gpu0-server:7860")
    image = asyncio.run(client.txt2img(prompt, **settings))
    # Celery's default JSON serializer cannot carry raw bytes
    return base64.b64encode(image).decode()

@app.task(queue="gpu_1")
def generate_on_gpu1(prompt: str, settings: dict) -> str:
    client = SDWebUIClient("http://gpu1-server:7860")
    image = asyncio.run(client.txt2img(prompt, **settings))
    return base64.b64encode(image).decode()
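Duplicating one task per GPU works, but callers then have to pick a task. A small round-robin dispatcher keeps the calling code queue-agnostic (a sketch, assuming queue names `gpu_0`..`gpu_N` as above):

```python
import itertools


class RoundRobinDispatcher:
    """Rotate submissions across a fixed set of Celery queue names."""

    def __init__(self, queues: list[str]):
        self._cycle = itertools.cycle(queues)

    def next_queue(self) -> str:
        return next(self._cycle)


# Hypothetical usage with a single shared task: apply_async accepts a
# queue= override, so one task definition can serve every GPU:
#     dispatcher = RoundRobinDispatcher(["gpu_0", "gpu_1"])
#     generate.apply_async(args=[prompt, settings],
#                          queue=dispatcher.next_queue())
```

Routing via `queue=` at call time also makes adding a GPU a one-line change: start another worker on a new queue and append its name to the dispatcher's list.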
TCO: self-hosted vs API
| Volume | DALL-E 3 standard | FLUX Dev (Replicate) | Self-hosted (RTX 4090) |
|---|---|---|---|
| 1,000 images/month | $40 | $15 | $50 (amortization) |
| 10,000 images/month | $400 | $150 | $55 |
| 100,000 images/month | $4,000 | $1,500 | $100 |
Break-even for self-hosting (RTX 4090, ~$1,800) sits at roughly 15,000–20,000 images per month. Deployment time: a basic single-GPU server takes 1–2 days; a multi-GPU setup with load balancing and monitoring takes about a week.
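The payback arithmetic can be sketched as a small helper. The API prices follow from the table above (DALL-E 3 ≈ $0.04/image, FLUX Dev ≈ $0.015/image); the self-hosted marginal cost per image is an illustrative assumption:

```python
def images_to_recoup(hardware_cost: float,
                     api_price_per_image: float,
                     self_cost_per_image: float) -> int:
    """Roughly how many images it takes for the GPU purchase to pay for
    itself, given the per-image saving versus an API (rounded)."""
    saving = api_price_per_image - self_cost_per_image
    if saving <= 0:
        raise ValueError("self-hosting never breaks even at these prices")
    return round(hardware_cost / saving)


# RTX 4090 (~$1,800) vs DALL-E 3 ($0.04/image), assuming ~$0.004/image
# for electricity: the card pays for itself after ~50,000 images.
print(images_to_recoup(1800, 0.04, 0.004))  # 50000
```

Against a cheaper API like FLUX Dev the saving per image shrinks, so the payback point moves further out, which is why the break-even volume depends heavily on which API you are replacing.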