Fine-tuning Stable Diffusion via DreamBooth
DreamBooth fine-tunes Stable Diffusion (SD) on a specific subject (a person, product, style, or character) from 5–20 photographs. After training, the model can render that subject in new scenes and styles while keeping it recognizable.
Applications
- Branded products: e.g., sneakers in different scenes (outdoors, on a city street, in a studio)
- Avatars: a person's face rendered in different styles (anime, oil painting, cartoon)
- Characters: a game character placed in new situations
- Artistic style: transferring an artist's style to new scenes
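All of these applications come down to the same inference step: prompting the fine-tuned model with the identifier learned during training (`sks person` in the training command below) plus a new context. A minimal sketch using the diffusers `StableDiffusionXLPipeline` and its `load_lora_weights` method, assuming the LoRA weights were saved to `./dreambooth_output` and a CUDA GPU is available. `subject_prompt` and `generate_images` are hypothetical helper names, and the torch/diffusers imports are deferred so the prompt helper works without them installed.

```python
def subject_prompt(identifier: str, class_noun: str, context: str) -> str:
    """Hypothetical helper: place the trained identifier + class noun in a new context."""
    return f"{context} of {identifier} {class_noun}"


def generate_images(lora_dir: str, prompts: list, seed: int = 42) -> list:
    """Hypothetical helper: load SDXL base plus the DreamBooth LoRA, one image per prompt."""
    # Deferred imports: only needed when actually generating images
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")
    # Load the LoRA weights written by the training script into lora_dir
    pipe.load_lora_weights(lora_dir)
    generator = torch.Generator(device="cuda").manual_seed(seed)
    return [pipe(prompt=p, generator=generator).images[0] for p in prompts]


# Avatar-style prompts for the subject trained as "sks person"
prompts = [
    subject_prompt("sks", "person", "an oil painting"),
    subject_prompt("sks", "person", "an anime illustration"),
]
```

On a GPU machine, `images = generate_images("./dreambooth_output", prompts)` followed by `images[0].save("avatar.png")` would write the first result. Keeping the identifier and class noun identical to the training prompt is what lets the model recall the subject.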
Diffusers DreamBooth Training
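```shell
# peft is required by the LoRA training scripts in recent diffusers releases;
# bitsandbytes is only needed if you pass --use_8bit_adam
pip install accelerate diffusers transformers peft bitsandbytes

# SDXL DreamBooth LoRA training; train_dreambooth_lora_sdxl.py comes from
# examples/dreambooth in the diffusers GitHub repository
accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="./training_images" \
  --output_dir="./dreambooth_output" \
  --instance_prompt="a photo of sks person" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --seed=42 \
  --mixed_precision="fp16"
```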
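With `--train_batch_size=1` and `--gradient_accumulation_steps=4`, each optimizer step sees 4 images, so 500 steps on a small dataset means many passes over the same photos. A small sketch of that arithmetic, assuming `--max_train_steps` counts optimizer updates (which is how we understand the diffusers example scripts to count steps) and ignoring prior-preservation class images. `training_schedule` is a hypothetical helper and the 10-image dataset is an illustrative value.

```python
import math


def training_schedule(num_images: int, train_batch_size: int,
                      gradient_accumulation_steps: int, max_train_steps: int) -> dict:
    """Hypothetical helper: effective batch size and epoch count for a DreamBooth run.

    Assumes one optimizer update per gradient_accumulation_steps batches and
    ignores prior-preservation (class) images.
    """
    effective_batch = train_batch_size * gradient_accumulation_steps
    batches_per_epoch = math.ceil(num_images / train_batch_size)
    # A partial accumulation group at the end of an epoch still counts as an update
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    epochs = math.ceil(max_train_steps / updates_per_epoch)
    return {
        "effective_batch": effective_batch,
        "updates_per_epoch": updates_per_epoch,
        "epochs": epochs,
    }


print(training_schedule(num_images=10, train_batch_size=1,
                        gradient_accumulation_steps=4, max_train_steps=500))
# {'effective_batch': 4, 'updates_per_epoch': 3, 'epochs': 167}
```

Roughly 167 passes over 10 photos is why step counts well above the recommended range tend to memorize the training images.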
Dataset Preparation
```python
import os

from PIL import Image, ImageOps


def prepare_dreambooth_dataset(source_images, output_dir, target_size=1024):
    """Center-crop each photo to a square and resize it to target_size x target_size."""
    os.makedirs(output_dir, exist_ok=True)
    for i, img_path in enumerate(source_images):
        img = Image.open(img_path)
        # Apply the EXIF orientation tag so phone photos are not saved rotated
        img = ImageOps.exif_transpose(img).convert("RGB")
        width, height = img.size
        # Largest centered square that fits inside the image
        min_dim = min(width, height)
        left = (width - min_dim) // 2
        top = (height - min_dim) // 2
        img_cropped = img.crop((left, top, left + min_dim, top + min_dim))
        img_resized = img_cropped.resize((target_size, target_size), Image.LANCZOS)
        img_resized.save(os.path.join(output_dir, f"{i:03d}.jpg"), quality=95)
    print(f"Prepared {len(source_images)} images in {output_dir}")
```
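Before launching training, it can help to confirm that the prepared folder matches the recommendations in this article (5–20 images) and the `--resolution=1024` passed to the training command. A minimal sketch with Pillow; `check_dreambooth_dataset` and its thresholds are hypothetical, taken from this article rather than from the training script, which resizes and crops images itself, so this is a consistency check rather than a hard requirement.

```python
import os

from PIL import Image


def check_dreambooth_dataset(data_dir, target_size=1024, min_images=5, max_images=20):
    """Hypothetical sanity check: return a list of problems found in data_dir."""
    exts = (".jpg", ".jpeg", ".png", ".webp")
    files = sorted(f for f in os.listdir(data_dir) if f.lower().endswith(exts))
    problems = []
    if not min_images <= len(files) <= max_images:
        problems.append(f"found {len(files)} images, expected {min_images}-{max_images}")
    for name in files:
        with Image.open(os.path.join(data_dir, name)) as img:
            if img.size != (target_size, target_size):
                problems.append(f"{name}: size {img.size}, expected {target_size}x{target_size}")
            if img.mode != "RGB":
                problems.append(f"{name}: mode {img.mode}, expected RGB")
    return problems
```

An empty list means the folder looks consistent; anything else names the offending files.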
Hyperparameters and Tips
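| Parameter | Recommended value | Notes |
|---|---|---|
| Training steps | 200–1000 | Beyond ~1000 steps the model tends to overfit |
| Learning rate | 1e-5 to 1e-4 | Lower values are more stable but learn the subject more slowly |
| Number of images | 5–20 | Vary angles and lighting |
| Prior preservation | Enabled | Mitigates language drift (the class word losing its general meaning) |
| Batch size | 1–2 | Limited by VRAM; use gradient accumulation for a larger effective batch |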
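Prior preservation trains on two kinds of images at once: the instance photos with the identifier prompt, and generated images of the generic class (e.g. "a photo of person") with a class prompt. The total loss is the instance noise-prediction MSE plus a weighted MSE on the class images. A NumPy sketch of that combination, with arrays standing in for the UNet's noise predictions and targets; it assumes, as we understand the diffusers DreamBooth scripts to do, that instance examples fill the first half of each batch and class examples the second. `dreambooth_loss` is a hypothetical name; in those scripts prior preservation is enabled with `--with_prior_preservation` and the weight is set with `--prior_loss_weight`.

```python
import numpy as np


def dreambooth_loss(noise_pred, noise_target, prior_loss_weight=1.0,
                    with_prior_preservation=True):
    """Hypothetical sketch of the DreamBooth loss on noise predictions.

    With prior preservation, rows [0, B/2) are instance examples and
    rows [B/2, B) are class (prior) examples.
    """
    noise_pred = np.asarray(noise_pred, dtype=np.float64)
    noise_target = np.asarray(noise_target, dtype=np.float64)
    if not with_prior_preservation:
        return float(np.mean((noise_pred - noise_target) ** 2))
    pred_inst, pred_prior = np.split(noise_pred, 2, axis=0)
    tgt_inst, tgt_prior = np.split(noise_target, 2, axis=0)
    instance_loss = np.mean((pred_inst - tgt_inst) ** 2)
    # Loss on class images keeps the generic class intact (counters language drift)
    prior_loss = np.mean((pred_prior - tgt_prior) ** 2)
    return float(instance_loss + prior_loss_weight * prior_loss)


pred = np.zeros((2, 4))
target = np.array([[1.0] * 4, [2.0] * 4])
print(dreambooth_loss(pred, target))  # 5.0 (instance MSE 1.0 + 1.0 * prior MSE 4.0)
```

Lowering `prior_loss_weight` lets the model fit the subject more aggressively at the cost of the class prior; setting it to 0 reduces to plain fine-tuning on the instance photos.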
Timeline: DreamBooth LoRA training (about 500 steps on an RTX 3090) takes 15–30 minutes. Building a service with user image upload and avatar generation takes 2–4 weeks.