# ControlNet for image composition control

ControlNet adds structural conditions to Stable Diffusion generation: human pose, scene depth, object contours, normal maps, and segmentation masks. The output follows the specified structure while the prompt retains full control over style.
## Available ControlNet models
| Type | Input data | Application |
|---|---|---|
| Canny | Canny edge map | Preserving structure/outlines |
| Depth | Depth map (MiDaS) | 3D object placement |
| OpenPose | Body skeleton (18 keypoints) | Human poses |
| SoftEdge | Soft edges (HED) | Soft stylization |
| Scribble | Rough sketch | Generation from a sketch |
| Segmentation | Semantic map | Per-object scene control |
| Normal Map | Surface normal map | Detailed surfaces |
| IP-Adapter | Reference image | Style/content transfer |
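For illustration, the type-to-checkpoint mapping can live in one lookup table. The repo IDs below for Canny, Depth, and OpenPose match the ones used in the service code further down; the preprocessor class names are assumptions about `controlnet_aux` to verify, and `resolve_controlnet` is a hypothetical helper, not a diffusers API:

```python
# Sketch: pair each ControlNet type with its SDXL checkpoint and the
# controlnet_aux preprocessor that produces its control image.
CONTROLNET_REGISTRY = {
    "canny":    {"repo": "diffusers/controlnet-canny-sdxl-1.0",  "preprocessor": "CannyDetector"},
    "depth":    {"repo": "diffusers/controlnet-depth-sdxl-1.0",  "preprocessor": "MidasDetector"},
    "openpose": {"repo": "thibaud/controlnet-openpose-sdxl-1.0", "preprocessor": "OpenposeDetector"},
}

def resolve_controlnet(kind: str) -> dict:
    """Return checkpoint + preprocessor for a ControlNet type, or fail loudly."""
    try:
        return CONTROLNET_REGISTRY[kind]
    except KeyError:
        raise ValueError(
            f"unknown ControlNet type {kind!r}; expected one of {sorted(CONTROLNET_REGISTRY)}"
        )
```

Centralizing the mapping keeps the service constructor free of hard-coded repo strings and makes adding a new condition type a one-line change.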
## Integration via diffusers
```python
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
import torch
import cv2
import numpy as np
from PIL import Image
import io


class ControlNetService:
    def __init__(self, controlnet_type: str = "canny"):
        model_map = {
            "canny": "diffusers/controlnet-canny-sdxl-1.0",
            "depth": "diffusers/controlnet-depth-sdxl-1.0",
            "openpose": "thibaud/controlnet-openpose-sdxl-1.0",
        }
        controlnet = ControlNetModel.from_pretrained(
            model_map[controlnet_type],
            torch_dtype=torch.float16
        )
        self.pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            controlnet=controlnet,
            torch_dtype=torch.float16
        ).to("cuda")

    def generate_from_canny(
        self,
        input_image: bytes,
        prompt: str,
        negative_prompt: str = "low quality, blurry",
        controlnet_strength: float = 0.8,
        steps: int = 30
    ) -> bytes:
        img = Image.open(io.BytesIO(input_image)).convert("RGB")
        img_np = np.array(img)

        # Canny edge detection; stack to 3 channels, as the pipeline
        # expects an RGB control image
        gray = cv2.cvtColor(img_np, cv2.COLOR_RGB2GRAY)
        edges = cv2.Canny(gray, threshold1=100, threshold2=200)
        control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

        result = self.pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            image=control_image,
            controlnet_conditioning_scale=controlnet_strength,
            num_inference_steps=steps,
            guidance_scale=8.0
        ).images[0]

        buf = io.BytesIO()
        result.save(buf, format="PNG")
        return buf.getvalue()
```
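The fixed thresholds (100/200) in `generate_from_canny` can miss edges on dark or low-contrast photos. A common heuristic derives both thresholds from the median brightness; `auto_canny_thresholds` is a hypothetical helper sketched here, not part of diffusers or OpenCV:

```python
import numpy as np

def auto_canny_thresholds(gray: np.ndarray, sigma: float = 0.33) -> tuple[int, int]:
    """Derive Canny thresholds from the image's median brightness.

    lower = (1 - sigma) * median, upper = (1 + sigma) * median,
    clamped to the valid 0..255 range.
    """
    median = float(np.median(gray))
    lower = int(max(0, (1.0 - sigma) * median))
    upper = int(min(255, (1.0 + sigma) * median))
    return lower, upper
```

These values would replace the hard-coded `threshold1=100, threshold2=200` in the `cv2.Canny` call, adapting edge sensitivity to each input image.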
## OpenPose: pose-guided generation
```python
from controlnet_aux import OpenposeDetector


class PoseControlledGenerator:
    def __init__(self):
        self.pose_detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
        self.controlnet_service = ControlNetService("openpose")

    def generate_from_pose(
        self,
        pose_reference: bytes,  # photo of a person used as the pose reference
        prompt: str,
        style: str = "photorealistic"
    ) -> bytes:
        ref_image = Image.open(io.BytesIO(pose_reference)).convert("RGB")

        # Extract the skeleton from the reference image
        pose_map = self.pose_detector(ref_image, hand_and_face=True)

        result = self.controlnet_service.pipe(
            prompt=f"{prompt}, {style}",
            image=pose_map,
            controlnet_conditioning_scale=1.0,
            num_inference_steps=30
        ).images[0]

        buf = io.BytesIO()
        result.save(buf, format="PNG")
        return buf.getvalue()
```
## Multi-ControlNet (multiple simultaneous conditions)
```python
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
import torch

# Canny + Depth applied at the same time
controlnets = [
    ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16)
]

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnets,
    torch_dtype=torch.float16
).to("cuda")

result = pipe(
    prompt="interior design, modern living room, photorealistic",
    image=[canny_image, depth_image],  # control images prepared beforehand
    controlnet_conditioning_scale=[0.7, 0.5],  # weight of each condition
    num_inference_steps=30
).images[0]
```
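With multiple ControlNets, `image` and `controlnet_conditioning_scale` must be parallel lists, one entry per loaded ControlNet, and it is easy to let them drift out of sync. A small helper can keep each condition and its weight as one pair; `build_multi_control_kwargs` is a hypothetical utility sketched here, not a diffusers API:

```python
def build_multi_control_kwargs(conditions: list[tuple[object, float]]) -> dict:
    """Split (control_image, scale) pairs into the parallel keyword
    arguments the multi-ControlNet pipeline call expects."""
    if not conditions:
        raise ValueError("at least one ControlNet condition is required")
    images, scales = zip(*conditions)
    for s in scales:
        # scales outside roughly 0.0-2.0 rarely produce usable results
        if not 0.0 <= s <= 2.0:
            raise ValueError(f"conditioning scale {s} outside the usual 0.0-2.0 range")
    return {"image": list(images), "controlnet_conditioning_scale": list(scales)}
```

The call above would then become `pipe(prompt=..., **build_multi_control_kwargs([(canny_image, 0.7), (depth_image, 0.5)]), num_inference_steps=30)`, making it impossible to pass a scale without its image.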
## Practical applications
- **Architectural visualization:** Depth + Canny from a drawing → photorealistic render in the specified style.
- **Fashion:** OpenPose from a model shot → generate clothing for the given pose without altering body shape.
- **Product design:** SoftEdge from a sketch → several color variants of the product.
- **Brand reimagining:** Scribble from a logo sketch → polished full-color version.
Typical delivery times: a ControlNet API with a single condition type takes 2–3 days; a service with multiple conditions and a web interface, 1–2 weeks.