ControlNet for Generation Composition Control

ControlNet adds conditioning inputs to Stable Diffusion: human pose, scene depth, object contours, normal maps, and segmentation masks. The generated image follows the specified structure, while the prompt keeps full control over style.

Available ControlNet Models

Type         | Input data                 | Application
Canny        | Canny edge map             | Preserving structure and outlines
Depth        | Depth map (MiDaS)          | 3D object placement
OpenPose     | Body skeleton (18 points)  | Human poses
SoftEdge     | Soft edges (HED)           | Soft stylization
Scribble     | Hand-drawn sketch          | Generation from a sketch
Segmentation | Semantic map               | Per-object scene control
Normal Map   | Normal map                 | Detailed surfaces
IP-Adapter   | Reference image            | Style/content transfer
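Several of the detectors in the table (Canny, HED, scribbles) produce single-channel arrays, while diffusers pipelines expect an RGB PIL image as the conditioning input. A minimal conversion sketch (the helper name `to_control_image` is illustrative, not part of any library):

```python
import numpy as np
from PIL import Image

def to_control_image(single_channel: np.ndarray) -> Image.Image:
    """Stack a 2D detector output (e.g. a Canny edge map) into the
    3-channel RGB image that ControlNet pipelines expect."""
    assert single_channel.ndim == 2, "expected an HxW array"
    rgb = np.stack([single_channel] * 3, axis=-1)  # HxW -> HxWx3
    return Image.fromarray(rgb.astype(np.uint8))
```

The same helper works for any single-channel condition map before it is passed to the pipeline's `image` argument.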

Integration via diffusers

from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
import torch
import cv2
import numpy as np
from PIL import Image
import io

class ControlNetService:
    def __init__(self, controlnet_type: str = "canny"):
        model_map = {
            "canny": "diffusers/controlnet-canny-sdxl-1.0",
            "depth": "diffusers/controlnet-depth-sdxl-1.0",
            "openpose": "thibaud/controlnet-openpose-sdxl-1.0",
        }
        controlnet = ControlNetModel.from_pretrained(
            model_map[controlnet_type],
            torch_dtype=torch.float16
        )
        self.pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            controlnet=controlnet,
            torch_dtype=torch.float16
        ).to("cuda")

    def generate_from_canny(
        self,
        input_image: bytes,
        prompt: str,
        negative_prompt: str = "low quality, blurry",
        controlnet_strength: float = 0.8,
        steps: int = 30
    ) -> bytes:
        img = Image.open(io.BytesIO(input_image)).convert("RGB")
        img_np = np.array(img)

        # Canny edge detection; stack to 3 channels because the
        # pipeline expects an RGB conditioning image
        gray = cv2.cvtColor(img_np, cv2.COLOR_RGB2GRAY)
        edges = cv2.Canny(gray, threshold1=100, threshold2=200)
        edges_rgb = np.stack([edges] * 3, axis=-1)
        control_image = Image.fromarray(edges_rgb)

        result = self.pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            image=control_image,
            controlnet_conditioning_scale=controlnet_strength,
            num_inference_steps=steps,
            guidance_scale=8.0
        ).images[0]

        buf = io.BytesIO()
        result.save(buf, format="PNG")
        return buf.getvalue()
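The fixed thresholds (100, 200) in `generate_from_canny` suit evenly lit photos. A common heuristic, sketched below with an illustrative helper name, derives both thresholds from the median grayscale intensity instead:

```python
import numpy as np

def auto_canny_thresholds(gray: np.ndarray, sigma: float = 0.33) -> tuple[int, int]:
    """Median-based heuristic for Canny thresholds: place the lower and
    upper bounds a fraction `sigma` below and above the median intensity."""
    median = float(np.median(gray))
    lower = int(max(0, (1.0 - sigma) * median))
    upper = int(min(255, (1.0 + sigma) * median))
    return lower, upper
```

The result can then be unpacked directly into the detector: `cv2.Canny(gray, *auto_canny_thresholds(gray))`.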

OpenPose — pose generation

from controlnet_aux import OpenposeDetector

class PoseControlledGenerator:
    def __init__(self):
        self.pose_detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
        self.controlnet_service = ControlNetService("openpose")

    def generate_from_pose(
        self,
        pose_reference: bytes,  # Photo of a person used as the pose reference
        prompt: str,
        style: str = "photorealistic"
    ) -> bytes:
        ref_image = Image.open(io.BytesIO(pose_reference)).convert("RGB")

        # Extract the skeleton from the reference image
        pose_map = self.pose_detector(ref_image, hand_and_face=True)

        result = self.controlnet_service.pipe(
            prompt=f"{prompt}, {style}",
            image=pose_map,
            controlnet_conditioning_scale=1.0,
            num_inference_steps=30
        ).images[0]

        buf = io.BytesIO()
        result.save(buf, format="PNG")
        return buf.getvalue()
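SDXL generates best at roughly 1024 px on the long side, and the control image should match the output resolution, with both dimensions divisible by 8 (the VAE downsampling factor). A small sizing helper, written under those assumptions (the name `sdxl_control_size` is ours):

```python
def sdxl_control_size(width: int, height: int, long_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) so the longer side becomes `long_side`,
    rounding both dimensions down to multiples of 8."""
    longest = max(width, height)
    w = max(8, (width * long_side // longest) // 8 * 8)
    h = max(8, (height * long_side // longest) // 8 * 8)
    return w, h
```

A reference image can then be resized with `img.resize(sdxl_control_size(*img.size))` before running the pose detector, so the skeleton map already matches the generation resolution.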

Multi-ControlNet (multiple conditions)

from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

# Canny + Depth simultaneously
controlnets = [
    ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16)
]

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnets,
    torch_dtype=torch.float16
).to("cuda")

result = pipe(
    prompt="interior design, modern living room, photorealistic",
    image=[canny_image, depth_image],
    controlnet_conditioning_scale=[0.7, 0.5],  # Weight of each condition
    num_inference_steps=30
).images[0]

Practical applications

Architectural visualization: ControlNet Depth + Canny from a technical drawing → photorealistic render in the requested style.

Fashion: OpenPose skeleton from a model photo → generate clothing for the given pose without changing the body shape.

Product design: SoftEdge from a sketch → several color variations of the product.

Brand reimagining: Scribble from a logo sketch → full-color final version.

Lead times: a ControlNet API with a single condition type takes 2–3 days; a service with multiple conditions and a web interface takes 1–2 weeks.