AI Image Generation Bot Implementation in Mobile Applications
Image generation on mobile is almost always a server-side task. Running Stable Diffusion on-device is technically possible on a modern iPhone (via Core ML) and on flagship Android hardware, but 30–60 second generation times and thermal throttling make it unsuitable for a production app. Work through an API.
Choosing Generative Model
DALL-E 3 (OpenAI). Understands complex text descriptions well and follows the prompt closely. Fixed price per image. High quality, especially for realistic scenes and illustrations. Limitations: cannot create portraits of real people, and moderation is strict.
Stable Diffusion via Replicate / Modal / RunPod. Open-source models running on rented GPUs. Huge flexibility: LoRA adapters for stylization, ControlNet for composition control, inpainting. Cheaper than DALL-E at large volumes. Models: SDXL, Flux.1, SD 3.5.
Midjourney. No official API exists — only unofficial wrappers or Discord automation. Not recommended for production due to instability.
Ideogram / Recraft. Good at rendering text inside images (logos, posters) — something DALL-E and SD traditionally struggle with.
Asynchronous Generation: Mandatory Pattern
Generation takes 3–15 seconds depending on the model and prompt complexity. A mobile client cannot reliably hold an HTTP connection open that long — a network timeout or app backgrounding will interrupt the request.
Correct pattern: polling or WebSocket.
Polling:
POST /generate → { "task_id": "abc123", "status": "queued" }
GET /tasks/abc123 → { "status": "processing", "progress": 40 }
GET /tasks/abc123 → { "status": "done", "image_url": "..." }
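The polling loop can be sketched as below. The status-fetching call is injected as a function so the loop is testable without a network; endpoint shape and field names follow the example above.

```python
import time

def poll_until_done(task_id, fetch_status, interval=2.0, timeout=60.0):
    """Poll GET /tasks/{task_id} until the task finishes.

    fetch_status(task_id) -> dict like {"status": "...", "image_url": "..."};
    injected so the loop can be exercised without real HTTP.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_status(task_id)
        status = result.get("status")
        if status == "done":
            return result["image_url"]
        if status == "failed":
            raise RuntimeError(result.get("error", "generation failed"))
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
```

A fixed interval is fine for 3–15 second jobs; for longer queues, exponential backoff reduces server load.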
WebSocket / SSE: server pushes updates as ready.
On Android, use WorkManager for background polling so generation continues even if the user backgrounds the app. On iOS, use a URLSession background configuration for downloading the result.
// iOS: background sessions do not support completion-handler tasks,
// so poll with a standard session and reserve the background
// configuration for downloading the finished image.
let pollingSession = URLSession(configuration: .default)
lazy var downloadSession: URLSession = {
    let config = URLSessionConfiguration.background(withIdentifier: "image.generation")
    return URLSession(configuration: config, delegate: self, delegateQueue: nil)
}()

func pollTaskStatus(taskId: String) {
    guard let url = URL(string: "\(baseURL)/tasks/\(taskId)") else { return }
    let task = pollingSession.dataTask(with: url) { [weak self] data, _, _ in
        guard let data = data,
              let result = try? JSONDecoder().decode(GenerationTask.self, from: data)
        else { return }
        if result.status == "done", let imageUrl = result.imageUrl {
            // Hand off to downloadSession so the download survives suspension.
            self?.downloadAndDisplay(url: imageUrl)
        } else {
            DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
                self?.pollTaskStatus(taskId: taskId)
            }
        }
    }
    task.resume()
}
Prompt Engineering in Mobile Bot
Mobile app users write short requests: "draw a cat," "logo for a coffee shop." These make poor generation prompts. The bot should enrich them before sending them to the model.
Two approaches:
LLM prompt enhancement. GPT-4o mini takes the user's request and expands it into a detailed prompt with style, lighting, and mood details.
PROMPT_ENHANCER = """
You help improve prompts for image generation.
Take a brief description and expand it into detailed prompt.
Add style, lighting, mood. Answer only in English —
the language Stable Diffusion understands best.
Maximum 200 words.
"""
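A minimal sketch of the enhancement step, assuming an OpenAI-style chat API. Assembling the messages is a pure function; the model call itself (shown commented, with a hypothetical `client`) is a one-liner.

```python
PROMPT_ENHANCER = (
    "You help improve prompts for image generation. "
    "Take a brief description and expand it into a detailed prompt. "
    "Add style, lighting, mood. Answer only in English. "
    "Maximum 200 words."
)

def build_enhancer_messages(user_request: str) -> list[dict]:
    """Compose chat messages for the prompt-enhancer LLM call."""
    return [
        {"role": "system", "content": PROMPT_ENHANCER},
        {"role": "user", "content": user_request.strip()},
    ]

# The call itself, assuming an openai>=1.0 style client:
# completion = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=build_enhancer_messages("draw a cat"),
# )
```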
Parametric UI. The user selects a style (realism / anime / oil / watercolor) via buttons, then writes a description. The style is appended to the prompt programmatically. Simpler and more predictable.
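The parametric approach reduces to a lookup plus string concatenation; the preset wording below is illustrative.

```python
# Style presets appended to the user's description (wording is illustrative).
STYLE_PRESETS = {
    "realism": "photorealistic, 85mm lens, natural lighting, high detail",
    "anime": "anime style, cel shading, vibrant colors",
    "oil": "oil painting, visible brush strokes, canvas texture",
    "watercolor": "watercolor, soft edges, paper texture",
}

def build_prompt(description: str, style: str) -> str:
    """Append the selected style preset to the user's description."""
    preset = STYLE_PRESETS.get(style)
    if preset is None:
        raise ValueError(f"unknown style: {style}")
    return f"{description.strip()}, {preset}"
```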
Negative Prompt and Moderation
For Stable Diffusion, a negative prompt reduces artifacts. A good base: blurry, low quality, deformed, extra limbs, watermark. For children's and family apps, extend the negative prompt with content filtering and run a safety checker server-side.
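A request-body builder for the negative prompt might look like this; `negative_prompt` is a common field name in SD APIs, but the other parameter names and the family-safe additions are illustrative — check your provider's schema.

```python
BASE_NEGATIVE = "blurry, low quality, deformed, extra limbs, watermark"
FAMILY_SAFE_EXTRA = "nsfw, violence, gore"  # illustrative additions

def sd_payload(prompt: str, family_safe: bool = False) -> dict:
    """Build a Stable Diffusion request body with a negative prompt."""
    negative = BASE_NEGATIVE
    if family_safe:
        negative = f"{negative}, {FAMILY_SAFE_EXTRA}"
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "num_inference_steps": 30,   # illustrative defaults
        "guidance_scale": 7.0,
    }
```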
Before sending a user request to generation, run it through the OpenAI Moderation API or a custom filter. Log rejected requests.
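The custom-filter path can be as simple as a keyword blocklist with logging; the terms below are placeholders, and in production this would sit in front of (not instead of) the Moderation API.

```python
import logging

logger = logging.getLogger("moderation")
BLOCKLIST = {"gore", "nsfw"}  # placeholder terms; tune for your app

def check_request(text: str) -> bool:
    """Return True if the request passes the local pre-filter.

    Rejected requests are logged, as recommended above.
    """
    lowered = text.lower()
    hits = [word for word in BLOCKLIST if word in lowered]
    if hits:
        logger.warning("rejected request, matched: %s", hits)
        return False
    return True
```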
Generation UI
The generation process should be visually interesting, not just a spinner:
- Animated placeholder: a progress bar or shimmer effect
- Show stages: "Analyzing request → Generating → Processing"
- Once ready — a smooth image appearance (fade-in / reveal animation)
Result screen: buttons "Download," "Share," "Generate Again," "Edit Prompt," plus a gallery of the session's previous generations.
Variations and Editing
DALL-E supports inpainting — replacing part of an image — via its edit endpoint (DALL-E 2 only; the DALL-E 3 API does not expose editing). Stable Diffusion via img2img allows iterating on a base image. For mobile UI this means a mask selection tool over the image — like an eraser.
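The mask drawn by the user typically travels to the backend as base64 alongside the source image. A sketch of the request-body builder, with field names that are illustrative rather than any specific provider's schema:

```python
import base64
from typing import Optional

def img2img_payload(prompt: str, image_bytes: bytes,
                    mask_bytes: Optional[bytes] = None,
                    strength: float = 0.6) -> dict:
    """Build an img2img / inpainting request body.

    Field names are illustrative; check your provider's API.
    """
    body = {
        "prompt": prompt,
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "strength": strength,  # how far the result may drift from the source
    }
    if mask_bytes is not None:
        body["mask"] = base64.b64encode(mask_bytes).decode("ascii")
    return body
```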
Implementation Process
Choose generative API for task and app style.
Backend: task queue, asynchronous generation, result storage.
Prompt enrichment via LLM.
Mobile UI with progress animation and gallery.
Test with real user requests, tune moderation.
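The task-queue step above can be prototyped in memory before introducing Redis or Celery: a thread pool runs the generation (injected here as a function) and a task dict backs the polling endpoint. All names are illustrative.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

tasks: dict = {}          # task_id -> {"status": ..., "image_url": ...}
tasks_lock = Lock()
executor = ThreadPoolExecutor(max_workers=2)

def submit(prompt: str, generate) -> str:
    """Enqueue a generation job; generate(prompt) -> image_url is injected."""
    task_id = uuid.uuid4().hex
    with tasks_lock:
        tasks[task_id] = {"status": "queued"}

    def run():
        with tasks_lock:
            tasks[task_id]["status"] = "processing"
        try:
            url = generate(prompt)
            with tasks_lock:
                tasks[task_id] = {"status": "done", "image_url": url}
        except Exception as exc:
            with tasks_lock:
                tasks[task_id] = {"status": "failed", "error": str(exc)}

    executor.submit(run)
    return task_id

def get_status(task_id: str) -> dict:
    """Backs GET /tasks/{task_id} in the polling pattern above."""
    with tasks_lock:
        return dict(tasks.get(task_id, {"status": "unknown"}))
```

An in-process dict loses state on restart; swap it for Redis once the flow is proven.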
Timeline Estimates
A basic bot on the DALL-E 3 / Replicate API — about 1 week. With style variations, img2img, prompt engineering, and a gallery — 3–4 weeks.