Implementing AI Sound Effects Generation in a Mobile App
Sound effects by description — separate task from music generation. Need not melody track, but short (0.5–5 sec) specific sound: sword hit, rain noise, button click. ElevenLabs Sound Effects and AudioCraft (EnCodec + AudioGen) — main tools.
ElevenLabs Sound Effects API
Best by quality and integration simplicity. Accepts text description, returns mp3 in 2–8 seconds:
POST https://api.elevenlabs.io/v1/sound-generation
xi-api-key: <key>
Content-Type: application/json
{
"text": "A heavy metal sword hitting a stone floor with a sharp clang and short reverb",
"duration_seconds": 2.0,
"prompt_influence": 0.3
}
prompt_influence from 0 to 1: higher means literal prompt interpretation. For short effects (< 1 sec) set 0.7–0.9.
Response — binary mp3 in response body (not JSON with URL). On mobile:
// iOS: direct binary response download
func generateSoundEffect(description: String, duration: Double) async throws -> Data {
var request = URLRequest(url: URL(string: "https://api.elevenlabs.io/v1/sound-generation")!)
request.httpMethod = "POST"
request.setValue("audio/mpeg", forHTTPHeaderField: "Accept")
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
request.setValue(apiKey, forHTTPHeaderField: "xi-api-key")
request.httpBody = try JSONEncoder().encode(SoundGenRequest(
text: description, duration_seconds: duration, prompt_influence: 0.4
))
let (data, response) = try await URLSession.shared.data(for: request)
guard (response as? HTTPURLResponse)?.statusCode == 200 else {
throw SoundGenError.apiError
}
return data // mp3 bytes
}
Generation time small — simple activity indicator sufficient, no complex async polling needed.
Cache generated sounds
Same sound effect user may use multiple times — regenerating each time expensive and slow. Cache by prompt hash + duration:
// Android
class SoundEffectCache(private val cacheDir: File) {
private fun cacheKey(prompt: String, duration: Double): String =
"${prompt.hashCode()}_${(duration * 10).toInt()}.mp3"
fun getCached(prompt: String, duration: Double): File? {
val file = File(cacheDir, "sfx/${cacheKey(prompt, duration)}")
return if (file.exists()) file else null
}
fun saveToCache(prompt: String, duration: Double, data: ByteArray): File {
val dir = File(cacheDir, "sfx").also { it.mkdirs() }
val file = File(dir, cacheKey(prompt, duration))
file.writeBytes(data)
return file
}
}
UI patterns: library vs inline generation
Two UX approaches:
- Inline — user types description, generates immediately in-context (editor, game)
- Library — pre-generated effects catalog, user browses and selects
Inline simpler but slower UX. Library requires backend catalog management but instant playback.
Timeline
Basic generation + playback UI — 2–3 days. With cache, library UI, batch generation — 1–2 weeks.







