Development of AI-based music and audio generation systems
AI music generation automates the creation of background music, jingles, and sound effects for content, games, and advertising. For projects with basic requirements, it can replace stock-music licensing and studio recording.
Platform Comparison
| Platform | API | Type | Controllability | License |
|---|---|---|---|---|
| Suno v4 | REST (limited) | Song + vocals | Text prompt only | Varies by plan |
| Udio | REST | Song + vocals | High | Commercial |
| MusicGen (Meta) | Self-hosted | Instrumental | High | MIT (code) / CC-BY-NC (weights) |
| AudioCraft | Self-hosted | Music + SFX | High | MIT (code) / CC-BY-NC (weights) |
| Stable Audio | REST / self-hosted | Instrumental | High | Commercial |
AudioCraft / MusicGen — self-hosted
```python
import io

import torch
import torchaudio
from audiocraft.models import MusicGen


class MusicGenerator:
    def __init__(self, model_size: str = "medium"):
        # Sizes: small (300M), medium (1.5B), large (3.3B), melody
        self.model = MusicGen.get_pretrained(f"facebook/musicgen-{model_size}")
        self.model.set_generation_params(
            duration=30,       # seconds (max 30 per pass; longer via chunking)
            temperature=1.0,   # useful range roughly 0.5–1.5
            top_k=250,
            top_p=0.0,         # 0.0 disables nucleus sampling
            cfg_coef=3.0,      # classifier-free guidance: adherence to the prompt
        )

    def generate(
        self,
        description: str,
        duration: int = 30,
        temperature: float = 1.0,
    ) -> bytes:
        self.model.set_generation_params(duration=duration, temperature=temperature)
        wav = self.model.generate(descriptions=[description], progress=True)
        buf = io.BytesIO()
        # MusicGen outputs 32 kHz audio
        torchaudio.save(buf, wav[0].cpu(), sample_rate=32000, format="mp3")
        return buf.getvalue()

    def generate_with_melody(
        self,
        description: str,
        melody_audio: bytes,
        duration: int = 30,
    ) -> bytes:
        """Generate music guided by a reference melody."""
        melody_wav, sr = torchaudio.load(io.BytesIO(melody_audio))
        # Melody conditioning requires the dedicated checkpoint
        model = MusicGen.get_pretrained("facebook/musicgen-melody")
        model.set_generation_params(duration=duration)
        wav = model.generate_with_chroma(
            descriptions=[description],
            melody_wavs=melody_wav.unsqueeze(0),
            melody_sample_rate=sr,
            progress=True,
        )
        buf = io.BytesIO()
        torchaudio.save(buf, wav[0].cpu(), sample_rate=32000, format="mp3")
        return buf.getvalue()
```
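The chunking approach mentioned in the generation-parameter comments (for pieces longer than the ~30-second per-pass limit) needs a schedule of overlapping windows that can later be crossfaded. A minimal sketch of that scheduling logic; `plan_chunks` is a hypothetical helper, not part of audiocraft:

```python
# Sketch: window scheduling for long generations. Each window is at most
# chunk_s seconds; consecutive windows overlap by overlap_s so the
# rendered chunks can be crossfaded into one continuous track.

def plan_chunks(total_s: float, chunk_s: float = 30.0, overlap_s: float = 5.0):
    """Return (start, end) windows covering [0, total_s]."""
    if total_s <= chunk_s:
        return [(0.0, total_s)]
    step = chunk_s - overlap_s
    windows = []
    start = 0.0
    while start + chunk_s < total_s:
        windows.append((start, start + chunk_s))
        start += step
    # Final window is right-aligned so the track ends exactly at total_s
    windows.append((max(0.0, total_s - chunk_s), total_s))
    return windows

plan_chunks(120)
# -> [(0.0, 30.0), (25.0, 55.0), (50.0, 80.0), (75.0, 105.0), (90.0, 120)]
```

Each window would be generated with the same prompt (optionally continuing from the previous chunk's tail) and the overlapping regions blended with a linear or equal-power crossfade.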
Sound Effect Generation (AudioGen)
```python
import io

import torchaudio
from audiocraft.models import AudioGen

sfx_model = AudioGen.get_pretrained("facebook/audiogen-medium")
sfx_model.set_generation_params(duration=5)


def generate_sound_effect(description: str, duration: float = 3.0) -> bytes:
    sfx_model.set_generation_params(duration=duration)
    wav = sfx_model.generate(descriptions=[description])
    buf = io.BytesIO()
    # AudioGen outputs 16 kHz audio
    torchaudio.save(buf, wav[0].cpu(), sample_rate=16000, format="wav")
    return buf.getvalue()

# Example prompts: "forest ambience with birds", "robot beeping", "door creaking"
```
Contextual Applications
| Application | Recommended platform | Prompt parameters |
|---|---|---|
| Background music for videos | MusicGen medium/large | "ambient, {mood}, {tempo}" |
| Advertising jingle | Suno/Udio (with vocals) | Brand-specific prompt |
| Game sound effects | AudioGen | Concrete SFX descriptions |
| Scene-mood music | MusicGen melody | Reference melody + description |
| Podcast intro/outro | Stable Audio | "podcast intro, {genre}, 15 seconds" |
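The templated parameters in the table ({mood}, {tempo}, {genre}) can be filled programmatically so that a service exposes use cases rather than raw prompts. A minimal sketch; the template registry and `build_prompt` helper are hypothetical:

```python
# Sketch: turning the table's prompt templates into a small registry.
# Keys and placeholder names follow the table above.

PROMPT_TEMPLATES = {
    "video_background": "ambient, {mood}, {tempo}",
    "podcast_intro": "podcast intro, {genre}, 15 seconds",
}

def build_prompt(use_case: str, **params: str) -> str:
    """Fill a use-case template; raises KeyError on unknown use cases."""
    return PROMPT_TEMPLATES[use_case].format(**params)

build_prompt("video_background", mood="calm", tempo="slow tempo")
# -> 'ambient, calm, slow tempo'
```

Keeping templates in one place also makes it easy to A/B test prompt wording per use case without touching API callers.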
FastAPI service
```python
from fastapi import FastAPI, Response
from pydantic import BaseModel

app = FastAPI()
music_gen = MusicGenerator("medium")


class MusicRequest(BaseModel):
    description: str
    duration: int = 30
    temperature: float = 1.0


@app.post("/generate/music")
async def generate_music(req: MusicRequest):
    # Generation is GPU-heavy and blocking; in production, offload it to a
    # worker queue instead of running it inside the request handler.
    audio = music_gen.generate(req.description, req.duration, req.temperature)
    return Response(content=audio, media_type="audio/mpeg")
```
Delivery time: a self-hosted MusicGen API takes 1–2 days; a full platform with multiple models, a job queue, and CDN storage takes 2–3 weeks.
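Because generation is expensive, a platform build usually deduplicates identical requests before they reach the queue. A minimal stdlib-only sketch of keying a cache on the request parameters; `generate_cached` and the in-memory dict are hypothetical stand-ins for a real store such as Redis plus CDN:

```python
# Sketch: deduplicating generation requests by hashing their parameters,
# so repeated identical prompts are served from cache.

import hashlib
import json

def request_key(description: str, duration: int, temperature: float) -> str:
    payload = json.dumps(
        {"description": description, "duration": duration, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

_cache: dict[str, bytes] = {}

def generate_cached(description, duration, temperature, generate_fn):
    """Run generate_fn only on a cache miss; return cached bytes otherwise."""
    key = request_key(description, duration, temperature)
    if key not in _cache:
        _cache[key] = generate_fn(description, duration, temperature)
    return _cache[key]
```

`generate_fn` would be `MusicGenerator.generate` in the service above; the same key also makes a natural CDN object name for the rendered file.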