AI Music and Audio Generation System Development

We design and deploy artificial intelligence systems, from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work in real business settings, not just in the lab.

Development of AI-based music and audio generation systems

AI music generation automates the creation of background music, jingles, and sound effects for content, games, and advertising. For projects with basic requirements, it can replace stock music licensing and studio recording.

Platform Comparison

| Platform | API | Output type | Controllability | License |
|---|---|---|---|---|
| Suno v4 | REST (limited) | Song + vocals | Text prompt | Varies by plan |
| Udio | REST | Song + vocals | High | Commercial |
| MusicGen (Meta) | Self-hosted | Instrumental | High | MIT (code) / CC-BY-NC (weights) |
| AudioCraft | Self-hosted | Music + SFX | High | MIT (code) |
| Stable Audio | REST/self | Instrumental | High | Commercial |
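
When a project has hard requirements (for example, vocals or on-premise hosting), the comparison above can be encoded as data and filtered. The sketch below is only an illustration: the `Platform` dataclass, its boolean fields, and `find_platforms` are hypothetical names, and the flags are read off the table rather than from any vendor documentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Platform:
    name: str
    self_hostable: bool  # "Self-hosted" or "REST/self" in the API column
    vocals: bool         # "Song + vocals" in the output column
    sfx: bool            # "Music + SFX" in the output column


# Values transcribed from the comparison table (assumed, not vendor-verified)
PLATFORMS = [
    Platform("Suno v4", self_hostable=False, vocals=True, sfx=False),
    Platform("Udio", self_hostable=False, vocals=True, sfx=False),
    Platform("MusicGen (Meta)", self_hostable=True, vocals=False, sfx=False),
    Platform("AudioCraft", self_hostable=True, vocals=False, sfx=True),
    Platform("Stable Audio", self_hostable=True, vocals=False, sfx=False),
]


def find_platforms(
    self_hostable: Optional[bool] = None,
    vocals: Optional[bool] = None,
    sfx: Optional[bool] = None,
) -> List[str]:
    """Return platform names matching every requirement that is not None."""
    wanted = {"self_hostable": self_hostable, "vocals": vocals, "sfx": sfx}
    return [
        p.name
        for p in PLATFORMS
        if all(getattr(p, key) == value
               for key, value in wanted.items() if value is not None)
    ]
```

For instance, `find_platforms(vocals=True)` returns the two hosted song generators, while requiring both vocals and self-hosting returns an empty list, which matches the trade-off visible in the table.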

AudioCraft / MusicGen — self-hosted

import io

import torch
import torchaudio
from audiocraft.models import MusicGen


class MusicGenerator:
    def __init__(self, model_size: str = "medium"):
        # Sizes: small (300M), medium (1.5B), large (3.3B), melody
        self.model = MusicGen.get_pretrained(f"facebook/musicgen-{model_size}")
        self.melody_model = None  # loaded lazily on the first melody request
        self.model.set_generation_params(
            duration=30,        # seconds (max 30 in standard mode, up to 120 via chunking)
            temperature=1.0,    # 0.5–1.5
            top_k=250,
            top_p=0.0,
            cfg_coef=3.0        # adherence to prompt
        )

    @staticmethod
    def _encode_mp3(wav: torch.Tensor, sample_rate: int) -> bytes:
        # Writing MP3 to an in-memory buffer needs a torchaudio build with FFmpeg support
        buf = io.BytesIO()
        torchaudio.save(buf, wav.cpu(), sample_rate=sample_rate, format="mp3")
        return buf.getvalue()

    def generate(
        self,
        description: str,
        duration: int = 30,
        temperature: float = 1.0
    ) -> bytes:
        # Parameters not passed here (top_k, top_p, cfg_coef) are reset to the library defaults
        self.model.set_generation_params(duration=duration, temperature=temperature)

        wav = self.model.generate(
            descriptions=[description],
            progress=True
        )
        # wav has shape [batch, channels, samples]; take the only item in the batch
        return self._encode_mp3(wav[0], self.model.sample_rate)

    def generate_with_melody(
        self,
        description: str,
        melody_audio: bytes,
        duration: int = 30
    ) -> bytes:
        """Generate music based on a reference melody."""
        melody_wav, sr = torchaudio.load(io.BytesIO(melody_audio))

        # Load the melody checkpoint once and reuse it across calls
        if self.melody_model is None:
            self.melody_model = MusicGen.get_pretrained("facebook/musicgen-melody")
        self.melody_model.set_generation_params(duration=duration)

        wav = self.melody_model.generate_with_chroma(
            descriptions=[description],
            melody_wavs=melody_wav.unsqueeze(0),  # [channels, samples] -> [1, channels, samples]
            melody_sample_rate=sr,
            progress=True
        )
        return self._encode_mp3(wav[0], self.melody_model.sample_rate)
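
The duration comment above mentions going beyond 30 seconds via chunking. One common way to do that is to generate overlapping windows, where the tail of each window serves as context for the next. The sketch below only plans the window boundaries; `plan_windows` is a hypothetical helper, and the 30-second window and 18-second stride are assumptions (audiocraft exposes a comparable `extend_stride` generation parameter, but check your installed version before relying on its default).

```python
from typing import List, Tuple


def plan_windows(
    total: float,
    window: float = 30.0,
    stride: float = 18.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds for each generation window.

    Consecutive windows overlap by (window - stride) seconds; that overlap
    is the audio a model would be conditioned on when continuing the track.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < stride < window:
        raise ValueError("stride must be positive and smaller than window")

    # Short tracks fit into a single window
    if total <= window:
        return [(0.0, float(total))]

    windows = [(0.0, window)]
    start, end = 0.0, window
    while end < total:
        start += stride
        end = min(start + window, float(total))
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in plan_windows(120):
        print(f"{start:6.1f}s -> {end:6.1f}s")
```

Under these assumptions a 120-second track needs six windows, so generation time grows faster than the track length because the overlapping context is processed repeatedly.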

Sound Effect Generation (AudioGen)

import io

import torchaudio
from audiocraft.models import AudioGen

sfx_model = AudioGen.get_pretrained("facebook/audiogen-medium")
sfx_model.set_generation_params(duration=5)

def generate_sound_effect(description: str, duration: float = 3.0) -> bytes:
    sfx_model.set_generation_params(duration=duration)
    wav = sfx_model.generate(descriptions=[description])

    buf = io.BytesIO()
    # AudioGen produces 16 kHz audio; read the rate from the model instead of hardcoding it
    torchaudio.save(buf, wav[0].cpu(), sample_rate=sfx_model.sample_rate, format="wav")
    return buf.getvalue()

# Examples: "forest ambience with birds", "robot beeping", "door creaking"

Contextual Applications

| Application | Recommended platform | Parameters |
|---|---|---|
| Background music for videos | MusicGen medium/large | "ambient, {mood}, {tempo}" |
| Jingle for advertising | Suno/Udio (with vocals) | Specific brand prompt |
| Game sounds | AudioGen | Specific SFX descriptions |
| Music for the mood of a scene | MusicGen melody | Reference + description |
| Podcast intro/outro | Stable Audio | "podcast intro, {genre}, 15 seconds" |
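
The two parameterized prompts in the table can be filled in programmatically. This is a minimal sketch: the dictionary keys and the `build_prompt` helper are hypothetical names, and only the template strings come from the table.

```python
# Template strings copied from the table; the keys are illustrative
PROMPT_TEMPLATES = {
    "video_background": "ambient, {mood}, {tempo}",
    "podcast_intro": "podcast intro, {genre}, 15 seconds",
}


def build_prompt(application: str, **fields: str) -> str:
    """Fill the prompt template for an application with the given fields."""
    if application not in PROMPT_TEMPLATES:
        raise KeyError(f"unknown application: {application!r}")
    template = PROMPT_TEMPLATES[application]
    try:
        return template.format(**fields)
    except KeyError as missing:
        # str.format raises KeyError when a placeholder has no value
        raise ValueError(
            f"missing field {missing} for {application!r}"
        ) from None


if __name__ == "__main__":
    print(build_prompt("video_background", mood="calm", tempo="slow tempo"))
    print(build_prompt("podcast_intro", genre="tech"))
```

Keeping templates in one place makes it easier to tune wording per use case without touching the generation code.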

FastAPI service

from fastapi import FastAPI
from fastapi.responses import Response
from pydantic import BaseModel

app = FastAPI()
music_gen = MusicGenerator("medium")  # class defined in the AudioCraft / MusicGen section

class MusicRequest(BaseModel):
    description: str
    duration: int = 30
    temperature: float = 1.0

@app.post("/generate/music")
def generate_music(req: MusicRequest):
    # Plain def: FastAPI runs it in a worker thread, so generation does not block the event loop
    audio = music_gen.generate(req.description, req.duration, req.temperature)
    return Response(content=audio, media_type="audio/mpeg")
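
The request model accepts any duration and temperature, so a client could ask for an arbitrarily long or unstable generation. A dependency-free sketch of the bounds check is shown below; the limits (30 seconds, temperature 0.5 to 1.5) are taken from the comments in the generator code, while `ValidatedMusicRequest` and the constant names are hypothetical. In the service itself the same limits could probably be expressed as pydantic field constraints.

```python
from dataclasses import dataclass

MAX_DURATION_S = 30             # standard-mode limit noted in the generator comments
TEMPERATURE_RANGE = (0.5, 1.5)  # range noted in the generator comments


@dataclass(frozen=True)
class ValidatedMusicRequest:
    description: str
    duration: int = 30
    temperature: float = 1.0

    def __post_init__(self) -> None:
        # Reject requests before they reach the GPU
        if not self.description.strip():
            raise ValueError("description must not be empty")
        if not 1 <= self.duration <= MAX_DURATION_S:
            raise ValueError(f"duration must be between 1 and {MAX_DURATION_S} seconds")
        low, high = TEMPERATURE_RANGE
        if not low <= self.temperature <= high:
            raise ValueError(f"temperature must be between {low} and {high}")


if __name__ == "__main__":
    print(ValidatedMusicRequest("lofi beat, relaxed", duration=15))
```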

Delivery time: a self-hosted MusicGen API takes 1–2 days; a platform with multiple models, a queue, and CDN storage takes 2–3 weeks.
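
The queue in the larger platform exists mainly so that a single GPU handles one generation at a time while the API stays responsive. A minimal in-memory sketch using only the standard library follows; `GenerationQueue` and its methods are hypothetical names, and a production setup would more likely use a persistent broker and store results outside process memory.

```python
import queue
import threading
import uuid
from typing import Callable, Dict, Optional


class GenerationQueue:
    """Runs generation jobs one at a time on a single background worker."""

    def __init__(self, generate_fn: Callable[[str], bytes]):
        self._generate_fn = generate_fn
        self._jobs = queue.Queue()
        self._results: Dict[str, dict] = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, description: str) -> str:
        """Enqueue a job and return its id immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._results[job_id] = {"status": "queued", "audio": None}
        self._jobs.put((job_id, description))
        return job_id

    def status(self, job_id: str) -> Optional[dict]:
        """Return a copy of the job record, or None for an unknown id."""
        with self._lock:
            entry = self._results.get(job_id)
            return dict(entry) if entry is not None else None

    def wait_all(self) -> None:
        """Block until every submitted job has been processed."""
        self._jobs.join()

    def _worker(self) -> None:
        while True:
            job_id, description = self._jobs.get()
            try:
                audio = self._generate_fn(description)
                update = {"status": "done", "audio": audio}
            except Exception as exc:  # keep the worker alive on failures
                update = {"status": "failed", "audio": None, "error": str(exc)}
            with self._lock:
                self._results[job_id] = update
            self._jobs.task_done()


if __name__ == "__main__":
    # Stand-in generator; a real service would pass a model's generate method
    jobs = GenerationQueue(lambda text: text.encode())
    job_id = jobs.submit("calm piano, slow tempo")
    jobs.wait_all()
    print(jobs.status(job_id)["status"])
```

With this shape, the HTTP endpoint would return the job id right away and a second endpoint would poll `status`, which avoids holding a request open for the full generation time.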