Implementing Voice Messages in Mobile Chat
Voice messages are arguably the most technically demanding media feature in a chat app. Recording, encoding, upload, playback with waveform visualization, variable-speed playback: each stage requires careful work with the platform audio APIs.
Audio Recording
On iOS, use AVAudioRecorder with explicit codec settings. Sensible parameters for voice messages:

let settings: [String: Any] = [
    AVFormatIDKey: kAudioFormatMPEG4AAC,
    AVSampleRateKey: 16_000,   // sufficient for speech
    AVNumberOfChannelsKey: 1,  // mono
    AVEncoderAudioQualityKey: AVAudioQuality.medium.rawValue  // the key expects the raw Int value
]
AAC in mono at 16 kHz and medium quality runs at roughly 24–32 kbps, i.e. on the order of 180–240 KB per minute: compact, and it decodes natively on Android and in browsers. The M4A container (AAC inside) is supported natively on both platforms.
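A minimal sketch of starting a recording with the settings above; the temp-file naming and error handling are simplified for illustration:

```swift
import AVFoundation

// Create the recorder, enable metering (needed later for the waveform),
// and start recording to a temporary .m4a file.
func startVoiceRecording(settings: [String: Any]) throws -> AVAudioRecorder {
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent("voice-\(UUID().uuidString).m4a")
    let recorder = try AVAudioRecorder(url: url, settings: settings)
    recorder.isMeteringEnabled = true  // required for averagePower(forChannel:)
    recorder.record()
    return recorder
}
```

Keep a strong reference to the returned recorder; releasing it stops the recording.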
Request microphone permission early, via AVAudioSession.requestRecordPermission(), not at the moment the record button is tapped. Regardless of the user's answer, Info.plist must contain NSMicrophoneUsageDescription with a clear explanation; without it, the app is terminated on the first microphone access.
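A sketch of the early permission request (on iOS 17+ Apple points to AVAudioApplication for this, but the AVAudioSession API below still works):

```swift
import AVFoundation

// Ask for microphone access up front, e.g. when the chat screen appears,
// so the record button never has to wait on a system dialog.
func requestMicrophoneAccess(completion: @escaping (Bool) -> Void) {
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        DispatchQueue.main.async { completion(granted) }
    }
}
```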
Important with AVAudioSession: before recording, activate the session with the .record or .playAndRecord category (the latter with the .defaultToSpeaker option). Without an explicit switch, recording can conflict with music playing through AirPods.
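The session setup described above can be sketched as follows; it assumes this runs just before recording starts:

```swift
import AVFoundation

// Configure and activate the audio session for recording.
// .defaultToSpeaker routes playback to the loudspeaker instead of the
// earpiece; .allowBluetooth lets headset microphones be used.
func activateRecordingSession() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            options: [.defaultToSpeaker, .allowBluetooth])
    try session.setActive(true)
}
```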
On Android, use MediaRecorder with AudioSource.MIC, OutputFormat.MPEG_4 and AudioEncoder.AAC. RECORD_AUDIO is a runtime permission (Android 6.0+, API 23); request it through ActivityResultContracts.RequestPermission(). MediaRecorder requires an exact call order: setAudioSource → setOutputFormat → setAudioEncoder → prepare → start. A wrong order means an IllegalStateException at runtime, not at compile time.
Waveform Visualization
This is what separates a good implementation from a mediocre one. Drawing the waveform in real time during recording is harder than it looks.
On iOS, read the amplitude via AVAudioRecorder.averagePower(forChannel: 0), calling updateMeters() on a timer every 50–100 ms. The value is in dB, from -160 to 0; normalize it to 0...1 with pow(10, power / 20). Draw via CAShapeLayer or SwiftUI Canvas; the latter is easier to animate without setNeedsDisplay.
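The metering step above, as a small sketch (assumes recorder.isMeteringEnabled was set to true before recording started):

```swift
import AVFoundation

// Sample the recorder's average power and normalize dB (-160...0)
// to a linear 0...1 amplitude suitable for drawing a waveform bar.
func normalizedAmplitude(of recorder: AVAudioRecorder) -> Float {
    recorder.updateMeters()
    let power = recorder.averagePower(forChannel: 0)  // dB, -160...0
    return pow(10, power / 20)                        // linear 0...1
}

// Called e.g. from a Timer firing every 50–100 ms:
// samples.append(normalizedAmplitude(of: recorder))
```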
On Android, MediaRecorder.getMaxAmplitude() returns 0–32767. Collect samples into an array on a timer via Handler.postDelayed() and draw via Canvas.drawRect() in a custom View or with Compose Canvas.
During playback, show the waveform as a static histogram with a playhead. On iOS, drive the position from AVPlayer.addPeriodicTimeObserver rather than a UI timer. On Android, ExoPlayer's Player.Listener has no periodic position callback, so poll player.currentPosition on a short Handler tick while the player is in the playing state.
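On iOS the playhead update can be sketched like this; `updatePlayhead` is a hypothetical UI callback:

```swift
import AVFoundation

// Drive the playhead from the player's own clock. Returns the observer
// token, which must later be passed to player.removeTimeObserver(_:).
func observePlayhead(of player: AVPlayer,
                     updatePlayhead: @escaping (Double) -> Void) -> Any {
    let interval = CMTime(seconds: 0.05, preferredTimescale: 600)
    return player.addPeriodicTimeObserver(forInterval: interval,
                                          queue: .main) { time in
        updatePlayhead(time.seconds)
    }
}
```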
Upload and Playback
A voice message is usually 5–60 seconds, i.e. roughly 15–250 KB in AAC. Upload it like a regular file, with one nuance: on iOS, when playing from a URL, switch AVAudioSession to the .playback category (or keep .playAndRecord with the .defaultToSpeaker option); otherwise the sound is routed to the earpiece (receiver), not the speaker.
Speed playback (1.5×, 2×) is done via AVPlayer.rate = 1.5 on iOS and ExoPlayer.setPlaybackParameters(PlaybackParameters(1.5f)) on Android. Both APIs preserve pitch, so sped-up speech stays intelligible without artifacts.
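An iOS sketch of pitch-preserving speed playback; setting audioTimePitchAlgorithm explicitly makes the pitch-correction behavior deterministic rather than relying on the default:

```swift
import AVFoundation

// Play an item at a given speed (e.g. 1.5 or 2.0) with pitch correction.
// Setting rate on AVPlayer also starts playback.
func play(item: AVPlayerItem, speed: Float) -> AVPlayer {
    item.audioTimePitchAlgorithm = .timeDomain  // good quality for speech
    let player = AVPlayer(playerItem: item)
    player.rate = speed
    return player
}
```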
Client-side caching is mandatory: re-downloading from the server on every playback is poor UX. Store files in Library/Caches (iOS) or getCacheDir() (Android) with a cache size limit.
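A minimal sketch of the iOS cache path, keyed by a hypothetical message ID; files in Caches/ can be purged by the system, so re-download on a cache miss:

```swift
import Foundation

// Local cache location for a downloaded voice message.
// `messageID` is an illustrative identifier, not part of any real API.
func cachedAudioURL(messageID: String) -> URL {
    let caches = FileManager.default.urls(for: .cachesDirectory,
                                          in: .userDomainMask)[0]
    return caches.appendingPathComponent("voice-\(messageID).m4a")
}
```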
Common Mistakes
Always call AVAudioRecorder.stop() before uploading the file; otherwise the file will be incomplete or corrupted. On Android, call MediaRecorder.stop() and release() before opening the file for reading.
Avoid using AudioRecord instead of MediaRecorder on Android without a real need: AudioRecord produces uncompressed PCM, the files are huge, and you would need a manual AAC encoder via MediaCodec.
Timeframe
Recording, AAC encoding, upload, playback with progress: 2–3 days. Real-time waveform plus the playback waveform: another 1–2 days. Cost is calculated individually.







