Implementing Voice Messages in Mobile Chat
Voice messages are arguably the most technically demanding media feature in a chat app. Recording, encoding, upload, playback with waveform visualization, variable-speed playback: each stage requires careful work with the platform audio APIs.
Audio Recording
On iOS, use AVAudioRecorder with explicit codec settings. Sensible parameters for voice messages:

let settings: [String: Any] = [
    AVFormatIDKey: kAudioFormatMPEG4AAC,
    AVSampleRateKey: 16_000,   // sufficient for speech
    AVNumberOfChannelsKey: 1,  // mono
    AVEncoderAudioQualityKey: AVAudioQuality.medium.rawValue  // the key expects the raw Int value
]
AAC in mono at 16 kHz and medium quality runs at roughly 24–32 kbps, i.e. on the order of 180–240 KB per minute: compact, and it decodes natively on Android and in browsers. The M4A container (AAC inside) is supported natively on both platforms.
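A minimal sketch of starting a recording with the settings above; the temp-file naming and error handling are simplified for illustration:

```swift
import AVFoundation

// Create the recorder, enable metering (needed later for the waveform),
// and start recording to a temporary .m4a file.
func startVoiceRecording(settings: [String: Any]) throws -> AVAudioRecorder {
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent("voice-\(UUID().uuidString).m4a")
    let recorder = try AVAudioRecorder(url: url, settings: settings)
    recorder.isMeteringEnabled = true  // required for averagePower(forChannel:)
    recorder.record()
    return recorder
}
```

Keep a strong reference to the returned recorder; releasing it stops the recording.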
Request microphone permission early, via AVAudioSession.requestRecordPermission(), not at the moment the record button is tapped. Regardless of the user's answer, Info.plist must contain NSMicrophoneUsageDescription with a clear explanation; without it, the app is terminated on the first microphone access.
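A sketch of the early permission request (on iOS 17+ Apple points to AVAudioApplication for this, but the AVAudioSession API below still works):

```swift
import AVFoundation

// Ask for microphone access up front, e.g. when the chat screen appears,
// so the record button never has to wait on a system dialog.
func requestMicrophoneAccess(completion: @escaping (Bool) -> Void) {
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        DispatchQueue.main.async { completion(granted) }
    }
}
```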
Important with AVAudioSession: before recording, activate the session with the .record or .playAndRecord category (the latter with the .defaultToSpeaker option). Without an explicit switch, recording can conflict with music playing through AirPods.
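The session setup described above can be sketched as follows; it assumes this runs just before recording starts:

```swift
import AVFoundation

// Configure and activate the audio session for recording.
// .defaultToSpeaker routes playback to the loudspeaker instead of the
// earpiece; .allowBluetooth lets headset microphones be used.
func activateRecordingSession() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            options: [.defaultToSpeaker, .allowBluetooth])
    try session.setActive(true)
}
```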
On Android, use MediaRecorder with AudioSource.MIC, OutputFormat.MPEG_4 and AudioEncoder.AAC. RECORD_AUDIO is a runtime permission (Android 6.0+, API 23); request it through ActivityResultContracts.RequestPermission(). MediaRecorder requires an exact call order: setAudioSource → setOutputFormat → setAudioEncoder → prepare → start. A wrong order means an IllegalStateException at runtime, not at compile time.
Waveform Visualization
This is what separates a good implementation from a mediocre one. Drawing the waveform in real time during recording is harder than it looks.
On iOS, read the amplitude via AVAudioRecorder.averagePower(forChannel: 0), calling updateMeters() on a timer every 50–100 ms. The value is in dB, from -160 to 0; normalize it to 0...1 with pow(10, power / 20). Draw via CAShapeLayer or SwiftUI Canvas; the latter is easier to animate without setNeedsDisplay.
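The metering step above, as a small sketch (assumes recorder.isMeteringEnabled was set to true before recording started):

```swift
import AVFoundation

// Sample the recorder's average power and normalize dB (-160...0)
// to a linear 0...1 amplitude suitable for drawing a waveform bar.
func normalizedAmplitude(of recorder: AVAudioRecorder) -> Float {
    recorder.updateMeters()
    let power = recorder.averagePower(forChannel: 0)  // dB, -160...0
    return pow(10, power / 20)                        // linear 0...1
}

// Called e.g. from a Timer firing every 50–100 ms:
// samples.append(normalizedAmplitude(of: recorder))
```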
On Android, MediaRecorder.getMaxAmplitude() returns 0–32767. Collect samples into an array on a timer via Handler.postDelayed() and draw via Canvas.drawRect() in a custom View or with Compose Canvas.
During playback, show the waveform as a static histogram with a playhead. On iOS, drive the position from AVPlayer.addPeriodicTimeObserver rather than a UI timer. On Android, ExoPlayer's Player.Listener has no periodic position callback, so poll player.currentPosition on a short Handler tick while the player is in the playing state.
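On iOS the playhead update can be sketched like this; `updatePlayhead` is a hypothetical UI callback:

```swift
import AVFoundation

// Drive the playhead from the player's own clock. Returns the observer
// token, which must later be passed to player.removeTimeObserver(_:).
func observePlayhead(of player: AVPlayer,
                     updatePlayhead: @escaping (Double) -> Void) -> Any {
    let interval = CMTime(seconds: 0.05, preferredTimescale: 600)
    return player.addPeriodicTimeObserver(forInterval: interval,
                                          queue: .main) { time in
        updatePlayhead(time.seconds)
    }
}
```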
Upload and Playback
A voice message is usually 5–60 seconds, i.e. roughly 15–250 KB in AAC. Upload it like a regular file, with one nuance: on iOS, when playing from a URL, switch AVAudioSession to the .playback category (or keep .playAndRecord with the .defaultToSpeaker option); otherwise the sound is routed to the earpiece (receiver), not the speaker.
Speed playback (1.5×, 2×) is done via AVPlayer.rate = 1.5 on iOS and ExoPlayer.setPlaybackParameters(PlaybackParameters(1.5f)) on Android. Both APIs preserve pitch, so sped-up speech stays intelligible without artifacts.
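An iOS sketch of pitch-preserving speed playback; setting audioTimePitchAlgorithm explicitly makes the pitch-correction behavior deterministic rather than relying on the default:

```swift
import AVFoundation

// Play an item at a given speed (e.g. 1.5 or 2.0) with pitch correction.
// Setting rate on AVPlayer also starts playback.
func play(item: AVPlayerItem, speed: Float) -> AVPlayer {
    item.audioTimePitchAlgorithm = .timeDomain  // good quality for speech
    let player = AVPlayer(playerItem: item)
    player.rate = speed
    return player
}
```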
Client-side caching is mandatory: re-downloading from the server on every playback is poor UX. Store files in Library/Caches (iOS) or getCacheDir() (Android) with a cache size limit.
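A minimal sketch of the iOS cache path, keyed by a hypothetical message ID; files in Caches/ can be purged by the system, so re-download on a cache miss:

```swift
import Foundation

// Local cache location for a downloaded voice message.
// `messageID` is an illustrative identifier, not part of any real API.
func cachedAudioURL(messageID: String) -> URL {
    let caches = FileManager.default.urls(for: .cachesDirectory,
                                          in: .userDomainMask)[0]
    return caches.appendingPathComponent("voice-\(messageID).m4a")
}
```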
Common Mistakes
Always call AVAudioRecorder.stop() before uploading the file; otherwise the file will be incomplete or corrupted. On Android, call MediaRecorder.stop() and release() before opening the file for reading.
Avoid using AudioRecord instead of MediaRecorder on Android without a real need: AudioRecord produces uncompressed PCM, the files are huge, and you would need a manual AAC encoder via MediaCodec.
Timeframe
Recording, AAC encoding, upload, playback with progress: 2–3 days. Real-time waveform plus the playback waveform: another 1–2 days. Cost is calculated individually.







