Working with Media in Mobile Applications: Camera, Audio, Video, Streaming
Applications where users shoot, listen, or watch are among the most technically demanding to build. Not because the APIs are complex, but because the hardware varies: everything works on a Pixel 7 but breaks on a Redmi Note 8 with a non-standard Camera HAL. On iPhone 14 stabilization runs through VideoToolbox; on the first-generation iPhone SE that API doesn't exist. These platform differences determine 80% of the complexity of media development.
Camera: CameraX vs Camera2 and AVFoundation
On Android, the Camera2 API was for a long time the only adequate option for a custom camera. It is a low-level API built around CaptureRequest, CameraCharacteristics, and ImageReader: powerful but verbose. Even a simple preview with correct aspect ratio and orientation took several hundred lines of code.
CameraX (Jetpack) is a wrapper over Camera2 that adapts to the device. Preview, ImageCapture, ImageAnalysis, and VideoCapture are four use cases that can be combined. CameraX solves orientation, aspect ratio, and lifecycle for you: bind to a LifecycleOwner and stop worrying about whether the camera is closed when the app goes to the background. In 2023–2024, CameraX gained the Extensions API for bokeh, night mode, and HDR: the manufacturer's native algorithms behind a single interface.
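A minimal sketch of that binding, assuming the camera-lifecycle and camera-view artifacts and a `PreviewView` already present in the layout:

```kotlin
import androidx.camera.core.CameraSelector
import androidx.camera.core.ImageCapture
import androidx.camera.core.Preview
import androidx.camera.lifecycle.ProcessCameraProvider
import androidx.camera.view.PreviewView
import androidx.core.content.ContextCompat
import androidx.fragment.app.Fragment

// Sketch: binding two CameraX use cases to a fragment's view lifecycle.
fun bindCamera(fragment: Fragment, previewView: PreviewView) {
    val context = fragment.requireContext()
    val providerFuture = ProcessCameraProvider.getInstance(context)
    providerFuture.addListener({
        val provider = providerFuture.get()
        val preview = Preview.Builder().build().also {
            it.setSurfaceProvider(previewView.surfaceProvider)
        }
        val imageCapture = ImageCapture.Builder()
            .setCaptureMode(ImageCapture.CAPTURE_MODE_MINIMIZE_LATENCY)
            .build()
        provider.unbindAll()
        // Lifecycle-aware: CameraX opens and closes the camera with the lifecycle,
        // so there is no manual open/close bookkeeping on background/foreground.
        provider.bindToLifecycle(
            fragment.viewLifecycleOwner,
            CameraSelector.DEFAULT_BACK_CAMERA,
            preview,
            imageCapture
        )
    }, ContextCompat.getMainExecutor(context))
}
```

Adding an ImageAnalysis or VideoCapture use case is one more argument to `bindToLifecycle`, subject to the device's supported combinations.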
When you still need Camera2 directly: RAW capture via ImageFormat.RAW_SENSOR, manual ISO/shutter/focus control for professional apps, or when the CameraX Extensions API isn't supported and a custom ML pipeline is needed in ImageAnalysis.
On iOS, AVFoundation is the only path to a custom camera: an AVCaptureSession with an AVCaptureDeviceInput and the output you need (AVCapturePhotoOutput, AVCaptureVideoDataOutput, AVCaptureMovieFileOutput). For real-time video processing, use AVCaptureVideoDataOutput and consume CVPixelBuffer frames in captureOutput(_:didOutput:from:) on the videoDataOutputQueue. This is where Core ML models get frames for inference.
A typical AVFoundation mistake: configuring the session on the main thread. beginConfiguration() / commitConfiguration() must be called on a background queue. If you do it on the main thread, the preview freezes during configuration and the user sees the interface hang for a second.
Video: Playback and Streaming
ExoPlayer (now Media3 ExoPlayer) is the standard on Android. It supports HLS, DASH, SmoothStreaming, and progressive playback. DefaultTrackSelector with Parameters lets you choose quality manually or adaptively; DRM goes through DefaultDrmSessionManager with Widevine L1/L3.
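A minimal Media3 setup for an HLS stream, with an adaptive-quality cap, might look like this (the URL is a placeholder; the media3-exoplayer and media3-exoplayer-hls artifacts are assumed):

```kotlin
import android.content.Context
import androidx.media3.common.MediaItem
import androidx.media3.exoplayer.ExoPlayer

// Sketch: play an HLS stream and cap adaptive selection at SD,
// e.g. for a cellular data-saver mode.
fun startHls(context: Context): ExoPlayer {
    val player = ExoPlayer.Builder(context).build()
    player.setMediaItem(MediaItem.fromUri("https://example.com/master.m3u8"))
    player.trackSelectionParameters = player.trackSelectionParameters
        .buildUpon()
        .setMaxVideoSizeSd() // adaptive track selection stays at or below SD
        .build()
    player.prepare()
    player.playWhenReady = true
    return player
}
```

Remember to call `player.release()` when the hosting screen is destroyed; a leaked player keeps its MediaCodec instance alive.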
A problem most teams hit: ExoPlayer in a RecyclerView with fast scrolling. You need a PlayerPool, a pool of reusable players that are switched between visible items. Without a pool, each new ExoPlayer creates a MediaCodec instance, which is expensive and causes MediaCodec$CodecException: Error -19 on some Android 10 devices once more than three instances run simultaneously.
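The pooling logic itself is independent of ExoPlayer, so here is a sketch of a generic bounded pool; with Media3 you would pass `{ ExoPlayer.Builder(context).build() }` as the factory and `Player::release` as the disposer (both names are the real Media3 API, but the pool class itself is illustrative):

```kotlin
// Sketch: a bounded pool of reusable players for RecyclerView scrolling.
// maxSize keeps the number of live MediaCodec-backed players under the
// device's simultaneous-instance limit.
class PlayerPool<T>(
    private val maxSize: Int,
    private val factory: () -> T,
    private val dispose: (T) -> Unit,
) {
    private val idle = ArrayDeque<T>()
    private var created = 0

    // Reuse an idle player, or create one while under the limit.
    // Returns null at the cap: the caller should steal the player
    // from an item that scrolled off-screen.
    fun acquire(): T? = when {
        idle.isNotEmpty() -> idle.removeFirst()
        created < maxSize -> { created++; factory() }
        else -> null
    }

    // Called when an item is recycled: the player goes back to the pool.
    fun release(player: T) { idle.addLast(player) }

    // Called when the screen is destroyed: free every pooled player.
    fun shutdown() {
        idle.forEach(dispose)
        idle.clear()
        created = 0
    }
}
```

The key design point is that `release` never destroys a player, only `shutdown` does, so scrolling reuses warm codec instances instead of recreating them.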
AVPlayer / AVPlayerViewController handle playback on iOS. For a custom UI with your own controls, use AVPlayerLayer plus your own buttons. HLS streaming works natively via AVPlayer(url:) with an m3u8 link. FairPlay DRM requires a server side: AVContentKeySession, a CKC response from the KSM server, and a custom AVAssetResourceLoaderDelegate.
For Flutter, the video_player plugin is the base layer, with chewie for a ready-made UI. For serious tasks, go through a platform channel to native ExoPlayer/AVPlayer, because video_player doesn't support DRM and has subtitle limitations.
Audio: Recording, Playback, Background Mode
On iOS, AVAudioSession categories define behavior: playback for players (audio continues when the screen is locked), record for recording with other sources silenced, playAndRecord for voice messages and VoIP. Pick the wrong category and your app mutes the user's background music on launch, an immediate negative.
AVAudioEngine is the modern API for audio processing: a node graph (mixers, equalizers, pitch shifting) and taps for capturing audio buffers in real time. For real-time speech recognition, combine SFSpeechRecognizer with AVAudioEngine.inputNode.installTap.
On Android, AudioFocus is the coordination mechanism between apps: AudioManager.requestAudioFocus() with an OnAudioFocusChangeListener. If you don't handle AUDIOFOCUS_LOSS_TRANSIENT (pause) and AUDIOFOCUS_LOSS (stop), your app keeps playing over a phone call: a guaranteed bad review on Google Play.
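The listener logic reduces to a small mapping from focus-change events to player actions. In the sketch below the constants mirror the documented android.media.AudioManager values so the mapping can be shown off-device; in an app you would use AudioManager's own constants and call your real player:

```kotlin
// Documented AudioManager focus-change values (mirrored here for illustration).
const val AUDIOFOCUS_GAIN = 1
const val AUDIOFOCUS_LOSS = -1
const val AUDIOFOCUS_LOSS_TRANSIENT = -2
const val AUDIOFOCUS_LOSS_TRANSIENT_CAN_DUCK = -3

enum class PlayerAction { RESUME, PAUSE, STOP, DUCK, NONE }

// Sketch: what the OnAudioFocusChangeListener should do for each event.
fun onFocusChange(focusChange: Int): PlayerAction = when (focusChange) {
    AUDIOFOCUS_GAIN -> PlayerAction.RESUME                  // focus returned: restore volume, resume
    AUDIOFOCUS_LOSS -> PlayerAction.STOP                    // another app took focus for good
    AUDIOFOCUS_LOSS_TRANSIENT -> PlayerAction.PAUSE         // e.g. incoming call: pause, keep position
    AUDIOFOCUS_LOSS_TRANSIENT_CAN_DUCK -> PlayerAction.DUCK // e.g. navigation prompt: lower volume
    else -> PlayerAction.NONE
}
```

On API 26+ the request itself goes through AudioFocusRequest.Builder with the listener attached, but the decision table above stays the same.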
For recording with noise suppression on Android, check NoiseSuppressor.isAvailable() and then call NoiseSuppressor.create(audioRecord.audioSessionId). It doesn't work on all devices, so you need a graceful fallback.
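A sketch of that fallback; `create` can return null, and some OEM builds report the effect as available yet throw on creation:

```kotlin
import android.media.AudioRecord
import android.media.audiofx.NoiseSuppressor

// Sketch: attach platform noise suppression to a recording session,
// degrading silently to raw audio when the effect is unavailable.
fun attachNoiseSuppressor(record: AudioRecord): NoiseSuppressor? {
    if (!NoiseSuppressor.isAvailable()) return null // device has no NS effect
    return try {
        NoiseSuppressor.create(record.audioSessionId)
            ?.also { it.setEnabled(true) }
    } catch (e: RuntimeException) {
        null // reported available but failed to create: record without NS
    }
}
```

Keep the returned effect and call `release()` on it when recording stops.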
Streaming: RTMP, WebRTC, HLS
| Protocol | Latency | Typical use |
|---|---|---|
| RTMP | 2–5 sec | Streaming to YouTube/Twitch |
| HLS | 6–30 sec | VOD, broadcast streaming |
| DASH | 6–30 sec | VOD with adaptive bitrate |
| WebRTC | < 500 ms | Video calls, P2P streams |
| SRT | 1–4 sec | Professional streaming |
WebRTC on mobile comes via WebRTC.framework (iOS) and libwebrtc.aar (Android), or the flutter_webrtc plugin. The real complexity isn't WebRTC itself but signaling and TURN servers. Without TURN, clients behind symmetric NAT won't establish a connection, and that's roughly 15–20% of traffic. Coturn is the standard open-source TURN server.
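The TURN dependency shows up as plain configuration. A sketch for libwebrtc on Android; the STUN/TURN hostnames and credentials are placeholders, and in production they come from your signaling backend (coturn supports static and time-limited credentials):

```kotlin
import org.webrtc.PeerConnection

// Sketch: ICE configuration with both STUN and TURN entries.
fun buildRtcConfig(): PeerConnection.RTCConfiguration {
    val iceServers = listOf(
        PeerConnection.IceServer.builder("stun:stun.example.com:3478")
            .createIceServer(),
        // Without this TURN entry, peers behind symmetric NAT never connect.
        PeerConnection.IceServer.builder("turn:turn.example.com:3478?transport=udp")
            .setUsername("user")
            .setPassword("secret")
            .createIceServer(),
    )
    return PeerConnection.RTCConfiguration(iceServers).apply {
        sdpSemantics = PeerConnection.SdpSemantics.UNIFIED_PLAN
        // For pre-launch verification of the TURN server, force relay-only:
        // iceTransportsType = PeerConnection.IceTransportsType.RELAY
    }
}
```

The relay-only trick is the standard way to prove your TURN deployment actually works before symmetric-NAT users hit it in production.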
RTMP publishing from mobile: LFLiveKit for iOS (supports H.264+AAC), with HaishinKit (Swift) as the more modern alternative. On Android, rtmp-rtsp-stream-client-java, or FFmpeg behind a JNI wrapper. The latter gives maximum flexibility, but the binary grows by 10–15 MB.
Media Processing: Compression, Transcoding
Video shot on an iPhone 15 Pro in ProRes can run 6 GB per minute, so compression is needed before upload. On iOS: AVAssetExportSession with AVAssetExportPreset1920x1080, or a custom AVVideoComposition for fine tuning. VideoToolbox gives direct hardware H.264/HEVC encoding, faster and more battery-efficient than a software codec.
On Android: MediaCodec directly, or Transformer (Media3), a high-level API for video transformations without hand-writing the encoder-decoder pipeline. Transformer can crop, resize, and apply effects via GlEffectsFrameProcessor.
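A sketch of a one-pass transcode with Transformer, assuming the media3-transformer artifact and a recent Media3 release (listener signatures have changed between versions):

```kotlin
import android.content.Context
import android.net.Uri
import androidx.media3.common.MediaItem
import androidx.media3.common.MimeTypes
import androidx.media3.transformer.Composition
import androidx.media3.transformer.ExportException
import androidx.media3.transformer.ExportResult
import androidx.media3.transformer.Transformer
import java.io.File

// Sketch: re-encode a video to HEVC before upload. Transformer drives the
// MediaCodec decode-encode pipeline internally; you only declare the output.
fun compressForUpload(context: Context, input: Uri, output: File) {
    val transformer = Transformer.Builder(context)
        .setVideoMimeType(MimeTypes.VIDEO_H265)
        .addListener(object : Transformer.Listener {
            override fun onCompleted(composition: Composition, result: ExportResult) {
                // upload output here
            }
            override fun onError(
                composition: Composition,
                result: ExportResult,
                exception: ExportException,
            ) {
                // e.g. fall back to uploading the original file
            }
        })
        .build()
    transformer.start(MediaItem.fromUri(input), output.absolutePath)
}
```

HEVC roughly halves the bitrate versus H.264 at the same quality, but check device support and fall back to `MimeTypes.VIDEO_H264` when the encoder is missing.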
For images on Android: BitmapFactory.Options.inSampleSize for decoding with downsampling, and Glide / Coil for caching and transforms. Coil is written with Kotlin coroutines and fits well in Compose. Loading an original 12 MP photo into a 200×200dp ImageView is the classic way to get an OutOfMemoryError on devices with 2 GB of RAM.
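The downsampling factor is pure arithmetic, in the spirit of the official Android guidance: pick the largest power of two that still keeps both decoded dimensions at or above the requested size (the function name is illustrative):

```kotlin
// Sketch: compute BitmapFactory.Options.inSampleSize for a target view size.
// inSampleSize must be a power of two; decoders round other values down.
fun calculateInSampleSize(width: Int, height: Int, reqWidth: Int, reqHeight: Int): Int {
    var inSampleSize = 1
    if (height > reqHeight || width > reqWidth) {
        val halfWidth = width / 2
        val halfHeight = height / 2
        // Double the factor while the downsampled image still covers the target.
        while (halfHeight / inSampleSize >= reqHeight &&
               halfWidth / inSampleSize >= reqWidth) {
            inSampleSize *= 2
        }
    }
    return inSampleSize
}
```

In practice this sits inside a two-pass decode: first decode with inJustDecodeBounds = true to read outWidth/outHeight, then decode for real with the computed inSampleSize. A 12 MP (4000×3000) photo decoded for a 200×200 target gets inSampleSize = 8, cutting memory roughly 64-fold.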
Media Development Process
The complexity of media tasks is non-linear: basic video playback is 1–2 days, a custom camera with real-time frame processing and streaming is 3–5 weeks. Start by clarifying requirements: is DRM needed, which formats, what minimum OS version, is background mode required.
Testing on hardware is mandatory: the emulator doesn't reproduce Camera HAL issues, hardware codec issues, or AudioFocus behavior. A minimum test-device set: the latest iPhone, an iPhone SE, a Samsung flagship, a budget Android (Xiaomi/Redmi), and an Android Go device if the target audience is in developing markets.