IoT Camera Video Stream (RTSP/WebRTC) in Mobile App
The client wants to watch a surveillance camera right in the app. Sounds simple. In practice: the RTSP stream from an IP camera can't be opened directly in the native iOS and Android players. RTMP is outdated. WebRTC works but requires a signaling server. And 200 ms latency is a completely different architecture from HLS with its 5-second buffer.
RTSP: Why You Can't Just Open the Link
AVPlayer doesn't support rtsp:// — only http(s):// and HLS. ExoPlayer ships RTSP support in a separate Media3 module (RtspMediaSource in media3-exoplayer-rtsp), but it's unstable with some manufacturers' non-standard firmware. Flutter has no mature native RTSP player.
Two working approaches.
Approach 1: Server-Side Transcoding (RTSP → HLS/WebRTC)
A media server (MediaMTX, Nginx-RTMP + ffmpeg, Ant Media Server) accepts RTSP from the camera and exposes an HLS or WebRTC endpoint to the client. The client gets HLS, which AVPlayer / ExoPlayer play without issues. HLS latency: 3-8 seconds with standard 2-second segments. Acceptable for monitoring; not for an intercom.
MediaMTX configuration for RTSP → HLS:
```yaml
hlsAlwaysRemux: yes   # global option: remux to HLS even with no viewers connected

paths:
  cam1:
    source: rtsp://admin:[email protected]:554/stream1
```
The client then connects to http://media-server:8888/cam1/index.m3u8 (8888 is MediaMTX's default HLS port).
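On the client side, playing that HLS endpoint is ordinary AVPlayer work. A minimal sketch — the host and path name `cam1` are assumptions carried over from the config above:

```swift
import AVFoundation
import UIKit

final class CameraViewController: UIViewController {
    // Host/path are placeholders; 8888 is MediaMTX's default HLS port.
    private let streamURL = URL(string: "http://media-server:8888/cam1/index.m3u8")!
    private var player: AVPlayer?

    override func viewDidLoad() {
        super.viewDidLoad()
        let player = AVPlayer(url: streamURL)
        let playerLayer = AVPlayerLayer(player: player)
        playerLayer.frame = view.bounds
        playerLayer.videoGravity = .resizeAspect
        view.layer.addSublayer(playerLayer)
        player.play()
        self.player = player   // keep a strong reference
    }
}
```

No RTSP code in the app at all — the media server does the heavy lifting, which is the whole point of this approach.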
Approach 2: Native RTSP Decoder in App
iOS: VLCKit (MobileVLCKit) — a wrapper over libVLC. Supports RTSP, RTMP, H.264, H.265. VLCMediaPlayer with drawable = UIView renders directly into the view. Latency: 500-800 ms with a 300 ms network buffer. Downside: ~30 MB added to the binary; the App Store accepts it without issues.
Custom path: FFmpeg via ffmpeg-kit-ios (FFmpegKit.executeAsync), decode the stream and render via AVSampleBufferDisplayLayer. Gives full control over buffering and latency (100-200 ms is achievable with rtsp_transport tcp and a minimal analyzeduration). More complex to implement, but a better result.
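The low-latency knobs are easiest to see as plain ffmpeg flags (the camera URL is a placeholder; ffmpeg-kit accepts the same argument string). This sketch only decodes to a null sink to illustrate the options:

```shell
# Force RTSP over TCP (avoids UDP packet-loss artifacts) and shrink
# ffmpeg's startup probing so the first frame appears sooner.
ffmpeg -rtsp_transport tcp \
       -fflags nobuffer \
       -analyzeduration 1000000 -probesize 500000 \
       -i "rtsp://admin:[email protected]:554/stream1" \
       -f null -
```

In a real app the output side is replaced by your own frame callback feeding AVSampleBufferDisplayLayer; the input flags are what buys the latency.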
Android: ExoPlayer's RtspMediaSource — for simple cases. For complex ones (RTSP over TCP, H.265, multi-stream cameras) — ijkplayer or FFmpegKit. ijkplayer works with Jetpack Compose via AndroidView.
Flutter: flutter_vlc_player (MobileVLCKit / libVLC on Android) — the cross-platform option. The video_player plugin doesn't support RTSP.
WebRTC: Low Latency for Intercoms and PTZ Cameras
If latency matters (intercom, PTZ camera control), WebRTC is the right choice: 100-300 ms versus 3-8 seconds for HLS.
Architecture: IoT camera → WebRTC-compatible media server (Janus, Kurento, MediaSoup, Ant Media) → mobile client via ICE/STUN/TURN.
iOS: the WebRTC framework (pod 'GoogleWebRTC' or the maintained pod 'WebRTC-SDK'). Create an RTCPeerConnection, receive the SDP offer from the server, send back an answer, get the RTCVideoTrack and render it via RTCMTLVideoView (Metal rendering, hardware-accelerated). Signaling — WebSocket (Starscream or URLSessionWebSocketTask).
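The flow above can be sketched against the GoogleWebRTC Objective-C API as surfaced in Swift. A sketch under assumptions: the class name, the offer-first signaling direction, and the TURN placeholder are mine; only the framework types are real. Unused delegate callbacks are left as empty stubs for protocol conformance:

```swift
import WebRTC

final class CameraRTCClient: NSObject, RTCPeerConnectionDelegate {
    private let factory = RTCPeerConnectionFactory()
    private var pc: RTCPeerConnection?
    let videoView = RTCMTLVideoView()   // Metal-backed renderer

    /// The server sends an SDP offer over the signaling WebSocket;
    /// we answer and hand the answer SDP back to the caller to forward.
    func connect(remoteOfferSDP: String, sendAnswer: @escaping (String) -> Void) {
        let config = RTCConfiguration()
        // STUN for NAT discovery; add your own TURN server for symmetric NATs.
        config.iceServers = [RTCIceServer(urlStrings: ["stun:stun.l.google.com:19302"])]
        let constraints = RTCMediaConstraints(mandatoryConstraints: nil,
                                              optionalConstraints: nil)
        pc = factory.peerConnection(with: config, constraints: constraints, delegate: self)

        let offer = RTCSessionDescription(type: .offer, sdp: remoteOfferSDP)
        pc?.setRemoteDescription(offer) { [weak self] _ in
            self?.pc?.answer(for: constraints) { answer, _ in
                guard let answer else { return }
                self?.pc?.setLocalDescription(answer) { _ in
                    sendAnswer(answer.sdp)   // deliver via your WebSocket signaling
                }
            }
        }
    }

    // Attach the incoming camera track to the Metal view.
    func peerConnection(_ pc: RTCPeerConnection, didAdd stream: RTCMediaStream) {
        stream.videoTracks.first?.add(videoView)
    }

    // Remaining required delegate callbacks — no-ops in this sketch.
    func peerConnection(_ pc: RTCPeerConnection, didChange stateChanged: RTCSignalingState) {}
    func peerConnection(_ pc: RTCPeerConnection, didRemove stream: RTCMediaStream) {}
    func peerConnectionShouldNegotiate(_ pc: RTCPeerConnection) {}
    func peerConnection(_ pc: RTCPeerConnection, didChange newState: RTCIceConnectionState) {}
    func peerConnection(_ pc: RTCPeerConnection, didChange newState: RTCIceGatheringState) {}
    func peerConnection(_ pc: RTCPeerConnection, didGenerate candidate: RTCIceCandidate) {}
    func peerConnection(_ pc: RTCPeerConnection, didRemove candidates: [RTCIceCandidate]) {}
    func peerConnection(_ pc: RTCPeerConnection, didOpen dataChannel: RTCDataChannel) {}
}
```

ICE candidates generated in `didGenerate` also need to travel over the same signaling channel in a real integration; some servers (e.g. Janus) batch them into the SDP instead.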
Android: io.getstream:stream-webrtc-android or Google's official WebRTC build. SurfaceViewRenderer renders the VideoTrack.
Flutter: flutter_webrtc — uses native WebRTC under the hood. RTCVideoRenderer + RTCVideoView.
NAT traversal requires STUN (Google's free stun.l.google.com:19302) plus a TURN server for symmetric NATs (Coturn on your own server).
PTZ Control
Pan/Tilt/Zoom goes over ONVIF or the camera's proprietary HTTP API. On mobile: UIPanGestureRecognizer → compute the delta → send an ONVIF ContinuousMove request over HTTP. Pinch → AbsoluteMove with a zoom coordinate.
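For reference, the ONVIF ContinuousMove request is a small SOAP body POSTed to the camera's PTZ service endpoint (discovered via GetCapabilities). The profile token is camera-specific — `Profile_1` here is a placeholder — and the x/y velocities are normalized to -1…1:

```xml
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope">
  <s:Body>
    <ContinuousMove xmlns="http://www.onvif.org/ver20/ptz/wsdl">
      <ProfileToken>Profile_1</ProfileToken>
      <Velocity>
        <!-- pan right at half speed, no tilt -->
        <PanTilt x="0.5" y="0.0"
                 xmlns="http://www.onvif.org/ver10/schema"/>
      </Velocity>
    </ContinuousMove>
  </s:Body>
</s:Envelope>
```

A matching Stop request (or a ContinuousMove with zero velocity) must be sent when the gesture ends, or the camera keeps panning.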
Throttle the requests: don't send every gesture event — throttle to one command per ~300 ms, otherwise the camera can't keep up.
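The throttle itself is a few lines of pure logic. A minimal sketch — the `GestureThrottler` name and API are mine, not from any library; timestamps are injected (e.g. from `CACurrentMediaTime()` in the gesture handler) so the behavior is deterministic and testable:

```swift
import Foundation

/// Drops events that arrive sooner than `interval` after the last accepted one.
/// Leading-edge throttle: the first event always passes.
final class GestureThrottler {
    private let interval: TimeInterval
    private var lastFired: TimeInterval = -.infinity

    init(interval: TimeInterval) { self.interval = interval }

    /// Returns true if the event at `timestamp` (seconds) should be
    /// forwarded to the camera; false if it should be dropped.
    func shouldFire(at timestamp: TimeInterval) -> Bool {
        guard timestamp - lastFired >= interval else { return false }
        lastFired = timestamp
        return true
    }
}
```

In the pan handler: `if throttler.shouldFire(at: CACurrentMediaTime()) { sendContinuousMove(delta) }`. A trailing-edge variant (sending the final gesture position after the pause) is often worth adding so the camera settles exactly where the finger stopped.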
Recording and Snapshots
Snapshot from the camera: it's cheaper to request a JPEG snapshot URL directly from the camera (most IP cameras expose something like http://cam/snapshot.jpg) than to capture a frame from the stream.
Stream recording on device: AVAssetWriter (iOS) writes CMSampleBuffers from the decoded stream to MP4; on Android — MediaMuxer + MediaCodec. For server-side recording — ffmpeg -i rtsp://... -c copy output.mp4 on the media server.
Multi-Camera
A list of cameras with stream thumbnails. Don't play all streams simultaneously — only the active one. Preview: a static snapshot refreshed every 5 seconds (URLSession.dataTask + UIImageView). Cuts battery and traffic usage by roughly 90%.
| Protocol | Latency | Client Complexity | Notes |
|---|---|---|---|
| HLS (RTSP→HLS on server) | 3-8 sec | Low (native player) | Monitoring |
| RTSP (VLCKit/FFmpegKit) | 100-800 ms | Medium | Universal |
| WebRTC | 100-300 ms | High | Intercom, PTZ |
Timeline: an RTSP view for a single camera — 3-4 days. Multi-camera monitoring with WebRTC and PTZ — 2-3 weeks.







