Developing Group Video Conferencing in a Mobile Application
A conference for 2 participants and a conference for 10 are technically different tasks. With peer-to-peer WebRTC and 5 participants, each client sends video to the four others, for a total of 20 streams in the group. An iPhone 13 starts overheating after 10 minutes. The correct solution for group calls is an SFU (Selective Forwarding Unit) or an MCU.
Architecture: SFU vs MCU
MCU (Multipoint Control Unit): the server mixes all streams into one and sends each participant a single video. Client load is minimal, but server-side mixing requires a powerful CPU and introduces 100–300 ms of latency. It suits webinars, where most participants watch a single speaker.
SFU (Selective Forwarding Unit): the server only routes streams and does not decode them. Each client receives one stream per remote participant and selects which ones to decode and display. Client load is higher than with an MCU, but latency is lower and flexibility is greater.
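The stream counts behind this comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: the `Topology` enum and both functions are hypothetical names, and simulcast layers are ignored (each publisher counts as one uploaded stream).

```swift
/// Media topology used by a call.
enum Topology {
    case mesh   // peer-to-peer: every client sends to every other client
    case mcu    // server mixes everything into one composite stream
    case sfu    // server forwards each publisher's stream without decoding
}

/// Number of video streams a single client uploads and downloads.
func perClientStreams(_ topology: Topology, participants n: Int) -> (up: Int, down: Int) {
    precondition(n >= 2, "a call needs at least two participants")
    switch topology {
    case .mesh: return (up: n - 1, down: n - 1)
    case .mcu:  return (up: 1, down: 1)
    case .sfu:  return (up: 1, down: n - 1)
    }
}

/// Total uploaded video streams across the whole group.
func totalUploads(_ topology: Topology, participants n: Int) -> Int {
    return n * perClientStreams(topology, participants: n).up
}

print(totalUploads(.mesh, participants: 5)) // 20, the figure quoted for a 5-person mesh
print(totalUploads(.sfu, participants: 5))  // 5
```

With an SFU the upload side stays constant per client, and only the download side grows with the room, which is the part simulcast later trims.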
For a mobile app with conferences of up to 20 participants, an SFU is the optimal choice. Ready-made solutions: LiveKit (open-source, self-hosted), mediasoup, Jitsi Videobridge, or managed services such as Daily.co, 100ms and Twilio Video Rooms.
LiveKit is a good choice for self-hosting: a Go server, a WebRTC SFU, simulcast and dynacast support, and native SDKs for iOS (LiveKit-iOS), Android (LiveKit-Android), Flutter and React Native. It is released under the Apache 2.0 license.
Simulcast: Key to Scalability
Without simulcast, a mobile device in a conference of 10+ people sends a single 720p stream, and everyone receives it, even those who show the sender in a small tile. With simulcast, the client sends three qualities simultaneously (e.g. 180p/360p/720p) and the SFU forwards to each receiver the quality that fits its tile width.
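On iOS, simulcast is configured via RTCRtpEncodingParameters with three layers. The class is filled in property by property, and the resulting array is passed as sendEncodings when the transceiver is created:
    import WebRTC
    // rid, downscale factor, max bitrate: q = quarter, h = half, f = full resolution.
    let layers: [(String, Double, Int)] = [("q", 4, 150_000), ("h", 2, 500_000), ("f", 1, 1_200_000)]
    let encodings: [RTCRtpEncodingParameters] = layers.map { rid, scale, bps in
        let encoding = RTCRtpEncodingParameters()
        encoding.rid = rid
        encoding.scaleResolutionDownBy = NSNumber(value: scale)  // numeric fields are NSNumber
        encoding.maxBitrateBps = NSNumber(value: bps)
        return encoding
    }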
On Android the setup is similar, via RtpParameters.Encoding. With a large participant count this reduces network load and CPU usage on receivers.
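How a layer might be matched to a tile can be sketched as follows. This is not the selection logic of any particular SFU: the `SimulcastLayer` type and the `layer(forTileWidth:from:)` function are hypothetical, and the widths assume a 16:9 capture at 1280×720 with the three rids from the snippet above.

```swift
/// One simulcast layer as published by the sender.
struct SimulcastLayer {
    let rid: String
    let width: Int          // encoded width in pixels (assumes a 1280x720 source)
    let maxBitrateBps: Int
}

let publishedLayers = [
    SimulcastLayer(rid: "q", width: 320, maxBitrateBps: 150_000),
    SimulcastLayer(rid: "h", width: 640, maxBitrateBps: 500_000),
    SimulcastLayer(rid: "f", width: 1280, maxBitrateBps: 1_200_000),
]

/// Picks the smallest layer that still covers the tile, falling back to the largest one.
func layer(forTileWidth tileWidth: Int, from layers: [SimulcastLayer]) -> SimulcastLayer? {
    let sorted = layers.sorted { $0.width < $1.width }
    return sorted.first { $0.width >= tileWidth } ?? sorted.last
}

/// Upper bound on the sender's uplink when all layers are active.
let maxUplinkBps = publishedLayers.reduce(0) { $0 + $1.maxBitrateBps }

print(layer(forTileWidth: 200, from: publishedLayers)?.rid ?? "none") // q
print(maxUplinkBps) // 1850000
```

The sender pays for all three layers on its uplink (up to about 1.85 Mbps with these caps), while a receiver showing ten small tiles pulls ten "q" layers instead of ten 720p streams.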
Displaying Participants: Grid and Dominant Speaker
For 2–4 participants the grid is a static layout. For 5–16 participants it is a dynamic grid that changes as people join and leave. Rule: do not recreate the RTCVideoRenderer on each grid update; only reassign the track to an existing renderer. Recreating renderers causes flicker and unnecessary redraws.
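The reuse rule can be sketched without any WebRTC types. Everything below is a hypothetical illustration: `gridSize(for:)` and `TilePool` are made-up names, and the `bind` closure stands in for whatever detaches the old track from a renderer and attaches the new one in the real app.

```swift
import Foundation

/// Near-square grid: columns = ceil(sqrt(n)), rows = as many as needed.
func gridSize(for count: Int) -> (columns: Int, rows: Int) {
    guard count > 0 else { return (columns: 0, rows: 0) }
    let columns = Int(ceil(sqrt(Double(count))))
    let rows = (count + columns - 1) / columns
    return (columns: columns, rows: rows)
}

/// Keeps one renderer per grid slot and only rebinds tracks when the list changes.
final class TilePool<Renderer> {
    private(set) var renderers: [Renderer] = []
    private(set) var boundTrackIDs: [String?] = []
    private(set) var createdCount = 0
    private let makeRenderer: () -> Renderer
    private let bind: (Renderer, String?) -> Void

    init(makeRenderer: @escaping () -> Renderer,
         bind: @escaping (Renderer, String?) -> Void) {
        self.makeRenderer = makeRenderer
        self.bind = bind
    }

    /// Call on every join/leave with the track IDs in display order.
    func update(trackIDs: [String]) {
        // Grow the pool only when there are more participants than slots.
        while renderers.count < trackIDs.count {
            renderers.append(makeRenderer())
            boundTrackIDs.append(nil)
            createdCount += 1
        }
        for index in renderers.indices {
            let wanted: String? = index < trackIDs.count ? trackIDs[index] : nil
            if boundTrackIDs[index] != wanted {
                bind(renderers[index], wanted) // rebind only slots whose track changed
                boundTrackIDs[index] = wanted
            }
        }
    }
}

print(gridSize(for: 5)) // (columns: 3, rows: 2)
```

Spare renderers stay in the pool with no track bound, so a participant who rejoins does not trigger a new view allocation. Whether to trim the pool after a large room shrinks is a separate memory trade-off.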
Dominant speaker detection means determining who is speaking right now and displaying that participant larger. LiveKit and 100ms provide this out of the box via an active-speakers event such as onActiveSpeakersChanged (the exact name varies by SDK). In raw WebRTC it is done by analyzing audioLevel from RTCPeerConnection.getStats().
On iOS, render video via RTCMTLVideoView (Metal); this is mandatory for conferences. The old RTCEAGLVideoView (OpenGL ES) does not handle multiple instances with good performance on A15+. With 6 participants RTCEAGLVideoView gives 40 FPS, while RTCMTLVideoView holds a stable 60 FPS.
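For the raw WebRTC case, one possible approach is to smooth the sampled audio levels and switch the dominant speaker only when a challenger is clearly louder. The sketch below is an assumption about how this could be done, not what LiveKit or 100ms do internally. `DominantSpeakerDetector` and its thresholds are hypothetical, and the input is assumed to be audioLevel values in the 0.0–1.0 range polled from stats at a fixed interval.

```swift
/// Tracks smoothed audio levels and switches the dominant speaker with hysteresis.
struct DominantSpeakerDetector {
    private var smoothed: [String: Double] = [:]
    private(set) var current: String?
    let smoothing: Double     // weight of the newest sample, 0...1
    let switchMargin: Double  // how much louder a challenger must be to take over
    let silenceFloor: Double  // below this level nobody counts as speaking

    init(smoothing: Double = 0.3, switchMargin: Double = 0.1, silenceFloor: Double = 0.05) {
        self.smoothing = smoothing
        self.switchMargin = switchMargin
        self.silenceFloor = silenceFloor
    }

    /// Feed one sample of audio levels keyed by participant ID; returns the dominant speaker.
    @discardableResult
    mutating func update(levels: [String: Double]) -> String? {
        // Forget participants that are no longer reported (they left the room).
        smoothed = smoothed.filter { levels[$0.key] != nil }
        if let id = current, levels[id] == nil { current = nil }

        // Exponential moving average damps short noises such as coughs.
        for (id, level) in levels {
            let previous = smoothed[id] ?? 0
            smoothed[id] = previous + smoothing * (level - previous)
        }

        guard let loudest = smoothed.max(by: { $0.value < $1.value }),
              loudest.value >= silenceFloor else {
            return current // everyone is quiet: keep the last speaker on screen
        }
        let currentLevel = current.flatMap { smoothed[$0] } ?? 0
        if current == nil || loudest.value > currentLevel + switchMargin {
            current = loudest.key
        }
        return current
    }
}
```

The margin keeps the large tile from bouncing between two people talking at similar volume; the values 0.3, 0.1 and 0.05 are placeholders to tune against real recordings.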
Battery and Thermal State
A group conference is the most battery-intensive mode of the app. Encoding 720p video at 30 FPS consumes ~15–20% of charge per hour on an iPhone 14. With 4+ participants, decoding multiple streams adds to that.
React to ProcessInfo.thermalState (iOS): at .serious or .critical, lower the resolution to 360p and reduce the frame rate to 15 FPS. On Android, use PowerManager.getThermalHeadroom() (Android 11+). This is adaptation rather than degradation: a stable 360p conference is better than overheating and forced CPU throttling.
When the app is minimized on iOS, stop camera capture (AVCaptureSession.stopRunning()); audio continues via AVAudioSession, provided a background audio mode is enabled for the app. On Android, a ForegroundService with android:foregroundServiceType="camera|microphone" retains access to the camera and microphone.
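The iOS side of this rule can be sketched as a small mapping plus an observer. `CaptureProfile`, `ThermalLevel`, `captureProfile(for:)` and `observeThermalState(apply:)` are hypothetical names introduced for illustration; only ProcessInfo.thermalState and its change notification are system APIs, and the `apply` closure is assumed to reconfigure the capturer in the real app.

```swift
import Foundation

/// Capture settings the app requests from the camera and encoder.
struct CaptureProfile: Equatable {
    let height: Int
    let fps: Int
}

/// Platform-independent mirror of ProcessInfo.ThermalState, so the mapping is testable anywhere.
enum ThermalLevel { case nominal, fair, serious, critical }

/// 720p30 while the device is cool, 360p15 once it reports serious or critical heat.
func captureProfile(for level: ThermalLevel) -> CaptureProfile {
    switch level {
    case .nominal, .fair:
        return CaptureProfile(height: 720, fps: 30)
    case .serious, .critical:
        return CaptureProfile(height: 360, fps: 15)
    }
}

#if os(iOS) || os(macOS)
extension ThermalLevel {
    init(_ state: ProcessInfo.ThermalState) {
        switch state {
        case .nominal: self = .nominal
        case .fair: self = .fair
        case .serious: self = .serious
        case .critical: self = .critical
        @unknown default: self = .serious // treat unknown future states conservatively
        }
    }
}

/// Calls `apply` immediately and again whenever the system reports a thermal change.
/// Keep the returned token and pass it to NotificationCenter.removeObserver when the call ends.
func observeThermalState(apply: @escaping (CaptureProfile) -> Void) -> NSObjectProtocol {
    apply(captureProfile(for: ThermalLevel(ProcessInfo.processInfo.thermalState)))
    return NotificationCenter.default.addObserver(
        forName: ProcessInfo.thermalStateDidChangeNotification,
        object: nil,
        queue: .main
    ) { _ in
        apply(captureProfile(for: ThermalLevel(ProcessInfo.processInfo.thermalState)))
    }
}
#endif
```

Stepping back up to 720p once the state returns to .fair or .nominal happens automatically with this mapping; a production version might add a delay before upgrading to avoid oscillating around the threshold.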
Typical Errors
Recreating AVCaptureSession on every call. Keep a single session for the whole app: session initialization takes 200–500 ms, and the user notices the delay if it happens on each call.
Not handling AVAudioSession.routeChangeNotification. When AirPods connect during a conference and the route change is not handled, audio keeps going to the earpiece.
Not releasing RTCVideoTrack when a participant leaves the conference. The result is a memory leak that accumulates over long sessions and in large rooms.
Process and Timeline
Requirements audit → SFU choice (self-hosted or managed) → SDK integration → UI (grid, dominant speaker, controls) → simulcast → thermal state adaptation → load testing.
A conference of up to 8 participants on a managed SFU (100ms, Daily) with a ready-made SDK takes 2–4 weeks. Self-hosted LiveKit with a custom UI and simulcast takes 4–8 weeks. Cost is calculated after requirements analysis.







