Co-Streaming (Joint Stream) in a Mobile App
Co-streaming: two users broadcast simultaneously, see and hear each other in real time, and viewers see both. Technically this is the intersection of two different tasks: WebRTC (peer-to-peer video between the streamers) and RTMP/SRT (broadcasting the result to viewers). Combining them is non-trivial.
Co-Stream Architecture
Standard scheme:
Streamer A: camera → WebRTC → Signal Server ← WebRTC ← camera: Streamer B
                                   ↓
                        Mixing Server (SFU/MCU)
                                   ↓
                     RTMP → Twitch/YouTube/Custom
On mobile there is also a variant without an MCU: client-side mixing. Streamer A receives Streamer B's video via WebRTC, mixes both streams locally via Metal/OpenGL, and sends the mixed stream to RTMP. This is cheaper than server-side mixing, but it requires a powerful processor and becomes unstable if the second participant has a poor network.
In production, with 1000+ simultaneous co-streams, only a server-side MCU/SFU works (LiveKit, mediasoup, Agora). For an MVP with a small load, client-side mixing is enough; a sketch of the mixing step follows.
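A minimal sketch of client-side mixing using Core Image (one possible approach alongside raw Metal/OpenGL); the picture-in-picture layout and the 0.5 scale are assumptions, not a prescribed design:

import CoreImage
import CoreGraphics
import CoreVideo

// Compose the partner's frame over the local one. Both frames are assumed
// to arrive as CVPixelBuffer; `output` is the buffer that feeds the RTMP encoder.
let ciContext = CIContext() // GPU-backed on modern devices

func mix(local: CVPixelBuffer, remote: CVPixelBuffer, into output: CVPixelBuffer) {
    let localImage = CIImage(cvPixelBuffer: local)
    // Shrink the partner's frame to half size...
    let scaled = CIImage(cvPixelBuffer: remote)
        .transformed(by: CGAffineTransform(scaleX: 0.5, y: 0.5))
    // ...and pin it to the bottom-right corner of the local frame.
    let positioned = scaled.transformed(by: CGAffineTransform(
        translationX: localImage.extent.width - scaled.extent.width, y: 0))
    ciContext.render(positioned.composited(over: localImage), to: output)
}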
WebRTC on iOS and Android
iOS: GoogleWebRTC (CocoaPods) or WebRTC.xcframework from Google. The main object is RTCPeerConnection. Initialization:
let config = RTCConfiguration()
// A public STUN server covers NAT traversal in most cases;
// add TURN servers for restrictive networks.
config.iceServers = [RTCIceServer(urlStrings: ["stun:stun.l.google.com:19302"])]
config.sdpSemantics = .unifiedPlan

let constraints = RTCMediaConstraints(
    mandatoryConstraints: ["OfferToReceiveVideo": "true",
                           "OfferToReceiveAudio": "true"],
    optionalConstraints: nil
)

let peerConnection = factory.peerConnection(with: config,
                                            constraints: constraints,
                                            delegate: self)
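The next step is the SDP exchange. A minimal sketch of creating an offer (error and ICE handling trimmed); the resulting SDP goes to the partner through the signal server:

// Create an SDP offer and set it as the local description.
peerConnection.offer(for: constraints) { sdp, error in
    guard let sdp = sdp, error == nil else { return }
    peerConnection.setLocalDescription(sdp) { error in
        // On success, send sdp.sdp to the partner via the signal server.
    }
}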
Android: org.webrtc:google-webrtc:1.0.+ or io.getstream:stream-webrtc-android. The logic is analogous, built around PeerConnection.
The signal server is a WebSocket service that exchanges SDP offers/answers and ICE candidates between the streamers. Write it in Node.js (ws) or take a ready-made one: LiveKit Server, Agora RTM.
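What the client side of such signaling can look like: a sketch over URLSessionWebSocketTask. The JSON message shape is an assumption; match it to your server's protocol.

import Foundation

final class SignalClient {
    private let task: URLSessionWebSocketTask

    init(url: URL) {
        task = URLSession.shared.webSocketTask(with: url)
        task.resume()
    }

    // Wraps a payload into {"type": "offer" | "answer" | "candidate", ...}.
    func send(type: String, payload: [String: String]) {
        var message = payload
        message["type"] = type
        guard let data = try? JSONSerialization.data(withJSONObject: message),
              let text = String(data: data, encoding: .utf8) else { return }
        task.send(.string(text)) { error in
            if let error = error { print("signal send failed: \(error)") }
        }
    }
}

// Usage: forward the locally created SDP offer to the partner.
// signalClient.send(type: "offer", payload: ["sdp": offer.sdp])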
Problems We Encounter
Audio delay when mixing. A hears B via WebRTC with a 150–200 ms delay, while the outgoing stream is formed from A's local audio, so viewers hear desync. Solution: delay compensation via AVAudioPlayerNode.scheduleBuffer with an explicit AVAudioTime, so that A's local audio in the final stream is delayed by the same amount as the incoming audio from B.
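A sketch of that compensation, assuming the WebRTC delay has been measured (for example from WebRTC stats); the 0.18 s default here is illustrative:

import AVFoundation

// Delays A's local audio so it lines up with B's WebRTC audio in the mix.
final class DelayCompensator {
    private let engine = AVAudioEngine()
    private let player = AVAudioPlayerNode()
    private let delaySeconds: Double

    init(delaySeconds: Double = 0.18) {
        self.delaySeconds = delaySeconds
        engine.attach(player)
        // The player feeds the mix that goes to the RTMP encoder.
        engine.connect(player, to: engine.mainMixerNode, format: nil)
    }

    func start() throws {
        try engine.start()
        player.play()
    }

    // Called for every captured local buffer; schedules it `delaySeconds` late.
    func schedule(_ buffer: AVAudioPCMBuffer) {
        let rate = buffer.format.sampleRate
        let delayFrames = AVAudioFramePosition(delaySeconds * rate)
        guard let nodeTime = player.lastRenderTime,
              let playerTime = player.playerTime(forNodeTime: nodeTime) else {
            player.scheduleBuffer(buffer, completionHandler: nil)
            return
        }
        let when = AVAudioTime(sampleTime: playerTime.sampleTime + delayFrames,
                               atRate: rate)
        player.scheduleBuffer(buffer, at: when, options: [], completionHandler: nil)
    }
}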
Echo cancellation. If the streamer has no headphones, the microphone picks up the partner's WebRTC audio from the speaker. WebRTC's built-in AEC works only on the RTCPeerConnection audio track; with a custom audio pipeline you need either the system voice processing in AVAudioEngine or your own AEC (e.g., Speex DSP).
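If the custom pipeline runs on AVAudioEngine, one option (iOS 13+, and an assumption about your pipeline rather than the only way) is Apple's built-in voice processing, which bundles AEC, noise suppression, and AGC:

import AVFoundation

let engine = AVAudioEngine()
do {
    // Must be enabled before the engine starts.
    try engine.inputNode.setVoiceProcessingEnabled(true)
} catch {
    // Fall back to a software AEC (e.g., Speex DSP) if unavailable.
    print("Voice processing unavailable: \(error)")
}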
Switching between co-stream and solo. When the partner leaves the co-stream, their window has to be removed from the composition smoothly and the layout rebuilt without interrupting the RTMP broadcast. In practice this means the Metal render pass must check whether the second texture is present and correctly render full-frame mode if the second participant has disconnected.
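In the render pass this is a simple branch. A sketch: drawQuad and the normalized rects are hypothetical placeholders for the real vertex setup, not a library API.

import Metal
import CoreGraphics

// Hypothetical helper: renders `texture` into `rect` (normalized coordinates).
func drawQuad(_ encoder: MTLRenderCommandEncoder,
              texture: MTLTexture, rect: CGRect) {
    // ... bind texture, set viewport from `rect`, issue the draw call
}

let leftHalf  = CGRect(x: 0.0, y: 0.0, width: 0.5, height: 1.0)
let rightHalf = CGRect(x: 0.5, y: 0.0, width: 0.5, height: 1.0)
let fullFrame = CGRect(x: 0.0, y: 0.0, width: 1.0, height: 1.0)

func composite(encoder: MTLRenderCommandEncoder,
               local: MTLTexture, remote: MTLTexture?) {
    if let remote = remote {
        // Co-stream: side-by-side layout.
        drawQuad(encoder, texture: local,  rect: leftHalf)
        drawQuad(encoder, texture: remote, rect: rightHalf)
    } else {
        // Partner disconnected: solo full-frame; the RTMP session keeps running,
        // only the composition changes.
        drawQuad(encoder, texture: local, rect: fullFrame)
    }
}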
Signaling State for Viewers
Viewers should know it's a co-stream. In the app this is an overlay with both streamers' avatars. The overlay state updates on change (partner connected/disconnected) via a WebSocket event, not polling; a sketch of the viewer-side handler follows.
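A sketch of the viewer-side listener, assuming a hypothetical JSON event {"type": "costream_state", "partnerOnline": true, ...} pushed by the server; the field names are illustrative:

import Foundation

struct CoStreamEvent: Codable {
    let type: String
    let partnerOnline: Bool
    let partnerAvatarURL: URL?
}

final class CoStreamStateListener {
    private let task: URLSessionWebSocketTask
    var onChange: ((CoStreamEvent) -> Void)?

    init(url: URL) {
        task = URLSession.shared.webSocketTask(with: url)
        task.resume()
        receive()
    }

    private func receive() {
        task.receive { [weak self] result in
            if case .success(.string(let text)) = result,
               let data = text.data(using: .utf8),
               let event = try? JSONDecoder().decode(CoStreamEvent.self, from: data),
               event.type == "costream_state" {
                // Push-based update: the overlay re-renders only on change.
                DispatchQueue.main.async { self?.onChange?(event) }
            }
            self?.receive() // keep listening
        }
    }
}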
Timeline
Client-side co-stream (iOS, two participants, Metal composition, basic signal server): 5–7 weeks. Full implementation with an MCU, Android support, AEC, and state management: 8–12 weeks. Cost is calculated individually after requirements analysis and the architecture choice.