Video Intercom Mobile App Development
A visitor presses the button at the door, the phone rings, video from the camera appears on screen, and the door unlocks with one tap. The scenario looks simple, but underneath sit a WebRTC or SIP stack, the ONVIF protocol, push notifications delivered within a fraction of a second while the app is closed, relay control through the device API, and event logging. The project takes 4-12 weeks, depending on equipment compatibility requirements.
Stack: Where to Start the Design
The first question: what hardware is on the door?
Ready-made IP intercoms (Hikvision DS-KV6113, Grandstream GDS3710, Beward DS06M) support SIP and ONVIF. The phone registers on Asterisk/FreeSWITCH as a SIP client. An incoming call from the intercom travels as: SIP INVITE → server → push to phone → CallKit (iOS) / ConnectionService (Android).
Custom intercom on a single-board computer or microcontroller (Raspberry Pi, ESP32-S3, NXP i.MX): you choose the stack yourself. A typical setup on boards that run Linux is WebRTC via Pion (Go) or aiortc (Python) on the device, with signaling over WebSocket to your server and from there to the mobile client.
Cloud intercoms (Ring, Dahua, Hikvision EZVIZ) use a proprietary P2P SDK. You integrate through the vendor's mobile SDK, which is fast but means vendor lock-in.
Call from Intercom: Delivery in <1 Second
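Notification delay is critical. When someone presses the button at 23:00, the phone should ring immediately, while the guest is still at the door.
iOS: CallKit is the right path. Use an APNs VoIP push (PKPushRegistry, PKPushType.voIP) rather than an ordinary push. The system wakes the app immediately, in some scenarios even after a force quit, and calls pushRegistry(_:didReceiveIncomingPushWith:for:completion:). Create a CXCallUpdate there and pass it to CXProvider; the user then sees the native incoming-call interface with video. Since iOS 13 the app must report a call to CallKit for every VoIP push it receives, otherwise the system terminates it.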
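```swift
import CallKit

// Describe the incoming call and hand it to CallKit.
let update = CXCallUpdate()
update.remoteHandle = CXHandle(type: .generic, value: "Front door")
update.localizedCallerName = "Intercom"
update.hasVideo = true
provider.reportNewIncomingCall(with: callUUID, update: update) { error in
    if let error = error { print("Incoming call was not shown: \(error)") }
}
```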
The APNs VoIP certificate is separate from the regular push certificate. Add voip to UIBackgroundModes in Info.plist.
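The push handler has very little time before it must report the call, so it helps to turn the untyped VoIP payload into a typed value first. A minimal sketch, assuming a hypothetical payload with `callId`, `doorName`, and `hasVideo` keys (the real keys depend on what your server sends); `IncomingDoorCall` and `parseDoorCall` are illustrative names, not part of PushKit:

```swift
import Foundation

// Hypothetical payload fields; the real keys depend on what your server sends.
struct IncomingDoorCall: Equatable {
    let callUUID: UUID
    let doorName: String
    let hasVideo: Bool
}

// Converts the untyped dictionary (PKPushPayload.dictionaryPayload on iOS)
// into a typed value. Returns nil if a required field is missing or malformed.
func parseDoorCall(_ payload: [AnyHashable: Any]) -> IncomingDoorCall? {
    guard
        let idString = payload["callId"] as? String,
        let uuid = UUID(uuidString: idString),
        let door = payload["doorName"] as? String
    else { return nil }
    // Assumption: video is available unless the server says otherwise.
    let hasVideo = payload["hasVideo"] as? Bool ?? true
    return IncomingDoorCall(callUUID: uuid, doorName: door, hasVideo: hasVideo)
}

let payload: [AnyHashable: Any] = [
    "callId": "6F9619FF-8B86-D011-B42D-00C04FC964FF",
    "doorName": "Front door",
    "hasVideo": true,
]
let call = parseDoorCall(payload)
```

The parsed `callUUID`, `doorName`, and `hasVideo` would then feed the CXCallUpdate. If parsing fails, the handler still has to report some call to CallKit, so a generic fallback update is probably safer than returning early.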
Android: ConnectionService plus FCM with high priority (priority: high in the payload). TelecomManager.addNewIncomingCall() shows the system call screen. The problem: on Xiaomi, Huawei, and OPPO devices, aggressive battery optimization kills the FCM connection. The workaround is a JobScheduler keepalive plus instructions asking the user to add the app to the battery-optimization exclusions. For Huawei devices, HMS Push is an alternative.
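To make the "priority: high in payload" point concrete, here is a sketch of the request body a server might send to FCM for an incoming call. It is written in Swift only to match the other examples; the data keys (`type`, `callId`, `doorName`) and the function name are assumptions, and only the envelope (`message`, `token`, `data`, `android.priority`) follows the FCM HTTP v1 message format:

```swift
import Foundation

// Data-only FCM message. The data keys are hypothetical; FCM requires
// all data values to be strings.
struct FCMRequest: Encodable {
    struct Message: Encodable {
        struct Android: Encodable { let priority: String }
        let token: String
        let data: [String: String]
        let android: Android
    }
    let message: Message
}

func incomingCallMessage(deviceToken: String, callId: String, doorName: String) throws -> Data {
    let request = FCMRequest(message: .init(
        token: deviceToken,
        data: ["type": "incoming_call", "callId": callId, "doorName": doorName],
        android: .init(priority: "high")  // ask FCM to deliver immediately
    ))
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]  // stable output, easier to log and diff
    return try encoder.encode(request)
}

let body = try incomingCallMessage(deviceToken: "device-token", callId: "42", doorName: "Front door")
print(String(data: body, encoding: .utf8) ?? "")
```

A data-only message is used here so that the app's own messaging service receives the payload and can call TelecomManager.addNewIncomingCall(), rather than the system drawing an ordinary notification.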
Flutter: flutter_callkit_incoming (native bindings to CallKit and ConnectionService) plus firebase_messaging for push delivery.
WebRTC Video Communication
After the call is answered, a WebRTC peer connection is set up. Signaling: SDP exchange through our WebSocket server (or Asterisk with WebSocket transport for SIP/WebRTC). ICE candidates are gathered via STUN; when direct NAT traversal fails, media is relayed through TURN (Coturn).
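WebRTC does not define the signaling format, so the WebSocket messages are up to you. The sketch below assumes a hypothetical wire format of one JSON object per text frame, tagged with a `type` field; the candidate fields mirror the `sdp`, `sdpMid`, and `sdpMLineIndex` properties of an ICE candidate:

```swift
import Foundation

// Hypothetical signaling messages exchanged over the WebSocket.
enum SignalingMessage: Codable, Equatable {
    case offer(sdp: String)
    case answer(sdp: String)
    case candidate(sdp: String, sdpMid: String?, sdpMLineIndex: Int32)

    private enum Keys: String, CodingKey {
        case type, sdp, sdpMid, sdpMLineIndex
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: Keys.self)
        let type = try c.decode(String.self, forKey: .type)
        switch type {
        case "offer":
            self = .offer(sdp: try c.decode(String.self, forKey: .sdp))
        case "answer":
            self = .answer(sdp: try c.decode(String.self, forKey: .sdp))
        case "candidate":
            self = .candidate(
                sdp: try c.decode(String.self, forKey: .sdp),
                sdpMid: try c.decodeIfPresent(String.self, forKey: .sdpMid),
                sdpMLineIndex: try c.decode(Int32.self, forKey: .sdpMLineIndex))
        default:
            throw DecodingError.dataCorruptedError(
                forKey: .type, in: c, debugDescription: "Unknown message type \(type)")
        }
    }

    func encode(to encoder: Encoder) throws {
        var c = encoder.container(keyedBy: Keys.self)
        switch self {
        case .offer(let sdp):
            try c.encode("offer", forKey: .type)
            try c.encode(sdp, forKey: .sdp)
        case .answer(let sdp):
            try c.encode("answer", forKey: .type)
            try c.encode(sdp, forKey: .sdp)
        case .candidate(let sdp, let sdpMid, let index):
            try c.encode("candidate", forKey: .type)
            try c.encode(sdp, forKey: .sdp)
            try c.encodeIfPresent(sdpMid, forKey: .sdpMid)
            try c.encode(index, forKey: .sdpMLineIndex)
        }
    }
}

// Decoding a candidate frame as it might arrive from the server.
let incoming = #"{"type":"candidate","sdp":"candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host","sdpMid":"0","sdpMLineIndex":0}"#
let message = try JSONDecoder().decode(SignalingMessage.self, from: Data(incoming.utf8))
```

On the client, a decoded `.offer` or `.answer` would be applied as the remote description and a `.candidate` added to the peer connection; unknown message types fail loudly instead of being silently dropped.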
Video from the intercom camera: render the RTCVideoTrack in RTCMTLVideoView (Metal, iOS) or SurfaceViewRenderer (Android). Audio arrives as an RTCAudioTrack. AEC (acoustic echo cancellation) is built into WebRTC, which matters for the scenario where you hear your own echo through the intercom speaker.
Video latency: 150-400 ms on good Wi-Fi, 400-800 ms on 4G. For an "unlock or don't unlock" decision, that is acceptable.
Lock Control
Unlocking is an HTTP or MQTT request to the device or server that drives the relay output: POST /api/unlock or MQTT publish("home/door/unlock", "1"). Optionally, a door sensor confirms the opening: an OPEN event on home/door/sensor is displayed in the app.
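For the HTTP variant, the client side can be as small as one request builder. This is a sketch against a hypothetical backend: the base URL, the bearer token, and the JSON body fields (`doorId`, `callId`) are assumptions; only the POST /api/unlock path comes from the text above.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

// Builds the POST /api/unlock request for a hypothetical backend.
func makeUnlockRequest(baseURL: URL, token: String, doorId: String, callId: String) throws -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("api/unlock"))
    request.httpMethod = "POST"
    request.timeoutInterval = 5  // fail fast: the guest is waiting at the door
    request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // Sending the callId lets the server verify that a call is actually in progress.
    request.httpBody = try JSONEncoder().encode(["doorId": doorId, "callId": callId])
    return request
}

let unlockRequest = try makeUnlockRequest(
    baseURL: URL(string: "https://intercom.example.com")!,
    token: "session-token", doorId: "front", callId: "42")
```

The request would be sent with URLSession; the short timeout is a design choice so that a failed unlock surfaces in the UI while the guest is still there.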
Authorization window: show the "Unlock" button only while the call is active and hide it once the call ends. Event log: write each call, unlock, and refusal to the DB with a timestamp and userId.
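The rule "unlock only during an active call, and log everything" can be sketched as a small state holder. `CallSession`, `DoorEvent`, and `LogEntry` are illustrative names, and the in-memory array stands in for the database:

```swift
import Foundation

enum DoorEvent: String { case call, unlock, refusal }

struct LogEntry {
    let event: DoorEvent
    let userId: String
    let timestamp: Date
}

// Tracks whether a call is active and records every event.
// Hiding the button is only the UI half; the same check belongs on the server.
final class CallSession {
    private(set) var isActive = false
    private(set) var log: [LogEntry] = []
    let userId: String

    init(userId: String) { self.userId = userId }

    // The "Unlock" button should be visible exactly when this is true.
    var canUnlock: Bool { isActive }

    func callStarted(at time: Date = Date()) {
        isActive = true
        log.append(LogEntry(event: .call, userId: userId, timestamp: time))
    }

    func callEnded() { isActive = false }

    // Returns true if the unlock was accepted; requests outside a call are ignored.
    func unlock(at time: Date = Date()) -> Bool {
        guard isActive else { return false }
        log.append(LogEntry(event: .unlock, userId: userId, timestamp: time))
        return true
    }

    // The resident declines the call without opening the door.
    func decline(at time: Date = Date()) {
        guard isActive else { return }
        log.append(LogEntry(event: .refusal, userId: userId, timestamp: time))
        isActive = false
    }
}

let session = CallSession(userId: "user-17")
session.callStarted()
let openedDuringCall = session.unlock()
session.callEnded()
let openedAfterCall = session.unlock()
```

Here the unlock during the call succeeds and is logged, while the attempt after the call ends is rejected and leaves no log entry.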
Event Video Recording
Video is recorded on each call (ringback recording): a media server (Janus record plugin, Ant Media) writes the stream to WebM/MP4. Store it in S3/MinIO. Retention: the last 30 events or 7 days, cleaned up via a lifecycle rule.
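The two retention options behave differently, which a small sketch makes visible. It treats "last 30 events" and "7 days" as alternative policies (an interpretation of the text); the `Recording` type and the object-key naming are hypothetical:

```swift
import Foundation

struct Recording: Equatable {
    let key: String       // object key in S3/MinIO (hypothetical naming)
    let recordedAt: Date
}

enum RetentionPolicy {
    case lastEvents(Int)        // keep only the N newest recordings
    case maxAge(TimeInterval)   // keep recordings younger than this many seconds
}

// Returns the recordings that the policy says should be deleted.
func expired(_ recordings: [Recording], policy: RetentionPolicy, now: Date = Date()) -> [Recording] {
    switch policy {
    case .lastEvents(let count):
        let newestFirst = recordings.sorted { $0.recordedAt > $1.recordedAt }
        return Array(newestFirst.dropFirst(max(count, 0)))
    case .maxAge(let seconds):
        return recordings.filter { now.timeIntervalSince($0.recordedAt) > seconds }
    }
}

// Sample data: 40 recordings, one per day, aged 0.5, 1.5, ... 39.5 days.
let now = Date()
let day: TimeInterval = 24 * 60 * 60
let recordings = (0..<40).map { i in
    Recording(key: "events/\(i).mp4", recordedAt: now.addingTimeInterval(-(Double(i) + 0.5) * day))
}
let overCount = expired(recordings, policy: .lastEvents(30), now: now)
let overAge = expired(recordings, policy: .maxAge(7 * day), now: now)
```

An age-based limit maps naturally onto a bucket lifecycle rule, since those expire objects by age. A count-based limit generally needs a cleanup job along these lines, because the bucket has no notion of "the newest 30 events".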
The mobile client shows the event history: a timeline with a first-frame preview, duration, and a play button. Video plays in AVPlayer / ExoPlayer from an S3 presigned URL.
Multi-Apartment Building
Multiple entrances, multiple apartments. Routing: the entrance 3 intercom rings only the residents of apartments in entrance 3, or a specific apartment (number typed, then call). In Asterisk this is a dialplan with Dial(SIP/apartment_${EXTEN}). Each apartment gets a separate SIP account, and in the app each user is bound to their own apartment's account.
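The routing rule itself is simple enough to express outside the dialplan, for example in the API that decides whom to push. A minimal sketch, assuming a hypothetical `Apartment` model; the account naming follows the `apartment_${EXTEN}` pattern from the dialplan above:

```swift
import Foundation

struct Apartment: Hashable {
    let number: Int
    let entrance: Int
}

// Mirrors the dialplan naming: apartment 57 -> "apartment_57".
func sipAccount(for apartment: Apartment) -> String {
    "apartment_\(apartment.number)"
}

// Returns the SIP accounts to ring for a call from the given entrance.
// If a number was dialed, only that apartment rings, and only when it
// belongs to this entrance; otherwise every apartment in the entrance rings.
func routeCall(fromEntrance entrance: Int, dialed: Int?, building: [Apartment]) -> [String] {
    let local = building.filter { $0.entrance == entrance }
    if let dialed = dialed {
        return local.filter { $0.number == dialed }.map(sipAccount(for:))
    }
    return local.map(sipAccount(for:))
}

let building = [
    Apartment(number: 55, entrance: 2),
    Apartment(number: 56, entrance: 3),
    Apartment(number: 57, entrance: 3),
]
let everyoneInEntrance3 = routeCall(fromEntrance: 3, dialed: nil, building: building)
let onlyApartment57 = routeCall(fromEntrance: 3, dialed: 57, building: building)
let wrongEntrance = routeCall(fromEntrance: 3, dialed: 55, building: building)
```

Dialing apartment 55 from entrance 3 returns an empty list, so a panel in one entrance cannot ring residents of another.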
Guest access: the apartment owner issues a temporary QR code, for example to a cleaner. The QR code opens the door via the HTTP API without a call, with a limited validity period and logging.
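The validity check behind such a QR code might look like the sketch below. The `GuestPass` fields and function names are hypothetical, and the random UUID token is only a placeholder: a real deployment would more likely issue and verify a signed token on the server.

```swift
import Foundation

// A temporary pass encoded into the QR code. Field names are hypothetical.
struct GuestPass {
    let token: String       // random value carried by the QR code
    let apartment: Int
    let validFrom: Date
    let validUntil: Date
}

enum PassCheck: Equatable { case granted, notYetValid, expired }

// Issues a pass that is valid from `now` for the given duration in seconds.
func issuePass(apartment: Int, validFor duration: TimeInterval, now: Date = Date()) -> GuestPass {
    GuestPass(
        token: UUID().uuidString,
        apartment: apartment,
        validFrom: now,
        validUntil: now.addingTimeInterval(duration))
}

// Checks the pass at scan time. The caller is expected to log the result
// together with the token and timestamp.
func check(_ pass: GuestPass, at time: Date = Date()) -> PassCheck {
    if time < pass.validFrom { return .notYetValid }
    if time > pass.validUntil { return .expired }
    return .granted
}

let issuedAt = Date()
let pass = issuePass(apartment: 57, validFor: 3600, now: issuedAt)
let duringVisit = check(pass, at: issuedAt.addingTimeInterval(1800))
let afterExpiry = check(pass, at: issuedAt.addingTimeInterval(7200))
```

Returning a reason rather than a plain Bool keeps the event log informative: an expired pass and a pass used too early are different situations for the apartment owner.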
Development Stages
| Stage | Content | Timeline |
|---|---|---|
| Equipment audit and architecture | Device protocols, stack choice | 3-5 days |
| Server side | Asterisk/WebRTC server, API | 1-2 weeks |
| Mobile iOS + Android | CallKit, video, lock control | 2-3 weeks |
| Event recording and history | S3, player, log | 1 week |
| Hardware testing | QA on real intercom | 1 week |
In total, 1 to 3 months, depending on equipment compatibility and multi-apartment requirements.