AI virtual background replacement for video calls in mobile app

TRUETECH is engaged in the development, support and maintenance of iOS, Android, PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular markets like Google Play, App Store, Amazon, AppGallery and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications: news apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications: online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems
Business process management mobile applications: CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications: classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.


AI Virtual Background (Background Replacement) for Video Calls

A typical virtual background implementation sends frames over WebRTC to a third-party segmentation service, and this works while the network is stable. On mobile, when 4G degrades, the round trip for sending a frame to the server and receiving the masked result grows to 150–300 ms, which shows up as visible artifacts in real time. The more robust approach is on-device segmentation.

Why Server-Side Segmentation Doesn't Work on Mobile

The task is to isolate the human silhouette in each video frame (30 fps), apply a background, and return the frame to the pipeline before encoding. At 30 fps this leaves a budget of about 33 ms per frame, covering capture, model inference, post-processing, and rendering.
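The budget arithmetic above can be sketched as follows. This is a minimal illustration, not a profiler: the function names and the stage timings in the usage note are hypothetical and should be replaced with measurements from real devices.

```kotlin
// Per-frame time budget at a given frame rate, in milliseconds.
fun frameBudgetMs(fps: Int): Double {
    require(fps > 0) { "fps must be positive" }
    return 1000.0 / fps
}

// Sums per-stage timings (hypothetical values supplied by the caller) and
// returns the remaining headroom. A negative result means the pipeline
// cannot keep up with the requested frame rate.
fun headroomMs(stageTimingsMs: Map<String, Double>, fps: Int): Double =
    frameBudgetMs(fps) - stageTimingsMs.values.sum()
```

With assumed timings of 3 ms capture, 12 ms inference, 4 ms post-processing and 5 ms rendering, headroomMs(..., 30) leaves roughly 9 ms. Raising inference to 28 ms makes the result negative.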

Server approach: capture → send → infer → response → render. Even on a perfect network the round trip adds 40–80 ms, which alone exceeds the frame budget. In practice this shows up as contour jitter and motion "ghosting".

On-device: capture → infer → render. Everything in one pipeline.
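To see why a server round trip produces ghosting, it helps to express the delay in frames. The sketch below is an illustrative calculation only: maskLagFrames is a hypothetical helper that counts how many frame intervals a round trip spans, rounded up, using integer arithmetic to avoid floating-point edge cases.

```kotlin
// Number of frame intervals spanned by a network round trip, rounded up.
// A mask computed for frame N cannot be composited before frame N + lag,
// so it is drawn over a silhouette that has already moved.
fun maskLagFrames(roundTripMs: Int, fps: Int): Int {
    require(roundTripMs >= 0) { "roundTripMs must not be negative" }
    require(fps > 0) { "fps must be positive" }
    return (roundTripMs * fps + 999) / 1000   // integer ceil(roundTripMs * fps / 1000)
}
```

At 30 fps, a 40–80 ms round trip spans 2–3 frame intervals, and a degraded 150–300 ms round trip spans 5–9. On-device segmentation keeps the mask and the frame it was computed from together.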

iOS: MLKit + CoreImage or Vision

On iOS, use the Vision framework with VNGeneratePersonSegmentationRequest. Apple added it in iOS 15. It needs no explicit model loading and can run on the Neural Engine. Accuracy is good for the front camera, but contours become ragged around complex hairstyles and transparent clothing.

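import Vision
import CoreVideo

// Segmentation setup: create the request once and reuse it for every frame
let request = VNGeneratePersonSegmentationRequest()
request.qualityLevel = .balanced   // .accurate gives a better contour but is heavier
request.outputPixelFormat = kCVPixelFormatType_OneComponent8

// Call from the AVFoundation frame handler
func personMask(for pixelBuffer: CVPixelBuffer) -> CVPixelBuffer? {
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    do {
        try handler.perform([request])
    } catch {
        return nil   // skip this frame if segmentation fails
    }
    // The mask is an 8-bit single-channel CVPixelBuffer; apply it via CIBlendWithMask
    return request.results?.first?.pixelBuffer
}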

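Render CIBlendWithMask through a Metal-backed CIContext created once with options [.workingColorSpace: NSNull()]. Passing NSNull disables color management, so the composite is rendered without a color space conversion. Without this, each frame spends about 5 ms on conversion alone.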

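For better segmentation, convert DeepLab v3 or MediaPipe SelfieSegmentation to Core ML via coremltools and load the result via MLModel. Note that coremltools converts TensorFlow and PyTorch models, so a model available only as TFLite has to be brought back to one of those formats first. MediaPipe gives stable contours even with soft edges.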

Android: MLKit Selfie Segmentation

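import androidx.annotation.OptIn
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.segmentation.Segmentation
import com.google.mlkit.vision.segmentation.selfie.SelfieSegmenterOptions

val segmenter = Segmentation.getClient(
    SelfieSegmenterOptions.Builder()
        .setDetectorMode(SelfieSegmenterOptions.STREAM_MODE)  // optimized for video
        .enableRawSizeMask()  // mask at model output size, not input image size
        .build()
)

// In the CameraX ImageAnalysis.Analyzer implementation
@OptIn(ExperimentalGetImage::class)
override fun analyze(imageProxy: ImageProxy) {
    // No media image available: release the frame and skip it
    val mediaImage = imageProxy.image ?: run { imageProxy.close(); return }
    val inputImage = InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)
    segmenter.process(inputImage)
        .addOnSuccessListener { segmentationMask ->
            // ByteBuffer of float32 confidences, one per mask pixel
            val mask = segmentationMask.buffer
            // applyBackground is the app's own compositing step (for example a GPU shader)
            applyBackground(mask, imageProxy)
        }
        .addOnCompleteListener { imageProxy.close() }  // always release the frame
}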

STREAM_MODE is critical: it reuses results from previous frames and runs faster than SINGLE_IMAGE_MODE. On a Pixel 6 (Google Tensor), inference takes 8–12 ms. On budget phones (Snapdragon 695) it takes 20–28 ms. For mask post-processing, use a Vulkan compute shader or an OpenGL ES shader; RenderScript is deprecated as of API 31. RenderEffect (Android 12+) is a separate API that covers effects such as blur.
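Before any post-processing, the mask has to be read out of the buffer. ML Kit documents the selfie segmentation mask as a ByteBuffer of float32 confidences, one per pixel. The sketch below shows one way to copy it into a FloatArray on the CPU; readConfidenceMask is a hypothetical helper name, and a production pipeline would more likely upload the buffer straight to a GPU texture.

```kotlin
import java.nio.ByteBuffer

// Copies a float32 confidence mask (one value per pixel, row-major) into a
// FloatArray and clamps every value to [0, 1]. The source buffer's position
// is left untouched because the read goes through a duplicate.
fun readConfidenceMask(buffer: ByteBuffer, width: Int, height: Int): FloatArray {
    // duplicate() resets the byte order to big-endian, so restore the original order
    val view = buffer.duplicate().order(buffer.order())
    view.rewind()
    val floats = view.asFloatBuffer()
    require(floats.remaining() >= width * height) { "mask buffer too small" }
    val out = FloatArray(width * height)
    floats.get(out)
    for (i in out.indices) out[i] = out[i].coerceIn(0f, 1f)
    return out
}
```

In the analyzer above this would be called with segmentationMask.buffer, segmentationMask.width and segmentationMask.height.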

Applying Backgrounds: Three Variants

Static image: the simplest case. Use CIBlendWithMask on iOS and PorterDuff compositing on Android.
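Whichever platform API does the work, the operation behind static-image replacement is a per-pixel linear blend weighted by the mask. The following CPU sketch illustrates that math only; the function names are hypothetical, pixels are assumed to be packed ARGB_8888 ints, and real-time code would run the same formula on the GPU.

```kotlin
// Blends one 8-bit channel: alpha = 1 keeps the camera value, alpha = 0 shows the background.
fun mixChannel(fg: Int, bg: Int, alpha: Float): Int =
    (fg * alpha + bg * (1f - alpha) + 0.5f).toInt().coerceIn(0, 255)

// Composites a camera frame over a background image using a per-pixel
// person-confidence mask. All three arrays must have the same length.
fun compositeOverBackground(camera: IntArray, background: IntArray, mask: FloatArray): IntArray {
    require(camera.size == background.size && camera.size == mask.size) { "size mismatch" }
    return IntArray(camera.size) { i ->
        val a = mask[i].coerceIn(0f, 1f)
        val f = camera[i]
        val b = background[i]
        val red = mixChannel((f shr 16) and 0xFF, (b shr 16) and 0xFF, a)
        val green = mixChannel((f shr 8) and 0xFF, (b shr 8) and 0xFF, a)
        val blue = mixChannel(f and 0xFF, b and 0xFF, a)
        (0xFF shl 24) or (red shl 16) or (green shl 8) or blue   // output is fully opaque
    }
}
```

Mask values between 0 and 1 along the contour are what produce a soft edge instead of a hard cut-out.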

Blur: apply CIGaussianBlur with a radius of 12–20 to the original frame, then use the mask to select between the original and the blurred version. On Android, use RenderEffect.createBlurEffect (API 31+) or a custom blur implemented in Vulkan.
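The blur variant follows the same select-by-mask idea. The sketch below illustrates it on a single-channel plane and deliberately swaps in a separable box blur for the Gaussian, since it is shorter to show; all function names are hypothetical, and on a device the blur would come from CIGaussianBlur or RenderEffect rather than CPU loops.

```kotlin
// One horizontal box-blur pass over a row-major single-channel plane.
// Windows are truncated at the edges, so a constant plane stays constant.
fun boxBlurHorizontal(src: FloatArray, width: Int, height: Int, radius: Int): FloatArray {
    require(src.size == width * height && radius >= 0)
    val out = FloatArray(src.size)
    for (y in 0 until height) {
        for (x in 0 until width) {
            val from = maxOf(0, x - radius)
            val to = minOf(width - 1, x + radius)
            var sum = 0f
            for (k in from..to) sum += src[y * width + k]
            out[y * width + x] = sum / (to - from + 1)
        }
    }
    return out
}

// Transposes a width x height plane so the horizontal pass can blur columns.
fun transpose(src: FloatArray, width: Int, height: Int): FloatArray =
    FloatArray(src.size) { i -> src[(i % height) * width + (i / height)] }

// Separable box blur: rows first, then columns.
fun boxBlur(src: FloatArray, width: Int, height: Int, radius: Int): FloatArray {
    val rows = boxBlurHorizontal(src, width, height, radius)
    val cols = boxBlurHorizontal(transpose(rows, width, height), height, width, radius)
    return transpose(cols, height, width)
}

// Keeps the original value where the mask says "person" and the blurred one elsewhere.
fun blurBackground(plane: FloatArray, mask: FloatArray, width: Int, height: Int, radius: Int): FloatArray {
    require(plane.size == mask.size)
    val blurred = boxBlur(plane, width, height, radius)
    return FloatArray(plane.size) { i ->
        val a = mask[i].coerceIn(0f, 1f)
        plane[i] * a + blurred[i] * (1f - a)
    }
}
```

Blurring the whole frame once and selecting afterwards is simpler than blurring only the background region, at the cost of some foreground color bleeding into the blurred area near the contour.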

Video background: this needs a decoder synchronized with call timing. On iOS, use AVPlayerItemVideoOutput with a Metal texture. It is heavy on memory: video buffer + camera buffer + mask buffer + result. An iPhone 12 with 4 GB of RAM copes; an iPhone SE 2nd gen (3 GB) needs aggressive buffer reuse.
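Buffer reuse usually means a fixed pool allocated up front. The sketch below shows the idea in plain Kotlin with byte arrays. FrameBufferPool is a hypothetical class; on iOS the equivalent role is played by CVPixelBufferPool, and on Android by reusable textures or hardware buffers.

```kotlin
// Fixed-capacity pool that hands out preallocated buffers instead of
// allocating a new one for every frame.
class FrameBufferPool(private val bufferSize: Int, capacity: Int) {
    private val free = ArrayDeque<ByteArray>(capacity)

    // Total number of buffers ever allocated; stays equal to capacity.
    var allocations = 0
        private set

    init {
        repeat(capacity) {
            allocations++
            free.addLast(ByteArray(bufferSize))
        }
    }

    // Returns a pooled buffer, or null when every buffer is in flight,
    // in which case the caller should drop the frame rather than allocate.
    @Synchronized
    fun acquire(): ByteArray? = free.removeFirstOrNull()

    // Hands a buffer back to the pool once the frame has been encoded.
    @Synchronized
    fun release(buffer: ByteArray) {
        require(buffer.size == bufferSize) { "buffer does not belong to this pool" }
        free.addLast(buffer)
    }
}
```

For scale, one 1280×720 frame in a 4:2:0 format takes 1280 × 720 × 1.5 = 1,382,400 bytes, so the pool capacity directly bounds the memory held by the pipeline.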

Integration in WebRTC Pipeline

Most mobile calling solutions use WebRTC, via LiveKit, Daily.co, Agora, or native WebRTC. All of them provide a custom VideoSource/VideoProcessor mechanism for replacing frames before encoding.

In LiveKit SDK for iOS, this is the VideoProcessor protocol:

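class BackgroundReplacementProcessor: VideoProcessor {
    func process(frame: RTCVideoFrame) -> RTCVideoFrame? {
        // Segmentation + background application go here.
        // Return a new RTCVideoFrame wrapping the processed buffer;
        // returning the input frame passes it through unchanged.
        return frame
    }
}
// The exact attachment point and frame type depend on the SDK version
room.localParticipant?.videoTracks.first?.processor = BackgroundReplacementProcessor()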

Important: RTCVideoFrame wraps a CVPixelBuffer in the kCVPixelFormatType_420YpCbCr8BiPlanarFullRange format. Converting to RGB for ML inference and back again is lossy. If the model accepts YUV, keep the format untouched.
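The loss is easy to demonstrate with integer pixel values. The sketch below converts a single pixel from YCbCr to RGB and back, assuming full-range BT.601 (JFIF) coefficients. That matrix is an assumption for illustration, since the actual matrix depends on how the camera tags its buffers, and the type and function names are hypothetical.

```kotlin
import kotlin.math.roundToInt

data class Rgb(val r: Int, val g: Int, val b: Int)
data class Yuv(val y: Int, val u: Int, val v: Int)

// Rounds to the nearest integer and clamps to the 8-bit range.
fun clamp8(x: Double): Int = x.roundToInt().coerceIn(0, 255)

// Full-range BT.601 YCbCr -> RGB (assumed matrix, see lead-in).
fun yuvToRgb(p: Yuv): Rgb {
    val cb = p.u - 128.0
    val cr = p.v - 128.0
    return Rgb(
        clamp8(p.y + 1.402 * cr),
        clamp8(p.y - 0.344136 * cb - 0.714136 * cr),
        clamp8(p.y + 1.772 * cb)
    )
}

// Full-range BT.601 RGB -> YCbCr, the inverse of the matrix above.
fun rgbToYuv(p: Rgb): Yuv = Yuv(
    clamp8(0.299 * p.r + 0.587 * p.g + 0.114 * p.b),
    clamp8(128.0 - 0.168736 * p.r - 0.331264 * p.g + 0.5 * p.b),
    clamp8(128.0 + 0.5 * p.r - 0.418688 * p.g - 0.081312 * p.b)
)

// YUV -> RGB -> YUV, as happens when a frame is converted for inference and back.
fun roundTrip(p: Yuv): Yuv = rgbToYuv(yuvToRgb(p))
```

Neutral gray survives the round trip, but YUV values that map outside the RGB gamut are clamped and do not come back unchanged. Rounding at each step can also shift individual values. Running the mask inference on the Y plane, or on a separate downscaled copy, leaves the encoder's input untouched.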

Assessment and Process

Start with an audit of the existing WebRTC stack: which SDK is used, how the frame pipeline is organized, and which devices are targeted. Then build a prototype with Vision or ML Kit and measure it on real devices from the minimum-requirements list.
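When measuring on devices, averages hide the slow frames that cause visible stutter, so a high percentile is a more useful pass/fail criterion. The sketch below is one possible way to evaluate collected timings. The helper names and the choice of the 95th percentile are assumptions, not a fixed methodology.

```kotlin
// Nearest-rank percentile (1..100) of per-frame timings in milliseconds.
fun percentileMs(samplesMs: List<Double>, percentile: Int): Double {
    require(samplesMs.isNotEmpty()) { "no samples" }
    require(percentile in 1..100) { "percentile must be in 1..100" }
    val sorted = samplesMs.sorted()
    val rank = (percentile * sorted.size + 99) / 100   // integer ceil, always >= 1
    return sorted[rank - 1]
}

// True when the chosen percentile of measured frame times fits into
// one frame interval at the target frame rate.
fun meetsBudget(samplesMs: List<Double>, fps: Int, percentile: Int = 95): Boolean {
    require(fps > 0) { "fps must be positive" }
    return percentileMs(samplesMs, percentile) <= 1000.0 / fps
}
```

Collect a few hundred frame timings per device after a warm-up period, since the first inferences are typically slower and thermal throttling appears only after sustained load.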

Critical steps: tune the segmentation model for the quality/speed trade-off, optimize mask post-processing (contour antialiasing, edge feathering), and test edge cases such as uneven lighting, complex backgrounds, and fast motion.
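Two common post-processing steps can be sketched in a few lines: feathering the contour with a smoothstep ramp, and damping frame-to-frame jitter with an exponential moving average. This is a CPU illustration under assumed parameters. The thresholds 0.3 and 0.7 and the weight 0.6 are placeholders to tune per model, and the names are hypothetical.

```kotlin
// Smoothstep feathering: confidences below `low` become 0, above `high`
// become 1, with a smooth ramp in between that softens the contour.
fun featherMask(mask: FloatArray, low: Float = 0.3f, high: Float = 0.7f): FloatArray {
    require(low < high) { "low must be below high" }
    return FloatArray(mask.size) { i ->
        val t = ((mask[i] - low) / (high - low)).coerceIn(0f, 1f)
        t * t * (3f - 2f * t)
    }
}

// Exponential moving average across frames. A higher weight follows the
// current frame more closely; a lower one is steadier but trails fast motion.
class TemporalMaskSmoother(private val weight: Float = 0.6f) {
    private var previous: FloatArray? = null

    init {
        require(weight > 0f && weight <= 1f) { "weight must be in (0, 1]" }
    }

    fun smooth(current: FloatArray): FloatArray {
        val prev = previous
        val out = if (prev == null || prev.size != current.size) {
            current.copyOf()   // first frame, or the mask resolution changed
        } else {
            FloatArray(current.size) { i -> weight * current[i] + (1f - weight) * prev[i] }
        }
        previous = out
        return out
    }
}
```

Temporal smoothing trades jitter for lag, so it is worth checking the fast-motion test case again after enabling it.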

Timeline Estimates

A basic implementation with a blurred background on one platform takes 2–3 weeks. A full implementation that supports static images and video backgrounds on both platforms, integrated into an existing WebRTC stack, requires 5–8 weeks.