Implementing AI Face Swap in a Mobile App
Face swap is technically more complex than a simple style overlay. The app has to detect the face in the source image, detect the face in the target image, align the two by facial landmarks, and blend the result without edge artifacts. On mobile, this adds memory constraints and the need to finish in a reasonable time.
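The "align landmarks" step is commonly done with a least-squares similarity transform (uniform scale, rotation, translation) that maps source landmarks onto target landmarks. Below is a minimal pure-Kotlin sketch, assuming the landmarks are already matched pairs (for example eye centers, nose tip and mouth corners from a face detector). The `Point`, `Similarity` and `estimateSimilarity` names are illustrative, not a library API, and reflections are not handled.

```kotlin
import kotlin.math.atan2
import kotlin.math.hypot

// Illustrative 2D point; in an app these would come from the detector's landmarks.
data class Point(val x: Double, val y: Double)

// Similarity transform: x' = c*x - d*y + tx, y' = d*x + c*y + ty,
// where c = s*cos(theta) and d = s*sin(theta).
data class Similarity(val c: Double, val d: Double, val tx: Double, val ty: Double) {
    val scale: Double get() = hypot(c, d)
    val rotationRad: Double get() = atan2(d, c)
    fun transform(p: Point): Point = Point(c * p.x - d * p.y + tx, d * p.x + c * p.y + ty)
}

// Closed-form least-squares fit of dst ≈ T(src) over matched landmark pairs.
fun estimateSimilarity(src: List<Point>, dst: List<Point>): Similarity {
    require(src.size == dst.size && src.size >= 2) { "need at least 2 matched landmark pairs" }
    val n = src.size
    val sCx = src.sumOf { it.x } / n
    val sCy = src.sumOf { it.y } / n
    val dCx = dst.sumOf { it.x } / n
    val dCy = dst.sumOf { it.y } / n
    var a = 0.0
    var b = 0.0
    var norm = 0.0
    for (i in 0 until n) {
        // Work with centroid-relative coordinates so translation drops out of the fit.
        val px = src[i].x - sCx; val py = src[i].y - sCy
        val qx = dst[i].x - dCx; val qy = dst[i].y - dCy
        a += px * qx + py * qy   // sums to c * norm at the optimum
        b += px * qy - py * qx   // sums to d * norm at the optimum
        norm += px * px + py * py
    }
    require(norm > 0.0) { "source landmarks must not all coincide" }
    val c = a / norm
    val d = b / norm
    // Translation maps the source centroid onto the destination centroid.
    return Similarity(c, d, dCx - (c * sCx - d * sCy), dCy - (d * sCx + c * sCy))
}
```

The resulting transform can be used to warp the source face into the target's pose before inference or blending; face-swap models usually perform their own alignment internally, so this is a preprocessing aid rather than a replacement.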
Why on-device face swap is rare in 2024
Models such as SimSwap, FaceShifter and GHOST ship with roughly 100–300 MB of weights and require a GPU. TFLite ports exist, but with noticeable quality loss. MediaPipe Face Mesh provides 468 landmarks in real time, which is a good basis for alignment, but the swap itself still needs neural network inference.
A practical on-device implementation is possible with Core ML on an iPhone 14 Pro or newer using a distilled model, at 1–3 seconds per frame. On Android the equivalent is TFLite with the GPU delegate, but behavior varies between devices (for example, an Adreno 730 versus the PowerVR GPUs in budget phones).
As a result, most production apps perform face swap on a server through an API: InsightFace (open source, self-hosted), Akool, or DeepFaceLab API. The Akool face_swap endpoint accepts a source image and a target image and returns the result in 5–15 seconds.
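With 5–15 seconds per request, the client is better off treating the swap as a long-running job with a hard deadline than as one blocking call. The sketch below assumes a hypothetical submit-then-poll backend; `SwapStatus`, `awaitSwapResult` and the timing defaults are illustrative and are not Akool's or any other vendor's actual API. The clock and sleep functions are injected so the logic can be tested without waiting.

```kotlin
// Hypothetical job states for a face-swap backend; not any vendor's real schema.
sealed class SwapStatus {
    object Pending : SwapStatus()
    data class Done(val resultUrl: String) : SwapStatus()
    data class Failed(val reason: String) : SwapStatus()
}

class SwapTimeoutException(message: String) : Exception(message)

// Polls fetchStatus with exponential backoff until the job finishes or the deadline passes.
// Blocking: call it from a background thread, never from the Android main thread.
fun awaitSwapResult(
    fetchStatus: () -> SwapStatus,
    timeoutMs: Long = 30_000,
    initialDelayMs: Long = 1_000,
    maxDelayMs: Long = 4_000,
    nowMs: () -> Long = { System.currentTimeMillis() },
    sleep: (Long) -> Unit = { Thread.sleep(it) }
): String {
    val deadline = nowMs() + timeoutMs
    var delay = initialDelayMs
    while (true) {
        when (val status = fetchStatus()) {
            is SwapStatus.Done -> return status.resultUrl
            is SwapStatus.Failed -> throw IllegalStateException("Swap failed: ${status.reason}")
            SwapStatus.Pending -> Unit // keep waiting
        }
        // Give up before sleeping past the deadline.
        if (nowMs() + delay > deadline) throw SwapTimeoutException("No result after $timeoutMs ms")
        sleep(delay)
        delay = (delay * 2).coerceAtMost(maxDelayMs)
    }
}
```

The 30-second default leaves headroom above the upper end of the quoted 5–15 second range; the right value depends on the backend and should be tuned from real latency data.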
Client processing pipeline
Before sending images to the server, the client should detect and crop the face:
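```kotlin
// Android: detect and crop the face before sending it to the server
import android.content.Context
import android.graphics.Bitmap
import android.graphics.Rect
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.face.FaceDetection
import com.google.mlkit.vision.face.FaceDetectorOptions
import kotlinx.coroutines.tasks.await // from kotlinx-coroutines-play-services

// App-defined result and error types (names are illustrative)
data class FaceExtractionResult(val face: Bitmap, val headEulerAngleY: Float)

sealed class FaceSwapError : Exception() {
    object NoFaceDetected : FaceSwapError()
    object MultipleFaces : FaceSwapError()
}

class FacePreprocessor(private val context: Context) {
    private val detector = FaceDetection.getClient(
        FaceDetectorOptions.Builder()
            .setPerformanceMode(FaceDetectorOptions.PERFORMANCE_MODE_ACCURATE)
            .setLandmarkMode(FaceDetectorOptions.LANDMARK_MODE_ALL)
            .build()
    )

    suspend fun extractFace(bitmap: Bitmap): FaceExtractionResult {
        val image = InputImage.fromBitmap(bitmap, 0) // 0 = no rotation
        val faces = detector.process(image).await()
        if (faces.isEmpty()) throw FaceSwapError.NoFaceDetected
        if (faces.size > 1) throw FaceSwapError.MultipleFaces

        val face = faces.first()
        // Expand bounds by 40% for more context (hair, jaw, neck), clamped to the bitmap
        val b = expandRect(face.boundingBox, 0.4f, bitmap.width, bitmap.height)
        val croppedBitmap = Bitmap.createBitmap(bitmap, b.left, b.top, b.width(), b.height())
        // headEulerAngleY is the head yaw in degrees
        return FaceExtractionResult(croppedBitmap, face.headEulerAngleY)
    }

    // Grows the rect by `factor` of its size (half on each side) and clamps it to the image
    private fun expandRect(r: Rect, factor: Float, maxW: Int, maxH: Int): Rect {
        val dx = (r.width() * factor / 2).toInt()
        val dy = (r.height() * factor / 2).toInt()
        return Rect(
            (r.left - dx).coerceAtLeast(0), (r.top - dy).coerceAtLeast(0),
            (r.right + dx).coerceAtMost(maxW), (r.bottom + dy).coerceAtMost(maxH)
        )
    }
}
```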
The head rotation angle (headEulerAngleY, the yaw) matters: beyond about 30° from frontal, swap quality drops sharply, so the app should warn the user.
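A minimal pose gate for that warning might look like the sketch below. It takes the yaw in degrees as ML Kit reports it (the sign only encodes turn direction); the `PoseQuality` enum and `checkHeadPose` function are illustrative names, and the 30° default simply mirrors the rule of thumb above.

```kotlin
import kotlin.math.abs

// Illustrative result type for the pose check.
enum class PoseQuality { OK, WARN_TURNED_AWAY }

// Warns when the absolute yaw exceeds the threshold; left and right turns are treated alike.
// The default threshold is an assumption to tune against the swap model actually used.
fun checkHeadPose(headEulerAngleY: Float, maxYawDegrees: Float = 30f): PoseQuality =
    if (abs(headEulerAngleY) > maxYawDegrees) PoseQuality.WARN_TURNED_AWAY else PoseQuality.OK
```

Usage: `checkHeadPose(result.headEulerAngleY)` right after face extraction, showing a "face the camera" hint instead of failing hard, since some users may still accept a lower-quality result.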
Blending and post-processing
Even a good face swap leaves artifacts along the edge of the face mask. These can be smoothed on the client:
iOS: apply CIFilter(name: "CIGaussianBlur") to the face mask, then CIBlendWithMask for a smooth transition. Metal Performance Shaders cover more complex processing.
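Conceptually, that Core Image chain feathers the mask and then alpha-composites the swapped face over the target. The platform-neutral Kotlin sketch below shows the same idea on plain float buffers, one channel at a time, assuming a 0/1 face mask aligned with both images. Function names are illustrative, and this mirrors what CIGaussianBlur plus CIBlendWithMask do, not how Core Image implements them.

```kotlin
import kotlin.math.exp

// Separable Gaussian blur of a single-channel mask stored row-major (values 0..1).
// Blurring a hard 0/1 face mask turns its edge into a soft ramp (feathering).
fun gaussianBlurMask(mask: FloatArray, width: Int, height: Int, sigma: Float): FloatArray {
    require(mask.size == width * height && sigma > 0f)
    val radius = (3 * sigma).toInt().coerceAtLeast(1) // truncate the kernel at 3 sigma
    val kernel = FloatArray(2 * radius + 1) { i ->
        val x = (i - radius).toFloat()
        exp(-x * x / (2 * sigma * sigma))
    }
    val total = kernel.sum()
    for (i in kernel.indices) kernel[i] /= total // normalize so a flat mask stays flat

    fun pass(src: FloatArray, horizontal: Boolean): FloatArray {
        val out = FloatArray(src.size)
        for (y in 0 until height) {
            for (x in 0 until width) {
                var acc = 0f
                for (k in -radius..radius) {
                    // Clamp-to-edge sampling so the image border is not darkened.
                    val sx = if (horizontal) (x + k).coerceIn(0, width - 1) else x
                    val sy = if (horizontal) y else (y + k).coerceIn(0, height - 1)
                    acc += kernel[k + radius] * src[sy * width + sx]
                }
                out[y * width + x] = acc
            }
        }
        return out
    }
    return pass(pass(mask, horizontal = true), horizontal = false)
}

// Per-channel alpha composite: alpha = 1 takes the swapped face, alpha = 0 keeps the target.
fun blendWithMask(swapped: FloatArray, target: FloatArray, alpha: FloatArray): FloatArray {
    require(swapped.size == target.size && target.size == alpha.size)
    return FloatArray(target.size) { i -> alpha[i] * swapped[i] + (1f - alpha[i]) * target[i] }
}
```

Sigma controls the width of the transition band: larger values hide seams better but let more of the target face show through near the jawline.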
The skin color of the swapped face may not match the neck or background; this is corrected with color transfer in the LAB color space. On iOS, a CIColorCube filter can apply the resulting color mapping as a lookup table without OpenCV.
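LAB color transfer is often done in the style of Reinhard et al.: convert both regions to CIE L*a*b*, then shift and scale each channel of the face so its mean and standard deviation match a reference region such as the target's neck or cheek. The Kotlin sketch below assumes sRGB pixels in 0..1 and a D65 white point; the types and function names are illustrative. Baking the mapping into a CIColorCube lookup table on iOS is not shown.

```kotlin
import kotlin.math.cbrt
import kotlin.math.pow
import kotlin.math.sqrt

data class Rgb(val r: Double, val g: Double, val b: Double) // sRGB, 0..1
data class Lab(val l: Double, val a: Double, val b: Double) // CIE L*a*b*, D65

val LAB_DELTA = 6.0 / 29.0

fun toLinear(c: Double) = if (c <= 0.04045) c / 12.92 else ((c + 0.055) / 1.055).pow(2.4)
fun toGamma(c: Double): Double {
    val x = c.coerceIn(0.0, 1.0) // clip out-of-gamut values before gamma encoding
    return if (x <= 0.0031308) 12.92 * x else 1.055 * x.pow(1 / 2.4) - 0.055
}
fun labF(t: Double) = if (t > LAB_DELTA.pow(3)) cbrt(t) else t / (3 * LAB_DELTA * LAB_DELTA) + 4.0 / 29.0
fun labFInv(t: Double) = if (t > LAB_DELTA) t * t * t else 3 * LAB_DELTA * LAB_DELTA * (t - 4.0 / 29.0)

fun rgbToLab(c: Rgb): Lab {
    val r = toLinear(c.r); val g = toLinear(c.g); val b = toLinear(c.b)
    // Linear sRGB -> XYZ, normalized by the D65 white point.
    val x = (0.4124564 * r + 0.3575761 * g + 0.1804375 * b) / 0.95047
    val y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    val z = (0.0193339 * r + 0.1191920 * g + 0.9503041 * b) / 1.08883
    return Lab(116 * labF(y) - 16, 500 * (labF(x) - labF(y)), 200 * (labF(y) - labF(z)))
}

fun labToRgb(c: Lab): Rgb {
    val fy = (c.l + 16) / 116
    val x = 0.95047 * labFInv(fy + c.a / 500)
    val y = labFInv(fy)
    val z = 1.08883 * labFInv(fy - c.b / 200)
    return Rgb(
        toGamma(3.2404542 * x - 1.5371385 * y - 0.4985314 * z),
        toGamma(-0.9692660 * x + 1.8760108 * y + 0.0415560 * z),
        toGamma(0.0556434 * x - 0.2040259 * y + 1.0572252 * z)
    )
}

// Matches per-channel mean and standard deviation of `pixels` to `reference` in Lab space.
fun transferColorLab(pixels: List<Rgb>, reference: List<Rgb>): List<Rgb> {
    require(pixels.isNotEmpty() && reference.isNotEmpty())
    val src = pixels.map(::rgbToLab)
    val ref = reference.map(::rgbToLab)
    fun stats(v: List<Double>): Pair<Double, Double> {
        val mean = v.average()
        return mean to sqrt(v.sumOf { (it - mean) * (it - mean) } / v.size)
    }
    val channels = listOf<(Lab) -> Double>({ it.l }, { it.a }, { it.b })
    val srcStats = channels.map { ch -> stats(src.map(ch)) }
    val refStats = channels.map { ch -> stats(ref.map(ch)) }
    fun remap(value: Double, i: Int): Double {
        val (ms, ss) = srcStats[i]
        val (mr, sr) = refStats[i]
        // A flat source channel (zero variance) can only be shifted, not scaled.
        return if (ss < 1e-9) value - ms + mr else (value - ms) * (sr / ss) + mr
    }
    return src.map { p -> labToRgb(Lab(remap(p.l, 0), remap(p.a, 1), remap(p.b, 2))) }
}
```

Working in Lab rather than RGB keeps lightness and chroma corrections largely independent, which is why this approach tends to avoid the gray or oversaturated casts of per-channel RGB matching.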
Legal restrictions and moderation
Face swap is an area with strict App Store requirements (Guideline 1.1, Objectionable Content). Apple rejects apps that:
- let users insert other people's faces without their explicit consent
- lack a watermark or other AI-generated marking
- can be used to create deepfakes
The mandatory minimum: a watermark on the result, an explicit disclaimer during onboarding, Terms of Service that prohibit using real people's faces without consent, and a content reporting system.
Google Play has similar requirements; see the Policy Center, Sensitive Events section.
Server-side content moderation: before generation, run the input photos through Amazon Rekognition DetectModerationLabels or Google Cloud Vision SafeSearch. If an input photo is flagged, reject it on the backend and do not proceed to generation.
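The reject-before-generation rule can be expressed as a small, provider-neutral decision function on the backend. The sketch assumes moderation results have already been mapped to a name, an optional parent category and a 0–100 confidence (the shape Rekognition's moderation labels use); `ModerationLabel`, `ModerationDecision`, `moderate`, the category names and the 80.0 threshold are all illustrative assumptions, not part of either cloud API.

```kotlin
// Provider-neutral moderation hit (e.g. mapped from a Rekognition label's Name, ParentName, Confidence).
data class ModerationLabel(val name: String, val confidence: Double, val parent: String? = null)

sealed class ModerationDecision {
    object Allow : ModerationDecision()
    data class Reject(val reasons: List<String>) : ModerationDecision()
}

// Rejects if any label in a blocked category (or under a blocked parent) meets the threshold.
// Category names and the default threshold are assumptions; use the taxonomy of the API version you call.
fun moderate(
    labels: List<ModerationLabel>,
    blockedCategories: Set<String>,
    minConfidence: Double = 80.0
): ModerationDecision {
    val hits = labels.filter { label ->
        val parent = label.parent
        label.confidence >= minConfidence &&
            (label.name in blockedCategories || (parent != null && parent in blockedCategories))
    }
    return if (hits.isEmpty()) ModerationDecision.Allow else ModerationDecision.Reject(hits.map { it.name })
}
```

Run this for every input photo and return an error to the client on `Reject`, so the swap model is never invoked on flagged content.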
Storage and deletion
Face swap results should not be stored longer than needed to deliver them to the client. Standard practice is a 24–48 hour TTL followed by automatic deletion from S3/GCS. Input photos should be deleted immediately after processing.
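Bucket lifecycle rules on S3 and GCS express expiration in whole days, so a periodic sweep job is one way to enforce a per-object TTL and immediate input deletion. The sketch below only decides which keys to delete; `StoredObject`, `ObjectKind` and `keysToDelete` are hypothetical names, not S3 or GCS SDK types, and the actual delete calls are left to whichever storage client the backend uses.

```kotlin
import java.time.Duration
import java.time.Instant

enum class ObjectKind { INPUT, RESULT }

// Hypothetical metadata record kept by the backend for each stored file.
data class StoredObject(
    val key: String,
    val kind: ObjectKind,
    val createdAt: Instant,
    val processed: Boolean = false
)

// Inputs are deleted as soon as processing has finished; results once they reach the TTL.
fun keysToDelete(
    objects: List<StoredObject>,
    now: Instant,
    resultTtl: Duration = Duration.ofHours(24)
): List<String> =
    objects.filter { o ->
        when (o.kind) {
            ObjectKind.INPUT -> o.processed
            ObjectKind.RESULT -> Duration.between(o.createdAt, now) >= resultTtl
        }
    }.map { it.key }
```

Keeping bucket lifecycle rules as a backstop (for example, a 2-day expiration) still makes sense in case the sweep job stops running.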
Timeline
Basic API integration with face detection and result display takes 4–6 days. With post-processing blending, content moderation, watermarking and store policy compliance, it takes 3–4 weeks. Cost is calculated individually.