How does two-stage NSFW detection work?

A lightweight on-device model (CoreML/TFLite) performs fast pre-classification. If confidence is high, the content is blocked locally. Borderline cases are sent to server verification (Google Cloud Vision SafeSearch or AWS Rekognition) for precise analysis.

Which on-device models do you use?

We typically use MobileNetV3-Small or specialized NSFW models (8–15 MB). These provide sufficient accuracy with minimal impact on device performance and battery.

How do you set the confidence threshold?

Thresholds are tuned empirically on a representative dataset. For unsafe content, we often use 0.92 for client-side blocking and 0.65–0.92 for server submission. Values depend on audience age and content policy.

What happens to borderline cases?

Images with confidence between 0.65 and 0.92 are uploaded in a hidden state to the server. Google Cloud Vision SafeSearch or AWS Rekognition returns categories (adult, violence, racy), and a decision is made to block or publish.

How is NSFW detection applied to video?

For video UGC, we sample frames at 1-second intervals using AVAssetImageGenerator (iOS) or MediaMetadataRetriever (Android). Frames are processed in parallel by the on-device model. If any frame exceeds the unsafe threshold, the entire video is flagged for review.

Two-Stage AI NSFW Detection for Mobile Apps

TRUETECH develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience publishing mobile applications to popular stores such as Google Play, App Store, Amazon, AppGallery, and others.

A user uploads a photo in a chat, the moderation server lags, and the recipient sees a blank screen for 2 seconds. Or a false block on a medical image triggers a wave of negative feedback. NSFW moderation has to satisfy two requirements at once: speed (content is displayed with minimal delay) and accuracy (false positives destroy trust). The two requirements conflict, and the right architecture is a compromise. In our experience, a two-stage scheme with an on-device pre-filter and cloud verification delivers the best balance. The on-device MobileNetV3-Small model processes an image in 50–100 ms, compared with 300–800 ms for a cloud request to AWS Rekognition, which makes it several times faster (roughly 3–16× across those ranges). Server cost savings with this approach reach 40%, or $2,000–$5,000 per month for a mid-sized app.

Common Pitfalls in NSFW Detection

Fully Server-Side Classification Without Pre-Filter

If every uploaded image hits an API service and waits for a response before display, latency grows under peak load and UX degrades. A single request to AWS Rekognition DetectModerationLabels takes 300–800 ms. For a chat with photos or a marketplace with fast uploads, this is unacceptable.

Naive On-Device Classification

Running a full NSFW model on every frame of a video call or every photo in a gallery heats up the device and drains the battery. An iPhone 12 with the Open NSFW model (~50 MB in CoreML) under continuous processing enters thermal throttling within 8–10 minutes.

Our Two-Stage Architecture Solution

We implement a two-stage pipeline: a lightweight on-device pre-filter and cloud verification for borderline cases. The on-device model (CoreML/TFLite) gives a fast verdict for clear-cut cases. Server verification (Google Cloud Vision SafeSearch or AWS Rekognition) analyzes questionable images with high accuracy. The result: instant display of safe content and a reliable final decision on disputed images.

How to Minimize False Positives

False positives are the main pain point. Medical images, artwork, or sports photos can be mistakenly flagged as NSFW. The solution is to fine-tune thresholds and whitelist allowed categories. For example, when integrating Google Cloud Vision SafeSearch, we do not block on the medical category even when it is rated VERY_LIKELY, while for racy we trigger at POSSIBLE. This reduces false blocks by 30–40% without losing sensitivity. Filtering itself is not optional: according to the App Store Review Guidelines (section 1.2, User-Generated Content), apps with UGC must include a method for filtering objectionable material.

Why On-Device Pre-Filtering Is Critical for UX

Without it, the user waits up to 800 ms for the server response, and in chats and social networks this ruins the feeling of instant feedback. A lightweight client model (8–15 MB) solves this: it runs in 50–100 ms. If the unsafe confidence exceeds 0.92, we block immediately without uploading. This saves battery and reduces server load.

Two-Stage Architecture Details

On-Device (CoreML / TFLite)

On the client, we run a lightweight binary model (~8–15 MB): MobileNetV3-Small or a specialized NSFW model converted with coremltools. Output: two classes (safe / unsafe) plus a confidence score.

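// iOS: CoreML inference before upload
import UIKit
import Vision

struct NSFWResult { let label: String; let confidence: Float }

// `nsfwModel` is assumed to be a VNCoreMLModel created once at app start.
// The completion receives nil when the image cannot be classified.
func checkImage(_ image: UIImage, completion: @escaping (NSFWResult?) -> Void) {
    guard let cgImage = image.cgImage else { completion(nil); return }
    let request = VNCoreMLRequest(model: nsfwModel) { request, error in
        guard error == nil else { return }  // failures are reported by perform(_:) below
        let top = (request.results as? [VNClassificationObservation])?.first
        let result = top.map { NSFWResult(label: $0.identifier, confidence: $0.confidence) }
        DispatchQueue.main.async { completion(result) }
    }
    // Let Vision scale the image to the model's input size (224×224 here).
    request.imageCropAndScaleOption = .centerCrop
    // perform(_:) is synchronous, so keep it off the main thread.
    DispatchQueue.global(qos: .userInitiated).async {
        do { try VNImageRequestHandler(cgImage: cgImage).perform([request]) }
        catch { DispatchQueue.main.async { completion(nil) } }
    }
}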

Thresholds for the unsafe class: confidence > 0.92 → block on client, no upload. Confidence between 0.65 and 0.92 → upload in hidden state, send to server verification. Confidence below 0.65 → treat as safe and display immediately.
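
The routing rule above can be written as one small pure function. This is a minimal sketch, not our production code: the names `Route` and `routeByScore` are hypothetical, the two constants are the example thresholds from the text, and publishing everything below the review band is an assumption that should follow your content policy.

```kotlin
// Example thresholds from the text; tune them per app and audience.
const val BLOCK_THRESHOLD = 0.92f
const val REVIEW_THRESHOLD = 0.65f

enum class Route { PUBLISH, SERVER_REVIEW, BLOCK_ON_DEVICE }

// `unsafeScore` is the on-device model's confidence for the unsafe class, in [0, 1].
fun routeByScore(unsafeScore: Float): Route {
    require(unsafeScore in 0f..1f) { "score must be in [0, 1], got $unsafeScore" }
    return when {
        unsafeScore > BLOCK_THRESHOLD -> Route.BLOCK_ON_DEVICE
        unsafeScore >= REVIEW_THRESHOLD -> Route.SERVER_REVIEW
        // Assumption: anything below the review band is shown immediately.
        else -> Route.PUBLISH
    }
}
```

Keeping the rule in one function makes the boundaries explicit (a score of exactly 0.92 goes to server review, not to a client-side block) and lets thresholds be changed without touching inference code.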

Criteria	On-Device (Pre-filter)	Server Verification
Speed	50–100 ms	300–800 ms
Accuracy	~85% on borderline cases	>95%
Battery impact	Low (model 8–15 MB)	None
Scenario	Primary filtering	Final decision on borderline images

Android: TFLite Task Library

On Android, we use ImageClassifier from TFLite Task Library — it manages the model lifecycle and Bitmap processing without manual buffer handling:

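import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.vision.classifier.ImageClassifier

// Create the classifier once and reuse it; loading the model is the expensive part.
val classifier = ImageClassifier.createFromFileAndOptions(
    context,
    "nsfw_lite.tflite",
    ImageClassifier.ImageClassifierOptions.builder()
        .setMaxResults(2)
        .setScoreThreshold(0.5f)  // categories scoring below 0.5 are not returned
        .build()
)

val tensorImage = TensorImage.fromBitmap(bitmap)
val results = classifier.classify(tensorImage)
// The "nsfw" label comes from the model's label file; a missing category means score < 0.5.
val nsfwScore = results.flatMap { it.categories }
    .firstOrNull { it.label == "nsfw" }?.score ?: 0f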

Server Verification via Google Cloud Vision / AWS Rekognition

For borderline cases and final checks before publication:

// send only borderline cases to server
if (nsfwScore in 0.65f..0.92f) {
    uploadForReview(imageUri, nsfwScore)
}

Google Cloud Vision SafeSearch returns 5 categories: adult, spoof, medical, violence, racy. Each is rated on a likelihood scale from VERY_UNLIKELY to VERY_LIKELY. This allows fine-grained policy: medical apps whitelist the medical category, while children's apps treat racy = POSSIBLE as a block trigger.
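
A per-app policy over these ratings might be sketched as follows. The types `SafeSearch` and `Policy` and the function `shouldBlock` are hypothetical and do not come from the Google client library; the default triggers (LIKELY for adult, violence, and racy) are assumptions for illustration only.

```kotlin
// Mirrors the SafeSearch likelihood scale; declaration order runs from least to most likely.
enum class Likelihood { UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY }

// Hypothetical holder for the five SafeSearch ratings of one image.
data class SafeSearch(
    val adult: Likelihood,
    val spoof: Likelihood,
    val medical: Likelihood,
    val violence: Likelihood,
    val racy: Likelihood,
)

// Minimum likelihood that triggers a block, or null to never block on that category.
data class Policy(
    val adult: Likelihood? = Likelihood.LIKELY,     // assumed default
    val violence: Likelihood? = Likelihood.LIKELY,  // assumed default
    val racy: Likelihood? = Likelihood.LIKELY,      // assumed default
    val medical: Likelihood? = null,                // whitelisted: never blocks
)

private fun hits(value: Likelihood, trigger: Likelihood?): Boolean =
    trigger != null && value >= trigger

fun shouldBlock(r: SafeSearch, p: Policy): Boolean =
    hits(r.adult, p.adult) || hits(r.violence, p.violence) ||
        hits(r.racy, p.racy) || hits(r.medical, p.medical)

// A stricter policy for a children's app: racy blocks from POSSIBLE upward.
val childrensAppPolicy = Policy(racy = Likelihood.POSSIBLE)
```

Because Kotlin enums compare by declaration order, `value >= trigger` reads directly as "at least this likely", and UNKNOWN never triggers a block unless a policy explicitly sets it as the trigger.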

Video: Frame-by-Frame Analysis with Sampling

For video UGC, we extract frames using AVAssetImageGenerator (iOS) at 1-second intervals, running the on-device model in parallel via DispatchQueue.concurrentPerform. On Android, we use MediaMetadataRetriever.getFrameAtTime() with coroutines on Dispatchers.Default. If any frame exceeds the unsafe threshold, the entire video is flagged for review.
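
The sampling and flagging logic is independent of the frame extractor and can be sketched in plain Kotlin. The function names are hypothetical, and the default unsafe threshold of 0.92 is an assumption carried over from the image pipeline. Timestamps are produced in microseconds, the unit that `MediaMetadataRetriever.getFrameAtTime()` expects.

```kotlin
// Timestamps (in microseconds) at which to grab frames: one per interval, starting at 0.
fun sampleTimestampsUs(durationMs: Long, intervalMs: Long = 1_000): List<Long> {
    require(durationMs >= 0 && intervalMs > 0) { "duration must be >= 0 and interval > 0" }
    return (0L until durationMs step intervalMs).map { it * 1_000 }
}

// The whole video is flagged as soon as any sampled frame exceeds the unsafe threshold.
fun shouldFlagVideo(frameScores: List<Float>, unsafeThreshold: Float = 0.92f): Boolean =
    frameScores.any { it > unsafeThreshold }
```

For a 3.5-second clip this yields four timestamps (0 s, 1 s, 2 s, 3 s). Since `any` stops at the first match, a sequential implementation can skip the remaining frames once one of them is flagged.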

Case Study: Reducing False Positives on a Health App

We worked with a telemedicine app that allowed users to share treatment photos. The initial server-only moderation falsely blocked 15% of legitimate medical images, causing user complaints. After integrating an on-device pre-filter with a whitelist for the medical category and tuning thresholds, false blocks dropped to 2%. The app now shows safe images instantly, with borderline cases reviewed by human moderators within 30 seconds. Server costs decreased by 35%, saving the app $3,000 per month. With over 7 years of experience in AI moderation and 50+ successful projects, our team guarantees high accuracy and minimal false positives.

Our Work Process

Analyze content policy: which categories to block, which need human review, whitelists for medicine/art.
Select and test on-device model on a representative app dataset.
Integrate two-stage logic into client + server verifier.
Tune thresholds considering audience (app age rating).
Document and train the moderation team.

Our team of 10+ engineers ensures quick turnaround and reliable delivery. Trusted by leading mobile apps, our solution adheres to industry best practices for data security.

What's Included in the Service

Requirements analysis for content moderation.
Selection and adaptation of on-device model (CoreML/TFLite).
Integration with server verifier (Google Cloud Vision or AWS Rekognition).
Testing on real data (minimize false positives).
Documentation, service access, team training.
Post-launch support (2 weeks of monitoring).

Our two-stage integration service starts at $4,000.

Timeline Estimates

Stage	Duration
On-device pre-filter with CoreML/TFLite	2–3 days
Full two-stage system with server verification	1–1.5 weeks
Testing and threshold tuning	3–5 days
Video processing integration	+2 days

Contact us to discuss details and get a consultation on AI moderation integration for your app. Request a threshold tuning assessment — we'll evaluate your project and propose an optimal architecture.

Machine Learning in Mobile Apps: CoreML, TFLite, and On-Device Models

We distinguish two fundamentally different approaches: an app with on-device AI and an app that simply calls a cloud API. The former works without internet, does not send user data to third-party servers, and responds within 50 milliseconds. The latter depends on network latency and pricing plans. Choosing the architecture is a key step that directly affects the cost, privacy, and user experience of machine learning in a mobile app. In our experience, on-device inference is cheaper in the long run in about 70% of projects because it eliminates server costs.

How to Choose Between CoreML and TFLite for On-Device Inference?

CoreML is Apple's native framework for running ML models on device. It supports the Neural Engine (available to third-party apps from A12 Bionic), the GPU, and the CPU as fallback. Models are converted to .mlmodel or .mlpackage format via coremltools from PyTorch or TensorFlow (ONNX input was supported only by older coremltools releases). Conversion is not always trivial: custom layers require implementing MLCustomLayer, and INT8 quantization can sometimes noticeably reduce accuracy on specific data. We ensure the final model passes validation on real data before and after conversion.

TensorFlow Lite is the cross-platform alternative for Android and Flutter. On Android it can use NNAPI (Neural Networks API) for hardware acceleration. NNAPI became more stable with Android 10; on earlier versions it is better to use the GPU delegate explicitly via GpuDelegate. A typical mistake: the model is trained on data normalized to the range [0, 1], but the app feeds raw [0, 255] values. Inference runs without any error but produces meaningless results. We include an automatic input data validation module in the SDK.
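
A guard against the [0, 1] versus [0, 255] mistake can be very small. The sketch below is an illustration of the idea rather than the validation module mentioned above; both function names are hypothetical, and it assumes the model expects float input in [0, 1].

```kotlin
// Throws if a model that expects [0, 1] input is handed values outside that range,
// for example raw [0, 255] channel values.
fun requireUnitRange(pixels: FloatArray) {
    val min = pixels.minOrNull() ?: return  // empty input: nothing to check
    val max = pixels.maxOrNull() ?: return
    require(min >= 0f && max <= 1f) {
        "expected input in [0, 1], got [$min, $max]; did you forget to divide by 255?"
    }
}

// Scales raw 8-bit channel values to [0, 1].
fun toUnitRange(raw: IntArray): FloatArray = FloatArray(raw.size) { raw[it] / 255f }
```

Calling `requireUnitRange` right before inference turns a silent accuracy failure into an immediate, descriptive exception. The check cannot catch the opposite mistake on very dark images, so it complements a test on known sample images and does not replace it.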

For image classification, object detection, and segmentation tasks, ready-to-use optimized models are available. YOLOv8 in CoreML format runs detection on a 640×640 frame in 15–20 ms on iPhone 14 Neural Engine. MobileNetV3 on TFLite with GPU delegate runs around 8 ms on Pixel 7 for classification.

Parameter	CoreML	TFLite
Platforms	iOS, macOS, watchOS	Android, iOS, Linux, embedded
Hardware acceleration	Neural Engine, GPU, CPU	NNAPI, GPU (OpenCL/OpenGL), CPU
Quantization support	FP16, INT8 (with coremltools)	FP16, INT8, dynamic range
Custom operations	Via MLCustomLayer (Swift)	Via custom operators (C/C++)
Model bundle size	~3–5 MB (MobileNetV2 quantized)	~2–4 MB

What If You Need Text Generation On-Device?

Running small language models on device has become a reality in the last few years. Apple Intelligence runs its own models on device and hands heavier requests to Private Cloud Compute, but third-party developers need other paths.

llama.cpp with the Metal backend on iOS is a working approach for phi-3-mini (3.8B parameters, 4-bit quantization, ~2.3 GB). Inference: 15–25 tokens/second on iPhone 15 Pro. For integration in Swift, use the Swift Package llama.swift or a wrapper over the C interface llama.h. The model file is not bundled with the app: it is downloaded on first launch and stored in Application Support. Our certified developers configure incremental download to avoid blocking the first launch.

On Android, the counterpart is the MediaPipe LLM Inference API (part of Google AI Edge), which supports Gemma-2B. It runs through the GPU delegate and reaches about 20 tokens/second on a Pixel 8 Pro (Tensor G3).

Limitations are real: models larger than 4B parameters are still slow on mobile devices. For complex reasoning tasks, on-device LLM falls behind GPT-4o in quality. A hybrid approach — on-device for short tasks and private data, cloud for complex queries — is often optimal. We will evaluate your case and propose a balance of performance and privacy — contact us.
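
The hybrid rule can be expressed as a small routing function. Everything in this sketch is hypothetical: the `LlmTask` fields, the `chooseBackend` name, and the 2,000-character cutoff are illustrative assumptions, and a real app would decide "complex" with its own heuristics.

```kotlin
enum class Backend { ON_DEVICE, CLOUD }

// Hypothetical description of one request to the language model.
data class LlmTask(
    val promptChars: Int,
    val containsPrivateData: Boolean,
    val needsComplexReasoning: Boolean,
)

// Private data and offline use always stay on device; long or complex tasks go to the cloud.
fun chooseBackend(task: LlmTask, online: Boolean, maxOnDeviceChars: Int = 2_000): Backend =
    when {
        task.containsPrivateData || !online -> Backend.ON_DEVICE
        task.needsComplexReasoning || task.promptChars > maxOnDeviceChars -> Backend.CLOUD
        else -> Backend.ON_DEVICE
    }
```

Checking privacy first means a complex task over private data still runs locally at lower quality, which is a deliberate trade-off in this sketch and not the only reasonable choice.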

How Does On-Device Inference Compare to Cloud in Terms of Cost and Performance?

On-device inference carries no per-request fee, whereas cloud APIs for image recognition bill per image. It also eliminates latency variability and privacy risks. The table below summarizes the trade-offs.

Criteria	On-Device Inference	Cloud API
Latency	<50 ms	200–500 ms (including network)
Cost per 1M requests	$0 (no server)	roughly $1,000–1,500 at list prices of about $1–1.50 per 1,000 images (AWS Rekognition, Google Vision)
Privacy	Data stays on device	Data sent to server
Offline	Yes	No
Scalability	No server scaling issues	Need to provision API capacity

For an app with 100k MAU running 10 image recognitions per user per month (1M requests in total), on-device inference can therefore save on the order of $1,000–1,500 monthly compared to a cloud API. Get a free consultation on your ML architecture today.
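
The monthly estimate is simple arithmetic and is worth recomputing with your own numbers. In the sketch below, the function name is hypothetical and the price per 1,000 requests is an input you must take from the provider's current price list; the $1 and $1.50 figures in the usage note are assumptions for illustration.

```kotlin
// Monthly bill for a cloud API priced per request; every input is an assumption to replace.
fun monthlyCloudCostUsd(mau: Long, requestsPerUser: Long, usdPer1kRequests: Double): Double =
    mau * requestsPerUser / 1_000.0 * usdPer1kRequests
```

With 100,000 MAU and 10 requests per user, `monthlyCloudCostUsd(100_000, 10, 1.0)` gives 1000.0 and `monthlyCloudCostUsd(100_000, 10, 1.5)` gives 1500.0. The on-device side has no per-request term, so this figure is the upper bound on what moving inference to the device can save in API fees.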

Integrating OpenAI API and Other Cloud Models

For scenarios where cloud inference is acceptable, integrating OpenAI, Anthropic, or Google Gemini comes down to an HTTP client plus streaming over server-sent events (SSE). In Swift, AsyncThrowingStream is convenient for streaming responses. In Kotlin, use Flow.
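
The parsing half of SSE streaming needs no networking code and can be sketched over a plain sequence of lines. The function name is hypothetical, and the `[DONE]` end marker is an assumption that matches OpenAI-style streams; other providers signal the end of a stream differently.

```kotlin
// Extracts payloads from SSE `data:` lines and stops at the `[DONE]` sentinel.
// Blank lines and comment lines (starting with ':') are skipped.
fun sseDataPayloads(lines: Sequence<String>): Sequence<String> =
    lines
        .filter { it.startsWith("data:") }
        .map { it.removePrefix("data:").trim() }
        .takeWhile { it != "[DONE]" }
```

Because the function works on a lazy `Sequence`, it can be fed from a buffered reader over the HTTP response body, and each payload can then be emitted into a Flow as it arrives.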

Critically: API keys must never be stored in the app bundle. Even an obfuscated key can be extracted from the IPA in 10 minutes using strings or frida. Correct architecture: mobile app → your own backend → OpenAI API. The backend controls rate limiting, logs requests, and protects the key.

What Is Included in the Work (Deliverables)

Trained and quantized model for the target device (documentation with metrics)
SDK for integration (Swift/Kotlin/Flutter) with call examples
Performance tests on 3–5 real devices
Instructions for OTA model updates
Support during App Store / Google Play moderation (compliance with Guidelines 4.2, 5.1)
2 weeks of technical support after release

Typical Project Pipeline

Task analysis — measure latency, privacy, size, supported devices.
Model prototyping — in Python, evaluate accuracy on target data.
Conversion and quantization — for CoreML/TFLite with validation.
Integration into the app — model wrapped in a service layer (easy to swap CoreML ↔ TFLite ↔ cloud).
Testing — on real devices, measure FPS, RAM, battery.
Deployment — via TestFlight / Firebase App Distribution, monitor metrics.
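
The service layer from the integration step can be as small as one interface. This is a sketch under assumptions: `ImageLabeler`, `Label`, and `FallbackLabeler` are hypothetical names, and real CoreML, TFLite, or cloud backends would each implement the interface in their own module.

```kotlin
data class Label(val name: String, val score: Float)

// The app depends only on this interface; each backend lives behind it.
fun interface ImageLabeler {
    fun classify(pixels: FloatArray): List<Label>
}

// Hypothetical wiring: try the preferred backend and fall back if it throws.
class FallbackLabeler(
    private val primary: ImageLabeler,
    private val fallback: ImageLabeler,
) : ImageLabeler {
    override fun classify(pixels: FloatArray): List<Label> =
        try {
            primary.classify(pixels)
        } catch (e: Exception) {
            fallback.classify(pixels)
        }
}
```

With this shape, swapping an on-device model for a cloud call is a change in one place where the labeler is constructed, and tests can pass a lambda as a fake backend.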

Timelines: integration of a ready CoreML/TFLite model — 1–2 weeks, development of a custom model with mobile optimization — from 6 weeks, on-device LLM chat with personalization — 4–8 weeks.

Why We Take On Complex Cases

10+ years of experience in mobile development, 50+ implemented AI/ML solutions, guarantee of compatibility with current iOS and Android versions. All projects undergo code review and load testing. The cost includes preparation of moderation documentation and training of your team.

Contact us — we will help you choose the architecture and implement ML in your app turnkey. Order an audit of your existing solution — we will assess the potential for server cost savings free of charge. In some projects, savings can reach significant amounts per month.