What do you do about false positives in moderation?

False positives are handled through a manual review queue (REQUIRE_REVIEW). We adjust probability thresholds for different categories to minimize false positives. Logging is also maintained for analysis and model improvement.

What languages does the moderation support?

The OpenAI Moderation API supports many languages, including Russian, English, and Chinese. For non-standard forms (translit, leetspeak), we add a text normalization layer before the check.

How long does it take to implement a moderation system?

Basic integration with OpenAI Moderation and a local filter takes 2-3 days. A full system with pipeline, manual moderation, and analytics takes 2-3 weeks. Exact timelines depend on your app's complexity.

What guarantees do you provide?

We guarantee data confidentiality, compliance with App Store Review Guidelines, and stable system operation. We provide documentation and team training. Our developers have over 5 years of experience in mobile development.

AI Text Moderation for Mobile Apps: Step-by-Step Guide

How do you protect the OpenAI API key during moderation?

The API key must never be stored on the client. We use a backend proxy: the app sends text to your server, and the server calls the OpenAI Moderation API and returns the result. The key is stored only on the server.

TRUETECH is engaged in the development, support and maintenance of iOS, Android, PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular markets like Google Play, App Store, Amazon, AppGallery and others.

8+ years of work, 900+ completed projects, 100+ in-house employees, 19+ partners

Development and support of all types of mobile applications:

Information and entertainment mobile applications

News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators

E-commerce mobile applications

Online stores, B2B apps, marketplaces, online exchanges, cashback services, exchanges, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.

Business process management mobile applications

CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems

Electronic services mobile applications

Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.

Imagine your app with user-generated content (UGC) getting blocked in the App Store due to lack of moderation, or users complaining about harassment in chats. Without a reliable content moderation system, you cannot release a product that passes review and remains safe: UGC moderation is a key element of any social app. We are a team of mobile developers with 5+ years of experience, and we have implemented dozens of moderation systems for iOS and Android. In this article, we'll show you how to build an AI text moderation pipeline using the OpenAI Moderation API that supports compliance with App Store Review Guideline 1.2 and protects users from unwanted content.

In one of our projects, a fintech app with chats, we implemented multi-level moderation that reduced complaints by 80%. The client-side filter catches 40% of violations before they reach the server, reducing latency and load. Final moderation accuracy reached 99.5%, with a median check time of 180 milliseconds, so the approach delivers both speed and precision. Savings on manual moderation amounted to 60% of moderator costs.

Problems We Solve

The main technical challenges when implementing moderation:

API Key Leakage

If the OpenAI Moderation API is called directly from the client, the key ends up in the binary. Even obfuscation does not help—attackers extract it. Solution: all requests go through a backend-proxy. Only the server knows the key.

False Positives

OpenAI returns probabilities, not a binary answer. Without proper thresholds, up to 30% of legitimate content gets blocked. We adjust thresholds for each category and add manual verification for the "gray area." In one project, this reduced false positives by 60%.

Multilingual Support

Non-standard forms (translit, leetspeak, deliberate misspellings) reduce accuracy. We apply text normalization before the check—this increases detection by 20%.
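As a rough illustration of the leetspeak part of that normalization, the sketch below maps common digit and symbol substitutions back to letters. The map and the function name `deleet` are hypothetical and deliberately small; a production list would be longer and language-specific.

```swift
// Illustrative leetspeak map (an assumption, not an exhaustive list).
let leetMap: [Character: Character] = [
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
]

// Lowercases the text and replaces each mapped character with its letter.
func deleet(_ text: String) -> String {
    let chars: [Character] = text.lowercased().map { leetMap[$0] ?? $0 }
    return String(chars)
}
```

Because digits are rewritten unconditionally, this form is best checked alongside the original text rather than instead of it.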

How Does AI Text Moderation Improve Your App's Security?

The architecture includes four levels:

User enters text
    ↓
[Client] Local check (instant)
    ↓ passed
[Backend] OpenAI Moderation API (100–300 ms)
    ↓ passed
[Backend] Custom rules (regex, domain-specific)
    ↓ passed
Content published
    ↓ parallel
[Backend] Async re-check (more expensive model)

According to OpenAI documentation, Moderation API is designed to detect harmful content across several categories. The client-side filter catches obvious violations before sending them to the server. This reduces load and protects the user from delays. For one fintech app, we implemented such a pipeline: 99.5% accuracy with a median delay of 180 ms.
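A minimal sketch of how the synchronous stages above could be chained, assuming each stage is a function that accepts or rejects the text. The names `Verdict`, `Stage`, and `runPipeline` are hypothetical, the remote stage is stubbed rather than calling a backend, and the asynchronous re-check is left out.

```swift
import Foundation

// Hypothetical verdict type: the text either passes or is blocked by a named stage.
enum Verdict: Equatable {
    case passed
    case blocked(stage: String)
}

// A stage is a name plus a check that returns true when the text is acceptable.
struct Stage {
    let name: String
    let accepts: (String) -> Bool
}

// Runs stages in order and stops at the first one that rejects the text,
// so cheap local checks shield the slower remote ones.
func runPipeline(_ text: String, stages: [Stage]) -> Verdict {
    for stage in stages {
        if !stage.accepts(text) {
            return .blocked(stage: stage.name)
        }
    }
    return .passed
}

// Stubbed stages for illustration only; a real remote stage would call the backend.
let demoStages = [
    Stage(name: "local", accepts: { !$0.lowercased().contains("word1") }),
    Stage(name: "remote", accepts: { _ in true }),
    Stage(name: "custom-rules", accepts: { $0.count <= 500 }),
]
```

Ordering the stages from cheapest to most expensive is what lets the local filter absorb obvious violations before any network call is made.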

Why Client-Side Moderation?

On the client, speed is crucial. We use the NaturalLanguage framework on iOS and comparable tooling on Android. A simple example is a local list of banned words compiled into a regex:

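import Foundation  // NSRegularExpression lives in Foundation

// Result types the snippet relies on.
enum BlockReason { case explicitContent }
enum ModerationResult {
    case passed
    case blocked(reason: BlockReason)
}

class LocalTextModerator {
    private let forbiddenPatterns: NSRegularExpression

    init(bannedWords: [String] = ["word1", "word2"]) {
        // Escape each word so regex metacharacters in the list are matched literally.
        let patterns = bannedWords
            .map { NSRegularExpression.escapedPattern(for: $0) }
            .joined(separator: "|")
        // try! is acceptable here because the pattern is built only from escaped literals.
        forbiddenPatterns = try! NSRegularExpression(
            pattern: "\\b(\(patterns))\\b",
            options: [.caseInsensitive]
        )
    }

    // Returns .blocked on the first banned word found, otherwise .passed.
    func quickCheck(_ text: String) -> ModerationResult {
        let range = NSRange(text.startIndex..., in: text)
        if forbiddenPatterns.firstMatch(in: text, range: range) != nil {
            return .blocked(reason: .explicitContent)
        }
        return .passed
    }
}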

We store the word list encrypted or load it from the server at startup, so that it is not exposed in the binary. The client-side filter is 10 to 30 times faster than the server round trip: about 10 ms vs. 100-300 ms. Our engineers are ready to audit your app: contact us for a consultation.

Backend-Proxy Architecture for API Key Protection

The only secure way is a backend proxy. The app sends text to your server, and the server calls the OpenAI Moderation API and returns the result. Example request: POST https://api.openai.com/v1/moderations with an Authorization: Bearer header carrying the server-side key and the body {"input": "text", "model": "omni-moderation-latest"}. The response contains per-category flags (categories) and their probabilities (category_scores). On the server, we configure rate limiting (no more than 20 requests per minute per user) and shadowban for violators.
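The per-user limit of 20 requests per minute can be sketched as a sliding-window counter. This is an illustrative in-memory version written in Swift to match the other snippets; the type name `SlidingWindowLimiter` is hypothetical, the storage is not thread-safe, and a real backend would typically keep the counters in a shared store.

```swift
import Foundation

// Sliding-window limiter: allows at most `limit` requests per `window` seconds per user.
final class SlidingWindowLimiter {
    private let limit: Int
    private let window: TimeInterval
    private var history: [String: [TimeInterval]] = [:]

    init(limit: Int = 20, window: TimeInterval = 60) {
        self.limit = limit
        self.window = window
    }

    // Returns true and records the request if the user is under the limit at time `now`.
    func allow(userID: String, now: TimeInterval) -> Bool {
        // Keep only the timestamps that are still inside the window.
        let recent = (history[userID] ?? []).filter { now - $0 < window }
        if recent.count >= limit {
            history[userID] = recent
            return false
        }
        history[userID] = recent + [now]
        return true
    }
}
```

Passing `now` explicitly keeps the limiter deterministic in tests; in production the caller would supply the current timestamp.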

Handling Edge Cases

OpenAI Moderation does not give a binary answer—it's probabilities. You need business logic for the "gray zone":

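enum class ContentDecision { ALLOW, REQUIRE_REVIEW, BLOCK }

// Minimal shape of the moderation response used by the decision logic.
data class ModerationResult(val flagged: Boolean, val categoryScores: Map<String, Double>)

fun evaluateModerationResult(result: ModerationResult): ContentDecision {
    // Treat a missing category as score 0.0 instead of crashing on `!!`.
    val harassment = result.categoryScores["harassment"] ?: 0.0
    val sexual = result.categoryScores["sexual"] ?: 0.0
    return when {
        result.flagged -> ContentDecision.BLOCK
        harassment > 0.7 -> ContentDecision.BLOCK
        harassment > 0.3 -> ContentDecision.REQUIRE_REVIEW
        sexual > 0.4 -> ContentDecision.REQUIRE_REVIEW
        else -> ContentDecision.ALLOW
    }
}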

Content with REQUIRE_REVIEW goes into a manual moderation queue or is published with reduced visibility.

Example threshold configuration for categories

For the harassment category, the BLOCK threshold is 0.7 and the REQUIRE_REVIEW threshold is 0.3. For sexual, the REQUIRE_REVIEW threshold is 0.4. Thresholds are selected based on the app's specifics.
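One way to keep such thresholds out of the control flow is a small lookup table. The sketch below is an assumption about how that could look in Swift, using the harassment and sexual values from the Kotlin function above; `CategoryThresholds` and `decide` are hypothetical names.

```swift
// Decision outcomes ordered from least to most strict.
enum ContentDecision: Int, Comparable {
    case allow = 0, requireReview = 1, block = 2
    static func < (lhs: ContentDecision, rhs: ContentDecision) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

// Per-category thresholds; `block` is optional because some categories only trigger review.
struct CategoryThresholds {
    let review: Double
    let block: Double?
}

// Example values; tune them per app.
let thresholds: [String: CategoryThresholds] = [
    "harassment": CategoryThresholds(review: 0.3, block: 0.7),
    "sexual": CategoryThresholds(review: 0.4, block: nil),
]

// Returns the strictest decision triggered by any configured category.
func decide(scores: [String: Double], flagged: Bool) -> ContentDecision {
    if flagged { return .block }
    var decision = ContentDecision.allow
    for (category, limits) in thresholds {
        let score = scores[category] ?? 0  // a missing category counts as 0
        if let block = limits.block, score > block {
            decision = max(decision, .block)
        } else if score > limits.review {
            decision = max(decision, .requireReview)
        }
    }
    return decision
}
```

With the values in a table, thresholds can be loaded from remote configuration and A/B tested without shipping new code.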

Approach	Speed	Accuracy	Load
Client-side filter	10 ms	70%	Low
OpenAI Moderation API	200 ms	98%	Medium
Combined pipeline	180 ms	99.5%	Medium

Multilingual Normalization

For Russian text disguised with Latin look-alike letters or stretched characters, we apply normalization:

import Foundation

func normalizeText(_ text: String) -> String {
    var result = text.lowercased()
    // Replace Latin letters with their Cyrillic look-alikes.
    let translitMap = ["a": "а", "e": "е", "o": "о", "p": "р", "c": "с"]
    for (latin, cyrillic) in translitMap {
        result = result.replacingOccurrences(of: latin, with: cyrillic)
    }
    // Collapse runs of three or more identical characters into one.
    result = result.replacingOccurrences(of: "(.)\\1{2,}", with: "$1", options: .regularExpression)
    return result
}

We check both normalized and original text—this gives +20% accuracy.
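Checking both forms can be sketched as a thin wrapper around whatever check is in use. The helper `simpleNormalize` below is a simplified stand-in for the normalization step (lowercasing and collapsing repeated characters only), and `violatesEitherForm` is a hypothetical name.

```swift
import Foundation

// Simplified stand-in for normalization: lowercases and collapses
// runs of three or more identical characters into one.
func simpleNormalize(_ text: String) -> String {
    text.lowercased().replacingOccurrences(
        of: "(.)\\1{2,}", with: "$1", options: .regularExpression)
}

// Returns true if either the original or the normalized text trips the check.
func violatesEitherForm(_ text: String, isViolation: (String) -> Bool) -> Bool {
    isViolation(text) || isViolation(simpleNormalize(text))
}
```

Keeping the original text in the check matters because aggressive normalization can distort legitimate words in other languages.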

Process

Analytics — study your app's specifics, UGC, platform requirements.
Design — choose the stack (iOS/Android/cross-platform), draw the pipeline architecture.
Implementation — write code: local filters, OpenAI integration, custom rules, normalization, rate limiting.
Testing — load testing, A/B tests for thresholds, verification on real data.
Deploy — configure monitoring, logging for appeals, CI/CD.

What's Included

Stage	Result
Requirements analysis	Document with architecture and metrics
Design	Pipeline diagram, model selection
Implementation	Integration with OpenAI Moderation, local filters, normalization, rate limiting
Testing	Load test report, threshold tuning
Deploy	Documentation, team training, 1 month support

Our Results and Guarantees

We are a team with 5+ years of experience in mobile development, certified Apple and Google developers. We have completed over 20 content moderation projects. We guarantee:

Passing App Store and Google Play Review.
Data confidentiality (NDA).
Stable system operation under load.

Get a free engineer consultation—contact us.

Timelines and Cost

Basic integration (client-side filter + OpenAI Moderation) — from 2 to 3 days. Full system with pipeline, manual moderation, normalization, and analytics — from 2 to 3 weeks. Cost is calculated individually after an audit. The investment pays off by reducing blocking risks and saving on manual labor.

Machine Learning in Mobile Apps: CoreML, TFLite, and On-Device Models

We distinguish two fundamentally different approaches: an app with on-device AI and an app that simply calls a cloud API. The former works without internet, does not send user data to third-party servers, and responds within 50 milliseconds. The latter depends on network latency and pricing plans. Choosing the architecture is a key step that directly affects the cost, privacy, and user experience of machine learning in mobile apps. Our experience shows that in 70% of projects, on-device inference is cheaper in the long run because it eliminates server costs.

How to Choose Between CoreML and TFLite for On-Device Inference?

CoreML is Apple's native framework for running ML models on device. It supports the Neural Engine (available to CoreML starting with A12 Bionic), with GPU and CPU as fallbacks. Models are converted to .mlmodel format via coremltools from PyTorch, ONNX, or TensorFlow. Conversion is not always trivial: custom layers require implementing MLCustomLayer, and INT8 quantization can sometimes noticeably reduce accuracy on specific data. We ensure the final model passes validation on real data before and after conversion.

TensorFlow Lite — cross-platform alternative for Android and Flutter. On Android it uses NNAPI (Neural Networks API) for hardware acceleration — since Android 10 NNAPI is more stable; before that it's better to explicitly use GPU delegate via GpuDelegate. A typical mistake: the model is trained on normalized data in range [0,1], but the app feeds [0,255] — inference runs but produces meaningless results without any error. We include an automatic input data validation module in the SDK.
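The range mismatch described above can be caught with a cheap check before inference. The following is a minimal sketch, assuming the model expects values in [0, 1]; `validateInput`, `normalize8Bit`, and `InputRangeError` are hypothetical names and do not come from TFLite or any SDK.

```swift
// Errors for the hypothetical validator below.
enum InputRangeError: Error, Equatable {
    case empty
    case outOfRange(min: Float, max: Float)
}

// Checks that every value lies in the range the model was trained on
// (assumed here to be [0, 1]) and throws instead of silently running inference.
func validateInput(_ pixels: [Float], expected: ClosedRange<Float> = 0...1) throws {
    guard let lo = pixels.min(), let hi = pixels.max() else {
        throw InputRangeError.empty
    }
    guard expected.contains(lo), expected.contains(hi) else {
        throw InputRangeError.outOfRange(min: lo, max: hi)
    }
}

// Converts 8-bit channel values (0...255) to the [0, 1] range.
func normalize8Bit(_ bytes: [UInt8]) -> [Float] {
    bytes.map { Float($0) / 255 }
}
```

A check like this only catches values outside the expected range; an image that happens to be very dark would still pass, so it complements rather than replaces end-to-end accuracy tests.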

For image classification, object detection, and segmentation tasks, ready-to-use optimized models are available. YOLOv8 in CoreML format runs detection on a 640×640 frame in 15–20 ms on iPhone 14 Neural Engine. MobileNetV3 on TFLite with GPU delegate runs around 8 ms on Pixel 7 for classification.

Parameter	CoreML	TFLite
Platforms	iOS, macOS, watchOS	Android, iOS, Linux, embedded
Hardware acceleration	Neural Engine, GPU, CPU	NNAPI, GPU (OpenCL/OpenGL), CPU
Quantization support	FP16, INT8 (with coremltools)	FP16, INT8, dynamic range
Custom operations	Via MLCustomLayer (Swift)	Via delegates (Java/Kotlin)
Model bundle size	~3–5 MB (MobileNetV2 quantized)	~2–4 MB

What If You Need Text Generation On-Device?

Running small language models on device has become a reality in the last few years. Apple Intelligence uses its own on-device models together with Private Cloud Compute, but for third-party developers other paths are available.

llama.cpp with the Metal backend on iOS is a working approach for phi-3-mini (3.8B parameters, 4-bit quantization, ~2.3 GB). Inference runs at 15–25 tokens/second on iPhone 15 Pro. For integration in Swift, use the Swift Package llama.swift or a wrapper via the C interface llama.h. The model file is not bundled with the app: it is downloaded on first launch and stored in Application Support. Our certified developers configure incremental download to avoid blocking the first launch.

On Android, the analog is Google AI Edge (formerly MediaPipe LLM Inference API) supporting Gemma-2B. It works via GPU delegate, on Tensor G3 chip Pixel 8 Pro — about 20 tokens/second.

Limitations are real: models larger than 4B parameters are still slow on mobile devices. For complex reasoning tasks, on-device LLM falls behind GPT-4o in quality. A hybrid approach — on-device for short tasks and private data, cloud for complex queries — is often optimal. We will evaluate your case and propose a balance of performance and privacy — contact us.
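The hybrid approach boils down to a routing decision per request. The sketch below is one possible rule, not a recommendation: the 200-character cutoff is an arbitrary placeholder, and `PromptInfo` and `route` are hypothetical names.

```swift
enum InferenceTarget: Equatable { case onDevice, cloud }

struct PromptInfo {
    let text: String
    let containsPrivateData: Bool
}

// Private data always stays on device; otherwise short prompts run locally
// and longer ones go to the cloud.
func route(_ prompt: PromptInfo, maxOnDeviceLength: Int = 200) -> InferenceTarget {
    if prompt.containsPrivateData { return .onDevice }
    return prompt.text.count <= maxOnDeviceLength ? .onDevice : .cloud
}
```

In practice the rule would also consider device capability and connectivity, but keeping it in one function makes the policy easy to audit.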

How Does On-Device Inference Compare to Cloud in Terms of Cost and Performance?

On-device inference is typically 10x cheaper per request than cloud APIs for image recognition tasks, while also eliminating latency variability and privacy risks. The table below summarizes the trade-offs.

Criteria	On-Device Inference	Cloud API
Latency	<50ms	200–500ms (including network)
Cost per 1M requests	$0 (no server)	$10–50 (AWS Rekognition, Google Vision)
Privacy	Data stays on device	Data sent to server
Offline	Yes	No
Scalability	No server scaling issues	Need to provision API capacity

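For an app with 100k MAU running 10 image recognitions per user per month, that is 1M requests, which costs $10–50 per month at the cloud rates above. Savings grow linearly with volume: a figure such as $5,000 per month corresponds to roughly 100M requests at the upper rate. Get a free consultation on your ML architecture today.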
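The arithmetic behind such an estimate is simple enough to keep as a helper. This sketch uses the per-1M rates from the table above as inputs; actual provider pricing varies and should be checked before relying on the result.

```swift
// Monthly cloud bill for a given workload and per-million-request price.
func monthlyCloudCost(mau: Int, requestsPerUser: Int, pricePerMillionUSD: Double) -> Double {
    let requests = Double(mau * requestsPerUser)
    return requests / 1_000_000 * pricePerMillionUSD
}

// 100k MAU x 10 requests = 1M requests per month.
let low = monthlyCloudCost(mau: 100_000, requestsPerUser: 10, pricePerMillionUSD: 10)
let high = monthlyCloudCost(mau: 100_000, requestsPerUser: 10, pricePerMillionUSD: 50)
print(low, high) // 10.0 50.0
```

Because the cost is linear in both volume and price, the same function shows how quickly the bill grows as the user base or per-user usage increases.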

Integrating OpenAI API and Other Cloud Models

For scenarios where cloud inference is acceptable, integrating OpenAI, Anthropic, or Google Gemini comes down to an HTTP client plus streaming over Server-Sent Events (SSE). In Swift, AsyncThrowingStream is convenient for streaming responses. In Kotlin, use Flow.
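The parsing half of SSE handling is small. Below is a minimal sketch that extracts the `data:` payloads from a chunk of event-stream text; it assumes the OpenAI-style `[DONE]` sentinel at the end of a stream (other providers signal completion differently), and `parseSSEData` is a hypothetical helper name. In a real client, each payload would be yielded into an AsyncThrowingStream continuation.

```swift
import Foundation

// Extracts the payloads of `data:` fields from a chunk of Server-Sent Events text.
// Stops at the `[DONE]` sentinel if one is present.
func parseSSEData(_ chunk: String) -> [String] {
    var payloads: [String] = []
    for line in chunk.components(separatedBy: "\n") {
        // Skip blank separators, comments, and other field types.
        guard line.hasPrefix("data:") else { continue }
        let payload = line.dropFirst(5).trimmingCharacters(in: .whitespacesAndNewlines)
        if payload == "[DONE]" { break }
        payloads.append(payload)
    }
    return payloads
}
```

This version assumes each chunk ends on a line boundary; a production parser would buffer partial lines across network reads.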

Critically: API keys must never be stored in the app bundle. Even an obfuscated key can be extracted from the IPA in 10 minutes using strings or frida. Correct architecture: mobile app → your own backend → OpenAI API. The backend controls rate limiting, logs requests, and protects the key.
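On the app side, this architecture means the request carries only the user's own credentials. The sketch below builds such a request; the endpoint `https://api.example.com/v1/moderate`, the JSON shape, and the function name are placeholders, not a real API.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Builds the request the app sends to its own backend, which then calls the model provider.
func makeProxyRequest(text: String, sessionToken: String) throws -> URLRequest {
    var request = URLRequest(url: URL(string: "https://api.example.com/v1/moderate")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // The app authenticates with its own user token, never with the provider key.
    request.setValue("Bearer \(sessionToken)", forHTTPHeaderField: "Authorization")
    request.httpBody = try JSONSerialization.data(withJSONObject: ["input": text])
    return request
}
```

Since the provider key never appears in the app, rotating it is a server-side change that requires no app update.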

What Is Included in the Work (Deliverables)

Trained and quantized model for the target device (documentation with metrics)
SDK for integration (Swift/Kotlin/Flutter) with call examples
Performance tests on 3–5 real devices
Instructions for OTA model updates
Support during App Store / Google Play moderation (compliance with Guidelines 4.2, 5.1)
2 weeks of technical support after release

Typical Project Pipeline

Task analysis — measure latency, privacy, size, supported devices.
Model prototyping — in Python, evaluate accuracy on target data.
Conversion and quantization — for CoreML/TFLite with validation.
Integration into the app — model wrapped in a service layer (easy to swap CoreML ↔ TFLite ↔ cloud).
Testing — on real devices, measure FPS, RAM, battery.
Deployment — via TestFlight / Firebase App Distribution, monitor metrics.
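The service layer mentioned in the integration step can be as small as one protocol. In the sketch below, the stubs stand in for CoreML-, TFLite-, or cloud-backed implementations; all type names are hypothetical and the returned labels are dummy values.

```swift
// The app depends on this protocol, not on a concrete runtime.
protocol ImageClassifier {
    func classify(_ pixels: [Float]) -> String
}

// Stubs standing in for real on-device and cloud implementations.
struct OnDeviceClassifier: ImageClassifier {
    func classify(_ pixels: [Float]) -> String { "on-device:cat" }
}

struct CloudClassifier: ImageClassifier {
    func classify(_ pixels: [Float]) -> String { "cloud:cat" }
}

// The service owns one backend and can be re-pointed without touching callers.
final class ClassificationService {
    private var backend: ImageClassifier

    init(backend: ImageClassifier) { self.backend = backend }

    func use(_ newBackend: ImageClassifier) { backend = newBackend }

    func label(for pixels: [Float]) -> String { backend.classify(pixels) }
}
```

Callers only ever see `ClassificationService`, so swapping the runtime or falling back to the cloud is a one-line change at the composition point.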

Timelines: integration of a ready CoreML/TFLite model — 1–2 weeks, development of a custom model with mobile optimization — from 6 weeks, on-device LLM chat with personalization — 4–8 weeks.

Why Do We Take on Complex Cases?

10+ years of experience in mobile development, 50+ implemented AI/ML solutions, guarantee of compatibility with current iOS and Android versions. All projects undergo code review and load testing. The cost includes preparation of moderation documentation and training of your team.

Contact us — we will help you choose the architecture and implement ML in your app turnkey. Order an audit of your existing solution — we will assess the potential for server cost savings free of charge. In some projects, savings can reach significant amounts per month.