AI-Powered Text Content Moderation for Mobile Apps
Text moderation is mandatory for any app with user-generated content (UGC): chat, comments, reviews, descriptions. Without it, App Review can reject the app under App Store Review Guideline 1.2 (User-Generated Content) or demand immediate changes. Technically, the task splits into client-side (basic) and server-side (primary) moderation.
Client-Side Moderation: First Line
On the client, run a quick check that needs no network. The goal is to catch obvious violations before sending and to reduce server load.
Two tools on iOS:
- NaturalLanguage framework with NLTagger for basic sentiment analysis
- Local forbidden word list (compiled regex)
import Foundation
import NaturalLanguage

// Result types assumed by this example
enum BlockReason { case explicitContent }
enum ModerationResult {
    case passed
    case blocked(reason: BlockReason)
}

class LocalTextModerator {
    private let forbiddenPatterns: NSRegularExpression

    init() {
        // Escape each word and compile the pattern once at initialization
        let patterns = ["word1", "word2"]
            .map { NSRegularExpression.escapedPattern(for: $0) }
            .joined(separator: "|")
        forbiddenPatterns = try! NSRegularExpression(
            pattern: "\\b(\(patterns))\\b",
            options: [.caseInsensitive]
        )
    }

    func quickCheck(_ text: String) -> ModerationResult {
        let range = NSRange(text.startIndex..., in: text)
        if forbiddenPatterns.firstMatch(in: text, range: range) != nil {
            return .blocked(reason: .explicitContent)
        }
        return .passed
    }
}
Don't ship the word list in the binary as plain text: strings in a binary are easy to extract, and Apple reviewers sometimes check. Better options: an encrypted list decrypted on first launch, or a list loaded from the server during initialization.
Server-Side Moderation: Main Layer
The OpenAI Moderation API is free (as of March 2025) and accurate:
POST https://api.openai.com/v1/moderations
Authorization: Bearer <key>
Content-Type: application/json

{
  "input": "text to check",
  "model": "omni-moderation-latest"
}
Response contains categories and category_scores:
{
  "results": [{
    "flagged": false,
    "categories": {
      "hate": false,
      "harassment": false,
      "sexual": false,
      "violence": false,
      "self-harm": false
    },
    "category_scores": {
      "hate": 0.0023,
      "harassment": 0.0156,
      "sexual": 0.0001
    }
  }]
}
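Never call the OpenAI Moderation API directly from the client: that would mean shipping the API key inside the app, where it can be extracted. Always route requests through a backend proxy.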
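A minimal sketch of the backend side of such a proxy, in Python with only the standard library. The function names and the `OPENAI_API_KEY` environment variable are assumptions for illustration; the endpoint, model name, and response fields are the ones shown above.

```python
import json
import os
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request; the key never leaves the server."""
    payload = json.dumps({"input": text, "model": "omni-moderation-latest"})
    return urllib.request.Request(
        MODERATION_URL,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_moderation_response(body: bytes) -> dict:
    """Keep only the fields the rest of the pipeline needs from the first result."""
    result = json.loads(body)["results"][0]
    return {
        "flagged": result["flagged"],
        "categories": result.get("categories", {}),
        "category_scores": result.get("category_scores", {}),
    }


def moderate(text: str, timeout: float = 5.0) -> dict:
    """Call the API from the backend; the mobile client talks only to this proxy."""
    # OPENAI_API_KEY is an assumed variable name for the server-side secret.
    request = build_moderation_request(text, os.environ["OPENAI_API_KEY"])
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_moderation_response(response.read())
```

The mobile app would send the text to your own endpoint, which calls `moderate` and returns only the decision, so neither the key nor the raw scores have to reach the device.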
Moderation Pipeline Architecture
User inputs text
↓
[Client] Local check (instant)
↓ passed
[Backend] OpenAI Moderation API (100–300 ms)
↓ passed
[Backend] Custom rules (regex, domain-specific)
↓ passed
Publish content
↓ parallel
[Backend] Async re-check (more expensive model)
The check has two levels: synchronous (for an immediate response) and asynchronous (for deep analysis). The asynchronous result may lead to retroactive deletion of content that is already published.
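The flow above could be sketched roughly as follows. This is an in-memory illustration, not a production design: the class name, the `Check` signature, and the queue are assumptions, and a real backend would persist posts and run the re-check in a worker.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Optional

# A check takes the text and returns a block reason, or None if the text passes.
Check = Callable[[str], Optional[str]]


class ModerationPipeline:
    def __init__(self, sync_checks: Iterable[Check], deep_check: Check):
        self.sync_checks = list(sync_checks)
        self.deep_check = deep_check
        self.published: dict[int, str] = {}
        self.recheck_queue: Deque[int] = deque()
        self._next_id = 1

    def submit(self, text: str) -> tuple[bool, Optional[str]]:
        """Synchronous stage: run checks in order and stop at the first failure."""
        for check in self.sync_checks:
            reason = check(text)
            if reason is not None:
                return False, reason
        post_id = self._next_id
        self._next_id += 1
        self.published[post_id] = text
        self.recheck_queue.append(post_id)  # schedule the deep re-check
        return True, None

    def run_rechecks(self) -> list[int]:
        """Asynchronous stage: delete published posts that fail the deep check."""
        removed = []
        while self.recheck_queue:
            post_id = self.recheck_queue.popleft()
            text = self.published.get(post_id)
            if text is not None and self.deep_check(text) is not None:
                del self.published[post_id]
                removed.append(post_id)
        return removed
```

Ordering the synchronous checks from cheapest to most expensive keeps latency low, since a failure in an early check skips the later ones.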
Handling Edge Cases
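Besides the binary flagged field, OpenAI Moderation returns a score between 0 and 1 for each category. Business logic is needed for the "gray zone" between clearly acceptable and clearly violating content: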
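// Android: process moderation results
enum class ContentDecision { ALLOW, REQUIRE_REVIEW, BLOCK }

data class ModerationResult(val flagged: Boolean, val categoryScores: Map<String, Double>)

fun evaluateModerationResult(result: ModerationResult): ContentDecision {
    // Treat a missing category as 0.0 instead of crashing on !!
    val harassment = result.categoryScores["harassment"] ?: 0.0
    val sexual = result.categoryScores["sexual"] ?: 0.0
    return when {
        result.flagged -> ContentDecision.BLOCK
        harassment > 0.7 -> ContentDecision.BLOCK
        harassment > 0.3 -> ContentDecision.REQUIRE_REVIEW
        sexual > 0.4 -> ContentDecision.REQUIRE_REVIEW
        else -> ContentDecision.ALLOW
    }
}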
REQUIRE_REVIEW means the content goes to a moderation queue. It is published either after a delay or immediately with reduced visibility.
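On the backend, the same gray-zone logic can be kept in a threshold table so that values are tuned without touching code. A sketch under that assumption; the table layout and names are illustrative, and the numbers are the ones used above.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_REVIEW = "require_review"
    BLOCK = "block"


# category -> (review threshold, block threshold); None disables that level.
THRESHOLDS = {
    "harassment": (0.3, 0.7),
    "sexual": (0.4, None),
}


def decide(flagged: bool, scores: dict[str, float]) -> Decision:
    """Block on the API flag or a block threshold; otherwise review or allow."""
    if flagged:
        return Decision.BLOCK
    decision = Decision.ALLOW
    for category, (review_at, block_at) in THRESHOLDS.items():
        score = scores.get(category, 0.0)  # a missing category counts as 0
        if block_at is not None and score > block_at:
            return Decision.BLOCK
        if review_at is not None and score > review_at:
            decision = Decision.REQUIRE_REVIEW
    return decision
```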
Multilingual Moderation
OpenAI Moderation works with Russian, but quality drops on non-standard forms (transliteration, intentional typos, leetspeak). An additional layer helps: normalize the text before checking.
import Foundation

func normalizeText(_ text: String) -> String {
    var result = text.lowercased()
    // Map Latin look-alike letters to their Cyrillic counterparts
    let translitMap: [String: String] = ["a": "а", "e": "е", "o": "о", "p": "р", "c": "с"]
    for (latin, cyrillic) in translitMap {
        result = result.replacingOccurrences(of: latin, with: cyrillic)
    }
    // Collapse runs of 3+ identical characters: "привееет" → "привет"
    result = result.replacingOccurrences(of: "(.)\\1{2,}", with: "$1", options: .regularExpression)
    return result
}
Check both normalized and original text.
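If normalization runs on the backend instead, the same idea might look like this in Python. The look-alike mapping mirrors the Swift version; the leetspeak entries and the function names are assumptions added for illustration.

```python
import re

# Latin look-alikes mapped to Cyrillic (same pairs as the Swift example),
# plus a few leetspeak substitutions that are an illustrative assumption.
HOMOGLYPHS = str.maketrans({
    "a": "\u0430",  # Cyrillic а
    "e": "\u0435",  # Cyrillic е
    "o": "\u043e",  # Cyrillic о
    "p": "\u0440",  # Cyrillic р
    "c": "\u0441",  # Cyrillic с
    "0": "\u043e",  # zero used as о
    "3": "\u0437",  # three used as з
    "@": "\u0430",  # at sign used as а
})
REPEATS = re.compile(r"(.)\1{2,}")  # runs of 3+ identical characters


def normalize_text(text: str) -> str:
    """Lowercase, replace look-alikes, then collapse long character runs."""
    result = text.lower().translate(HOMOGLYPHS)
    return REPEATS.sub(r"\1", result)


def variants_to_check(text: str) -> list[str]:
    """Return the original text plus its normalized form, without duplicates."""
    normalized = normalize_text(text)
    return [text] if normalized == text else [text, normalized]
```

Each variant is sent through moderation and the strictest decision wins. The mapping distorts ordinary Latin-script text, which is why the original has to be checked as well.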
Rate Limiting and Abuse
If moderation is paid or otherwise expensive, protect it against flooding:
- Backend rate limiting: 20 posts/minute per user
- Shadowban: a user with a violation history automatically undergoes stricter checks
- Temporary block: 3 violations within 24 hours trigger a temporary publishing block
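The post limit and the temporary block in the list above can share one sliding-window counter. A minimal in-memory sketch; the class and function names are assumptions, and a multi-instance backend would keep these counters in shared storage instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.events = defaultdict(deque)

    def _prune(self, key: str) -> deque:
        """Drop timestamps that have fallen out of the window."""
        cutoff = self.clock() - self.window
        timestamps = self.events[key]
        while timestamps and timestamps[0] <= cutoff:
            timestamps.popleft()
        return timestamps

    def count(self, key: str) -> int:
        return len(self._prune(key))

    def record(self, key: str) -> None:
        self._prune(key).append(self.clock())

    def try_acquire(self, key: str) -> bool:
        if self.count(key) >= self.limit:
            return False
        self.record(key)
        return True


posts = SlidingWindowLimiter(limit=20, window=60)             # 20 posts/minute
violations = SlidingWindowLimiter(limit=3, window=24 * 3600)  # 3 per 24 hours


def can_publish(user_id: str) -> bool:
    """Reject temporarily blocked users first, then apply the post rate limit."""
    if violations.count(user_id) >= violations.limit:
        return False
    return posts.try_acquire(user_id)
```

The backend would call `violations.record(user_id)` whenever a post is blocked, so the third violation within 24 hours makes `can_publish` return False until the oldest one expires.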
Logging for Appeals
Users challenge blocks. For each decision, log the original text, the moderation result, the reason, a timestamp, and the model version. The same log also helps tune thresholds over time.
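One possible shape for such a log record, sketched as a Python dataclass serialized to a JSON line. The field names are assumptions, and `user_id` is an extra field added here so that an appeal can be matched to its author.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ModerationLogEntry:
    user_id: str          # assumed extra field, not in the list above
    original_text: str
    decision: str         # e.g. "block", "require_review", "allow"
    reason: str           # category or rule that triggered the decision
    category_scores: dict
    model_version: str    # e.g. "omni-moderation-latest"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize as one JSON line, keeping non-ASCII text readable."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)
```

Storing the raw scores next to the decision makes it possible to replay old content against new thresholds before changing them.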
Timelines
A backend with OpenAI Moderation plus a basic client filter takes 2–3 days. A full system with the pipeline, manual moderation, normalization, analytics, and appeals takes 2–3 weeks. Cost is calculated individually.







