AI Message Toxicity Detection for Mobile App

TRUETECH develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular markets such as Google Play, the App Store, Amazon, AppGallery, and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.


AI-Powered Message Toxicity Detection for Mobile Apps

Toxicity detection and spam detection are different tasks. Spam is caught by repetition patterns and sender behavior. A toxic message is unique, written by a real human, and often grammatically correct, which makes detection much harder.

Main Technical Problem

General-purpose toxicity models like unitary/toxic-bert work well on English Reddit-style datasets. In a Russian-language app, they produce false positives on words with culture-specific connotations and miss masked profanity that uses character substitution (a standard circumvention tactic among CIS audiences). The same applies to Ukrainian and Belarusian.

Another trap is a synchronous model call before the message is sent: the user presses "send", waits 800 ms, and the UX is broken. Detection should either be asynchronous post-processing or fast enough that the delay goes unnoticed.

Architecture That Actually Works

Multi-Level Classification

Level 1, on-device and fast: regex plus a dictionary of roughly 2,000 obvious toxic patterns, including leetspeak variants. Processing takes under 5 ms with no network call, and this level catches 60–65% of toxic messages with minimal false positives.
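A sketch of the Level 1 idea (shown in Python for brevity; on-device it would be Kotlin or Swift): normalize common character substitutions first, then match against the pattern dictionary. The `LEET_MAP` and `TOXIC_PATTERNS` below are tiny illustrative stand-ins for the real ~2,000-entry dictionary.

```python
import re

# Illustrative substitution map for common leetspeak circumvention.
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

# Tiny stand-in pattern list; a production dictionary is far larger.
TOXIC_PATTERNS = [re.compile(r"\bidiot\b"), re.compile(r"\bstupid\b")]

def level1_is_toxic(text: str) -> bool:
    """Fast pre-filter: lowercase, undo leetspeak, then match patterns."""
    normalized = text.lower().translate(LEET_MAP)
    return any(p.search(normalized) for p in TOXIC_PATTERNS)
```

Because the normalization runs before matching, "1d10t" collapses to "idiot" and is caught by the same pattern, without network access.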

Level 2, server-side ML: a model fine-tuned on a Russian dataset (RuToxic or similar from Hugging Face). It is called asynchronously after the message is displayed; if it triggers, the message is hidden and replaced with a placeholder.

// Android: optimistic sending + async toxicity check
fun sendMessage(text: String) {
    val tempMessage = Message(text = text, status = MessageStatus.PENDING_REVIEW)
    chatAdapter.addMessage(tempMessage)  // show immediately, no blocking call

    viewModelScope.launch {
        val result = toxicityRepository.classify(text)
        // 0.78 is a product-tuned threshold, not a universal constant
        if (result.isToxic && result.confidence > 0.78f) {
            chatAdapter.updateMessageStatus(tempMessage.id, MessageStatus.HIDDEN)
            showToxicityNotice()
        } else {
            chatAdapter.updateMessageStatus(tempMessage.id, MessageStatus.VISIBLE)
        }
    }

    // Send right away; the launch above runs concurrently with delivery
    messageApi.send(tempMessage)
}

This approach, optimistic UI plus a post-facto check, solves the delay problem: the user sees the message instantly while the check runs in parallel.

Multilingual Support via xlm-roberta-base

For apps with a multi-country audience, use xlm-roberta-base fine-tuned on a mixed-language dataset. The model is exported to ONNX and deployed behind a FastAPI endpoint. Important: under high traffic, inference should run in batches; exporting the model with a dynamic batch axis lets onnxruntime process many requests per call, giving roughly 4x the throughput of sequential processing.
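A minimal micro-batching sketch, assuming a hypothetical `infer_batch` callable that stands in for a batched onnxruntime `session.run`: incoming classify requests are queued for a few milliseconds and then processed together as one batch.

```python
import asyncio

class MicroBatcher:
    """Collects classify requests for a short window, then runs one batch.

    `infer_batch` is a stand-in for a batched model call (e.g. an
    onnxruntime session with a dynamic batch axis).
    """

    def __init__(self, infer_batch, max_batch=32, window_ms=10):
        self.infer_batch = infer_batch
        self.max_batch = max_batch
        self.window = window_ms / 1000
        self.queue = asyncio.Queue()
        self._worker = None

    async def classify(self, text):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((text, fut))
        # Lazily start a worker that will flush the queue after the window
        if self._worker is None or self._worker.done():
            self._worker = asyncio.create_task(self._run())
        return await fut

    async def _run(self):
        await asyncio.sleep(self.window)  # let more requests accumulate
        batch = []
        while not self.queue.empty() and len(batch) < self.max_batch:
            batch.append(self.queue.get_nowait())
        texts = [t for t, _ in batch]
        for (_, fut), score in zip(batch, self.infer_batch(texts)):
            fut.set_result(score)
```

Concurrent callers of `classify` share one model invocation; the window length trades a few milliseconds of latency for throughput.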

Granular Categories Instead of Binary Label

Instead of a simple toxic/non-toxic label, the model returns a per-category score vector:

Category     | Auto-Block Threshold | Human Review Threshold
hate_speech  | 0.85                 | 0.60
insult       | 0.90                 | 0.70
threat       | 0.80                 | 0.55
obscenity    | 0.88                 | 0.65

This lets you tune the moderation policy per app type: stricter thresholds for a kids' app, looser ones for an adult forum.
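The table above can be applied with a small decision function. This sketch uses the table's values; the category names as dictionary keys are an assumed response shape.

```python
# Thresholds from the table above: (auto_block, human_review) per category.
THRESHOLDS = {
    "hate_speech": (0.85, 0.60),
    "insult":      (0.90, 0.70),
    "threat":      (0.80, 0.55),
    "obscenity":   (0.88, 0.65),
}

def moderation_decision(scores: dict) -> str:
    """Map a per-category score vector to block / review / allow."""
    decision = "allow"
    for category, score in scores.items():
        auto_block, review = THRESHOLDS[category]
        if score >= auto_block:
            return "block"  # any category over its auto-block bar wins
        if score >= review:
            decision = "review"  # queue for a human, keep checking others
    return decision
```

Per-app policy tuning then amounts to swapping the `THRESHOLDS` table, not changing the model.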

iOS: Core ML Pre-Filter

On iOS, implement the pre-filter via Core ML with a text classifier model converted via coremltools:

// NLModel(mlModel:) is a throwing initializer
let request = try NLModel(mlModel: toxicityModel.model)
let prediction = request.predictedLabel(for: text) ?? "safe"
let hypotheses = request.predictedLabelHypotheses(for: text, maximumCount: 2)

if prediction == "toxic", let score = hypotheses["toxic"], score > 0.9 {
    return .block  // high-confidence local block, no network round trip
}

NaturalLanguage.framework with a custom NLModel is the cleanest path on iOS and requires no extra build dependencies.

Process

1. Dataset collection: export historical user reports and label them via Label Studio or Toloka.

2. Fine-tune the base model on the domain-specific data.

3. Deploy the inference API and integrate it into the mobile clients.

4. Tune thresholds based on the precision/recall tradeoff per product requirements.

5. Monitoring: track the share of auto-blocked messages and the false positive rate inferred from user complaints.
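The threshold-tuning step can be sketched as a sweep over candidate thresholds on a labeled validation set, picking the highest-recall threshold that still meets a product-mandated precision floor. Function names here are illustrative, not part of any library.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision/recall of 'toxic' predictions at a given score threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def pick_threshold(scores, labels, min_precision=0.95):
    """Highest-recall threshold whose precision meets the requirement."""
    best = None
    for t in sorted(set(scores)):  # each distinct score is a candidate cut
        p, r = precision_recall_at(scores, labels, t)
        if p >= min_precision and (best is None or r > best[1]):
            best = (t, r)
    return best[0] if best else None
```

The precision floor encodes the product requirement directly: for a kids' app you would demand high recall instead and accept more false positives.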

Timeline Guidance

Basic integration of a ready-made multilingual model takes 4–6 days. Fine-tuning on your own dataset plus deployment adds 2–3 weeks. A full system with categorization, a human review queue, and a feedback loop takes 4–6 weeks.