AI-Powered Message Toxicity Detection for Mobile Apps
Toxicity and spam are different detection tasks. Spam is caught through repetition patterns and sender behavior; a toxic message is unique, written by a real human, and often grammatically correct, which makes it much harder to detect.
Main Technical Problem
General-purpose toxicity models such as unitary/toxic-bert work well on English Reddit-style data. In a Russian-language app, they produce false positives on words with culture-specific connotations and miss profanity masked by character substitution, a standard circumvention tactic among CIS audiences. The same applies to Ukrainian and Belarusian.
Another trap is a synchronous model call before the message is sent: the user presses "send", waits 800 ms, and the UX is broken. Detection should either run as asynchronous post-processing or be fast enough that the delay goes unnoticed.
Architecture That Actually Works
Multi-Level Classification
Level 1, on-device and fast: regex plus a dictionary of roughly 2,000 obvious toxic patterns, including leetspeak variants. It runs in under 5 ms with no network round trip and catches 60–65% of toxic messages with minimal false positives.
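A minimal sketch of the Level 1 matching logic, shown in Python for readability (on Android the same logic would live in Kotlin). The substitution table and the pattern list are illustrative placeholders, not the production dictionary:

```python
import re

# Illustrative character-substitution map: undo common leetspeak masking
# before dictionary lookup (the real table is far larger).
SUBSTITUTIONS = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "@": "a", "$": "s", "!": "i",
})

# Placeholder patterns; the production list holds ~2,000 entries.
TOXIC_PATTERNS = [
    re.compile(r"\bидиот\w*\b", re.IGNORECASE),
    re.compile(r"\btoxic_word_here\w*\b", re.IGNORECASE),
]

def prefilter_is_toxic(text: str) -> bool:
    # Normalize: lowercase, undo substitutions, collapse repeated characters
    # so stretched spellings ("дурррак") still match dictionary entries.
    normalized = text.lower().translate(SUBSTITUTIONS)
    normalized = re.sub(r"(.)\1{2,}", r"\1", normalized)
    return any(p.search(normalized) for p in TOXIC_PATTERNS)
```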
Level 2, server-side ML: a model fine-tuned on a Russian dataset (RuToxic or similar from Hugging Face). It is called asynchronously after the message is displayed; if it triggers, the message is hidden and replaced with a placeholder.
```kotlin
// Android: optimistic sending + async toxicity check
fun sendMessage(text: String) {
    val tempMessage = Message(text = text, status = MessageStatus.PENDING_REVIEW)
    chatAdapter.addMessage(tempMessage) // show immediately
    viewModelScope.launch {
        val result = toxicityRepository.classify(text)
        if (result.isToxic && result.confidence > 0.78f) {
            chatAdapter.updateMessageStatus(tempMessage.id, MessageStatus.HIDDEN)
            showToxicityNotice()
        } else {
            chatAdapter.updateMessageStatus(tempMessage.id, MessageStatus.VISIBLE)
        }
    }
    messageApi.send(tempMessage) // fire-and-forget delivery; classification runs in parallel
}
```
This approach — "optimistic UI" + post-facto check — solves delay problem. User sees message instantly, check runs in parallel.
Multilingual Support via xlm-roberta-base
For apps with a multi-country audience, use xlm-roberta-base fine-tuned on a mixed-language dataset. The model is exported to ONNX and served behind a FastAPI endpoint. Important: under high traffic, inference should run in batches; export the model with a dynamic batch axis so onnxruntime can process a whole batch in one call, which yields roughly 4x the throughput of sequential processing.
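A minimal sketch of such an endpoint. The model file toxicity-xlmr.onnx, the input names, the multi-label head, and the label order are all assumptions to adapt to your export:

```python
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer

app = FastAPI()
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
# Hypothetical path to the fine-tuned model, exported with a dynamic batch axis.
session = ort.InferenceSession("toxicity-xlmr.onnx")

LABELS = ["hate_speech", "insult", "threat", "obscenity"]  # assumed head order

class ClassifyRequest(BaseModel):
    texts: list[str]  # a batch of messages in a single call

@app.post("/classify")
def classify(req: ClassifyRequest):
    enc = tokenizer(
        req.texts, padding=True, truncation=True,
        max_length=128, return_tensors="np",
    )
    # One session.run over the whole batch instead of N sequential calls.
    logits = session.run(
        None,
        {
            "input_ids": enc["input_ids"].astype(np.int64),
            "attention_mask": enc["attention_mask"].astype(np.int64),
        },
    )[0]
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid for multi-label scores
    return [dict(zip(LABELS, row.tolist())) for row in probs]
```

A gateway in front of this endpoint can micro-batch concurrent requests; even a single client call with an array of texts already amortizes tokenizer and session overhead.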
Granular Categories Instead of Binary Label
Instead of simple "toxic/not" model returns score vector:
| Category | Auto-Block Threshold | Human Review Threshold |
|---|---|---|
| hate_speech | 0.85 | 0.60 |
| insult | 0.90 | 0.70 |
| threat | 0.80 | 0.55 |
| obscenity | 0.88 | 0.65 |
This lets you tune the moderation policy per app type: stricter thresholds for a kids' app, looser for an adult forum.
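A sketch of how those scores can drive routing decisions. The decision enum is illustrative; the numbers come from the table above:

```python
from enum import Enum

class Action(Enum):
    AUTO_BLOCK = "auto_block"
    HUMAN_REVIEW = "human_review"
    ALLOW = "allow"

# (auto-block, human-review) thresholds per category, from the table above.
THRESHOLDS = {
    "hate_speech": (0.85, 0.60),
    "insult":      (0.90, 0.70),
    "threat":      (0.80, 0.55),
    "obscenity":   (0.88, 0.65),
}

def decide(scores: dict[str, float]) -> Action:
    # The harshest verdict across categories wins.
    verdict = Action.ALLOW
    for category, score in scores.items():
        block_at, review_at = THRESHOLDS[category]
        if score >= block_at:
            return Action.AUTO_BLOCK
        if score >= review_at:
            verdict = Action.HUMAN_REVIEW
    return verdict
```

For a kids' app the whole table shifts down; keeping thresholds in config rather than code makes that a product decision instead of a release.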
iOS: Core ML Pre-Filter
On iOS, implement the pre-filter via Core ML with a text classifier model converted via coremltools:
```swift
import NaturalLanguage

// NLModel(mlModel:) is a throwing initializer, so `try` is required.
let classifier = try NLModel(mlModel: toxicityModel.model)
let prediction = classifier.predictedLabel(for: text) ?? "safe"
// Returns [String: Double] with per-label confidence (iOS 14+).
let hypotheses = classifier.predictedLabelHypotheses(for: text, maximumCount: 2)
if prediction == "toxic", let score = hypotheses["toxic"], score > 0.9 {
    return .block
}
```
The NaturalLanguage framework with a custom NLModel is the cleanest path on iOS: it requires no third-party dependencies.
Process
1. Dataset collection: export historical user reports and label them via Label Studio or Toloka.
2. Fine-tune the base model on the domain-specific data (a sketch follows this list).
3. Deploy the inference API and integrate it into the mobile clients.
4. Tune thresholds based on the precision/recall tradeoff your product requires.
5. Monitoring: track the share of auto-blocked messages and the false-positive rate surfaced by user complaints.
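For step 2, a condensed fine-tuning sketch with the Hugging Face Trainer, assuming the labeled exports land in train.csv and val.csv with "text" and "label" columns; the file names and hyperparameters are placeholders:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2  # binary toxic / non-toxic head
)

# Hypothetical labeled exports: columns "text" and "label" (0 or 1).
dataset = load_dataset(
    "csv", data_files={"train": "train.csv", "validation": "val.csv"}
)

def tokenize(batch):
    # Fixed-length padding keeps the default data collator happy.
    return tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=128
    )

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="toxicity-xlmr",
        num_train_epochs=3,
        per_device_train_batch_size=32,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```

After training, export to ONNX for the serving path described above and re-check the thresholds: a fine-tuned model shifts the score distribution, so the table values need recalibration.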
Timeline Guidance
Basic integration of a ready-made multilingual model takes 4–6 days. Fine-tuning on your own dataset plus deployment adds another 2–3 weeks. A full system with categorization, a human review queue, and a feedback loop takes 4–6 weeks.