Text Classification Implementation

Text classification is the task of assigning one or more labels to a text: an email's topic, an article's category, a request type, a document's language. Beneath this apparent simplicity lie many technical decisions that fundamentally affect quality.

Problem Statement and Approach Selection

Before choosing architecture, define task parameters:

  • Number of classes: 2–5 (binary/simple multiclass) vs 20–100+ (hierarchical)
  • Annotation volume: do you have 500+ examples per class?
  • Language: English, Russian, multilingual
  • Latency requirements: real-time (<100ms) vs batch
  • Interpretability: do you need to explain each decision?

These parameters determine the stack. A common mistake is jumping straight to BERT when logistic regression solves the task in 50 ms.

Hierarchy of Approaches

Level 1 — Classic ML (TF-IDF / BOW + Logistic Regression / SVM / LightGBM):

  • When sufficient: clear topics, lots of annotation, need interpretability, latency < 10ms
  • scikit-learn Pipeline: TfidfVectorizer → LogisticRegression
  • Accuracy on typical tasks: 85–92%
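
The Level 1 pipeline can be sketched with scikit-learn. The toy corpus and label names below are illustrative stand-ins for a real annotated dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy corpus; in practice, load your annotated dataset
texts = [
    "refund my order",
    "card was charged twice",
    "how do I reset my password",
    "login link does not work",
]
labels = ["billing", "billing", "account", "account"]

clf = Pipeline([
    # Word unigrams and bigrams; tune min_df / max_features on real data
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)
print(clf.predict(["card was charged twice again"]))
```

The whole pipeline is serialized as one object (e.g. with joblib), so vectorizer and model cannot drift apart between training and serving.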

Level 2 — FastText:

  • When: need quick training on large volume, multilingual task
  • Training 100K examples: < 30 seconds
  • Inference: ~1ms per text
  • Quality close to BERT for pure topic classifiers

Level 3 — Transformer Fine-tuning:

  • BERT / RoBERTa / DeBERTa for English
  • ruBERT / ruRoBERTa for Russian
  • When: complex classes, little annotation (few-shot), need high precision
  • Training: 2–10 epochs on GPU, 15–60 minutes for typical dataset

Level 4 — LLM with Prompting:

  • Zero-shot / few-shot via GPT-4o-mini or Claude
  • When: no annotation, need quick start, or classes are descriptive
  • Drawbacks: latency 500ms–2s, cost at scale
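
A Level 4 classifier is mostly prompt construction. A minimal sketch, assuming the official openai package and an illustrative label set (the `LABELS` list and helper names are made up for the example):

```python
LABELS = ["billing", "technical", "account", "other"]

def build_prompt(text: str) -> str:
    # Constrain the model to the closed label set; few-shot examples
    # can be prepended for harder classes
    return (
        "Classify the support request into exactly one of these categories: "
        + ", ".join(LABELS)
        + ".\nAnswer with the category name only.\n\nRequest: "
        + text
    )

def classify(text: str, model: str = "gpt-4o-mini") -> str:
    from openai import OpenAI  # assumes the official openai package is installed
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(text)}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()
```

Validate that the returned string is actually in `LABELS` and fall back to a default class otherwise; models occasionally answer outside the set.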

BERT Fine-tuning with PyTorch

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer

model_name = "DeepPavlov/rubert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=len(label2id),  # label2id: your {label: index} mapping
    label2id=label2id,
    id2label={i: l for l, i in label2id.items()},
)

def tokenize(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=512,
        padding="max_length",
    )

training_args = TrainingArguments(
    output_dir="./classifier",
    num_train_epochs=5,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",        # must match evaluation_strategy
    load_best_model_at_end=True,  # requires matching save/eval strategies
)

# train_ds / val_ds: datasets.Dataset objects with "text" and "label" columns
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds.map(tokenize, batched=True),
    eval_dataset=val_ds.map(tokenize, batched=True),
)
trainer.train()

Handling Imbalanced Classes

Real data is almost always imbalanced. Strategies:

  • Class weights: compute_class_weight('balanced', ...) — passed to loss function
  • Oversampling: SMOTE for embeddings or text augmentation (paraphrase)
  • Undersampling: only if majority class is truly excessive
  • Focal Loss: for extreme imbalance (1:100+)

Monitor per-class F1, not just accuracy: 95% accuracy on a dataset where the rare class is 5% of examples means nothing.
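
The class-weights strategy takes a couple of lines with scikit-learn (the 19:1 synthetic labels below are illustrative):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 950 + [1] * 50)  # 19:1 imbalance
classes = np.unique(y)

# 'balanced' weight = n_samples / (n_classes * class_count),
# so the rare class gets a proportionally larger weight
weights = compute_class_weight("balanced", classes=classes, y=y)
class_weight = dict(zip(classes, weights))

# sklearn: LogisticRegression(class_weight=class_weight)
# PyTorch: nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float))
```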

Multiclass vs Multilabel Classification

For multilabel (text can have multiple labels simultaneously):

  • Replace softmax with sigmoid in final layer
  • Use BCEWithLogitsLoss instead of CrossEntropyLoss
  • Classification threshold tuned separately per class (maximize F1)
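
Per-class threshold tuning can be sketched as a grid search over a validation set; `tune_thresholds` is an illustrative helper, with sigmoid probabilities assumed as input:

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(probs: np.ndarray, y_true: np.ndarray) -> np.ndarray:
    """Pick a per-class threshold that maximizes F1 on a validation set.

    probs:  (n_samples, n_classes) sigmoid outputs
    y_true: (n_samples, n_classes) binary label matrix
    """
    n_classes = probs.shape[1]
    thresholds = np.zeros(n_classes)
    grid = np.arange(0.05, 0.95, 0.05)
    for c in range(n_classes):
        scores = [
            f1_score(y_true[:, c], probs[:, c] >= t, zero_division=0)
            for t in grid
        ]
        thresholds[c] = grid[int(np.argmax(scores))]
    return thresholds
```

At inference, predict `probs >= thresholds` instead of a fixed 0.5, which typically recovers several F1 points on rare labels.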

Classifier Deployment

Inference Optimization:

  • ONNX export: 2–4x CPU inference speedup
  • Quantization (INT8): 4x memory reduction, accuracy degradation < 1%
  • TorchScript: for production PyTorch serving

Serving:

# ONNX Runtime via Hugging Face Optimum
pip install optimum[onnxruntime]
# Export (Python)
from optimum.onnxruntime import ORTModelForSequenceClassification
model = ORTModelForSequenceClassification.from_pretrained("./classifier", export=True)
model.save_pretrained("./classifier-onnx")

ONNX+INT8 latency on CPU: 20–50ms for 512-token text.

Metrics and Monitoring

  • F1 Macro — main metric for imbalanced tasks
  • Confusion matrix — mandatory in initial assessment
  • Calibration curve — if you need reliable probabilities

In production: monitor distribution shift via the KL divergence between the current and historical predicted-class distributions. When the metric leaves its historical corridor, retrain the model.
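
The drift check itself is a few lines of numpy; the distributions below are illustrative numbers, and the alert threshold should be calibrated on your own historical windows:

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-9) -> float:
    """KL(p || q) between two predicted-class distributions."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Historical vs current predicted-class shares (illustrative)
baseline = np.array([0.70, 0.20, 0.10])
current = np.array([0.40, 0.35, 0.25])

drift = kl_divergence(current, baseline)
# Alert when drift exceeds the corridor observed across historical windows
```

KL divergence is asymmetric; keep the argument order fixed (current vs. baseline) so the metric is comparable across monitoring windows.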

Implementation Timeline

  • Baseline (TF-IDF + ML): 3–5 days (including annotation)
  • BERT fine-tuning: 1–2 weeks
  • Production with monitoring: 3–5 weeks