Machine Translation Implementation
Machine translation has evolved from statistical models (Moses) through neural sequence-to-sequence models with attention to modern transformers. Today, high-quality off-the-shelf models exist for most language pairs, so the task largely reduces to choosing the right model and integrating it.
Translation Model Selection
Hosted APIs (best quality, simplest integration):
- Google Cloud Translation API: 500K characters/month free, >100 languages, $20/1M characters
- DeepL API: often rated above Google for European languages, $5.99/month for 500K characters
- OpenAI GPT-4o: best for context-sensitive translation (marketing, literature)
Open-source models (privacy, on-premise, no API costs):
- MarianMT (Helsinki-NLP): compact per-pair models for 1,000+ language pairs, available on Hugging Face
- NLLB-200 (Meta): 200 languages including rare ones, quality near Google for many pairs
- SeamlessM4T (Meta): multimodal—text and speech, 100+ languages
- Opus-MT: large collection of trained MarianMT models
```python
from transformers import MarianMTModel, MarianTokenizer

# Pretrained Russian->English model from the Helsinki-NLP Opus-MT collection
model_name = "Helsinki-NLP/opus-mt-ru-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def translate(texts: list[str]) -> list[str]:
    # Tokenize the batch; inputs beyond the model's 512-token limit are truncated
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    translated = model.generate(**inputs)
    return tokenizer.batch_decode(translated, skip_special_tokens=True)
```
Specialized Translation
Off-the-shelf models struggle with domain terminology. Strategies:
Terminology dictionaries: post-process the translation by substituting approved terms. Use the sacremoses library for detokenization, then apply regex replacement.
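A minimal sketch of the regex-replacement step. The glossary below is hypothetical; in practice it would come from your approved terminology list:

```python
import re

# Hypothetical glossary: model output term -> approved domain term
GLOSSARY = {
    "heart attack": "myocardial infarction",
    "blood thinner": "anticoagulant",
}

def apply_glossary(text: str, glossary: dict[str, str]) -> str:
    """Replace each glossary key with the approved term,
    matching whole words case-insensitively."""
    for src, tgt in glossary.items():
        pattern = r"\b" + re.escape(src) + r"\b"
        text = re.sub(pattern, tgt, text, flags=re.IGNORECASE)
    return text
```

Word boundaries (`\b`) prevent partial-word replacements; for multi-word terms, apply longer keys first if one glossary entry is a substring of another.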
Fine-tuning on domain data: 10K–100K parallel sentences from your field. MarianMT trains on a single GPU in hours; quality typically improves by 3–8 BLEU on specialized texts.
Prompt engineering for LLMs: GPT-4o with an instruction such as "translate medical texts, preserve Latin terms" requires no fine-tuning.
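One way to make such instructions reusable is a small prompt-builder. The function name and the exact wording of the rules are illustrative, not a tested prompt:

```python
def build_translation_prompt(text: str, domain: str, rules: list[str]) -> str:
    """Assemble an instruction for an LLM translator from a domain
    label and a list of constraints (hypothetical helper)."""
    instructions = "\n".join(f"- {r}" for r in rules)
    return (
        f"Translate the following {domain} text from Russian to English.\n"
        f"Rules:\n{instructions}\n\n"
        f"Text:\n{text}"
    )

prompt = build_translation_prompt(
    "Пациент получил ацетилсалициловую кислоту.",
    domain="medical",
    rules=["Preserve Latin terms as-is", "Keep dosages and units unchanged"],
)
```

The resulting string would then be sent as the user message to the chat-completions endpoint of your LLM provider.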
Quality Post-Processing
Automatic translation evaluation:
- BLEU: the standard metric, but correlates with quality only on large test sets
- COMET: neural metric that correlates better with human ratings (model Unbabel/wmt22-comet-da)
- chrF: works well for morphologically rich languages (Russian)
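To make the BLEU metric concrete, here is a minimal pure-Python sketch of corpus BLEU (modified n-gram precision with a brevity penalty), without the smoothing that production toolkits such as sacrebleu apply:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-1 scale: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            # clipped counts: each hypothesis n-gram matches at most
            # as many times as it occurs in the reference
            matches[n - 1] += sum((ngrams(h, n) & ngrams(r, n)).values())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if 0 in matches:
        return 0.0  # unsmoothed BLEU is zero if any precision is zero
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # brevity penalty punishes hypotheses shorter than the reference
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)
```

Reported BLEU scores (like the ~35–44 values in the table below) are this quantity multiplied by 100.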
In production, A/B test two models on real users: engagement, time on page, explicit ratings.
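For the A/B split, a hash-based bucketing function (a common pattern, sketched here with hypothetical variant names) keeps each user on the same model across sessions without storing assignments:

```python
import hashlib

def assign_variant(user_id: str, variants=("model_a", "model_b")) -> str:
    """Deterministically bucket a user into a translation-model variant.
    Hashing the user id makes the split stable and roughly uniform."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```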
Long Text Processing
MarianMT is limited to 512 tokens. For long documents:
- Split into sentences: `nltk.sent_tokenize` or `spacy`
- Translate sentence by sentence
- Reassemble, preserving formatting
For GPT-4o: chunk by paragraphs with overlap (last sentence of previous chunk)—preserves context for coherent transitions.
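The paragraph-chunking-with-overlap step can be sketched as follows; the `max_chars` budget and the naive `". "` sentence split are simplifying assumptions (a real pipeline would use a sentence tokenizer):

```python
def chunk_paragraphs(paragraphs: list[str], max_chars: int = 2000) -> list[str]:
    """Group paragraphs into chunks of up to max_chars, prepending the
    last sentence of the previous chunk so the LLM keeps local context."""
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            # carry over the previous chunk's final sentence as overlap
            overlap = current.rstrip().rsplit(". ", 1)[-1]
            current = overlap + "\n\n" + para
        else:
            current = (current + "\n\n" + para) if current else para
    if current:
        chunks.append(current)
    return chunks
```

After translation, the overlap sentence is dropped from each chunk before reassembly so it is not duplicated in the output.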
Performance
| Model | Speed (CPU) | Speed (GPU) | Quality (ru→en) |
|---|---|---|---|
| MarianMT | 50–100 words/sec | 500–1000 words/sec | BLEU ~35 |
| NLLB-200 | 20–50 words/sec | 200–500 words/sec | BLEU ~38 |
| GPT-4o-mini API | — | ~500 words/sec | BLEU ~42 |
| DeepL API | — | ~2000 words/sec | BLEU ~44 |
For on-premise deployment with a GPU budget, NLLB-200 on an A10G yields good quality with full data control.