Named Entity Recognition (NER) Implementation

NER (Named Entity Recognition) is the task of recognizing and classifying entities mentioned in text: persons, organizations, locations, dates, monetary amounts, and products. It is a fundamental component of most text-processing systems.

Standard Entity Types and Extensions

Base types (CoNLL-2003 standard): PER (persons), ORG (organizations), LOC (locations), MISC (miscellaneous).

For business applications, the standard set is usually insufficient. Typical extensions:

  • Finance: MONEY, PERCENT, DATE, TICKER, FINANCIAL_INSTRUMENT
  • Medicine: DISEASE, DRUG, DOSAGE, PROCEDURE, ANATOMY
  • Law: LAW, COURT, CASE_NUMBER, LEGAL_ENTITY
  • Logistics: ADDRESS, POSTAL_CODE, VEHICLE_ID, CARGO
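Each added type expands the model's tag set: in the standard IOB2 encoding, every entity type T contributes a B-T (begin) and I-T (inside) label on top of O (outside). A minimal sketch of building such a scheme (the helper name and the combined type list are illustrative):

```python
# Build an IOB2 label set from entity types; each type T yields B-T and I-T.
BASE_TYPES = ["PER", "ORG", "LOC", "MISC"]  # CoNLL-2003 base set
FINANCE_TYPES = ["MONEY", "PERCENT", "DATE", "TICKER", "FINANCIAL_INSTRUMENT"]

def bio_labels(entity_types):
    labels = ["O"]  # "outside any entity"
    for t in entity_types:
        labels += [f"B-{t}", f"I-{t}"]
    return labels

finance_labels = bio_labels(BASE_TYPES + FINANCE_TYPES)
# 9 types -> 19 labels (O plus 2 per type)
```

The size of this label set directly determines the classifier head dimension when fine-tuning.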

Tools for Russian NER

natasha is typically the best choice for basic Russian-language tasks:

from natasha import Segmenter, NewsEmbedding, NewsNERTagger, Doc

segmenter = Segmenter()
emb = NewsEmbedding()
ner_tagger = NewsNERTagger(emb)

doc = Doc("Gazprom signed a contract with German company Wintershall in Berlin.")
doc.segment(segmenter)
doc.tag_ner(ner_tagger)
print([(span.text, span.type) for span in doc.spans])
# [('Gazprom', 'ORG'), ('Wintershall', 'ORG'), ('Berlin', 'LOC')]

spaCy with the Russian model (ru_core_news_lg): a good speed/quality balance and straightforward integration into production pipelines.

BERT-based (DeepPavlov, Hugging Face): DeepPavlov/rubert-base-cased-ner for high quality on complex texts.
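Whatever model you choose, token-level BIO tags have to be aggregated into entity spans before they are useful (Hugging Face pipelines do this via aggregation_strategy). A hand-rolled sketch of that decoding step, assuming whitespace tokens (the function name is illustrative):

```python
def decode_bio(tokens, tags):
    """Collapse IOB2 token tags into (entity_text, type) pairs."""
    entities, current, cur_type = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any entity in progress.
            if current:
                entities.append((" ".join(current), cur_type))
            current, cur_type = [tok], tag[2:]
        elif tag.startswith("I-") and cur_type == tag[2:]:
            # Continuation of the current entity.
            current.append(tok)
        else:
            # O tag (or inconsistent I-) ends the current entity.
            if current:
                entities.append((" ".join(current), cur_type))
            current, cur_type = [], None
    if current:
        entities.append((" ".join(current), cur_type))
    return entities
```

With subword tokenizers (BERT), the same logic applies but spans are reassembled from character offsets rather than by joining tokens.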

Fine-tuning for Custom Entities

For custom entity types, you need your own annotated corpus and fine-tuning:

  1. Annotation: Prodigy, Label Studio, or Doccano. Minimum 200–500 examples per entity type
  2. Format: IOB2 (BIO-tagging)—NER standard
  3. Training: HuggingFace TokenClassification with pretrained RuBERT
from transformers import AutoModelForTokenClassification, TrainingArguments
model = AutoModelForTokenClassification.from_pretrained(
    "DeepPavlov/rubert-base-cased",
    num_labels=len(label_list),
    id2label=id2label,
    label2id=label2id
)

NER Quality Assessment

Entity-level F1 (strict) is the main metric. "Strict" means both the entity type AND the span boundaries must be correct; a partial match counts as an error.
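Strict matching reduces to set intersection over (type, start, end) triples. A simplified sketch (production evaluations typically use the seqeval library instead):

```python
def strict_f1(gold, pred):
    """Entity-level strict F1: entities are (type, start, end) tuples;
    a prediction counts only if type AND boundaries match exactly."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # exact matches only
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Note how unforgiving this is: an entity whose boundary is off by one character contributes a false positive and a false negative at once.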

Typical scores on Russian text:

  • PER: F1 95–97% (easily recognizable patterns)
  • ORG: F1 88–93% (many abbreviations, acronyms)
  • LOC: F1 90–95%
  • Custom domain entities: 80–90% after fine-tuning on 1K+ examples

Complex Cases

  • Nested entities: "Ministry of Finance of Russia" is an ORG containing a LOC. Most standard models don't support nesting; it requires specialized architectures (SpanBERT, biaffine NER)
  • Scattered entities: "LLC... (hereinafter, the Company)": resolving such coreference requires a separate module
  • Ambiguity: "Apple" as a company or a fruit? Resolved via context, which transformer models handle well

Deployment

  • spaCy: export to .spacy format, serving via FastAPI
  • BERT: ONNX export for CPU, TorchServe for GPU
  • Latency: spaCy ~5 ms/sentence on CPU; BERT with ONNX ~30 ms/sentence on CPU