Supported Document Types

We support Russian passports, international passports, driver's licenses, and EU and CIS ID cards. On request, we can add any document type; we need 200–500 samples from you to do so.

What technology stack do you use?

We use Python for MRZ parsing, fine-tuned PaddleOCR for OCR, and OpenCV with convolutional neural networks for forgery detection. Deployment is via Docker or Kubernetes.

How is forgery protection ensured?

We employ Error Level Analysis to detect photo substitutions, and a neural network trained on a dataset of 10,000 forged documents. Additionally, we verify MRZ validity using check digits.

What is the recognition accuracy for Russian passports?

According to the MIDV-2020 benchmark, series/number accuracy is 99.3%, date of birth 99.1%, full name 97.8%. MRZ is read with 99.8% accuracy.

How long does implementation take?

A basic version with MRZ and main fields takes 2–4 weeks. A full suite with forgery detection and support for 10+ document types takes 10–16 weeks.

AI Passport Data Extraction with 99.8% Accuracy

We design and deploy artificial intelligence systems: from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering and MLOps to make AI work not in the lab, but in real business.

AI Passport Data Extraction and Identity Document Recognition

One major bank was losing 2% of clients during KYC because of errors in manually entered passport data. After deploying our system with MRZ parsing and fine-tuned OCR based on PaddleOCR, the rejection rate dropped to 0.02%. Operational verification costs fell by 60%, a saving of over $50,000 annually per 100,000 verifications.

In this article, we dive into the technical details, from parsing the MRZ per ICAO 9303 to forgery detection with Error Level Analysis. Over 5 years in computer vision, our team (10+ years of combined experience) has completed more than 50 document recognition projects, from Russian passports to ID cards of 15 countries. Our solutions are deployed in 20+ financial institutions worldwide.

AI Passport Data Extraction Solves KYC Issues

KYC is not just document checks — it's a delicate process. Manual entry errors lead to account blocks and client churn. Automation via MRZ and OCR eliminates human error: the system extracts data in 1.2 seconds with 99.8% accuracy on MRZ. Forgery detection further filters fraudulent attempts. The result: verification speed increases 5x, and operator costs drop.

How MRZ Parsing Works

The Machine Readable Zone (MRZ) is the block of fixed-width text at the bottom of the document's data page, specified in ICAO 9303. It is a reliable entry point: it contains the key identity fields, and its check digits make them mathematically verifiable. Two layouts matter here: TD1 (ID cards, 3 lines × 30 characters) and TD3 (passports, 2 lines × 44 characters). The code below shows TD3 parsing.

Code: MRZ Parser in Python

import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class MRZData:
    document_type: str
    issuing_country: str
    surname: str
    given_names: str
    document_number: str
    nationality: str
    date_of_birth: str      # YYMMDD
    sex: str
    expiry_date: str        # YYMMDD
    personal_number: str
    check_digits_valid: bool

class MRZParser:
    """
    MRZ parser for TD3 (passports, 2 lines × 44 characters).
    TD1 (ID cards, 3 lines × 30 characters) uses the same check-digit
    scheme with different field offsets and is not shown in this excerpt.
    """

    WEIGHTS = [7, 3, 1]
    # ICAO 9303 character values: digits keep their value, A-Z map to 10-35,
    # and the filler '<' counts as 0.
    VALUES = {c: i for i, c in enumerate('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')}
    VALUES['<'] = 0

    def _check_digit(self, s: str) -> int:
        """ICAO 9303 check digit: weighted sum (7, 3, 1 repeating) modulo 10"""
        total = sum(
            self.VALUES.get(c, 0) * self.WEIGHTS[i % 3]
            for i, c in enumerate(s)
        )
        return total % 10

    def _matches(self, field: str, check_char: str) -> bool:
        """Compare a field with its check character; OCR noise fails, never raises"""
        return '0' <= check_char <= '9' and self._check_digit(field) == int(check_char)

    def parse_td3(self, line1: str, line2: str) -> Optional[MRZData]:
        """TD3: passport, 2 lines of 44 characters each"""
        if len(line1) != 44 or len(line2) != 44:
            return None

        # Line 1
        doc_type   = line1[0:2].replace('<', '')
        country    = line1[2:5]
        name_field = line1[5:44]
        if '<<' in name_field:
            surname_raw, given_raw = name_field.split('<<', 1)
        else:
            surname_raw, given_raw = name_field, ''

        # Line 2 (check digits sit at indices 9, 19, 27, 42 and 43)
        doc_num     = line2[0:9].replace('<', '')
        nationality = line2[10:13]
        dob         = line2[13:19]
        sex         = line2[20]
        expiry      = line2[21:27]
        personal    = line2[28:42].replace('<', '')

        # The personal-number check digit may be '<' when the field is unused
        personal_ok = (
            (line2[42] == '<' and not personal)
            or self._matches(line2[28:42], line2[42])
        )

        # Verify check digits
        valid = all([
            self._matches(line2[0:9], line2[9]),
            self._matches(line2[13:19], line2[19]),
            self._matches(line2[21:27], line2[27]),
            personal_ok,
            self._matches(line2[0:10] + line2[13:20] + line2[21:43], line2[43])
        ])

        return MRZData(
            document_type=doc_type,
            issuing_country=country,
            surname=surname_raw.replace('<', ' ').strip(),
            given_names=given_raw.replace('<', ' ').strip(),
            document_number=doc_num,
            nationality=nationality,
            date_of_birth=dob,
            sex=sex,
            expiry_date=expiry,
            personal_number=personal,
            check_digits_valid=valid
        )

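Check digits let the parser detect most misread characters: an OCR error in a protected field usually changes the expected digit, so the record fails validation. MRZ extraction accuracy is 99.8% on the MIDV-2020 benchmark.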
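As a worked illustration of the check-digit arithmetic, the sketch below recomputes the check digits of the specimen passport line published in ICAO Doc 9303. The function name `icao_check_digit` is ours and is independent of the parser class above; it is a minimal standalone sketch, not the production code.

```python
def icao_check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10   # A=10 ... Z=35
        elif ch == "<":
            value = 0                          # filler counts as zero
        else:
            raise ValueError(f"character not allowed in MRZ: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


# Second MRZ line of the ICAO 9303 specimen passport (TD3)
line2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"

print(icao_check_digit(line2[0:9]))     # document number  -> 6
print(icao_check_digit(line2[13:19]))   # date of birth    -> 2
print(icao_check_digit(line2[21:27]))   # expiry date      -> 9
print(icao_check_digit(line2[28:42]))   # personal number  -> 1
# Composite check covers positions 1-10, 14-20 and 22-43
print(icao_check_digit(line2[0:10] + line2[13:20] + line2[21:43]))  # -> 0
```

Each printed value equals the digit that follows the corresponding field in the line, which is what a valid MRZ looks like. Note that the composite check only comes out right when the filler `<` is counted as 0.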

How We Process the Visual Zone (VIZ)

Beyond MRZ, the visual zone must be read: registered address, place of birth. In Russian passports, this data is absent from MRZ. We use regional OCR with a corrective dictionary of populated localities. Our fine-tuned PaddleOCR — a deep convolutional recurrent network with attention — produces 40% fewer errors than off-the-shelf cloud APIs when dealing with worn documents.

Code: Visual Zone Extraction

from paddleocr import PaddleOCR
from rapidfuzz import process, fuzz
import json

class PassportVIZExtractor:
    def __init__(self, region_dict_path: str):
        self.ocr = PaddleOCR(
            use_angle_cls=True, lang='ru',
            det_model_dir='models/det/',
            rec_model_dir='models/rec/'   # fine-tuned on Russian passports
        )
        # The reference list contains Cyrillic text, so fix the encoding explicitly
        with open(region_dict_path, encoding='utf-8') as f:
            self.regions = json.load(f)   # list of Russian regions/cities

    def extract_fields(self, page_image) -> dict:
        result = self.ocr.ocr(page_image, cls=True)
        if not result or not result[0]:
            return {}

        # Group lines by vertical position
        lines = sorted(
            [(r[0][0][1], r[1][0]) for r in result[0]],
            key=lambda x: x[0]
        )

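        fields = {}
        for y_pos, text in lines:
            if 'место рождения' in text.lower():
                # Remember where the "Place of birth" label was found
                fields['birth_place_label_y'] = y_pos
            elif ('birth_place_label_y' in fields
                  and 'birth_place_raw' not in fields
                  and abs(y_pos - fields['birth_place_label_y']) < 50):
                # First text line within 50 px below the label is taken as the value
                fields['birth_place_raw'] = text
                # Normalize via fuzzy matching against the reference list;
                # extractOne returns None when the list is empty
                best = process.extractOne(
                    text, self.regions, scorer=fuzz.token_sort_ratio
                )
                if best is not None and best[1] > 70:
                    fields['birth_place_normalized'] = best[0]
                else:
                    fields['birth_place_normalized'] = text

        return fields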

How Forgery Detection Works

For basic tampering detection, we use Error Level Analysis (ELA). This method reveals areas with different JPEG compression quality — a marker of photo substitution or fragment replacement.

Code: Basic Tampering Detection

import numpy as np
import cv2

def detect_basic_tampering(image: np.ndarray) -> dict:
    """
    Basic tampering indicator based on Error Level Analysis (ELA).

    Regions pasted from another image usually carry a different JPEG
    compression history, so they respond differently to recompression.
    Expects a BGR uint8 image, as returned by cv2.imread.
    Sharpness and DPI consistency checks are not part of this function.
    """
    # Recompress in memory at a fixed quality (no temporary file needed)
    ok, encoded = cv2.imencode('.jpg', image, [cv2.IMWRITE_JPEG_QUALITY, 90])
    if not ok:
        raise ValueError('JPEG encoding failed')
    recompressed = cv2.imdecode(encoded, cv2.IMREAD_COLOR)

    # Error level: per-pixel difference between original and recompressed image
    ela = cv2.absdiff(image, recompressed)
    ela_gray = cv2.cvtColor(ela, cv2.COLOR_BGR2GRAY)

    # Pixels more than 3 standard deviations above the mean error level
    high_ela_mask = ela_gray > ela_gray.mean() + 3 * ela_gray.std()
    tamper_ratio  = float(high_ela_mask.mean())

    return {
        'ela_anomaly_ratio': tamper_ratio,
        'suspicious':        tamper_ratio > 0.05,  # >5% pixels anomalous
        'ela_map':           ela_gray
    }

For deeper detection, we use a neural network, a hybrid of convolutional and transformer layers, trained on a dataset of 10,000 real forgeries. Its accuracy exceeds 95%. The network is called in when ELA is inconclusive, and it checks both macro-level and micro-level features.

Performance Comparison with Alternatives

Off-the-shelf cloud APIs add network latency and often require retries. Our pipeline runs locally, with a p99 latency of 1.2 seconds per document; for comparison, cloud OCR takes 3–5 seconds on average. Operator time savings reach 90%, and verification costs fall by over $50,000 per year per 100,000 verifications.

Request a pilot project and we'll integrate our system into your KYC process within 2 weeks. Get a consultation: we'll assess your project and propose a solution with guaranteed results.

Comparison with Alternatives

Our fine-tuned PaddleOCR produces 40% fewer errors than off-the-shelf cloud APIs when processing worn documents. For MRZ parsing, accuracy is 99.8% — better than most open-source solutions (95–97%).

Implementation Steps

Implementation typically involves the following steps:

System integration via REST API.
Fine-tuning OCR models on your documents (if needed).
Testing and validation on a sample set.
Deployment in your environment (Docker/Kubernetes).

The entire process takes 2–16 weeks depending on scope.

What's Included in the Work

When you order a turnkey system, we provide:

API documentation (OpenAPI 3.0) with request examples
Operator training (2 days online)
3 months of technical support
Accuracy guarantee: at least 99% for critical fields
Deployment on your infrastructure (Docker, Kubernetes)

Accuracy on MIDV-2020 Benchmark

Field	Extraction Accuracy	Method
MRZ (all fields)	99.8%	MRZ OCR + check digits
Series/Number (Russian passport)	99.3%	PaddleOCR fine-tuned
Date of Birth	99.1%	MRZ + VIZ cross-check
Full Name	97.8%	VIZ + BERT NER
Registration Address	94.2%	VIZ + FIAS reference
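
The "MRZ + VIZ cross-check" listed for the date of birth can be pictured with a small sketch. It assumes the MRZ date is in YYMMDD form (as in the parser's `date_of_birth` field) and that the visual zone prints the date as DD.MM.YYYY; the function name `dob_matches` is hypothetical.

```python
from datetime import datetime


def dob_matches(mrz_dob: str, viz_dob: str) -> bool:
    """Compare an MRZ date of birth (YYMMDD) with a VIZ date (DD.MM.YYYY)."""
    try:
        viz_date = datetime.strptime(viz_dob.strip(), "%d.%m.%Y")
    except ValueError:
        # The OCR text is not a valid calendar date: treat it as a mismatch
        return False
    # The VIZ carries the full year, so the MRZ century never has to be guessed
    return viz_date.strftime("%y%m%d") == mrz_dob


print(dob_matches("740812", "12.08.1974"))  # True
print(dob_matches("740812", "12.08.1975"))  # False
```

A mismatch does not say which zone was misread; in this sketch it only marks the field for review.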

Timelines

Task	Timeline
MRZ + basic fields (Russian/EU passports)	2–4 weeks
Multi-document system (10+ types)	6–9 weeks
System with forgery detection and liveness	10–16 weeks

Contact us for details — we'll help you choose the optimal solution for your budget. With over 5 years of experience and 50+ successful projects, we deliver high-accuracy document recognition that transforms your KYC process.

How Distribution Shift Kills CV Model Metrics in Industry

A camera is installed on a production line to control product quality. The model is trained on 10,000 labeled images and reaches mAP 0.84 on the test set. Once deployed, it misses 30% of defects in the first week. Lighting on the line changes between shifts, and this distribution shift wipes out the test metrics. It is a classic story in industrial computer vision: recognition fails without proper drift handling.

Our engineers, with experience from 60+ computer vision projects, know how to eliminate such scenarios. We guarantee stable model performance under real conditions.

Object Detection: YOLO, RT-DETR, and Everything in Between

YOLO is the standard for real-time detection. YOLOv8 and YOLO11 from Ultralytics are the most used versions in production: simple API, active community, built-in validation, and export to ONNX/TensorRT. For tasks where accuracy matters more than latency, RT-DETR, a transformer-based architecture that needs no NMS, gives better mAP on COCO at a speed comparable to YOLOv8l.

Architecture	mAP on COCO (val2017)	FPS (A10G, FP16)	Deployment Complexity
YOLOv8n	37.3	700+	Low (ONNX/TensorRT)
YOLOv8m	50.2	250	Low
RT-DETR-L	53.0	140	Medium (requires PyTorch)
Mask R-CNN	38.2 (bbox)	30	High

A typical mistake when training a detector: a dataset of 8,000 images with 3 classes, YOLOv8m fine-tuned to F1 0.73 on validation. The confusion matrix shows that one class is almost never detected. The cause is a 1:23 class imbalance. The solution: oversample the rare class, use focal loss for the classification branch (YOLOv8's anchor-free head has no separate objectness output), and adjust augmentations (Mosaic and MixUp, disabled for the rare class because they "blur" it). Transfer learning is mandatory: starting from COCO-pretrained weights reduces the data requirement roughly tenfold. Fine-tuning on 500–2000 domain images yields a working model in 1–2 days on a single GPU.
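
To make the focal loss idea concrete, here is a minimal per-prediction sketch in plain Python. The defaults `gamma=2.0` and `alpha=0.25` are the values commonly quoted from the original focal loss paper and are assumptions here, not settings taken from the project described above.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one prediction.

    p is the predicted probability of the positive class, y is the label (0 or 1).
    """
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)              # avoid log(0)
    p_t = p if y == 1 else 1.0 - p               # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class weighting term
    # (1 - p_t) ** gamma shrinks the loss of confident, correct predictions
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)   # confident and correct
hard = binary_focal_loss(0.1, 1)   # confident and wrong
print(easy < hard)                 # True
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check when wiring it into a training loop.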

For edge deployment: export to ONNX → TensorRT engine. YOLOv8n in TensorRT FP16 on Jetson AGX Orin gives 150+ FPS at P99 latency < 8 ms—3 times faster than ONNX Runtime without TensorRT. On server A10G: 700+ FPS for YOLOv8n in TensorRT INT8.

How Does Fine-Tuning YOLO Help in Pattern Recognition?

Suppose you need to find micro-defects on a metal surface, a task with high resolution and class imbalance. We take YOLOv8m pretrained on COCO and fine-tune it on 2,000 proprietary images with Mosaic, MixUp, and random perspective augmentations. After 200 epochs, mAP@0.5 reaches 0.93. Key techniques:

Focal loss for the classification branch: reduces the contribution of easily classified examples.
Class-balanced sampling: equalizes representation of rare classes.
Test Time Augmentation (TTA): increases recall by 5–7% through averaging over flips and scales.
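
Class-balanced sampling from the list above can be sketched with the standard library alone. The example assumes one class label per training sample, which is a simplification for detection datasets where an image may contain several classes. The class names and `balanced_sample_weights` are illustrative.

```python
import random
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each sample by the inverse frequency of its class."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# 23:1 imbalance, as in the example earlier in the article
labels = ["scratch"] * 23 + ["crack"] * 1
weights = balanced_sample_weights(labels)

rng = random.Random(0)                       # fixed seed for repeatability
batch = rng.choices(labels, weights=weights, k=1000)
print(Counter(batch))                        # both classes appear at a similar rate
```

Each class ends up with the same total weight, so the sampler draws the rare class about as often as the frequent one. The price is that rare samples are repeated many times, which is why this is usually combined with augmentation.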

Get a consultation on architecture selection for your task—contact us.

Segmentation: SAM, Mask R-CNN, and Instance Segmentation

SAM (Segment Anything Model) from Meta changed the approach to segmentation. SAM 2 works with video, supports object tracking across frames—for interactive object selection by point or bbox, it's the best out-of-the-box choice. For production instance segmentation without interactive prompting, Mask R-CNN or YOLOv8-seg are used. YOLOv8-seg trains like a regular detector with additional masks, convenient in the same pipelines. Semantic segmentation (each pixel is a class) uses SegFormer, DeepLabV3+. SegFormer-B5 provides a good balance of accuracy and speed for satellite imagery or medical segmentation.

Case study: cell segmentation on microscopic images, with a dataset of 400 manually annotated images. Mask R-CNN with a ResNet-50 backbone gave IoU 0.61, which is poor. The problem: cells overlap, and standard NMS discards overlapping predictions. The solution: switch to Cellpose (a specialized architecture for biomedical segmentation) plus soft-NMS. IoU increased to 0.79.
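
The difference between standard NMS and soft-NMS is easiest to see in code. The sketch below implements the Gaussian variant on plain box tuples; `sigma=0.5` and the score cut-off are commonly used defaults and are assumptions here, not the values from the case study.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian soft-NMS. Returns (box_index, rescored_confidence) in selection order."""
    remaining = [(score, index) for index, score in enumerate(scores)]
    kept = []
    while remaining:
        best = max(remaining)                # highest current score
        remaining.remove(best)
        best_score, best_index = best
        if best_score < score_threshold:
            break
        kept.append((best_index, best_score))
        # Decay the scores of overlapping boxes instead of deleting them
        decayed = []
        for score, index in remaining:
            overlap = iou(boxes[best_index], boxes[index])
            new_score = score * math.exp(-(overlap ** 2) / sigma)
            if new_score >= score_threshold:
                decayed.append((new_score, index))
        remaining = decayed
    return kept


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
for index, score in soft_nms(boxes, scores):
    print(index, round(score, 3))
```

In this toy input the second box overlaps the first with IoU of about 0.82. Hard NMS with a typical 0.5 threshold would delete it, while soft-NMS keeps it with a reduced score, which is what preserves touching or overlapping cells.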

OCR: When Tesseract Fails

Tesseract is a starting point for simple tasks: printed text, good lighting, straight layout. As soon as there are handwritten elements, non-standard fonts, perspective distortions, or multi-column layouts, Tesseract degrades quickly.

PaddleOCR is a production-grade solution: text block detection + recognition + structural analysis. Works out of the box for 80+ languages, including Russian. Supports tables and complex document structures. TrOCR (Microsoft) is a transformer OCR with strong results on handwritten text. For Russian handwritten text, fine-tuning is needed: the base model is trained mostly on Latin script.

What to Do When Tesseract Cannot Handle Pattern Recognition on Documents?

For tasks like "extract data from invoices/contracts/passports," we use LayoutLMv3 or Donut—these models understand document layout, not just text. Integration via Hugging Face Transformers, fine-tuning on 200–500 annotated documents. Typical pipeline:

Preprocessing: deskew, denoising, binarization via OpenCV.
Text block detection: PaddleOCR detection or CRAFT.
Recognition: PaddleOCR recognition or TrOCR.
Post-processing: normalization, validation via regex or LLM for structured fields.
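
The regex validation in the post-processing step might look like the sketch below. The field names and patterns are assumptions for a Russian internal passport (4-digit series plus 6-digit number, DD.MM.YYYY dates, NNN-NNN department code). Adapt them to the documents you actually process.

```python
import re
from datetime import datetime

# Hypothetical field names; patterns describe the expected printed format
PATTERNS = {
    "series_number": re.compile(r"\d{2} ?\d{2} ?\d{6}"),
    "issue_date": re.compile(r"\d{2}\.\d{2}\.\d{4}"),
    "department_code": re.compile(r"\d{3}-\d{3}"),
}


def validate_fields(fields: dict) -> dict:
    """Return {field_name: True/False} for every known field."""
    results = {}
    for name, pattern in PATTERNS.items():
        value = fields.get(name, "").strip()
        ok = pattern.fullmatch(value) is not None
        if ok and name == "issue_date":
            # A string can match the pattern and still not be a real date
            try:
                datetime.strptime(value, "%d.%m.%Y")
            except ValueError:
                ok = False
        results[name] = ok
    return results


print(validate_fields({
    "series_number": "45 10 123456",
    "issue_date": "31.02.2015",      # 31 February does not exist
    "department_code": "770-001",
}))
```

Fields that fail validation are natural candidates for a second OCR pass or for manual review, rather than being silently accepted.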

For documents with fixed structure, template matching + OCR by coordinates is often more reliable than an end-to-end solution.

Face Recognition: Identification and Verification

Face recognition = detection + alignment + embedding + matching. Each stage matters.

Detection: RetinaFace or InsightFace for accurate face localization and keypoints. MTCNN is older but reliable.

Embedding: ArcFace (InsightFace) is state-of-the-art for face recognition embeddings. The iresnet50/iresnet100 models are pretrained on MS1MV3 (about 5M images). The embedding is a vector of 512 float32 values, compared by cosine similarity.

Threshold tuning: the decision threshold is a critical parameter. At threshold 0.6, typical FPR on the LFW benchmark is 0.001 and TPR is 0.985. In production, the threshold must be calibrated to the real distribution: people in masks, with changed appearance, in different lighting conditions.

Liveness detection is mandatory: MiniFASNet is a lightweight model that runs on CPU; FaceX-Zoo contains several pretrained liveness detectors.
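
The matching step (cosine similarity against a decision threshold) is sketched below in plain Python. It assumes the 0.6 threshold applies to cosine similarity rather than to a distance, and it uses tiny toy vectors in place of real 512-dimensional embeddings. `same_person` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def same_person(emb_a, emb_b, threshold=0.6):
    """Verification decision: accept when similarity reaches the threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold


reference = [0.2, 0.9, 0.1, 0.4]
probe_close = [0.25, 0.85, 0.05, 0.45]   # small perturbation of the reference
probe_far = [0.9, -0.1, 0.4, -0.2]

print(same_person(reference, probe_close))  # True
print(same_person(reference, probe_far))    # False
```

Raising the threshold lowers the false positive rate and the true positive rate together, which is why it has to be calibrated on data that resembles production traffic.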

Video Analytics

Video is a sequence of frames plus a temporal dimension. A naive approach—detecting on every frame—is expensive.

Tracking: ByteTrack and BoT-SORT are the standard for multi-object tracking. They work on top of any detector, adding persistent IDs to objects across frames—enabling object counting, motion tracking, velocity.

Optimization: not every frame needs processing. For static scenes, run detection every 5–10 frames and rely on tracking in between. For event detection (a person entering a zone), background subtraction (OpenCV MOG2) serves as a lightweight pre-filter before neural detection.

Action recognition: SlowFast and VideoMAE for action classification. These are heavy models; in production, use ONNX export with TensorRT, or offline processing.
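
The frame-skipping idea from the optimization paragraph can be sketched as a small scheduling loop. The `detect` and `track` callables are hypothetical stand-ins for a real detector and tracker. The convention that `track` receives `None` on skipped frames and must rely on its own motion prediction is an assumption of this sketch, not the API of ByteTrack or BoT-SORT.

```python
from typing import Any, Callable, Iterable, List, Optional


def process_stream(
    frames: Iterable[Any],
    detect: Callable[[Any], List[Any]],
    track: Callable[[Any, Optional[List[Any]]], List[Any]],
    detect_every: int = 5,
) -> List[List[Any]]:
    """Run the detector on every N-th frame and the tracker on every frame."""
    if detect_every < 1:
        raise ValueError("detect_every must be >= 1")
    tracks_per_frame = []
    for index, frame in enumerate(frames):
        # Expensive neural detection only on frames 0, N, 2N, ...
        detections = detect(frame) if index % detect_every == 0 else None
        tracks_per_frame.append(track(frame, detections))
    return tracks_per_frame


# Stub components that only record how often they are called
detector_calls = []

def fake_detect(frame):
    detector_calls.append(frame)
    return [("object", frame)]

def fake_track(frame, detections):
    return detections if detections is not None else []

process_stream(range(12), fake_detect, fake_track, detect_every=5)
print(detector_calls)   # [0, 5, 10]
```

With `detect_every=5` the detector runs on 3 of the 12 frames. How far this can be pushed depends on how quickly objects move relative to the frame rate.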

How to Measure Pattern Recognition Model Quality in Production?

Quality monitoring is key to MLOps. We track:

Prediction confidence distribution.
Share of low-confidence predictions (indicator of OOD data).
Drift of input images via feature distribution (embeddings from backbone).

A drop in average confidence from 0.87 to 0.71 over a week is an early signal of distribution shift. NVIDIA Triton Inference Server exposes its serving metrics in Prometheus format, so these quality metrics can be tracked in the same Prometheus setup. Our certified engineers set up monitoring and guarantee SLA for inference quality.
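
The first two monitored signals can be computed with a rolling window, as in the sketch below. The class name, window size, permitted drop of 0.10, and low-confidence cut-off of 0.5 are illustrative assumptions. Exporting the values to Prometheus is left out.

```python
from collections import deque


class ConfidenceMonitor:
    """Rolling statistics over the most recent prediction confidences."""

    def __init__(self, baseline_mean: float, window: int = 1000,
                 max_drop: float = 0.10, low_conf_threshold: float = 0.5):
        self.baseline_mean = baseline_mean      # mean confidence at release time
        self.max_drop = max_drop                # tolerated drop before alerting
        self.low_conf_threshold = low_conf_threshold
        self.scores = deque(maxlen=window)      # old values fall out automatically

    def update(self, confidence: float) -> None:
        self.scores.append(confidence)

    def report(self) -> dict:
        if not self.scores:
            return {"mean_confidence": None, "low_confidence_share": None,
                    "drift_alert": False}
        n = len(self.scores)
        mean_conf = sum(self.scores) / n
        low_share = sum(1 for s in self.scores if s < self.low_conf_threshold) / n
        return {
            "mean_confidence": mean_conf,
            "low_confidence_share": low_share,
            "drift_alert": (self.baseline_mean - mean_conf) > self.max_drop,
        }


monitor = ConfidenceMonitor(baseline_mean=0.87, window=100)
for _ in range(100):
    monitor.update(0.71)
print(monitor.report()["drift_alert"])   # True: mean fell by 0.16
```

The third signal, drift of backbone embeddings, needs a distance between feature distributions and is not covered by this sketch.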

Deployment of CV Models

For online inference, we use Triton Inference Server (NVIDIA)—production standard for serving CV models. Supports TensorRT, ONNX, PyTorch, dynamic batching, multiple instances. REST and gRPC API. We guarantee stable operation under load.

Edge deployment: ONNX Runtime on ARM/x86 CPU. TensorFlow Lite for mobile devices. OpenVINO for Intel CPU/GPU/VPU—gives 2–3× speedup on Intel hardware compared to ONNX Runtime. After deployment, we hand over the model with documentation and train personnel.

What Is Included in the Work

Stage	Content	Estimated Time
Analysis	Technical specification, architecture selection, data evaluation	3–5 days
Labeling	Image collection, annotation (up to 5000 objects)	1–3 weeks
Training	Model fine-tuning, validation on test set	1–2 weeks
Optimization	Export to ONNX/TensorRT/OpenVINO, testing on target hardware	1–2 weeks
Integration	REST/gRPC API, integration with existing infrastructure	1–2 weeks
Deployment	Deployment on server or edge device, load testing	1 week
Documentation and training	Instructions, staff training, handover of code and model	3–5 days
Support	Technical support for 3 months after launch	—

Deadlines and Cost

A prototype detector on existing data takes 1–2 weeks. Production system with optimization for target hardware takes 4–8 weeks. Full cycle including data labeling (1000–5000 images) takes 2–4 months. Cost is calculated individually for each task. Typical savings from implementing a quality control system can be significant per production line.

We have been in the market for over 5 years and completed 60+ computer vision projects. We will evaluate your project end-to-end—request a consultation to get a quote and technical proposal.