Azure Computer Vision OCR Integration

We design and deploy artificial intelligence systems, from prototype to production-ready solution. Our team combines expertise in machine learning, data engineering, and MLOps so that AI works in real business settings, not just in the lab.
Complexity: Simple
Timeline: from 1 to 3 business days

Integrating Azure Computer Vision OCR

Azure Computer Vision provides two OCR services: the Read API (optimized for text-dense documents and recommended by Microsoft) and the legacy OCR API (suitable only for simple images). The Read API runs both in the cloud and as a container for on-premises deployment; the SDK code and container image below use version 3.2.

Read API Integration

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
import time

class AzureOCR:
    def __init__(self, endpoint: str, api_key: str):
        self.client = ComputerVisionClient(
            endpoint,
            CognitiveServicesCredentials(api_key)
        )

    def extract_text_from_url(self, image_url: str) -> str:
        """Read API: asynchronous processing of an image by URL"""
        # Submit the URL; raw=True exposes the response headers
        read_response = self.client.read(image_url, raw=True)

        # Get the operation ID from the Operation-Location header
        operation_location = read_response.headers['Operation-Location']
        operation_id = operation_location.split('/')[-1]

        # Poll until the operation reaches a final status
        while True:
            read_result = self.client.get_read_result(operation_id)
            if read_result.status not in [
                OperationStatusCodes.running,
                OperationStatusCodes.not_started
            ]:
                break
            time.sleep(0.5)

        # Collect the recognized text line by line
        text_lines = []
        if read_result.status == OperationStatusCodes.succeeded:
            for page in read_result.analyze_result.read_results:
                for line in page.lines:
                    text_lines.append(line.text)

        return '\n'.join(text_lines)

    def extract_with_positions(self, image_path: str) -> list[dict]:
        """Extract words with their bounding-box coordinates"""
        with open(image_path, 'rb') as f:
            read_response = self.client.read_in_stream(f, raw=True)

        operation_id = read_response.headers['Operation-Location'].split('/')[-1]

        while True:
            result = self.client.get_read_result(operation_id)
            if result.status not in [OperationStatusCodes.running,
                                       OperationStatusCodes.not_started]:
                break
            time.sleep(0.3)

        words = []
        if result.status == OperationStatusCodes.succeeded:
            for page in result.analyze_result.read_results:
                for line in page.lines:
                    for word in line.words:
                        words.append({
                            'text': word.text,
                            'confidence': word.confidence,
                            'bbox': word.bounding_box  # 8 numbers: (x, y) of 4 corners, clockwise from top-left
                        })
        return words
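
The polling loops above run until the service reports a final status, with no upper bound on the wait, and the bounding box is returned as eight numbers rather than a rectangle. The sketch below shows one way to handle both. The helper names `poll_until` and `bbox_to_rect` are hypothetical and not part of the Azure SDK, and the default timeout and interval are assumptions to tune for your workload.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], T],
               is_done: Callable[[T], bool],
               timeout: float = 30.0,
               interval: float = 0.5) -> T:
    """Call fetch() until is_done(result) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation did not finish within {timeout} s")
        time.sleep(interval)


def bbox_to_rect(bbox: Sequence[float]) -> tuple[float, float, float, float]:
    """Convert an 8-number quadrilateral into (left, top, right, bottom)."""
    xs, ys = bbox[0::2], bbox[1::2]  # even indices are x, odd indices are y
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical usage with the AzureOCR class above:
# result = poll_until(
#     lambda: ocr.client.get_read_result(operation_id),
#     lambda r: r.status not in (OperationStatusCodes.running,
#                                OperationStatusCodes.not_started),
# )
```

The axis-aligned rectangle is a simplification: for rotated text it is the smallest upright box enclosing the quadrilateral, not the quadrilateral itself.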

Document Intelligence (formerly Form Recognizer)

For structured documents (invoices, contracts, IDs), Azure Document Intelligence is considerably more capable than basic OCR:

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

class AzureDocumentIntelligence:
    def __init__(self, endpoint: str, api_key: str):
        self.client = DocumentAnalysisClient(
            endpoint=endpoint,
            credential=AzureKeyCredential(api_key)
        )

    @staticmethod
    def _value(fields: dict, name: str):
        """Return the parsed value of a DocumentField, or None if it is absent"""
        field = fields.get(name)
        return field.value if field is not None else None

    def analyze_invoice(self, image_path: str) -> dict:
        """Specialized invoice analysis with the prebuilt-invoice model"""
        with open(image_path, 'rb') as f:
            poller = self.client.begin_analyze_document(
                'prebuilt-invoice', f
            )
            result = poller.result()

        invoices = []

        for invoice in result.documents:
            fields = invoice.fields
            invoices.append({
                'vendor_name': self._value(fields, 'VendorName'),
                'invoice_date': str(self._value(fields, 'InvoiceDate')),
                'total_amount': self._value(fields, 'AmountDue'),
                'invoice_id': self._value(fields, 'InvoiceId'),
                'line_items': [
                    {
                        # Each item is a DocumentField whose value is a dict of fields
                        'description': self._value(item.value, 'Description'),
                        'amount': self._value(item.value, 'Amount')
                    }
                    for item in (self._value(fields, 'Items') or [])
                ]
            })

        # Return the first invoice found, or an empty dict if none was detected
        return invoices[0] if invoices else {}
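
Beyond prebuilt invoice fields, Document Intelligence also returns table structure. The sketch below assumes the `prebuilt-layout` model and the `row_count`, `column_count`, and `cells` attributes of the tables in its result. The function names `table_to_rows` and `extract_tables` are hypothetical, and the demo table is built from stand-in objects rather than a real service response.

```python
from types import SimpleNamespace


def table_to_rows(table) -> list[list[str]]:
    """Arrange a table's cells into a row-major grid of strings."""
    grid = [["" for _ in range(table.column_count)]
            for _ in range(table.row_count)]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


def extract_tables(client, image_path: str) -> list[list[list[str]]]:
    """Analyze a document with prebuilt-layout and return every table as a grid.

    `client` is assumed to be a DocumentAnalysisClient like the one above.
    """
    with open(image_path, "rb") as f:
        result = client.begin_analyze_document("prebuilt-layout", f).result()
    return [table_to_rows(table) for table in result.tables]


# Stand-in table with the same attributes, used here instead of a real response
demo_table = SimpleNamespace(
    row_count=2,
    column_count=2,
    cells=[
        SimpleNamespace(row_index=0, column_index=0, content="Item"),
        SimpleNamespace(row_index=0, column_index=1, content="Amount"),
        SimpleNamespace(row_index=1, column_index=0, content="Hosting"),
        SimpleNamespace(row_index=1, column_index=1, content="120.00"),
    ],
)
demo_rows = table_to_rows(demo_table)
```

Merged cells are not expanded in this sketch: a cell that spans several rows or columns fills only its top-left position.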

On-premises container

For data that must be processed locally, use the Read API container:

docker run --rm -it -p 5000:5000 \
  -e Eula=accept \
  -e ApiKey=YOUR_KEY \
  -e Billing=YOUR_ENDPOINT \
  mcr.microsoft.com/azure-cognitive-services/vision/read:3.2

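The container exposes the same API as the cloud service, but document data never leaves your infrastructure. It still connects to Azure to report usage for billing, which is why the Billing endpoint and ApiKey are required.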
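
A minimal sketch of calling the running container over REST with only the standard library. It assumes the container listens on `http://localhost:5000` as in the command above and exposes the v3.2 asynchronous Read routes (`/vision/v3.2/read/analyze` and `/vision/v3.2/read/operations/{id}`). The function names and the 60-second timeout are hypothetical, so check the routes against the container's own API documentation before relying on them.

```python
import json
import time
import urllib.request

# Assumption: the container started above is reachable at this address.
CONTAINER_URL = "http://localhost:5000"


def analyze_url(base_url: str) -> str:
    """URL that accepts an image and starts an asynchronous Read operation."""
    return base_url.rstrip("/") + "/vision/v3.2/read/analyze"


def operation_url(base_url: str, operation_location: str) -> str:
    """Rebuild the polling URL from the ID at the end of Operation-Location."""
    operation_id = operation_location.rstrip("/").split("/")[-1]
    return f"{base_url.rstrip('/')}/vision/v3.2/read/operations/{operation_id}"


def extract_lines(body: dict) -> list[str]:
    """Collect line texts from a succeeded Read result payload."""
    pages = body.get("analyzeResult", {}).get("readResults", [])
    return [line["text"] for page in pages for line in page.get("lines", [])]


def read_with_container(image_path: str, base_url: str = CONTAINER_URL,
                        timeout: float = 60.0) -> list[str]:
    """Send a local image to the container and wait for the recognized lines."""
    with open(image_path, "rb") as f:
        request = urllib.request.Request(
            analyze_url(base_url),
            data=f.read(),
            headers={"Content-Type": "application/octet-stream"},
            method="POST",
        )
    with urllib.request.urlopen(request) as response:
        poll_url = operation_url(base_url, response.headers["Operation-Location"])

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(poll_url) as response:
            body = json.load(response)
        if body["status"] == "succeeded":
            return extract_lines(body)
        if body["status"] == "failed":
            raise RuntimeError("Read operation failed")
        time.sleep(0.5)
    raise TimeoutError(f"no result within {timeout} s")
```

Calling `read_with_container('scan.jpg')` would return the recognized lines as a list of strings; the file name is only an illustration.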

Read API vs Document Intelligence

Capability                         | Read API | Document Intelligence
OCR for arbitrary text             | Yes      | Yes
Table structure                    | No       | Yes
Specialized models (invoice, ID)   | No       | Yes
Custom models                      | No       | Yes
Price per 1,000 pages              | $1.50    | $10–50
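
To make the price difference concrete, here is a small worked estimate using only the per-1,000-page figures from the comparison above. The monthly volume of 50,000 pages is an assumed example, and actual Azure pricing varies by tier and region.

```python
# Per-1,000-page prices in USD, taken from the comparison above
READ_PRICE = 1.50
DOC_INTELLIGENCE_PRICE = (10.0, 50.0)  # low and high end of the quoted range


def estimate_cost(pages: int, price_per_1000: float) -> float:
    """Cost in USD of processing `pages` pages at a per-1,000-page price."""
    return pages / 1000 * price_per_1000


pages = 50_000  # assumed monthly volume, for illustration only
read_cost = estimate_cost(pages, READ_PRICE)
di_low, di_high = (estimate_cost(pages, p) for p in DOC_INTELLIGENCE_PRICE)

print(f"Read API: ${read_cost:,.2f}")
print(f"Document Intelligence: ${di_low:,.2f} to ${di_high:,.2f}")
```

At this volume the Read API comes to $75, against $500 to $2,500 for Document Intelligence, so it can pay to send only documents that need field or table extraction to the more expensive service.
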

Task                                          | Timeline
Basic Read API integration                    | 3–5 days
Document Intelligence with field extraction   | 1–2 weeks
On-premises container + PDF processing        | 1–2 weeks