Mobile Application Development for Business Card Scanner
The task looks simple: photograph a business card and get a contact in the address book. In practice, between the shot and a correctly filled CNContact lies a chain in which everything that can break does break: poor lighting, non-standard fonts, bilingual cards, vertical text orientation on Japanese cards.
Text Recognition: Vision vs ML Kit vs Cloud
On iOS the first choice is the Vision framework with VNRecognizeTextRequest. Since iOS 16 recognition accuracy has improved noticeably; it supports 18 languages and works completely offline. That is sufficient for most tasks.
```swift
import Vision

// Recognize text on the captured card image
let request = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    // Keep only the single best candidate for each detected line
    let strings = observations.compactMap { $0.topCandidates(1).first?.string }
    self.parseBusinessCard(lines: strings)
}
request.recognitionLevel = .accurate   // slower, but noticeably better on small print
request.usesLanguageCorrection = true
request.recognitionLanguages = ["ru-RU", "en-US"]

let handler = VNImageRequestHandler(cgImage: image, options: [:])
try? handler.perform([request])
```
On Android the equivalent is ML Kit Text Recognition v2. It supports Latin, Cyrillic, Chinese, Japanese, and Korean scripts out of the box, with no extra model downloads. One important note: the TextRecognizer must be closed via close() after use, otherwise native resources leak.
When maximum accuracy or exotic font support is needed, integrate the Google Cloud Vision API or AWS Textract. The cloud options return structured output divided into blocks, lines, and words, each with bounding boxes.
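As a sketch of the cloud path: the images:annotate endpoint and the DOCUMENT_TEXT_DETECTION feature type below are Google Cloud Vision's documented public API, but the helper names, the hard-coded language hints, and the API-key query parameter usage are illustrative assumptions, not a definitive client.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Builds the JSON body for a Cloud Vision images:annotate call.
// DOCUMENT_TEXT_DETECTION returns the full block/paragraph/word hierarchy
// with bounding boxes, which is what makes field parsing easier downstream.
func makeVisionRequestBody(imageData: Data) -> [String: Any] {
    [
        "requests": [
            [
                "features": [["type": "DOCUMENT_TEXT_DETECTION"]],
                "image": ["content": imageData.base64EncodedString()],
                // Hinting expected languages helps on bilingual cards (assumed hint set)
                "imageContext": ["languageHints": ["ru", "en"]]
            ]
        ]
    ]
}

// Wraps the body in a POST request; authentication via API key in the
// query string is one of the documented options.
func makeVisionRequest(imageData: Data, apiKey: String) -> URLRequest {
    let url = URL(string: "https://vision.googleapis.com/v1/images:annotate?key=\(apiKey)")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try? JSONSerialization.data(
        withJSONObject: makeVisionRequestBody(imageData: imageData))
    return request
}
```

The response can then be decoded into the same line array the on-device path produces, so the parsing stage stays identical for both backends.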
Parsing Recognized Text to Contact Fields
OCR yields an array of lines. Converting it into {name: "Ivanov Ivan", phone: "+7 999 123-45-67", email: "[email protected]", title: "CTO"} is a separate task.
Regular expressions cover phones and emails reliably. Names and titles are harder. A good approach is NER (Named Entity Recognition) via a Core ML model or lightweight on-device NLP. Apple's NaturalLanguage framework with NLTagger for token-type detection (personalName, organizationName) works well for English and Russian.
A typical problem: name and title sit next to each other without explicit separators. Context matters here: if a line contains a dictionary word for job titles (CEO, director, manager), it is most likely the title.
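The regex-plus-keyword stage described above can be sketched as follows. The patterns, the keyword list, and the "first unclassified line is the name" heuristic are deliberately minimal illustrations, not production rules:

```swift
import Foundation

struct ParsedCard {
    var name: String? = nil
    var phone: String? = nil
    var email: String? = nil
    var title: String? = nil
}

// Minimal rule-based parser over OCR lines: regexes for email/phone,
// a keyword dictionary for job titles, and a name fallback heuristic.
func parseBusinessCard(lines: [String]) -> ParsedCard {
    var card = ParsedCard()
    let emailRegex = try! NSRegularExpression(
        pattern: #"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"#)
    let phoneRegex = try! NSRegularExpression(
        pattern: #"\+?[0-9][0-9 ()\-]{7,}[0-9]"#)
    let titleKeywords = ["ceo", "cto", "director", "manager", "engineer", "head of"]

    for line in lines {
        let range = NSRange(line.startIndex..., in: line)
        if card.email == nil, let m = emailRegex.firstMatch(in: line, range: range),
           let r = Range(m.range, in: line) {
            card.email = String(line[r])
        } else if card.phone == nil, let m = phoneRegex.firstMatch(in: line, range: range),
                  let r = Range(m.range, in: line) {
            card.phone = String(line[r])
        } else if card.title == nil,
                  titleKeywords.contains(where: { line.lowercased().contains($0) }) {
            card.title = line
        } else if card.name == nil {
            card.name = line  // first line that matched nothing else
        }
    }
    return card
}
```

In a real app this stage would run after NLTagger has already claimed the personalName tokens, so the fallback heuristic fires far less often.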
For bilingual cards (common in CIS B2B: Russian on one side, English on the other) you need to detect the language of each line via NLLanguageRecognizer or ML Kit's Language Identification and apply the corresponding parsing rules.
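To show the per-line routing idea without pulling in a framework, here is a simplified script-based classifier for the Russian/English case. In production NLLanguageRecognizer (iOS) or ML Kit Language Identification (Android) should do this job; the Unicode-range check below is a stand-in that only works for this specific language pair:

```swift
import Foundation

enum CardLanguage { case russian, english }

// Classify a line by counting Cyrillic vs Latin letters. This is a
// simplification: real language identification handles mixed scripts,
// transliteration, and dozens of languages.
func dominantLanguage(of line: String) -> CardLanguage {
    var cyrillic = 0
    var latin = 0
    for scalar in line.unicodeScalars {
        switch scalar.value {
        case 0x0400...0x04FF: cyrillic += 1          // Cyrillic block
        case 0x41...0x5A, 0x61...0x7A: latin += 1    // A-Z, a-z
        default: break
        }
    }
    return cyrillic > latin ? .russian : .english
}
```

Each line's result then selects the parsing rule set: Russian title keywords and name order on one side, English on the other.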
Image Capture Quality
Final OCR accuracy depends directly on shot quality. Several things really matter:

- Perspective correction: the card is held at an angle and needs straightening. On iOS, CIPerspectiveCorrection plus VNDetectRectanglesRequest for finding the card borders; on Android, OpenCV or the ML Kit ObjectDetector.
- Contrast enhancement: CIColorControls with increased contrast and reduced saturation helps with gray text on a white background.
- Automatic capture: detect the card in the frame via VNDetectRectanglesRequest and shoot automatically when the card occupies more than 60% of the frame and stays stable for 0.5 seconds. A manual "press the button" degrades quality due to hand shake.
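The auto-capture rule above can be isolated as a small framework-free state machine, fed once per camera frame with the coverage ratio computed from the VNDetectRectanglesRequest result. The 60%/0.5 s thresholds come from the text; reducing "stable" to coverage alone (ignoring position jitter) is a simplification:

```swift
import Foundation

// Fires the shutter only after the detected card has covered more than
// minCoverage of the frame continuously for holdTime seconds.
struct AutoCaptureTrigger {
    let minCoverage: Double
    let holdTime: TimeInterval
    private var stableSince: TimeInterval? = nil

    init(minCoverage: Double = 0.6, holdTime: TimeInterval = 0.5) {
        self.minCoverage = minCoverage
        self.holdTime = holdTime
    }

    // Called once per frame; returns true exactly when the shot should fire.
    mutating func update(coverage: Double, at time: TimeInterval) -> Bool {
        guard coverage > minCoverage else {
            stableSince = nil      // card lost or too small: reset the timer
            return false
        }
        if let start = stableSince {
            if time - start >= holdTime {
                stableSince = nil  // fire once, then re-arm for the next card
                return true
            }
            return false
        }
        stableSince = time         // card just became large enough: start timing
        return false
    }
}
```

The camera delegate would call update(coverage:at:) with each frame's timestamp and trigger the photo capture when it returns true.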
Implementation Process
Audit: target card languages, whether offline mode is required, CRM integration or just the device contacts.
Implementation: capture with auto-detect → perspective correction → OCR → field parsing → manual edit before saving (mandatory: OCR makes mistakes).
Testing: a set of 100+ real cards of varying quality and formats.
Timeline Estimates
A scanner with Vision/ML Kit, basic parsing, and contact saving takes 2–3 weeks. With cloud OCR, NER, multi-language support, and CRM integration: 5–7 weeks.







