AI Car Make and Model Recognition in Mobile Applications
Recognizing a car's make and model from a photo is a well-studied task. Models trained on the Stanford Cars dataset (196 classes) or CompCars reach 90%+ accuracy on clean side shots. The production complexity comes from odd angles, partial visibility (front or rear only), night conditions, and niche-market vehicles.
Ready APIs and Limitations
| Service | Model Count | Notes |
|---|---|---|
| CarAPI / CarQuery | 10,000+ | Good for classification, weak on old/rare cars |
| AutoVIN API | Broad base | VIN decoding from a photo |
| Imagga | Custom tags | Requires automotive fine-tuning |
| Google Cloud AutoML Vision | Custom | Requires your own labeled data |
For most projects, the better option is a custom CoreML/TFLite model based on EfficientNetV2, fine-tuned on a combined dataset (Stanford Cars + VMMRdb). Model size is 25–40 MB; Top-1 accuracy on popular models is 88–93%.
iOS Implementation with CoreML
import CoreML
import UIKit
import Vision

enum CarError: Error { case invalidImage }

struct CarPrediction {
    let makeModel: String
    let confidence: Float
}

extension UIImage {
    // Vision needs a CGImagePropertyOrientation; UIImage.Orientation
    // does not convert automatically, so map it by hand.
    var cgImageOrientation: CGImagePropertyOrientation {
        switch imageOrientation {
        case .up: return .up
        case .upMirrored: return .upMirrored
        case .down: return .down
        case .downMirrored: return .downMirrored
        case .left: return .left
        case .leftMirrored: return .leftMirrored
        case .right: return .right
        case .rightMirrored: return .rightMirrored
        @unknown default: return .up
        }
    }
}

final class CarRecognitionService {
    private lazy var model: VNCoreMLModel = {
        let config = MLModelConfiguration()
        config.computeUnits = .cpuAndNeuralEngine
        // A bundled model that fails to load is a packaging error,
        // so force-unwrapping here is a deliberate crash-early choice.
        let mlModel = try! CarClassifierV3(configuration: config).model
        return try! VNCoreMLModel(for: mlModel)
    }()

    func recognize(image: UIImage) async throws -> [CarPrediction] {
        guard let cgImage = image.cgImage else { throw CarError.invalidImage }
        return try await withCheckedThrowingContinuation { continuation in
            let request = VNCoreMLRequest(model: model) { request, error in
                if let error = error {
                    continuation.resume(throwing: error)
                    return
                }
                let results = (request.results as? [VNClassificationObservation]) ?? []
                let predictions = results
                    .filter { $0.confidence > 0.05 }
                    .prefix(5)
                    .map { CarPrediction(
                        makeModel: $0.identifier, // e.g. "Toyota Camry 2022"
                        confidence: $0.confidence
                    )}
                continuation.resume(returning: Array(predictions))
            }
            // Image orientation normalization is critical; otherwise accuracy drops
            request.imageCropAndScaleOption = .centerCrop
            let handler = VNImageRequestHandler(cgImage: cgImage,
                                                orientation: image.cgImageOrientation)
            do {
                try handler.perform([request])
            } catch {
                // Without this, a failed perform() would leak the continuation
                continuation.resume(throwing: error)
            }
        }
    }
}
The imageCropAndScaleOption = .centerCrop setting isn't obvious. If Vision's crop/scale behavior doesn't match the preprocessing the model saw during training, accuracy drops by 5–8%, so set it explicitly to match the training pipeline rather than relying on the default.
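One way to verify what the crop/scale step must produce is to read the expected input size off the compiled model's description. A small sketch (CarClassifierV3 is the generated model class used above; treat the printed format as diagnostic output only):

```swift
import CoreML

// Prints the image input constraints of a compiled CoreML model,
// e.g. "image: 384x384". Useful for confirming that Vision's
// crop/scale output matches what the model saw during training.
func printModelInputConstraints(_ model: MLModel) {
    for (name, desc) in model.modelDescription.inputDescriptionsByName {
        if let constraint = desc.imageConstraint {
            print("\(name): \(constraint.pixelsWide)x\(constraint.pixelsHigh)")
        }
    }
}
```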
Multi-Angle Classification
For high-accuracy scenarios (insurance apps, dealerships), a single photo is insufficient. Request three angles:
enum CarPhotoAngle: CaseIterable {
    case frontThreeQuarter // 3/4 front: optimal for make/model
    case rear              // for rear verification
    case side              // strict side: for body style and generation

    var instruction: String {
        switch self {
        case .frontThreeQuarter: return "Photograph the car from the front at 45°"
        case .rear: return "Photograph from the rear"
        case .side: return "Photograph strictly from the side"
        }
    }
}
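A guided capture flow can then walk the angles in order. A minimal sketch, assuming a hypothetical `capture` closure that hooks into your camera UI and returns the user's photo for each prompt:

```swift
import UIKit

// Walks all angles, surfaces each instruction, and collects
// per-angle prediction lists for later aggregation.
func captureAllAngles(
    service: CarRecognitionService,
    capture: (CarPhotoAngle) async -> UIImage?
) async throws -> [[CarPrediction]] {
    var perAngle: [[CarPrediction]] = []
    for angle in CarPhotoAngle.allCases {
        print(angle.instruction) // in a real app: update the camera overlay
        guard let photo = await capture(angle) else { continue }
        perAngle.append(try await service.recognize(image: photo))
    }
    return perAngle
}
```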
// Aggregate results from three photos: weighted voting by makeModel
func aggregatePredictions(_ predictions: [[CarPrediction]]) -> CarPrediction? {
    let weights: [Float] = [0.5, 0.3, 0.2] // frontThreeQuarter most important
    var scores: [String: Float] = [:]
    for (i, list) in predictions.enumerated() {
        for p in list { scores[p.makeModel, default: 0] += p.confidence * weights[min(i, 2)] }
    }
    guard let best = scores.max(by: { $0.value < $1.value }) else { return nil }
    return CarPrediction(makeModel: best.key, confidence: best.value)
}
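To make the arithmetic concrete, here is a self-contained worked example of the voting scheme with a simplified struct and hypothetical confidences:

```swift
struct Prediction {
    let makeModel: String
    let confidence: Float
}

// Weighted voting: sum confidence * angle weight per label, take the max.
func vote(_ perAngle: [[Prediction]], weights: [Float]) -> String? {
    var scores: [String: Float] = [:]
    for (i, list) in perAngle.enumerated() {
        let w = weights[min(i, weights.count - 1)]
        for p in list { scores[p.makeModel, default: 0] += p.confidence * w }
    }
    return scores.max(by: { $0.value < $1.value })?.key
}

let front = [Prediction(makeModel: "Toyota Camry", confidence: 0.7),
             Prediction(makeModel: "Kia K5", confidence: 0.2)]
let rear  = [Prediction(makeModel: "Kia K5", confidence: 0.6)]
let side  = [Prediction(makeModel: "Toyota Camry", confidence: 0.5)]
// Camry: 0.7*0.5 + 0.5*0.2 = 0.45; K5: 0.2*0.5 + 0.6*0.3 = 0.28
print(vote([front, rear, side], weights: [0.5, 0.3, 0.2]) ?? "none")
```

Even though the rear photo votes for a different model, the front 3/4 view's higher weight keeps the correct answer on top.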
Determining Year and Generation
Determining the year is visually harder than make/model: facelifts change a car's appearance only slightly. Two approaches:
- Generation classifier (separate head in multi-task model)
- Hybrid: VIN via OCR (if visible) + visual generation classification
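A sketch of the OCR side of the hybrid approach, using Vision's built-in text recognizer and filtering candidates against the 17-character VIN alphabet (which excludes I, O, and Q); the 0.5 confidence threshold is an assumption to tune:

```swift
import Vision

// Scans recognized text in an image for a plausible 17-character VIN.
func detectVIN(in cgImage: CGImage, completion: @escaping (String?) -> Void) {
    let request = VNRecognizeTextRequest { request, _ in
        let candidates = (request.results as? [VNRecognizedTextObservation] ?? [])
            .compactMap { $0.topCandidates(1).first }
            .filter { $0.confidence > 0.5 }
            .map { $0.string.replacingOccurrences(of: " ", with: "").uppercased() }
        // VIN: exactly 17 characters from A-Z (minus I, O, Q) and digits
        let vin = candidates.first { $0.range(of: "^[A-HJ-NPR-Z0-9]{17}$",
                                              options: .regularExpression) != nil }
        completion(vin)
    }
    request.recognitionLevel = .accurate
    let handler = VNImageRequestHandler(cgImage: cgImage)
    try? handler.perform([request])
}
```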
The VIN approach is more precise: once OCR reads the VIN from the windshield plate or door-frame sticker, all data (make, model, year, trim) can be decoded without AI via the NHTSA API or paid VIN decoders.
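Decoding a recognized VIN is then a plain network call. A minimal sketch against NHTSA's free vPIC endpoint (the key names such as "Make" and "ModelYear" follow the public JSON schema, but verify the parsing details against the live response):

```swift
import Foundation

struct VINDecodeResponse: Decodable {
    let Results: [[String: String?]]
}

// Decodes a VIN via the NHTSA vPIC API (free, no API key required).
// Returns a flat dictionary of non-empty fields, e.g. "Make", "Model",
// "ModelYear", "Trim".
func decodeVIN(_ vin: String) async throws -> [String: String] {
    let url = URL(string:
        "https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVinValues/\(vin)?format=json")!
    let (data, _) = try await URLSession.shared.data(from: url)
    let decoded = try JSONDecoder().decode(VINDecodeResponse.self, from: data)
    // DecodeVinValues returns a single record with many mostly-empty fields
    let record = decoded.Results.first ?? [:]
    return record.compactMapValues { $0 }.filter { !$0.value.isEmpty }
}
```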
Timeline Estimates
Integrating a ready CoreML model with a results UI takes 3–5 days. A full system with multi-angle capture, the hybrid VIN + visual approach, a car-spec database, and both iOS and Android takes 1–2 weeks.