Model Conversion to TensorFlow Lite Format for Mobile Devices
TensorFlow Lite — the standard format for on-device ML on Android, iOS, and embedded Linux. Supports hardware acceleration via NNAPI (Android), the GPU delegate, the Hexagon DSP delegate, and the Core ML delegate (Apple).
Conversion Pipeline
TF/Keras → TFLite:
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_model = converter.convert()
open("model.tflite", "wb").write(tflite_model)
```
PyTorch → ONNX → TFLite:
PyTorch has no direct converter. The usual path is torch.onnx.export → onnx-tf (ONNX → TensorFlow SavedModel) → TFLite. The double conversion can introduce accuracy and operator-compatibility losses — careful output testing against the original model is mandatory.
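A minimal sketch of the two-stage path, assuming torch, onnx, onnx-tf, and tensorflow are installed; `model`, `example_input`, and the file paths are placeholders:

```python
# Sketch of PyTorch -> ONNX -> TF SavedModel -> TFLite. Imports are deferred
# into each step so the recipe can be read (and the functions defined)
# without the heavy packages present.

def export_onnx(model, example_input, onnx_path="model.onnx"):
    """Step 1: trace the PyTorch model and export it to ONNX."""
    import torch
    model.eval()
    torch.onnx.export(model, example_input, onnx_path, opset_version=13)

def onnx_to_savedmodel(onnx_path="model.onnx", saved_model_dir="saved_model"):
    """Step 2: convert the ONNX graph to a TensorFlow SavedModel via onnx-tf."""
    import onnx
    from onnx_tf.backend import prepare
    prepare(onnx.load(onnx_path)).export_graph(saved_model_dir)

def savedmodel_to_tflite(saved_model_dir="saved_model"):
    """Step 3: convert the SavedModel to a TFLite flatbuffer."""
    import tensorflow as tf
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    return converter.convert()
```

After conversion, run the same inputs through the original PyTorch model and the TFLite interpreter and compare outputs to catch losses introduced by either stage.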
Quantization
Post-Training Quantization:
- Dynamic range: weights quantized to INT8, activations remain float. Minimal quality loss
- Full integer: both weights and activations INT8. Requires representative dataset for calibration. Best performance
- Float16: good for GPU delegate
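Full-integer conversion can be sketched as follows, assuming a Keras `model` and a float32 numpy array `calib_data` of sample inputs for calibration:

```python
# Full-integer post-training quantization sketch; assumes tensorflow is
# installed. Wrapped in a function so it reads as a recipe.

def convert_full_int8(model, calib_data):
    import tensorflow as tf

    def representative_dataset():
        # Yield ~100 calibration samples so the converter can estimate
        # activation ranges.
        for sample in calib_data[:100]:
            yield [sample[None, ...]]  # add batch dimension

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Fail conversion if any op cannot be expressed in INT8:
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()
```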
Quantization-Aware Training (QAT): training with quantization simulation → better quality at INT8
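What INT8 quantization does numerically can be shown with the standard affine scheme (a scale and a zero point); this is a stdlib-only illustration of the arithmetic, not TFLite's implementation (which adds per-channel scales and its own rounding rules):

```python
# Affine INT8 quantization: real ≈ scale * (q - zero_point).

def quant_params(rmin, rmax, qmin=-128, qmax=127):
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # range must include 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    return max(qmin, min(qmax, round(x / scale) + zero_point))

def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)

weights = [-0.91, -0.2, 0.0, 0.37, 1.42]
scale, zp = quant_params(min(weights), max(weights))
q = [quantize(w, scale, zp) for w in weights]
recovered = [dequantize(v, scale, zp) for v in q]
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
# Round-trip error is bounded by half a quantization step (scale / 2),
# which is the "minimal quality loss" the bullet above refers to.
assert max_err <= scale / 2
```

Note that 0.0 maps exactly onto the zero point, so zero-padding and ReLU zeros stay exact after quantization.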
Delegate Selection
| Platform | Delegate | Acceleration (vs CPU) |
|---|---|---|
| Android GPU | GPU Delegate | 3–10x |
| Qualcomm | NNAPI/Hexagon | 5–20x |
| iOS | Core ML Delegate | 5–15x |
| Edge TPU | EdgeTPU Delegate | 100x (INT8 only) |
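Attaching a delegate at inference time follows the same pattern on every platform; a sketch assuming tensorflow is installed and the platform's delegate shared library is present on the device:

```python
# Delegate loading sketch. The library name passed in is platform-specific
# (e.g. "libedgetpu.so.1" for Edge TPU); with no library, the interpreter
# runs on CPU.

def make_interpreter(model_path, delegate_lib=None):
    import tensorflow as tf
    delegates = []
    if delegate_lib:
        delegates.append(tf.lite.experimental.load_delegate(delegate_lib))
    interpreter = tf.lite.Interpreter(model_path=model_path,
                                      experimental_delegates=delegates)
    interpreter.allocate_tensors()
    return interpreter
```

On Android and iOS the GPU and Core ML delegates are usually attached through the platform Interpreter APIs rather than load_delegate; ops a delegate cannot handle fall back to the CPU for that subgraph.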