AI Model Development for Microcontrollers (TinyML)
Developing an ML model for an MCU is primarily an architectural task: the model must be designed around the resource constraints from the start, not compressed as an afterthought once training is done.
Design Under Constraints
Model Footprint Budget: RAM holds the activation buffers at inference time; Flash holds the model weights. Typical budget on an STM32H7 (1 MB RAM, 2 MB Flash): model weights ≤ 300 KB of Flash, peak activations ≤ 100 KB of RAM.
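The budget above can be sanity-checked before training. A minimal sketch, assuming INT8 weights (1 byte/parameter) and a single reusable activation arena whose peak is roughly the largest input+output pair of adjacent layers; the parameter count and per-layer activation sizes are illustrative placeholders, not a real model:

```python
FLASH_BUDGET_KB = 300   # weights budget carved out of the 2 MB Flash
RAM_BUDGET_KB = 100     # peak activation budget carved out of 1 MB RAM

def flash_kb(num_params, bytes_per_weight=1):
    """Weight storage: INT8 quantization -> 1 byte per parameter."""
    return num_params * bytes_per_weight / 1024

def peak_activation_kb(layer_activation_counts, bytes_per_act=1):
    """Runtimes reuse one tensor arena; peak is roughly the largest
    simultaneous input+output pair of adjacent layers."""
    pairs = zip(layer_activation_counts, layer_activation_counts[1:])
    return max(a + b for a, b in pairs) * bytes_per_act / 1024

params = 250_000                          # hypothetical parameter count
acts = [16384, 32768, 16384, 4096, 10]    # hypothetical per-layer activations

print(flash_kb(params) <= FLASH_BUDGET_KB)        # fits the Flash budget?
print(peak_activation_kb(acts) <= RAM_BUDGET_KB)  # fits the RAM budget?
```

Real arena sizes also include scratch buffers and alignment padding, so treat this as a lower bound and leave headroom.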
Architecture Design:
- MobileNetV3-Small: ~2.5 MB in FP32; INT8 quantization brings it down to ~600 KB
- MCUNet: designed specifically for MCUs, fits within 1 MB of Flash
- EfficientNet-Lite0: good balance for vision
- DS-CNN: depthwise separable CNN, classic for audio
- 1D CNN for time series: 50–200 KB for simple tasks
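Most of the architectures above lean on depthwise separable convolutions (the "DS" in DS-CNN): a k×k depthwise pass per input channel followed by a 1×1 pointwise mix. A quick parameter count shows why this matters on an MCU (layer sizes are arbitrary examples):

```python
def standard_conv_params(k, c_in, c_out):
    """Standard convolution: every output channel sees every input channel."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k*k per input channel, then 1x1 pointwise mixing."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)          # 73728 parameters
dws = depthwise_separable_params(k, c_in, c_out)    # 8768 parameters
print(std // dws)                                   # ~8x fewer parameters
```

The saving grows with channel count, which is why the factorization dominates MCU-oriented vision and audio models.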
Neural Architecture Search (NAS) for MCUs: Once-for-All, ProxylessNAS — these search for the best-performing architecture under a specific hardware constraint (Flash, RAM, latency).
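The core loop of constrained NAS is simple to illustrate. A toy sketch, assuming an INT8 MLP whose size is just the product of adjacent layer widths, random search in place of a real search strategy, and a stand-in `proxy_accuracy` where a trained predictor would normally go — none of this is from Once-for-All or ProxylessNAS themselves:

```python
import random

FLASH_LIMIT = 300 * 1024        # assumed weight budget in bytes

def model_size_bytes(widths):
    """INT8 MLP sketch: parameters = sum of products of adjacent widths."""
    dims = [64] + widths + [10]  # fixed input/output dims, hypothetical
    return sum(a * b for a, b in zip(dims, dims[1:]))

def proxy_accuracy(widths):
    """Stand-in for an accuracy predictor: wider scores 'better'."""
    return sum(widths)

random.seed(0)
best = None
for _ in range(500):
    widths = [random.choice([64, 128, 256, 512]) for _ in range(3)]
    if model_size_bytes(widths) > FLASH_LIMIT:
        continue                 # reject candidates that blow the budget
    if best is None or proxy_accuracy(widths) > proxy_accuracy(best):
        best = widths

print(best, model_size_bytes(best) <= FLASH_LIMIT)
```

Real systems replace random search with evolutionary or gradient-based search and the proxy with measured or predicted accuracy, but the reject-over-budget structure is the same.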
Training and Optimization
Quantization-Aware Training (QAT): training with simulated INT8/INT4 quantization in the forward pass, so the network learns to tolerate the rounding error. Typically recovers 2–4 percentage points of accuracy over post-training quantization (PTQ).
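The simulation step is a quantize-dequantize ("fake quant") round trip. A minimal sketch for symmetric per-tensor INT8, with an illustrative weight vector; real QAT also passes gradients straight through the rounding:

```python
def fake_quant(x, num_bits=8):
    """Quantize to the integer grid, then dequantize back to float,
    so downstream computation sees the rounding error during training."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for INT8
    scale = max(abs(v) for v in x) / qmax or 1.0   # symmetric per-tensor scale
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in x]
    return [qi * scale for qi in q]

w = [0.8, -0.33, 0.057, -1.0]      # hypothetical weight values
print(fake_quant(w))               # close to w, snapped to the INT8 grid
```

Each value lands within half a quantization step of the original, which is exactly the error the network learns to absorb.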
Knowledge Distillation: a small student model is trained on soft labels produced by a large teacher. The student typically reaches 90–95% of the teacher's quality at 5–10% of its size.
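The "soft labels" are temperature-softened teacher probabilities, and the distillation term is a KL divergence between teacher and student distributions. A minimal sketch in the style of Hinton et al., with made-up logits; a full loss would mix this with the ordinary cross-entropy on hard labels:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax: higher T flattens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so the gradient magnitude stays comparable across temperatures."""
    p = softmax(teacher_logits, T)     # soft labels from the teacher
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

t = [6.0, 1.0, -2.0]                   # hypothetical teacher logits
s = [4.0, 2.0, -1.0]                   # hypothetical student logits
print(distillation_loss(s, t) >= 0.0)  # KL divergence is non-negative
```

The temperature exposes the teacher's relative confidence across wrong classes ("dark knowledge"), which is the extra signal the small student benefits from.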
Pruning: structured pruning (removing entire filters or channels) gives a deployment-friendly size reduction — the remaining layers stay dense, so the MCU runtime needs no sparse-kernel support.
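A common criterion is to drop the filters with the smallest L1 norm. A minimal sketch on hand-written 2×2 filters (the filters, their count, and the keep ratio are all illustrative); in practice pruning is iterated with fine-tuning to recover accuracy:

```python
def l1_norm(filt):
    """Sum of absolute weights in one filter."""
    return sum(abs(w) for row in filt for w in row)

def prune_filters(filters, keep_ratio=0.5):
    """Structured pruning: keep only the filters with the largest
    L1 norms; the result is a smaller but still dense layer."""
    keep = max(1, int(len(filters) * keep_ratio))
    ranked = sorted(filters, key=l1_norm, reverse=True)
    return ranked[:keep]

# Four hypothetical 2x2 filters; the two near-zero ones get dropped.
filters = [[[0.9, -0.8], [0.7, 0.6]],
           [[0.01, 0.02], [-0.01, 0.0]],
           [[-0.5, 0.4], [0.6, -0.7]],
           [[0.0, 0.03], [0.02, -0.01]]]
pruned = prune_filters(filters, keep_ratio=0.5)
print(len(pruned))  # 2 filters remain
```

Dropping a filter also shrinks the next layer's input channels, which is where much of the real saving comes from.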
Tools
- Edge Impulse: end-to-end pipeline from data collection to MCU deployment.
- STM32Cube.AI: converts and optimizes trained networks for STM32 MCUs (including parts with ST's Neural-ART accelerator).
- TensorFlow Lite for Microcontrollers (TFLite Micro): interpreter for running quantized models on bare-metal targets.