AutoML: AutoGluon, FLAML, Vertex AI AutoML and When It Works
AutoML is not "push a button, get a model." It means automated hyperparameter search and algorithm selection. The difference is critical: AutoML still requires a correct problem statement, quality data, and an understanding of the results. But for the right tasks it saves weeks of work.
What AutoML Does Well
On tabular data AutoML competes with manual ML engineering, and sometimes wins. In Kaggle competitions, AutoGluon lands in the top 10% on many datasets without any tuning. The reason: it combines different algorithms (LightGBM, XGBoost, CatBoost, neural nets, random forests) via stacking, and the ensemble often beats any single model.
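The stacking idea behind that result can be sketched with scikit-learn. This is an illustration of the technique, not AutoGluon's internals: the dataset, base models, and meta-learner here are arbitrary choices.

```python
# Sketch of stacking: base models' out-of-fold predictions become
# features for a meta-learner. Dataset and models are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner over base predictions
    cv=5,  # out-of-fold predictions prevent the meta-learner from overfitting
)
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))
```

AutoGluon applies the same principle with stronger base learners and multiple stacking layers.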
Good AutoML candidates:
- Standard binary/multiclass classification or regression on tabular data
- Tasks without tight deployment constraints (e.g. latency < 50ms or model size < 10MB)
- MVP or baseline before manual optimization
- Teams without deep ML expertise needing working prototype fast
Poor candidates: tasks with custom losses, specific architecture requirements, real-time systems with hard constraints, and domain-specific problems (medical imaging, NLP for rare languages).
AutoGluon: Details
AutoGluon-Tabular is the strongest tabular AutoML on most benchmarks. Key features:
Multi-level stacking. AutoGluon builds several ensemble layers: first-layer models (LightGBM, XGBoost, CatBoost, FastAI tabular, KNN) produce predictions that become features for the second layer. Controlled via num_stack_levels=2.
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(
    label='target',
    eval_metric='roc_auc',
    path='./ag_models',
).fit(
    train_data,
    time_limit=3600,         # 1 hour
    presets='best_quality',  # alternatives: 'medium_quality', 'high_quality'
)
The best_quality preset enables stacking and uses maximum memory and time. medium_quality balances speed and quality and suits datasets over 1M rows. The optimize_for_deployment preset removes heavy ensembles and speeds up inference.
Gotcha: AutoGluon trains dozens of models and saves them all, which takes 2–10GB on serious tasks. For deployment, predictor.clone_for_deployment() exports only the final model.
Memory with num_stack_levels=2 on 500k rows: second-layer models need out-of-fold predictions from the first layer. AutoGluon manages this automatically, but with less than 32GB RAM you risk OOM. A workaround: pass ag_args_fit={'num_cpus': 4, 'num_gpus': 0} and exclude NeuralNetFastAI.
FLAML: Fast and Economical
FLAML (Fast and Lightweight AutoML) from Microsoft targets good quality on a minimal compute budget. Its cost-frugal search tries cheap configurations first and gradually moves to more expensive ones.
It suits: limited compute budgets, tasks with time_budget under 60 seconds, and CI/CD integration where AutoML reruns on every data update.
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=120, metric="roc_auc")
FLAML also tunes LLM usage via flaml.autogen: automatic prompt and inference-parameter selection for models like GPT.
Vertex AI AutoML: Managed Service
Google Vertex AI AutoML is the right choice when:
- You have no ML infrastructure
- You need Google Cloud integration (BigQuery, Cloud Storage, Dataflow)
- The task is computer vision or NLP, not just tabular
- You need a managed inference endpoint without DevOps
Cost: $1.375 per node-hour for tabular training. A dataset of 100k rows and 50 features usually trains in 2–4 hours. Inference: $0.05–0.10 per 1k predictions. For high-load scenarios, self-hosted AutoGluon is cheaper.
Limitations: less control over architecture, no custom losses, limited export (TF SavedModel or TFLite, no ONNX). In exchange you get a managed feature store, automatic drift monitoring, and MLOps out of the box.
No-Code Platforms: H2O.ai, DataRobot
For business analysts who don't code there are H2O.ai AutoML (open source) and DataRobot (enterprise). Both provide a GUI, automatic feature importance, and model explanations.
H2O AutoML is open source, deploys locally, supports Stacked Ensembles, and is driven via a REST API or R/Python clients. DataRobot is expensive enterprise software ($50k+/year) but offers deep integration with corporate processes and compliance.
When AutoML Can't Replace ML Engineer
AutoML automates algorithm selection and hyperparameters. It doesn't solve:
- Feature engineering. Creating features like "time since last purchase" or "debit/credit ratio" is expert work. AutoGluon handles basic feature generation; domain-specific features it does not.
- Custom preprocessing. Medical imaging, log parsing, audio feature extraction.
- Deployment constraints. AutoML picks the best-quality model, not the one that fits into a 4MB mobile app.
- Robustness to distribution shift. AutoML optimizes test metrics; how the model performs in six months on changed data is a separate question.
Workflow
For AutoML projects, start with a quick benchmark: AutoGluon with medium_quality for 30 minutes gives an honest baseline. If it's sufficient, deploy it with monitoring. If not, the AutoML results show which algorithms are promising and give manual optimization the right starting point.
Timelines: an MVP with AutoGluon takes 1–2 weeks (including EDA and deployment). Production with monitoring and automatic retraining takes 1–3 months.