Google Cloud AutoML integration for automated model training
Google Cloud AutoML is a managed service for training custom models without deep ML expertise. It is a good fit for teams without ML specialists, or for tasks where time-to-model matters more than squeezing out maximum accuracy.
Google Cloud AutoML Products
AutoML service line:
automl_products = {
    'AutoML Tables': 'structured data, classification/regression',
    'AutoML Vision': 'image classification, object detection',
    'AutoML Natural Language': 'text classification, entity extraction, sentiment',
    'AutoML Translation': 'custom NMT models for specialized domains',
    'Vertex AI AutoML': 'unified interface for all data types'
}
Vertex AI AutoML Tables
Training on structured data:
from google.cloud import aiplatform
import pandas as pd

def get_feature_columns(dataset_gcs_uri: str, target_column: str) -> list:
    """Read the CSV header and return every column except the target."""
    header = pd.read_csv(dataset_gcs_uri, nrows=0)  # reading gs:// URIs requires gcsfs
    return [col for col in header.columns if col != target_column]

def train_vertex_automl_classification(
    project_id: str,
    dataset_gcs_uri: str,
    target_column: str,
    model_display_name: str,
    training_budget_hours: float = 1.0
) -> dict:
    """
    Vertex AI AutoML Tables: budget_milli_node_hours = hours × 1000.
    Minimum 1 hour; 8-24 hours recommended for best quality.
    """
    aiplatform.init(project=project_id, location='us-central1')
    # Create the tabular dataset from the CSV in GCS
    dataset = aiplatform.TabularDataset.create(
        display_name=f'{model_display_name}_dataset',
        gcs_source=dataset_gcs_uri
    )
    # Launch the training job
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name=model_display_name,
        optimization_prediction_type='classification',
        optimization_objective='maximize-au-roc',
        column_transformations=[
            {'auto': {'column_name': col}}
            for col in get_feature_columns(dataset_gcs_uri, target_column)
        ]
    )
    model = job.run(
        dataset=dataset,
        target_column=target_column,
        training_fraction_split=0.8,
        validation_fraction_split=0.1,
        test_fraction_split=0.1,
        budget_milli_node_hours=int(training_budget_hours * 1000),
        model_display_name=model_display_name,
        disable_early_stopping=False
    )
    return {
        'model_resource_name': model.resource_name,
        'model_display_name': model_display_name
    }
Endpoint deployment and inference
Online prediction endpoint:
def deploy_and_predict(model_resource_name: str,
                       endpoint_display_name: str,
                       instances: list) -> dict:
    """
    Deploy the model to an endpoint for online prediction.
    """
    model = aiplatform.Model(model_resource_name)
    endpoint = model.deploy(
        deployed_model_display_name=endpoint_display_name,
        machine_type='n1-standard-4',
        min_replica_count=1,
        max_replica_count=3,
        traffic_percentage=100
    )
    # Run inference against the deployed endpoint
    predictions = endpoint.predict(instances=instances)
    return {
        'predictions': predictions.predictions,
        'deployed_model_id': predictions.deployed_model_id
    }
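The `instances` argument above is a list of dicts keyed by column name. A small sketch of preparing instances from a DataFrame (the helper `rows_to_instances` is hypothetical; sending values as strings is a common convention for AutoML tabular models, so adjust if your model schema expects typed values):

```python
import pandas as pd

def rows_to_instances(df: pd.DataFrame) -> list:
    """Convert DataFrame rows into prediction instances:
    one dict per row, keyed by column name, values as strings."""
    return [
        {col: str(val) for col, val in row.items()}
        for row in df.to_dict(orient='records')
    ]

df = pd.DataFrame({'age': [34], 'plan': ['premium']})
rows_to_instances(df)  # [{'age': '34', 'plan': 'premium'}]
```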
def batch_prediction(model_resource_name: str,
                     input_gcs_uri: str,
                     output_gcs_dir: str) -> dict:
    """
    Batch prediction: for large data volumes (no endpoint required).
    """
    model = aiplatform.Model(model_resource_name)
    batch_job = model.batch_predict(
        job_display_name='batch_prediction_job',
        gcs_source=input_gcs_uri,
        gcs_destination_prefix=output_gcs_dir,
        machine_type='n1-standard-4',
        instances_format='csv',
        predictions_format='jsonl'
    )
    batch_job.wait()
    return {'output_location': output_gcs_dir}
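With `predictions_format='jsonl'`, the job writes one JSON object per line to the output directory. A minimal sketch of parsing a downloaded results file (the helper name is my own; the exact per-line schema depends on the model type, so the field names in the sample are illustrative):

```python
import json

def parse_prediction_lines(jsonl_text: str) -> list:
    """Parse JSONL batch-prediction output: one JSON object per
    non-empty line, typically containing the input instance and
    the model's prediction."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

sample = '{"instance": {"age": "34"}, "prediction": {"scores": [0.2, 0.8]}}'
results = parse_prediction_lines(sample)
results[0]['prediction']['scores'][1]  # 0.8
```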
Google Cloud AutoML Limitations:
- Limited customization: you can't set your own loss function
- Cost: $19.32/hour of training (Tables), higher than self-hosted
- No full control over the pipeline
- Model export: only in TF SavedModel/TFLite format
When AutoML Tables is not a good fit: non-standard features (graph data, time series beyond simple patterns), tasks with strict latency requirements, or a need for SHAP values or other detailed explanations.
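Using the Tables training rate quoted above, a quick back-of-envelope budget check (the rate is hard-coded here from the list above and should be verified against current GCP pricing):

```python
TABLES_TRAINING_RATE_USD_PER_HOUR = 19.32  # rate quoted above; verify current pricing

def estimate_training_cost(budget_hours: float) -> float:
    """Rough upper bound on AutoML Tables training cost. Actual cost may
    be lower if early stopping ends the job before the budget is spent."""
    return round(budget_hours * TABLES_TRAINING_RATE_USD_PER_HOUR, 2)

estimate_training_cost(8)  # -> 154.56
```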
Timeframe: Dataset upload + AutoML training + endpoint deployment — 1-2 days. Batch prediction pipeline, drift monitoring, retraining trigger — 1-2 weeks.