H2O.ai AutoML Integration for Automated Model Training

We design and deploy artificial intelligence systems: from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering and MLOps to make AI work not in the lab, but in real business.

H2O.ai AutoML is one of the most mature industrial AutoML platforms, with built-in stacking, a model leaderboard, and support for distributed training on Spark/Hadoop clusters.

H2O AutoML Key Features

What H2O AutoML does:

  • Automatically trains a portfolio of algorithms: GBM, XGBoost, Random Forest, Deep Learning, GLM, and Stacked Ensembles
  • Builds a Stacked Ensemble from the best models
  • Maintains a leaderboard sorted by a metric of your choice
  • Runs cross-validation by default
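
The Stacked Ensemble idea can be sketched without H2O at all: learn blending weights for base-model predictions on holdout data. H2O's metalearner is a GLM by default; the non-negative least-squares blend below is a simplified illustration, and all names in it are ours, not H2O's:

import numpy as np

def stack_predictions(base_preds: np.ndarray, y_holdout: np.ndarray) -> np.ndarray:
    """Fit non-negative blending weights for base-model predictions on a
    holdout set via least squares, then normalize them to sum to 1.
    base_preds: shape (n_samples, n_models), one column per base model."""
    weights, *_ = np.linalg.lstsq(base_preds, y_holdout, rcond=None)
    weights = np.clip(weights, 0, None)        # no negative contributions
    if weights.sum() == 0:
        weights = np.ones(base_preds.shape[1])  # degenerate case: plain average
    return weights / weights.sum()

# Toy example: two base models, the second one tracks the labels more closely
preds = np.array([[0.6, 0.9], [0.4, 0.1], [0.7, 0.8], [0.3, 0.2]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = stack_predictions(preds, y)
blended = preds @ w  # the "ensemble" prediction

The learned weights favor the stronger base model, which is exactly what H2O's metalearner does at scale across the whole leaderboard.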

Basic integration

Python client:

import h2o
from h2o.automl import H2OAutoML
import pandas as pd

def run_h2o_automl(train_df: pd.DataFrame,
                   target_col: str,
                   max_models: int = 20,
                   max_runtime_secs: int = 600) -> dict:
    """
    Full H2O AutoML pipeline for a classification target.
    """
    # Initialize H2O (locally or against a cluster)
    h2o.init(nthreads=-1, max_mem_size='8G')

    # Convert to H2OFrame
    h2o_train = h2o.H2OFrame(train_df)

    # Mark string columns as categorical
    for col in train_df.select_dtypes(include=['object']).columns:
        h2o_train[col] = h2o_train[col].asfactor()

    # Treat a low-cardinality target as categorical (classification)
    if train_df[target_col].nunique() <= 20:
        h2o_train[target_col] = h2o_train[target_col].asfactor()

    feature_cols = [c for c in train_df.columns if c != target_col]

    # Run AutoML (the AUC settings assume a classification target)
    aml = H2OAutoML(
        max_models=max_models,
        max_runtime_secs=max_runtime_secs,
        seed=42,
        sort_metric='AUC',
        balance_classes=True,
        stopping_metric='AUC',
        stopping_rounds=5
    )
    aml.train(x=feature_cols, y=target_col, training_frame=h2o_train)

    # Leaderboard as a pandas DataFrame
    lb = aml.leaderboard.as_data_frame()

    # Best model
    best_model = aml.leader

    # Export a MOJO artifact for production deployment
    mojo_path = best_model.save_mojo(path='/tmp/h2o_mojo/')

    return {
        'leaderboard': lb,
        'best_model_id': best_model.model_id,
        'best_auc': lb.iloc[0]['auc'],
        'mojo_path': mojo_path
    }
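
The leaderboard returned above is a plain pandas DataFrame, so post-processing is ordinary pandas. For example, a helper (the helper name is ours; the `model_id`/`auc` columns follow H2O's leaderboard schema) to shortlist the strongest single models while skipping ensembles:

import pandas as pd

def shortlist_models(leaderboard: pd.DataFrame, metric: str = 'auc',
                     top_n: int = 3, exclude_ensembles: bool = False) -> pd.DataFrame:
    """Return the top-N leaderboard rows by a metric, optionally
    filtering out StackedEnsemble entries."""
    lb = leaderboard
    if exclude_ensembles:
        lb = lb[~lb['model_id'].str.startswith('StackedEnsemble')]
    # AUC is higher-is-better, so sort descending
    return lb.sort_values(metric, ascending=False).head(top_n).reset_index(drop=True)

# Toy leaderboard mimicking run_h2o_automl()['leaderboard']
lb = pd.DataFrame({
    'model_id': ['StackedEnsemble_AllModels', 'GBM_1', 'XGBoost_1', 'GLM_1'],
    'auc': [0.93, 0.91, 0.90, 0.85],
})
best_single = shortlist_models(lb, top_n=2, exclude_ensembles=True)

Excluding ensembles is useful when the deployment target needs a single lightweight MOJO rather than the full stacked model.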

Production deployment of H2O MOJO

Java-based inference without an H2O server:

import subprocess

def score_with_mojo(mojo_path: str, input_csv: str, output_csv: str) -> None:
    """
    An H2O MOJO compiles to a Java artifact and scores without Python or a
    running H2O server, so it can be embedded in Java/Scala microservices.
    """
    # Batch scoring via the PredictCsv tool shipped in h2o-genmodel.jar
    cmd = [
        'java', '-cp', 'h2o-genmodel.jar',
        'hex.genmodel.tools.PredictCsv',
        '--mojo', mojo_path,
        '--input', input_csv,
        '--output', output_csv
    ]
    subprocess.run(cmd, check=True)
    # In production, wrap the MOJO in a REST scoring service instead
    # (e.g. an h2o-mojo-scoring-server Docker image)

def predict_with_mojo_api(endpoint: str, features: dict) -> dict:
    # Call a REST scoring service that wraps the MOJO
    import requests
    response = requests.post(endpoint, json={'features': features})
    response.raise_for_status()
    return response.json()
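
PredictCsv consumes CSV, so before batch scoring a feature dict has to be flattened into a header-plus-row CSV file. A small sketch (the helper name is ours):

import csv
import io

def features_to_csv(features: dict) -> str:
    """Serialize one feature dict into the header + single data row
    format that a CSV batch scorer expects."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(features))
    writer.writeheader()
    writer.writerow(features)
    return buf.getvalue()

row = features_to_csv({'age': 42, 'income': 55000, 'segment': 'B2B'})

The resulting string can be written to a temp file and passed as the `--input` argument of the batch scorer.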

Integration with Spark (H2O Sparkling Water)

Distributed training on Spark cluster:

# pysparkling — H2O on Spark
from pysparkling import H2OContext
from pysparkling.ml import H2OAutoML as SparkH2OAutoML
from pyspark.sql import SparkSession

def h2o_sparkling_automl(spark_df, target_col: str):
    """
    H2O Sparkling Water: AutoML directly on a Spark DataFrame.
    A good fit for datasets with more than ~10 million rows.
    """
    spark = SparkSession.builder.getOrCreate()
    hc = H2OContext.getOrCreate()

    automl = SparkH2OAutoML(
        maxModels=30,
        labelCol=target_col,
        maxRuntimeSecs=3600
    )
    model = automl.fit(spark_df)

    # Leaderboard as a Spark DataFrame
    leaderboard = automl.getLeaderboard()
    return model, leaderboard

Timeframe: an H2O AutoML baseline with leaderboard and MOJO export takes 3-5 days; launching a Sparkling Water cluster, adding custom metrics, and building a continuous retraining pipeline takes 2-3 weeks.