AI Molecule Property Prediction System (ADMET)

We design and deploy artificial intelligence systems, from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work in real business settings, not just in the lab.

Developing an AI System for Molecule Property Prediction (ADMET)

ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) is the set of pharmacokinetic and safety properties that determine a drug's fate in the body. A large share of clinical trial failures, often cited as roughly 50%, is attributed to ADMET problems that could have been predicted earlier.

Critical ADMET Properties

Absorption

  • Aqueous solubility: poor solubility → inconsistent bioavailability
  • Lipophilicity (logP/logD): determines membrane penetration, solubility
  • Caco-2 / MDCK permeability: intestinal absorption
  • P-glycoprotein (P-gp) efflux: active transport of the drug back out of cells, which reduces bioavailability
  • Oral bioavailability (F%): the fraction of the dose that reaches systemic circulation

Distribution

  • Volume of distribution (Vd): how extensively the drug distributes into tissues relative to plasma
  • Blood-brain barrier (BBB) permeability: required for CNS drugs, undesirable for peripherally acting ones
  • Plasma protein binding (PPB): binding to albumin and other plasma proteins; only the free (unbound) drug is active

Metabolism

  • CYP450 inhibition (CYP3A4, CYP2D6, CYP2C9, CYP2C19, CYP1A2): slows the metabolism of co-administered drugs → drug-drug interactions
  • CYP450 substrate: which isoforms metabolize the compound
  • Half-life (T½): how quickly the drug is cleared from the body
  • Hepatotoxicity (DILI): liver damage, which can be caused by reactive metabolites (see also Toxicity)

Excretion

  • Renal clearance: rate of kidney elimination
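
Half-life, volume of distribution, and clearance from the lists above are linked. As a minimal sketch, assuming a one-compartment model with first-order elimination, half-life follows from T½ = ln(2) · Vd / CL, where CL is total clearance (renal plus all other routes). The function name and the input values below are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def half_life_hours(vd_litres: float, clearance_l_per_h: float) -> float:
    """Elimination half-life under a one-compartment, first-order model.

    vd_litres: volume of distribution (L)
    clearance_l_per_h: total clearance (L/h), not renal clearance alone
    """
    if clearance_l_per_h <= 0:
        raise ValueError("clearance must be positive")
    return math.log(2) * vd_litres / clearance_l_per_h


# Hypothetical predicted values for one compound
t_half = half_life_hours(vd_litres=42.0, clearance_l_per_h=7.0)
print(f"T1/2 = {t_half:.2f} h")
```

This is why predicted Vd and clearance are often modelled as separate endpoints: an error in either one propagates directly into the half-life estimate.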

Toxicity

  • hERG inhibition: blockade of the cardiac K⁺ channel → QT prolongation → potentially fatal arrhythmia. A major reason for drug withdrawals from the market
  • Ames test: mutagenicity / genotoxicity
  • DILI (Drug-Induced Liver Injury): hepatotoxicity
  • Skin sensitization: contact dermatitis
  • Reproductive toxicity: teratogenicity

Prediction Models

Molecular Fingerprints + ML

ECFP4/6 circular fingerprints (typically 1024–2048 bits) fed into XGBoost or Random Forest. This approach is fast to train, reasonably interpretable through feature importances, and works well on small datasets.

Graph Neural Networks

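The molecule is represented as a graph (atoms as nodes, bonds as edges), and a GNN learns structural patterns directly from it. Common architectures: MPNN, AttentiveFP, D-MPNN (chemprop). On many TDC benchmarks GNNs match or exceed fingerprint + ML baselines, although well-tuned tree models remain competitive, particularly on small datasets.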
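
To make the "molecule as graph" idea concrete, here is a deliberately simplified sketch of one message-passing round on ethanol (SMILES CCO). It is an illustration only: real MPNN and D-MPNN layers use learned weight matrices and nonlinearities, whereas this toy version just sums neighbour features. The feature encoding and function names are assumptions made for the example.

```python
# Heavy-atom graph of ethanol (C-C-O): atom index -> bonded neighbours
neighbours = {0: [1], 1: [0, 2], 2: [1]}

# Toy atom features: [is_carbon, is_oxygen]
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}


def message_passing_step(features, neighbours):
    """One round: each atom adds the summed features of its neighbours to its own."""
    updated = {}
    for atom, h in features.items():
        message = [0.0] * len(h)
        for nb in neighbours[atom]:
            message = [m + x for m, x in zip(message, features[nb])]
        updated[atom] = [a + b for a, b in zip(h, message)]
    return updated


def readout(features):
    """Sum atom vectors into a single molecule-level vector."""
    dim = len(next(iter(features.values())))
    return [sum(h[i] for h in features.values()) for i in range(dim)]


h1 = message_passing_step(features, neighbours)
molecule_vector = readout(h1)
print(h1)
print(molecule_vector)
```

After one round, each atom's vector already encodes its immediate chemical environment; stacking more rounds widens that neighbourhood, which is the same intuition behind the radius of a circular fingerprint.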

Multitask Learning

One model predicts 20+ ADMET properties simultaneously. The advantage: shared representations improve predictions for properties with small datasets by drawing on information from related tasks.

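import chemprop  # assumes Chemprop 1.x; the 2.x Python API is different

# Multitask D-MPNN trained with Chemprop on several ADMET endpoints at once
arguments = [
    '--data_path', 'admet_train.csv',
    # All targets must be continuous for regression (e.g. hERG as pIC50)
    '--dataset_type', 'regression',
    # List-valued options take one token per column name
    '--target_columns', 'solubility', 'logP', 'hERG_inhibition', 'caco2_permeability',
    '--smiles_columns', 'smiles',
    '--epochs', '50',
    '--batch_size', '64',
    '--ffn_num_layers', '3',
    '--dropout', '0.1',
    '--save_dir', 'admet_model',
]
args = chemprop.args.TrainArgs().parse_args(arguments)
# Returns the mean and standard deviation of the metric across folds
mean_score, std_score = chemprop.train.cross_validate(
    args=args, train_func=chemprop.train.run_training
)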
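
Multitask ADMET tables are usually sparse: most compounds have measurements for only some endpoints. One common way to train on such data is to compute the loss only over the targets that were actually measured. The sketch below illustrates that idea with a masked mean squared error in plain Python; it is not Chemprop's implementation, and the function name and values are hypothetical.

```python
def masked_mse(predictions, targets):
    """Mean squared error over measured targets only.

    predictions: list of rows, one predicted value per task
    targets: same shape; None marks a property that was not measured
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(predictions, targets):
        for pred, target in zip(pred_row, target_row):
            if target is not None:  # skip missing labels
                total += (pred - target) ** 2
                count += 1
    return total / count if count else 0.0


# Two compounds, two tasks (e.g. solubility, logP); one label is missing
predictions = [[1.0, 2.0], [3.0, 4.0]]
targets = [[1.5, None], [3.0, 2.0]]
loss = masked_mse(predictions, targets)
print(loss)
```

Because missing entries contribute nothing to the loss, a compound with a single measured endpoint still updates the shared representation used by all tasks.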

Uncertainty Quantification

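For ADMET prediction it is important to know not just the predicted value but also the model's confidence in it. For molecules outside the applicability domain, the system should warn that the prediction is unreliable.

Methods: Monte Carlo Dropout, Deep Ensembles, Conformal Prediction. Conformal Prediction gives prediction intervals with a statistical coverage guarantee, provided calibration and test molecules are exchangeable.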
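
As an illustration of the conformal approach, the following is a minimal sketch of split conformal prediction for a regression endpoint. It assumes a model that has already been trained and a held-out calibration set the model never saw; the calibration values and the 80% coverage level are made up for the example.

```python
import math


def conformal_quantile(cal_true, cal_pred, alpha=0.1):
    """Interval half-width from absolute residuals on a calibration set."""
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha))
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return scores[rank - 1]


def predict_interval(prediction, half_width):
    """Symmetric interval around a point prediction."""
    return (prediction - half_width, prediction + half_width)


# Hypothetical calibration data (e.g. measured vs predicted log solubility)
cal_true = [-2.1, -3.4, -1.0, -4.2, -2.8, -0.5, -3.9, -1.7, -2.3]
cal_pred = [-2.0, -3.0, -1.3, -4.0, -2.2, -0.9, -3.8, -2.0, -2.5]

q = conformal_quantile(cal_true, cal_pred, alpha=0.2)  # target 80% coverage
low, high = predict_interval(-2.6, q)
print(f"half-width = {q:.2f}, interval = [{low:.2f}, {high:.2f}]")
```

The interval width here is the same for every molecule. Variants that scale the residual by a per-molecule uncertainty estimate (for example, ensemble spread) give wider intervals where the model is less sure.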

Datasets

Task            | Dataset               | Size (compounds)
Solubility      | ESOL, AqSolDB         | 1k–10k
logP            | ChEMBL                | 100k+
Caco-2          | Biopharmaceutics DB   | ~1k
hERG            | BindingDB, ChEMBL     | 10k+
DILI            | DILIrank              | ~1k
CYP inhibition  | ChEMBL                | 10k+
Ames            | TDC AMES dataset      | ~7k

Data problem: many biological datasets are small and noisy. Transfer learning (pretraining on a large chemical corpus, then fine-tuning on the specific task) helps when labeled data is scarce.

Applicability Domain

A model is reliable only for molecules similar to its training data. Ways to evaluate the applicability domain (AD):

  • Tanimoto similarity to the nearest neighbors in the training set
  • Leverage from the hat matrix (Williams plot)
  • k-NN distance in embedding space

When a molecule falls outside the AD, the system issues an explicit "low confidence prediction" warning.
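
The Tanimoto-based check can be sketched in a few lines. The example below represents each fingerprint as a set of on-bit indices and flags a query molecule when its mean similarity to the k nearest training molecules falls below a threshold. The toy fingerprints, k = 3, and the 0.4 threshold are assumptions for illustration; in practice the threshold is tuned per dataset.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def applicability_check(query_fp, train_fps, k=3, threshold=0.4):
    """Mean similarity to the k nearest training molecules, plus a warning flag."""
    sims = sorted((tanimoto(query_fp, fp) for fp in train_fps), reverse=True)
    top = sims[:k]
    mean_sim = sum(top) / len(top)
    in_domain = mean_sim >= threshold
    return {
        "mean_knn_similarity": mean_sim,
        "in_domain": in_domain,
        "warning": None if in_domain else "low confidence prediction",
    }


# Toy fingerprints (on-bit indices); real ECFP bits would come from a cheminformatics toolkit
train_fps = [{1, 2, 3, 4}, {2, 3, 4, 5}, {1, 3, 5, 7}]

inside = applicability_check({1, 2, 3, 5}, train_fps)
outside = applicability_check({10, 11, 12}, train_fps)
print(inside)
print(outside)
```

Returning the similarity score alongside the flag lets a downstream UI show how far outside the domain a molecule is, rather than only a binary warning.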

Integration: REST API, a Jupyter-friendly Python API, and KNIME nodes for chemist workflows. Visualization: a 2D property map with color-coded drug-likeness violations (Lipinski's Rule of 5, Veber rules).
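
The drug-likeness flags behind such a color-coding can be sketched as follows. The thresholds are the standard Lipinski (MW ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10) and Veber (rotatable bonds ≤ 10, TPSA ≤ 140 Å²) cut-offs. The descriptor values, the `Descriptors` container, and the green/yellow/red scheme are hypothetical; descriptors are assumed to be precomputed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float       # Da
    logp: float
    h_donors: int
    h_acceptors: int
    rotatable_bonds: int
    tpsa: float             # topological polar surface area, Å²


def lipinski_violations(d):
    """Names of violated Lipinski Rule of 5 criteria."""
    checks = {
        "MW > 500": d.mol_weight > 500,
        "logP > 5": d.logp > 5,
        "H-bond donors > 5": d.h_donors > 5,
        "H-bond acceptors > 10": d.h_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def veber_violations(d):
    """Names of violated Veber criteria."""
    checks = {
        "rotatable bonds > 10": d.rotatable_bonds > 10,
        "TPSA > 140": d.tpsa > 140,
    }
    return [name for name, violated in checks.items() if violated]


def color_for(d):
    """Hypothetical color scheme: 0 violations green, 1 yellow, 2+ red."""
    n = len(lipinski_violations(d)) + len(veber_violations(d))
    return "green" if n == 0 else "yellow" if n == 1 else "red"


# Hypothetical descriptor sets for two compounds
small = Descriptors(300.0, 2.5, 2, 5, 4, 80.0)
large = Descriptors(720.0, 6.2, 6, 12, 14, 180.0)
print(color_for(small), lipinski_violations(small), veber_violations(small))
print(color_for(large), lipinski_violations(large), veber_violations(large))
```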