Face Age and Gender Estimation System Development
Age and gender estimation from facial images is a computer vision task with applications in retail analytics (demographic profiling of visitors), adaptive content systems, medical research, and age verification (age gates). The two tasks are often handled by a single multitask model: a shared backbone with a separate output head for each task.
Multitask Architecture
```python
import torch
import torch.nn as nn
import timm


class AgeGenderModel(nn.Module):
    """Single model for simultaneous age and gender prediction."""

    def __init__(self, pretrained_backbone: str = 'efficientnet_b2'):
        super().__init__()
        # num_classes=0 removes timm's classifier, so the backbone returns pooled features
        backbone = timm.create_model(pretrained_backbone, pretrained=True, num_classes=0)
        self.backbone = backbone
        feat_dim = backbone.num_features  # 1408 for B2

        # Shared representation used by both heads
        self.shared = nn.Sequential(
            nn.Linear(feat_dim, 512),
            nn.GELU(),
            nn.Dropout(0.3),
        )
        # Separate heads for each task
        self.age_head = nn.Linear(512, 1)     # regression, trained with L1 (MAE) loss
        self.gender_head = nn.Linear(512, 2)  # classification, trained with cross-entropy

    def forward(self, x):
        features = self.backbone(x)
        shared = self.shared(features)
        # squeeze(-1) keeps the batch dimension even when the batch size is 1
        age = self.age_head(shared).squeeze(-1)
        gender_logits = self.gender_head(shared)
        return age, gender_logits
```
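At inference the two heads return raw values: an unbounded age estimate and unnormalized gender logits. Below is a minimal decoding sketch; the function name `decode_outputs` and the 0–100 clamp range are assumptions. Which index means which gender depends on how the training labels were encoded (UTKFace, for instance, uses 0 for male and 1 for female).

```python
import torch


def decode_outputs(age, gender_logits, min_age=0.0, max_age=100.0):
    """Hypothetical helper: turn raw head outputs into per-face predictions.

    Assumes the age head regresses years directly; the clamp range is an assumption.
    """
    age_years = age.clamp(min_age, max_age)          # keep ages in a plausible range
    gender_probs = torch.softmax(gender_logits, dim=-1)
    gender_idx = gender_probs.argmax(dim=-1)         # predicted class index per face
    confidence = gender_probs.max(dim=-1).values     # probability of the predicted class
    return age_years, gender_idx, confidence
```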
Age as regression vs. classification: regression gives a continuous result (e.g., 32.4 years), while classification into ranges (e.g., 30–35 years) is less precise but more convenient for some applications. Distribution learning (DLDL, Deep Label Distribution Learning) is often the strongest option: the target age is modeled as a probability distribution over ages rather than a single point value, which captures the ambiguity between neighboring ages.
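A minimal NumPy sketch of the label distribution idea follows, assuming one output bin per year from 0 to 100 and a Gaussian label width `sigma` of 2 years (both are assumptions, not values from the text). Training minimizes the KL divergence between the soft target and the predicted distribution; at inference, the expected value of the predicted distribution serves as the point estimate.

```python
import numpy as np

AGES = np.arange(0, 101)  # assumed bins: one output per year, 0..100


def gaussian_label_distribution(age, sigma=2.0):
    """Soft target: a discretized Gaussian centred on the true age (sigma is assumed)."""
    dist = np.exp(-0.5 * ((AGES - age) / sigma) ** 2)
    return dist / dist.sum()


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_loss(target, logits, eps=1e-12):
    """KL(target || predicted): the training loss for label distribution learning."""
    pred = softmax(logits)
    return float(np.sum(target * (np.log(target + eps) - np.log(pred + eps))))


def expected_age(logits):
    """Point estimate at inference: the mean of the predicted age distribution."""
    return float(np.sum(softmax(logits) * AGES))
```

In a model such as the one above, this would mean replacing the single-output age head with a 101-way linear layer whose outputs are treated as the logits here.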
Datasets
| Dataset | Photo Count | Age Range | Labels |
|---|---|---|---|
| IMDB-Wiki | 524k | 0–100 | Age, gender |
| UTKFace | 23k | 0–116 | Age, gender, ethnicity |
| APPA-REAL | 7.6k | 7–77 | Apparent and actual age |
| FairFace | 108k | 0–70+ | Gender, race, 9 age ranges |
| AgeDB | 16k | 0–101 | Age, gender |
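FairFace labels age only by ranges, so evaluating an exact-age regressor on it (or merging it with exact-age datasets such as UTKFace) requires mapping ages onto its nine groups. A minimal sketch, assuming the group boundaries 0–2, 3–9, 10–19, 20–29, 30–39, 40–49, 50–59, 60–69 and 70+; the display names below are illustrative and may not match the label strings in FairFace's files exactly.

```python
import bisect

# Lower bounds of the nine FairFace-style age groups (assumed boundaries).
GROUP_LOWER_BOUNDS = [0, 3, 10, 20, 30, 40, 50, 60, 70]
# Illustrative display names, not necessarily FairFace's exact label strings.
GROUP_NAMES = ["0-2", "3-9", "10-19", "20-29", "30-39",
               "40-49", "50-59", "60-69", "70+"]


def age_to_group(age: float) -> int:
    """Map an exact or predicted age in years to a group index.

    Ages are compared with the groups' lower bounds, so 2.7 falls into "0-2".
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    return bisect.bisect_right(GROUP_LOWER_BOUNDS, age) - 1
```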
Preprocessing and Augmentation
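```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transform = A.Compose([
    A.Resize(224, 224),
    A.HorizontalFlip(p=0.5),  # mirroring a face does not change its age or gender
    A.RandomBrightnessContrast(brightness_limit=0.3, contrast_limit=0.3, p=0.5),
    A.GaussianBlur(blur_limit=(3, 7), p=0.2),
    # Albumentations 1.x argument names; newer releases use
    # num_holes_range / hole_height_range / hole_width_range instead
    A.CoarseDropout(max_holes=4, max_height=30, max_width=30, p=0.3),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # ImageNet statistics
    ToTensorV2(),
])
```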
Loss Functions for Multitask Learning
```python
import torch.nn as nn


def multitask_loss(age_pred, age_true, gender_logits, gender_true,
                   age_weight=1.0, gender_weight=0.5):
    # L1 (MAE) for age regression + cross-entropy for gender classification
    age_loss = nn.L1Loss()(age_pred, age_true.float())
    gender_loss = nn.CrossEntropyLoss()(gender_logits, gender_true.long())
    # Fixed, hand-tuned weights; learned uncertainty weighting (Kendall et al.) is an alternative
    return age_weight * age_loss + gender_weight * gender_loss
```
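The function above combines the losses with fixed weights. Uncertainty weighting (Kendall et al.) instead learns one weight per task. A sketch, assuming the commonly used simplified form in which each task i has a learnable `s_i = log(sigma_i^2)` and the total loss is `sum_i exp(-s_i) * L_i + s_i`; the class name `UncertaintyWeightedLoss` is hypothetical.

```python
import torch
import torch.nn as nn


class UncertaintyWeightedLoss(nn.Module):
    """Hypothetical learned task weighting in the spirit of Kendall et al.

    Simplified form: total = sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2).
    The + s_i term stops the model from shrinking every weight to zero.
    """

    def __init__(self, num_tasks: int = 2):
        super().__init__()
        # One learnable log-variance per task, initialised so every weight starts at 1
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, *task_losses: torch.Tensor) -> torch.Tensor:
        losses = torch.stack(task_losses)
        return (torch.exp(-self.log_vars) * losses + self.log_vars).sum()
```

Call it as `criterion(age_loss, gender_loss)` and pass `criterion.parameters()` to the optimizer together with the model's parameters; otherwise the log-variances never update.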
Performance Metrics
| Model (fine-tuning data) | Age MAE | Gender accuracy | Inference time |
|---|---|---|---|
| EfficientNet-B2 (IMDB-Wiki) | 4.8 years | 96.3% | 8 ms |
| MobileNetV3 (UTKFace) | 5.2 years | 95.8% | 3 ms |
| ViT-B/16 (IMDB-Wiki) | 4.3 years | 97.1% | 12 ms |
An MAE of 4–6 years is typical for in-the-wild images (selfies, photos of varying quality). Under controlled conditions (frontal portrait, good lighting), MAE drops to 3–4 years.
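A minimal NumPy sketch for computing the table's quality metrics on a held-out set. The cumulative score (share of faces whose age error is within a threshold) is an extra metric often reported alongside MAE; the 5-year threshold and the function name are assumptions.

```python
import numpy as np


def age_gender_metrics(age_pred, age_true, gender_pred, gender_true, cs_threshold=5.0):
    """Hypothetical evaluation helper: age MAE, cumulative score, gender accuracy."""
    abs_err = np.abs(np.asarray(age_pred, dtype=float) - np.asarray(age_true, dtype=float))
    return {
        "age_mae": float(abs_err.mean()),                      # mean absolute error in years
        "age_cs": float((abs_err <= cs_threshold).mean()),     # share within the threshold
        "gender_acc": float((np.asarray(gender_pred) == np.asarray(gender_true)).mean()),
    }
```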
Ethics and Bias
Models trained on IMDB-Wiki inherit the dataset's underrepresentation of elderly people and some ethnic groups. FairFace was built to be balanced across demographic groups to reduce this bias. When the model is used for decision-making (e.g., an age gate), fairness testing across demographic groups is mandatory.
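One simple form of such testing is to break the metrics down per demographic group and look at the worst-case gap. A minimal sketch; the grouping variable (for example FairFace race labels or age ranges), the function name, and the use of the max–min gap as the summary are assumptions, and acceptable gap thresholds are up to the deployment.

```python
import numpy as np


def per_group_report(age_pred, age_true, gender_pred, gender_true, groups):
    """Hypothetical fairness check: per-group age MAE and gender accuracy, plus gaps."""
    age_err = np.abs(np.asarray(age_pred, dtype=float) - np.asarray(age_true, dtype=float))
    gender_ok = np.asarray(gender_pred) == np.asarray(gender_true)
    groups = np.asarray(groups)
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        report[str(g)] = {
            "n": int(mask.sum()),  # small groups give noisy estimates
            "age_mae": float(age_err[mask].mean()),
            "gender_acc": float(gender_ok[mask].mean()),
        }
    maes = [r["age_mae"] for r in report.values()]
    accs = [r["gender_acc"] for r in report.values()]
    # Worst-case disparity between the best- and worst-served groups
    gaps = {"age_mae_gap": max(maes) - min(maes), "gender_acc_gap": max(accs) - min(accs)}
    return report, gaps
```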
Implementation Timeline
| Task | Estimated duration |
|---|---|
| Pre-trained model integration (InsightFace) | 1 week |
| Custom model on corporate data | 3–5 weeks |
| System with analytics and reports | 4–7 weeks |