Facial Emotion Recognition System Development

We design and deploy artificial intelligence systems, from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering and MLOps to make AI work not just in the lab, but in real business settings.
Development of Facial Emotion Recognition Systems

Emotion recognition from facial expression is the task of classifying a face image into basic emotional states. Ekman's classic model identifies six universal emotions: happiness, sadness, anger, fear, surprise and disgust; most FER datasets add a neutral class, giving seven categories in total. Applications include engagement analysis in online learning, customer satisfaction monitoring in call centers, UX research, and driver state monitoring.

Model Architecture

Pipeline: face detection → alignment → emotion classification.

import cv2
import numpy as np
import timm
import torch
import torch.nn as nn
from insightface.app import FaceAnalysis
from torchvision import transforms

class EmotionRecognizer:
    def __init__(self, model_path: str):
        # Face detection and alignment
        self.detector = FaceAnalysis(allowed_modules=['detection'])
        self.detector.prepare(ctx_id=0, det_size=(640, 640))

        # Emotion classifier: EfficientNet-B0 backbone with a 7-class head
        backbone = timm.create_model('efficientnet_b0', pretrained=False)
        backbone.classifier = nn.Sequential(
            nn.Dropout(0.3),
            nn.Linear(backbone.num_features, 7)
        )
        backbone.load_state_dict(torch.load(model_path, map_location='cpu'))
        backbone.eval()
        self.model = backbone

        self.emotions = ['angry', 'disgust', 'fear', 'happy',
                         'neutral', 'sad', 'surprise']
        # Normalization must match the statistics used during training
        self.transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

    @torch.no_grad()
    def predict(self, image: np.ndarray) -> list[dict]:
        """image: BGR frame as returned by cv2.imread / cv2.VideoCapture."""
        faces = self.detector.get(image)
        results = []

        for face in faces:
            x1, y1, x2, y2 = face.bbox.astype(int)
            # Clip the box to the frame to avoid empty crops at the border
            h, w = image.shape[:2]
            x1, y1 = max(x1, 0), max(y1, 0)
            x2, y2 = min(x2, w), min(y2, h)
            if x2 <= x1 or y2 <= y1:
                continue

            face_crop = image[y1:y2, x1:x2]
            face_crop = cv2.cvtColor(face_crop, cv2.COLOR_BGR2RGB)
            # Resize to the input size the classifier was trained on
            # (48x48 here, matching FER-2013-style crops)
            face_crop = cv2.resize(face_crop, (48, 48))

            tensor = self.transform(face_crop).unsqueeze(0)
            logits = self.model(tensor)
            probs = torch.softmax(logits, dim=1).squeeze(0)

            emotion_scores = {
                self.emotions[i]: float(probs[i])
                for i in range(7)
            }
            dominant = max(emotion_scores, key=emotion_scores.get)

            results.append({
                'bbox': [x1, y1, x2, y2],
                'emotion': dominant,
                'confidence': emotion_scores[dominant],
                'all_scores': emotion_scores
            })

        return results

Datasets and Model Quality

Dataset     Size            Conditions     Classes
FER-2013    35,887 images   In the wild    7
AffectNet   ~1M images      In the wild    8 (adds contempt)
RAF-DB      ~30k images     Real-world     7 basic + compound
CK+         593 sequences   Laboratory     7
SFEW        1,766 frames    Movie stills   7
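For reference, the original FER-2013 release (the Kaggle CSV) stores each 48×48 grayscale image as a space-separated pixel string in a `pixels` column. A minimal parsing sketch (the function name is ours):

```python
import numpy as np

def parse_fer_row(emotion: str, pixels: str) -> tuple[int, np.ndarray]:
    """Convert one FER-2013 CSV row into (label, 48x48 uint8 image)."""
    values = np.array(pixels.split(), dtype=np.uint8)
    if values.size != 48 * 48:
        raise ValueError(f"expected 2304 pixel values, got {values.size}")
    return int(emotion), values.reshape(48, 48)
```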

Accuracy on FER-2013:

  • EfficientNet-B0 fine-tuned: 73.1%
  • Vision Transformer (ViT-B/16): 74.8%
  • EfficientFace: 73.3%

Main difficulty: labels in public datasets are subjective; annotators disagree on 30–40% of images. Because of this label noise, roughly 75% accuracy is close to the practical ceiling on FER-2013 (reported human accuracy is about 65±5%).
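Label smoothing is a standard countermeasure for training on noisy labels; the snippet below (logit values are purely illustrative) shows how PyTorch's built-in option penalizes overconfident predictions:

```python
import torch
import torch.nn as nn

# A confident, correct prediction over the 7 emotion classes
logits = torch.tensor([[4.0, 0.5, 0.2, 0.1, 0.1, 0.1, 0.0]])
target = torch.tensor([0])

ce_hard = nn.CrossEntropyLoss()(logits, target)
# label_smoothing=0.1 spreads 10% of the target mass over all classes,
# so the model is no longer rewarded for pushing probability to 1.0
ce_smooth = nn.CrossEntropyLoss(label_smoothing=0.1)(logits, target)
```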

Temporal Analytics on Video

Frame-by-frame classification is unstable — emotion "flickers" between frames. Solutions:

  • Temporal smoothing: moving average over 10–30 frames
  • RNN/LSTM on top of frame-level classifier: accounts for temporal dynamics
  • Interval aggregation: average emotion per N-second interval for analytics
A minimal moving-average tracker over the last N frames:

from collections import deque

class TemporalEmotionTracker:
    def __init__(self, window_size: int = 30):
        self.window = deque(maxlen=window_size)

    def update(self, emotion_scores: dict) -> dict:
        self.window.append(emotion_scores)
        # Average over window
        averaged = {}
        for emotion in emotion_scores:
            averaged[emotion] = sum(
                frame[emotion] for frame in self.window
            ) / len(self.window)
        return averaged
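The tracker above covers moving-average smoothing; interval aggregation from the list above can be sketched as a majority vote per chunk of frames (the function name and signature are ours):

```python
from collections import Counter

def aggregate_intervals(frame_emotions: list[str],
                        fps: int, interval_s: int) -> list[str]:
    """Reduce per-frame dominant emotions to one label per N-second
    interval by majority vote, for analytics dashboards."""
    frames_per_interval = fps * interval_s
    labels = []
    for start in range(0, len(frame_emotions), frames_per_interval):
        chunk = frame_emotions[start:start + frames_per_interval]
        labels.append(Counter(chunk).most_common(1)[0][0])
    return labels
```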

Limitations and Ethical Aspects

It is important to understand the technology's limitations:

  • Cultural differences in emotion expression (facial cues vary across cultures)
  • Neutral face ≠ neutral emotional state
  • Acted expressions differ from genuine ones

The technology must not be used for covert monitoring of employees or customers. Production deployments require informed consent from the people being analyzed and compliance with applicable privacy regulations.

Task Timeline

SDK for a mobile/web application: 2–3 weeks
Video engagement analytics: 3–5 weeks
Custom model on a corporate dataset: 5–8 weeks