Implementation of systems based on graph neural networks (GNN)
Graph neural networks are a class of architectures for working with graph data. Unlike CNNs and RNNs, GNNs natively model relationships between objects: social connections, molecular structures, transaction networks, and road networks. Where a table loses context, a graph preserves it.
Theoretical basis and key architectures
The core idea of GNN is message passing: each node aggregates information from its neighbors. After K iterations, a node "sees" a K-hop neighborhood.
Aggregation formula (GraphSAGE):
h_v^(k) = σ(W · CONCAT(h_v^(k-1), AGG({h_u^(k-1), u ∈ N(v)})))
Key architectures:
| Architecture | Aggregation | Application | Features |
|---|---|---|---|
| GCN (Kipf 2017) | Spectral conv | Node classification | Transductive |
| GraphSAGE | Mean/LSTM/Max | Large Graphs | Inductive |
| GAT | Attention | Heterogeneous Graphs | Weighted Edges |
| GIN | Sum (most powerful) | Graph isomorphism | Maximum expressiveness |
| RGCN | Relation-specific | Knowledge graphs | Different types of edges |
GCN Implementation with PyTorch Geometric
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, SAGEConv, GATConv, global_mean_pool
from torch_geometric.data import Data, DataLoader
import numpy as np
import pandas as pd
class GraphConvNet(nn.Module):
    """Multi-layer GCN for node- or graph-level prediction.

    Suited for fraud detection, recommendations, and molecular tasks.
    Each layer applies GCNConv -> BatchNorm -> ReLU -> Dropout; an
    optional mean-pool readout collapses node embeddings into a single
    graph embedding before the linear head.
    """

    def __init__(self, node_features: int,
                 hidden_channels: int = 64,
                 output_dim: int = 1,
                 num_layers: int = 3,
                 dropout: float = 0.3):
        super().__init__()
        # Channel widths: input layer maps node_features -> hidden,
        # all remaining num_layers - 1 convolutions stay at hidden width.
        widths = [node_features] + [hidden_channels] * num_layers
        self.convs = nn.ModuleList(
            GCNConv(w_in, w_out) for w_in, w_out in zip(widths[:-1], widths[1:])
        )
        self.bns = nn.ModuleList(
            nn.BatchNorm1d(w_out) for w_out in widths[1:]
        )
        self.dropout = dropout
        self.classifier = nn.Linear(hidden_channels, output_dim)

    def forward(self, x: torch.Tensor,
                edge_index: torch.Tensor,
                batch: torch.Tensor = None) -> torch.Tensor:
        """Run message passing and the prediction head.

        Args:
            x: (N, node_features) node feature matrix.
            edge_index: (2, E) edge list in COO format.
            batch: (N,) graph membership per node (for batched graphs);
                when given, a graph-level mean-pool readout is applied.
        """
        for conv, norm in zip(self.convs, self.bns):
            h = norm(conv(x, edge_index))
            x = F.dropout(F.relu(h), p=self.dropout, training=self.training)
        if batch is not None:
            # Graph-level readout (for whole-graph tasks).
            x = global_mean_pool(x, batch)
        return self.classifier(x)
class GraphSAGEEncoder(nn.Module):
    """GraphSAGE encoder for inductive learning.

    Works on unseen nodes without retraining; intended for large graphs
    such as social networks and transaction graphs.
    """

    def __init__(self, in_channels: int, hidden_channels: int, out_channels: int,
                 num_layers: int = 3, aggr: str = 'mean'):
        super().__init__()
        # in -> hidden, (num_layers - 2) x hidden -> hidden, hidden -> out
        channel_pairs = (
            [(in_channels, hidden_channels)]
            + [(hidden_channels, hidden_channels)] * (num_layers - 2)
            + [(hidden_channels, out_channels)]
        )
        self.convs = nn.ModuleList(
            SAGEConv(c_in, c_out, aggr=aggr) for c_in, c_out in channel_pairs
        )

    def forward(self, x, edge_index):
        last = len(self.convs) - 1
        for idx, conv in enumerate(self.convs):
            x = conv(x, edge_index)
            # No nonlinearity/dropout after the final layer: raw embeddings.
            if idx != last:
                x = F.dropout(F.relu(x), p=0.2, training=self.training)
        return x

    def encode(self, x, edge_index):
        """Return L2-normalized embeddings for downstream tasks."""
        return F.normalize(self.forward(x, edge_index), p=2, dim=-1)
class GATNetwork(nn.Module):
    """Graph Attention Network: weighted aggregation of neighbors.

    Attention weights indicate the "importance" of each neighbor.
    """

    def __init__(self, in_channels: int, hidden_channels: int,
                 out_channels: int, num_heads: int = 8):
        super().__init__()
        # First layer: multi-head attention; head outputs are concatenated.
        self.conv1 = GATConv(in_channels, hidden_channels,
                             heads=num_heads, dropout=0.6)
        # Second layer: a single averaged head produces the final logits.
        self.conv2 = GATConv(hidden_channels * num_heads, out_channels,
                             heads=1, concat=False, dropout=0.6)

    def forward(self, x, edge_index):
        h = F.dropout(x, p=0.6, training=self.training)
        h = F.elu(self.conv1(h, edge_index))
        h = F.dropout(h, p=0.6, training=self.training)
        return self.conv2(h, edge_index)
Building a graph from tabular data
class GraphBuilder:
    """Convert tabular data into graphs for GNN consumption."""

    def build_user_item_graph(self, interactions: pd.DataFrame,
                              user_features: pd.DataFrame,
                              item_features: pd.DataFrame) -> Data:
        """Build a bipartite user-item graph for recommendations.

        Args:
            interactions: columns user_id, item_id, rating (or count).
            user_features: one row per user_id with numeric features.
            item_features: one row per item_id with numeric features.

        Returns:
            Data whose nodes are [users | items]; `n_users` marks the split.
        """
        # Map raw IDs to contiguous node indices: users first, then items.
        user_ids = interactions['user_id'].unique()
        item_ids = interactions['item_id'].unique()
        n_users = len(user_ids)
        user_idx = {uid: i for i, uid in enumerate(user_ids)}
        item_idx = {iid: i + n_users for i, iid in enumerate(item_ids)}
        # Edges: user -> item, mirrored to make the graph bidirectional
        # (typical for message passing in GNNs).
        src = interactions['user_id'].map(user_idx).values
        dst = interactions['item_id'].map(item_idx).values
        edge_index = torch.tensor(
            np.vstack([
                np.concatenate([src, dst]),
                np.concatenate([dst, src])
            ]),
            dtype=torch.long
        )
        # Node feature matrix; IDs missing from the feature tables fall
        # back to zeros. Cast to float explicitly so mixed/object dtypes
        # cannot break torch.tensor below.
        user_feat_matrix = (user_features.set_index('user_id')
                            .reindex(user_ids).fillna(0)
                            .values.astype(np.float32))
        item_feat_matrix = (item_features.set_index('item_id')
                            .reindex(item_ids).fillna(0)
                            .values.astype(np.float32))
        # Zero-pad both feature blocks to a common width so they stack.
        max_dim = max(user_feat_matrix.shape[1], item_feat_matrix.shape[1])
        user_feat_padded = np.pad(user_feat_matrix, ((0, 0), (0, max_dim - user_feat_matrix.shape[1])))
        item_feat_padded = np.pad(item_feat_matrix, ((0, 0), (0, max_dim - item_feat_matrix.shape[1])))
        x = torch.tensor(
            np.vstack([user_feat_padded, item_feat_padded]),
            dtype=torch.float
        )
        # Edge weights (e.g. the rating), duplicated for the mirrored edges.
        edge_attr = torch.tensor(
            np.concatenate([
                interactions['rating'].values,
                interactions['rating'].values
            ]),
            dtype=torch.float
        ).unsqueeze(1)
        return Data(
            x=x,
            edge_index=edge_index,
            edge_attr=edge_attr,
            n_users=n_users
        )

    def build_transaction_graph(self, transactions: pd.DataFrame) -> Data:
        """Build a transaction graph for fraud detection.

        Nodes: accounts and merchants (accounts indexed first).
        Edges: transactions between them, mirrored in both directions;
        every mirrored edge carries the same attributes and label.
        """
        accounts = transactions['account_id'].unique()
        merchants = transactions['merchant_id'].unique()
        n_accounts = len(accounts)
        acc_idx = {a: i for i, a in enumerate(accounts)}
        mer_idx = {m: i + n_accounts for i, m in enumerate(merchants)}
        src = transactions['account_id'].map(acc_idx).values
        dst = transactions['merchant_id'].map(mer_idx).values
        edge_index = torch.tensor(
            np.vstack([
                np.concatenate([src, dst]),
                np.concatenate([dst, src])
            ]),
            dtype=torch.long
        )
        # Transaction features as edge attributes. Cast to float so a
        # boolean column (is_international) cannot trip tensor creation.
        edge_attr = torch.tensor(
            transactions[['amount', 'hour_of_day', 'is_international']]
            .values.astype(np.float32),
            dtype=torch.float
        )
        edge_attr = torch.cat([edge_attr, edge_attr], dim=0)  # mirrored edges
        # Edge-level fraud labels (fraud = 1). Bug fix: labels must be
        # duplicated too, otherwise y has length E while the mirrored
        # graph has 2E edges and labels misalign with edge_index/edge_attr.
        if 'is_fraud' in transactions.columns:
            y = torch.tensor(transactions['is_fraud'].values, dtype=torch.long)
            y = torch.cat([y, y], dim=0)
        else:
            y = None
        return Data(
            x=torch.zeros(n_accounts + len(merchants), 16),  # placeholder node features
            edge_index=edge_index,
            edge_attr=edge_attr,
            y=y
        )
GNN Training and Assessment
class GNNTrainer:
    """Training pipeline for GNN models (node classification)."""

    def __init__(self, model: nn.Module, device: str = 'cuda'):
        # Robustness: fall back to CPU when CUDA is requested but absent,
        # instead of crashing on CPU-only machines.
        if device == 'cuda' and not torch.cuda.is_available():
            device = 'cpu'
        self.model = model.to(device)
        self.device = device
        self.optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

    def train_epoch(self, data: Data, mask: torch.Tensor = None) -> float:
        """Run one training epoch for node classification.

        Args:
            data: graph with `x`, `edge_index` and labels `y`.
            mask: optional boolean node mask restricting the loss.

        Returns:
            The scalar training loss for this epoch.
        """
        self.model.train()
        self.optimizer.zero_grad()
        data = data.to(self.device)
        out = self.model(data.x, data.edge_index)
        if mask is not None:
            # Keep the mask on the same device as the tensors it indexes.
            mask = mask.to(out.device)
            loss = F.cross_entropy(out[mask], data.y[mask])
        else:
            loss = F.cross_entropy(out, data.y)
        loss.backward()
        self.optimizer.step()
        return float(loss)

    def evaluate(self, data: Data, mask: torch.Tensor) -> dict:
        """Evaluate prediction quality on the masked nodes.

        NOTE: AUC assumes binary classification — it uses the class-1
        probability column.
        """
        self.model.eval()
        with torch.no_grad():
            out = self.model(data.x.to(self.device), data.edge_index.to(self.device))
        # Bug fix: move everything to CPU before indexing — a CPU mask
        # cannot index a CUDA tensor (and vice versa).
        out = out.cpu()
        mask = mask.cpu()
        pred = out[mask].argmax(dim=-1)
        true = data.y.cpu()[mask]
        from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
        probs = torch.softmax(out[mask], dim=-1)[:, 1].numpy()
        return {
            'accuracy': accuracy_score(true, pred),
            'f1_macro': f1_score(true, pred, average='macro'),
            # AUC is undefined for a single-class slice; report chance level.
            'auc': roc_auc_score(true, probs) if len(np.unique(true)) > 1 else 0.5
        }

    def train(self, data: Data,
              n_epochs: int = 200,
              train_mask: torch.Tensor = None,
              val_mask: torch.Tensor = None,
              checkpoint_path: str = 'best_gnn_model.pt') -> dict:
        """Full training loop with early stopping on validation AUC.

        The best checkpoint (by validation AUC) is saved to
        `checkpoint_path` and restored into the model before returning.

        Returns:
            dict with 'best_val_auc' and a 'history' of losses/AUCs.
        """
        best_val_auc = 0.0
        patience, patience_counter = 20, 0
        history = {'train_loss': [], 'val_auc': []}
        checkpoint_saved = False
        for epoch in range(n_epochs):
            loss = self.train_epoch(data, train_mask)
            history['train_loss'].append(loss)
            # Validate every 5 epochs to keep evaluation overhead low.
            if val_mask is not None and epoch % 5 == 0:
                metrics = self.evaluate(data, val_mask)
                history['val_auc'].append(metrics['auc'])
                if metrics['auc'] > best_val_auc:
                    best_val_auc = metrics['auc']
                    patience_counter = 0
                    torch.save(self.model.state_dict(), checkpoint_path)
                    checkpoint_saved = True
                else:
                    patience_counter += 1
                    if patience_counter >= patience:
                        print(f"Early stopping at epoch {epoch}")
                        break
        # Bug fix: restore the best weights — previously the model was left
        # at its last (possibly overfitted) epoch despite the checkpointing.
        if checkpoint_saved:
            self.model.load_state_dict(torch.load(checkpoint_path))
        return {'best_val_auc': best_val_auc, 'history': history}
Scaling to large graphs
Standard GNNs don't scale to graphs with millions of nodes—the full adjacency matrix doesn't fit in memory. Solutions:
- GraphSAGE with mini-batches: sample K neighbors instead of all of them. PyG supports this via `NeighborLoader` with the `num_neighbors=[25, 10]` parameter.
- Cluster-GCN: partition the graph into clusters and train within each cluster
- GraphSAINT: random sampling of subgraphs with importance sampling
from torch_geometric.loader import NeighborLoader
def create_scalable_dataloader(data: Data, batch_size: int = 1024) -> NeighborLoader:
    """Mini-batch neighbor-sampling loader for large graphs."""
    # Sample 25/10/5 neighbors at hops 1/2/3 respectively, seeding the
    # batches from the training nodes only.
    loader = NeighborLoader(
        data,
        num_neighbors=[25, 10, 5],
        batch_size=batch_size,
        input_nodes=data.train_mask,
        shuffle=True,
        num_workers=4,
    )
    return loader
Application area and benchmarks
| Task | Dataset | Architecture | AUC/Accuracy |
|---|---|---|---|
| Fraud detection | financial transactions | GraphSAGE | AUC 0.93-0.97 |
| Recommendations | Amazon | LightGCN | NDCG@20 0.045 |
| Social spam | social networks | GAT | F1 0.89 |
| Molecular Properties | ZINC | GIN | MAE 0.163 |
| Road Traffic | METR-LA | Diffusion GCN | RMSE 2.37 |
GNNs outperform traditional methods only when the graph structure is informative. If the relationships between objects are random, a standard GBT or MLP will yield comparable results with lower complexity.







