Integration with io.net (GPU Network)
A standard ML-infrastructure problem in the blockchain context: centralized GPU providers (AWS, GCP) offer predictable latency and SLAs, but completely violate the principle of permissionless access to compute. io.net addresses this with a DePIN model: a decentralized network of roughly 200,000 GPUs aggregated from data centers, mining farms, and gaming PCs. The integration task is not just calling a REST API, but building a reliable pipeline that accounts for the specifics of decentralized compute: variable latency, worker failures, and stochastic task distribution.
Architecture of io.net Integration
io.net provides two main ways to interact: the IO Cloud API for managed clusters, and IOG (IO Compute) for direct access to individual GPU workers. For production systems, use the first option, with clusters.
Cluster Lifecycle
A typical flow looks like this:
POST /clusters → create cluster with GPU requirements
GET /clusters/{id} → poll status (PROVISIONING → READY)
POST /clusters/{id}/jobs → run tasks
GET /jobs/{job_id} → monitor execution
DELETE /clusters/{id} → release resources
The resource-provisioning strategy is critical: io.net does not guarantee an allocation time. Depending on network load and GPU requirements, provisioning can take anywhere from 2 minutes to 30+ minutes. Any integration should therefore be built on an async model with webhook notifications or polling with exponential backoff, not on synchronous calls with a timeout.
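The backoff pattern can be sketched as follows. `poll_until_ready` and its parameters are illustrative, not part of an io.net SDK; the status-fetching call (e.g. a wrapper around GET /clusters/{id}) is injected so the control flow stays self-contained:

```python
import time

def poll_until_ready(fetch_status, max_wait_s: float = 3600,
                     base_delay_s: float = 5.0, max_delay_s: float = 120.0) -> str:
    """Poll cluster status with exponential backoff until it leaves PROVISIONING.

    fetch_status: callable returning the current status string, e.g. a thin
    wrapper around GET /clusters/{id}.
    """
    delay = base_delay_s
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status != "PROVISIONING":
            return status  # READY, FAILED, etc.
        time.sleep(delay)
        delay = min(delay * 2, max_delay_s)  # 5s, 10s, 20s, ... capped
    raise TimeoutError("cluster did not leave PROVISIONING in time")
```

Injecting `fetch_status` also makes the backoff logic trivially testable without a live cluster.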
Cluster Specification
When creating a cluster, you specify the requirements:
```json
{
  "cluster_name": "inference-cluster-prod",
  "num_gpus": 8,
  "gpu_model": "NVIDIA_3090",
  "min_vcpus": 16,
  "min_ram": 64,
  "locations": ["US", "EU"],
  "compliance": ["GDPR"],
  "duration_hours": 4
}
```
The gpu_model field is one of the most important. For LLM inference (LLaMA 3, Mistral), an RTX 3090/4090 with 24 GB VRAM is sufficient. For training or fine-tuning, you need A100/H100 GPUs with NVLink. Mismatching the GPU model to the task is the main source of wasted spend on io.net.
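A back-of-envelope VRAM estimate helps match gpu_model to the workload before paying for the wrong hardware. The multiplier and overhead below are rule-of-thumb assumptions, not io.net figures:

```python
def inference_vram_gb(params_billion: float, bytes_per_param: float = 2.0,
                      overhead: float = 1.2) -> float:
    """Rough VRAM estimate for LLM inference.

    bytes_per_param: 2.0 for fp16/bf16, 1.0 for int8, 0.5 for 4-bit quantization.
    overhead: rough multiplier for KV cache, activations, and CUDA context.
    """
    return params_billion * bytes_per_param * overhead

# An 8B model in fp16 comes out around 19 GB, so a 24 GB RTX 3090 fits;
# a 70B model in fp16 far exceeds a single consumer card and needs
# multiple A100/H100-class GPUs.
```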
Managing Failures and Reliability
A decentralized network is by definition less predictable than a managed cloud. In practice this means:
- A worker can go offline mid-task (the node lost connectivity, or the operator pulled the machine)
- GPUs can be in different states: one slot may run faster than another
- Network latency between workers in a cluster is not guaranteed, which is critical for allreduce workloads (distributed training)
Retry and Checkpointing Pattern
For long-running tasks, a checkpoint mechanism is mandatory. If a 6-hour training job fails at hour 5, without checkpoints everything starts over:
```python
import time

# IONetClient, CheckpointStorage, and WorkerFailureError are assumed to be
# defined elsewhere (API wrapper, S3/IPFS-backed storage, and the exception
# raised when a worker drops mid-job).
class IONetJobManager:
    def __init__(self, api_key: str, checkpoint_storage: str):
        self.client = IONetClient(api_key)
        self.storage = CheckpointStorage(checkpoint_storage)  # S3/IPFS

    def submit_with_retry(self, job_config: dict, max_retries: int = 3):
        # Resume from the latest checkpoint if one exists
        last_checkpoint = self.storage.get_latest_checkpoint(job_config["job_id"])
        if last_checkpoint:
            job_config["resume_from"] = last_checkpoint

        for attempt in range(max_retries):
            try:
                job = self.client.submit_job(job_config)
                return self._monitor_with_checkpointing(job)
            except WorkerFailureError:
                if attempt == max_retries - 1:
                    raise
                wait_time = 2 ** attempt * 30  # 30s, 60s, 120s
                time.sleep(wait_time)
```
Monitoring via On-Chain Events
io.net uses Solana for settlements and verification, which makes it possible to build monitoring on top of on-chain events rather than only the REST API. Worker accounts are updated on status changes, and a WebSocket subscription via @solana/web3.js (connection.onAccountChange) delivers notifications with lower latency than API polling.
Payment via $IO Token
Settlements in io.net are made in the $IO token (an SPL token on Solana). For automated systems this means managing an on-chain balance:
| Aspect | Solution |
|---|---|
| Balance replenishment | Programmatic swap via Jupiter Aggregator or direct purchase |
| Cost control | Set max_spend limit on cluster creation |
| Refunds | Automatic on DELETE /clusters/{id} |
| Currency risk | Hedging via perpetual on Drift Protocol |
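The max_spend control from the table can also be enforced client-side before the cluster request is even sent. A sketch, with the function name and the per-GPU hourly rate being hypothetical (actual pricing comes from the io.net quote):

```python
def check_budget(num_gpus: int, hourly_rate_io: float, duration_hours: float,
                 max_spend_io: float) -> float:
    """Estimate a cluster's cost in $IO and fail fast if it exceeds the budget.

    hourly_rate_io: illustrative per-GPU hourly price in $IO.
    """
    cost = num_gpus * hourly_rate_io * duration_hours
    if cost > max_spend_io:
        raise ValueError(
            f"estimated cost {cost:.2f} $IO exceeds budget {max_spend_io:.2f} $IO"
        )
    return cost
```

Failing before submission avoids both a wasted provisioning cycle and an on-chain spend you did not intend.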
For enterprise clients, io.net offers stablecoin settlements under a separate enterprise plan, which removes the $IO volatility concern.
Typical Use Cases
Inference-as-a-service: deploy a model on an io.net cluster and expose your own API on top. Savings versus AWS SageMaker are 60–80% at comparable throughput.
Federated learning: io.net supports isolated clusters with geographic compliance restrictions, enabling federated-learning pipelines where data never leaves its jurisdiction.
Burst computing for Web3 projects: on-chain games, AI content generation for NFTs, ZK-proof generation and verification are tasks that require GPUs only periodically. io.net lets you pay only for the time used, without capacity reservation.