On-Chain AI Inference System Development


On-chain AI inference is not a marketing term — it's a concrete engineering problem with several partial solutions, each with its own trade-offs. Clients usually come with a request: "we want the results of our model to be verifiable and impossible to forge". This is a reasonable requirement. The question is how to achieve it without burning $50k in gas per inference.

Direct execution of a neural network in an EVM contract is not an option. GPT-2 small (117M parameters) would, in a naive implementation, require on the order of 10^8–10^9 floating-point multiplications per forward pass. The EVM has no float type, every arithmetic operation costs gas, and Ethereum's 30M gas block limit is a hard ceiling. Real approaches take a different path.
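The arithmetic is worth spelling out. A back-of-envelope estimate, assuming (illustratively, this is not a benchmark) that one fixed-point multiply-add emulated in the EVM costs about 40 gas:

```python
# Back-of-envelope: why naive on-chain inference is infeasible.
# GAS_PER_MAC is an assumed, illustrative cost for one fixed-point
# multiply-add emulated in the EVM (MUL + shifts + memory traffic).

PARAMS = 117_000_000          # GPT-2 small
MACS_PER_TOKEN = 2 * PARAMS   # ~2 multiply-adds per parameter per token
GAS_PER_MAC = 40              # assumption, not a benchmark
BLOCK_GAS_LIMIT = 30_000_000  # Ethereum block gas limit

total_gas = MACS_PER_TOKEN * GAS_PER_MAC
blocks_needed = total_gas / BLOCK_GAS_LIMIT

print(f"gas per token: {total_gas:.2e}")        # ~9.4e9 gas
print(f"full blocks needed: {blocks_needed:.0f}")  # hundreds of blocks
```

Even under generous assumptions, a single token of output needs hundreds of full blocks, which is why every practical design moves the computation off-chain.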

Three Architectural Approaches

1. ZK-proof of inference (zkML)

The model executes off-chain, on-chain verification happens via zero-knowledge proof of correct execution. This is the most promising and technically complex approach.

Primary frameworks:

EZKL — the most mature zkML toolkit today. It accepts ONNX models, generates Halo2 circuits, and verifies proofs on-chain through a generated Solidity verifier.

# Export model to ONNX
python -c "
import torch, ezkl
model = MyModel()
x = torch.randn(1, 784)  # MNIST example
torch.onnx.export(model, x, 'model.onnx', opset_version=11)
"

# Compile to circuit
ezkl gen-settings -M model.onnx -O settings.json
ezkl calibrate-settings -M model.onnx -D input.json -O settings.json
ezkl compile-circuit -M model.onnx -S settings.json --compiled-circuit model.compiled

# Generate SRS and keys
ezkl get-srs --settings-path settings.json
ezkl setup -M model.compiled --vk-path vk.key --pk-path pk.key

# Proof generation (this is slow)
ezkl gen-witness -M model.compiled -D input.json -O witness.json
ezkl prove --witness witness.json --compiled-circuit model.compiled --pk-path pk.key --proof-path proof.json

# Deploy verifier on-chain
ezkl create-evm-verifier --vk-path vk.key --sol-code-path verifier.sol

EZKL limitations: proof generation time for a simple CNN (ResNet-18) is 5–20 minutes on CPU, 30–120 seconds on GPU. SRS (structured reference string) size is gigabytes for large models. This limits applicability to small models (< 1M parameters) or scenarios where proof generation latency is non-critical.

Risc Zero + Boundless — an alternative approach via a zkVM. The model compiles to RISC-V, executes in the Risc Zero zkVM, and the proof is verified on-chain. Less efficient for ML-specific operations than specialized zkML circuits, but it allows arbitrary Rust/C++ code.

// Guest program in Risc Zero
use risc0_zkvm::guest::env;

fn main() {
    let input: Vec<f32> = env::read();
    let result = run_inference(&input);  // your model
    env::commit(&result);
}

Models suitable for zkML now:

  • Logistic regression, SVM — no constraints
  • Small MLP (< 100k parameters) — proof in seconds
  • Image classification CNN (< 1M parameters) — proof in minutes
  • LLM, diffusion models — not feasible in 2024–2025 without specialized hardware

2. Optimistic approach (fraud proofs)

Inference happens off-chain, and the result is published on-chain. Anyone can repeat the computation and challenge an incorrect result.

This approach is used by Giza Tech and Modulus Labs. The economics: a challenge period of N blocks, during which a challenger must provide a fraud proof (a partial execution trace). It works only with a sufficient number of active verifiers — the guarantee is economic game theory, not mathematics.
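The game-theoretic condition can be sketched numerically. All parameters below are hypothetical, not protocol constants:

```python
# Sketch of the economic security condition for an optimistic oracle.
# A rational submitter cheats only if expected profit is positive:
#   profit = fraud_value * P(no challenge) - bond * P(challenge)

def fraud_is_profitable(fraud_value, bond, p_challenge):
    """All arguments in the same unit (e.g. ETH); p_challenge in [0, 1]."""
    expected = fraud_value * (1 - p_challenge) - bond * p_challenge
    return expected > 0

# With a 1 ETH bond and a 90% chance of being challenged,
# low-value fraud is deterred...
print(fraud_is_profitable(5.0, 1.0, 0.9))    # prints False
# ...but high-value fraud can still pay off:
print(fraud_is_profitable(50.0, 1.0, 0.9))   # prints True
```

The practical consequence: the bond must scale with the value the oracle result controls, not with the cost of the computation.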

// Simplified optimistic AI oracle scheme
contract OptimisticAIOracle {
    struct InferenceResult {
        bytes32 inputHash;
        int256[] outputs;
        address submitter;
        uint256 submittedAt;
        bool challenged;
        bool resolved;
    }
    
    uint256 public constant CHALLENGE_PERIOD = 7200; // ~24h in blocks
    uint256 public constant SUBMITTER_BOND = 1 ether;
    
    mapping(uint256 => InferenceResult) public results;
    
    function submitResult(
        uint256 requestId,
        bytes32 inputHash,
        int256[] calldata outputs
    ) external payable {
        require(msg.value >= SUBMITTER_BOND, "Insufficient bond");
        require(results[requestId].submitter == address(0), "Already submitted");
        results[requestId] = InferenceResult({
            inputHash: inputHash,
            outputs: outputs,
            submitter: msg.sender,
            submittedAt: block.number,
            challenged: false,
            resolved: false
        });
    }
    
    function challenge(uint256 requestId, bytes calldata fraudProof) external {
        InferenceResult storage result = results[requestId];
        require(block.number < result.submittedAt + CHALLENGE_PERIOD, "Challenge period expired");
        require(!result.challenged, "Already challenged");
        
        // Verify fraud proof — partial re-execution
        bool isFraud = verifyFraudProof(result.inputHash, result.outputs, fraudProof);
        if (isFraud) {
            result.challenged = true;
            // Slash submitter bond, reward challenger
            payable(msg.sender).transfer(SUBMITTER_BOND);
        }
    }
    
    function verifyFraudProof(
        bytes32 inputHash,
        int256[] memory outputs,
        bytes calldata fraudProof
    ) internal view returns (bool) {
        // Model-specific partial execution check, intentionally omitted here
        return false;
    }
}

Problem: verifyFraudProof itself is expensive on-chain computation. For ML models, partial execution verification is non-trivial.
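The standard mitigation, borrowed from optimistic rollups, is an interactive bisection game: the submitter commits to intermediate states, a binary search pins down the single step where the claimed trace diverges from honest execution, and the chain re-executes only that one step. A toy sketch (real protocols run the search interactively over multiple transactions):

```python
# Toy bisection over an execution trace: locate the first step where the
# submitter's claimed intermediate states diverge from honest execution.
# Assumes at least one divergent step exists.

def first_divergence(claimed, honest_step, state0):
    """claimed[i] = state the submitter claims after step i+1.
    honest_step: function computing the true next state."""
    # Honest re-execution happens off-chain, by the challenger.
    states = [state0]
    for _ in range(len(claimed)):
        states.append(honest_step(states[-1]))
    # Binary search: O(log n) comparison rounds instead of replaying
    # all n steps on-chain.
    lo, hi = 0, len(claimed)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if claimed[mid - 1] == states[mid]:
            lo = mid   # trace still agrees at mid
        else:
            hi = mid   # divergence is at or before mid
    return hi - 1      # index of the first fraudulent step

# Honest step doubles the state; the submitter lied from step 3 onward.
honest = lambda s: s * 2
claimed = [2, 4, 8, 99, 198]   # honest trace would be [2, 4, 8, 16, 32]
print(first_divergence(claimed, honest, 1))   # prints 3
```

On-chain, only the transition at the returned index is re-checked, which keeps the fraud proof cheap even for long computations.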

3. Decentralized inference networks (TEE + economic security)

Galadriel — an EVM-compatible L1 designed specifically for AI. Smart contracts can call LLM providers directly from Solidity:

import "./interfaces/IOracle.sol";

contract AIChatbot {
    IOracle public oracle;
    
    struct Message {
        string role;
        string content;
    }
    
    mapping(uint256 => Message[]) public conversations;
    
    constructor(address oracleAddress) {
        oracle = IOracle(oracleAddress);
    }
    
    function startChat(string memory userMessage) external returns (uint256) {
        uint256 runId = oracle.createLlmCall(msg.sender);
        conversations[runId].push(Message("user", userMessage));
        return runId;
    }
    
    // Callback from oracle
    function onOracleLlmResponse(
        uint256 runId,
        IOracle.LlmResponse memory response,
        string memory errorMessage
    ) public {
        require(msg.sender == address(oracle), "Only oracle");
        if (bytes(errorMessage).length > 0) return; // drop failed calls
        conversations[runId].push(Message("assistant", response.content));
    }
}

Ritual — a decentralized inference network. Nodes use TEEs (Trusted Execution Environments) to attest that inference ran correctly without revealing model weights, with EigenLayer restaking for economic security.

Opaque Labs — specializes in confidential inference via SGX/TDX.

Approach Selection

Approach      Verification    Latency   Cost     Model Size
EZKL (zkML)   Mathematical    Minutes   High     < 1M params
Optimistic    Economic        Seconds   Medium   Any
Galadriel     Network trust   < 30 s    Low      LLM-class
Ritual TEE    Hardware        Seconds   Medium   Any
Risc Zero     Mathematical    Minutes   High     < 10M params

Production System Practical Architecture

Most real projects use a hybrid approach: zkML for small, critical models (e.g., fraud detection on transactions) and optimistic or TEE-based for LLM functions.

[User/Contract] 
    ↓ inference request + deposit
[Inference Request Queue (on-chain)]
    ↓ event
[Off-chain inference node cluster]
    ↓ runs model, generates proof
[Proof submission + result]
    ↓
[On-chain verifier contract]
    ↓ verify ZK proof
[Callback to requesting contract]
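The off-chain half of this diagram can be sketched as a worker loop. Here run_model, generate_proof and submit_on_chain are stubs standing in for the real inference runtime, the prover (e.g. an EZKL invocation), and the web3 submission code:

```python
# Minimal sketch of an off-chain inference node: drain the request
# queue, run the model, generate a proof, submit the result.

import queue

def run_model(inputs):                 # stub for the actual inference
    return [sum(inputs)]

def generate_proof(inputs, outputs):   # stub for the prover (e.g. EZKL)
    return b"proof-bytes"

def submit_on_chain(request_id, outputs, proof):  # stub for the web3 tx
    return {"request_id": request_id, "outputs": outputs, "proof": proof}

def inference_worker(requests):
    """Processes queued requests: run model -> prove -> submit."""
    submitted = []
    while not requests.empty():
        req = requests.get()
        outputs = run_model(req["inputs"])
        proof = generate_proof(req["inputs"], outputs)
        submitted.append(submit_on_chain(req["id"], outputs, proof))
    return submitted

q = queue.Queue()
q.put({"id": 1, "inputs": [1.0, 2.0]})
print(inference_worker(q))
```

A production node replaces the in-memory queue with an on-chain event subscription and adds retry, monitoring, and fallback logic around each step.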

On-chain Interface

Standard pattern — Oracle-style with callback:

interface IInferenceOracle {
    struct InferenceRequest {
        bytes modelId;       // hash or CID of model
        bytes input;         // ABI-encoded or raw bytes
        address callback;    // contract for callback
        bytes4 callbackSig;  // callback function signature
        uint256 maxFee;      // max fee for inference
    }
    
    function requestInference(InferenceRequest calldata req) 
        external payable returns (uint256 requestId);
    
    function fulfillInference(
        uint256 requestId,
        bytes calldata result,
        bytes calldata proof
    ) external;
}

Working with Fixed-Point

EVM doesn't support float. All ML computation on-chain or in proof inputs requires quantization:

# Quantization before inference
import numpy as np

def quantize_input(x: np.ndarray, scale: int = 2**16) -> np.ndarray:
    """Convert float to Q16.16 fixed-point (round rather than truncate)"""
    return np.round(x * scale).astype(np.int32)

def dequantize_output(x: np.ndarray, scale: int = 2**16) -> np.ndarray:
    return x.astype(np.float32) / scale

For EZKL, quantization happens automatically during calibrate-settings. It is important to verify model accuracy after quantization: a loss above 1% is typically unacceptable for financial models.
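The check itself is straightforward: run the same computation in float and in fixed point and compare. A pure-Python sketch for a single Q16.16 dot product (a real check would compare full model outputs over a validation set):

```python
# Compare float and Q16.16 fixed-point results of a dot product
# and measure the relative error introduced by quantization.

SCALE = 2 ** 16  # Q16.16

def to_fixed(x):
    return int(round(x * SCALE))

def fixed_dot(a_fx, b_fx):
    # The product of two Q16.16 values is Q32.32; shift back per term.
    return sum((a * b) >> 16 for a, b in zip(a_fx, b_fx))

weights = [0.25, -1.5, 0.0625]
inputs  = [2.0, 0.5, 8.0]

float_result = sum(w * x for w, x in zip(weights, inputs))
fixed_result = fixed_dot([to_fixed(w) for w in weights],
                         [to_fixed(x) for x in inputs]) / SCALE

rel_error = abs(float_result - fixed_result) / abs(float_result)
print(float_result, fixed_result, rel_error)
assert rel_error < 0.01  # above 1% would be unacceptable here
```

These particular values are exactly representable in Q16.16, so the error is zero; with real weights the error is nonzero and grows with network depth, which is why the check must run on the full model.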

Model Storage and Verifiability

Key question: how does the on-chain contract know inference was executed on the correct model, not a substitute? Solutions:

Model commitment — a hash of the model weights is stored on-chain. The inference node proves (via a ZK proof or a TEE attestation) that it ran the model with exactly this hash.

IPFS/Filecoin for weights — model CID is fixed on-chain. Anyone can download and verify.
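The off-chain side of the commitment is just streaming the weights file through a hash. A sketch using hashlib's sha3_256 as a stand-in; note that Solidity's keccak256 is the pre-standard Keccak variant, so matching the on-chain hash requires a keccak implementation (e.g. the eth-hash package), not hashlib:

```python
# Compute a model commitment by hashing the weights file in chunks.
# hashlib.sha3_256 (NIST SHA-3) is a stand-in: Solidity's keccak256
# differs, so production code must use a keccak library instead.

import hashlib

def model_commitment(weights_path: str) -> str:
    h = hashlib.sha3_256()
    with open(weights_path, "rb") as f:
        # Stream in 1 MiB chunks so multi-GB weights fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return "0x" + h.hexdigest()
```

Any party can download the weights by their IPFS CID, recompute the commitment, and compare it with the hash stored on-chain.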

contract ModelRegistry {
    struct Model {
        bytes32 weightsHash;    // keccak256 of weights in ONNX
        string ipfsCID;         // for download
        address owner;
        bool isPublic;
    }
    
    mapping(bytes32 => Model) public models;
    
    function registerModel(
        bytes32 modelId,
        bytes32 weightsHash,
        string calldata ipfsCID
    ) external {
        require(models[modelId].owner == address(0), "Model already registered");
        models[modelId] = Model(weightsHash, ipfsCID, msg.sender, true);
    }
}

Where This is Being Used Now

DeFi risk scoring — on-chain credit scoring based on on-chain activity. Small models (logistic regression, XGBoost) fit well in zkML.

Generative NFTs — NFT attributes are determined by an ML model and the result is verifiable. Models in the 1–10M parameter range keep proof times tolerable.

Autonomous agents — an AI agent manages on-chain positions. zkML is still impractical for LLM agents here, so a TEE or optimistic approach is used.

On-chain fraud detection — every swap passes through fraud scoring, and the result affects the fee tier or availability. Logistic regression is an ideal candidate for zkML.
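As an illustration of why it fits: such a score reduces to integer arithmetic that maps directly onto arithmetic circuits. A pure-Python sketch with made-up weights (a real model's weights come from training; circuits typically compare the logit against a threshold rather than evaluate the sigmoid):

```python
# Integer-only logistic regression score of the kind that suits zkML:
# all arithmetic in Q16.16 fixed point, sigmoid replaced by a threshold
# on the logit. Weights and features below are hypothetical.

SCALE = 2 ** 16  # Q16.16

def fraud_logit(features_fx, weights_fx, bias_fx):
    acc = bias_fx
    for f, w in zip(features_fx, weights_fx):
        acc += (f * w) >> 16        # Q16.16 multiply
    return acc

def is_suspicious(features, weights, bias, threshold=0.0):
    fx = lambda v: int(round(v * SCALE))
    logit = fraud_logit([fx(f) for f in features],
                        [fx(w) for w in weights], fx(bias))
    return logit > fx(threshold)

weights = [1.5, -0.75, 2.0]   # hypothetical trained weights
bias = -1.0
print(is_suspicious([0.9, 0.1, 0.2], weights, bias))   # prints True
print(is_suspicious([0.1, 0.9, 0.0], weights, bias))   # prints False
```

Because every operation is an integer multiply, shift, add, or comparison, the same computation expresses naturally as a zkML circuit with a proof small and fast enough for per-swap use.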

Development Stages

Phase 1 — Model selection & quantization (2–3 weeks). Select model architecture to suit zkML constraints, train, quantize, verify accuracy. Often the most painful stage — model must be simplified.

Phase 2 — Circuit compilation (1–2 weeks). Compile ONNX → Halo2 circuit via EZKL, calibrate settings, measure proof time and proof size.

Phase 3 — On-chain verifier (1–2 weeks). Deploy Solidity verifier, develop oracle interface, test end-to-end flow.

Phase 4 — Infrastructure (2–3 weeks). Inference nodes (GPU), proof generation pipeline, monitoring, fallback mechanisms.

Phase 5 — Integration (1–2 weeks). Integration into requesting contract, testnet testing, gas optimization for verification.

Total: 7–12 weeks to production. Main unknown factor: how well the specific model can be quantized without quality loss.