On-Chain AI Inference System Development
On-chain AI inference is not a marketing term — it's a concrete engineering problem with several partial solutions, each with its own trade-offs. Clients usually come with a request: "we want the results of our model to be verifiable and impossible to forge". This is a reasonable requirement. The question is how to achieve it without burning $50k in gas per inference.
Direct execution of a neural network in an EVM contract is not an option. GPT-2 tiny (117M parameters) in naive implementation would require ~10^9 floating-point multiplication operations. EVM has no float type, each arithmetic operation costs gas, and Ethereum's 30M gas block limit is a physical ceiling. Real approaches take a different path.
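The back-of-envelope arithmetic is worth making explicit. A sketch with illustrative numbers (assuming ~2 integer ops per parameter for a dense forward pass and the 5-gas cost of the MUL opcode; real fixed-point math would cost more per operation):

```python
# Rough feasibility estimate for naive on-chain inference (illustrative numbers).
PARAMS = 117_000_000        # GPT-2 small parameter count
OPS_PER_PARAM = 2           # one multiply + one add per weight in a dense layer
GAS_PER_OP = 5              # MUL opcode cost; fixed-point emulation costs more
BLOCK_GAS_LIMIT = 30_000_000

total_gas = PARAMS * OPS_PER_PARAM * GAS_PER_OP
blocks_needed = total_gas / BLOCK_GAS_LIMIT
print(f"~{total_gas / 1e9:.2f}B gas, ~{blocks_needed:.0f} full blocks per inference")
```

Even under these charitable assumptions, a single inference would monopolize dozens of full blocks, which is why every practical design moves the computation off-chain.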
Three Architectural Approaches
1. ZK-proof of inference (zkML)
The model executes off-chain, on-chain verification happens via zero-knowledge proof of correct execution. This is the most promising and technically complex approach.
Primary frameworks:
EZKL — the most mature zkML toolkit today. Accepts ONNX models, generates Halo2 circuits, verifies on-chain through a Solidity verifier.
# Export model to ONNX
python -c "
import torch, ezkl
model = MyModel()
x = torch.randn(1, 784) # MNIST example
torch.onnx.export(model, x, 'model.onnx', opset_version=11)
"
# Compile to circuit
ezkl gen-settings -M model.onnx -O settings.json
ezkl calibrate-settings -M model.onnx -D input.json -O settings.json
ezkl compile-circuit -M model.onnx -S settings.json --compiled-circuit model.compiled
# Generate SRS and keys
ezkl get-srs --settings-path settings.json
ezkl setup -M model.compiled --vk-path vk.key --pk-path pk.key
# Proof generation (this is slow)
ezkl gen-witness -M model.compiled -D input.json -O witness.json
ezkl prove --witness witness.json --compiled-circuit model.compiled --pk-path pk.key --proof-path proof.json
# Deploy verifier on-chain
ezkl create-evm-verifier --vk-path vk.key --sol-code-path verifier.sol
EZKL limitations: proof generation for even a modest CNN such as ResNet-18 takes 5–20 minutes on CPU and 30–120 seconds on GPU. The SRS (structured reference string) runs to gigabytes for large models. This limits applicability to small models (< 1M parameters) or scenarios where proof-generation latency is not critical.
Risc Zero + Boundless — alternative approach via zkVM. The model compiles to RISC-V, executes in Risc Zero zkVM, proof verifies on-chain. Less efficient for ML-specific operations than specialized zkML circuits, but allows arbitrary Rust/C++ code.
// Guest program in Risc Zero
use risc0_zkvm::guest::env;

fn main() {
    let input: Vec<f32> = env::read();
    let result = run_inference(&input); // your model runs inside the zkVM
    env::commit(&result); // committed outputs become part of the public receipt
}
Models suitable for zkML now:
- Logistic regression, SVM — no constraints
- Small MLP (< 100k parameters) — proof in seconds
- Image classification CNN (< 1M parameters) — proof in minutes
- LLM, diffusion models — not feasible in 2024–2025 without specialized hardware
2. Optimistic approach (fraud proofs)
Inference happens off-chain and the result is published on-chain. Anyone can repeat the computation and challenge an incorrect result within a dispute window.
This approach is used by Giza Tech and Modulus Labs. The economics: a challenge period of N blocks, during which a challenger must provide a fraud proof (a partial execution trace). It works only with a sufficient number of independent verifiers; the guarantee is economic game theory, not mathematics.
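The incentive math behind this can be sketched directly (hypothetical numbers; real parameters are protocol-specific):

```python
# Submitting a fraudulent result has negative expected value when
#   p_detect * bond > fraud_gain,  i.e.  p_detect > fraud_gain / bond
def min_detection_probability(fraud_gain_eth: float, bond_eth: float) -> float:
    """Minimum probability that at least one honest verifier re-checks the
    result, above which fraud is unprofitable in expectation."""
    return fraud_gain_eth / bond_eth

# With a 1 ETH bond and 0.2 ETH to gain from a forged result, it is enough
# that honest verifiers re-check more than 20% of submissions.
print(min_detection_probability(0.2, 1.0))  # 0.2
```

This is why the bond size must scale with the value a forged inference result could extract, not with the cost of the computation itself.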
// Simplified optimistic AI oracle scheme
abstract contract OptimisticAIOracle {
    struct InferenceResult {
        bytes32 inputHash;
        int256[] outputs;
        address submitter;
        uint256 submittedAt;
        bool challenged;
        bool resolved;
    }

    uint256 public constant CHALLENGE_PERIOD = 7200; // ~24h in blocks
    uint256 public constant SUBMITTER_BOND = 1 ether;

    mapping(uint256 => InferenceResult) public results;

    function submitResult(
        uint256 requestId,
        bytes32 inputHash,
        int256[] calldata outputs
    ) external payable {
        require(msg.value >= SUBMITTER_BOND, "Insufficient bond");
        require(results[requestId].submitter == address(0), "Already submitted");
        results[requestId] = InferenceResult({
            inputHash: inputHash,
            outputs: outputs,
            submitter: msg.sender,
            submittedAt: block.number,
            challenged: false,
            resolved: false
        });
    }

    function challenge(uint256 requestId, bytes calldata fraudProof) external {
        InferenceResult storage result = results[requestId];
        require(block.number < result.submittedAt + CHALLENGE_PERIOD, "Challenge period expired");
        require(!result.challenged, "Already challenged");
        // Verify fraud proof via partial re-execution
        bool isFraud = verifyFraudProof(result.inputHash, result.outputs, fraudProof);
        if (isFraud) {
            // Slash submitter bond, reward challenger
            result.challenged = true;
            payable(msg.sender).transfer(SUBMITTER_BOND);
        }
    }

    // Implemented by a concrete oracle; this is the expensive part
    function verifyFraudProof(
        bytes32 inputHash,
        int256[] memory outputs,
        bytes calldata fraudProof
    ) internal view virtual returns (bool);
}
Problem: verifyFraudProof itself is expensive on-chain computation. For ML models, partial execution verification is non-trivial.
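The standard workaround, borrowed from optimistic rollups, is interactive bisection: instead of re-executing the whole model on-chain, the submitter and challenger binary-search the layer-by-layer execution trace for the first intermediate state they disagree on, and only that single step is re-executed by the on-chain arbiter. A minimal sketch of the off-chain search (trace format is hypothetical):

```python
def first_divergent_step(trace_a, trace_b):
    """Binary-search two execution traces (lists of intermediate-state hashes,
    one per layer) for the first index where they diverge. Only that single
    step then needs on-chain re-execution."""
    lo, hi = 0, len(trace_a)  # invariant: traces agree before lo, diverge by hi
    while lo < hi:
        mid = (lo + hi) // 2
        if trace_a[mid] == trace_b[mid]:
            lo = mid + 1
        else:
            hi = mid
    return lo  # index of the first disagreeing intermediate state

# Honest and forged traces agree through layer 4, then diverge at layer 5:
honest = ["h0", "h1", "h2", "h3", "h4", "h5", "h6"]
forged = ["h0", "h1", "h2", "h3", "h4", "x5", "x6"]
print(first_divergent_step(honest, forged))  # 5
```

Bisection turns "re-run the model on-chain" into "re-run one layer on-chain", at the cost of a multi-round dispute protocol.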
3. Decentralized inference networks (TEE + economic security)
Galadriel — an EVM-compatible L1 designed specifically for AI. Smart contracts can call LLM providers directly from Solidity through an oracle contract:
import "./interfaces/IOracle.sol";

contract AIChatbot {
    IOracle public oracle;

    struct Message {
        string role;
        string content;
    }

    mapping(uint256 => Message[]) public conversations;

    constructor(address oracleAddress) {
        oracle = IOracle(oracleAddress);
    }

    function startChat(string memory userMessage) external returns (uint256) {
        uint256 runId = oracle.createLlmCall(msg.sender);
        conversations[runId].push(Message("user", userMessage));
        return runId;
    }

    // Callback from oracle
    function onOracleLlmResponse(
        uint256 runId,
        IOracle.LlmResponse memory response,
        string memory errorMessage
    ) public {
        require(msg.sender == address(oracle), "Only oracle");
        conversations[runId].push(Message("assistant", response.content));
    }
}
Ritual — a decentralized inference network. Nodes use a TEE (Trusted Execution Environment) to attest to inference without revealing model weights; EigenLayer restaking provides economic security.
Opaque Labs — specializes in confidential inference via SGX/TDX.
Approach Selection
| Approach | Verification | Latency | Cost | Model Size |
|---|---|---|---|---|
| EZKL (zkML) | Mathematical | Minutes | High | < 1M param |
| Optimistic | Economic | Seconds | Medium | Any |
| Galadriel | Network trust | < 30s | Low | LLM-class |
| Ritual TEE | Hardware | Seconds | Medium | Any |
| Risc Zero | Mathematical | Minutes | High | < 10M param |
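The table above can be collapsed into a rough selector (a simplification of the trade-offs, using the thresholds quoted above, not a product recommendation):

```python
def pick_approach(params: int, needs_math_proof: bool, max_latency_s: float) -> str:
    """Rough approach selector mirroring the comparison table above."""
    if needs_math_proof:
        if params < 1_000_000:
            return "zkML (EZKL)"
        if params < 10_000_000:
            return "Risc Zero zkVM"
        return "infeasible: no mathematical verification at this model size"
    if max_latency_s < 60:
        return "TEE network or optimistic fraud proofs"
    return "optimistic fraud proofs"

print(pick_approach(500_000, True, 600))       # zkML (EZKL)
print(pick_approach(7_000_000_000, False, 30)) # TEE network or optimistic fraud proofs
```

The first branching question in practice is always the verification requirement: mathematical guarantees immediately cap model size.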
Production System Practical Architecture
Most real projects use a hybrid approach: zkML for small, critical models (e.g., fraud detection on transactions) and optimistic or TEE-based for LLM functions.
[User/Contract]
↓ inference request + deposit
[Inference Request Queue (on-chain)]
↓ event
[Off-chain inference node cluster]
↓ runs model, generates proof
[Proof submission + result]
↓
[On-chain verifier contract]
↓ verify ZK proof
[Callback to requesting contract]
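The off-chain half of this pipeline is an event loop: watch the on-chain queue, run the model, generate a proof, submit. A minimal in-memory simulation of that loop (all names are hypothetical; a real node would subscribe to chain events via an RPC client and call a real prover such as EZKL):

```python
import hashlib
from dataclasses import dataclass

@dataclass
class InferenceRequest:
    request_id: int
    input_data: bytes

@dataclass
class FulfilledResult:
    request_id: int
    output: bytes
    proof: bytes  # stand-in for a real ZK proof

def run_model(input_data: bytes) -> bytes:
    # Placeholder for real inference (e.g. an ONNX Runtime call on the node's GPU).
    return input_data[::-1]

def prove(input_data: bytes, output: bytes) -> bytes:
    # Placeholder: a real node would invoke the EZKL prover here (minutes of work).
    return hashlib.sha256(input_data + output).digest()

def node_loop(queue):
    """Drain the request queue: run inference, attach a proof, return submissions."""
    submissions = []
    for req in queue:
        out = run_model(req.input_data)
        submissions.append(FulfilledResult(req.request_id, out, prove(req.input_data, out)))
    return submissions

results = node_loop([InferenceRequest(1, b"abc")])
print(results[0].output)  # b'cba'
```

The expensive step is `prove`, which is why production pipelines batch requests and run proof generation on dedicated GPU workers rather than inline.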
On-chain Interface
Standard pattern — Oracle-style with callback:
interface IInferenceOracle {
    struct InferenceRequest {
        bytes modelId;      // hash or CID of the model
        bytes input;        // ABI-encoded or raw bytes
        address callback;   // contract to call back
        bytes4 callbackSig; // callback function selector
        uint256 maxFee;     // max fee for inference
    }

    function requestInference(InferenceRequest calldata req)
        external payable returns (uint256 requestId);

    function fulfillInference(
        uint256 requestId,
        bytes calldata result,
        bytes calldata proof
    ) external;
}
Working with Fixed-Point
EVM doesn't support float. All ML computation on-chain or in proof inputs requires quantization:
# Quantization before inference
import numpy as np

def quantize_input(x: np.ndarray, scale: int = 2**16) -> np.ndarray:
    """Convert float to Q16.16 fixed-point (round rather than truncate)."""
    return np.round(x * scale).astype(np.int32)

def dequantize_output(x: np.ndarray, scale: int = 2**16) -> np.ndarray:
    return x.astype(np.float32) / scale
For EZKL, quantization happens automatically during calibrate-settings. It is important to verify model accuracy after quantization: a loss of more than 1% is typically unacceptable for financial models.
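That accuracy check can be as simple as comparing predictions before and after a round trip through fixed-point. A sketch with a toy linear decision rule (a real check would run the full quantized circuit on a held-out set):

```python
import numpy as np

def quantize(x, scale=2**16):
    # Q16.16 fixed-point with rounding
    return np.round(x * scale).astype(np.int64)

def predict_float(w, x):
    return (x @ w > 0).astype(int)

def predict_quantized(w, x, scale=2**16):
    # Integer dot product; the sign is unchanged by the positive scale factor.
    return ((quantize(x, scale) @ quantize(w, scale)) > 0).astype(int)

rng = np.random.default_rng(0)
w = rng.normal(size=8)
X = rng.normal(size=(1000, 8))
agreement = (predict_float(w, X) == predict_quantized(w, X)).mean()
print(f"prediction agreement after quantization: {agreement:.4f}")
assert agreement > 0.99, "quantization loss above the 1% threshold"
```

Disagreements can only occur for samples sitting within rounding error of the decision boundary, so Q16.16 is ample for a model with healthy margins; narrow-margin models are where quantization bites.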
Model Storage and Verifiability
Key question: how does the on-chain contract know inference was executed on the correct model, not a substitute? Solutions:
Model commitment — hash of model weights stored on-chain. Inference node proves (via ZK or TEE signature) that it used the model with this hash.
IPFS/Filecoin for weights — model CID is fixed on-chain. Anyone can download and verify.
contract ModelRegistry {
    struct Model {
        bytes32 weightsHash; // keccak256 of the ONNX weights
        string ipfsCID;      // for download
        address owner;
        bool isPublic;
    }

    mapping(bytes32 => Model) public models;

    function registerModel(
        bytes32 modelId,
        bytes32 weightsHash,
        string calldata ipfsCID
    ) external {
        // Prevent silent overwriting of an existing registration
        require(models[modelId].owner == address(0), "Model already registered");
        models[modelId] = Model(weightsHash, ipfsCID, msg.sender, true);
    }
}
Where This is Being Used Now
DeFi risk scoring — on-chain credit scoring based on on-chain activity. Small models (logistic regression, XGBoost) fit well in zkML.
Generative NFTs — NFT attributes determined by ML model, result is verifiable. Models in 1–10M parameter range, tolerable proof time.
Autonomous agents — an AI agent manages on-chain positions. zkML is still impractical for LLM agents, so a TEE or optimistic approach is used.
On-chain fraud detection — every swap passes through fraud scoring, result affects fee tier or availability. Logistic regression — ideal candidate for zkML.
Development Stages
Phase 1 — Model selection & quantization (2–3 weeks). Select model architecture to suit zkML constraints, train, quantize, verify accuracy. Often the most painful stage — model must be simplified.
Phase 2 — Circuit compilation (1–2 weeks). Compile ONNX → Halo2 circuit via EZKL, calibrate settings, measure proof time and proof size.
Phase 3 — On-chain verifier (1–2 weeks). Deploy Solidity verifier, develop oracle interface, test end-to-end flow.
Phase 4 — Infrastructure (2–3 weeks). Inference nodes (GPU), proof generation pipeline, monitoring, fallback mechanisms.
Phase 5 — Integration (1–2 weeks). Integration into requesting contract, testnet testing, gas optimization for verification.
Total: 7–12 weeks to production. Main unknown factor: how well the specific model can be quantized without quality loss.