Development of Node-as-a-Service Billing System
Running a node is simple. Billing correctly — especially when the unit of consumption isn't a request but node uptime, network type, client version, and the plan the user chose three weeks ago — is hard. NaaS billing fails at most young providers not because it's technically difficult, but because the gap between "counting money roughly right" and "counting money precisely and transparently" is several months of engineering work.
Pricing Models: What's Actually Used
Pay-per-request — classic for RPC providers (Alchemy, Infura, QuickNode). Count JSON-RPC calls with weight coefficients by method:
| Method | Compute Units |
|---|---|
| eth_blockNumber | 10 |
| eth_getBalance | 19 |
| eth_call | 26 |
| eth_getLogs | 75 |
| trace_transaction | 150 |
| debug_traceTransaction | 500 |
An eth_getLogs over a wide block range is effectively an attack on the node: without weight coefficients, a user can make one request worth thousands of "normal" ones. Alchemy calls these weights Compute Units, QuickNode calls them Credits. Different names, same idea.
Time-based (subscription) — a dedicated node of fixed capacity, billed monthly. More transparent for the user, predictable revenue for the provider. Downside: the user overpays at low load.
Hybrid — a base plan with a monthly included volume, plus overage billing on top. Used by most mature providers.
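The hybrid model above reduces to simple arithmetic: a flat base fee plus per-unit charges for usage beyond the included pool. A minimal sketch (the plan numbers and prices are illustrative, not any provider's real tariff):

```python
from decimal import Decimal

def hybrid_bill(used_units: int, included_units: int,
                base_fee: Decimal, overage_price: Decimal) -> Decimal:
    """Monthly charge under a hybrid plan: flat base fee plus
    per-unit overage beyond the included pool."""
    overage = max(0, used_units - included_units)
    return base_fee + overage * overage_price

# 120M units used on a plan with 100M included:
# $49 base + 20M overage units * $0.000001 = $69
print(hybrid_bill(120_000_000, 100_000_000, Decimal("49"), Decimal("0.000001")))
```

Decimal rather than float matters here: float rounding errors in per-unit prices accumulate across millions of events.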
Billing System Architecture
Metric Collection Layer
Critical path: each RPC request must be logged before the response is returned to the user — otherwise a crash loses usage data. The architecture:
```
Client Request
      ↓
API Gateway (Nginx / Envoy / Kong)
      ↓  [access log + request metadata]
Billing Proxy (sidecar) — async write to queue
      ↓
RPC Node Cluster
      ↓
Response → Client
```
The billing proxy writes to Apache Kafka or NATS JetStream — both provide at-least-once delivery. A synchronous database write per request kills latency (adds 100–500 ms to each RPC call, which is unacceptable).
```go
// Async metric emission — doesn't block the request path
func (b *BillingMiddleware) RecordUsage(ctx context.Context, event UsageEvent) {
	select {
	case b.eventChan <- event:
		// successfully queued
	default:
		// buffer full — event dropped; account for it as sampling loss
		b.metrics.IncSamplingLoss()
	}
}
```
An acceptable sampling-loss rate for billing is <0.01%. Losing more means you need backpressure, or the node is dying under load.
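The 0.01% budget is straightforward to track from the two counters the middleware already maintains (recorded vs. dropped events). A sketch of the check, with hypothetical counter names:

```python
# Alert budget from the text: billing may lose at most 0.01% of usage events.
LOSS_BUDGET = 0.0001

def sampling_loss_ratio(events_recorded: int, events_dropped: int) -> float:
    """Fraction of usage events dropped because the emission buffer was full."""
    total = events_recorded + events_dropped
    return events_dropped / total if total else 0.0

def loss_budget_exceeded(recorded: int, dropped: int) -> bool:
    """True when the drop rate crosses the 0.01% billing budget."""
    return sampling_loss_ratio(recorded, dropped) > LOSS_BUDGET
```

In practice this runs as an alerting rule over a sliding window, not over lifetime totals, so a short overload burst is visible immediately.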
Aggregation and Rating Engine
Raw events from Kafka → rating pipeline → billable records in PostgreSQL.
Rating is applying tariff rules to raw usage. For NaaS:
```python
from decimal import Decimal

class RatingEngine:
    def rate_event(self, event: UsageEvent, plan: Plan) -> Decimal:
        method_weight = self.compute_unit_table.get(
            event.method, DEFAULT_WEIGHT
        )
        # Apply the pricing plan
        if plan.type == "included_pool":
            remaining = plan.included_units - plan.used_units
            if remaining > 0:
                # Only the part of the weight not covered by the pool is billable
                billable = max(0, method_weight - remaining)
            else:
                billable = method_weight
            plan.used_units += method_weight
        elif plan.type == "pay_per_use":
            billable = method_weight
        else:
            raise ValueError(f"unknown plan type: {plan.type}")
        return Decimal(billable) * plan.unit_price
```
Aggregation happens in time windows (5-minute buckets); the final record is written at the end of the billing period. This creates billing lag — the user has spent money but sees the balance update within five minutes. Normal for NaaS.
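Assigning an event to its 5-minute bucket is a floor operation on the timestamp. A minimal sketch of the bucketing step:

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 300  # 5-minute aggregation windows

def bucket_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 5-minute bucket."""
    epoch = ts.timestamp()
    floored = epoch - (epoch % BUCKET_SECONDS)
    return datetime.fromtimestamp(floored, tz=timezone.utc)

ts = datetime(2024, 3, 1, 12, 7, 42, tzinfo=timezone.utc)
print(bucket_start(ts))  # 2024-03-01 12:05:00+00:00
```

The aggregator then sums weights per (account_id, bucket) key before writing a single row, which is what keeps write volume in PostgreSQL manageable.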
Data Storage
PostgreSQL is the right choice for billing. Not ClickHouse, not MongoDB. Billing requires ACID on deductions. Schema:
```sql
-- Immutable usage log
CREATE TABLE usage_events (
    id BIGSERIAL,
    account_id UUID NOT NULL,
    node_id UUID NOT NULL,
    method VARCHAR(64),
    chain_id INTEGER,
    weight INTEGER,
    occurred_at TIMESTAMPTZ NOT NULL,
    billed_at TIMESTAMPTZ,
    -- on a partitioned table the primary key must include the partition key
    PRIMARY KEY (id, occurred_at)
) PARTITION BY RANGE (occurred_at);

-- Billing periods
CREATE TABLE billing_records (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    account_id UUID NOT NULL,
    period_start TIMESTAMPTZ NOT NULL,
    period_end TIMESTAMPTZ NOT NULL,
    total_units BIGINT,
    total_amount NUMERIC(20, 8),
    currency VARCHAR(10),   -- 'USD', 'USDC', 'ETH'
    status VARCHAR(20),     -- 'pending', 'invoiced', 'paid', 'overdue'
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```
usage_events is partitioned by date — otherwise, within a year, the table's indexes stop fitting in RAM. Retention: raw events 90 days, aggregates indefinitely.
Crypto-Native Billing: Specifics
Prepaid Balance in Stablecoins
Most Web3 NaaS providers work prepaid: the user tops up a balance in USDC/USDT and deductions are made from it. Simpler than a credit-card subscription, and no chargeback risk.
```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

import {IERC20} from "@openzeppelin/contracts/token/ERC20/IERC20.sol";

contract NaaSBilling {
    IERC20 public immutable usdc;
    mapping(address => uint256) public balances;
    address public billingOracle; // multisig or oracle service

    event Deposit(address indexed account, uint256 amount);
    event Deduction(address indexed account, uint256 amount, string invoiceId);

    modifier onlyBillingOracle() {
        require(msg.sender == billingOracle, "Not billing oracle");
        _;
    }

    constructor(IERC20 _usdc, address _billingOracle) {
        usdc = _usdc;
        billingOracle = _billingOracle;
    }

    function deposit(uint256 amount) external {
        require(usdc.transferFrom(msg.sender, address(this), amount), "Transfer failed");
        balances[msg.sender] += amount;
        emit Deposit(msg.sender, amount);
    }

    // Only billingOracle can deduct
    function deductBalance(
        address account,
        uint256 amount,
        string calldata invoiceId
    ) external onlyBillingOracle {
        require(balances[account] >= amount, "Insufficient balance");
        balances[account] -= amount;
        emit Deduction(account, amount, invoiceId);
    }
}
```
An important pattern: billingOracle is not an EOA but a multisig or HSM-backed service. A compromised oracle key puts all balances at risk.
Automatic Balance Refill
Low-balance triggers — the user configures auto-refill at a threshold:
```typescript
interface AutoRefillConfig {
  threshold: bigint;       // refill when balance < threshold
  refillAmount: bigint;    // refill by this amount
  sourceWallet: string;    // wallet for auto-deduction (needs approve)
  maxMonthlySpend: bigint; // runaway protection
}
```
maxMonthlySpend is mandatory protection. Without it, a buggy client makes a million requests and drains the user's balance in an hour.
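The refill decision combines both conditions: the balance has crossed the threshold, and topping up again would still respect the monthly cap. A sketch of the check (the function and its parameters mirror the config fields above but are otherwise hypothetical):

```python
def should_refill(balance: int, threshold: int,
                  spent_this_month: int, refill_amount: int,
                  max_monthly_spend: int) -> bool:
    """Trigger auto-refill only while the monthly cap has headroom."""
    if balance >= threshold:
        return False  # balance still healthy, nothing to do
    # Refuse the top-up if it would push this month's spend past the cap
    return spent_this_month + refill_amount <= max_monthly_spend
```

When the cap blocks a refill, the right behavior is to alert the user rather than silently let the balance run to zero.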
Alerts and Rate Limiting
Rate limiting belongs at the API Gateway level (not in billing): 1000 req/sec per API key is a standard default. Without it, one user with a bug can kill the node for everyone.
Billing alerts — notifications on:
- Balance dropped below X% of normal monthly spend
- Usage spike (>3x average in last hour)
- Node unavailable (user pays for downtime — should be compensated with SLA credits)
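The spike condition from the list above is a one-line comparison against the trailing average. A minimal sketch, assuming usage counters are already aggregated per hour:

```python
def is_usage_spike(last_hour_units: float, hourly_average: float,
                   factor: float = 3.0) -> bool:
    """Flag when the last hour exceeds factor x the trailing hourly average."""
    if hourly_average <= 0:
        # Any usage on a previously dormant account is itself anomalous
        return last_hour_units > 0
    return last_hour_units > factor * hourly_average
```

The dormant-account branch matters: a stolen API key on an unused account produces a spike that a plain ratio check would divide by zero on.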
SLA credits — automatic credit allocation on downtime, counted via an uptime probe (an external monitoring service, not your own). Self-reported 99.99% uptime doesn't inspire enterprise-client trust.
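SLA credits are usually tiered: the lower the measured monthly uptime, the larger the share of the monthly fee returned as credit. A sketch with illustrative tiers (the percentages are an assumption, not any provider's published SLA):

```python
from decimal import Decimal

# Illustrative tiers: (minimum monthly uptime %, credit as % of monthly fee)
CREDIT_TIERS = [
    (Decimal("99.9"), Decimal("0")),    # SLA met, no credit
    (Decimal("99.0"), Decimal("10")),
    (Decimal("95.0"), Decimal("25")),
    (Decimal("0"),    Decimal("100")),  # near-total outage, full refund
]

def sla_credit(uptime_pct: Decimal, monthly_fee: Decimal) -> Decimal:
    """Credit owed for the month, from externally measured uptime."""
    for floor, credit_pct in CREDIT_TIERS:
        if uptime_pct >= floor:
            return monthly_fee * credit_pct / 100
    return monthly_fee
```

The uptime_pct input should come from the external probe mentioned above, never from the node's own health endpoint.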
What Breaks in Production
Working on NaaS billing, we ran into several non-trivial problems:
Clock skew between nodes — if the billing proxy and the node have >1 s of clock drift, usage-event timestamps are wrong. NTP is mandatory, preferably chrony with Google's NTP servers.
Duplicate events on retry — Kafka's at-least-once delivery means retries create duplicates. Each event needs an idempotency key (request_id + node_id); the rating engine de-duplicates before writing.
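The de-duplication itself is a set-membership check on the idempotency key. An in-memory sketch of the idea (a production rating engine would check against a persistent store or a windowed cache, not a Python set):

```python
def dedupe_events(events: list[dict]) -> list[dict]:
    """Drop duplicate usage events by (request_id, node_id) idempotency key,
    keeping the first occurrence of each key."""
    seen: set[tuple] = set()
    unique = []
    for e in events:
        key = (e["request_id"], e["node_id"])
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique
```

Note that request_id alone is not enough: the same request can legitimately hit two nodes on a gateway-level retry, and each node's work may be billable depending on policy, so node_id belongs in the key.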
Timezone bugs in billing cycles — the billing period "1st of the month" is in UTC. A user in UTC-8 sees the cycle closing at 16:00 their time. This needs explicit documentation and, optionally, custom billing cycles.
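The UTC-8 example is easy to reproduce with zoneinfo, which is also a handy way to generate the numbers for the documentation page:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def cycle_close_local(year: int, month: int, tz: str) -> datetime:
    """Where the UTC 'first of the month' cycle boundary lands in a user's zone."""
    boundary_utc = datetime(year, month, 1, tzinfo=timezone.utc)
    return boundary_utc.astimezone(ZoneInfo(tz))

print(cycle_close_local(2024, 3, "America/Los_Angeles"))
# 2024-02-29 16:00:00-08:00 — the cycle closes on the previous local day
```

Storing all billing timestamps as TIMESTAMPTZ and converting only at display time keeps this a presentation problem rather than a data problem.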
Development timeline for a full NaaS billing system: 3–5 months for a team of 2–3 backend engineers. An MVP with a prepaid balance and basic rate limiting — 6–8 weeks.







