Development of Node-as-a-Service Billing System
Running a node is simple. Billing correctly — especially when the unit of consumption isn't a request but node uptime, network type, client version, and the plan the user chose three weeks ago — is hard. NaaS billing fails at most young providers not because it's technically difficult, but because the gap between "counting money roughly right" and "counting money precisely and transparently" is several months of engineering work.
Pricing Models: What's Actually Used
Pay-per-request — classic for RPC providers (Alchemy, Infura, QuickNode). Count JSON-RPC calls with weight coefficients by method:
| Method | Compute Units |
|---|---|
| eth_blockNumber | 10 |
| eth_getBalance | 19 |
| eth_call | 26 |
| eth_getLogs | 75 |
| trace_transaction | 150 |
| debug_traceTransaction | 500 |
An eth_getLogs over a wide block range is effectively an attack on the node: without weight coefficients, a user can make one request worth thousands of "normal" ones. Alchemy calls these weights Compute Units, QuickNode calls them Credits. Different names, same idea.
Time-based (subscription) — a dedicated node of fixed capacity, billed monthly. More transparent for the user, predictable revenue for the provider. Downside: the user overpays at low load.
Hybrid — a base plan with a monthly included volume, plus overage billing on top. Used by most mature providers.
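The hybrid model above reduces to simple arithmetic: a flat base fee plus per-unit charges for usage beyond the included pool. A minimal sketch (the plan numbers and prices are illustrative, not any provider's real tariff):

```python
from decimal import Decimal

def hybrid_bill(used_units: int, included_units: int,
                base_fee: Decimal, overage_price: Decimal) -> Decimal:
    """Monthly charge under a hybrid plan: flat base fee plus
    per-unit overage beyond the included pool."""
    overage = max(0, used_units - included_units)
    return base_fee + overage * overage_price

# 120M units used on a plan with 100M included:
# $49 base + 20M overage units * $0.000001 = $69
print(hybrid_bill(120_000_000, 100_000_000, Decimal("49"), Decimal("0.000001")))
```

Decimal rather than float matters here: float rounding errors in per-unit prices accumulate across millions of events.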
Billing System Architecture
Metric Collection Layer
Critical path: each RPC request must be logged before the response is returned to the user — otherwise a crash loses usage data. The architecture:
```
Client Request
      ↓
API Gateway (Nginx / Envoy / Kong)
      ↓  [access log + request metadata]
Billing Proxy (sidecar) — async write to queue
      ↓
RPC Node Cluster
      ↓
Response → Client
```
The billing proxy writes to Apache Kafka or NATS JetStream — both provide at-least-once delivery. A synchronous database write per request kills latency (adds 100–500 ms to each RPC call, which is unacceptable).
```go
// Async metric emission — doesn't block the request path
func (b *BillingMiddleware) RecordUsage(ctx context.Context, event UsageEvent) {
	select {
	case b.eventChan <- event:
		// successfully queued
	default:
		// buffer full — event dropped; account for it as sampling loss
		b.metrics.IncSamplingLoss()
	}
}
```
An acceptable sampling-loss rate for billing is <0.01%. Losing more means you need backpressure, or the node is dying under load.
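The 0.01% budget is straightforward to track from the two counters the middleware already maintains (recorded vs. dropped events). A sketch of the check, with hypothetical counter names:

```python
# Alert budget from the text: billing may lose at most 0.01% of usage events.
LOSS_BUDGET = 0.0001

def sampling_loss_ratio(events_recorded: int, events_dropped: int) -> float:
    """Fraction of usage events dropped because the emission buffer was full."""
    total = events_recorded + events_dropped
    return events_dropped / total if total else 0.0

def loss_budget_exceeded(recorded: int, dropped: int) -> bool:
    """True when the drop rate crosses the 0.01% billing budget."""
    return sampling_loss_ratio(recorded, dropped) > LOSS_BUDGET
```

In practice this runs as an alerting rule over a sliding window, not over lifetime totals, so a short overload burst is visible immediately.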
Aggregation and Rating Engine
Raw events from Kafka → rating pipeline → billable records in PostgreSQL.
Rating is applying tariff rules to raw usage. For NaaS:
```python
from decimal import Decimal

class RatingEngine:
    def rate_event(self, event: UsageEvent, plan: Plan) -> Decimal:
        method_weight = self.compute_unit_table.get(
            event.method, DEFAULT_WEIGHT
        )
        # Apply the pricing plan
        if plan.type == "included_pool":
            remaining = plan.included_units - plan.used_units
            if remaining > 0:
                # Only the part of the weight not covered by the pool is billable
                billable = max(0, method_weight - remaining)
            else:
                billable = method_weight
            plan.used_units += method_weight
        elif plan.type == "pay_per_use":
            billable = method_weight
        else:
            raise ValueError(f"unknown plan type: {plan.type}")
        return Decimal(billable) * plan.unit_price
```
Aggregation happens in time windows (5-minute buckets); the final record is written at the end of the billing period. This creates billing lag — the user has spent money but sees the balance update within five minutes. Normal for NaaS.
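Assigning an event to its 5-minute bucket is a floor operation on the timestamp. A minimal sketch of the bucketing step:

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 300  # 5-minute aggregation windows

def bucket_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 5-minute bucket."""
    epoch = ts.timestamp()
    floored = epoch - (epoch % BUCKET_SECONDS)
    return datetime.fromtimestamp(floored, tz=timezone.utc)

ts = datetime(2024, 3, 1, 12, 7, 42, tzinfo=timezone.utc)
print(bucket_start(ts))  # 2024-03-01 12:05:00+00:00
```

The aggregator then sums weights per (account_id, bucket) key before writing a single row, which is what keeps write volume in PostgreSQL manageable.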
Data Storage
PostgreSQL is the right choice for billing. Not ClickHouse, not MongoDB. Billing requires ACID on deductions. Schema:
```sql
-- Immutable usage log
CREATE TABLE usage_events (
    id BIGSERIAL,
    account_id UUID NOT NULL,
    node_id UUID NOT NULL,
    method VARCHAR(64),
    chain_id INTEGER,
    weight INTEGER,
    occurred_at TIMESTAMPTZ NOT NULL,
    billed_at TIMESTAMPTZ,
    -- on a partitioned table the primary key must include the partition key
    PRIMARY KEY (id, occurred_at)
) PARTITION BY RANGE (occurred_at);

-- Billing periods
CREATE TABLE billing_records (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    account_id UUID NOT NULL,
    period_start TIMESTAMPTZ NOT NULL,
    period_end TIMESTAMPTZ NOT NULL,
    total_units BIGINT,
    total_amount NUMERIC(20, 8),
    currency VARCHAR(10),   -- 'USD', 'USDC', 'ETH'
    status VARCHAR(20),     -- 'pending', 'invoiced', 'paid', 'overdue'
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```
usage_events is partitioned by date — otherwise, within a year, the table's indexes stop fitting in RAM. Retention: raw events 90 days, aggregates indefinitely.
Crypto-Native Billing: Specifics
Prepaid Balance in Stablecoins
Most Web3 NaaS providers work prepaid: the user tops up a balance in USDC/USDT and deductions are made from it. Simpler than a credit-card subscription, and no chargeback risk.
```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

import {IERC20} from "@openzeppelin/contracts/token/ERC20/IERC20.sol";

contract NaaSBilling {
    IERC20 public immutable usdc;
    mapping(address => uint256) public balances;
    address public billingOracle; // multisig or oracle service

    event Deposit(address indexed account, uint256 amount);
    event Deduction(address indexed account, uint256 amount, string invoiceId);

    modifier onlyBillingOracle() {
        require(msg.sender == billingOracle, "Not billing oracle");
        _;
    }

    constructor(IERC20 _usdc, address _billingOracle) {
        usdc = _usdc;
        billingOracle = _billingOracle;
    }

    function deposit(uint256 amount) external {
        require(usdc.transferFrom(msg.sender, address(this), amount), "Transfer failed");
        balances[msg.sender] += amount;
        emit Deposit(msg.sender, amount);
    }

    // Only billingOracle can deduct
    function deductBalance(
        address account,
        uint256 amount,
        string calldata invoiceId
    ) external onlyBillingOracle {
        require(balances[account] >= amount, "Insufficient balance");
        balances[account] -= amount;
        emit Deduction(account, amount, invoiceId);
    }
}
```
An important pattern: billingOracle is not an EOA but a multisig or HSM-backed service. A compromised oracle key puts all balances at risk.
Automatic Balance Refill
Low-balance triggers — the user configures auto-refill at a threshold:
```typescript
interface AutoRefillConfig {
  threshold: bigint;       // refill when balance < threshold
  refillAmount: bigint;    // refill by this amount
  sourceWallet: string;    // wallet for auto-deduction (needs approve)
  maxMonthlySpend: bigint; // runaway protection
}
```
maxMonthlySpend is mandatory protection. Without it, a buggy client makes a million requests and drains the user's balance in an hour.
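The refill decision combines both conditions: the balance has crossed the threshold, and topping up again would still respect the monthly cap. A sketch of the check (the function and its parameters mirror the config fields above but are otherwise hypothetical):

```python
def should_refill(balance: int, threshold: int,
                  spent_this_month: int, refill_amount: int,
                  max_monthly_spend: int) -> bool:
    """Trigger auto-refill only while the monthly cap has headroom."""
    if balance >= threshold:
        return False  # balance still healthy, nothing to do
    # Refuse the top-up if it would push this month's spend past the cap
    return spent_this_month + refill_amount <= max_monthly_spend
```

When the cap blocks a refill, the right behavior is to alert the user rather than silently let the balance run to zero.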
Alerts and Rate Limiting
Rate limiting belongs at the API Gateway level (not in billing): 1000 req/sec per API key is a standard default. Without it, one user with a bug can kill the node for everyone.
Billing alerts — notifications on:
- Balance dropped below X% of normal monthly spend
- Usage spike (>3x average in last hour)
- Node unavailable (user pays for downtime — should be compensated with SLA credits)
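The spike condition from the list above is a one-line comparison against the trailing average. A minimal sketch, assuming usage counters are already aggregated per hour:

```python
def is_usage_spike(last_hour_units: float, hourly_average: float,
                   factor: float = 3.0) -> bool:
    """Flag when the last hour exceeds factor x the trailing hourly average."""
    if hourly_average <= 0:
        # Any usage on a previously dormant account is itself anomalous
        return last_hour_units > 0
    return last_hour_units > factor * hourly_average
```

The dormant-account branch matters: a stolen API key on an unused account produces a spike that a plain ratio check would divide by zero on.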
SLA credits — automatic credit allocation on downtime, counted via an uptime probe (an external monitoring service, not your own). Self-reported 99.99% uptime doesn't inspire enterprise-client trust.
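SLA credits are usually tiered: the lower the measured monthly uptime, the larger the share of the monthly fee returned as credit. A sketch with illustrative tiers (the percentages are an assumption, not any provider's published SLA):

```python
from decimal import Decimal

# Illustrative tiers: (minimum monthly uptime %, credit as % of monthly fee)
CREDIT_TIERS = [
    (Decimal("99.9"), Decimal("0")),    # SLA met, no credit
    (Decimal("99.0"), Decimal("10")),
    (Decimal("95.0"), Decimal("25")),
    (Decimal("0"),    Decimal("100")),  # near-total outage, full refund
]

def sla_credit(uptime_pct: Decimal, monthly_fee: Decimal) -> Decimal:
    """Credit owed for the month, from externally measured uptime."""
    for floor, credit_pct in CREDIT_TIERS:
        if uptime_pct >= floor:
            return monthly_fee * credit_pct / 100
    return monthly_fee
```

The uptime_pct input should come from the external probe mentioned above, never from the node's own health endpoint.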
What Breaks in Production
Working on NaaS billing, we ran into several non-trivial problems:
Clock skew between nodes — if the billing proxy and the node have >1 s of clock drift, usage-event timestamps are wrong. NTP is mandatory, preferably chrony with Google's NTP servers.
Duplicate events on retry — Kafka's at-least-once delivery means retries create duplicates. Each event needs an idempotency key (request_id + node_id); the rating engine de-duplicates before writing.
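The de-duplication itself is a set-membership check on the idempotency key. An in-memory sketch of the idea (a production rating engine would check against a persistent store or a windowed cache, not a Python set):

```python
def dedupe_events(events: list[dict]) -> list[dict]:
    """Drop duplicate usage events by (request_id, node_id) idempotency key,
    keeping the first occurrence of each key."""
    seen: set[tuple] = set()
    unique = []
    for e in events:
        key = (e["request_id"], e["node_id"])
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique
```

Note that request_id alone is not enough: the same request can legitimately hit two nodes on a gateway-level retry, and each node's work may be billable depending on policy, so node_id belongs in the key.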
Timezone bugs in billing cycles — the billing period "1st of the month" is in UTC. A user in UTC-8 sees the cycle closing at 16:00 their time. This needs explicit documentation and, optionally, custom billing cycles.
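The UTC-8 example is easy to reproduce with zoneinfo, which is also a handy way to generate the numbers for the documentation page:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def cycle_close_local(year: int, month: int, tz: str) -> datetime:
    """Where the UTC 'first of the month' cycle boundary lands in a user's zone."""
    boundary_utc = datetime(year, month, 1, tzinfo=timezone.utc)
    return boundary_utc.astimezone(ZoneInfo(tz))

print(cycle_close_local(2024, 3, "America/Los_Angeles"))
# 2024-02-29 16:00:00-08:00 — the cycle closes on the previous local day
```

Storing all billing timestamps as TIMESTAMPTZ and converting only at display time keeps this a presentation problem rather than a data problem.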
Development timeline for a full NaaS billing system: 3–5 months for a team of 2–3 backend engineers. An MVP with a prepaid balance and basic rate limiting — 6–8 weeks.







