Blockchain Project Technical Support
Blockchain infrastructure breaks differently than regular web services. A node can hang on a specific block because of an edge case in the consensus client. A smart contract may behave correctly in testing but unpredictably when it interacts with another protocol through a flash loan. An RPC provider may start returning stale data without any explicit error. Supporting such systems requires specialized knowledge that a typical DevOps team does not have.
What's Included in Ongoing Support
Infrastructure Monitoring:
- On-chain smart contract monitoring (unusual calls, state changes)
- Node state (sync lag, peer count, client version)
- Validator / sequencer state
- Bridge contract operability
- Service account balances (relayer, deployer, keeper)
Incident Response:
- On-call schedule with SLA for first response
- Node problem diagnosis and fix
- Emergency contract pause on exploit detection
- Coordination with auditors on security incidents
Maintenance:
- Client updates (Geth, Lighthouse, etc.) on new releases
- Smart contract hotfixes via upgrade mechanism
- Service account key rotation
- RPC endpoint updates on provider degradation
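Spotting provider degradation usually starts with a staleness check, since a degraded RPC endpoint can keep answering with old data and no error. A minimal sketch, assuming a standard Ethereum JSON-RPC endpoint; the function names and the 60-second `max_age` default are illustrative assumptions, not values from this document:

```python
import json
import time
import urllib.request


def rpc_call(url: str, method: str, params: list) -> dict:
    """POST a JSON-RPC 2.0 request and return the decoded response body."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": 1}
    ).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def head_age_seconds(block: dict, now: float) -> float:
    """Age of a block in seconds; `timestamp` is hex-encoded in RPC replies."""
    return now - int(block["timestamp"], 16)


def is_stale(block: dict, now: float, max_age: float = 60.0) -> bool:
    """True when the newest block the endpoint reports is older than max_age.

    max_age is an assumed threshold; tune it to the chain's block time.
    """
    return head_age_seconds(block, now) > max_age


def check_endpoint(url: str, max_age: float = 60.0) -> bool:
    """Fetch the latest block from `url` and report whether it is stale."""
    reply = rpc_call(url, "eth_getBlockByNumber", ["latest", False])
    return is_stale(reply["result"], time.time(), max_age)
```

Running `check_endpoint` against each configured provider on a schedule gives a simple signal for when to switch endpoints.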
Monitoring Stack
# docker-compose monitoring stack
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
  alertmanager:
    image: prom/alertmanager:latest
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
Prometheus alert rules for a typical EVM project:
groups:
  - name: blockchain
    rules:
      # Metric names depend on the exporter in use.
      - alert: NodeSyncLag
        # Lag is the network head minus the node's current block.
        expr: eth_syncing_highest_block - eth_syncing_current_block > 50
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Node is {{ $value }} blocks behind head"
      - alert: ServiceWalletLowBalance
        # Balance is expected in ETH, not wei.
        expr: eth_balance{account="relayer"} < 0.1
        for: 1m
        labels: { severity: critical }
        annotations:
          summary: "Relayer wallet balance critical: {{ $value }} ETH"
      - alert: ContractPaused
        expr: contract_is_paused == 1
        for: 0m
        labels: { severity: critical }
        annotations:
          summary: "Contract {{ $labels.contract }} is paused"
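The rules above read gauges such as `eth_syncing_current_block`, `eth_syncing_highest_block`, and `eth_balance`, which have to come from an exporter. The sketch below shows one way such an exporter might derive them from raw JSON-RPC results and render them in the Prometheus text format. The helper names are hypothetical, and serving the text over HTTP for scraping is left out:

```python
WEI_PER_ETH = 10**18


def sync_gauges(syncing) -> dict:
    """Map an eth_syncing result to the two gauges used by NodeSyncLag.

    eth_syncing returns False when the node considers itself in sync,
    otherwise an object with hex-encoded currentBlock and highestBlock.
    When in sync, no gauges are emitted and the alert has no data to fire on.
    """
    if syncing is False:
        return {}
    return {
        "eth_syncing_current_block": int(syncing["currentBlock"], 16),
        "eth_syncing_highest_block": int(syncing["highestBlock"], 16),
    }


def balance_eth(balance_hex: str) -> float:
    """Convert a hex-encoded wei balance (eth_getBalance result) to ETH."""
    return int(balance_hex, 16) / WEI_PER_ETH


def render_metric(name: str, value, labels: dict = None) -> str:
    """Render one sample line in the Prometheus text exposition format."""
    label_text = ""
    if labels:
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        label_text = "{" + pairs + "}"
    return f"{name}{label_text} {value}"
```

Converting wei to ETH in the exporter keeps thresholds such as `< 0.1` readable in the alert rules.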
On-Chain Monitoring via OpenZeppelin Defender
A Defender Sentinel monitors specific events and function calls on a contract:
// Sentinel configuration (via Defender UI or API)
{
  "type": "BLOCK",
  "network": "mainnet",
  "addresses": ["0xYOUR_CONTRACT"],
  "abi": [...],
  "eventConditions": [
    { "eventSignature": "RoleGranted(bytes32,address,address)" },
    { "eventSignature": "Upgraded(address)" } // proxy upgrade event
  ],
  "functionConditions": [
    { "functionSignature": "pause()" }
  ]
}
When a condition triggers, the Sentinel calls a webhook on your service or sends a notification directly to Telegram or PagerDuty.
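On the receiving side, a webhook service typically maps what matched to a severity and a destination. The sketch below is illustrative only: it takes a plain list of matched signatures rather than Defender's actual webhook payload (whose schema is not described here), and the severity assignments and channel names are assumptions:

```python
# Assumed severities for the conditions in the configuration above.
SEVERITY_BY_SIGNATURE = {
    "Upgraded(address)": "critical",
    "RoleGranted(bytes32,address,address)": "critical",
    "pause()": "warning",
}

# Ordered from least to most severe.
SEVERITY_ORDER = ["info", "warning", "critical"]


def classify(matched_signatures: list) -> str:
    """Return the highest severity among the matched signatures."""
    severities = [
        SEVERITY_BY_SIGNATURE.get(signature, "info")
        for signature in matched_signatures
    ]
    return max(severities, key=SEVERITY_ORDER.index, default="info")


def route(severity: str) -> str:
    """Pick a notification channel: page on critical, chat otherwise."""
    return "pagerduty" if severity == "critical" else "telegram"
```

Keeping the mapping in one table makes it easy to review which on-chain events page the on-call engineer.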
Security Incident Response
Every protocol with TVL should have a runbook for security incidents. A typical sequence:
- Detection (automatic alert or external report)
- Assessment (5–15 minutes): size of the damage, whether the exploit is still active, whether the contract can be paused
- Pause (if the contract is pausable): do it immediately, without waiting for the full analysis
- Notification (15–30 minutes): team, token holders, auditors
- Investigation: analyze the transactions in Tenderly and trace the exploit
- Fix: hotfix the contract and have the fix audited
- Post-mortem: public report on what happened
For the pause step, the pauser role should sit on a Gnosis Safe with a 1-of-N threshold, so that any single signer can respond quickly. The upgrade role should sit on a separate Safe with a higher M-of-N threshold, which makes changes slower but requires agreement from several signers.
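The time windows in the runbook can be tracked mechanically during an incident. A minimal sketch, assuming both windows are measured from the moment of detection and taking the upper bound of each range as the deadline; the step names and helper functions are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Upper bounds of the runbook windows, measured from detection (assumption).
RUNBOOK_WINDOWS = [
    ("assessment", timedelta(minutes=15)),
    ("notification", timedelta(minutes=30)),
]


def deadlines(detected_at: datetime) -> dict:
    """Absolute deadline for each timed runbook step."""
    return {step: detected_at + window for step, window in RUNBOOK_WINDOWS}


def overdue(detected_at: datetime, completed: set, now: datetime) -> list:
    """Timed steps that are not completed and whose deadline has passed."""
    return [
        step
        for step, due in deadlines(detected_at).items()
        if step not in completed and now > due
    ]
```

An incident bot could call `overdue` every minute and remind the responder channel about any step it returns.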
Node Support: Common Issues
Geth stuck on a block:
# Diagnostics (requires the debug API namespace to be enabled)
curl -s -X POST localhost:8545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"debug_traceBlockByNumber","params":["latest",{}],"id":1}'
# If the node is wedged: restart with --gcmode archive and --syncmode full
# If state is corrupted: resync from a checkpoint
geth snapshot prune-state # run with the node stopped; frees space, sometimes helps
Consensus client doesn't see peers:
lighthouse bn --listen-address 0.0.0.0 --port 9000 # listen for P2P traffic on all interfaces, port 9000
# Check that the firewall allows the port:
iptables -L INPUT -n | grep 9000
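Peer visibility can also be checked from the client itself through the standard Beacon API endpoint `/eth/v1/node/peer_count`. A minimal sketch, assuming the Lighthouse HTTP API is enabled (`--http`) on its default port 5052; the `minimum` threshold of 10 is an arbitrary illustrative value:

```python
import json
import urllib.request


def connected_peers(peer_count_reply: dict) -> int:
    """Extract the connected-peer count; the Beacon API encodes it as a string."""
    return int(peer_count_reply["data"]["connected"])


def fetch_peer_count(base_url: str = "http://localhost:5052") -> int:
    """Query the beacon node and return the number of connected peers."""
    url = f"{base_url}/eth/v1/node/peer_count"
    with urllib.request.urlopen(url, timeout=5) as response:
        return connected_peers(json.load(response))


def peers_ok(count: int, minimum: int = 10) -> bool:
    """True when the node has at least `minimum` connected peers (assumed value)."""
    return count >= minimum
```

A count that stays at zero after startup usually points back to the port or firewall checks above.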
op-batcher not publishing batches:
- Check batcher wallet balance (needs ETH for L1 transactions)
- Check L1 RPC availability
- Check op-batcher logs for channel_full or tx_failed
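The log check in the list above is easy to script. A minimal sketch that counts occurrences of the two markers named in the list; it matches plain substrings and makes no assumption about the rest of the op-batcher log format:

```python
from collections import Counter

# Markers named in the checklist above.
MARKERS = ("channel_full", "tx_failed")


def count_markers(log_lines) -> Counter:
    """Count how many log lines contain each marker."""
    counts = Counter()
    for line in log_lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts
```

It can be fed any iterable of lines, for example an open log file or the captured output of the container's log command.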
SLA and Support Models
| Level | Response Time | Coverage (days × hours) | Suitable for |
|---|---|---|---|
| Basic monitoring | None (no on-call) | 5×9 business hours | Testnet, pre-launch |
| Standard | 4h critical, 24h other | 5×12 | Mainnet with low TVL |
| Production | 30m critical, 4h other | 7×24 | Mainnet with active users |
| Enterprise | 15m critical, 1h other | 7×24 + dedicated | DeFi protocols, infrastructure |
For projects with TVL above $1M and active users, the minimum reasonable level is Production.
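The table can be encoded as a lookup so that alerting tools attach the right first-response target to each incident. A minimal sketch; the function names are hypothetical, and `minimum_level` encodes only the single rule stated above, returning `None` where the text gives no guidance:

```python
from datetime import timedelta

# First-response targets from the table; Basic monitoring has no on-call.
RESPONSE_TIMES = {
    "Standard": {"critical": timedelta(hours=4), "other": timedelta(hours=24)},
    "Production": {"critical": timedelta(minutes=30), "other": timedelta(hours=4)},
    "Enterprise": {"critical": timedelta(minutes=15), "other": timedelta(hours=1)},
}


def first_response_within(level: str, severity: str):
    """Response target for a level and severity, or None without on-call."""
    tier = RESPONSE_TIMES.get(level)
    if tier is None:
        return None
    return tier["critical" if severity == "critical" else "other"]


def minimum_level(tvl_usd: float, has_active_users: bool):
    """Return "Production" for TVL above $1M with active users, else None."""
    if tvl_usd > 1_000_000 and has_active_users:
        return "Production"
    return None
```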







