Scraping Data from Crypto News Aggregators
The crypto market reacts to news faster than traditional finance: the lag between publication of an article about a regulatory action and the resulting price move can be seconds. Trading systems, risk monitoring, and sentiment analysis all need a structured news feed with minimal latency.
Sources and data retrieval methods
RSS/Atom feeds (most reliable)
CoinDesk, Cointelegraph, The Block, and Decrypt all publish RSS feeds. This is the official, stable channel:
import Parser from "rss-parser";

interface NewsItem {
  source: string;
  title: string;
  url: string;
  publishedAt: Date;
  summary: string;
  guid: string;
}

const parser = new Parser({
  customFields: {
    item: [["media:content", "media", { keepArray: false }]],
  },
});

const feeds: Record<string, string> = {
  coindesk: "https://www.coindesk.com/arc/outboundfeeds/rss/",
  cointelegraph: "https://cointelegraph.com/rss",
  theblock: "https://www.theblock.co/rss.xml",
  decrypt: "https://decrypt.co/feed",
};

async function fetchFeed(source: string, url: string): Promise<NewsItem[]> {
  const feed = await parser.parseURL(url);
  return feed.items.map((item) => ({
    source,
    title: item.title ?? "",
    url: item.link ?? "",
    publishedAt: new Date(item.pubDate ?? ""),
    summary: item.contentSnippet ?? "",
    guid: item.guid ?? item.link ?? "",
  }));
}
Polling every 5 minutes is a reasonable balance between freshness and load on the source. Deduplicate by guid.
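A minimal in-memory sketch of guid-based deduplication across polls (the `FeedItem` shape and `dedupe` helper are illustrative, not from any library); in production, the database's unique constraint is the backstop:

```typescript
interface FeedItem {
  guid: string;
  title: string;
}

// Keep only items whose guid has not been seen before; mutates `seen`.
function dedupe(items: FeedItem[], seen: Set<string>): FeedItem[] {
  const fresh: FeedItem[] = [];
  for (const item of items) {
    if (!seen.has(item.guid)) {
      seen.add(item.guid);
      fresh.push(item);
    }
  }
  return fresh;
}
```

Call this on every poll cycle with a shared `seen` set; items returned by a previous cycle are silently dropped.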
Official APIs
CryptoPanic API — news aggregator with sentiment scoring:
GET https://cryptopanic.com/api/v1/posts/?auth_token={key}&currencies=BTC,ETH&kind=news
Returns structured data with bullish/bearish community votes.
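The community votes can be collapsed into a single signed sentiment score. The `Votes` field names below are assumptions for illustration, not the official schema:

```typescript
// Hypothetical vote shape; adjust to the actual API response.
interface Votes {
  positive: number;
  negative: number;
}

// Map votes to a score in [-1, 1]: -1 fully bearish, +1 fully bullish.
function sentimentScore(v: Votes): number {
  const total = v.positive + v.negative;
  if (total === 0) return 0;
  return (v.positive - v.negative) / total;
}
```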
Messari API — quality news with asset tags:
GET https://data.messari.io/api/v1/news?page=1&limit=50
Santiment — news + on-chain data + social metrics in one API.
HTML scraping (when no API)
For sources without RSS or an API, fall back to HTML scraping with cheerio (Node.js) or BeautifulSoup (Python). This is the most fragile approach: any markup change breaks the parser. For critical sources, monitor the parsing success rate and alert quickly on a drop.
import * as cheerio from "cheerio";

// Selectors are site-specific and will need updating whenever the markup changes.
async function scrapeBlockworks(html: string): Promise<NewsItem[]> {
  const $ = cheerio.load(html);
  return $("article.post-card")
    .map((_, el) => ({
      source: "blockworks",
      title: $(el).find("h2.post-title").text().trim(),
      url: $(el).find("a").attr("href") ?? "",
      publishedAt: new Date($(el).find("time").attr("datetime") ?? ""),
      summary: $(el).find("p.excerpt").text().trim(),
      guid: $(el).find("a").attr("href") ?? "",
    }))
    .get();
}
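One way to implement the success-rate monitoring mentioned above is a sliding window per source. `ScrapeMonitor` is a hypothetical helper sketched here, not a library class:

```typescript
// Tracks the last N scrape attempts per source and flags when the
// success rate drops below a threshold.
class ScrapeMonitor {
  private results: boolean[] = [];

  constructor(private windowSize = 20, private threshold = 0.5) {}

  record(ok: boolean): void {
    this.results.push(ok);
    if (this.results.length > this.windowSize) this.results.shift();
  }

  healthy(): boolean {
    // Too little data to judge yet.
    if (this.results.length < this.windowSize) return true;
    const okCount = this.results.filter(Boolean).length;
    return okCount / this.results.length >= this.threshold;
  }
}
```

Record `scrapeBlockworks` returning zero items as a failure; a sudden run of empty results usually means the selectors broke.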
Processing and storage
Deduplication is critical: the same story can appear in multiple sources. Detect duplicates by normalized URL (strip UTM parameters) combined with title similarity (cosine similarity or Levenshtein distance).
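A sketch of URL normalization using the standard `URL` API; the exact list of tracking parameters to strip is an illustrative assumption:

```typescript
// Strip tracking params, fragments, and trailing slashes so the same
// article linked from different places normalizes to one key.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  for (const key of [...u.searchParams.keys()]) {
    if (key.startsWith("utm_") || key === "ref" || key === "fbclid") {
      u.searchParams.delete(key);
    }
  }
  u.hash = "";
  return u.toString().replace(/\/$/, "");
}
```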
CREATE TABLE news_items (
  id BIGSERIAL PRIMARY KEY,
  source VARCHAR(50) NOT NULL,
  external_id VARCHAR(255) NOT NULL, -- guid from RSS
  title TEXT NOT NULL,
  url TEXT NOT NULL,
  published_at TIMESTAMPTZ NOT NULL,
  summary TEXT,
  raw_content TEXT,
  tags TEXT[], -- ['bitcoin', 'regulation', 'SEC']
  UNIQUE (source, external_id)
);

CREATE INDEX idx_news_published ON news_items (published_at DESC);
CREATE INDEX idx_news_tags ON news_items USING GIN (tags);
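Inserts can then lean on the unique constraint so same-source duplicates are dropped at the database level (Postgres placeholder style):

```sql
INSERT INTO news_items (source, external_id, title, url, published_at, summary, tags)
VALUES ($1, $2, $3, $4, $5, $6, $7)
ON CONFLICT (source, external_id) DO NOTHING;
```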
Asset tagging: determine which crypto assets a story mentions by matching against a list of tickers and names. A simple regex-based approach reaches roughly 80–90% accuracy for major assets.
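The regex approach can be sketched as follows; the asset-to-alias map is a tiny illustrative subset that you would replace with a full ticker list:

```typescript
// Illustrative subset; a real deployment would load the full asset list.
const ASSETS: Record<string, string[]> = {
  bitcoin: ["BTC", "Bitcoin"],
  ethereum: ["ETH", "Ethereum", "Ether"],
  solana: ["SOL", "Solana"],
};

function tagAssets(text: string): string[] {
  const tags: string[] = [];
  for (const [slug, aliases] of Object.entries(ASSETS)) {
    // Word boundaries avoid matching "SOL" inside "RESOLUTION".
    const re = new RegExp(`\\b(${aliases.join("|")})\\b`, "i");
    if (re.test(text)) tags.push(slug);
  }
  return tags;
}
```

Case-insensitive matching of short tickers produces occasional false positives ("sol" as a standalone word); tightening to case-sensitive ticker matches is the usual next step.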
What matters in production
Robots.txt and rate limiting: respect source rules and don't generate excessive load. Add jitter between requests; evenly spaced requests look like a bot.
User-Agent: identify yourself correctly. Some sources block headless browser user-agents.
Freshness monitoring: if the latest item from a source is older than 30 minutes, alert. An RSS feed can go stale without any explicit error.
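The jitter advice amounts to randomizing the delay around the base interval; helper names here are illustrative:

```typescript
// Base interval plus a random offset, so polls never land on an exact schedule.
function jitteredDelay(baseMs: number, jitterMs: number): number {
  return baseMs + Math.floor(Math.random() * jitterMs);
}

// Poll forever with a jittered pause between cycles: 5 min base, up to 60 s extra.
async function pollLoop(fn: () => Promise<void>): Promise<void> {
  for (;;) {
    await fn();
    await new Promise((r) => setTimeout(r, jitteredDelay(5 * 60_000, 60_000)));
  }
}
```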
A realistic timeline for an aggregator covering 10–15 sources, including the API and storage layer: 2–3 weeks.