Social Media Crypto Data Scraping (Twitter/X, Telegram, Discord)

The crypto community lives on Twitter/X, Telegram, and Discord. That is where signals appear first: insider leaks hours before an announcement, panic building during a depeg, pump coordination, early discussion of vulnerabilities. Trading signals, sentiment analysis, and security monitoring all need a reliable data pipeline, and each platform has its own access constraints.

Twitter/X: API and workarounds

Official API

Twitter API v2 is the only legal path. After the X Corp restructuring, pricing became aggressive (tiers and prices change often, so check the current price list):

  • Free tier: write-only, with very limited reads. Useless for scraping.
  • Basic ($100/mo): 10,000 posts/month. Barely covers one account.
  • Pro ($5000/mo): 1M tweets/month plus Filtered Stream access. Realistic for serious analytics.
  • Enterprise: Full Archive Search and Firehose. Price on request, in the tens of thousands of dollars per month.

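Real-time crypto sentiment monitoring on the Pro tier:

import asyncio
import os

import tweepy
from tweepy.asynchronous import AsyncStreamingClient  # needs tweepy's async extras (aiohttp)

BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]  # env var name is an assumption

client = tweepy.Client(bearer_token=BEARER_TOKEN)  # sync client, used for Recent Search

# Filtered Stream for real-time monitoring.
# The async client is used so on_tweet can await the queue directly.
class CryptoStreamListener(AsyncStreamingClient):
    def __init__(self, bearer_token: str, queue: asyncio.Queue, **kwargs):
        super().__init__(bearer_token, **kwargs)
        self.queue = queue

    async def on_tweet(self, tweet: tweepy.Tweet):
        # Requested fields are attributes of the Tweet object itself
        await self.queue.put({
            "id": tweet.id,
            "text": tweet.text,
            "author_id": tweet.author_id,
            "created_at": tweet.created_at,
            "source": "twitter",
        })

async def run_stream(event_queue: asyncio.Queue):
    stream = CryptoStreamListener(BEARER_TOKEN, event_queue)
    # Filter rules (AND/OR/NOT operators)
    await stream.add_rules(tweepy.StreamRule(
        "(bitcoin OR ethereum OR $BTC OR $ETH OR defi OR crypto) "
        "lang:en -is:retweet -is:reply"
    ))
    # filter() returns the task running the stream; awaiting it keeps the stream open
    await stream.filter(tweet_fields=["created_at", "author_id", "public_metrics"])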
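
The filter rule passed to add_rules can also be generated from a watchlist instead of being hard-coded. The sketch below is one possible helper, not part of tweepy: build_stream_rule and its max_len default of 512 characters are assumptions for illustration, and the real length limit depends on the API tier.

```python
def build_stream_rule(terms: list[str], lang: str = "en", max_len: int = 512) -> str:
    """Join watchlist terms with OR, then append language and retweet/reply filters."""
    if not terms:
        raise ValueError("at least one term is required")
    # Quote multi-word phrases so they are matched as exact phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    rule = f"({' OR '.join(quoted)}) lang:{lang} -is:retweet -is:reply"
    # max_len is an assumed limit; check the value for your tier.
    if len(rule) > max_len:
        raise ValueError(f"rule is {len(rule)} chars, limit is {max_len}")
    return rule


rule = build_stream_rule(["bitcoin", "ethereum", "$BTC", "$ETH", "defi", "crypto"])
```

The parentheses around the OR group matter: without them the language and retweet filters would attach only to the last term.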

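Recent Search for historical data (up to 7 days back, including on Pro):

# Pagination via next_token is handled by Paginator
paginator = tweepy.Paginator(
    client.search_recent_tweets,
    # Parentheses are required: AND binds tighter than OR in query syntax
    query="($BTC OR bitcoin) lang:en -is:retweet",
    tweet_fields=["created_at", "public_metrics", "author_id"],
    max_results=100,
    limit=10,  # 10 pages = up to 1000 tweets
)
# Paginator is a synchronous iterator over pages; flatten() yields individual tweets
tweets = list(paginator.flatten(limit=1000))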

Alternatives and limitations

On a limited budget, consider third-party Twitter data providers such as Brandwatch, Sprinklr, or Tweetbinder. They resell access to historical and streaming data, which can work out cheaper than a direct API plan.

Scraping via the unofficial API (without keys) is a ToS violation and legally risky for commercial projects. It is technically possible using session cookies and reverse-engineered endpoints, but X Corp actively blocks it.

Telegram: MTProto API

Telegram is the main platform for crypto announcements. Most projects run an official Telegram channel.

Telethon: User account API

Telegram provides two APIs: the Bot API (limited) and the MTProto API (full access through a user account). Parsing channels requires MTProto, here via the Telethon library:

import os

from telethon import TelegramClient, events

API_ID = int(os.environ["TELEGRAM_API_ID"])
API_HASH = os.environ["TELEGRAM_API_HASH"]

# Get channel history (newest messages first)
async def fetch_history(client: TelegramClient, channel: str, limit: int = 1000):
    messages = []
    async for msg in client.iter_messages(channel, limit=limit):
        messages.append({
            "id": msg.id,
            "text": msg.text or "",
            "date": msg.date,
            "views": msg.views,
        })
    return messages

async def monitor_channels(channel_usernames: list[str]):
    async with TelegramClient("session", API_ID, API_HASH) as client:
        # Subscribe to new messages
        @client.on(events.NewMessage(chats=channel_usernames))
        async def handler(event):
            msg = event.message
            chat = await event.get_chat()  # event.chat can be None if not cached
            # process_message is your own coroutine (e.g. the storage layer)
            await process_message({
                "channel": getattr(chat, "username", None),
                "message_id": msg.id,
                "text": msg.text or "",
                "date": msg.date,
                "views": msg.views,
                "forwards": msg.forwards,
                "has_media": bool(msg.media),
            })

        await client.run_until_disconnected()

Important: Telethon logs in as a real user account. Telegram bans accounts for suspicious activity (too many requests, parsing too many channels). Use a dedicated account, respect rate limits, and do not parse private groups without permission.
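
One way to respect rate limits is to space out history requests on the client side. The following is a minimal sketch using only the standard library; the Throttle class and the interval value are assumptions for illustration, not Telegram's documented limits and not part of Telethon.

```python
import asyncio
import time


class Throttle:
    """Allow at most one call every `min_interval` seconds."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._lock = asyncio.Lock()
        self._next_allowed = 0.0  # monotonic time of the next permitted call

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            delay = self._next_allowed - now
            if delay > 0:
                await asyncio.sleep(delay)
            # Schedule the next slot relative to this call's slot.
            self._next_allowed = max(now, self._next_allowed) + self.min_interval


async def demo() -> float:
    throttle = Throttle(min_interval=0.05)  # illustrative value only
    start = time.monotonic()
    for _ in range(3):
        await throttle.wait()  # in practice: call before each history request
    return time.monotonic() - start


elapsed = asyncio.run(demo())
```

In a real scraper, `await throttle.wait()` would precede each history fetch. Telethon also raises FloodWaitError with a `seconds` attribute when Telegram asks the client to back off, and that wait should be honored in addition to any client-side spacing.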

The Bot API only works where the bot has been added to the group or channel (in channels, as an administrator). A bot cannot read a public channel it has not been added to.

Telegram anomaly detection

A spike in channel activity is itself a signal:

from datetime import datetime, timedelta, timezone

# db.count_messages and alert are placeholders for your storage and alerting layers
async def detect_activity_spike(channel: str, window_minutes: int = 60):
    # Compare the message count in the latest window with the window before it
    window = timedelta(minutes=window_minutes)
    now = datetime.now(timezone.utc)
    window_ago = now - window
    two_windows_ago = now - 2 * window

    recent_count = await db.count_messages(channel, window_ago, now)
    prev_count = await db.count_messages(channel, two_windows_ago, window_ago)

    if prev_count > 0:
        spike_ratio = recent_count / prev_count
        if spike_ratio > 3:  # more than 3x the previous window
            await alert(f"Activity spike in {channel}: {spike_ratio:.1f}x")

Discord: Bot API

Most DeFi projects use Discord for their community: technical discussions, early announcements, and sometimes attack coordination.

Discord Bot

This requires a bot token from the Discord Developer Portal and the bot added to the server:

import os

import discord
from discord.ext import commands

intents = discord.Intents.default()
intents.message_content = True  # privileged intent, must be enabled in the Developer Portal
bot = commands.Bot(command_prefix="!", intents=intents)

# Guild ID -> channel names to monitor (the ID is a placeholder)
TARGET_SERVERS = {
    1234567890: ["general", "announcements", "alpha-calls"],
}

@bot.event
async def on_message(message: discord.Message):
    if message.author.bot:
        return

    # DMs have no guild; skip them and any server not on the watchlist
    if message.guild is None or message.guild.id not in TARGET_SERVERS:
        return

    channel_name = message.channel.name
    if channel_name not in TARGET_SERVERS[message.guild.id]:
        return

    # process_message is your own coroutine (e.g. the storage layer)
    await process_message({
        "platform": "discord",
        "server": message.guild.name,
        "channel": channel_name,
        "author": str(message.author),
        "content": message.content,
        "timestamp": message.created_at,
        "attachments": [a.url for a in message.attachments],
    })

bot.run(os.environ["DISCORD_BOT_TOKEN"])  # env var name is an assumption

Limitation: message_content is a privileged intent. For bots in fewer than 100 servers it can be enabled with a toggle in the Developer Portal; once a bot is in 100+ servers, Discord requires bot verification and explicit approval of the intent.

Message history is available via channel.history(), but only for servers the bot has joined and channels where it has the Read Message History permission. There is no way to pull history from a server the bot is not in.

Storage and Processing

Unified schema for messages from all platforms:

CREATE TABLE social_messages (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    platform    TEXT NOT NULL,          -- 'twitter', 'telegram', 'discord'
    source_id   TEXT NOT NULL,          -- original message ID
    channel     TEXT,                   -- @username, channel_name, server/channel
    author      TEXT,
    content     TEXT NOT NULL,
    metadata    JSONB,                  -- platform-specific: views, likes, reactions
    captured_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    published_at TIMESTAMPTZ,
    UNIQUE (platform, source_id)
);

CREATE INDEX idx_social_platform_channel ON social_messages (platform, channel, published_at DESC);
CREATE INDEX idx_social_content_fts ON social_messages USING gin(to_tsvector('english', content));

The GIN index supports full-text search over token mentions, keywords, and contract addresses.
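
To show how a platform payload might map onto this schema, and how the UNIQUE (platform, source_id) constraint makes re-ingestion idempotent, here is a small sketch. It uses an in-memory SQLite table as a stand-in for PostgreSQL so that it runs without a server; SocialMessage, from_telegram, and save are hypothetical names, and in PostgreSQL the equivalent insert would use ON CONFLICT (platform, source_id) DO NOTHING.

```python
import json
import sqlite3
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SocialMessage:
    platform: str
    source_id: str
    content: str
    channel: Optional[str] = None
    author: Optional[str] = None
    published_at: Optional[datetime] = None
    metadata: dict = field(default_factory=dict)


def from_telegram(raw: dict) -> SocialMessage:
    """Map a dict shaped like the Telegram handler payload to the shared shape."""
    return SocialMessage(
        platform="telegram",
        # Telegram message IDs are only unique within a channel, so prefix them.
        source_id=f"{raw['channel']}/{raw['message_id']}",
        channel=raw["channel"],
        content=raw["text"],
        published_at=raw["date"],
        metadata={"views": raw.get("views"), "forwards": raw.get("forwards")},
    )


# Simplified SQLite version of the table, for illustration only.
SQLITE_SCHEMA = """
CREATE TABLE social_messages (
    platform TEXT NOT NULL, source_id TEXT NOT NULL, channel TEXT, author TEXT,
    content TEXT NOT NULL, metadata TEXT, published_at TEXT,
    UNIQUE (platform, source_id)
)
"""


def save(conn: sqlite3.Connection, msg: SocialMessage) -> bool:
    """Insert the message; return False if it was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO social_messages "
        "(platform, source_id, channel, author, content, metadata, published_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (msg.platform, msg.source_id, msg.channel, msg.author, msg.content,
         json.dumps(msg.metadata),
         msg.published_at.isoformat() if msg.published_at else None),
    )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(SQLITE_SCHEMA)
raw = {"channel": "example_channel", "message_id": 42, "text": "gm",
       "date": datetime(2024, 1, 1, tzinfo=timezone.utc), "views": 10, "forwards": 1}
first = save(conn, from_telegram(raw))   # new row
second = save(conn, from_telegram(raw))  # duplicate, ignored
```

Mappers for Twitter and Discord would follow the same pattern, each putting its platform-specific counters into metadata.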

Sentiment Analysis

For crypto-specific sentiment, raw text needs processing:

Keyword extraction: ticker mentions ($BTC, $ETH, $PEPE), contract addresses (0x...), and protocol names.
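
A regex-based extractor is usually enough as a first pass. The sketch below is one possible approach: the patterns, the 2-10 letter ticker length, and the PROTOCOLS watchlist are assumptions for illustration, and only EVM-style 40-hex-character addresses are matched.

```python
import re

# "$" followed by 2-10 letters, not glued to a preceding word or another "$"
CASHTAG_RE = re.compile(r"(?<![A-Za-z0-9_$])\$([A-Za-z]{2,10})\b")
# EVM-style address: 0x followed by exactly 40 hex characters
EVM_ADDRESS_RE = re.compile(r"\b0x[a-fA-F0-9]{40}\b")
# Hypothetical protocol watchlist; replace with your own
PROTOCOLS = {"uniswap", "aave", "curve"}


def extract_entities(text: str) -> dict:
    """Return de-duplicated tickers, contract addresses, and protocol mentions."""
    tickers = sorted({m.upper() for m in CASHTAG_RE.findall(text)})
    addresses = sorted({a.lower() for a in EVM_ADDRESS_RE.findall(text)})
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    protocols = sorted(PROTOCOLS & words)
    return {"tickers": tickers, "addresses": addresses, "protocols": protocols}


dummy_address = "0x" + "Ab" * 20  # placeholder, not a real contract
entities = extract_entities(
    f"Swapped $pepe for $BTC on Uniswap, contract {dummy_address}, fee was $100"
)
```

Price-like strings such as $100 are ignored because the ticker pattern only accepts letters. Addresses are lowercased so that checksummed and plain forms collapse to one key.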

Sentiment: specialized models generally outperform general-purpose ones on this kind of text. FinBERT and CryptoBERT are BERT-family models fine-tuned for financial and crypto content respectively. Via HuggingFace:

from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="ElKulako/cryptobert",
    device=0,  # first GPU; use device=-1 to run on CPU
)

def analyze_sentiment(text: str) -> dict:
    # Truncate by tokens (not characters) to the model's maximum input length
    result = sentiment(text, truncation=True)[0]
    return {
        "label": result["label"],   # Bullish/Bearish/Neutral
        "score": result["score"],
    }

Volume-weighted sentiment: weight each score by reach, so a tweet with 100k impressions counts for more than one with 100. For Telegram, weight by message views.
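
The weighting can be sketched as a reach-weighted average. The mapping of labels to +1/0/-1, the multiplication by model confidence, and the optional log damping are assumptions made for this example rather than a prescribed formula.

```python
import math
from typing import Iterable, Tuple

# Assumed numeric mapping for the classifier labels
LABEL_TO_SCORE = {"Bullish": 1.0, "Neutral": 0.0, "Bearish": -1.0}


def weighted_sentiment(
    items: Iterable[Tuple[str, float, int]], log_scale: bool = False
) -> float:
    """Average sentiment in [-1, 1], weighted by reach.

    items: (label, confidence, reach) tuples, where reach is impressions
    for tweets or views for Telegram messages.
    """
    numerator = 0.0
    denominator = 0.0
    for label, confidence, reach in items:
        reach = max(reach, 0)
        # Optional damping so a single viral post does not dominate completely
        weight = math.log1p(reach) if log_scale else float(reach)
        numerator += weight * LABEL_TO_SCORE[label] * confidence
        denominator += weight
    return numerator / denominator if denominator else 0.0


score = weighted_sentiment([
    ("Bullish", 0.9, 100_000),  # high-reach bullish tweet
    ("Bearish", 0.8, 100),      # low-reach bearish tweet
])
```

With linear weights the high-reach post dominates the result almost entirely; setting log_scale=True narrows that gap while keeping the ordering by reach.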

Operational Limitations

Social media monitoring is legally sensitive. The ToS of most platforms forbid commercial scraping outside the official API. Practical limits:

  • Twitter: the official API is mandatory for any commercial use
  • Telegram: parsing public channels via MTProto is a gray area, not explicitly forbidden
  • Discord: only via the official Bot API, no web scraping

Developing a pipeline for two platforms (Twitter + Telegram) with sentiment analysis and storage takes 2-3 weeks. Adding Discord and custom ML models takes another 1-2 weeks.