AI System Development for Public Opinion and Open Data Analysis
Government agencies, analytical centers, and large companies need systematic monitoring of public discourse: what concerns people, how attitudes toward regulation change, and which topics are gaining popularity. An AI system aggregates data from open sources and turns it into actionable analytics.
Data Sources
Social Networks and Forums: VKontakte API, Odnoklassniki API, Telegram (via MTProto or public channel parsing), Reddit, Pikabu. Only public groups, comments, and posts are collected, without personal data.
Media and News Aggregators: RSS feeds, Yandex.News API, MediaMetrics, Google News API. Over 50,000 sources.
Government Open Data: data.gov.ru, regional open data portals, FTS registries, Rosstat API.
Petition Platforms: Change.org and the Russian Public Initiative (ROI), tracking topics and signature dynamics.
Government Services Reviews: Government Services portal (public ratings), regional portals, Active Citizen platform.
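As a rough illustration of the media ingestion step, the sketch below parses an RSS 2.0 feed into a common record format using only the Python standard library. The `Document` record, its field names, and the sample feed are assumptions made for this example, not part of the described system. Fetching over HTTP is left out, so the feed is passed in as a string.

```python
from dataclasses import dataclass
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass
class Document:
    """Hypothetical normalized record shared by all sources."""
    source: str
    title: str
    text: str
    published: Optional[datetime]


def parse_rss(xml_text: str, source: str) -> list[Document]:
    """Convert RSS 2.0 <item> elements into Document records."""
    root = ET.fromstring(xml_text)
    docs = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        docs.append(Document(
            source=source,
            title=(item.findtext("title") or "").strip(),
            text=(item.findtext("description") or "").strip(),
            # RSS 2.0 dates follow the RFC 822 format
            published=parsedate_to_datetime(pub) if pub else None,
        ))
    return docs


# Made-up feed used only to exercise the parser
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example feed</title>
<item>
  <title>New transport tariff announced</title>
  <description>The regional administration published new fares.</description>
  <pubDate>Mon, 03 Feb 2025 10:00:00 +0000</pubDate>
</item>
</channel></rss>"""

docs = parse_rss(SAMPLE_FEED, source="example-feed")
```

Mapping every source to one record type like this keeps the downstream topic and sentiment steps independent of where a text came from.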
Topic Modeling
from datetime import datetime

from bertopic import BERTopic
from sentence_transformers import SentenceTransformer

# TopicAnalysis and TrendingTopic are result containers assumed to be defined elsewhere.


class PublicOpinionAnalyzer:
    def __init__(self):
        self.embedder = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")
        self.topic_model = BERTopic(
            embedding_model=self.embedder,
            language="russian",
            min_topic_size=50,
            nr_topics="auto",
        )

    def discover_topics(self, texts: list[str], timestamps: list[datetime]) -> "TopicAnalysis":
        # Precompute embeddings once so BERTopic does not re-encode the texts
        embeddings = self.embedder.encode(texts, batch_size=512)
        topics, probs = self.topic_model.fit_transform(texts, embeddings)
        # Dynamic topic modeling: how topics change over time
        topics_over_time = self.topic_model.topics_over_time(texts, timestamps)
        return TopicAnalysis(
            topics=self.topic_model.get_topic_info(),
            temporal_dynamics=topics_over_time,
            trending=self._detect_trending(topics_over_time),
        )

    def _detect_trending(self, topics_over_time) -> "list[TrendingTopic]":
        # Topics with growth > 2σ over last 7 days
        ...
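The "growth > 2σ over last 7 days" rule in `_detect_trending` can be read in several ways. The sketch below shows one possible interpretation, not the system's actual implementation: a topic is trending when the mean daily count over the last 7 days exceeds the mean of the preceding baseline by more than two baseline standard deviations. The function names and the plain `dict` of daily counts are assumptions for this example. In practice the counts could be derived from the `Frequency` column of the BERTopic `topics_over_time` output.

```python
from statistics import mean, stdev


def is_trending(daily_counts: list[int], window: int = 7, threshold: float = 2.0) -> bool:
    """Return True if the recent window sits more than `threshold` sigmas above the baseline."""
    baseline, recent = daily_counts[:-window], daily_counts[-window:]
    if len(baseline) < 2:
        return False  # not enough history to estimate variability
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return mean(recent) > mu  # flat baseline: any increase counts as growth
    return (mean(recent) - mu) / sigma > threshold


def detect_trending(counts_by_topic: dict[int, list[int]]) -> list[int]:
    """Return the ids of topics flagged as trending."""
    return [topic for topic, counts in counts_by_topic.items() if is_trending(counts)]


# Made-up daily mention counts: 14 baseline days followed by the last 7 days
counts_by_topic = {
    0: [10, 12, 11, 9, 10, 11, 10, 12, 9, 10, 11, 10, 9, 11, 30, 32, 35, 31, 33, 36, 34],
    1: [20, 21, 19, 20, 22, 18, 20] * 3,
}
trending = detect_trending(counts_by_topic)
```

Here topic 0 roughly triples its daily volume in the last week and is flagged, while topic 1 keeps fluctuating around the same level and is not.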
Sentiment by Population Groups
The analysis covers not only overall tone but also differences between groups: youth versus older users (by audience characteristics), regions, and professional communities. This reveals what concerns specific segments rather than an averaged "audience."
from pydantic import BaseModel

# SentimentScore is assumed to be defined elsewhere.


class SegmentedSentiment(BaseModel):
    topic: str
    segments: dict[str, SentimentScore]  # segment → sentiment
    overall: SentimentScore
    divergence_score: float  # how much segments disagree
    sample_quotes: dict[str, list[str]]  # sample quotes by segment
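The text does not say how `divergence_score` is computed. One simple option, shown here purely as an assumption, is the population standard deviation of per-segment sentiment, with each segment's sentiment reduced to a single number in the range -1 to 1. The segment names and values are invented for illustration.

```python
from statistics import pstdev


def divergence_score(segment_scores: dict[str, float]) -> float:
    """Spread of sentiment across segments: 0.0 means all segments agree."""
    if len(segment_scores) < 2:
        return 0.0  # a single segment cannot disagree with itself
    return pstdev(list(segment_scores.values()))


# Hypothetical per-segment sentiment for one topic
scores = {"youth": -0.6, "elderly": 0.4, "professionals": 0.1}
divergence = divergence_score(scores)
```

A score near zero means the overall sentiment is representative, while a high score signals that the average hides opposing reactions and the per-segment view should be reported instead.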
Public Trust Index
For government agencies, the key metric is the dynamics of trust toward a specific agency, policy, or decision:
- Share of positive mentions in topic context
- Tone change relative to baseline (before decision announcement)
- Comparison with similar agencies / regions
- Correlation with media activity (effect of press releases and official statements)
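The first two components above can be sketched as follows: the share of positive mentions, and its change after a decision announcement relative to the period before it. The `Mention` record, the three-way label set, and the sample data are assumptions for this example, and real input would come from the sentiment classifier.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Mention:
    """Hypothetical classified mention of an agency or policy."""
    day: date
    label: str  # "positive", "neutral" or "negative"


def positive_share(mentions: list[Mention]) -> float:
    """Fraction of mentions labeled positive (0.0 for an empty list)."""
    if not mentions:
        return 0.0
    return sum(m.label == "positive" for m in mentions) / len(mentions)


def trust_shift(mentions: list[Mention], announcement: date) -> float:
    """Positive share after the announcement minus the baseline share before it."""
    before = [m for m in mentions if m.day < announcement]
    after = [m for m in mentions if m.day >= announcement]
    return positive_share(after) - positive_share(before)


# Made-up mentions around an announcement on 1 March 2025
mentions = [
    Mention(date(2025, 2, 20), "positive"),
    Mention(date(2025, 2, 22), "positive"),
    Mention(date(2025, 2, 25), "neutral"),
    Mention(date(2025, 2, 27), "negative"),
    Mention(date(2025, 3, 1), "negative"),
    Mention(date(2025, 3, 2), "positive"),
    Mention(date(2025, 3, 3), "negative"),
    Mention(date(2025, 3, 5), "neutral"),
]
shift = trust_shift(mentions, announcement=date(2025, 3, 1))
```

In this sample the positive share drops from 0.5 before the announcement to 0.25 after it, so the shift is negative.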
Manipulation and Bot Detection
The system detects anomalies that point to coordinated campaigns, petition manipulation, or artificial hype:
- A sharp spike in similar messages over a short period
- Accounts with bot-like traits (account age, activity patterns, vocabulary)
- Coordinated posting: the same texts appearing across channels
Detected manipulation is flagged and excluded from analytics.
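The coordinated-posting signal can be sketched as follows: texts are normalized, grouped, and flagged when the same text shows up in several distinct channels within a short time window. This is a minimal sketch under stated assumptions. The `Post` record, the thresholds (3 channels, 1 hour), and exact matching after normalization are illustrative choices, and a production system would more likely use fuzzy or embedding-based similarity.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
import re


@dataclass
class Post:
    """Hypothetical public post record."""
    channel: str
    text: str
    posted_at: datetime


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def find_coordinated(posts: list[Post], min_channels: int = 3,
                     window: timedelta = timedelta(hours=1)) -> list[str]:
    """Return normalized texts posted in >= min_channels channels within `window`."""
    groups: dict[str, list[Post]] = defaultdict(list)
    for post in posts:
        groups[normalize(post.text)].append(post)

    flagged = []
    for text, group in groups.items():
        group.sort(key=lambda p: p.posted_at)
        start = 0
        for end in range(len(group)):
            # Shrink the window from the left until it spans at most `window`
            while group[end].posted_at - group[start].posted_at > window:
                start += 1
            channels = {p.channel for p in group[start:end + 1]}
            if len(channels) >= min_channels:
                flagged.append(text)
                break
    return flagged


# Made-up posts: three channels repeat one message within 20 minutes
t0 = datetime(2025, 3, 1, 12, 0)
posts = [
    Post("channel_a", "Stop the new tariff!", t0),
    Post("channel_b", "STOP the new tariff!!!", t0 + timedelta(minutes=10)),
    Post("channel_c", "stop, the new tariff", t0 + timedelta(minutes=20)),
    Post("channel_d", "Roads were repaired last week", t0 + timedelta(minutes=5)),
]
flagged = find_coordinated(posts)
```

Posts whose normalized text ends up in `flagged` could then be marked and left out of the topic and sentiment statistics.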
Reporting and Visualization
Automated weekly reports include the top 10 trending topics, sentiment dynamics, a comparison with the previous period, and expert quotes. An interactive dashboard provides time series, regional maps, and word clouds by topic.







