AI Automated Journalism System

We design and deploy artificial intelligence systems, from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work in real business settings, not just in the lab.

Development of AI System for Automated Journalism

Automated journalism is the generation of news text from structured data: financial reports, sports statistics, election results, weather summaries. The technology works wherever there is data and a clear narrative template; the AP and Reuters publish tens of thousands of such stories annually.

Where Automation is Justified

Financial Reporting: quarterly company results — data from EDGAR/stock exchanges → text with key metrics, dynamics, comparison with forecasts. One template covers thousands of companies.

Sports Statistics: match results, game statistics — standard narrative with variation on key moments.

Registry Summaries: property-transaction records, traffic-accident data, bankruptcy registries — automatic summaries that flag anomalies.

Weather Summaries and Alerts: converting a weather forecast into readable text with emphasis on hazardous phenomena.
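As a minimal sketch of the data-to-text idea for the financial case (field names and figures are illustrative, not a real EDGAR schema), a single template can turn a metrics record into a sentence:

```python
def render_earnings_sentence(company: str, revenue: float, revenue_prior: float) -> str:
    """Render one templated sentence from structured earnings data (toy example)."""
    change = (revenue - revenue_prior) / revenue_prior * 100
    direction = "rose" if change >= 0 else "fell"
    return (f"{company} reported revenue of ${revenue:,.0f}M, "
            f"which {direction} {abs(change):.1f}% year over year.")

print(render_earnings_sentence("Acme Corp", 1250.0, 1000.0))
# Acme Corp reported revenue of $1,250M, which rose 25.0% year over year.
```

The same function covers thousands of companies because only the data changes, not the template.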

Data-to-Text System Architecture

from datetime import datetime

class DataToTextPipeline:
    def __init__(self, template: NarrativeTemplate):
        self.template = template
        self.data_analyzer = DataAnalyzer()
        self.text_generator = TextGenerator()

    def generate(self, data: dict) -> GeneratedArticle:
        # 1. Data analysis: key fact extraction
        key_facts = self.data_analyzer.extract_key_facts(data, self.template.fact_rules)

        # 2. Determine article "angle"
        angle = self.data_analyzer.determine_angle(key_facts, self.template.angle_rules)

        # 3. Generate text by narrative template
        text = self.text_generator.generate(
            facts=key_facts,
            angle=angle,
            template=self.template,
            style_guide=self.template.style_guide
        )

        # 4. Post-processing: fact checking, number formatting
        text = self.postprocess(text, data)

        return GeneratedArticle(
            headline=self.generate_headline(key_facts, angle),
            body=text,
            data_sources=data.get("sources", []),
            generated_at=datetime.utcnow(),
            template_version=self.template.version
        )

    def postprocess(self, text: str, data: dict) -> str:
        # Verification: every number in text must match source data
        return FactChecker(data).verify_and_fix(text)
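The helper classes above are sketched, not specified. Assuming a FactRule names the metric it targets and fact extraction is a filtered lookup over the source record, a minimal DataAnalyzer could look like this (all names here are assumptions, not a fixed API):

```python
from dataclasses import dataclass

@dataclass
class FactRule:
    metric: str  # hypothetical: the key this rule extracts from source data

class DataAnalyzer:
    def extract_key_facts(self, data: dict, fact_rules: list) -> dict:
        # Keep only the metrics the template's rules ask for and the data contains
        return {r.metric: data[r.metric] for r in fact_rules if r.metric in data}

analyzer = DataAnalyzer()
facts = analyzer.extract_key_facts(
    {"revenue": 1250.0, "eps": 2.1, "noise": "x"},
    [FactRule("revenue"), FactRule("eps"), FactRule("guidance")],
)
print(facts)  # {'revenue': 1250.0, 'eps': 2.1}
```

Missing metrics (here, "guidance") are silently dropped; a production system would instead record them so the template can fall back to a shorter narrative.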

Narrative Templates

A template defines narrative logic, not specific text. For financial reporting:

class EarningsReportTemplate(NarrativeTemplate):
    fact_rules = [
        FactRule("revenue", comparisons=["yoy", "qoq", "consensus"]),
        FactRule("net_income", comparisons=["yoy", "consensus"]),
        FactRule("eps", comparisons=["consensus", "guidance"]),
        FactRule("guidance_next_quarter", type="forward_looking"),
    ]

    angle_rules = [
        AngleRule(condition="revenue_beat > 5%", angle="strong_beat"),
        AngleRule(condition="revenue_miss > 5%", angle="disappointment"),
        AngleRule(condition="guidance_raised", angle="optimism"),
        AngleRule(condition="guidance_lowered", angle="caution"),
    ]
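The string conditions above ("revenue_beat > 5%") imply a rule-evaluation step. One safe way to sketch it, avoiding eval of strings, is to express conditions as callables over the extracted facts; first match wins, so rule order encodes editorial priority (this is an assumption about the design, not the source's implementation):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AngleRule:
    condition: Callable[[dict], bool]
    angle: str

def determine_angle(facts: dict, rules: list, default: str = "neutral") -> str:
    # First matching rule wins; ordering expresses editorial priority
    for rule in rules:
        if rule.condition(facts):
            return rule.angle
    return default

rules = [
    AngleRule(lambda f: f.get("revenue_beat_pct", 0) > 5, "strong_beat"),
    AngleRule(lambda f: f.get("revenue_miss_pct", 0) > 5, "disappointment"),
]
print(determine_angle({"revenue_beat_pct": 7.2}, rules))  # strong_beat
```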

Variability and Anti-Boilerplate

A known problem with automated journalism is text uniformity: thousands of stories from one template read identically. Several techniques mitigate this:

  • Synonym variation: multiple versions of each key phrase, random selection
  • Sentence structure variation: reordering of facts depending on "angle"
  • Contextual enrichment: adding context (industry trends, company history) from knowledge base
  • LLM rewriting: final pass through LLM for style variety while preserving facts
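The first technique, synonym variation, can be sketched in a few lines. Seeding the random choice per article (an assumption about the design) keeps each story's output reproducible for audits:

```python
import random

# Hypothetical phrase bank; a real system would maintain these per template
PHRASES = {
    "beat": ["topped", "exceeded", "came in ahead of"],
    "miss": ["fell short of", "missed", "came in below"],
}

def vary(key: str, article_seed: int) -> str:
    # Deterministic per-article seed: same article always picks the same synonym
    rng = random.Random((article_seed, key).__hash__())
    return rng.choice(PHRASES[key])
```

Across articles the seed differs, so the corpus as a whole varies while any single story stays stable on regeneration.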

Fact Verification

Critical requirement: every numerical claim in the text must be traceable to the source data. Automatic verification:

def verify_facts(article_text: str, source_data: dict) -> VerificationResult:
    # Extract all numerical claims from text
    claims = extract_numerical_claims(article_text)

    errors = []
    for claim in claims:
        # Find corresponding value in source data
        source_value = find_in_data(source_data, claim.entity, claim.metric)
        if source_value is None:
            errors.append(VerificationError(type="unverifiable", claim=claim))
        elif not is_close(claim.value, source_value, tolerance=0.01):
            errors.append(VerificationError(
                type="mismatch",
                claim=claim,
                expected=source_value
            ))

    return VerificationResult(is_valid=len(errors) == 0, errors=errors)
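The extract_numerical_claims step above is left undefined. A first-pass sketch can pull numbers, percentages, and dollar amounts out of the text with a regular expression (a real system would also need entity linking to know *which* metric each number refers to):

```python
import re

def extract_numbers(text: str) -> list:
    # Matches bare numbers, percentages, and $-amounts with thousands separators
    pattern = r"\$?\d+(?:,\d{3})*(?:\.\d+)?%?"
    return re.findall(pattern, text)

print(extract_numbers("Revenue rose 12.5% to $1,250 million."))
# ['12.5%', '$1,250']
```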

Metadata and Transparency

All automatically generated content is labeled: "Automatically generated based on data from [source]". Readers can access the source data. This follows the AP's automation disclosure practice and is a media-ethics requirement.
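The disclosure itself is a small templated string; the wording and parameters below are an example, not a mandated format:

```python
def attribution_line(source: str, template_version: str) -> str:
    # Disclosure appended to every generated story, citing data source and template
    return (f"This story was automatically generated from {source} data "
            f"(template v{template_version}).")

print(attribution_line("EDGAR", "2.1"))
```

Including the template version alongside the source makes any published story reproducible from the pipeline's audit log.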

Performance

A single system instance on an A100 GPU generates ~500 stories per hour at an average length of 300 words. For a news agency, this means complete coverage of earnings reports for all listed companies on the day results are published.