Promptfoo Integration for Automated Prompt Testing


Promptfoo is an open-source CLI and library for testing LLM prompts. It allows you to run the same set of test cases against different models and prompts, automatically compare results, and integrate into CI/CD.

Installation and configuration

npm install -g promptfoo
# or run commands ad hoc without a global install:
npx promptfoo@latest init

promptfooconfig.yaml:

providers:
  - openai:gpt-4o
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

prompts:
  - "Summarize the following text in 3 sentences: {{text}}"
  - "Create a concise 3-sentence summary of: {{text}}"

tests:
  - vars:
      text: "Long article about machine learning..."
    assert:
      - type: contains
        value: "machine learning"
      - type: llm-rubric
        value: "The summary is accurate and covers the main points"
      - type: javascript
        value: "output.split('.').length >= 3"  # Минимум 3 предложения

  - vars:
      text: "Another test document..."
    assert:
      - type: not-contains
        value: "I cannot"  # The model should not refuse
      - type: rouge-n
        value: reference_summary
        threshold: 0.5
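For checks that outgrow a one-line expression, promptfoo also supports assertions defined in external script files: alongside the `javascript` type used above there is a `python` type that references a file via `file://`. A minimal sketch of such an assertion file, counting sentences the same way the inline check does (the file name is illustrative, and the exact `get_assert` signature should be verified against your promptfoo version):

```python
# validate_summary.py - referenced from promptfooconfig.yaml as:
#   - type: python
#     value: file://validate_summary.py
# (file name and wiring are illustrative)

def get_assert(output: str, context) -> dict:
    """Pass if the model output contains at least 3 sentences."""
    # Count non-empty fragments between sentence-ending periods
    sentences = [s for s in output.split(".") if s.strip()]
    passed = len(sentences) >= 3
    return {
        "pass": passed,
        "score": 1.0 if passed else 0.0,
        "reason": f"found {len(sentences)} sentence(s)",
    }
```

Returning a dict with `pass`, `score`, and `reason` lets the web UI show why a case failed, rather than just a red cross.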

Running tests

# Run the evaluation
promptfoo eval

# Browse results in the local web UI
promptfoo view

# In CI: eval exits with a non-zero code when assertions fail
promptfoo eval

Integration into Python code

Promptfoo itself is a Node.js tool, and its `evaluate()` library API is JavaScript. From Python, a practical approach is to invoke the CLI with machine-readable output and parse the resulting JSON (the exact JSON layout depends on the promptfoo version, so inspect the file once before relying on specific keys):

import json
import subprocess

# Run the eval and write results to a JSON file
subprocess.run(
    ["promptfoo", "eval", "-o", "results.json"],
    check=False,  # a non-zero exit code just means some assertions failed
)

with open("results.json") as f:
    results = json.load(f)

stats = results["results"]["stats"]
total = stats["successes"] + stats["failures"]
print(f"Pass rate: {stats['successes']}/{total}")

Promptfoo makes it practical to build a regression test suite covering all of an application's prompts in one to two days and run it in GitHub Actions on every PR, catching accidental quality regressions before they ship.
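The GitHub Actions wiring can be sketched as a workflow that runs the eval on every pull request. File path, Node version, and secret names below are illustrative; promptfoo also publishes an official GitHub Action, so check the docs for the current recommended setup:

```yaml
# .github/workflows/promptfoo.yml (illustrative)
name: Prompt regression tests
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Fails the job (non-zero exit) if any assertion fails
      - run: npx promptfoo@latest eval
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```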