Promptfoo integration for automated prompt testing
Promptfoo is an open-source CLI and library for testing LLM prompts. It allows you to run the same set of test cases against different models and prompts, automatically compare results, and integrate into CI/CD.
Installation and configuration
npm install -g promptfoo
# or run it without a global install:
npx promptfoo@latest
promptfooconfig.yaml:
providers:
  - openai:gpt-4o
  - openai:gpt-4o-mini
  - anthropic:claude-3-5-sonnet-20241022

prompts:
  - "Summarize the following text in 3 sentences: {{text}}"
  - "Create a concise 3-sentence summary of: {{text}}"

tests:
  - vars:
      text: "Long article about machine learning..."
    assert:
      - type: contains
        value: "machine learning"
      - type: llm-rubric
        value: "The summary is accurate and covers the main points"
      - type: javascript
        value: "output.split('.').length >= 3" # rough check: at least 3 sentences
  - vars:
      text: "Another test document..."
    assert:
      - type: not-contains
        value: "I cannot" # the model must not refuse
      - type: rouge-n
        value: reference_summary
        threshold: 0.5
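Beyond the built-in assertion types, promptfoo supports custom Python assertions referenced from the config as type: python with value: file://your_file.py; the file exposes a get_assert(output, context) function. A minimal sketch (the file name and sentence heuristic are illustrative; confirm the exact hook signature against your promptfoo version's docs):

```python
# assert_three_sentences.py -- custom promptfoo assertion (illustrative name).
# Referenced from promptfooconfig.yaml as:
#   - type: python
#     value: file://assert_three_sentences.py

def get_assert(output: str, context) -> bool:
    """Pass when the model output contains at least three sentences."""
    # Normalize terminal punctuation, then count non-empty fragments;
    # this is less fragile than output.split('.') alone.
    normalized = output.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    return len(sentences) >= 3
```

A Python assertion like this is easier to unit-test in isolation than an inline javascript expression embedded in YAML.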
Running tests
# Run the evaluation
promptfoo eval
# Browse results in the local web viewer
promptfoo view
# In CI, rely on the exit code: eval returns non-zero when any test fails
promptfoo eval --output results.json
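The --output flag writes machine-readable results, which a wrapper script can post-process. A sketch, assuming the JSON file nests the summary under results.stats (verify the layout against the output of your promptfoo version; pass_rate and run_eval are hypothetical helpers):

```python
import json
import subprocess

def pass_rate(summary: dict) -> str:
    """Format successes/total from a promptfoo results file.

    Assumes the layout {"results": {"stats": {"successes": ..., "failures": ...}}}.
    """
    stats = summary["results"]["stats"]
    total = stats["successes"] + stats["failures"]
    return f"{stats['successes']}/{total}"

def run_eval(config: str = "promptfooconfig.yaml") -> str:
    """Run the promptfoo CLI (must be on PATH) and return the pass rate."""
    subprocess.run(
        ["promptfoo", "eval", "-c", config, "--output", "results.json"],
        check=True,  # raise if the CLI exits non-zero, i.e. tests failed
    )
    with open("results.json") as f:
        return pass_rate(json.load(f))
```

check=True makes a failed evaluation raise CalledProcessError, so the same script works both locally and as a CI gate.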
Integration into Node.js code
Promptfoo also ships as a Node.js library with a programmatic evaluate() API (in an ES module, since top-level await is used):
import promptfoo from 'promptfoo';

const results = await promptfoo.evaluate({
  prompts: ['Classify sentiment: {{text}}'],
  providers: ['openai:gpt-4o-mini'],
  tests: [
    {
      vars: { text: 'Great product!' },
      // icontains: case-insensitive match, so "Positive" also passes
      assert: [{ type: 'icontains', value: 'positive' }],
    },
  ],
});

const { successes, failures } = results.stats;
console.log(`Pass rate: ${successes}/${successes + failures}`);
With Promptfoo you can build a regression test suite covering all of an application's prompts in a day or two and run it in GitHub Actions on every PR, catching accidental quality regressions before they are merged.
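Promptfoo also publishes an official GitHub Action, but a hand-rolled workflow is straightforward. A minimal sketch (file path, workflow name, and secret names are illustrative; the job fails the PR check when promptfoo eval exits non-zero):

```yaml
# .github/workflows/promptfoo.yml (illustrative)
name: prompt-tests
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g promptfoo
      # Provider API keys come from repository secrets
      - run: promptfoo eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

Caching the eval results between runs and gating only on changed prompts are useful follow-ups once the basic workflow is in place.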