Promptfoo integration for automated prompt testing
Promptfoo is an open-source CLI and library for testing LLM prompts. It allows you to run the same set of test cases against different models and prompts, automatically compare results, and integrate into CI/CD.
Installation and configuration
npm install -g promptfoo
# or run it without a global install:
npx promptfoo@latest
promptfooconfig.yaml:
providers:
  - openai:gpt-4o
  - openai:gpt-4o-mini
  - anthropic:claude-3-5-sonnet-20241022

prompts:
  - "Summarize the following text in 3 sentences: {{text}}"
  - "Create a concise 3-sentence summary of: {{text}}"

tests:
  - vars:
      text: "Long article about machine learning..."
    assert:
      - type: contains
        value: "machine learning"
      - type: llm-rubric
        value: "The summary is accurate and covers the main points"
      - type: javascript
        value: "output.split('.').length >= 3" # rough check: at least 3 sentences
  - vars:
      text: "Another test document..."
    assert:
      - type: not-contains
        value: "I cannot" # the model must not refuse
      - type: rouge-n
        value: reference_summary
        threshold: 0.5
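Beyond the built-in assertion types, promptfoo supports custom Python assertions referenced from the config as type: python with value: file://your_file.py; the file exposes a get_assert(output, context) function. A minimal sketch (the file name and sentence heuristic are illustrative; confirm the exact hook signature against your promptfoo version's docs):

```python
# assert_three_sentences.py -- custom promptfoo assertion (illustrative name).
# Referenced from promptfooconfig.yaml as:
#   - type: python
#     value: file://assert_three_sentences.py

def get_assert(output: str, context) -> bool:
    """Pass when the model output contains at least three sentences."""
    # Normalize terminal punctuation, then count non-empty fragments;
    # this is less fragile than output.split('.') alone.
    normalized = output.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    return len(sentences) >= 3
```

A Python assertion like this is easier to unit-test in isolation than an inline javascript expression embedded in YAML.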
Running tests
# Run the evaluation
promptfoo eval
# Browse results in the local web viewer
promptfoo view
# In CI, rely on the exit code: eval returns non-zero when any test fails
promptfoo eval --output results.json
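The --output flag writes machine-readable results, which a wrapper script can post-process. A sketch, assuming the JSON file nests the summary under results.stats (verify the layout against the output of your promptfoo version; pass_rate and run_eval are hypothetical helpers):

```python
import json
import subprocess

def pass_rate(summary: dict) -> str:
    """Format successes/total from a promptfoo results file.

    Assumes the layout {"results": {"stats": {"successes": ..., "failures": ...}}}.
    """
    stats = summary["results"]["stats"]
    total = stats["successes"] + stats["failures"]
    return f"{stats['successes']}/{total}"

def run_eval(config: str = "promptfooconfig.yaml") -> str:
    """Run the promptfoo CLI (must be on PATH) and return the pass rate."""
    subprocess.run(
        ["promptfoo", "eval", "-c", config, "--output", "results.json"],
        check=True,  # raise if the CLI exits non-zero, i.e. tests failed
    )
    with open("results.json") as f:
        return pass_rate(json.load(f))
```

check=True makes a failed evaluation raise CalledProcessError, so the same script works both locally and as a CI gate.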
Integration into Node.js code
Promptfoo also ships as a Node.js library with a programmatic evaluate() API (in an ES module, since top-level await is used):
import promptfoo from 'promptfoo';

const results = await promptfoo.evaluate({
  prompts: ['Classify sentiment: {{text}}'],
  providers: ['openai:gpt-4o-mini'],
  tests: [
    {
      vars: { text: 'Great product!' },
      // icontains: case-insensitive match, so "Positive" also passes
      assert: [{ type: 'icontains', value: 'positive' }],
    },
  ],
});

const { successes, failures } = results.stats;
console.log(`Pass rate: ${successes}/${successes + failures}`);
With Promptfoo you can build a regression test suite covering all of an application's prompts in a day or two and run it in GitHub Actions on every PR, catching accidental quality regressions before they are merged.
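Promptfoo also publishes an official GitHub Action, but a hand-rolled workflow is straightforward. A minimal sketch (file path, workflow name, and secret names are illustrative; the job fails the PR check when promptfoo eval exits non-zero):

```yaml
# .github/workflows/promptfoo.yml (illustrative)
name: prompt-tests
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g promptfoo
      # Provider API keys come from repository secrets
      - run: promptfoo eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

Caching the eval results between runs and gating only on changed prompts are useful follow-ups once the basic workflow is in place.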