Humanloop Integration for Prompt Management and LLM Assessment
Humanloop is a platform for managing LLM applications: prompt versioning, A/B testing, human feedback collection, and automated evaluation. It differs from PromptLayer in its deeper integration with the evaluation pipeline.
Installation and configuration
pip install humanloop

from humanloop import Humanloop

hl = Humanloop(api_key="hl_...")

# Call through Humanloop with request tracking
response = hl.chat(
    project="customer-support",
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful customer support agent."},
        {"role": "user", "content": user_message}
    ],
    inputs={"customer_name": customer_name},  # Prompt template variables
)

# Log user feedback against the tracked request
hl.log(
    project="customer-support",
    data_id=response.data_id,
    feedback=[{
        "type": "rating",
        "value": "positive"  # or "negative"
    }]
)
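Ratings logged this way can later be exported and rolled up into a simple quality signal. A minimal sketch of that aggregation, assuming the feedback has been exported as a list of dicts in the shape shown above (the function name and export format here are illustrative, not part of the Humanloop SDK):

```python
from collections import Counter

def positive_rate(feedback_logs):
    """Share of 'positive' ratings among all rating feedback entries."""
    counts = Counter(
        f["value"]
        for log in feedback_logs
        for f in log.get("feedback", [])
        if f.get("type") == "rating"
    )
    total = counts["positive"] + counts["negative"]
    return counts["positive"] / total if total else None

logs = [
    {"feedback": [{"type": "rating", "value": "positive"}]},
    {"feedback": [{"type": "rating", "value": "negative"}]},
    {"feedback": [{"type": "rating", "value": "positive"}]},
]
print(positive_rate(logs))  # 2 of 3 ratings are positive
```

Returning None for an empty sample avoids reporting a misleading 0% when no feedback has been collected yet.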
A/B testing of prompts
# Define the experiment
experiment = hl.experiments.create(
    project="customer-support",
    name="prompt-ab-test-v3",
    config=[
        {
            "model": "gpt-4o",
            "template": "{{system_prompt_v1}}",
            "traffic_split": 50
        },
        {
            "model": "gpt-4o",
            "template": "{{system_prompt_v2}}",
            "traffic_split": 50
        }
    ]
)

# The request is automatically routed to one of the variants
response = hl.chat(
    project="customer-support",
    experiment_id=experiment.id,
    messages=[{"role": "user", "content": user_message}]
)
Evaluation pipeline
Humanloop supports both human evaluation (via the UI) and automated evaluation (LLM-as-judge):
evaluator = hl.evaluators.create(
    name="response-quality",
    type="llm",
    spec={
        "model": "gpt-4o",
        "prompt": """Rate the following customer support response on a scale 1-5.
Response: {{output}}
Customer query: {{inputs.query}}
Return only a number 1-5.""",
        "return_type": "number"
    }
)
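The fragile step in any LLM-as-judge setup is converting the judge model's free-text reply into the declared return_type: even with "Return only a number", models sometimes add wrapping text. A minimal sketch of defensive score parsing, independent of the Humanloop SDK (the function name is illustrative):

```python
import re

def parse_score(raw, lo=1, hi=5):
    """Extract the first integer in [lo, hi] from a judge's reply, else None."""
    match = re.search(r"\d+", raw)
    if not match:
        return None
    score = int(match.group())
    return score if lo <= score <= hi else None

print(parse_score("4"))           # clean reply
print(parse_score("Score: 5/5"))  # judge added wrapping text
print(parse_score("ten"))         # unparseable -> None
```

Returning None rather than a guessed score keeps invalid judge outputs out of the aggregate metrics, where they can be counted and investigated separately.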
Humanloop is well suited for teams that need the full cycle: from prompt versioning through structured user feedback collection to automated quality assessment.