OpenAI Structured Outputs Integration for Response Parsing
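Structured Outputs constrains the model's response to a JSON schema you supply. Unlike `response_format: json_object`, which only asks for syntactically valid JSON, Structured Outputs uses constrained decoding: at each step the model can only emit tokens that keep the output valid under the schema, so a completed response always parses and matches it. Two cases still need handling in code: the model can refuse a request, and a response cut off by the token limit may be incomplete.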
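To make the difference concrete, here is a minimal sketch of the manual checking that a `json_object` reply typically still needs. The field list, the helper name `schema_problems`, and the sample reply are all hypothetical and exist only for illustration; with Structured Outputs this kind of check is handled by the schema itself.

```python
import json

# Hypothetical expected shape: field name -> accepted Python type(s)
EXPECTED_FIELDS = {
    "vendor_name": str,
    "total_amount": (int, float),
    "currency": str,
}


def schema_problems(raw: str, expected: dict) -> list[str]:
    """Return the mismatches between a JSON reply and the expected fields."""
    data = json.loads(raw)  # JSON mode is only meant to make this step succeed
    problems = []
    for field, types in expected.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], types):
            problems.append(f"wrong type for {field}: {type(data[field]).__name__}")
    for field in data:
        if field not in expected:
            problems.append(f"unexpected field: {field}")
    return problems


# Hypothetical reply: valid JSON, but one key is renamed and the amount is a string
reply = '{"vendor": "Acme GmbH", "total_amount": "1,250.00", "currency": "EUR"}'
problems = schema_problems(reply, EXPECTED_FIELDS)
for problem in problems:
    print(problem)
```

The reply parses without error, yet three problems are reported, which is the gap that a strict schema is designed to close.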
Basic Integration with Pydantic
```python
from typing import Literal, Optional

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()  # Reads OPENAI_API_KEY from the environment


# Schema for data extraction
class Invoice(BaseModel):
    vendor_name: str
    invoice_number: str
    date: str
    total_amount: float
    currency: str
    line_items: list["InvoiceItem"]
    vat_amount: Optional[float] = None


class InvoiceItem(BaseModel):
    description: str
    quantity: float
    unit_price: float
    total: float


Invoice.model_rebuild()  # Resolve the forward reference to InvoiceItem


# Parsing: the reply is validated against the Invoice schema
def extract_invoice(text: str) -> Invoice:
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract invoice data from text"},
            {"role": "user", "content": text},
        ],
        response_format=Invoice,
    )
    message = response.choices[0].message
    if message.parsed is None:
        # On a refusal, `parsed` is None and `refusal` carries the explanation
        raise ValueError(f"Model returned no invoice: {message.refusal}")
    return message.parsed  # Already an Invoice instance
```
Classification with Enum
```python
from enum import Enum

# Reuses client, BaseModel, and Literal from the previous example


class TicketCategory(str, Enum):
    technical = "technical"
    billing = "billing"
    feature_request = "feature_request"
    complaint = "complaint"
    general = "general"


class TicketClassification(BaseModel):
    category: TicketCategory
    priority: Literal["low", "medium", "high", "critical"]
    sentiment: Literal["positive", "neutral", "negative", "angry"]
    requires_human: bool
    summary: str
    tags: list[str]


def classify_ticket(text: str) -> TicketClassification:
    response = client.beta.chat.completions.parse(
        model="gpt-4o-mini",  # Structured Outputs is also available in gpt-4o-mini
        messages=[{"role": "user", "content": f"Classify ticket: {text}"}],
        response_format=TicketClassification,
        temperature=0,
    )
    return response.choices[0].message.parsed
```
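The `(str, Enum)` base used for `TicketCategory` is worth a short illustration. The sketch below uses only the standard library and a trimmed two-member copy of the enum; the value `"refund"` is a made-up example of a label outside the allowed set.

```python
import json
from enum import Enum


class TicketCategory(str, Enum):
    technical = "technical"
    billing = "billing"


# A raw string from a JSON reply maps back to the matching enum member
category = TicketCategory("billing")

# Because members are also str instances, they compare equal to plain
# strings and serialize with json.dumps without a custom encoder
equals_plain_string = category == "billing"
as_json = json.dumps({"category": category})

# A value outside the enum is rejected with ValueError
try:
    TicketCategory("refund")
    rejected = False
except ValueError:
    rejected = True
```

This is why downstream code can pass the parsed category straight into logs, queues, or database rows that expect a string.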
Structured Outputs via JSON Schema (without Pydantic)
```python
import json

# For languages without Pydantic, or for complex schemas
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Product data"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "product_data",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                    "in_stock": {"type": "boolean"},
                    "categories": {
                        "type": "array",
                        "items": {"type": "string"},
                    },
                },
                "required": ["name", "price", "in_stock", "categories"],
                "additionalProperties": False,
            },
        },
    },
)

data = json.loads(response.choices[0].message.content)
```
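Without Pydantic, the loaded dictionary can still be turned into a typed object. The following is a minimal sketch using a standard-library dataclass; the `Product` class, the helper `product_from_content`, and the sample string are hypothetical stand-ins for a real `message.content` value.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    price: float
    in_stock: bool
    categories: list[str]


def product_from_content(content: str) -> Product:
    """Convert the JSON string from message.content into a Product."""
    data = json.loads(content)
    # Unpacking fails with TypeError on a missing or unexpected key; the
    # strict schema (all keys required, no additional properties) is what
    # makes this direct unpacking reasonable
    return Product(**data)


# Hypothetical content string shaped like the product_data schema
sample_content = (
    '{"name": "Desk lamp", "price": 24.9, "in_stock": true, '
    '"categories": ["lighting", "office"]}'
)
product = product_from_content(sample_content)
```

Note that a dataclass does not check value types at runtime, so this relies on the schema for type correctness rather than re-validating it.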
Structured Outputs Limitations
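- `strict: True` requires `additionalProperties: False` on every object in the schema
- Every property must be listed in `required`; to make a field nullable, use a union with null, either `anyOf` with a `{"type": "null"}` branch or `"type": ["string", "null"]`
- Nesting depth is capped (5 levels when the feature launched; the documented limit has changed since, so check the current value)
- For recursive schemas, use `$ref`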
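A hand-written schema can be checked for the most common strict-mode mistakes before it is sent. The function below is a rough sketch, not an official validator: the name `strict_schema_problems` is hypothetical, and it covers only the `additionalProperties` rule and the related strict-mode requirement that every property appear in `required`. It does not follow `$ref` and ignores objects whose `type` is written as a list.

```python
def strict_schema_problems(schema: dict, path: str = "$") -> list[str]:
    """Walk a JSON Schema dict and report objects that break two strict-mode rules."""
    problems = []
    if schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        properties = schema.get("properties", {})
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        for name, sub in properties.items():
            problems.extend(strict_schema_problems(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    for branch in schema.get("anyOf", []):
        problems.extend(strict_schema_problems(branch, path))
    return problems


# Hypothetical schema with two mistakes in the nested "dimensions" object
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "dimensions": {
            "type": "object",
            "properties": {
                "width": {"type": "number"},
                "height": {"type": "number"},
            },
            "required": ["width"],
        },
    },
    "required": ["name", "dimensions"],
    "additionalProperties": False,
}
problems = strict_schema_problems(schema)
for problem in problems:
    print(problem)
```

Running a check like this in a unit test catches nested objects that were copied from a non-strict schema and never updated.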
When to Use
| Scenario | Method |
|---|---|
| Data extraction from documents | Structured Outputs |
| Classification | Structured Outputs |
| Responses with predictable structure | Structured Outputs |
| Free-form JSON (unknown structure) | json_object mode |
| Simple responses | Plain text |
Timeline
- Basic extraction with Pydantic: 0.5-1 day
- Complex nested schemas: 1-2 days







