Hugging Face Inference API Integration for AI Models
The Hugging Face Inference API provides access to over 100,000 hosted models via a REST API. Two options are available: the Serverless Inference API (free, but rate-limited and subject to cold starts) and Inference Endpoints (a managed deployment on dedicated GPU hardware with an SLA).
Serverless Inference API
import requests

API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2"
headers = {"Authorization": "Bearer hf_..."}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()  # surface HTTP errors instead of silently parsing an error body
    return response.json()
# Text generation (Mistral instruction format)
output = query({
    "inputs": "<s>[INST] Summarize this document: ... [/INST]",
    "parameters": {
        "max_new_tokens": 512,
        "temperature": 0.3,
        "return_full_text": False  # return only the completion, not the prompt
    }
})
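While a serverless model is cold-starting, the API responds with HTTP 503 and typically includes an `estimated_time` field in the error body. A minimal retry sketch (the `query_with_retry` helper and its `post` parameter are illustrative conventions for testability, not part of the Hugging Face API):

```python
import time

def query_with_retry(post, payload, max_retries=5, base_wait=2.0):
    """Retry a serverless query while the model is cold-starting (HTTP 503).

    `post` is any callable(payload) -> response object exposing
    .status_code, .json(), and .raise_for_status() -- for example,
    lambda p: requests.post(API_URL, headers=headers, json=p).
    """
    for attempt in range(max_retries):
        response = post(payload)
        if response.status_code != 503:
            response.raise_for_status()
            return response.json()
        # On 503 the API may report how long the model still needs to load.
        wait = response.json().get("estimated_time", base_wait * (attempt + 1))
        time.sleep(min(wait, 30))
    raise RuntimeError("model did not become ready after retries")
```

Alternatively, passing `"options": {"wait_for_model": True}` in the payload asks the API itself to block until the model is loaded.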
Inference Endpoints (dedicated deployment)
from huggingface_hub import InferenceClient
# Connect to a dedicated Inference Endpoint
client = InferenceClient(
    model="https://xyz.us-east-1.aws.endpoints.huggingface.cloud",
    token="hf_..."
)
# Text generation with token-by-token streaming
response = client.text_generation(
    "Explain RLHF in simple terms:",
    max_new_tokens=256,
    temperature=0.7,
    stream=True  # streaming is supported
)
for token in response:
    print(token, end="", flush=True)
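With `stream=True` the client yields tokens one at a time; they can be echoed as they arrive and accumulated into the full text. A small sketch (the generator below is a stand-in for the real stream, not actual model output):

```python
def collect_stream(token_iter):
    """Echo streamed tokens as they arrive and return the full generated text."""
    parts = []
    for token in token_iter:
        print(token, end="", flush=True)  # live output for the user
        parts.append(token)
    return "".join(parts)

# Stand-in for client.text_generation(..., stream=True)
fake_stream = iter(["RLHF ", "aligns ", "models ", "with ", "human ", "feedback."])
full_text = collect_stream(fake_stream)
```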
Specialized tasks
# Classification
classifier = InferenceClient(model="cardiffnlp/twitter-roberta-base-sentiment-latest")
result = classifier.text_classification("This product is amazing!")
# [{'label': 'positive', 'score': 0.97}]
# Embeddings
embedder = InferenceClient(model="sentence-transformers/all-MiniLM-L6-v2")
embedding = embedder.feature_extraction("Text to embed")
# numpy array (384,)
# Image classification
vision = InferenceClient(model="google/vit-base-patch16-224")
result = vision.image_classification("path/to/image.jpg")
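Sentence embeddings like the 384-dimensional vectors above are typically compared with cosine similarity. A minimal NumPy sketch (the short vectors here are placeholders, not real `feature_extraction` output):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for embedder.feature_extraction(...) output
doc = np.array([0.2, 0.1, 0.7])
query_vec = np.array([0.2, 0.1, 0.7])
other = np.array([0.7, -0.1, 0.2])

print(cosine_similarity(doc, query_vec))  # identical vectors -> ~1.0
print(cosine_similarity(doc, other))      # lower for dissimilar vectors
```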
Choosing Between Serverless and Endpoints
Serverless is suitable for development and low-traffic workloads. Inference Endpoints are meant for production workloads with latency (no cold starts) and throughput requirements, and they support auto-scaling from 0 to N replicas. For sustained loads above ~100 requests/hour, Endpoints are typically more cost-effective than Serverless.