Together AI Integration for Open LLM Inference

We design and deploy artificial intelligence systems, taking them from prototype to production-ready solution. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work not just in the lab, but in real business settings.
Complexity: Simple
Estimated timeline: ~1 business day

Together AI Integration for Running Open LLMs

Together AI provides cloud inference for 200+ open models, including Llama 3.1, Mistral, Qwen, DeepSeek, and Yi. Its OpenAI-compatible API lets you migrate existing code by changing only the base URL and API key, with no rewriting. Key advantages: you can run any supported open-source model without your own GPU infrastructure, and you can fine-tune models on your own data.

Basic Integration

import os

from openai import OpenAI

# Together exposes an OpenAI-compatible endpoint,
# so the standard OpenAI SDK works as-is
client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

# Model selection
MODELS = {
    "quality": "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
    "balanced": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    "fast": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "code": "Qwen/Qwen2.5-Coder-32B-Instruct",
    "reasoning": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
}

response = client.chat.completions.create(
    model=MODELS["balanced"],
    messages=[{"role": "user", "content": "Task"}],
    temperature=0.1,
    max_tokens=2048,
)
print(response.choices[0].message.content)
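Because the OpenAI SDK is used unchanged, streaming also works the standard way (`stream=True`). A small helper sketch that prints tokens as they arrive and returns the assembled answer; it assumes the `client` and `MODELS` defined above:

```python
def stream_answer(client, model, prompt):
    """Stream a chat completion token-by-token and return the full text.

    Works with any OpenAI-SDK-compatible client, e.g. the Together
    client configured above.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)  # show tokens as they arrive
            parts.append(delta)
    print()
    return "".join(parts)
```

Usage: `answer = stream_answer(client, MODELS["fast"], "Task")`.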

Fine-tuning Your Own Models

# Together supports fine-tuning open models on your own data.
# This sketch assumes the current together Python SDK (>= 1.0);
# older versions used a module-level API instead.
import os

from together import Together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])

# Upload the training dataset (JSONL, one example per line;
# see Together's docs for the accepted schemas)
file_response = client.files.upload(file="training_data.jsonl")

# Start the fine-tuning job
ft_response = client.fine_tuning.create(
    training_file=file_response.id,
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    n_epochs=3,
    batch_size=16,
    learning_rate=1e-5,
    suffix="my-custom-model",
)
ft_job_id = ft_response.id

# Check job status
status = client.fine_tuning.retrieve(ft_job_id)
print(status.status)  # e.g. "running", "completed", "failed"
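Fine-tuning jobs take minutes to hours, so in practice the status check becomes a polling loop. A minimal sketch that is deliberately SDK-agnostic: `get_status` is any callable mapping a job id to its status string, so you can wrap whichever retrieve call your SDK version provides:

```python
import time


def wait_for_finetune(get_status, job_id, poll_seconds=60, timeout_seconds=6 * 3600):
    """Poll a fine-tuning job until it reaches a terminal state.

    `get_status` maps a job id to a status string (e.g. a lambda
    wrapping the SDK's retrieve call); the terminal-state names below
    are an assumption -- adjust to what your SDK actually returns.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status in ("completed", "failed", "cancelled"):
            return status
        time.sleep(poll_seconds)  # avoid hammering the API
    raise TimeoutError(f"fine-tune job {job_id} still running after {timeout_seconds}s")
```

Decoupling the helper from the SDK also makes it trivial to unit-test with a fake status function.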

Embeddings

response = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",  # One of the best for search
    input=["First text", "Second text"],
)
embeddings = [item.embedding for item in response.data]
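The returned vectors are typically compared with cosine similarity for semantic search. A dependency-free sketch of the ranking step (names `cosine_similarity` and `top_match` are ours, not part of any SDK):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def top_match(query_vec, doc_vecs):
    """Index of the document vector most similar to the query."""
    return max(range(len(doc_vecs)),
               key=lambda i: cosine_similarity(query_vec, doc_vecs[i]))
```

Usage with the `embeddings` list above: embed the query with the same model, then `best = top_match(query_vec, embeddings)`. For large corpora, swap the linear scan for a vector index.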

Model Comparison on Together AI

Model               Quality         Speed (tokens/s)   Price (USD / 1M tokens)
Llama 3.1 405B      Excellent       ~50                $3.50
Llama 3.1 70B       Very good       ~150               $0.88
Llama 3.1 8B        Good            ~400               $0.18
Qwen2.5-Coder 32B   Code-specific   ~120               $0.80
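The price column makes budgeting straightforward. A small estimator using the figures from the table above (whether the listed price is a blended input/output rate is an assumption; check Together's pricing page for the current split):

```python
# Prices copied from the comparison table, USD per 1M tokens.
PRICE_PER_M = {
    "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo": 3.50,
    "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo": 0.88,
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo": 0.18,
    "Qwen/Qwen2.5-Coder-32B-Instruct": 0.80,
}


def estimated_cost(model, total_tokens):
    """Rough USD cost for processing `total_tokens` with `model`."""
    return PRICE_PER_M[model] * total_tokens / 1_000_000
```

For example, 10M tokens through Llama 3.1 70B comes to about $8.80, versus $1.80 on the 8B model, which is why A/B testing the cheaper tiers first usually pays off.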

Timeline

  • Basic integration: 0.5 day
  • Fine-tuning pipeline: 3–5 days (+ training time)
  • A/B testing models: 1–2 days