Fine-Tuning Command R Language Model (Cohere)
Command R and Command R+ are language models from Cohere, specialized for RAG tasks, tool use, and enterprise applications. Cohere provides managed fine-tuning through its own API and platform. Command R's key differentiator from most LLMs is out-of-the-box optimization for RAG scenarios: the model is trained to cite its sources properly and to hallucinate less when working with documents.
Command R Family
| Model | Parameters | Context | Key Feature |
|---|---|---|---|
| Command R | 35B | 128K | RAG, citation |
| Command R+ | 104B | 128K | Complex tasks, reasoning |
| Command R7B | 7B | 128K | Fast, cheap |
| Command A | — | 256K | Latest generation |
Cohere also publishes open weights for Command R on Hugging Face (CohereForAI/c4ai-command-r-v01) under a non-commercial license, enabling self-hosted fine-tuning.
Fine-Tuning via Cohere API
```python
import cohere

co = cohere.Client(api_key="...")

# Create and upload the dataset (train + validation splits)
dataset = co.datasets.create(
    name="legal-analysis-dataset",
    type="chat-finetune-input",
    data=open("train.jsonl", "rb"),
    eval_data=open("val.jsonl", "rb"),
)
co.wait(dataset)  # block until dataset validation finishes

# Run fine-tuning
ft = co.finetuning.create_finetuned_model(
    request=cohere.finetuning.FinetunedModel(
        name="command-r-legal",
        settings=cohere.finetuning.Settings(
            # BASE_TYPE_CHAT selects the chat (Command R family) base model
            base_model=cohere.finetuning.BaseModel(
                base_type="BASE_TYPE_CHAT",
            ),
            dataset_id=dataset.id,
            hyperparameters=cohere.finetuning.Hyperparameters(
                train_epochs=5,
                learning_rate=0.001,
            ),
        ),
    ),
)
```
Data Format: Chat with Preamble
Command R uses a chat format with system-prompt support (the preamble), documents for RAG, and dialogue history:
```json
{
  "messages": [
    {
      "role": "System",
      "message": "You are a legal assistant. Always cite specific law articles."
    },
    {
      "role": "User",
      "message": "What is the statute of limitations for property sale contracts?"
    },
    {
      "role": "Chatbot",
      "message": "The statute of limitations for property sale contracts is **3 years** (Article 196 of Civil Code). For void contracts — also 3 years from when person knew or should have known of violation (Article 181)..."
    }
  ]
}
```
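Before uploading, it is worth validating each JSONL line locally; a minimal sketch, where the structural rules checked (allowed roles, a final Chatbot turn) are inferred from the format above rather than from Cohere's full spec:

```python
import json

ALLOWED_ROLES = {"System", "User", "Chatbot"}

def validate_chat_example(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty means valid)."""
    problems = []
    messages = json.loads(line).get("messages", [])
    if not messages:
        problems.append("no messages")
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {msg.get('role')!r}")
        if not msg.get("message", "").strip():
            problems.append(f"message {i}: empty message text")
    # Each training example should end with the model's (Chatbot's) reply
    if messages and messages[-1].get("role") != "Chatbot":
        problems.append("last message must be from Chatbot")
    return problems

example = json.dumps({
    "messages": [
        {"role": "System", "message": "You are a legal assistant."},
        {"role": "User", "message": "What is the limitation period?"},
        {"role": "Chatbot", "message": "Three years (Article 196)."},
    ]
})
assert validate_chat_example(example) == []
```

Running this over every line of train.jsonl and val.jsonl before `co.datasets.create` saves a failed validation round-trip.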
RAG-Specific: Fine-Tuning with Documents
A unique capability of Command R is training with documents in context. This allows fine-tuning the model for a specific citation style and level of detail when working with corporate documents:
```json
{
  "messages": [...],
  "documents": [
    {
      "title": "Claim Processing Regulations",
      "snippet": "3.4. Claim review period — no more than 30 calendar days..."
    }
  ]
}
```
With this approach, the model learns not just to generate an answer, but to properly extract and cite the relevant fragments of the provided documents.
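Assembling such records from a knowledge base can be sketched as follows; the helper name is illustrative, and the only structure taken from the format above is the `messages`/`documents` keys with `title`/`snippet` fields:

```python
import json

def build_rag_example(question: str, answer: str, docs: list[dict]) -> str:
    """Serialize one document-grounded training example as a JSONL line."""
    record = {
        "messages": [
            {"role": "User", "message": question},
            {"role": "Chatbot", "message": answer},
        ],
        # Each doc carries the title/snippet fields shown in the format above
        "documents": [
            {"title": d["title"], "snippet": d["snippet"]} for d in docs
        ],
    }
    return json.dumps(record, ensure_ascii=False)

line = build_rag_example(
    "What is the claim review period?",
    "No more than 30 calendar days (clause 3.4 of the Regulations).",
    [{"title": "Claim Processing Regulations",
      "snippet": "3.4. Claim review period — no more than 30 calendar days..."}],
)
```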
Practical Case: Legal Assistant for Corporate Law
Task: an assistant for the legal department of a large company — contract analysis, answering questions about internal regulations, and working with the regulatory framework.
Dataset: 2,800 examples (question + relevant document fragment → answer with source reference), built from real lawyer queries to the knowledge base.
Critical metric: faithfulness — the share of answers fully grounded in the provided documents, with no hallucinated content.
Results:
- Faithfulness (RAGAS): 0.71 → 0.93
- Answer relevancy: 0.78 → 0.91
- Citation accuracy (references to correct sources): 64% → 89%
- Hallucination rate: 18% → 4%
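The last two metrics are simple ratios over a labeled evaluation set; a minimal sketch, where the per-answer judgments (`cited_source`, `hallucinated`) are assumed to come from human or LLM-as-judge review:

```python
def citation_accuracy(results: list[dict]) -> float:
    """Share of answers whose cited source matches the gold source."""
    correct = sum(1 for r in results if r["cited_source"] == r["gold_source"])
    return correct / len(results)

def hallucination_rate(results: list[dict]) -> float:
    """Share of answers flagged as containing unsupported claims."""
    return sum(1 for r in results if r["hallucinated"]) / len(results)

# Toy evaluation set with four judged answers
results = [
    {"cited_source": "Art. 196", "gold_source": "Art. 196", "hallucinated": False},
    {"cited_source": "Art. 181", "gold_source": "Art. 196", "hallucinated": True},
    {"cited_source": "Art. 181", "gold_source": "Art. 181", "hallucinated": False},
    {"cited_source": "Art. 200", "gold_source": "Art. 200", "hallucinated": False},
]
assert citation_accuracy(results) == 0.75
assert hallucination_rate(results) == 0.25
```

Faithfulness and answer relevancy, by contrast, were computed with the RAGAS framework rather than by exact matching.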
Self-Hosted Option via Open Weights
For on-premise deployment, Command R can be fine-tuned from the Hugging Face weights:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "CohereForAI/c4ai-command-r-v01",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
# Command R ships its own tokenizer with the Cohere chat template
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```
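For self-hosted supervised fine-tuning it is common (though not required) to compute the loss only on the Chatbot turn by masking prompt tokens in the labels. A sketch over already-tokenized ids; the token values are dummies and `mask_prompt` is a hypothetical helper, with real ids coming from `tokenizer.apply_chat_template`:

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy loss

def mask_prompt(prompt_ids: list[int], response_ids: list[int]):
    """Build input_ids/labels so only the response contributes to the loss."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Dummy ids standing in for a tokenized user turn and a chatbot reply
inp, lab = mask_prompt([5, 17, 42], [88, 91, 2])
assert inp == [5, 17, 42, 88, 91, 2]
assert lab == [-100, -100, -100, 88, 91, 2]
```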
Managed vs Self-Hosted Comparison
| Parameter | Cohere API Fine-Tuning | Self-Hosted (Open Weights) |
|---|---|---|
| Infrastructure | Managed | Need GPU cluster |
| Weight control | No | Yes |
| On-premise | No | Yes |
| RAG citation | Native | Native (same weights) |
| Cost at high volume | Higher | Lower |
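The cost row can be made concrete with a back-of-the-envelope break-even estimate under a simple model (pay-per-token API vs a fixed-cost dedicated GPU node); all prices below are placeholders, not current Cohere or cloud rates:

```python
def breakeven_tokens_per_month(api_price_per_1m: float,
                               gpu_monthly_cost: float) -> float:
    """Monthly token volume at which a dedicated GPU node matches API spend."""
    return gpu_monthly_cost / api_price_per_1m * 1_000_000

# Placeholder prices: $3 per 1M tokens via API vs $2,000/month for a GPU node
volume = breakeven_tokens_per_month(3.0, 2000.0)
print(f"break-even at ~{volume / 1e6:.0f}M tokens/month")
```

Above that volume self-hosting wins on raw compute cost, though the comparison ignores operations and engineering overhead.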
Timeline
- Dataset preparation with documents: 3–6 weeks
- Training (Cohere API): 2–5 days (managed)
- Training (self-hosted, 35B, QLoRA): 12–36 hours
- RAG quality testing: 1–2 weeks
- Total: 6–10 weeks