Command R (Cohere) Language Model Fine-Tuning

We design and deploy artificial intelligence systems, from prototype to production-ready solution. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work not just in the lab, but in real business.

Fine-Tuning Command R Language Model (Cohere)

Command R and Command R+ are language models from Cohere, specialized for RAG tasks, tool use, and enterprise applications. Cohere provides managed fine-tuning through its own API and platform. Command R's key differentiator from most LLMs is its out-of-the-box optimization for RAG scenarios: the model is trained to cite sources properly and to hallucinate less when working with documents.

Command R Family

Model        Parameters  Context  Key Feature
Command R    35B         128K     RAG, citation
Command R+   104B        128K     Complex tasks, reasoning
Command R7B  7B          128K     Fast, cheap
Command A    —           256K     Latest generation

Cohere also provides open weights of Command R through Hugging Face (CohereForAI/c4ai-command-r-v01), enabling self-hosted fine-tuning.

Fine-Tuning via Cohere API

import cohere
from cohere.finetuning import (
    BaseModel,
    FinetunedModel,
    Hyperparameters,
    Settings,
)

co = cohere.Client(api_key="...")

# Upload training and validation sets (JSONL, chat format)
dataset = co.datasets.create(
    name="legal-analysis-dataset",
    type="chat-finetune-input",
    data=open("train.jsonl", "rb"),
    eval_data=open("val.jsonl", "rb"),
)
co.wait(dataset)  # block until dataset validation completes

# Launch fine-tuning on the validated dataset
ft = co.finetuning.create_finetuned_model(
    request=FinetunedModel(
        name="command-r-legal",
        settings=Settings(
            base_model=BaseModel(base_type="BASE_TYPE_CHAT"),
            dataset_id=dataset.id,
            hyperparameters=Hyperparameters(
                train_epochs=5,
                learning_rate=0.001,
            ),
        ),
    ),
)

Data Format: Chat with Preamble

Command R uses a special chat format with system prompt support (preamble), documents for RAG, and dialogue history:

{
  "messages": [
    {
      "role": "System",
      "message": "You are a legal assistant. Always cite specific law articles."
    },
    {
      "role": "User",
      "message": "What is the statute of limitations for property sale contracts?"
    },
    {
      "role": "Chatbot",
      "message": "The statute of limitations for property sale contracts is **3 years** (Article 196 of Civil Code). For void contracts — also 3 years from when person knew or should have known of violation (Article 181)..."
    }
  ]
}
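
Records in this format can be assembled and sanity-checked with a few lines of stdlib Python before upload. This is a sketch of ours, not part of the Cohere SDK; the `build_example` helper and file names are illustrative:

```python
import json

VALID_ROLES = {"System", "User", "Chatbot"}

def build_example(preamble, turns):
    """Assemble one chat-finetune record; turns = [(role, text), ...]."""
    messages = [{"role": "System", "message": preamble}]
    messages += [{"role": role, "message": text} for role, text in turns]
    for m in messages:
        assert m["role"] in VALID_ROLES, f"bad role: {m['role']}"
    # The model is trained to produce the final turn, so it must be the answer
    assert messages[-1]["role"] == "Chatbot", "last turn must be Chatbot"
    return {"messages": messages}

example = build_example(
    "You are a legal assistant. Always cite specific law articles.",
    [("User", "What is the statute of limitations?"),
     ("Chatbot", "Three years (Article 196 of the Civil Code).")],
)

# One JSON object per line — the JSONL layout the dataset upload expects
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

Validating roles and turn order locally saves a round trip: malformed records otherwise surface only after the dataset upload is processed.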

RAG-Specific: Fine-Tuning with Documents

A unique capability of Command R is training with documents in context. This allows fine-tuning the model to a specific citation style and level of detail when working with corporate documents:

{
  "messages": [...],
  "documents": [
    {
      "title": "Claim Processing Regulations",
      "snippet": "3.4. Claim review period — no more than 30 calendar days..."
    }
  ]
}

With this approach, the model learns not just to generate an answer, but to properly ground it in the relevant fragments of the provided documents.
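
Before training on such records, it is worth screening that each target answer actually draws on the attached snippets. The heuristic below is a crude sketch of ours (word n-gram overlap), not Cohere tooling; real pipelines use metrics like RAGAS faithfulness:

```python
def grounded(record, min_overlap=3):
    """True if the Chatbot answer shares a min_overlap-word run with any snippet."""
    answer = next(m["message"] for m in record["messages"]
                  if m["role"] == "Chatbot").lower().split()
    ngrams = {" ".join(answer[i:i + min_overlap])
              for i in range(len(answer) - min_overlap + 1)}
    for doc in record.get("documents", []):
        snippet = doc["snippet"].lower().split()
        for i in range(len(snippet) - min_overlap + 1):
            if " ".join(snippet[i:i + min_overlap]) in ngrams:
                return True
    return False

record = {
    "messages": [
        {"role": "User", "message": "What is the claim review period?"},
        {"role": "Chatbot",
         "message": "The claim review period is no more than 30 calendar days."},
    ],
    "documents": [
        {"title": "Claim Processing Regulations",
         "snippet": "3.4. Claim review period — no more than 30 calendar days..."},
    ],
}
```

Filtering out ungrounded records keeps the fine-tune from learning to answer past its sources, which is the failure mode this whole setup exists to avoid.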

Practical Case: Legal Assistant for Corporate Law

Task: an assistant for the legal department of a large company — contract analysis, answers about internal regulations, and work with the regulatory base.

Dataset: 2,800 examples (question + relevant document fragment → answer with source reference), built from real lawyer requests to the knowledge base.

Critical metric: faithfulness — the share of answers fully grounded in the provided documents, with no hallucinated content.

Results:

  • Faithfulness (RAGAS): 0.71 → 0.93
  • Answer relevancy: 0.78 → 0.91
  • Citation accuracy (references to correct sources): 64% → 89%
  • Hallucination rate: 18% → 4%
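
A metric like citation accuracy reduces to a simple exact-match scorer over cited source sets. The sketch below (helper name and sample data are ours, purely illustrative) shows the idea:

```python
def citation_accuracy(gold, predicted):
    """Fraction of examples whose cited sources exactly match the gold set."""
    assert len(gold) == len(predicted), "one prediction per gold example"
    hits = sum(set(g) == set(p) for g, p in zip(gold, predicted))
    return hits / len(gold)

gold = [["Article 196"], ["Article 181"], ["Regulations 3.4"]]
pred = [["Article 196"], ["Article 183"], ["Regulations 3.4"]]
acc = citation_accuracy(gold, pred)  # 2 of 3 match exactly
```

Exact set matching is deliberately strict: citing a correct article plus a spurious one counts as a miss, which is the right bias for a legal assistant.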

Self-Hosted Option via Open Weights

For on-premise deployment of Command R via Hugging Face:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "CohereForAI/c4ai-command-r-v01",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Command R ships its own tokenizer with the Cohere chat template
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
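
When building training strings for the self-hosted path, rely on tokenizer.apply_chat_template so the prompt always matches the model. The sketch below only illustrates the turn structure Command R's template produces, using its published special tokens; it is our simplification, not a replacement for the real template:

```python
# Command R special tokens (from the model's published chat template)
ROLE_TOKENS = {
    "System": "<|SYSTEM_TOKEN|>",
    "User": "<|USER_TOKEN|>",
    "Chatbot": "<|CHATBOT_TOKEN|>",
}

def render_prompt(messages):
    """Render a message list into Command R's turn structure (illustrative)."""
    parts = ["<BOS_TOKEN>"]
    for m in messages:
        parts.append("<|START_OF_TURN_TOKEN|>" + ROLE_TOKENS[m["role"]]
                     + m["message"] + "<|END_OF_TURN_TOKEN|>")
    # Leave the chatbot turn open so the model generates the answer
    parts.append("<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>")
    return "".join(parts)

prompt = render_prompt([{"role": "User", "message": "Hi"}])
```

Hand-built prompts like this are fine for understanding what the collator feeds the model, but any drift from the tokenizer's own template silently degrades fine-tuning quality.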

Managed vs Self-Hosted Comparison

Parameter             Cohere API Fine-Tuning   Self-Hosted (Open Weights)
Infrastructure        Managed                  Requires a GPU cluster
Weight control        No                       Yes
On-premise            No                       Yes
RAG citation          Native                   Native (same weights)
Cost at high volume   Higher                   Lower
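
The "cost at high volume" row can be made concrete with back-of-the-envelope arithmetic. All prices and throughput figures below are illustrative assumptions of ours, not Cohere's or any cloud provider's actual rates:

```python
# Assumed, illustrative numbers — adjust to real quotes before deciding
API_COST_PER_1M_TOKENS = 5.00    # $, managed fine-tuned model inference
GPU_HOUR = 4.00                  # $, rented GPU node for self-hosting
TOKENS_PER_GPU_HOUR = 2_000_000  # throughput of a self-hosted 35B deployment

def monthly_cost_api(tokens):
    """Pay-per-token: cost scales linearly with usage."""
    return tokens / 1_000_000 * API_COST_PER_1M_TOKENS

def monthly_cost_self_hosted(tokens, min_hours=720):
    """You pay for the cluster even when idle: at least a month of uptime."""
    hours = max(tokens / TOKENS_PER_GPU_HOUR, min_hours)
    return hours * GPU_HOUR

low, high = 50_000_000, 5_000_000_000  # 50M vs 5B tokens per month
```

Under these assumptions the API wins at low volume (the idle cluster dominates), while self-hosting wins once monthly traffic grows large enough to keep the GPUs busy — which is exactly the crossover the table summarizes.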

Timeline

  • Dataset preparation with documents: 3–6 weeks
  • Training (Cohere API): 2–5 days (managed)
  • Training (self-hosted, 35B, QLoRA): 12–36 hours
  • RAG quality testing: 1–2 weeks
  • Total: 6–10 weeks