Qwen (Alibaba) Language Model Fine-Tuning

We design and deploy artificial intelligence systems, from prototype to production-ready solution. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work not just in the lab, but in real business settings.

Fine-Tuning Qwen Language Models (Alibaba)

Qwen is a family of open-source language models from Alibaba Cloud, released mostly under the Apache 2.0 license (some sizes, such as the 72B variant, ship under Alibaba's own Qwen license). The Qwen2.5 family includes models from 0.5B to 72B parameters, plus specialized versions: Qwen2.5-Coder (programming), Qwen2.5-Math (mathematics), and Qwen-VL (multimodal). On MMLU and HumanEval benchmarks, Qwen2.5-72B competes with Llama 3.1 70B.

Qwen2.5 Model Lineup for Fine-Tuning

| Model             | Parameters | VRAM (bf16) | Niche                    |
|-------------------|------------|-------------|--------------------------|
| Qwen2.5-0.5B      | 0.5B       | 1 GB        | Edge/IoT                 |
| Qwen2.5-1.5B      | 1.5B       | 3 GB        | Mobile                   |
| Qwen2.5-7B        | 7B         | 14 GB       | Main workhorse           |
| Qwen2.5-14B       | 14B        | 28 GB       | Quality/resource balance |
| Qwen2.5-32B       | 32B        | 64 GB       | High quality             |
| Qwen2.5-72B       | 72B        | 144 GB      | State-of-the-art open    |
| Qwen2.5-Coder-32B | 32B        | 64 GB       | Code, SQL, algorithms    |
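The VRAM column follows directly from the parameter count: bf16 stores two bytes per weight, so the weights alone need roughly 2 GB per billion parameters (a rough estimate that excludes the KV cache, activations, and optimizer state):

```python
def weights_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate VRAM needed to hold the model weights only.

    bf16/fp16 = 2 bytes per parameter; int8 = 1; 4-bit quantization = 0.5.
    """
    return params_billion * bytes_per_param

print(weights_vram_gb(7))    # Qwen2.5-7B in bf16  -> 14 GB
print(weights_vram_gb(72))   # Qwen2.5-72B in bf16 -> 144 GB
```

The same arithmetic explains why QLoRA (4-bit base weights) fits a 7B model into consumer GPUs: `weights_vram_gb(7, bytes_per_param=1) / 2` gives about 3.5 GB for the frozen base.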

Qwen Advantages for Specific Tasks

Multilingual support: Qwen is trained on data with a large share of Chinese and English plus 27 other languages. Russian is represented much better than in many Western models, which matters when working with Russian-language corpora.

Long context: Qwen2.5 supports up to 128K tokens context. For fine-tuning tasks with long documents (contracts, research papers, regulations) this is a critical advantage.
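Before committing to long-context fine-tuning, it helps to check whether typical documents actually fit the window. A quick estimate without loading the tokenizer is the common ~4 characters per token heuristic for English text (a rough rule of thumb; real token counts vary by language and content, so verify with the Qwen tokenizer for production):

```python
def approx_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 chars/token rule of thumb for English."""
    return int(len(text) / chars_per_token)

def fits_context(text: str, max_len: int = 128_000, reserve: int = 4_000) -> bool:
    """True if the document likely fits, leaving `reserve` tokens for the response."""
    return approx_tokens(text) <= max_len - reserve

doc = "word " * 100_000          # ~500k characters, ~125k estimated tokens
print(fits_context(doc))         # -> False: too close to the 128K limit
```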

Qwen2.5-Coder: a specialized version that outperforms most open-source models of the same size on HumanEval. When fine-tuned on a corporate codebase, it provides a better starting point than fine-tuning a general-purpose model.

Fine-Tuning via LLaMA-Factory

LLaMA-Factory is the most convenient tool for Qwen fine-tuning, supporting the full spectrum of methods (Full, LoRA, QLoRA, DoRA) with a unified config format:

```yaml
# config.yaml
model_name_or_path: Qwen/Qwen2.5-7B-Instruct
stage: sft
do_train: true
dataset: my_dataset
template: qwen
finetuning_type: lora
lora_rank: 16
lora_alpha: 32
lora_target: q_proj,v_proj
output_dir: ./qwen25-7b-finetuned
num_train_epochs: 3
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 2.0e-4
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
```

```shell
llamafactory-cli train config.yaml
```

Alternatively, use swift from ModelScope (Alibaba):

```shell
swift sft \
  --model_type qwen2_5_7b_instruct \
  --dataset my_dataset \
  --train_type lora \
  --output_dir ./output
```

Data Format: Qwen Chat Template

Qwen2.5 uses a ChatML-style chat template with <|im_start|> and <|im_end|> tags:

```text
<|im_start|>system
You are an assistant for financial reporting analysis.<|im_end|>
<|im_start|>user
Calculate EBITDA from: revenue 850M, COGS 420M, OpEx 180M, DA 45M<|im_end|>
<|im_start|>assistant
EBITDA = Revenue - COGS - OpEx + DA = 850 - 420 - 180 + 45 = **295M**<|im_end|>
```

When using transformers directly, apply tokenizer.apply_chat_template() for correct formatting.
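For illustration, the format above can be rendered by hand (a simplified sketch of the ChatML layout; in production rely on `tokenizer.apply_chat_template()`, which also handles the default system prompt and special-token details):

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render a list of {role, content} dicts in Qwen's ChatML-style format."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are an assistant for financial reporting analysis."},
    {"role": "user", "content": "Calculate EBITDA from: revenue 850M, COGS 420M, OpEx 180M, DA 45M"},
])
```

Getting this framing wrong during dataset preparation (missing tags, wrong role names) is one of the most common causes of degraded fine-tuning results, so validating the rendered prompts is worth the extra step.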

Practical Case: Financial Analysis on Qwen2.5-14B

Task: automatic analysis of quarterly company reports (IFRS), extraction of key metrics, calculation of financial ratios, anomaly flags.

Dataset: 1,800 examples mapping raw reporting data to structured analysis (JSON plus a text summary).

Training: Qwen2.5-14B-Instruct, QLoRA (r=32, alpha=64), 4 epochs, 2×A100 40GB, 6 hours.

Results:

  • Financial ratio calculation correctness: 71% → 94%
  • Anomaly flag accuracy (F1): 0.67 → 0.88
  • Text summary quality (human eval, 1–5): 3.1 → 4.4
  • Tokens per request (avg): unchanged (~1800)

Deploying Fine-Tuned Qwen via vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="./qwen25-14b-merged",
    dtype="bfloat16",
    tensor_parallel_size=2,  # split across 2 GPUs
    max_model_len=32768,
    gpu_memory_utilization=0.9,
)

sampling_params = SamplingParams(temperature=0.1, max_tokens=2048)
outputs = llm.generate(prompts, sampling_params)
```

vLLM provides continuous batching and PagedAttention, which at batch size 16 gives throughput ~240 tok/s on 2×A100.
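The `./qwen25-14b-merged` checkpoint above assumes the LoRA adapter has already been merged into the base weights, which vLLM serves most efficiently. With LLaMA-Factory this can be done via its export command (a sketch; the paths and adapter directory are assumptions matching the training config earlier):

```shell
cat > merge.yaml <<'EOF'
model_name_or_path: Qwen/Qwen2.5-14B-Instruct
adapter_name_or_path: ./qwen25-14b-finetuned   # LoRA adapter from training
template: qwen
finetuning_type: lora
export_dir: ./qwen25-14b-merged
EOF
llamafactory-cli export merge.yaml
```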

Timeline

  • Dataset preparation: 2–5 weeks
  • Training (7B, QLoRA): 3–8 hours
  • Training (72B, QLoRA, 4×A100): 24–72 hours
  • Iterations and evaluation: 1–2 weeks
  • Total: 4–8 weeks