Fine-Tuning Phi Language Models (Microsoft)
Phi is a family of compact language models from Microsoft Research, built on the principle that "data quality matters more than parameter count". Phi-3 and Phi-4 deliver results comparable to models 3–5× larger on reasoning and programming tasks, which makes them attractive for edge deployment, mobile applications, and other compute-constrained scenarios.
Phi Model Lineup
| Model | Parameters | VRAM (fp16) | Key Feature |
|---|---|---|---|
| Phi-3-mini-4k | 3.8B | 7.6 GB | Edge/mobile |
| Phi-3-mini-128k | 3.8B | 7.6 GB | Long context |
| Phi-3-small | 7B | 14 GB | Balance |
| Phi-3-medium | 14B | 28 GB | High quality |
| Phi-4 | 14B | 28 GB | Current flagship |
| Phi-4-mini | 3.8B | 7.6 GB | Compact flagship |
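The fp16 VRAM figures in the table are essentially parameters × 2 bytes. A quick back-of-envelope sketch (weights only; KV cache and activations add more on top):

```python
def model_size_gb(n_params: float, bytes_per_param: float) -> float:
    """Rough weight-memory footprint in GB (weights only, excludes KV cache/activations)."""
    return n_params * bytes_per_param / 1e9

# fp16/bf16 stores 2 bytes per parameter
print(model_size_gb(3.8e9, 2))  # Phi-3-mini / Phi-4-mini -> 7.6 GB
print(model_size_gb(14e9, 2))   # Phi-3-medium / Phi-4   -> 28.0 GB
```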
With 14B parameters, Phi-4 outperforms Llama 3.1 70B on several math and programming benchmarks, a result of its high-quality training data (synthetic data and textbook-style text).
LoRA Fine-Tuning Phi-4 via Transformers + TRL
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer, SFTConfig
from peft import LoraConfig
import torch

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")

# `dataset` is assumed to be a prepared Hugging Face Dataset
# with a "messages" or "text" column.
trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="./phi4-finetuned",
        num_train_epochs=4,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=1e-4,
        bf16=True,
        max_seq_length=8192,
    ),
    peft_config=LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    ),
    train_dataset=dataset,
)
trainer.train()
```
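With r=16 on four attention projections, the adapter is a tiny fraction of the 14B base. A back-of-envelope count (the hidden size and layer count below are illustrative assumptions, not official Phi-4 config values):

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    # LoRA factorizes the weight update as B @ A: A is (r, d_in), B is (d_out, r)
    return r * d_in + d_out * r

# Illustrative: a 5120-dim square projection with r=16
per_module = lora_param_count(5120, 5120, 16)  # 163,840 params
# 4 target modules per layer × 40 layers (assumed depth)
total = per_module * 4 * 40
print(f"{total / 1e6:.1f}M trainable params")  # ~26.2M, roughly 0.2% of 14B
```

This is why QLoRA fits on a single GPU: only the adapter gradients and optimizer states are kept in full precision.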
Specifics: Fine-Tuning for Edge and Mobile
Phi-3/4-mini (3.8B) is the most popular choice for deployment to mobile apps and browser extensions. After fine-tuning and quantization:
- GGUF Q4_K_M: ~2.2 GB, runs on CPU (MacBook M-series: ~12 tok/s)
- ONNX INT4: used in ONNX Runtime for Windows/Android
- ExecuTorch: deployment to iPhone/Android via PyTorch Mobile
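The GGUF size above follows directly from bits per weight: Q4_K_M averages roughly 4.5 bits/weight across tensors. A sketch (the bits-per-weight figure is an approximation, and real files add metadata overhead):

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of a quantized model in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# Q4_K_M: ~4.5 bits/weight on average (approximate)
print(f"{quantized_size_gb(3.8e9, 4.5):.1f} GB")  # ~2.1 GB for a 3.8B model
```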
Microsoft provides ONNX versions of Phi through microsoft/Phi-3-mini-4k-instruct-onnx, simplifying integration into .NET and Windows applications.
Practical Case: Offline Assistant for Field Engineers
Task: a mobile app for industrial-equipment maintenance engineers. The assistant works offline (no internet at field sites), answers questions about maintenance procedures, and helps diagnose malfunctions.
Base model: Phi-3-mini-128k-instruct (3.8B, 128K context needed for long technical manuals).
Dataset: 1400 pairs (documentation fragment / engineer question → answer with procedure number and steps).
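One way to shape such pairs for SFTTrainer is the chat "messages" format, with the documentation fragment embedded in the user turn. A sketch (the field layout and system prompt are illustrative, not from the project):

```python
import json

def to_chat_example(fragment: str, question: str, answer: str) -> dict:
    """Pack one (fragment, question, answer) triple into chat-messages format."""
    return {
        "messages": [
            {"role": "system", "content": "You are a maintenance assistant. "
             "Answer strictly from the provided manual excerpt, citing the procedure number."},
            {"role": "user", "content": f"Manual excerpt:\n{fragment}\n\nQuestion: {question}"},
            {"role": "assistant", "content": answer},
        ]
    }

example = to_chat_example(
    "Procedure 7.3: Pump seal replacement...",
    "How do I replace the pump seal?",
    "Per procedure 7.3: 1) depressurize the line, 2) ...",
)
print(json.dumps(example, ensure_ascii=False)[:60])
```

Anchoring answers to the excerpt and procedure number in every training example is what drives the hallucination rate down.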
Result:
- Answer accuracy (compliance with procedures): 58% → 86%
- Hallucination rate (invents non-existent steps): 31% → 8%
- Model after GGUF Q4_K_M: 2.1 GB, 9 tok/s on smartphone CPU (Snapdragon 8 Gen 3)
Timeline
- Dataset preparation: 2–4 weeks
- Training (Phi-4 14B, QLoRA, A100): 4–10 hours
- Quantization and device testing: 3–5 days
- Total: 3–6 weeks
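The training-time figure can be sanity-checked from the run parameters above. A sketch using the document's numbers (1400 examples, 4 epochs, effective batch 4 × 4 = 16):

```python
def total_steps(n_examples: int, epochs: int, batch: int, grad_accum: int) -> int:
    """Optimizer steps for a full SFT run."""
    effective_batch = batch * grad_accum              # 4 * 4 = 16
    steps_per_epoch = -(-n_examples // effective_batch)  # ceiling division
    return steps_per_epoch * epochs

print(total_steps(1400, 4, 4, 4))  # 88 steps/epoch * 4 epochs = 352 steps
```

At a few tens of seconds per step for a 14B QLoRA run at 8K context on an A100, a few hundred steps lands in the stated 4–10 hour range.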