Phi (Microsoft) Language Model Fine-Tuning

We design and deploy artificial intelligence systems, from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work not only in the lab but in real business.
Complexity: Medium
Timeline: from 1 week to 3 months

Fine-Tuning Phi Language Models (Microsoft)

Phi is a family of compact language models from Microsoft Research, optimized on the principle that "data quality matters more than parameter count". Phi-3 and Phi-4 show results comparable to models 3–5× larger on reasoning and programming tasks. This makes them attractive for edge deployment, mobile applications, and scenarios with limited compute resources.

Phi Model Lineup

Model             Parameters   VRAM (fp16)   Key feature
Phi-3-mini-4k     3.8B         7.6 GB        Edge/mobile
Phi-3-mini-128k   3.8B         7.6 GB        Long context
Phi-3-small       7B           14 GB         Balance
Phi-3-medium      14B          28 GB         High quality
Phi-4             14B          28 GB         Current flagship
Phi-4-mini        3.8B         7.6 GB        Compact flagship

Phi-4, with 14B parameters, outperforms Llama 3.1 70B on several math and programming benchmarks, a result of its high-quality training data (synthetic data and textbook-style corpora).
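The VRAM column in the table follows directly from parameter count: fp16 stores two bytes per weight. A quick sanity check (this ignores activation and KV-cache overhead, which add to the real footprint):

```python
# fp16 weight memory: 2 bytes per parameter (weights only; excludes
# activations and KV cache, which need extra headroom at inference time).
def fp16_vram_gb(n_params: float) -> float:
    return n_params * 2 / 1e9

print(fp16_vram_gb(3.8e9))   # 7.6  -> Phi-3-mini / Phi-4-mini
print(fp16_vram_gb(14e9))    # 28.0 -> Phi-3-medium / Phi-4
```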

LoRA Fine-Tuning Phi-4 via Transformers + TRL

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer, SFTConfig
from peft import LoraConfig
import torch

# Load the base model in 4-bit (QLoRA-style) so a 14B model fits on one GPU.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")

# Placeholder path: supply your own instruction dataset here.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
    args=SFTConfig(
        output_dir="./phi4-finetuned",
        num_train_epochs=4,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,  # effective batch size 16
        learning_rate=1e-4,
        bf16=True,
        max_seq_length=8192,
    ),
    # LoRA adapters on the attention projections; only these weights train.
    peft_config=LoraConfig(
        r=16, lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    ),
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("./phi4-finetuned")  # saves the LoRA adapter weights
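As a sanity check on what the LoraConfig above actually trains: LoRA freezes each target weight matrix W and learns a low-rank update, computing y = Wx + (α/r)·B(Ax) with small trainable matrices A and B. A toy, framework-free illustration (the tiny dimensions and values here are chosen for readability, not taken from Phi):

```python
# Toy LoRA forward pass, pure Python. W is frozen; only A and B would train.
def matvec(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

d_in, d_out, r, alpha = 4, 4, 2, 4          # illustrative sizes only
W = [[1 if i == j else 0 for j in range(d_in)] for i in range(d_out)]  # frozen (identity here)
A = [[0.1] * d_in for _ in range(r)]        # trainable down-projection (r x d_in)
B = [[0.5] * r for _ in range(d_out)]       # trainable up-projection (d_out x r)

x = [1.0, 2.0, 3.0, 4.0]
delta = matvec(B, matvec(A, x))             # low-rank update B(Ax)
y = [wx + (alpha / r) * d for wx, d in zip(matvec(W, x), delta)]
print(y)   # [3.0, 4.0, 5.0, 6.0]
```

With r=16 and lora_alpha=32 as in the config, the scale α/r is 2.0, and only the A/B pairs on q/k/v/o projections receive gradients, which is why the run fits in modest VRAM.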

Specifics: Fine-Tuning for Edge and Mobile

Phi-3/4-mini (3.8B) is the most popular choice for deployment to mobile apps and browser extensions. After fine-tuning and quantization:

  • GGUF Q4_K_M: ~2.2 GB, runs on CPU (MacBook M-series: ~12 tok/s)
  • ONNX INT4: used in ONNX Runtime for Windows/Android
  • ExecuTorch: deployment to iPhone/Android via PyTorch's edge runtime (the successor to PyTorch Mobile)
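The ~2.2 GB figure follows from the average bits-per-weight of the quantization format. A back-of-the-envelope check (Q4_K_M averages roughly 4.8 bits per weight; that number is an approximation, and real GGUF files also carry higher-precision embedding tables and metadata):

```python
# Rough post-quantization model size; real files differ slightly because
# some tensors (e.g. embeddings) are stored at higher precision.
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1e9

# Phi-3/4-mini: 3.8B parameters at ~4.8 bits/weight for Q4_K_M.
print(round(gguf_size_gb(3.8e9, 4.8), 1))   # 2.3 -- close to the ~2.2 GB above
```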

Microsoft provides ONNX versions of Phi through microsoft/Phi-3-mini-4k-instruct-onnx, simplifying integration into .NET and Windows applications.

Practical Case: Offline Assistant for Field Engineers

Task: a mobile app for engineers who maintain industrial equipment. The assistant works offline (no internet at field sites), answers questions about maintenance procedures, and helps diagnose malfunctions.

Base model: Phi-3-mini-128k-instruct (3.8B, 128K context needed for long technical manuals).

Dataset: 1400 pairs (documentation fragment / engineer question → answer with procedure number and steps).
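A record in such a dataset might look like this in the chat format commonly used for SFT (the field contents below are illustrative; the project's actual schema is not shown in the text):

```python
import json

# One illustrative training record; procedure numbers and wording are made up.
record = {
    "messages": [
        {"role": "system", "content": "You are an assistant for maintenance engineers. "
                                      "Answer only from the provided documentation."},
        {"role": "user", "content": "Documentation: <fragment>...</fragment>\n"
                                    "How do I replace the hydraulic filter on pump P-103?"},
        {"role": "assistant", "content": "Procedure MP-4.2: 1) depressurize the line; "
                                         "2) close valves V-12 and V-13; 3) ..."},
    ]
}
line = json.dumps(record, ensure_ascii=False)  # one JSONL line per example
print(len(record["messages"]))   # 3
```

Pairing each answer with a procedure number, as the dataset above does, is what lets the evaluation measure "compliance with procedures" directly.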

Result:

  • Answer accuracy (compliance with procedures): 58% → 86%
  • Hallucination rate (invents non-existent steps): 31% → 8%
  • Model after GGUF Q4_K_M: 2.1 GB, 9 tok/s on smartphone CPU (Snapdragon 8 Gen 3)

Timeline

  • Dataset preparation: 2–4 weeks
  • Training (Phi-4 14B, QLoRA, A100): 4–10 hours
  • Quantization and device testing: 3–5 days
  • Total: 3–6 weeks