# AI-Based Video Frame Interpolation
Converting 24 fps → 60 fps or 30 fps → 120 fps by duplicating frames causes visible judder on fast motion. AI frame interpolation instead synthesizes intermediate frames using optical flow, producing motion that is smoother than duplication or simple frame blending.
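The timing side of this is simple to sketch. For an Nx multiplier, N − 1 new frames are synthesized between each consecutive source pair, at evenly spaced timesteps (the helper below is illustrative, not from any library):

```python
def interpolation_timesteps(multiplier: int) -> list[float]:
    """Timesteps t in (0, 1) at which new frames are synthesized
    between each consecutive source-frame pair for Nx interpolation."""
    if multiplier < 2 or multiplier & (multiplier - 1):
        raise ValueError("multiplier must be a power of two >= 2")
    return [i / multiplier for i in range(1, multiplier)]
```

For 2x this yields a single midpoint frame at t = 0.5; for 4x, three frames at t = 0.25, 0.5, 0.75.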
## RIFE — Practical Tool
RIFE (Real-Time Intermediate Flow Estimation) is the fastest open-source method: on an RTX 3080 at 1080p it reaches roughly 30 output frames per second at 2x interpolation.
```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F

# Load the RIFE model (IFNet); module path as in the official RIFE repo
from model.RIFE_HDv3 import Model


def interpolate_video_rife(
    input_path: str,
    output_path: str,
    multiplier: int = 2,  # 2x, 4x, 8x — RIFE supports only powers of two
    scale: float = 1.0,   # optical-flow scale (use 0.5 on a weak GPU)
    fp16: bool = True,
) -> None:
    device = torch.device('cuda')
    if fp16:
        # As in the official inference script: build the net in half precision
        # so it matches the half-precision inputs below
        torch.set_default_tensor_type(torch.cuda.HalfTensor)
    model = Model()
    model.load_model('train_log', -1)
    model.eval()
    model.device()  # moves the network to CUDA in the official implementation

    cap = cv2.VideoCapture(input_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out_fps = fps * multiplier

    writer = cv2.VideoWriter(
        output_path,
        cv2.VideoWriter_fourcc(*'mp4v'),
        out_fps, (w, h)
    )

    # Pad to a multiple of 32, as IFNet's feature pyramid requires
    pad_h = (32 - h % 32) % 32
    pad_w = (32 - w % 32) % 32

    def to_tensor(frame: np.ndarray) -> torch.Tensor:
        t = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0
        if fp16:
            t = t.half()
        t = t.unsqueeze(0).to(device)
        return F.pad(t, [0, pad_w, 0, pad_h])

    ret, prev_frame = cap.read()
    while ret:
        ret, curr_frame = cap.read()
        if not ret:
            break
        I0 = to_tensor(prev_frame)
        I1 = to_tensor(curr_frame)

        writer.write(prev_frame)
        # Synthesize (multiplier - 1) intermediate frames
        for i in range(1, multiplier):
            t = i / multiplier
            with torch.no_grad():
                # v3/v4 checkpoints accept an arbitrary timestep; older RIFE
                # only produces t=0.5 and needs recursive calls for 4x/8x
                middle = model.inference(I0, I1, timestep=t, scale=scale)
            mid_np = (middle[0].float().cpu().permute(1, 2, 0).numpy()
                      * 255).astype(np.uint8)
            writer.write(mid_np[:h, :w])  # crop the padding back off
        prev_frame = curr_frame

    writer.write(prev_frame)  # the final source frame
    cap.release()
    writer.release()
```
## EMA-VFI for Complex Scenes
RIFE loses quality on scenes with occlusions and nonlinear motion. EMA-VFI (which extracts motion and appearance features via inter-frame attention) is more accurate on such scenes but 3–4x slower.
## Typical Artifacts and Solutions
Ghosting — a semi-transparent double image of a moving object. It occurs on fast motion where optical-flow estimation fails. Solution: reduce `scale` (flow is then estimated on a downscaled image) or switch to EMA-VFI.
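One crude way to automate the scale reduction — a heuristic of mine, not part of RIFE — is to treat a large mean absolute difference between consecutive frames as a sign of fast motion and drop the flow scale for that pair:

```python
import numpy as np


def suggest_flow_scale(prev_frame: np.ndarray, curr_frame: np.ndarray,
                       motion_threshold: float = 12.0) -> float:
    """Heuristic: a large mean absolute difference between frames
    suggests fast motion; halving the flow scale makes RIFE estimate
    flow on a downscaled image, which is often more stable there."""
    motion = np.abs(prev_frame.astype(np.float32)
                    - curr_frame.astype(np.float32)).mean()
    return 0.5 if motion > motion_threshold else 1.0
```

The threshold is a free parameter and would need tuning per source; frame-difference magnitude is only a rough proxy for motion speed.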
Warping artifacts — distortion of text and sharp edges. RIFE handles on-screen text and UI overlays poorly. Solution: mask static regions and copy them from the source frames instead of interpolating them.
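A minimal sketch of that masking step (the helper and its threshold are assumptions of mine): pixels that are essentially identical in both source frames — typical for on-screen text — are copied from the source into the interpolated frame:

```python
import numpy as np


def protect_static_regions(interpolated: np.ndarray,
                           prev_frame: np.ndarray,
                           curr_frame: np.ndarray,
                           threshold: int = 2) -> np.ndarray:
    """Where prev and curr frames are (nearly) identical, copy the
    source pixels over the interpolated ones, so static text and UI
    elements are never warped."""
    # Boolean (H, W) mask: True where all channels changed by <= threshold
    static = (np.abs(prev_frame.astype(np.int16)
                     - curr_frame.astype(np.int16)) <= threshold).all(axis=-1)
    out = interpolated.copy()
    out[static] = prev_frame[static]
    return out
```

In practice the mask is usually dilated or blurred a little before use, so the boundary between protected and interpolated pixels does not produce a hard seam.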
Flickering at shot cuts — RIFE does not detect scene changes, so at every cut it synthesizes a blend of two unrelated shots. Preprocessing is required: run shot detection (e.g. with PySceneDetect) and skip interpolation across cut boundaries.
```python
from scenedetect import detect, ContentDetector


def find_scene_cuts(video_path: str, threshold: float = 27.0) -> list[int]:
    """Return frame numbers where a new shot begins. The first scene's
    start (frame 0) is not a cut and is excluded."""
    scenes = detect(video_path, ContentDetector(threshold=threshold))
    return [scene[0].get_frames() for scene in scenes[1:]]
```
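Glue between the cut list and the interpolation loop might look like this (a hypothetical helper, assuming frames are indexed from 0 as above): across a cut, the caller duplicates the previous frame instead of synthesizing blends.

```python
def should_interpolate(prev_index: int, cut_frames: set[int]) -> bool:
    """Interpolate between frames prev_index and prev_index + 1 only if
    both belong to the same shot; if the next frame starts a new shot,
    the caller should duplicate prev_frame instead."""
    return (prev_index + 1) not in cut_frames
```

Duplication at a cut keeps the output frame count correct and is invisible in practice, since a cut is a discontinuity anyway.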
Method comparison (1080p source):

| Method | Speed (1080p) | Quality | VRAM |
|---|---|---|---|
| RIFE | ~30 FPS (2x) | Very Good | 6–8 GB |
| EMA-VFI | ~8 FPS (2x) | Excellent | 8–10 GB |
| DAIN | ~2 FPS (2x) | Excellent | 11 GB |
| Super-SloMo | ~3 FPS (8x) | Good | 6 GB |
Rough implementation-effort estimates:

| Task | Timeline |
|---|---|
| Basic frame interpolation (2x-4x) | 1–2 weeks |
| Production pipeline with shot detection | 3–4 weeks |
| 8x interpolation with quality assurance | 6–8 weeks |