AI Super-Resolution — Image Upscaling
Bicubic interpolation can upscale an image 4x, but the result is blurry. AI super-resolution reconstructs plausible fine detail: skin texture, text on signs, fabric structure. The difference shows up in PSNR: bicubic scores 28–30 dB, Real-ESRGAN 32–36 dB on photographs.
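For reference, PSNR can be computed directly from the mean squared error between a reference image and its reconstruction. The sketch below assumes 8-bit images stored as NumPy arrays of equal shape; the function name `psnr` is illustrative and is not taken from any of the libraries used in this article.

```python
import numpy as np

def psnr(reference: np.ndarray, restored: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    if reference.shape != restored.shape:
        raise ValueError("images must have the same shape")
    # Work in float64 so that uint8 subtraction cannot wrap around
    diff = reference.astype(np.float64) - restored.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Published benchmark figures are usually computed on the luminance (Y) channel with border pixels cropped, so values from this function on full RGB images will differ somewhat from paper tables.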
Real-ESRGAN — Practical Standard
import torch
import numpy as np
from PIL import Image
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

def upscale_image(
    image_path: str,
    scale: int = 4,
    model_name: str = 'RealESRGAN_x4plus',  # or 'RealESRGAN_x4plus_anime_6B'
    tile_size: int = 512,  # for large images: tile-based processing
    half_precision: bool = True
) -> np.ndarray:
    """
    tile_size=512 with VRAM 6GB, tile_size=0 (whole image) with VRAM 24GB.
    half=True: FP16, saves ~50% VRAM.
    Returns an RGB uint8 array.
    """
    # The anime_6B checkpoint has 6 RRDB blocks, x4plus has 23
    num_block = 6 if model_name.endswith('anime_6B') else 23
    model = RRDBNet(
        num_in_ch=3, num_out_ch=3,
        num_feat=64, num_block=num_block, num_grow_ch=32,
        scale=scale
    )
    upsampler = RealESRGANer(
        scale=scale,
        model_path=f'weights/{model_name}.pth',
        model=model,
        tile=tile_size,
        tile_pad=10,  # tile overlap for smooth seams
        pre_pad=0,
        half=half_precision,
        device='cuda'
    )
    img = np.array(Image.open(image_path).convert('RGB'))
    # enhance() follows the OpenCV convention (BGR in, BGR out),
    # so swap channels before the call and swap them back afterwards
    bgr = np.ascontiguousarray(img[:, :, ::-1])
    output, _ = upsampler.enhance(bgr, outscale=scale)
    return np.ascontiguousarray(output[:, :, ::-1])
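To illustrate what the tile and tile_pad parameters control, here is a minimal sketch of padded tiling. It is not RealESRGANer's actual implementation: `tiled_upscale` is a hypothetical helper, and `nearest_upscale` is a stand-in for the network so the example runs without weights or a GPU.

```python
import numpy as np

def nearest_upscale(tile: np.ndarray, scale: int) -> np.ndarray:
    """Stand-in for the SR model: nearest-neighbour upscaling of an HxWxC array."""
    return tile.repeat(scale, axis=0).repeat(scale, axis=1)

def tiled_upscale(img: np.ndarray, scale: int = 4, tile: int = 512,
                  pad: int = 10, upscale_fn=nearest_upscale) -> np.ndarray:
    """Upscale an HxWxC image tile by tile, giving each tile `pad` pixels of context."""
    h, w, c = img.shape
    out = np.zeros((h * scale, w * scale, c), dtype=img.dtype)
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
            # Extend the tile by `pad` pixels on each side, clipped at the image border
            py0, px0 = max(y0 - pad, 0), max(x0 - pad, 0)
            py1, px1 = min(y1 + pad, h), min(x1 + pad, w)
            up = upscale_fn(img[py0:py1, px0:px1], scale)
            # Discard the upscaled padding and keep only the tile's own area
            oy, ox = (y0 - py0) * scale, (x0 - px0) * scale
            out[y0 * scale:y1 * scale, x0 * scale:x1 * scale] = up[
                oy:oy + (y1 - y0) * scale, ox:ox + (x1 - x0) * scale
            ]
    return out
```

With a real network, pixels near a tile edge depend on their neighbourhood, so the padding supplies context that would otherwise be cut off and makes seams less visible. Peak memory then scales with the padded tile size rather than with the whole image.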
GFPGAN for Face Restoration
Real-ESRGAN on portraits sometimes creates artifacts on faces. GFPGAN adds face restoration on top of SR:
import numpy as np
from gfpgan import GFPGANer

def restore_face_photo(
    degraded_image: np.ndarray,
    upscale: int = 2,
    arch: str = 'clean',  # 'clean' | 'RestoreFormer'
    channel_multiplier: int = 2,
    weight: float = 0.5  # 0=pure GFPGAN, 1=without face enhancement
) -> np.ndarray:
    """
    degraded_image: BGR uint8 array, as returned by cv2.imread.
    weight=0.5: compromise between restoration and feature preservation.
    At weight=0 faces look "glossy".
    """
    restorer = GFPGANer(
        model_path='weights/GFPGANv1.4.pth',
        upscale=upscale,
        arch=arch,
        channel_multiplier=channel_multiplier,
        bg_upsampler=None  # can pass RealESRGANer for background
    )
    # enhance() returns (cropped faces, restored faces, full restored image)
    _, _, restored = restorer.enhance(
        degraded_image,
        has_aligned=False,
        only_center_face=False,
        paste_back=True,
        weight=weight
    )
    return restored
Metrics and Model Comparison
| Model | PSNR, dB (Set5, 4x) | SSIM | Speed 1080p→4K | Application |
|---|---|---|---|---|
| Bicubic | 28.42 | 0.810 | Instant | Baseline |
| SRCNN | 30.48 | 0.862 | Fast | Outdated |
| ESRGAN | 32.73 | 0.901 | ~2s RTX3080 | Photos |
| Real-ESRGAN x4+ | 33.98 | 0.918 | ~3s RTX3080 | Photos, text |
| SwinIR-L | 34.97 | 0.932 | ~8s RTX3080 | Maximum quality |
| GFPGAN v1.4 | — | — | ~4s RTX3080 | Portraits |
PSNR is not the only criterion: perceived quality tracks LPIPS (a learned perceptual similarity metric, lower is better) more closely than PSNR. Real-ESRGAN, despite lower PSNR than SwinIR, often looks better subjectively because it produces sharper high-frequency detail.
Batch Processing of Large Volumes
from pathlib import Path
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from basicsr.archs.rrdbnet_arch import RRDBNet

class ImageDataset(Dataset):
    """Resizes every image to size x size so a batch can be stacked into one tensor.
    Note: this changes the aspect ratio of non-square images."""
    def __init__(self, image_paths: list[str], size: int = 256):
        self.paths = image_paths
        self.transform = transforms.Compose([
            transforms.Resize((size, size)),
            transforms.ToTensor()
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert('RGB')
        return self.transform(img), self.paths[idx]

def batch_upscale_pipeline(
    input_dir: str,
    output_dir: str,
    batch_size: int = 4,  # with VRAM 12GB and tile_size=0
    scale: int = 4
):
    # pathlib's glob has no brace expansion, so filter by suffix instead
    exts = {'.jpg', '.jpeg', '.png'}
    paths = sorted(str(p) for p in Path(input_dir).iterdir() if p.suffix.lower() in exts)
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # For batch inference use direct forward pass
    # (RealESRGANer does not support batches, requires direct model call)
    model = RRDBNet(
        num_in_ch=3, num_out_ch=3,
        num_feat=64, num_block=23, num_grow_ch=32, scale=scale
    )
    model.load_state_dict(
        torch.load('weights/RealESRGAN_x4plus.pth', map_location='cpu')['params_ema']
    )
    model.eval().cuda().half()
    loader = DataLoader(ImageDataset(paths), batch_size=batch_size)
    to_pil = transforms.ToPILImage()
    with torch.no_grad():
        for batch, batch_paths in loader:
            # Weights are already FP16, so cast the inputs to match
            out = model(batch.cuda().half()).float().cpu().clamp(0, 1)
            for tensor, path in zip(out, batch_paths):
                to_pil(tensor).save(
                    Path(output_dir) / f'{Path(path).stem}_{scale}x.png'
                )
Limitations and Common Issues
- Texture hallucinations: Real-ESRGAN can add non-existent text on signs. This is unacceptable in forensic applications.
- OOM on large images: a 12-megapixel photo at 4x upscale becomes 192 Mp, which doesn't fit in memory. Solution: tile_size=512 with tile_pad=10.
- JPEG artifacts: JPEG blockiness is amplified by SR. Preprocessing: JPEG-aware denoising (nf_denoise from BasicSR).
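The arithmetic behind the 12-megapixel example above can be checked in a few lines. This sketch counts only the output tensor; the intermediate feature maps inside the network need considerably more memory. Both helper names are hypothetical, and the 4000x3000 resolution is an assumed shape for a 12-megapixel photo.

```python
import math

def output_footprint(width: int, height: int, scale: int = 4,
                     channels: int = 3, bytes_per_value: int = 2) -> tuple[float, float]:
    """Return (output megapixels, output tensor size in GiB)."""
    out_pixels = (width * scale) * (height * scale)
    size_bytes = out_pixels * channels * bytes_per_value
    return out_pixels / 1e6, size_bytes / 2**30

def tile_count(width: int, height: int, tile: int = 512) -> int:
    """Number of tiles needed to cover the input image."""
    return math.ceil(width / tile) * math.ceil(height / tile)

mp, gib = output_footprint(4000, 3000)  # assumed 12-megapixel input
print(f"{mp:.0f} Mp, {gib:.2f} GiB in FP16")      # 192 Mp, 1.07 GiB in FP16
print(tile_count(4000, 3000), "tiles of 512 px")  # 48 tiles of 512 px
```

Upscaling by 4x per side multiplies the pixel count by 16, which is where the 192 Mp figure comes from. With tiling, only one padded tile has to pass through the network at a time.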
Timelines
| Task | Timeline |
|---|---|
| SR API service (Real-ESRGAN) | 1–2 weeks |
| Fine-tuning on specific domain | 4–6 weeks |
| Custom SR model from scratch | 10–16 weeks |







