Setting up NVIDIA CUDA/cuDNN for GPU Computing
CUDA and cuDNN are the foundation of GPU-accelerated computing for ML. Mismatched CUDA, cuDNN, driver, and framework versions are among the most common sources of problems when setting up an ML environment.
Compatibility matrix
Before installation, it is important to check the official compatibility table:
| CUDA | cuDNN | PyTorch | TensorFlow | Min Driver |
|---|---|---|---|---|
| 12.1 | 8.9 | 2.1.x | 2.14.x | 530.30 |
| 12.2 | 8.9 | 2.2.x | — | 535.54 |
| 11.8 | 8.7 | 2.0.x | 2.12.x | 520.61 |
Golden rule: the installed NVIDIA driver must meet or exceed the minimum version required by the CUDA release (the Min Driver column above); newer drivers remain backward compatible with older CUDA toolkits.
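The golden rule can be checked programmatically. A minimal sketch (the helper name and version strings are illustrative; in practice take the installed version from nvidia-smi output):

```python
# Sketch: compare an installed driver version against the table's minimum.

def driver_supports(installed: str, required: str) -> bool:
    """True if the installed driver version >= the required minimum."""
    to_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return to_tuple(installed) >= to_tuple(required)

# Minimum driver versions from the compatibility table above.
MIN_DRIVER = {"12.1": "530.30", "12.2": "535.54", "11.8": "520.61"}

print(driver_supports("545.23", MIN_DRIVER["12.2"]))  # True
print(driver_supports("525.60", MIN_DRIVER["12.2"]))  # False
```

Comparing version components as integer tuples avoids the classic string-comparison pitfall ("535" < "60" lexicographically).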
Installing NVIDIA Driver
# Ubuntu 22.04
# Recommended approach: via ubuntu-drivers
sudo ubuntu-drivers autoinstall
# Or install manually
sudo apt install nvidia-driver-545
# Verify
nvidia-smi
# Output includes: Driver Version: 545.xx.xx | CUDA Version: 12.3
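If you want to consume those versions in a script rather than read them by eye, they can be pulled out of the nvidia-smi header with a regex. A sketch (the sample line below is hard-coded; on a real machine capture the output of nvidia-smi instead):

```python
import re

# Sample first line of `nvidia-smi` output; replace with the real output
# on an actual machine (e.g. via subprocess.run(["nvidia-smi"], ...)).
sample = "| NVIDIA-SMI 545.23.08    Driver Version: 545.23.08    CUDA Version: 12.3 |"

driver = re.search(r"Driver Version:\s*([\d.]+)", sample).group(1)
cuda = re.search(r"CUDA Version:\s*([\d.]+)", sample).group(1)
print(driver, cuda)  # 545.23.08 12.3
```

Note that the "CUDA Version" shown by nvidia-smi is the maximum CUDA version the driver supports, not necessarily the toolkit version you have installed.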
Installing the CUDA Toolkit
# Via runfile (recommended for precise version control)
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run
sudo sh cuda_12.2.0_535.54.03_linux.run --silent --toolkit --no-drm
# Add to ~/.bashrc
export PATH=/usr/local/cuda-12.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH
# Verify
nvcc --version
Installing cuDNN
# Download cuDNN from developer.nvidia.com (requires an account)
tar -xvf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
sudo ldconfig
# Via apt (alternative approach)
sudo apt install libcudnn8=8.9.7.29-1+cuda12.2 libcudnn8-dev=8.9.7.29-1+cuda12.2
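After copying the files, the installed cuDNN version can be confirmed from the CUDNN_MAJOR/MINOR/PATCHLEVEL macros in cudnn_version.h. A sketch (the header content is inlined as sample text; on a real system read /usr/local/cuda/include/cudnn_version.h):

```python
import re

# Sample lines as they appear in cudnn_version.h; on a real install,
# read them from /usr/local/cuda/include/cudnn_version.h.
header = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 7
"""

parts = {m: int(re.search(rf"#define CUDNN_{m} (\d+)", header).group(1))
         for m in ("MAJOR", "MINOR", "PATCHLEVEL")}
print(f"cuDNN {parts['MAJOR']}.{parts['MINOR']}.{parts['PATCHLEVEL']}")  # cuDNN 8.9.7
```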
Installing via Conda (the easiest way)
# Conda manages CUDA/cuDNN automatically
conda create -n ml python=3.11
conda activate ml
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
# Verify
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
Verifying the installation
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"cuDNN version: {torch.backends.cudnn.version()}")
print(f"GPU count: {torch.cuda.device_count()}")
print(f"GPU name: {torch.cuda.get_device_name(0)}")
# Matrix-multiplication performance test
import time
x = torch.randn(8192, 8192, device='cuda', dtype=torch.float16)
y = torch.randn(8192, 8192, device='cuda', dtype=torch.float16)
torch.cuda.synchronize()
t = time.time()
z = x @ y
torch.cuda.synchronize()
print(f"MatMul 8192x8192 (FP16): {time.time()-t:.3f}s")
# A100: ~0.05s, RTX 3090: ~0.12s
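The wall time from this benchmark can be converted into effective throughput: a square matmul of size n performs roughly 2·n³ floating-point operations. A sketch using the reference times quoted above:

```python
# Sketch: convert matmul wall time into effective TFLOPS.
# A square n x n matmul performs roughly 2 * n^3 floating-point operations.

def matmul_tflops(n: int, seconds: float) -> float:
    return 2 * n**3 / seconds / 1e12

print(f"{matmul_tflops(8192, 0.05):.1f} TFLOPS")  # A100 time from above
print(f"{matmul_tflops(8192, 0.12):.1f} TFLOPS")  # RTX 3090 time from above
```

If the computed figure is far below your GPU's spec-sheet FP16 throughput, suspect a configuration problem (see the next section) rather than the hardware.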
Common problems
"CUDA error: no kernel image is available" — PyTorch is compiled for a different CUDA version. Reinstall PyTorch for the desired CUDA version.
"libcuda.so not found" — LD_LIBRARY_PATH is not set. Add /usr/local/cuda/lib64 to LD_LIBRARY_PATH (e.g. in ~/.bashrc).
Low performance — check nvidia-smi -q | grep "Performance State". If the GPU sits in P8 mode instead of P0, run sudo nvidia-smi -pm 1 to enable Persistence Mode.
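For the second problem above, a quick check that the CUDA library directory is actually on LD_LIBRARY_PATH can be scripted. A minimal sketch (the helper name is illustrative; the path matches the install location used earlier in this guide):

```python
import os

# Sketch: check whether a CUDA library directory is on LD_LIBRARY_PATH.

def has_cuda_libs(ld_path: str, libdir: str = "/usr/local/cuda/lib64") -> bool:
    return libdir in ld_path.split(":")

print(has_cuda_libs("/usr/local/cuda/lib64:/usr/lib"))  # True
# Result depends on the current shell environment:
print(has_cuda_libs(os.environ.get("LD_LIBRARY_PATH", "")))
```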