Setting Up a GPU Server for AI Development: CUDA, PyTorch, TensorFlow
Setting up a GPU server for AI development takes 2-4 hours and saves days of debugging "it works in the cloud but not locally" problems. The key components are matching NVIDIA driver, CUDA, and cuDNN versions; isolated Python environments; and GPU monitoring tools.
Minimum stack
# 1. NVIDIA driver (Ubuntu 22.04)
sudo apt update
sudo apt install -y nvidia-driver-545
sudo reboot
# 2. CUDA 12.2
# The cuda-keyring package registers NVIDIA's repository and signing key (apt-key is deprecated)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-12-2
# 3. Add to ~/.bashrc
echo 'export PATH=/usr/local/cuda-12.2/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
# 4. Verify
nvidia-smi && nvcc --version
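The manual check above can also be scripted, which helps when provisioning several servers. The following is a minimal sketch, assuming only that nvidia-smi and nvcc are on PATH when installed; the helper names (run_tool, parse_nvcc_release, check_toolchain) are hypothetical and introduced here for illustration.

```python
import re
import shutil
import subprocess


def run_tool(command):
    """Return the tool's stdout, or None if the tool is missing or fails."""
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except OSError:
        return None
    return result.stdout if result.returncode == 0 else None


def parse_nvcc_release(output):
    """Extract the CUDA release (e.g. '12.2') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output or "")
    return match.group(1) if match else None


def check_toolchain():
    """Report the driver version and CUDA toolkit release (None when unavailable)."""
    driver = run_tool(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
    )
    lines = driver.strip().splitlines() if driver else []
    return {
        "driver": lines[0].strip() if lines else None,
        "cuda_toolkit": parse_nvcc_release(run_tool(["nvcc", "--version"])),
    }


if __name__ == "__main__":
    for name, version in check_toolchain().items():
        print(f"{name}: {version if version else 'NOT FOUND'}")
```

A "NOT FOUND" for cuda_toolkit right after installation usually means the PATH export from step 3 has not been loaded in the current shell.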
Conda environments for different frameworks
# PyTorch
conda create -n pytorch python=3.11 -y
conda activate pytorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# TensorFlow
conda create -n tensorflow python=3.11 -y
conda activate tensorflow
pip install "tensorflow[and-cuda]==2.15.0"
# Verify GPU availability (each check runs in its own environment)
conda run -n pytorch python -c "import torch; print('PyTorch GPU:', torch.cuda.get_device_name(0))"
conda run -n tensorflow python -c "import tensorflow as tf; print('TF GPUs:', tf.config.list_physical_devices('GPU'))"
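Listing the device only confirms that the framework can see the GPU. A stronger check is to run an actual computation on it. The sketch below, meant to be run inside the pytorch environment, does a small matrix multiplication on the GPU; the function name gpu_smoke_test and the matrix size are illustrative choices, not part of any library.

```python
def gpu_smoke_test(size=1024):
    """Run a small matmul on the GPU and return a one-line status string."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return "torch is installed, but CUDA is not available"
    a = torch.randn(size, size, device="cuda")
    b = torch.randn(size, size, device="cuda")
    c = a @ b
    # CUDA kernels launch asynchronously; wait so that errors surface here.
    torch.cuda.synchronize()
    return f"ok: {torch.cuda.get_device_name(0)}, result shape {tuple(c.shape)}"


if __name__ == "__main__":
    print(gpu_smoke_test())
```

If this fails while get_device_name succeeds, a likely cause is a mismatch between the driver and the CUDA runtime bundled with the wheel.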
GPU monitoring
# Install nvtop (htop for GPUs)
sudo apt install -y nvtop
nvtop # Interactive monitoring
# gpustat: compact output
pip install gpustat
gpustat --watch # Refreshes every second
watch -n 1 nvidia-smi # The classic option
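The tools above are interactive. For logging or alerting, nvidia-smi can emit machine-readable CSV through its --query-gpu option. The following is a minimal sketch of reading that output from Python; the field selection and the function names (parse_gpu_csv, query_gpus) are assumptions made for this example.

```python
import csv
import io
import shutil
import subprocess

# Fields requested from nvidia-smi; the parser relies on this order.
QUERY_FIELDS = [
    "index",
    "name",
    "utilization.gpu",
    "memory.used",
    "memory.total",
    "temperature.gpu",
]


def parse_gpu_csv(text):
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if record:
            rows.append(dict(zip(QUERY_FIELDS, (v.strip() for v in record))))
    return rows


def query_gpus():
    """Return current GPU stats, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--query-gpu={','.join(QUERY_FIELDS)}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for gpu in query_gpus():
        print(
            f"GPU {gpu['index']} ({gpu['name']}): {gpu['utilization.gpu']}% util, "
            f"{gpu['memory.used']}/{gpu['memory.total']} MiB, {gpu['temperature.gpu']} C"
        )
```

Calling query_gpus() on a timer and appending the rows to a file gives a simple utilization log without extra dependencies.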
Performance optimization
Persistence mode eliminates the delay on first access to the GPU:
sudo nvidia-smi -pm 1
# Add to /etc/rc.local to enable at boot
Disable auto boost for deterministic benchmark results:
sudo nvidia-smi --auto-boost-default=0
For maximum performance, pin the application clocks (memory,graphics in MHz; 1215,1410 are the optimal frequencies for the A100):
sudo nvidia-smi -ac 1215,1410
# List the clock pairs your GPU supports: nvidia-smi -q -d SUPPORTED_CLOCKS
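These settings do not survive a reboot unless reapplied, so it is worth confirming them before a benchmark run. Below is a minimal sketch that reads the current state of the first GPU through nvidia-smi's query interface; the function names are hypothetical, and the query field names (persistence_mode, clocks.applications.graphics, clocks.applications.memory) should be confirmed against nvidia-smi --help-query-gpu for your driver version.

```python
import shutil
import subprocess

SETTINGS_QUERY = "persistence_mode,clocks.applications.graphics,clocks.applications.memory"


def parse_settings(line):
    """Parse one CSV line such as 'Enabled, 1410, 1215' into a dict."""
    mode, graphics, memory = [part.strip() for part in line.split(",")]
    return {
        "persistence": mode == "Enabled",
        "graphics_mhz": graphics,  # kept as text: may be a "not supported" marker
        "memory_mhz": memory,
    }


def read_settings():
    """Return settings for the first GPU, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={SETTINGS_QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return parse_settings(lines[0])


if __name__ == "__main__":
    print(read_settings())
```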







