
Ubuntu 20.04 CUDA Development Environment Setup

Environment

Note: do not reboot the machine casually.

  1. Ubuntu 20.04
  2. NVIDIA GTX 1050 Mobile
  3. 16 GB RAM
  4. CUDA 11
  5. FFmpeg 4.3.1 https://github.com/FFmpeg/FFmpeg/commit/6b6b9e593dd4d3aaf75f48d40a13ef03bdef9fdb

CUDA Installation

  • https://developer.nvidia.com/cuda-downloads
  • In a graphical session, NVIDIA X Server Settings can switch between the discrete and integrated GPU; on the command line, use prime-select
    • Usage: /usr/bin/prime-select nvidia|intel|on-demand|query

deb (local)

Just follow the official instructions.
The installation steps for CUDA 11 are given below.

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda-repo-ubuntu2004-11-0-local_11.0.3-450.51.06-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-0-local_11.0.3-450.51.06-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-0-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

deb (Network Install)

Depending on your network, the Aliyun mirror may be faster: https://developer.aliyun.com/mirror/nvidia-cuda

  • Using the official source:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
  • Aliyun
wget https://developer.aliyun.com/mirror/nvidia-cuda/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.aliyun.com/mirror/nvidia-cuda/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.aliyun.com/mirror/nvidia-cuda/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
  • If sudo apt-get update fails, edit the source entry (vim /etc/apt/sources.list) so that it reads: deb https://mirrors.aliyun.com/nvidia-cuda/ubuntu2004/x86_64/ / (that is, remove focal from that entry).

cuDNN

cuDNN: https://developer.nvidia.com/rdp/cudnn-download

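  1. Download the version that matches your CUDA version and operating system
  2. As of 2020-08-17, NVIDIA does not yet provide a cuDNN build for CUDA 11 on Ubuntu 20.04; the Ubuntu 18.04 CUDA 11 cuDNN packages can be used instead
  3. Install: sudo dpkg -i *.deb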

NVCC

See: https://www.jetbrains.com/help/clion/cuda-projects.html#set-nvcc

  1. The default install path is /usr/local/cuda/
  2. Add it to PATH: run sudo vim /etc/environment and append /usr/local/cuda/bin to the PATH value
  3. source /etc/environment
  4. nvcc -V
❯ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
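
The PATH step above can also be done per user from a shell profile instead of /etc/environment. A minimal sketch, assuming the default install path /usr/local/cuda/bin; the helper name path_append and the CUDA_BIN variable are hypothetical and not part of CUDA:

```shell
# Assumed default, taken from the install path mentioned above.
CUDA_BIN="${CUDA_BIN:-/usr/local/cuda/bin}"

# Append a directory to PATH only if it is not already present.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on PATH, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

path_append "$CUDA_BIN"
export PATH
```

Placing this in ~/.profile or ~/.bashrc is one option; because the helper is idempotent, sourcing the file repeatedly does not grow PATH.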

  1. CLion test: File -> New Project -> CUDA Executable -> Create
  2. By default this creates a main.cu file that prints hello world
  3. Run it to print hello world

Testing

  1. Use nvidia-smi to verify the CUDA installation
❯ nvidia-smi
Mon Aug 17 22:16:59 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 On | 00000000:01:00.0 Off | N/A |
| N/A 45C P8 N/A / N/A | 719MiB / 4042MiB | 7% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1209 G /usr/lib/xorg/Xorg 224MiB |
| 0 N/A N/A 3264 G /usr/bin/kwin_x11 80MiB |
| 0 N/A N/A 3284 G /usr/bin/plasmashell 71MiB |
| 0 N/A N/A 4653 G ...AAAAAAAAA= --shared-files 218MiB |
| 0 N/A N/A 64873 G ...token=3860532832627472219 119MiB |
+-----------------------------------------------------------------------------+

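  1. Use the package manager to check which additional driver is currently in use; it should be the proprietary driver or a manually installed driver
  2. Reboot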
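
For scripted checks, the header line of the nvidia-smi output above can be parsed. Note that the CUDA Version printed by nvidia-smi is the highest version the driver supports, which may differ from the toolkit release that nvcc -V reports. A small sketch; smi_versions is a hypothetical helper name:

```shell
# Read nvidia-smi output on stdin and print "<driver version> <cuda version>".
smi_versions() {
    sed -n 's/.*Driver Version: *\([0-9.][0-9.]*\).*CUDA Version: *\([0-9.][0-9.]*\).*/\1 \2/p' |
        head -n 1
}
```

Running nvidia-smi | smi_versions on the machine above would print 450.51.06 11.0.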

nvidia-smi Overview

See: https://developer.download.nvidia.com/compute/DCGM/docs/nvidia-smi-367.38.pdf
See: nvidia-smi --help
See: man nvidia-smi
See: https://blog.csdn.net/C_chuxin/article/details/82993350

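Description

nvidia-smi is NVIDIA's System Management Interface (smi stands for System Management Interface). It collects information at various levels and shows GPU memory usage. It can also enable and disable GPU configuration options (such as ECC memory).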


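  • GPU: index of the GPU in this machine
  • Name: GPU model
  • Persistence-M: persistence mode state (On/Off)
  • Fan: fan speed
  • Temp: temperature in degrees Celsius
  • Perf: performance state, from P0 to P12; P0 is maximum performance and P12 is minimum performance
  • Pwr:Usage/Cap: current power draw and power limit
  • Bus-Id: PCI bus information for the GPU
  • Disp.A: Display Active, whether a display is initialized on the GPU
  • Memory-Usage: GPU memory usage
  • GPU-Util: GPU utilization, which fluctuates over time
  • Volatile Uncorr. ECC: uncorrected ECC error information
  • Compute M.: compute mode
  • Processes: GPU memory used by each process on each GPU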
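
The same fields can be read in a machine-friendly form with nvidia-smi's --query-gpu option instead of parsing the table. A minimal sketch, assuming the query fields listed in the comment; gpu_summary is a hypothetical helper, and the sample row in the usage note is assembled from the values in the table above:

```shell
# Summarize CSV rows produced by:
#   nvidia-smi --query-gpu=name,temperature.gpu,utilization.gpu,memory.used,memory.total \
#              --format=csv,noheader,nounits
# Reads rows on stdin, one GPU per line.
gpu_summary() {
    awk -F', *' '{
        # Guard against a zero or missing total before computing the percentage.
        pct = ($5 > 0) ? int($4 * 100 / $5) : 0
        printf "%s: %sC, util %s%%, mem %s/%s MiB (%d%%)\n", $1, $2, $3, $4, $5, pct
    }'
}
```

Feeding it the row GeForce GTX 1050, 45, 7, 719, 4042 prints GeForce GTX 1050: 45C, util 7%, mem 719/4042 MiB (17%).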

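Common Commands

Command | Description
nvidia-smi -L | List all available NVIDIA devices
nvidia-smi topo --matrix | Show the system topology
nvidia-smi -q -d CLOCK | Show each GPU's current, default, and maximum possible clock speeds
nvidia-smi -q -d SUPPORTED_CLOCKS | List the clock speeds available for each GPU
nvidia-smi vgpu | Show the current vGPU status
nvidia-smi vgpu -p | Continuously display GPU usage by applications in virtual desktops
nvidia-smi -q | Show information for all GPUs; the -i option selects a specific GPU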
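
As an example of building on these commands, the output of nvidia-smi -L can be reduced to index and name pairs, which is handy when choosing a value for -i. This sketch assumes each line has the form GPU <index>: <name> (UUID: <uuid>); gpu_list is a hypothetical helper name:

```shell
# Read `nvidia-smi -L` output on stdin and print "<index> <name>" per GPU.
# Lines that do not match the assumed format are skipped.
gpu_list() {
    sed -n 's/^GPU \([0-9][0-9]*\): \(.*\) (UUID: .*)$/\1 \2/p'
}
```

A typical call would be nvidia-smi -L | gpu_list, followed by nvidia-smi -q -i with the chosen index.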

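FFmpeg Installation

Notes

  1. ERROR: opencv not found using pkg-config

OpenCV 4 no longer generates pkg-config files by default, but an opencv.pc file can be produced by building OpenCV yourself (CMake option OPENCV_GENERATE_PKGCONFIG=ON).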
See: https://github.com/opencv/opencv/issues/13154

❯ pkg-config --list-all | grep opencv
opencv4 OpenCV - Open Source Computer Vision Library
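
The listing above shows the module registered as opencv4, so a check for the name opencv would fail. A small sketch for finding which name is available on a given system; opencv_pc_name is a hypothetical helper that reads the output of pkg-config --list-all:

```shell
# Read `pkg-config --list-all` output on stdin and print the OpenCV module
# name that is present (opencv4 preferred over opencv). Returns 1 if neither
# is listed.
opencv_pc_name() {
    awk '$1 == "opencv4" { found4 = 1 }
         $1 == "opencv"  { found  = 1 }
         END {
             if (found4)     print "opencv4"
             else if (found) print "opencv"
             else            exit 1
         }'
}
```

For example, pkg-config --list-all | opencv_pc_name prints the name to pass to pkg-config --cflags --libs.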
  2. “ERROR: failed checking for nvcc”

This error happens because CUDA SDK 11.0 no longer accepts compute capability 3.0, while FFmpeg's configure script defaults to compute_30. The ticket below lists 5.2 - 8.0 (Maxwell, Pascal, Volta, Turing, Ampere) as the accepted range.
See: https://trac.ffmpeg.org/ticket/8790

Fix: cd /path/to/ffmpeg && vim configure

# Replace lines 4313 to 4320 with the following
#if enabled cuda_nvcc; then
# nvcc_default="nvcc"
# nvccflags_default="-gencode arch=compute_30,code=sm_30 -O2"
#else
# nvcc_default="clang"
# nvccflags_default="--cuda-gpu-arch=sm_30 -O2"
# NVCC_C=""
#fi
if enabled cuda_nvcc; then
nvcc_default="nvcc"
nvccflags_default="-gencode arch=compute_52,code=sm_52 -O2"
else
nvcc_default="clang"
nvccflags_default="--cuda-gpu-arch=sm_52 -O2"
NVCC_C=""
fi
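
The manual edit above can also be scripted. A minimal sketch that swaps the compute_30/sm_30 defaults for another architecture with sed; patch_nvcc_arch is a hypothetical helper, and it assumes the configure script still contains the exact strings shown in the commented-out lines:

```shell
# Rewrite the default GPU architecture in FFmpeg's configure script.
# Usage: patch_nvcc_arch <path to configure> [arch number, default 52]
patch_nvcc_arch() {
    file=$1
    arch=${2:-52}
    tmp=$(mktemp) || return 1
    # Write to a temp file, then copy the content back so the original
    # file keeps its executable permission bits.
    sed -e "s/arch=compute_30,code=sm_30/arch=compute_${arch},code=sm_${arch}/" \
        -e "s/--cuda-gpu-arch=sm_30/--cuda-gpu-arch=sm_${arch}/" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Calling patch_nvcc_arch /path/to/ffmpeg/configure makes the same change as the manual edit. Passing 61 as the second argument would target the GTX 1050's own compute capability (6.1), though that variant is not covered by this post.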

Prerequisites

  1. CUDA is installed correctly
  2. nvcc has been added to PATH

Install Dependencies

OpenCV is installed via apt here.

sudo apt-get update && sudo apt-get upgrade -y
sudo apt install libavcodec-dev \
  libavformat-dev libavutil-dev \
  libavfilter-dev libavdevice-dev \
  libswresample-dev libswscale-dev \
  libchromaprint-dev \
  libchromaprint-tools \
  libchromaprint1 \
  pkg-config libopencv-dev \
  libxml2-dev \
  libopenal-dev \
  libomxil-bellagio-dev \
  yasm libc6 libc6-dev \
  wget curl unzip cmake libtool

Build

# Install the Video Codec SDK headers
git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git
cd nv-codec-headers && sudo make install && cd -
# ffmpeg
cd /path/to/ffmpeg
./configure --enable-cuda-nvcc --enable-cuvid --enable-nvenc --enable-nonfree --enable-libnpp \
  --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 \
  --extra-cflags=-DOPENCV_GENERATE_PKGCONFIG=ON \
  --enable-libx264 --enable-libx265 --enable-gpl --enable-libmp3lame --enable-libxml2 \
  --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal \
  --enable-opengl --enable-sdl2 --enable-frei0r --enable-shared
# The first build may hit a few problems; retrying a few times usually gets past them
core=$(grep -c '^processor' /proc/cpuinfo)
sudo make -j "$((core * 2))" && sudo make install

Verification

ffmpeg -version
ffprobe -version
ffplay -version
ffmpeg -y -vsync 0 -hwaccel cuvid \
-c:v h264_cuvid -i \
input.mp4 \
-c:a copy -c:v h264_nvenc \
-b:v 5M -f mp4 \
output.mp4
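
Before running the transcode, it can help to confirm that the build actually includes the NVENC encoder. A small sketch that scans the output of ffmpeg -hide_banner -encoders; has_encoder is a hypothetical helper, and it assumes the encoder name is the second column of each row:

```shell
# Read `ffmpeg -hide_banner -encoders` output on stdin and succeed (exit 0)
# if the named encoder is listed. Defaults to h264_nvenc.
has_encoder() {
    awk -v name="${1:-h264_nvenc}" '$2 == name { found = 1 } END { exit !found }'
}
```

For example: ffmpeg -hide_banner -encoders | has_encoder h264_nvenc && echo "nvenc available".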

NVIDIA Docker

See: https://github.com/NVIDIA/nvidia-docker

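Prerequisites

  1. CUDA is installed correctly
  2. Docker 19.03+ is installed
  • Recommended Docker configuration
    • sudo vim /lib/systemd/system/docker.service
    • Remove -H and its argument; for example, change ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock to ExecStart=/usr/bin/dockerd --containerd=/run/containerd/containerd.sock (a -H flag on the command line conflicts with the hosts key in the JSON configuration below)
    • sudo systemctl daemon-reload && sudo systemctl restart docker.service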
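
The ExecStart edit above can be previewed with sed before touching the unit file. A minimal sketch; strip_dockerd_hosts is a hypothetical helper, and it assumes each -H flag is followed by a single space-separated address as in the example:

```shell
# Read a systemd unit on stdin and print it with every "-H <addr>" pair
# removed from the ExecStart line. Other lines pass through unchanged.
strip_dockerd_hosts() {
    sed -e '/^ExecStart=/s/ -H  *[^ ][^ ]*//g'
}
```

Running strip_dockerd_hosts < /lib/systemd/system/docker.service prints the result for review. Since a packaged unit file may be overwritten on upgrade, a drop-in created with sudo systemctl edit docker.service is an alternative worth considering.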

Recommended configuration for public-facing servers
See: https://gist.github.com/kekru/974e40bb1cd4b947a53cca5ba4b0bbe5
See: https://github.com/kekru/linux-utils/blob/master/cert-generate/create-certs.sh

{
"hosts": [
"unix:///var/run/docker.sock",
"tcp://0.0.0.0:2376"
],
"tls": true,
"tlscacert": "/etc/docker/ssl/ca.pem",
"tlscert": "/etc/docker/ssl/server-cert.pem",
"tlskey": "/etc/docker/ssl/server-key.pem",
"tlsverify": true,
"iptables": false
}

Recommended local configuration

{
"hosts": [
"unix:///var/run/docker.sock",
"tcp://127.0.0.1:2375"
],
"registry-mirrors": [
"https://dockerhub.azk8s.cn"
],
"iptables": false
}

Mirroring nvidia-docker

See: https://nvidia.github.io/nvidia-docker/
See: https://github.com/NVIDIA/nvidia-docker/issues/635#issuecomment-365160098
See: https://github.com/NVIDIA/nvidia-docker/issues/706#issuecomment-382578153

After cloning, the repositories can be served with a web server such as nginx.
See: basic web server configuration file

server {
    listen 80;
    server_name _;
    root /var/www/html/;        # path of the directory to serve
    autoindex on;               # enable directory listing
    autoindex_format html;      # render the listing as HTML in the browser
    autoindex_exact_size off;   # when off, show human-readable sizes in KB, MB, or GB
    autoindex_localtime on;     # show file times in the server's local time
    charset utf-8,gbk;          # display Chinese file names
}
mkdir -p $LOCALDIR && cd $LOCALDIR
git clone -b gh-pages https://github.com/NVIDIA/libnvidia-container.git
git clone -b gh-pages https://github.com/NVIDIA/nvidia-container-runtime.git
git clone -b gh-pages https://github.com/NVIDIA/nvidia-docker.git

Installing NVIDIA Docker

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L $LOCALDIR/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L $LOCALDIR/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
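
The distribution variable in the first line above is built by sourcing /etc/os-release and joining ID with VERSION_ID. A small sketch that isolates that step so it can be checked on its own; distribution_id is a hypothetical helper name:

```shell
# Print "$ID$VERSION_ID" from an os-release style file.
# Defaults to /etc/os-release. The file is sourced in a subshell so the
# caller's variables are left untouched.
distribution_id() {
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}
```

On the Ubuntu 20.04 system described in this post, distribution_id should print ubuntu20.04, which is the directory name looked up under the nvidia-docker repository.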

Testing

  1. docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
❯ docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
Tue Aug 18 02:30:00 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 On | 00000000:01:00.0 Off | N/A |
| N/A 52C P0 N/A / N/A | 588MiB / 4042MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+

Usage

# Test nvidia-smi with the latest official CUDA image
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi

# Start a GPU enabled container on two GPUs
docker run --gpus 2 nvidia/cuda:11.0-base nvidia-smi

# Starting a GPU enabled container on specific GPUs
docker run --gpus '"device=1,2"' nvidia/cuda:11.0-base nvidia-smi
docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:11.0-base nvidia-smi

# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
docker run --gpus all,capabilities=utility nvidia/cuda:11.0-base nvidia-smi
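
The nested quoting in the device examples above is easy to get wrong, because the double quotes must reach docker as part of the value. A minimal sketch that builds the value from a list of GPU indices or UUIDs; gpus_device_arg is a hypothetical helper name:

```shell
# Join GPU indices or UUIDs into the value expected by `docker run --gpus`.
# The surrounding double quotes are part of the printed value.
gpus_device_arg() {
    # Joining "$*" with IFS set to a comma happens inside the command
    # substitution, so the caller's IFS is not changed.
    list=$(IFS=,; printf '%s' "$*")
    printf '"device=%s"\n' "$list"
}
```

For example: docker run --gpus "$(gpus_device_arg 1 2)" nvidia/cuda:11.0-base nvidia-smi passes the same value as the '"device=1,2"' form above.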

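Conclusion

  • At this point the CUDA development environment is basically ready
  • Please check the last-updated date; this post may fall out of date over time
  • Enjoy yourself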

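Title: Ubuntu 20.04 CUDA Development Environment Setup

Author: IITII

Published: August 17, 2020 - 21:08

Last updated: August 18, 2020 - 10:08

Original link: https://iitii.github.io/2020/08/17/1/

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author when reposting.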