I'm new to Docker and am using it to train deep learning models.
I started DIGITS with:
docker run --name digits -d -p 8080:5000 -v /path/to/this/repository:/data/repo kaixhin/digits
but DIGITS runs on the CPU and training is far too slow. So I found nvidia-docker, and that's where this strange journey began.
nvidia-docker has a strange problem. After installing it, I verified the installation with docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi, and the output was:
===============================================================
bc@bc-pc:~$ nvidia-smi
Tue Apr 30 10:21:30 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 Off | 00000000:01:00.0 On | N/A |
| 0% 43C P0 36W / 166W | 148MiB / 8116MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1473 G /usr/lib/xorg/Xorg 146MiB |
+-----------------------------------------------------------------------------+
Tue Apr 30 02:21:43 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 Off | 00000000:01:00.0 On | N/A |
| 0% 44C P0 35W / 166W | 148MiB / 8116MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
================================================================
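But when I train a model with nvidia-docker:
nvidia-docker run --name digits -d -p 8080:5000 -v /path/to/this/repository:/data/repo kaixhin/digits
the result still shows that training ran on the CPU. Am I using nvidia-docker incorrectly? I simply prefixed my original command with nvidia-.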
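For comparison, here is a simplified sketch of what the nvidia-docker 2.x wrapper does to a command. It assumes the wrapper inserts --runtime=nvidia right after the run or create subcommand and ignores the wrapper's NV_GPU handling. The helper name nvidia_docker_equivalent is hypothetical. It only prints the rewritten command and never calls Docker.

```shell
#!/usr/bin/env bash
# Simplified sketch (assumption): mirrors how the nvidia-docker 2.x wrapper
# rewrites a command by adding --runtime=nvidia right after `run` or `create`.
# NV_GPU handling is omitted. It only prints the result; Docker is not invoked.
# nvidia_docker_equivalent is a hypothetical helper name.
nvidia_docker_equivalent() {
  local args=()
  while [ $# -gt 0 ]; do
    local arg="$1"
    shift
    args+=("$arg")
    case "$arg" in
      run|create)
        # Inject the runtime flag directly after the subcommand, then stop scanning.
        args+=("--runtime=nvidia")
        break
        ;;
    esac
  done
  # Remaining arguments (container options, image, command) pass through unchanged.
  echo docker "${args[@]}" "$@"
}

nvidia_docker_equivalent run --name digits -d -p 8080:5000 \
  -v /path/to/this/repository:/data/repo kaixhin/digits
```

Run on the DIGITS command, it prints docker run --runtime=nvidia --name digits -d -p 8080:5000 -v /path/to/this/repository:/data/repo kaixhin/digits. This uses the same --runtime=nvidia flag as the nvidia/cuda verification step, so adding the nvidia- prefix should be equivalent to adding that flag by hand.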
Also, when I try to start the nvidia-docker service with systemctl start nvidia-docker, I get the error shown in Figure 2 below. That path doesn't exist at all.
=============================================================
bc@bc-pc:~$ sudo systemctl start nvidia-docker
Failed to start nvidia-docker.service: Unit nvidia-docker.service not found.
bc@bc-pc:~$ sudo systemctl status nvidia-docker
● nvidia-docker.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)
=============================================================
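For reference, as far as I know nvidia-docker 2.x does not install a systemd unit the way 1.x did; the version output reports 2.0.3. Instead, it registers a Docker runtime named nvidia in the daemon config, which by default is /etc/docker/daemon.json on Linux. Below is a rough sketch for checking that entry. It assumes the config is at that default path. The helper name has_nvidia_runtime is hypothetical, and the check only greps for an "nvidia" key rather than parsing JSON.

```shell
#!/usr/bin/env bash
# Heuristic sketch (assumption): nvidia-docker 2.x registers an "nvidia"
# runtime in Docker's daemon config instead of shipping its own systemd service.
# This only greps for an "nvidia" JSON key; it is not a real JSON parser.
# has_nvidia_runtime is a hypothetical helper name.
has_nvidia_runtime() {
  local cfg="${1:-/etc/docker/daemon.json}"
  # Missing or unreadable config: report "not found".
  [ -r "$cfg" ] || return 1
  grep -Eq '"nvidia"[[:space:]]*:' "$cfg"
}

if has_nvidia_runtime; then
  echo "nvidia runtime entry found in daemon config"
else
  echo "no nvidia runtime entry found"
fi
```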
Version information for docker and nvidia-docker:
==============================================================
bc@bc-pc:~$ sudo docker version
[sudo] password for bc:
Client:
Version: 18.09.5
API version: 1.39
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:44:24 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.5
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:10:53 2019
OS/Arch: linux/amd64
Experimental: false
bc@bc-pc:~$ sudo nvidia-docker version
NVIDIA Docker: 2.0.3
Client:
Version: 18.09.5
API version: 1.39
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:44:24 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.5
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:10:53 2019
OS/Arch: linux/amd64
Experimental: false