I recently started learning machine learning, so I needed to set up a machine for it. This post covers the system environment configuration: based on the experience of others, it documents my own installation process, which I hope can serve as a useful reference.

Hardware: i7-6700 + 24 GB RAM + GTX 1080

Software environment:

  • Ubuntu 16.04 LTS, 64-bit
  • CUDA 8.0
  • cuDNN v5.1
  • TensorFlow v0.12.0 RC1
  • Python 2.7
  • Bazel 0.4.2

Many of the required files can be downloaded in advance, which saves time during the actual setup:

  • CUDA 8.0 (1.4 GB): Linux > x86_64 > Ubuntu > 16.04 > runfile (local)
  • cuDNN v5.1 (100 MB): requires an Nvidia developer account. Choose Download cuDNN v5.1 (August 10, 2016), for CUDA 8.0 > cuDNN v5.1 Library for Linux. Download it on Linux if possible; the file is a .tgz archive. When downloaded on Windows it may show up with a .solitairetheme8 extension.
  • TensorFlow source release (10 MB+): download v0.12.0 RC1, either the zip or the tar.gz.
  • TensorFlow pip packages (CPU 40 MB+, GPU 80 MB+): choose the Linux, Python 2 builds and download both the CPU and the GPU wheel. Note that pip itself only fetches the latest version.
  • Bazel installer (100 MB+): download version 0.4.2, choosing bazel-0.4.2-installer-linux-x86_64.sh.

Part 1: Installing the Nvidia Graphics Driver (GTX 1080)

1. After installing Ubuntu 16.04, the first boot comes up at a very low resolution, so start by fixing that.

Open a terminal and run: sudo gedit /etc/default/grub

Find the line #GRUB_GFXMODE=640x480, remove the leading #, and change 640x480 to 1024x768.
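For reference, the edited line in /etc/default/grub should end up looking like this (1024x768 is just the mode I chose; any resolution your monitor supports will do):

GRUB_GFXMODE=1024x768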

Then run: sudo update-grub

After a reboot, the display looks a lot more comfortable.

2. Update the package sources; I use the USTC (University of Science and Technology of China) mirror here.

cd /etc/apt/

sudo cp sources.list sources.list.bak

sudo gedit sources.list

Add the following lines to the top of sources.list:

deb http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse

Then refresh the package index and upgrade the installed packages:

sudo apt-get update

sudo apt-get upgrade

3. Install the Nvidia driver

First add the community-maintained Graphics Drivers PPA:

sudo add-apt-repository ppa:graphics-drivers/ppa

After some output appears, press Enter to continue, then run:

sudo apt-get update

At this point there are two ways to install the driver.

1) The first is to open System Settings > Software & Updates > Additional Drivers, select the driver you want, click Apply, and reboot when the system prompts you.

2) The second is to check which Nvidia drivers are available in the repositories:

sudo apt-cache search nvidia

I chose the newest driver at the time, version 375:

sudo apt-get install nvidia-375

The driver takes effect after a reboot.

You can run nvidia-smi to check the basic status, or nvidia-settings for more detailed information.
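A couple of quick sanity checks after the reboot (just a small verification sketch; the exact 375.xx version string depends on what the PPA currently ships):

nvidia-smi                        # should list the GeForce GTX 1080 and a 375.xx driver version
cat /proc/driver/nvidia/version   # the loaded kernel module version should match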

Part 2: Installing CUDA 8.0

1. First download CUDA 8.0 from the Nvidia developer site.

Once it has downloaded, run: sudo sh cuda_8.0.44_linux.run to install it. If the installer complains about insufficient space, point it at a different temporary directory: sudo sh cuda_8.0.44_linux.run --tmpdir=/opt/temp/

The installer walks through a series of prompts. When it asks whether to install the Nvidia graphics driver, answer n, since we already installed it:

Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?
(y)es/(n)o/(q)uit: n

Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
[ default is /home/gai ]:

Installing the CUDA Toolkit in /usr/local/cuda-8.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so

Installing the CUDA Samples in /home/gai ...
Copying samples to /home/gai/NVIDIA_CUDA-8.0_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-8.0
Samples: Installed in /home/gai, but missing recommended libraries

Please make sure that
- PATH includes /usr/local/cuda-8.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_6100.log

Notice that the installer reports several recommended libraries as missing: libGLU.so, libX11.so, libXi.so, libXmu.so.

Typing each of these four libraries into "Search the contents of packages" on Ubuntu Packages Search shows that the following packages need to be installed: libglu1-mesa-dev, libx11-dev, libxi-dev, libxmu-dev.

So run: sudo apt-get install libglu1-mesa-dev libx11-dev libxi-dev libxmu-dev

The official installation guide also says that a couple of environment variables must be set. In your home directory run: sudo gedit .bashrc, then append the following two lines to the end of the file:

export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then run source ~/.bashrc to reload it.
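A quick way to confirm the new variables took effect (just a sanity check):

nvcc --version          # should report: Cuda compilation tools, release 8.0
echo $LD_LIBRARY_PATH   # should contain /usr/local/cuda-8.0/lib64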

2. Test CUDA

Let's build and run a couple of the official CUDA samples.

In the samples' 1_Utilities/deviceQuery directory, run: make

"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o deviceQuery.o -c deviceQuery.cpp
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o deviceQuery deviceQuery.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release

Once it compiles, run ./deviceQuery; you should see output like the following:

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 8110 MBytes (8504279040 bytes)
(20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
GPU Max Clock rate: 1810 MHz (1.81 GHz)
Memory Clock rate: 5005 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1080
Result = PASS

Try another sample: in the 5_Simulations/nbody directory run make. If the build fails with "cannot find -lglut", first run: sudo apt-get install freeglut3-dev

After it compiles, run: ./nbody -benchmark -numbodies=256000 -device=0

Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "GeForce GTX 1080
> Compute 6.1 CUDA device: [GeForce GTX 1080]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 2395.682 ms
= 273.559 billion interactions per second
= 5471.177 single-precision GFLOP/s at 20 flops per interaction

That completes the CUDA 8.0 installation.

Part 3: Installing cuDNN v5.1

First download cuDNN from the Nvidia developer site; you have to register an account before you can download it.

Choose Download cuDNN v5.1 (August 10, 2016), for CUDA 8.0, then click cuDNN v5.1 Library for Linux; this downloads a .tgz file.

In the directory containing the file, extract it, e.g.: tar -zxvf cudnn-8.0-linux-x64-v5.1.tgz

Nvidia doesn't ship installation instructions with the archive, but they are easy to find with a quick search. Run the following commands:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

That completes the cuDNN v5.1 installation.
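To double-check which cuDNN version CUDA now sees, you can inspect the copied header (a small verification sketch):

grep -A 2 "#define CUDNN_MAJOR" /usr/local/cuda/include/cudnn.h   # expect CUDNN_MAJOR 5 and CUDNN_MINOR 1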

Part 4: Installing TensorFlow

TensorFlow has its own official installation documentation covering several installation methods; I have used two of them: installing with pip and building from source. In short, the pip route is convenient and takes only a few commands, while building from source is more involved and it is easy to run into problems, so pip is normally the way to go.

There was a small detour along the way. The first time I tried pip, but the installed TensorFlow would not recognize the GPU; after a lot of fiddling and searching that turned up nothing wrong with my procedure, I gave up on pip, built from source, and that worked. A few days later I reinstalled the OS and had to install TensorFlow again; this time I gave pip another try and, to my surprise, it worked. I still don't know why, so the second time I never bothered with the source build.

My environment uses Python 2.7, and the TensorFlow version installed is v0.12.0 RC1.

1. Installing with pip

First install pip:

sudo apt-get install python-pip python-dev

If you are using Python 3, use pip3 instead.

Install TensorFlow:

sudo pip install tensorflow

Then install the GPU version of TensorFlow:

sudo pip install tensorflow-gpu

OK, done! Pretty simple, right? A few notes follow.

If your network connection is slow, you can download the TensorFlow wheels from GitHub ahead of time and install them locally; pip will still fetch a few dependencies over the network, but they are all fairly small.
In the directory containing the wheels, run:

sudo pip install tensorflow-0.12.0rc1-cp27-none-linux_x86_64.whl

sudo pip install tensorflow_gpu-0.12.0rc1-cp27-none-linux_x86_64.whl

The part after install is simply the filename of the wheel you downloaded.
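Either way, a one-liner confirms that the package imports and reports the expected version (a minimal check; the exact version string may differ slightly):

python -c "import tensorflow as tf; print(tf.__version__)"   # should print something like 0.12.0-rc1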

2. Building and installing from source

First install the related dependencies:

sudo apt-get install python-pip

sudo apt-get install python-numpy swig python-dev python-wheel

Building the TensorFlow source locally requires Bazel, a build tool open-sourced by Google, which also has its own official installation documentation.

First download Bazel, preferably from GitHub, choosing the installer-linux-x86_64.sh package for the version you want. I originally downloaded 0.3.0 and hit a pile of problems while configuring TensorFlow; switching to 0.4.2 solved them, so I recommend grabbing the latest version.

In the directory containing the Bazel installer, run:
sudo chmod +x bazel-0.4.2-installer-linux-x86_64.sh

sudo ./bazel-0.4.2-installer-linux-x86_64.sh --user

Note that Bazel needs a Java environment; if you don't have one, install it with apt-get:

sudo apt-get update

sudo apt-get install default-jre

sudo apt-get install default-jdk

Once Java is installed, run sudo ./bazel-0.4.2-installer-linux-x86_64.sh --user again.

Then append the following to ~/.bashrc:

source /home/gai/.bazel/bin/bazel-complete.bash
export PATH=$PATH:/home/gai/.bazel/bin

Note: replace gai with your own username.

As for why these lines are needed, the Bazel installation documentation says the following:

Bazel comes with a bash completion script. To install it:

  • Build it with Bazel: bazel build //scripts:bazel-complete.bash.
  • Copy the script bazel-bin/scripts/bazel-complete.bash to your completion folder (/etc/bash_completion.d directory under Ubuntu). If you don’t have a completion folder, you can copy it wherever suits you and simply insert source /path/to/bazel-complete.bash in your ~/.bashrc file (under OS X, put it in your ~/.bash_profile file).

When I built from source I simply appended those two lines to ~/.bashrc and skipped the steps described in the Bazel documentation; I'll try the documented procedure the next time I build from source.

After appending to ~/.bashrc, run source ~/.bashrc to reload it.

Bazel is now installed; the next step is to build TensorFlow.
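A quick check that Bazel is on the PATH and is the expected release (verification sketch):

bazel version   # the output should include: Build label: 0.4.2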

First get the TensorFlow source. You can clone the latest code from GitHub with git clone https://github.com/tensorflow/tensorflow , or download a release from GitHub yourself; I recommend the latter.

Once downloaded, enter the TensorFlow top-level directory and run:

./configure

This starts the TensorFlow configuration, which asks a series of questions to confirm. As I recall, it asks whether to enable support for Google Cloud, Hadoop, OpenCL, CUDA, and so on; CUDA is the one we need, so I answered yes for CUDA and no for everything else.

If configuration succeeds you will see output like the following; it may not match exactly, but it should at least end with "Configuration finished":

INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.....
____Loading package: tensorflow/contrib/util
____Loading package: tensorflow/tools/test
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 97,938 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 451,148 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 802,540 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 1,317,340 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,055,608 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,247,228 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,328,350 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,457,050 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 2,585,750 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,518,110 bytes
____Downloading http://bazel-mirror.storage.googleapis.com/github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 3,711,160 bytes
INFO: All external dependencies fetched successfully.
Configuration finished

Next, build TensorFlow with Bazel. In the TensorFlow source directory, run:

bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer

The build takes a while; on the i7-6700 with 24 GB of RAM it took roughly 20 minutes.

When the build finishes, it prints something like:

Target //tensorflow/cc:tutorials_example_trainer up-to-date:
bazel-bin/tensorflow/cc/tutorials_example_trainer
INFO: Elapsed time: 1196.829s, Critical Path: 986.68s

Run the example that TensorFlow ships to see whether it can use the GPU:

bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu

If you see "successfully opened CUDA library", "Creating TensorFlow device (/gpu:0)", the GPU information, and the computation output below it, the GPU is being used successfully.

Next, build the TensorFlow source into a pip package for use from Python.

To build the CPU-only version:

bazel build -c opt //tensorflow/tools/pip_package:build_pip_package

To build the GPU version, skip the previous command and run this instead:

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

Then run:

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

sudo pip install /tmp/tensorflow_pkg/tensorflow-0.12.0rc1-cp27-cp27mu-linux_x86_64.whl

Installation complete!
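You can confirm what pip actually installed (a small check):

pip show tensorflow   # the Version field should read 0.12.0rc1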

3. Testing TensorFlow

Now let's test whether TensorFlow installed correctly and whether it can use the GPU.

First set up the environment variables. In your home directory, run:

sudo gedit .bash_profile

Then add the following two lines to it:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda

Then run source ~/.bash_profile to reload it.

Run the following in a Python session:

$ python
...
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8095
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 6.47GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
>>> print(sess.run(hello))
Hello, TensorFlow!
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> print(sess.run(a + b))
42
>>>

First, import tensorflow raises no errors and the session prints "Hello, TensorFlow!" and 42, which shows that TensorFlow itself works.

Second, seeing "successfully opened CUDA library", "Creating TensorFlow device (/gpu:0)", and the GPU information shows that the GPU can be used.

For more on how to use the GPU and how to check whether it is actually being used, see this article.
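One simple way to see which device each op actually lands on is to enable device placement logging when creating the session (a minimal sketch using tf.ConfigProto's log_device_placement option; the constant is just a throwaway example):

python -c "
import tensorflow as tf
# log_device_placement=True makes TensorFlow print the device assigned to each op
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(tf.constant('placement check')))
"

If the ops are placed on /gpu:0, the placement log will say so.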

You can also test one of TensorFlow's neural net models.

Run: python /usr/local/lib/python2.7/dist-packages/tensorflow/models/image/mnist/convolutional.py

The first run downloads some data before starting. On my machine it took about 40 seconds.
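While it runs, you can watch the GPU from a second terminal to confirm the model is really training on it (just a monitoring tip):

watch -n 1 nvidia-smi   # GPU utilization and a python process should show up while the model trains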

4. Notes

  • If you install TensorFlow with pip and want GPU support, both tensorflow and tensorflow-gpu must be installed, and the order matters. If you uninstall tensorflow and keep only tensorflow-gpu, TensorFlow stops working; and if you then install tensorflow on top of an existing tensorflow-gpu, TensorFlow cannot see the GPU. So if you need to uninstall, remove both tensorflow and tensorflow-gpu, then reinstall tensorflow first and tensorflow-gpu second.
  • If you build TensorFlow from source and want GPU support, simply build the GPU version of the pip package and install it.
  • When building from source, if OpenCL is not installed on the system but you answer y to "Do you wish to build TensorFlow with OpenCL support? [y/N]" during configuration, you will get the error "Invalid SYCL 1.2 library path. /usr/local/computecpp/lib/libComputeCpp.so cannot be found".
  • With Bazel 0.3.0, configuring TensorFlow failed with the errors below. The fix was to switch to the latest Bazel at the time, 0.4.2 (while still on 0.3.0 I had also run git pull --recurse-submodules in the tensorflow directory before switching to 0.4.2, so I am not sure whether that made a difference). These issues were helpful at the time: https://github.com/tensorflow/tensorflow/issues/4365, https://github.com/tensorflow/tensorflow/issues/5357, https://github.com/tensorflow/tensorflow/issues/4319
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:17:3: //external:eigen_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:17:3: //external:eigen_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:28:3: //external:libxsmm_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:28:3: //external:libxsmm_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:44:3: //external:com_googlesource_code_re2: no such attribute 'urls' in 'http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:44:3: //external:com_googlesource_code_re2: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:54:3: //external:gemmlowp: no such attribute 'urls' in 'http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:54:3: //external:gemmlowp: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:64:3: //external:farmhash_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:64:3: //external:farmhash_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:80:3: //external:highwayhash: no such attribute 'urls' in 'http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:80:3: //external:highwayhash: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:90:3: //external:nasm: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:90:3: //external:nasm: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:101:3: //external:jpeg: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:101:3: //external:jpeg: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:112:3: //external:png_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:112:3: //external:png_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:123:3: //external:gif_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:123:3: //external:gif_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:135:3: //external:six_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:135:3: //external:six_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:151:3: //external:protobuf: no such attribute 'urls' in 'http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:151:3: //external:protobuf: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:161:3: //external:gmock_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:161:3: //external:gmock_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:187:3: //external:pcre: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:187:3: //external:pcre: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:198:3: //external:swig: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:198:3: //external:swig: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:222:3: //external:grpc: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:222:3: //external:grpc: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:245:3: //external:linenoise: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:245:3: //external:linenoise: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:258:3: //external:llvm: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:258:3: //external:llvm: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:269:3: //external:jsoncpp_git: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:269:3: //external:jsoncpp_git: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:285:3: //external:boringssl: no such attribute 'urls' in 'http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:285:3: //external:boringssl: missing value for mandatory attribute 'url' in 'http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:295:3: //external:nanopb_git: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:295:3: //external:nanopb_git: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:311:3: //external:zlib_archive: no such attribute 'urls' in 'new_http_archive' rule.
ERROR: /home/gai/tensorflow/tensorflow/workspace.bzl:311:3: //external:zlib_archive: missing value for mandatory attribute 'url' in 'new_http_archive' rule.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': error loading package 'external': Could not load //external package.
ERROR: missing fetch expression. Type 'bazel help fetch' for syntax and help.
  • When running TensorFlow on the GPU, the following messages may appear; they don't seem to have any effect on the actual run:
E tensorflow/core/framework/op_kernel.cc:925] OpKernel ('op: "NegTrain" device_type: "CPU"') for unknown op: NegTrain
E tensorflow/core/framework/op_kernel.cc:925] OpKernel ('op: "Skipgram" device_type: "CPU"') for unknown op: Skipgram

References:

深度学习主机环境配置: Ubuntu16.04+Nvidia GTX 1080+CUDA8.0

深度学习主机环境配置: Ubuntu16.04+GeForce GTX 1080+TensorFlow

ubuntu14.04 安装 tensorflow

Install GPU TensorFlow From Sources w/ Ubuntu 16.04 and Cuda 8.0

安裝 tensorflow 教學 GPU:Nvidia1070(from source))

Extra: a bit of background

Why is Bazel used to build TensorFlow, and what is the relationship between the two?

Bazel is a build tool (a build tool decides the order in which files are compiled based on the dependencies between them), similar to make on Linux. Bazel came about to solve Google's own problem: all of Google's source code lives in a single repository, and since Google is a multinational company, engineers all over the world need to download the code and build it, so build performance was the most critical requirement.

Compared with other build tools, Bazel puts more emphasis on structure and speed.

Bazel has a great deal of freedom in the order it compiles files. The build process can be represented as a bipartite graph: the higher the parallelism within a single build (that is, the wider the graph), the shorter the execution time, and with enough machines the build time is determined mainly by the height of the graph. This is what makes distributed, concurrent builds across many machines possible.

The same graph also shows why Bazel supports incremental builds: when only a small portion of the source changes, only the corresponding parts need to be rebuilt.

Build actions are functional in nature: identical inputs produce identical outputs. Bazel relies on this property to cache and reuse build results.
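You can see the caching in action with the trainer target built earlier: repeating the build without touching any source should finish almost immediately, because every action is reused from the cache (a small sketch; exact timings will vary):

# first build compiles everything (about 20 minutes on this machine)
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
# rebuilding with no source changes reuses the cached results and finishes in seconds
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer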

Bazel and TensorFlow both come from Google, and everything inside Google is built with Bazel, so it is only natural that TensorFlow is built with Bazel too.

If you want to dig deeper into Bazel, take a look at the following:

Background on why Google developed Bazel:

http://google-engtools.blogspot.co.uk/2011/06/build-in-cloud-accessing-source-code.html

http://google-engtools.blogspot.tw/2011/08/build-in-cloud-how-build-system-works.html

http://google-engtools.blogspot.jp/2011/09/build-in-cloud-distributing-build-steps.html

http://google-engtools.blogspot.tw/2011/10/build-in-cloud-distributing-build.html