Tensorflow – Re-build Tensorflow with desired optimization flags

tensorflow

and thanks in advance for your consideration,

I just installed tensorflow (on a new machine with Ubuntu 16.04 and CUDA 8.0 already installed) using the following procedure:

Initially, I used --copt=-march=native.
I received the message

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

So in an attempt to fix this, I searched for solutions and used the answer to the following
How to compile Tensorflow with SSE4.2 and AVX instructions?

by using the above procedure from nVidia, starting from

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

but am still receiving the same messages as above. I feel as though I'm making a very simple error, any ideas?

Thanks!

Best Answer

By following the NVIDIA instructions you are resetting the TensorFlow repository to an older commit, before the SIMD instructions optimization was made available (1.0r):

git reset --hard 70de76e

This commit dates back to a previous release when this feature was not yet implemented, so it is actually working as it's supposed to.

The solution is to follow the official TensorFlow documentation.

For future situations, it is always recommended to use official resources before reaching out for third party solutions, as more they may be helpful, the official ones are more reliable and better maintained.

Notice during configure you're not prompted which CPU instructions you want to build TF with as it should due to the reason above, and therefore, you're unable to build with them.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:

Follow the official docs accordingly and it will work. If you have any follow up questions feel free to ask or if you face any problems open an issue on Github :)

Related Solutions

Tensorflow – How to compile Tensorflow with SSE4.2 and AVX instructions

I just ran into this same problem, it seems like Yaroslav Bulatov's suggestion doesn't cover SSE4.2 support, adding --copt=-msse4.2 would suffice. In the end, I successfully built with

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

without getting any warning or errors.

Probably the best choice for any system is:

bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

(Update: the build scripts may be eating -march=native, possibly because it contains an =.)

-mfpmath=both only works with gcc, not clang. -mfpmath=sse is probably just as good, if not better, and is the default for x86-64. 32-bit builds default to -mfpmath=387, so changing that will help for 32-bit. (But if you want high-performance for number crunching, you should build 64-bit binaries.)

I'm not sure what TensorFlow's default for -O2 or -O3 is. gcc -O3 enables full optimization including auto-vectorization, but that sometimes can make code slower.

What this does: --copt for bazel build passes an option directly to gcc for compiling C and C++ files (but not linking, so you need a different option for cross-file link-time-optimization)

x86-64 gcc defaults to using only SSE2 or older SIMD instructions, so you can run the binaries on any x86-64 system. (See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html). That's not what you want. You want to make a binary that takes advantage of all the instructions your CPU can run, because you're only running this binary on the system where you built it.

-march=native enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2 redundant. (Also, -mavx2 already enables -mavx and -msse4.2, so Yaroslav's command should have been fine). Also if you're using a CPU that doesn't support one of these options (like FMA), using -mfma would make a binary that faults with illegal instructions.

TensorFlow's ./configure defaults to enabling -march=native, so using that should avoid needing to specify compiler options manually.

-march=native enables -mtune=native, so it optimizes for your CPU for things like which sequence of AVX instructions is best for unaligned loads.

This all applies to gcc, clang, or ICC. (For ICC, you can use -xHOST instead of -march=native.)

Tensorflow – Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2

What is this warning about?

Modern CPUs provide a lot of low-level instructions, besides the usual arithmetic and logic, known as extensions, e.g. SSE2, SSE4, AVX, etc. From the Wikipedia:

Advanced Vector Extensions (AVX) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later on by AMD with the Bulldozer processor shipping in Q3 2011. AVX provides new features, new instructions and a new coding scheme.

In particular, AVX introduces fused multiply-accumulate (FMA) operations, which speed up linear algebra computation, namely dot-product, matrix multiply, convolution, etc. Almost every machine-learning training involves a great deal of these operations, hence will be faster on a CPU that supports AVX and FMA (up to 300%). The warning states that your CPU does support AVX (hooray!).

I'd like to stress here: it's all about CPU only.

Why isn't it used then?

Because tensorflow default distribution is built without CPU extensions, such as SSE4.1, SSE4.2, AVX, AVX2, FMA, etc. The default builds (ones from pip install tensorflow) are intended to be compatible with as many CPUs as possible. Another argument is that even with these extensions CPU is a lot slower than a GPU, and it's expected for medium- and large-scale machine-learning training to be performed on a GPU.

What should you do?

If you have a GPU, you shouldn't care about AVX support, because most expensive ops will be dispatched on a GPU device (unless explicitly set not to). In this case, you can simply ignore this warning by

# Just disables the warning, doesn't take advantage of AVX/FMA to run faster
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

... or by setting export TF_CPP_MIN_LOG_LEVEL=2 if you're on Unix. Tensorflow is working fine anyway, but you won't see these annoying warnings.

If you don't have a GPU and want to utilize CPU as much as possible, you should build tensorflow from the source optimized for your CPU with AVX, AVX2, and FMA enabled if your CPU supports them. It's been discussed in this question and also this GitHub issue. Tensorflow uses an ad-hoc build system called bazel and building it is not that trivial, but is certainly doable. After this, not only will the warning disappear, tensorflow performance should also improve.