
Commit de15f21

committed
Merge branch 'shwetaoj/add_tensorflow_container_readme' into 'r1.6.1'
Adding Readme to generate cpx launch container from public TF resources

See merge request intelai/models!178
2 parents 9ad15d2 + b935b6d commit de15f21

1 file changed

+119
-0
lines changed
@@ -0,0 +1,119 @@
1+
# Steps to generate a container with Intel® Optimization for TensorFlow
2+
3+
This guide will help you generate a container with Intel's cpx launch release candidate.
4+
5+
## Steps:
6+
7+
1. Clone intel-tensorflow cpx launch branch:
8+
9+
```
10+
$ git clone https://github.com/Intel-tensorflow/tensorflow.git --branch=bf16/base --single-branch
11+
$ cd tensorflow
12+
$ git checkout 34305e8bf7ad4e7ffb9339b434385e3b28896924
13+
# Run "git log" and check for the right git hash
14+
```
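
Instead of reading through `git log` by eye, the checked-out commit can be compared against the expected hash. The following is a minimal sketch; `same_commit` and `EXPECTED_COMMIT` are hypothetical names introduced here for illustration and are not part of the repository:

```shell
# Hash taken from the checkout command in step 1.
EXPECTED_COMMIT=34305e8bf7ad4e7ffb9339b434385e3b28896924

# Hypothetical helper: succeeds only when both hashes are identical.
same_commit() {
    # $1: expected hash, $2: hash reported by git
    [ "$1" = "$2" ]
}

# Usage, from inside the cloned repository:
#   same_commit "$EXPECTED_COMMIT" "$(git rev-parse HEAD)" && echo "commit OK"
```

`git rev-parse HEAD` prints the full hash of the current checkout, so a mismatch means the `git checkout` step did not take effect.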
15+
16+
2. Go to the directory that contains the Intel MKL Docker files:
17+
18+
```
19+
$ cd tensorflow/tools/ci_build/linux/mkl/
20+
```
21+
22+
3. Run `build-dev-container.sh`, passing the following environment variables:
23+
24+
```
25+
$ env ROOT_CONTAINER=tensorflow/tensorflow \
26+
ROOT_CONTAINER_TAG=devel \
27+
TF_DOCKER_BUILD_DEVEL_BRANCH=34305e8bf7ad4e7ffb9339b434385e3b28896924 \
28+
TF_REPO=https://github.com/Intel-tensorflow/tensorflow \
29+
TF_DOCKER_BUILD_VERSION=tensorflow-2.2-bf16-nightly \
30+
BUILD_SKX_CONTAINERS=yes \
31+
BUILD_TF_V2_CONTAINERS=yes \
32+
BUILD_TF_BFLOAT16_CONTAINERS=yes \
33+
BAZEL_VERSION= \
34+
ENABLE_DNNL1=yes \
35+
ENABLE_SECURE_BUILD=yes \
36+
./build-dev-container.sh > ./container_build.log
37+
```
38+
39+
4. Open a second terminal session at the same location and run `tail -f container_build.log` to monitor the container build progress,
40+
or wait until the build finishes and then open the log file `container_build.log`. The log of a successful build contains:
41+
42+
```
43+
INFO: Build completed successfully, 18811 total actions.
44+
```
45+
46+
The following output indicates that the container has Intel-optimized TensorFlow:
47+
48+
```
49+
PASS: MKL enabled test in <intermediate container name>
50+
```
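
Both log lines from step 4 can also be checked in one go once the build has finished. This is a sketch only; `build_succeeded` is a hypothetical helper name, and it assumes the two messages appear in the log exactly as shown above:

```shell
# Hypothetical helper: succeeds when the log contains both the Bazel
# success line and the MKL test PASS line.
build_succeeded() {
    # $1: path to the build log
    grep -q 'Build completed successfully' "$1" &&
        grep -q 'PASS: MKL enabled test' "$1"
}

# Usage, from the directory where the build was started:
#   build_succeeded ./container_build.log && echo "build and MKL test passed"
```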
51+
52+
5. Check if the image was built successfully and tag it:
53+
54+
```
55+
$ docker images
56+
intel-mkl/tensorflow:tensorflow-2.2-bf16-nightly-avx512-devel-mkl
57+
58+
$ docker tag intel-mkl/tensorflow:tensorflow-2.2-bf16-nightly-avx512-devel-mkl intel/intel-optimized-tensorflow:tensorflow-2.2-bf16-nightly
59+
```
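
To check for the image without scanning the full `docker images` table, the listing can be reduced to `repository:tag` pairs and matched exactly. A minimal sketch, where `has_image` is a hypothetical helper introduced for illustration:

```shell
# Hypothetical helper: reads one "repository:tag" per line on stdin and
# succeeds if any line matches the given reference exactly.
has_image() {
    # $1: full image reference, e.g. repository:tag
    grep -Fxq -- "$1"
}

# Usage:
#   docker images --format '{{.Repository}}:{{.Tag}}' |
#       has_image intel-mkl/tensorflow:tensorflow-2.2-bf16-nightly-avx512-devel-mkl &&
#       echo "image found"
```

The `-F` and `-x` options make `grep` treat the reference as a literal string and require a whole-line match, so a tag that merely shares a prefix with another tag is not reported as found.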
60+
61+
6. Run the image in privileged mode and install OpenMPI, OpenSSH and Horovod:
62+
Example of docker run command:
63+
64+
```
65+
$ docker run --init --privileged -it --env <Proxy setup or anything else> -v <mount dir> --name container_name <imageid> /bin/bash
66+
```
67+
68+
Install Open MPI
69+
70+
```
71+
$ apt-get clean && apt-get update -y
72+
$ apt-get install -y --no-install-recommends --fix-missing openmpi-bin openmpi-common libopenmpi-dev
73+
74+
# Check OpenMPI installation:
75+
$ mpirun --version
76+
# You should see the following message:
77+
mpirun (Open MPI) 2.1.1
78+
```
79+
Install OpenSSH for MPI to communicate between containers
80+
```
81+
82+
$ apt-get install -y --no-install-recommends --fix-missing openssh-client openssh-server libnuma-dev
83+
$ mkdir -p /var/run/sshd
84+
# Allow OpenSSH to talk to containers without asking for confirmation
85+
$ grep -v StrictHostKeyChecking /etc/ssh/ssh_config > /etc/ssh/ssh_config.new
86+
$ echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config.new
87+
$ mv /etc/ssh/ssh_config.new /etc/ssh/ssh_config
88+
```
89+
Install Horovod
90+
91+
```
92+
$ export HOROVOD_WITH_TENSORFLOW=1
93+
$ python3 -m pip install --no-cache-dir horovod==0.19.1
94+
```
95+
96+
If the Horovod installation was successful, you will see a message like the following:
97+
98+
```
99+
Successfully installed cffi-1.14.0 cloudpickle-1.4.1 horovod-0.19.1 psutil-5.7.0 pycparser-2.20 pyyaml-5.3.1
100+
```
101+
102+
Check Horovod installation:
103+
104+
```
105+
$ python3 -c "import tensorflow as tf; import horovod.tensorflow as hvd"
106+
```
107+
The command should complete without printing an error.
108+
109+
7. Save this image:
110+
111+
```
112+
$ exit
113+
$ docker ps -a
114+
$ docker commit [container ID] intel/intel-optimized-tensorflow:tensorflow-2.2-bf16-nightly
115+
```
116+
117+
Pass this Docker image to the `--docker-image` parameter when running benchmarks with [`launch_benchmark.py`](/benchmarks/launch_benchmark.py).
118+
119+
