README should include an example of a build command with explicit component paths #206

Open
mikaelhg opened this issue Feb 3, 2021 · 5 comments · May be fixed by #235

Comments

@mikaelhg

mikaelhg commented Feb 3, 2021

System information

Describe the documentation issue

The README.md file should provide a build example for an arbitrary environment: one that contains all of the required components, though not necessarily configured as the defaults. This is often the case on ML development workstations, which have multiple versions of CUDA, cuDNN, gcc, and friends installed, since different projects require different versions for reproducibility reasons.

This example uses Bazelisk to manage multiple Bazel versions.

TF_CUDA_COMPUTE_CAPABILITIES will require #195 to be merged.

TF_CUDA_COMPUTE_CAPABILITIES=3.5,3.7,6.1,7.0,7.5 \
GCC_HOST_COMPILER_PATH=/usr/bin/gcc-8 \
CC=/usr/bin/gcc-8 \
CXX=/usr/bin/g++-8 \
TF_CUDA_PATHS=/usr/local/cuda-10.2.89,/usr/local/cudnn-10.2-7.6.5.32 \
CUDA_TOOLKIT_PATH=/usr/local/cuda-10.2.89 \
CUDNN_INSTALL_PATH=/usr/local/cudnn-10.2-7.6.5.32 \
USE_BAZEL_VERSION=3.1.0 \
TMP=/tmp \
mvn install -Dmaven.test.skip=true -Djavacpp.platform.extension=-gpu
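
Since every path in the command above is specific to one machine, it may help to check that the paths exist before starting the long native build. The following is only a minimal sketch, assuming a POSIX shell; `check_path_list` is a hypothetical helper and not part of this project.

```shell
#!/bin/sh
# Hypothetical helper (not part of tensorflow/java): verify that every path
# in a comma-separated list exists, in the same format as TF_CUDA_PATHS.
check_path_list() {
  missing=0
  old_ifs=$IFS
  IFS=','                      # split the argument on commas
  for p in $1; do
    if [ ! -e "$p" ]; then
      echo "missing: $p" >&2   # report each path that does not exist
      missing=1
    fi
  done
  IFS=$old_ifs
  return "$missing"            # 0 if all paths exist, 1 otherwise
}

# Usage (adjust to your own environment before running the build):
# check_path_list "$TF_CUDA_PATHS,$GCC_HOST_COMPILER_PATH,$CC,$CXX" \
#   && mvn install -Dmaven.test.skip=true -Djavacpp.platform.extension=-gpu
```

The helper only checks for existence; it does not verify that the CUDA and cuDNN versions are compatible with each other.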

We welcome contributions by users. Will you be able to submit a PR (use the doc style guide) to fix the doc issue?

No.

@rnett
Contributor

rnett commented Feb 3, 2021

Since 99% of the native build is just building tensorflow, configuration should be handled in the same way (i.e. the configure script, or anything else). This is mentioned in #195, and clarified in the latest commit. Is there any reason that wouldn't work for you?

@mikaelhg
Author

It's not that it's completely impossible to find out the practical information required to create a custom build, it's just that it would be incredibly easy to make finding this information convenient.

As a result, hundreds or thousands of individual users wouldn't need to spend time searching for and learning this trivia, which they'll probably never need again, just to accomplish the task of building the binaries for compute capabilities other than 3.5 and 7.0.

@rnett
Contributor

rnett commented Feb 11, 2021

You don't have to go hunting for any information, though: you just clone tensorflow and run the configuration script. We (and tensorflow) delegate to that script because any hard-coded paths we provide will be wrong more often than they are right, so the script tries its best to autodetect them. Plus, a lot of the configuration has been moved to the bazelrc files in the latest tensorflow release, so even less configuration should be required.
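
For readers who have not done this before, the workflow described above might look roughly like the sketch below. The `run` and `configure_tensorflow` helpers and the `DRY_RUN` variable are hypothetical names introduced only for illustration; the directory name is arbitrary, and you would likely want to check out the TensorFlow version that matches the one this project builds against.

```shell
#!/bin/sh
# Hypothetical wrapper: print each command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Clone TensorFlow and start its interactive configure script, which asks
# about CUDA, cuDNN, compiler paths, and compute capabilities and writes
# the answers to .tf_configure.bazelrc in the checkout.
configure_tensorflow() {
  dir=${1:-tensorflow}
  run git clone https://github.com/tensorflow/tensorflow.git "$dir" &&
    run cd "$dir" &&
    run ./configure
}

# Usage:
# DRY_RUN=1; configure_tensorflow tf-src   # show the commands only
# configure_tensorflow tf-src              # clone and configure for real
```

The dry-run mode is there so the steps can be inspected without network access; it is not something the TensorFlow configure script itself provides.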

What exactly is so hard to find?

@karllessard
Collaborator

It is possible though to simply use the TF archive that Bazel downloads and unzips just before building TF Java, instead of cloning the repo.

I'm wondering if it would work to run the configure script from that unzipped archive during the Maven build directly, e.g. by passing a parameter like -Dnative.build.custom=true. I never tried running a blocking goal that requires user input in Maven though, is that possible?

Or we simply add a new script in the Java repo that invokes Bazel to download the archive and run configure on it, bypassing our default .bazelrc config.

@rnett
Contributor

rnett commented Feb 13, 2021

> Or we simply add a new script in the Java repo that invokes Bazel to download the archive and run configure on it, bypassing our default .bazelrc config.

I like this option much more. It would just be our own version of the configure script.

rnett linked a pull request on Mar 9, 2021 that will close this issue