Commit a22d08e

Add HPC docs page. (#278)
* Add HPC docs page with placeholder titles.
* Consolidate linker information from other pages into one location.
* Typo fixes courtesy of @jwallwork23
* Add info on linking and compiling files as suggested by @TomMelt.

Co-authored-by: Joe Wallwork <[email protected]>
1 parent 78914a9 commit a22d08e

3 files changed

+161
-35
lines changed

pages/cmake.md

+3-16
@@ -130,25 +130,12 @@ when running CMake.
130130

131131
## Building other projects with make
132132

133-
To build a project with make you need to include the FTorch library when compiling
133+
To build a project with `make` you need to include the FTorch library when compiling
134134
and link the executable against it.
135135

136-
To compile with make add the following compiler flag when compiling files that
137-
use ftorch:
138-
```
139-
FCFLAGS += -I<path/to/install/location>/include/ftorch
140-
```
136+
For full details of the flags to set and the linking process see the
137+
[HPC build pages](page/hpc.html).
141138

142-
When compiling the final executable add the following link flag:
143-
```
144-
LDFLAGS += -L<path/to/install/location>/lib64 -lftorch
145-
```
146-
147-
You may also need to add the location of the `.so` files to your `LD_LIBRARY_PATH`
148-
unless installing in a default location:
149-
```
150-
export LD_LIBRARY_PATH = $LD_LIBRARY_PATH:<path/to/installation>/lib64
151-
```
152139

153140
## Conda Support
154141

pages/examples.md

+4-19
@@ -116,26 +116,11 @@ and using the `-DCMAKE_PREFIX_PATH=</path/to/install/location>` flag when runnin
116116
> then you should use the same path for `</path/to/install/location>`._
117117
118118
##### Make
119-
To build with make we need to include the library when compiling and link the executable
120-
against it.
119+
To build with `make` we need to _include_ the library and _link_ the
120+
executable against it when compiling.
121121

122-
To compile with make we need add the following compiler flag when compiling files that
123-
use FTorch:
124-
```
125-
FCFLAGS += -I<path/to/install/location>/include/ftorch
126-
```
127-
128-
When compiling the final executable add the following link flag:
129-
```
130-
LDFLAGS += -L<path/to/install/location>/lib -lftorch
131-
```
132-
133-
You may also need to add the location of the `.so` files to your `LD_LIBRARY_PATH`
134-
unless installing in a default location:
135-
```
136-
export LD_LIBRARY_PATH = $LD_LIBRARY_PATH:<path/to/install/location>/lib
137-
```
138-
> Note: _Depending on your system and architecture `lib` may be `lib64` or something similar._
122+
For full details of the flags to set and the linking process see the
123+
[HPC build pages](page/hpc.html/#building-projects-and-linking-to-ftorch).
139124

140125
### Running on GPUs
141126

pages/hpc.md

+154
@@ -0,0 +1,154 @@
1+
title: Guidance for use in High Performance Computing (HPC)
2+
3+
[TOC]
4+
5+
A common application of FTorch (indeed, the driving one for development) is the
6+
coupling of machine learning components to models running on HPC systems.
7+
8+
Here we provide some guidance/hints to help with deployment in these settings.
9+
10+
## Installation
11+
12+
### Building for basic use
13+
14+
The basic installation procedure is the same as described in the
15+
[main documentation](pages/cmake.html) and README, cloning from
16+
[GitHub](https://github.com/Cambridge-ICCS/FTorch) and building using CMake.
17+
18+
### Obtaining LibTorch
19+
20+
For use on an HPC system we advise linking to an installation of LibTorch rather than
21+
installing full PyTorch.
22+
This will reduce the dependencies and remove any requirement of Python.
23+
LibTorch can be obtained from the
24+
[PyTorch website](https://pytorch.org/get-started/locally/).
25+
The assumption here is that any Python/PyTorch development is done elsewhere with a
26+
model being saved to TorchScript for use by FTorch.
27+
28+
Once you have successfully tested and deployed FTorch in your code we recommend speaking
29+
to your administrator/software stack manager to make your chosen version of LibTorch
30+
loadable as a `module`.
31+
This will improve reproducibility and simplify the process for future users on your
32+
system.
33+
See the [information below](#libtorch-as-a-module) for further details.
34+
35+
### Environment management
36+
37+
It is important that FTorch is built using the same environment and compilers as the
38+
software to which it will be linked.
39+
40+
Therefore before starting the build you should ensure that you match the environment to
41+
that which your code will be built with.
42+
This will usually be done by using the same `module` commands as you would use to build
43+
the model:
44+
```sh
45+
module purge
46+
module load ...
47+
```
48+
49+
Alternatively you may be provided with a shell script that runs these commands and sets
50+
environment variables etc. that can be sourced:
51+
```sh
52+
source model_environment.sh
53+
```
54+
55+
Complex models with custom build systems may obfuscate this process, and you might need
56+
to probe the build system/scripts for this information.
57+
If in doubt speak to the maintainer of the software for your system, or the manager of
58+
the software stack on the machine.
59+
60+
Because of the need to match compilers it is strongly recommended to specify the
61+
`CMAKE_Fortran_COMPILER`, `CMAKE_C_COMPILER`, and `CMAKE_CXX_COMPILER` when building
62+
with CMake to enforce this.
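
As a minimal sketch of the recommendation above, the three compiler variables can be collected into one set of CMake arguments. The compiler names (`gfortran`, `gcc`, `g++`) and the `MODEL_*` variable names are assumptions for illustration only; use whichever compilers your model is actually built with. The command is printed rather than run so that it can be checked first.

```sh
# Hypothetical compiler names: replace with the compilers your model uses.
MODEL_FC=gfortran
MODEL_CC=gcc
MODEL_CXX=g++

# Collect the three compiler settings as CMake cache definitions.
CMAKE_ARGS="-DCMAKE_Fortran_COMPILER=$MODEL_FC -DCMAKE_C_COMPILER=$MODEL_CC -DCMAKE_CXX_COMPILER=$MODEL_CXX"

# Print the configure command for inspection rather than running it here.
echo "cmake .. $CMAKE_ARGS"
```

Other options described elsewhere in the documentation, such as `CMAKE_PREFIX_PATH`, would be appended to the same command.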
63+
64+
### Building Projects and Linking to FTorch
65+
66+
Whilst we describe how to link to FTorch using CMake to build a project on our main
67+
page, many HPC models do not use CMake and rely on `make` or more elaborate build
68+
systems.
69+
To build a project with `make` or similar you need to _include_ FTorch's
70+
header (`.h`) and module (`.mod`) files and _link_ the executable
71+
to the FTorch library (e.g., `.so`, `.dll`, `.dylib` depending on your system) when
72+
compiling.
73+
74+
To compile with make add the following compiler flag when compiling files that
75+
use FTorch to _include_ the library:
76+
```sh
77+
-I<path/to/FTorch/install/location>/include/ftorch
78+
```
79+
This is often done by appending to an `FCFLAGS` compiler flags variable or similar:
80+
```sh
81+
FCFLAGS += -I<path/to/FTorch/install/location>/include/ftorch
82+
```
83+
84+
When compiling the final executable add the following _link_ flag:
85+
```sh
86+
-L<path/to/FTorch/install/location>/lib64 -lftorch
87+
```
88+
This is often done by appending to an `LDFLAGS` linker flags variable or similar:
89+
```sh
90+
LDFLAGS += -L<path/to/FTorch/install/location>/lib64 -lftorch
91+
```
92+
93+
You may also need to add the location of the dynamic library `.so` files to your
94+
`LD_LIBRARY_PATH` environment variable unless installing in a default location:
95+
```sh
96+
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path/to/FTorch/installation>/lib64
97+
```
98+
99+
> Note: _Depending on your system and architecture the `lib64` directory used above may instead be `lib` or something similar._
100+
101+
> Note: _On macOS devices you will need to set `DYLD_LIBRARY_PATH` rather than `LD_LIBRARY_PATH`._
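
The include, link, and runtime path settings above can be put together as in the following sketch. The `FTORCH_DIR` and `FTORCH_LIBDIR` variable names, the `gfortran` compiler, and the `my_model.f90` source file are all hypothetical and only serve the illustration; the install path is left as the same placeholder used above. The compile and link commands are printed rather than executed.

```sh
# Placeholder install prefix, as in the text above: replace with your own.
FTORCH_DIR="<path/to/FTorch/install/location>"
# May need to be "lib" rather than "lib64" on your system.
FTORCH_LIBDIR="lib64"

# Include flag for compiling files that use FTorch.
FCFLAGS="-I$FTORCH_DIR/include/ftorch"
# Link flags for the final executable.
LDFLAGS="-L$FTORCH_DIR/$FTORCH_LIBDIR -lftorch"

# Append to LD_LIBRARY_PATH, avoiding a stray ':' if it was unset or empty.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$FTORCH_DIR/$FTORCH_LIBDIR"

# Hypothetical compile and link steps, printed for inspection only.
echo "gfortran $FCFLAGS -c my_model.f90"
echo "gfortran my_model.o $LDFLAGS -o my_model"
```

The `${LD_LIBRARY_PATH:+...}` expansion is a design choice rather than a requirement: an empty entry in the search path is typically treated as the current directory, which is usually not intended.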
102+
103+
Whilst experimenting it may be useful to build FTorch using the `CMAKE_BUILD_TYPE=Debug`
104+
CMake flag to allow useful error messages and investigation with debugging tools.
105+
106+
107+
### Module systems
108+
109+
Most HPC systems are managed using [Environment Modules](https://modules.sourceforge.net/).
110+
To build FTorch it is important you
111+
[match the environment in which you build FTorch to that of the executable](#environment-management)
112+
by loading the same modules as when building the main code.
113+
114+
As a minimal requirement you will need to load modules for compilers and CMake.
115+
Further functionalities may require loading of additional modules such as an
116+
MPI installation and CUDA.
117+
Some systems may also have pFUnit available as a loadable module to save you needing to
118+
build from scratch per the documentation if you are running FTorch's test suite.
119+
120+
#### LibTorch as a module
121+
122+
Once you have a working build of FTorch it is advisable to pin the version of LibTorch
123+
and make it a loadable module to improve reproducibility and simplify the build process
124+
for subsequent users on the system.
125+
126+
This can be done by the software manager after which you can use
127+
```sh
128+
module load libtorch
129+
```
130+
or similar instead of downloading the binary from the PyTorch website.
131+
132+
Note that the module name on your system may include additional information about the
133+
version, compilers used, and a hash code.
134+
135+
#### FTorch as a module
136+
137+
If there are many users who want to use FTorch on a system it may be worth building
138+
and making it loadable as a module itself.
139+
The module should be labelled with the compilers it was built with (see the
140+
[importance of environment matching](#environment-management)) and automatically load
141+
any subdependencies (e.g. CUDA).
142+
143+
The build should use `CMAKE_BUILD_TYPE=RELEASE`, and the unit tests should be run to
144+
check successful installation.
145+
146+
Once complete it should be possible to:
147+
```sh
148+
module load ftorch
149+
```
150+
or similar.
151+
152+
This process should also add FTorch to the `LD_LIBRARY_PATH` and `CMAKE_PREFIX_PATH`
153+
rather than requiring the user to specify them manually as suggested elsewhere in this
154+
documentation.
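
One way to check that a module has set these paths is sketched below. The `path_has_entry` helper, the `FTORCH_DIR` variable, and the `lib64` subdirectory are assumptions made for this example, not part of FTorch or of Environment Modules.

```sh
# Succeed if the colon-separated list in $1 contains the exact entry $2.
path_has_entry() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Placeholder install prefix: replace with the location used by your module.
FTORCH_DIR="<path/to/FTorch/install/location>"

# Report whether the library directory and install prefix have been set.
if path_has_entry "${LD_LIBRARY_PATH:-}" "$FTORCH_DIR/lib64"; then
    echo "LD_LIBRARY_PATH includes the FTorch library directory"
else
    echo "LD_LIBRARY_PATH does not include the FTorch library directory"
fi

if path_has_entry "${CMAKE_PREFIX_PATH:-}" "$FTORCH_DIR"; then
    echo "CMAKE_PREFIX_PATH includes the FTorch install location"
else
    echo "CMAKE_PREFIX_PATH does not include the FTorch install location"
fi
```

Running this once before and once after the `module load` command shows whether the module file makes the expected changes.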
