| 1 | +title: Guidance for use in High Performance Computing (HPC) |
| 2 | + |
| 3 | +[TOC] |
| 4 | + |
| 5 | +A common application of FTorch (indeed, the driving one for development) is the |
| 6 | +coupling of machine learning components to models running on HPC systems. |
| 7 | + |
| 8 | +Here we provide some guidance/hints to help with deployment in these settings. |
| 9 | + |
| 10 | +## Installation |
| 11 | + |
| 12 | +### Building for basic use |
| 13 | + |
| 14 | +The basic installation procedure is the same as described in the |
| 15 | +[main documentation](pages/cmake.html) and README, cloning from |
| 16 | +[GitHub](https://github.com/Cambridge-ICCS/FTorch) and building using CMake. |
| 17 | + |
| 18 | +### Obtaining LibTorch |
| 19 | + |
| 20 | +For use on an HPC system we advise linking to an installation of LibTorch rather than |
| 21 | +installing full PyTorch. |
| 22 | +This will reduce the dependencies and remove any requirement of Python. |
| 23 | +LibTorch can be obtained from the |
| 24 | +[PyTorch website](https://pytorch.org/get-started/locally/). |
| 25 | +The assumption here is that any Python/PyTorch development is done elsewhere with a |
| 26 | +model being saved to TorchScript for use by FTorch. |
| 27 | + |
| 28 | +Once you have successfully tested and deployed FTorch in your code we recommend speaking |
| 29 | +to your administrator/software stack manager to make your chosen version of LibTorch |
| 30 | +loadable as a `module`. |
| 31 | +This will improve reproducibility and simplify the process for future users on your |
| 32 | +system. |
| 33 | +See the [information below](#libtorch-as-a-module) for further details. |
| 34 | + |
| 35 | +### Environment management |
| 36 | + |
| 37 | +It is important that FTorch is built using the same environment and compilers as the |
| 38 | +software to which it will be linked. |
| 39 | + |
| 40 | +Therefore, before starting the build, ensure that your environment matches the one |
| 41 | +in which your code will be built. |
| 42 | +This will usually be done by using the same `module` commands as you would use to build |
| 43 | +the model: |
| 44 | +```sh |
| 45 | +module purge |
| 46 | +module load ... |
| 47 | +``` |
| 48 | + |
| 49 | +Alternatively you may be provided with a shell script that runs these commands and sets |
| 50 | +environment variables etc. that can be sourced: |
| 51 | +```sh |
| 52 | +source model_environment.sh |
| 53 | +``` |
| 54 | + |
| 55 | +Complex models with custom build systems may obfuscate this process, and you might need |
| 56 | +to probe the build system/scripts for this information. |
| 57 | +If in doubt speak to the maintainer of the software for your system, or the manager of |
| 58 | +the software stack on the machine. |
| 59 | + |
| 60 | +Because of the need to match compilers it is strongly recommended to specify the |
| 61 | +`CMAKE_Fortran_COMPILER`, `CMAKE_C_COMPILER`, and `CMAKE_CXX_COMPILER` when building |
| 62 | +with CMake to enforce this. |
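As a minimal sketch of how the compilers might be passed explicitly: the helper below assumes that the loaded modules export the conventional `FC`, `CC`, and `CXX` environment variables (many compiler modules do, but check with `echo $FC` on your system) and falls back to GNU compiler names otherwise. The function name `ftorch_cmake_args` is hypothetical and not part of FTorch; it only prints the arguments so they can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper: print CMake compiler arguments taken from the
# environment. FC/CC/CXX are conventional variables that many (not all)
# compiler modules set; the GNU fallbacks are assumptions.
ftorch_cmake_args() {
    printf '%s %s %s\n' \
        "-DCMAKE_Fortran_COMPILER=${FC:-gfortran}" \
        "-DCMAKE_C_COMPILER=${CC:-gcc}" \
        "-DCMAKE_CXX_COMPILER=${CXX:-g++}"
}

# Inspect the arguments before passing them to CMake.
ftorch_cmake_args
```

From a build directory the output could then be used as `cmake .. $(ftorch_cmake_args)`, provided the compiler paths contain no spaces.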
| 63 | + |
| 64 | +### Building Projects and Linking to FTorch |
| 65 | + |
| 66 | +Whilst we describe how to link to FTorch using CMake to build a project on our main |
| 67 | +page, many HPC models do not use CMake and rely on `make` or more elaborate build |
| 68 | +systems. |
| 69 | +To build a project with `make` or similar you need to _include_ FTorch's |
| 70 | +header (`.h`) and module (`.mod`) files and _link_ the executable |
| 71 | +to the FTorch library (e.g., `.so`, `.dll`, `.dylib` depending on your system) when |
| 72 | +compiling. |
| 73 | + |
| 74 | +To compile with `make`, add the following compiler flag when compiling files that |
| 75 | +use FTorch so that its header and module files are _included_: |
| 76 | +```sh |
| 77 | +-I<path/to/FTorch/install/location>/include/ftorch |
| 78 | +``` |
| 79 | +This is often done by appending to an `FCFLAGS` compiler flags variable or similar: |
| 80 | +```sh |
| 81 | +FCFLAGS += -I<path/to/FTorch/install/location>/include/ftorch |
| 82 | +``` |
| 83 | + |
| 84 | +When compiling the final executable add the following _link_ flag: |
| 85 | +```sh |
| 86 | +-L<path/to/FTorch/install/location>/lib64 -lftorch |
| 87 | +``` |
| 88 | +This is often done by appending to an `LDFLAGS` linker flags variable or similar: |
| 89 | +```sh |
| 90 | +LDFLAGS += -L<path/to/FTorch/install/location>/lib64 -lftorch |
| 91 | +``` |
| 92 | + |
| 93 | +You may also need to add the location of the dynamic library `.so` files to your |
| 94 | +`LD_LIBRARY_PATH` environment variable unless installing in a default location: |
| 95 | +```sh |
| 96 | +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path/to/FTorch/installation>/lib64 |
| 97 | +``` |
| 98 | + |
| 99 | +> Note: _Depending on your system and architecture `lib64` may be `lib` or something similar._ |
| 100 | + |
| 101 | +> Note: _On macOS devices you will need to set `DYLD_LIBRARY_PATH` rather than `LD_LIBRARY_PATH`._ |
| 102 | + |
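Because the library directory name varies between systems, a small check can avoid hard-coding it. The following is a sketch only: `FTORCH_ROOT` is an assumed variable holding your FTorch install prefix, `ftorch_libdir` is a hypothetical helper name, and only the two common names `lib64` and `lib` are tried.

```shell
#!/bin/sh
# Hypothetical helper: print the library directory of an FTorch install,
# preferring lib64 and falling back to lib.
ftorch_libdir() {
    for d in lib64 lib; do
        if [ -d "$1/$d" ]; then
            printf '%s\n' "$1/$d"
            return 0
        fi
    done
    echo "no lib or lib64 directory under $1" >&2
    return 1
}

# FTORCH_ROOT is an assumed variable; nothing happens if it is unset.
if [ -n "${FTORCH_ROOT:-}" ] && libdir=$(ftorch_libdir "$FTORCH_ROOT"); then
    # Append without leaving a stray ':' when LD_LIBRARY_PATH is empty.
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$libdir"
fi
```

The same directory can be reused for the `-L` link flag, which keeps the build and runtime settings in agreement.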
| 103 | +Whilst experimenting it may be useful to build FTorch using the `CMAKE_BUILD_TYPE=Debug` |
| 104 | +CMake flag to allow useful error messages and investigation with debugging tools. |
| 105 | + |
| 106 | + |
| 107 | +### Module systems |
| 108 | + |
| 109 | +Most HPC systems are managed using [Environment Modules](https://modules.sourceforge.net/). |
| 110 | +To build FTorch it is important you |
| 111 | +[match the environment in which you build FTorch to that of the executable](#environment-management) |
| 112 | +by loading the same modules as when building the main code. |
| 113 | + |
| 114 | +As a minimal requirement you will need to load modules for compilers and CMake. |
| 115 | +Further functionalities may require loading of additional modules such as an |
| 116 | +MPI installation and CUDA. |
| 117 | +If you intend to run FTorch's test suite, some systems also provide pFUnit as a |
| 118 | +loadable module, which saves building it from scratch as described in the documentation. |
| 119 | + |
| 120 | +#### LibTorch as a module |
| 121 | + |
| 122 | +Once you have a working build of FTorch it is advisable to pin the version of LibTorch |
| 123 | +and make it a loadable module to improve reproducibility and simplify the build process |
| 124 | +for subsequent users on the system. |
| 125 | + |
| 126 | +This can be done by the software manager after which you can use |
| 127 | +```sh |
| 128 | +module load libtorch |
| 129 | +``` |
| 130 | +or similar instead of downloading the binary from the PyTorch website. |
| 131 | + |
| 132 | +Note that the module name on your system may include additional information about the |
| 133 | +version, compilers used, and a hash code. |
| 134 | + |
| 135 | +#### FTorch as a module |
| 136 | + |
| 137 | +If there are many users who want to use FTorch on a system it may be worth building |
| 138 | +and making it loadable as a module itself. |
| 139 | +The module should be labelled with the compilers it was built with (see the |
| 140 | +[importance of environment matching](#environment-management)) and automatically load |
| 141 | +any subdependencies (e.g. CUDA). |
| 142 | + |
| 143 | +The build should use `CMAKE_BUILD_TYPE=Release`, and the unit tests should be run to |
| 144 | +check that the installation was successful. |
| 145 | + |
| 146 | +Once complete it should be possible to run: |
| 147 | +```sh |
| 148 | +module load ftorch |
| 149 | +``` |
| 150 | +or similar. |
| 151 | + |
| 152 | +This process should also add FTorch to the `LD_LIBRARY_PATH` and `CMAKE_PREFIX_PATH` |
| 153 | +rather than requiring the user to specify them manually as suggested elsewhere in this |
| 154 | +documentation. |
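One way to confirm that a module has set these variables is to test whether the install location appears in each colon-separated list. This is a sketch under assumptions: `path_contains` is a hypothetical helper, and `FTORCH_ROOT` is an assumed variable holding the install prefix that your module may or may not define.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $2 is an entry of the
# colon-separated list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# FTORCH_ROOT is an assumed variable; the check is skipped if it is unset.
if [ -n "${FTORCH_ROOT:-}" ]; then
    path_contains "${CMAKE_PREFIX_PATH:-}" "$FTORCH_ROOT" \
        || echo "warning: $FTORCH_ROOT not in CMAKE_PREFIX_PATH" >&2
fi
```

The match is on whole entries, so a similarly named directory such as an older install with a suffix is not mistaken for the current one.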