Update gt4py: support for literal precision #192
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,102 +1,13 @@ | ||
| # NDSL Documentation | ||
|
|
||
| NDSL allows atmospheric scientists to write focus on what matters in model development and hides away the complexities of coding for a super computer. | ||
| NDSL is a middleware for climate and weather modelling developed jointly by NOAA and NASA. It allows atmospheric scientists to focus on what matters in model development and essentially decouples performance engineering from model development. | ||
|
|
||
| ## Quick Start | ||
| ## Portable performance | ||
|
|
||
| Python `3.11.x` is required for NDSL and all its third party dependencies for installation. | ||
| NDSL brings together [GT4Py](https://github.com/GridTools/gt4py/) and [DaCe](https://github.com/spcl/dace/), two libraries developed for high-performance and portability. On top of those pillars, NDSL deploys a series of optimized APIs for common operations, e.g. halo exchange or domain decomposition, and tools to port existing models. | ||
|
|
||
| NDSL submodules `gt4py` and `dace` to point to vetted versions, use `git clone --recurse-submodule` to update the git submodules. | ||
| ## Batteries-included for FV-based models | ||
|
|
||
| NDSL is **NOT** available on `pypi`. Installation of the package has to be local, via `pip install ./NDSL` (`-e` supported). The packages have a few options: | ||
| Historically, NDSL was developed to port the FV3 dynamical core on the cubed-sphere. Therefore, the middleware ships with ready-to-execute specialization for models based on cubed-sphere grids and FV-based models in particular. | ||
|
|
||
| - `ndsl[test]`: installs the test packages (based on `pytest`) | ||
| - `ndsl[develop]`: installs tools for development and tests. | ||
|
|
||
| NDSL uses pytest for its unit tests, the tests are available via: | ||
|
|
||
| - `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed) | ||
| - `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed) | ||
|
|
||
| ## Requirements & supported compilers | ||
|
|
||
| For CPU backends: | ||
|
|
||
| - 3.11.x >= Python < 3.12.x | ||
| - Compilers: | ||
| - GNU 11.2+ | ||
|
|
||
| For GPU backends (the above plus): | ||
|
|
||
| - CUDA 11.2+ | ||
| - Python package: | ||
| - `cupy` (latest with proper driver support [see install notes](https://docs.cupy.dev/en/stable/install.html)) | ||
| - Libraries: | ||
| - MPI compiled with cuda support | ||
|
|
||
| ## NDSL installation and testing | ||
|
|
||
| NDSL is not available at `pypi`, it uses | ||
|
|
||
| ```bash | ||
| pip install NDSL | ||
| ``` | ||
|
|
||
| to install NDSL locally. | ||
|
|
||
| NDSL has a few options: | ||
|
|
||
| - `ndsl[test]`: installs the test packages (based on `pytest`) | ||
| - `ndsl[develop]`: installs tools for development and tests. | ||
|
|
||
| Tests are available via: | ||
|
|
||
| - `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed) | ||
| - `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed) | ||
|
|
||
| ## Configurations for Pace | ||
|
|
||
| Configurations for Pace to use NDSL with different backend: | ||
|
|
||
| - FV3_DACEMODE=Python[Build|BuildAndRun|Run] controls the full program optimizer behavior | ||
|
|
||
| - Python: default, use stencil only, no full program optimization | ||
|
|
||
| - Build: will build the program then exit. This _build no matter what_. (backend must be `dace:gpu` or `dace:cpu`) | ||
|
|
||
| - BuildAndRun: same as above but after build the program will keep executing (backend must be `dace:gpu` or `dace:cpu`) | ||
|
|
||
| - Run: load pre-compiled program and execute, fail if the .so is not present (_no hash check!_) (backend must be `dace:gpu` or `dace:cpu`) | ||
|
|
||
| - NDSL_LITERAL_PRECISION=64 controls the floating point precision throughout the program. | ||
|
|
||
| Install Pace with different NDSL backend: | ||
|
|
||
| - Shell scripts to install Pace using NDSL backend on specific machines such as Gaea can be found in `examples/build_scripts/`. | ||
| - When cloning Pace you will need to update the repository's submodules as well: | ||
|
|
||
| ```bash | ||
| git clone --recursive https://github.com/ai2cm/pace.git | ||
| ``` | ||
|
|
||
| or if you have already cloned the repository: | ||
|
|
||
| ```bash | ||
| git submodule update --init --recursive | ||
| ``` | ||
|
|
||
| - Pace requires GCC > 9.2, MPI, and Python 3.8 on your system, and CUDA is required to run with a GPU backend. | ||
| - We recommend creating a python `venv` or conda environment specifically for Pace. | ||
|
|
||
| ```bash | ||
| python3 -m venv venv_name | ||
| source venv_name/bin/activate | ||
| ``` | ||
|
|
||
| - Inside of your pace `venv` or conda environment pip install the Python requirements, GT4Py, and Pace: | ||
|
|
||
| ```bash | ||
| pip3 install -r requirements_dev.txt -c constraints.txt | ||
| ``` | ||
|
|
||
| - There are also separate requirements files which can be installed for linting (`requirements_lint.txt`) and building documentation (`requirements_docs.txt`). | ||
| Next: get [up and running](./quickstart.md). |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| # Quickstart | ||
|
|
||
| Alright - let's get you up and running! | ||
|
|
||
| NDSL requires Python version `3.11` and a GNU compiler. We strongly recommend using a conda or virtual environment. | ||
|
|
||
| ```shell | ||
| # We have submodules for GT4Py and DaCe. Don't forget to pull them | ||
| git clone --recurse-submodules git@github.com:NOAA-GFDL/NDSL.git | ||
|
|
||
| cd NDSL/ | ||
|
|
||
| # We strongly recommend using conda or a virtual environment | ||
| python -m venv .venv/ | ||
| source .venv/bin/activate | ||
|
|
||
| # [optional] Install MPI if you don't have a system installation. | ||
| pip install openmpi | ||
|
|
||
| # Finally, install NDSL | ||
| pip install .[demos] | ||
| ``` | ||
|
|
||
| Now you can run through the Jupyter notebooks in `examples/NDSL` :rocket:. | ||
|
|
||
| Read on in the [user manual](./user/index.md). | ||
|
|
||
| !!! note "Supported compilers" | ||
|
|
||
| NDSL currently only works with the GNU compiler. Using `clang` will result in errors related to undefined OpenMP flags. | ||
|
|
||
| For MacOS users, we know that `gcc` version 14 from homebrew works. | ||
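The requirements above (Python `3.11` and a GNU compiler) can be checked before installing. The following is a minimal sketch, not part of NDSL: the helper name `preflight` is hypothetical, and it only checks that some `gcc` executable is on `PATH`. On macOS, `gcc` is often an alias for `clang`, so a hit here does not prove that the compiler is GNU.

```python
import shutil
import sys


def preflight(version_info=None, which=shutil.which):
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper for illustration; NDSL does not ship this function.
    """
    version_info = sys.version_info if version_info is None else version_info
    problems = []

    # The quickstart asks for Python 3.11 specifically.
    major, minor = version_info[0], version_info[1]
    if (major, minor) != (3, 11):
        problems.append(f"Python 3.11 is required, found {major}.{minor}")

    # Only checks that an executable named `gcc` exists, not that it is GNU.
    if which("gcc") is None:
        problems.append("no `gcc` executable found on PATH")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(f"warning: {problem}")
```

Passing `version_info` and `which` as parameters keeps the check easy to exercise without touching the real environment.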
|
|
||
| !!! question "Why cloning the repository?" | ||
|
|
||
| We are cloning the repository because NDSL is not available on `pypi`. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,51 @@ | ||
| # Usage documentation | ||
|
|
||
| This part of the documentation is geared towards users of NDSL. | ||
|
|
||
| ## Up and running | ||
|
|
||
| See our [quickstart guide](./quickstart.md) on how to get up and running. | ||
|
|
||
| ## Configuration | ||
|
|
||
| NDSL tries to have sensible defaults. In case you want to tweak something, here are some pointers: | ||
|
|
||
| ### Literal precision (float/int) | ||
|
|
||
| Unspecified integer and floating point literals (e.g. `42` and `3.1415`) default to 64-bit precision. This can be changed with the environment variable `PACE_FLOAT_PRECISION`. | ||
|
|
||
| For mixed precision code, you can specify the "hard coded" precision with type hints and casts, e.g. | ||
|
|
||
| ```python | ||
| with computation(PARALLEL), interval(...): | ||
| # Either 32-bit or 64-bit depending on `PACE_FLOAT_PRECISION` | ||
| my_int = 42 | ||
| my_float = 3.1415 | ||
|
|
||
| # Always 32-bit | ||
| my_int32: int32 = 42 | ||
| my_float32: float32 = 3.1415 | ||
|
|
||
| # Explicit 64-bit cast within otherwise unspecified calculation | ||
| factor = 0.5 * float64(3.1415 + 2.71828) | ||
| ``` | ||
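To illustrate how an environment variable can select the default literal precision, here is a minimal sketch in plain Python. It is not NDSL's or GT4Py's implementation: the function name `literal_type_names` is hypothetical, and the accepted values (`32` and `64`) and the 64-bit default are taken from the description above.

```python
import os


def literal_type_names(env=None):
    """Return the (float, int) type names used for unspecified literals.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env

    # Default to 64-bit when the variable is not set.
    bits = env.get("PACE_FLOAT_PRECISION", "64")
    if bits not in ("32", "64"):
        raise ValueError(f"Unsupported precision: {bits!r} (expected 32 or 64)")

    return f"float{bits}", f"int{bits}"


if __name__ == "__main__":
    print(literal_type_names())
```

Explicit type hints such as `float32` or casts such as `float64(...)` would bypass a lookup like this, which is what makes mixed precision code possible.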
|
|
||
| ### Full program optimizer | ||
|
|
||
| The behavior of the full program optimizer is controlled by `FV3_DACEMODE`. Valid values are: | ||
|
|
||
| `Python` | ||
|
|
||
| : The default. Disables full program optimization and only accelerates stencil code. | ||
|
Is the acceleration under this option via numpy or gt4py only (reading this as a complete newbie to NDSL)?
Collaborator
Author
Okay, so this is a bit out of scope for what we are doing here. But since we are here, let me give you the high-level overview. Basically, NDSL has two major modes of how it can run. In the default mode, every Now, numerical weather prediction (NWP) codes are kind of "fragmented" with many Our working hypothesis is that this second mode is much more potent for getting to portable performance because it allows us to do large-scale changes to code. However, full program optimization doesn't just magically work out of the box. It only works with the Long story short: This part obviously needs rewording/rephrasing. I'm just reformatting existing docs here. I'd suggest doing this in a separate PR. Even the above write-up is probably too complicated for completely new users. Docs are currently very much work in progress and I expect sections to move around a lot until we settle on a first version that we think we can (automatically) deploy. Most likely, the whole section on changing defaults with environment variables doesn't belong on the index page of the user documentation 😉. |
||
|
|
||
| `Build` | ||
|
|
||
| : Build the program, then exit. This mode is only available for backends `dace:gpu` and `dace:cpu`. | ||
|
|
||
| `BuildAndRun` | ||
|
|
||
| : Build the program, then run it immediately. This mode is only available for backends `dace:gpu` and `dace:cpu`. | ||
|
|
||
| `Run` | ||
|
|
||
| : Load a pre-compiled program and run it. Fails if the pre-compiled program can not be found. This mode is only available for backends `dace:gpu` and `dace:cpu`. | ||
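The rules above (four valid values, with all modes except `Python` restricted to the `dace:cpu` and `dace:gpu` backends) can be sketched as a small validation routine. This is an illustration under stated assumptions, not NDSL's actual code: the names `DaCeMode`, `DACE_BACKENDS`, and `resolve_mode` are hypothetical.

```python
import os
from enum import Enum


class DaCeMode(Enum):
    """Valid values of `FV3_DACEMODE` as listed in the docs."""

    PYTHON = "Python"
    BUILD = "Build"
    BUILD_AND_RUN = "BuildAndRun"
    RUN = "Run"


# Backends for which full program optimization is available.
DACE_BACKENDS = ("dace:cpu", "dace:gpu")


def resolve_mode(backend, env=None):
    """Read `FV3_DACEMODE` and check it against the chosen backend.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env

    # `Python` is the documented default; unknown values raise ValueError.
    mode = DaCeMode(env.get("FV3_DACEMODE", "Python"))

    if mode is not DaCeMode.PYTHON and backend not in DACE_BACKENDS:
        raise ValueError(
            f"FV3_DACEMODE={mode.value} requires one of {DACE_BACKENDS}, "
            f"got backend {backend!r}"
        )
    return mode
```

For example, `resolve_mode("dace:cpu", {"FV3_DACEMODE": "Build"})` returns `DaCeMode.BUILD`, while the same mode with a non-DaCe backend raises a `ValueError`.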
@fmalatino - there was work to allow the use of Intel, did that not make it into the release streams?
It appears not. Was this from Xingqiu's work?
Yes, that's it.
I will track it down and make a subsequent PR or suggest the changes here.
@romanc here is a build script that Xingqiu wrote for our post-processing/analysis machine, which sets the flags for using the 2021.3.0 Intel compilers. I am not sure if it makes sense to amend this portion of the docs to indicate that a build and installation is possible with these compilers as well, or if we should hold off, test with what is more current, and make a subsequent PR.
Let's make a follow-up PR for (Intel) compiler support. Think a bit about where this might end up in the docs. Here we are in the quickstart section. That's not the place to be super technical. Imo, if we have anything other than "we know this works", then it should be a discussion on a dedicated page that we can just link to from here.