Merged
101 changes: 6 additions & 95 deletions docs/index.md
@@ -1,102 +1,13 @@
# NDSL Documentation

NDSL allows atmospheric scientists to write focus on what matters in model development and hides away the complexities of coding for a super computer.
NDSL is a middleware for climate and weather modelling developed jointly by NOAA and NASA. It allows atmospheric scientists to focus on what matters in model development and essentially decouples performance engineering from model development.

## Quick Start
## Portable performance

Python `3.11.x` is required for NDSL and all its third party dependencies for installation.
NDSL brings together [GT4Py](https://github.com/GridTools/gt4py/) and [DaCe](https://github.com/spcl/dace/), two libraries developed for high-performance and portability. On top of those pillars, NDSL deploys a series of optimized APIs for common operations, e.g. halo exchange or domain decomposition, and tools to port existing models.

NDSL submodules `gt4py` and `dace` to point to vetted versions, use `git clone --recurse-submodule` to update the git submodules.
## Batteries-included for FV-based models

NDSL is **NOT** available on `pypi`. Installation of the package has to be local, via `pip install ./NDSL` (`-e` supported). The packages have a few options:
Historically, NDSL was developed to port the FV3 dynamical core on the cubed-sphere. Therefore, the middleware ships with ready-to-execute specialization for models based on cubed-sphere grids and FV-based models in particular.

- `ndsl[test]`: installs the test packages (based on `pytest`)
- `ndsl[develop]`: installs tools for development and tests.

NDSL uses pytest for its unit tests, the tests are available via:

- `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed)
- `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed)

## Requirements & supported compilers

For CPU backends:

- 3.11.x >= Python < 3.12.x
- Compilers:
- GNU 11.2+

For GPU backends (the above plus):

- CUDA 11.2+
- Python package:
- `cupy` (latest with proper driver support [see install notes](https://docs.cupy.dev/en/stable/install.html))
- Libraries:
- MPI compiled with cuda support

## NDSL installation and testing

NDSL is not available at `pypi`, it uses

```bash
pip install NDSL
```

to install NDSL locally.

NDSL has a few options:

- `ndsl[test]`: installs the test packages (based on `pytest`)
- `ndsl[develop]`: installs tools for development and tests.

Tests are available via:

- `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed)
- `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed)

## Configurations for Pace

Configurations for Pace to use NDSL with different backend:

- FV3_DACEMODE=Python[Build|BuildAndRun|Run] controls the full program optimizer behavior

- Python: default, use stencil only, no full program optimization

- Build: will build the program then exit. This _build no matter what_. (backend must be `dace:gpu` or `dace:cpu`)

- BuildAndRun: same as above but after build the program will keep executing (backend must be `dace:gpu` or `dace:cpu`)

- Run: load pre-compiled program and execute, fail if the .so is not present (_no hash check!_) (backend must be `dace:gpu` or `dace:cpu`)

- NDSL_LITERAL_PRECISION=64 controls the floating point precision throughout the program.

Install Pace with different NDSL backend:

- Shell scripts to install Pace using NDSL backend on specific machines such as Gaea can be found in `examples/build_scripts/`.
- When cloning Pace you will need to update the repository's submodules as well:

```bash
git clone --recursive https://github.com/ai2cm/pace.git
```

or if you have already cloned the repository:

```bash
git submodule update --init --recursive
```

- Pace requires GCC > 9.2, MPI, and Python 3.8 on your system, and CUDA is required to run with a GPU backend.
- We recommend creating a python `venv` or conda environment specifically for Pace.

```bash
python3 -m venv venv_name
source venv_name/bin/activate
```

- Inside of your pace `venv` or conda environment pip install the Python requirements, GT4Py, and Pace:

```bash
pip3 install -r requirements_dev.txt -c constraints.txt
```

- There are also separate requirements files which can be installed for linting (`requirements_lint.txt`) and building documentation (`requirements_docs.txt`).
Next: get [up and running](./quickstart.md).
36 changes: 36 additions & 0 deletions docs/quickstart.md
@@ -0,0 +1,36 @@
# Quickstart

Alright - let's get you up and running!

NDSL requires Python version `3.11` and a GNU compiler. We strongly recommend using a conda or virtual environment.

```shell
# We have submodules for GT4Py and DaCe. Don't forget to pull them
git clone --recurse-submodules git@github.com:NOAA-GFDL/NDSL.git

cd NDSL/

# We strongly recommend using conda or a virtual environment
python -m venv .venv/
source .venv/bin/activate

# [optional] Install MPI if you don't have a system installation.
pip install openmpi

# Finally, install NDSL
pip install .[demos]
```

Now you can run through the Jupyter notebooks in `examples/NDSL` :rocket:.

Read on in the [user manual](./user/index.md).

!!! note "Supported compilers"

NDSL currently only works with the GNU compiler. Using `clang` will result in errors related to undefined OpenMP flags.

@fmalatino - there was work to allow the use of Intel, did that not make it into the release streams?

Contributor

It appears not. Was this from Xingqiu's work?


Yes, that's it.

Contributor

I will track it down and make a subsequent PR or suggest the changes here.

Contributor

@romanc here is a build script that Xingqiu wrote for our post-processing/analysis machine, which sets the flags for using the 2021.3.0 Intel compilers. I am not sure if it makes sense to amend this portion of the docs to indicate that a build and installation is possible with these compilers as well, or if we should hold off, test with what is more current, and make a subsequent PR.

Collaborator Author

Let's make a follow-up PR for (Intel) compiler support. Think a bit about where this might end up in the docs. Here we are in the quickstart section. That's not the place to be super technical. Imo, if we have anything other than "we know this works", then it should be a discussion on a dedicated page that we can just link to from here.


For macOS users, we know that `gcc` version 14 from Homebrew works.
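
The requirements above (Python `3.11` and a GNU compiler) can be checked up front. The following is a minimal sketch using only the standard library; `python_is_supported` and `find_gnu_compiler` are hypothetical helper names, not part of NDSL, and the candidate compiler names are assumptions about a typical setup.

```python
import shutil
import sys


def python_is_supported(version=sys.version_info) -> bool:
    """Return True if the major/minor version is exactly 3.11."""
    return tuple(version[:2]) == (3, 11)


def find_gnu_compiler(candidates=("gcc", "g++")):
    """Return the path of the first candidate found on PATH, or None.

    The candidate names are assumptions. Finding an executable called
    `gcc` does not prove it is the GNU compiler, so treat a hit as a
    hint and confirm with `gcc --version`.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print("Python 3.11:", python_is_supported())
    print("Compiler found at:", find_gnu_compiler())
```

On macOS, a plain `gcc` on the `PATH` is frequently a shim for `clang`, so passing an explicit candidate list (for example the versioned Homebrew executable) is likely the safer choice there.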

!!! question "Why clone the repository?"

We are cloning the repository because NDSL is not available on `pypi`.
48 changes: 48 additions & 0 deletions docs/user/index.md
@@ -1,3 +1,51 @@
# Usage documentation

This part of the documentation is geared towards users of NDSL.

## Up and running

See our [quickstart guide](../quickstart.md) on how to get up and running.

## Configuration

NDSL tries to have sensible defaults. In case you want to tweak something, here are some pointers:

### Literal precision (float/int)

Unspecified integer and floating point literals (e.g. `42` and `3.1415`) default to 64-bit precision. This can be changed with the environment variable `PACE_FLOAT_PRECISION`.

For mixed precision code, you can specify the "hard coded" precision with type hints and casts, e.g.

```python
with computation(PARALLEL), interval(...):
    # Either 32-bit or 64-bit depending on `PACE_FLOAT_PRECISION`
    my_int = 42
    my_float = 3.1415

    # Always 32-bit
    my_int32: int32 = 42
    my_float32: float32 = 3.1415

    # Explicit 64-bit cast within otherwise unspecified calculation
    factor = 0.5 * float64(3.1415 + 2.71828)
```
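
How the environment variable is turned into a precision can be sketched as follows. This is an illustration only: `read_literal_precision` is a hypothetical helper, loosely modelled on the `_get_literal_precision` signature visible in the `ndsl/dsl/__init__.py` hunk of this PR, and the validation step is an assumption.

```python
import os

ENV_VAR = "PACE_FLOAT_PRECISION"  # variable name as given in the text above


def read_literal_precision(default: str = "64") -> int:
    """Return 32 or 64 based on the environment (hypothetical helper)."""
    value = os.getenv(ENV_VAR, default)
    if value not in ("32", "64"):
        # Assumed behavior: reject anything that is not a known precision.
        raise ValueError(f"{ENV_VAR} must be '32' or '64', got {value!r}")
    return int(value)


if __name__ == "__main__":
    os.environ[ENV_VAR] = "32"
    print(read_literal_precision())
```

Judging from the same hunk, NDSL evaluates the precision at module level, so the variable presumably has to be set before `ndsl` is imported for it to take effect.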

### Full program optimizer

The behavior of the full program optimizer is controlled by `FV3_DACEMODE`. Valid values are:

`Python`

: The default. Disables full program optimization and only accelerates stencil code.

Is the acceleration under this option via numpy or gt4py only (reading this as a complete newbie to NDSL)?

Collaborator Author

Okay, so this is a bit out of scope for what we are doing here. But since we are here, let me give you the high-level overview. Basically, NDSL has two major modes of how it can run.

In the default mode, every `with computation():` block is analyzed and optimized. Depending on the backend, this could mean that the `with computation():` block is running in C++ (e.g. in the original GridTools backends or in the `dace:cpu` backend). For `dace:gpu`, for example, we run that part of the code on the GPU. If users opt for the `numpy` backend, we "rewrite" that block of code with building blocks from numpy.

Now, numerical weather prediction (NWP) codes are kind of "fragmented" with many with computation() blocks. That's why NDSL can go a step further than plain GT4Py. In NDSL, we can leverage full program optimization or what we internally call "orchestration". Orchestration will not only analyze and optimize with computation(): blocks but also everything in between. More importantly, we can - in that mode - analyze two (subsequent) with computation(): blocks and decide to merge them into one if that makes sense and is allowed from a semantic point of view.

Our working hypothesis is that this second mode is much more potent for getting to portable performance because it allows us to do large-scale changes to code. However, full program optimization doesn't just magically work out of the box. It only works with the dace:* backends and it might need changes to the science code too.

Long story short: This part obviously needs rewording/rephrasing. I'm just reformatting existing docs here. I'd suggest doing this in a separate PR. Even the above write-up is probably too complicated for completely new users. Docs are currently very much work in progress and I expect sections to move around a lot until we settle on a first version that we think we can (automatically) deploy. Most likely, the whole section on changing defaults with environment variables doesn't belong on the index page of the user documentation 😉.
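
To make the "merge two subsequent blocks" idea from the comment above concrete, here is a toy sketch. It is purely illustrative: plain Python lists stand in for fields, the merge is done by hand, and none of this reflects how NDSL or DaCe actually implement orchestration.

```python
def run_separately(a, b):
    """Two passes, like two subsequent computation blocks."""
    # First block: writes an intermediate field.
    tmp = [x + y for x, y in zip(a, b)]
    # Second block: reads the intermediate field back in.
    return [2.0 * t for t in tmp]


def run_merged(a, b):
    """One fused pass: same result, no intermediate field is stored."""
    return [2.0 * (x + y) for x, y in zip(a, b)]


a = [1.0, 2.0, 3.0]
b = [0.5, 0.5, 0.5]
assert run_separately(a, b) == run_merged(a, b)
```

The merged version touches each element once and skips the temporary. That is the kind of saving a whole-program view can find, provided the merge is semantically allowed.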


`Build`

: Build the program, then exit. This mode is only available for backends `dace:gpu` and `dace:cpu`.

`BuildAndRun`

: Build the program, then run it immediately. This mode is only available for backends `dace:gpu` and `dace:cpu`.

`Run`

: Load a pre-compiled program and run it. Fails if the pre-compiled program cannot be found. This mode is only available for backends `dace:gpu` and `dace:cpu`.
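
The constraint between `FV3_DACEMODE` and the backend can be written down as a small check. The sketch below only encodes the rules listed above; `DaCeMode`, `DACE_BACKENDS`, and `check_mode` are hypothetical names for illustration, not NDSL API.

```python
from enum import Enum


class DaCeMode(Enum):
    """The four values listed above for `FV3_DACEMODE`."""

    PYTHON = "Python"
    BUILD = "Build"
    BUILD_AND_RUN = "BuildAndRun"
    RUN = "Run"


# Backends for which the non-default modes are available, per the list above.
DACE_BACKENDS = frozenset({"dace:cpu", "dace:gpu"})


def check_mode(mode: str, backend: str) -> DaCeMode:
    """Parse `mode` and reject combinations the documentation rules out."""
    parsed = DaCeMode(mode)  # raises ValueError for unknown mode strings
    if parsed is not DaCeMode.PYTHON and backend not in DACE_BACKENDS:
        raise ValueError(
            f"mode {mode!r} requires one of {sorted(DACE_BACKENDS)}, "
            f"got backend {backend!r}"
        )
    return parsed
```

For example, `check_mode("Python", "numpy")` passes, while `check_mode("Run", "numpy")` raises a `ValueError`.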
2 changes: 1 addition & 1 deletion external/gt4py
Submodule gt4py updated 69 files
+25 −5 .github/workflows/daily-ci.yml
+16 −1 CHANGELOG.md
+55 −0 docs/development/ADRs/cartesian/frontend-literal-precision.md
+9 −9 pyproject.toml
+10 −4 src/gt4py/cartesian/backend/dace_backend.py
+11 −8 src/gt4py/cartesian/backend/pyext_builder.py
+27 −1 src/gt4py/cartesian/definitions.py
+6 −0 src/gt4py/cartesian/frontend/defir_to_gtir.py
+152 −22 src/gt4py/cartesian/frontend/gtscript_frontend.py
+38 −1 src/gt4py/cartesian/frontend/nodes.py
+37 −0 src/gt4py/cartesian/gtc/common.py
+4 −0 src/gt4py/cartesian/gtc/cuir/cuir_codegen.py
+11 −6 src/gt4py/cartesian/gtc/dace/oir_to_tasklet.py
+11 −9 src/gt4py/cartesian/gtc/dace/utils.py
+3 −1 src/gt4py/cartesian/gtc/debug/debug_codegen.py
+6 −0 src/gt4py/cartesian/gtc/gtcpp/gtcpp_codegen.py
+9 −1 src/gt4py/cartesian/gtc/passes/gtir_upcaster.py
+13 −1 src/gt4py/cartesian/gtc/ufuncs.py
+87 −31 src/gt4py/cartesian/gtscript.py
+1 −2 src/gt4py/cartesian/utils/base.py
+8 −0 src/gt4py/cartesian/utils/meta.py
+1 −1 src/gt4py/next/ffront/field_operator_ast.py
+17 −2 src/gt4py/next/ffront/foast_passes/iterable_unpack.py
+33 −4 src/gt4py/next/ffront/foast_passes/type_deduction.py
+29 −1 src/gt4py/next/ffront/foast_passes/utils.py
+16 −3 src/gt4py/next/ffront/foast_to_gtir.py
+3 −23 src/gt4py/next/ffront/func_to_foast.py
+21 −4 src/gt4py/next/iterator/ir_utils/ir_makers.py
+8 −0 src/gt4py/next/iterator/transforms/collapse_list_get.py
+46 −45 src/gt4py/next/program_processors/runners/dace/gtir_dataflow.py
+14 −6 src/gt4py/next/program_processors/runners/dace/gtir_to_sdfg.py
+0 −3 src/gt4py/next/program_processors/runners/dace/gtir_to_sdfg_concat_where.py
+0 −1 src/gt4py/next/program_processors/runners/dace/gtir_to_sdfg_scan.py
+5 −6 src/gt4py/next/program_processors/runners/dace/transformations/__init__.py
+20 −7 src/gt4py/next/program_processors/runners/dace/transformations/auto_optimize.py
+71 −0 src/gt4py/next/program_processors/runners/dace/transformations/constants.py
+64 −11 src/gt4py/next/program_processors/runners/dace/transformations/gpu_utils.py
+3 −1 src/gt4py/next/program_processors/runners/dace/transformations/local_double_buffering.py
+4 −4 src/gt4py/next/program_processors/runners/dace/transformations/loop_blocking.py
+153 −40 src/gt4py/next/program_processors/runners/dace/transformations/map_fusion_extended.py
+6 −6 src/gt4py/next/program_processors/runners/dace/transformations/map_fusion_utils.py
+0 −1 src/gt4py/next/program_processors/runners/dace/transformations/map_promoter.py
+5 −8 src/gt4py/next/program_processors/runners/dace/transformations/move_dataflow_into_if_body.py
+19 −35 src/gt4py/next/program_processors/runners/dace/transformations/simplify.py
+2 −1 src/gt4py/next/program_processors/runners/dace/transformations/splitting_tools.py
+8 −7 src/gt4py/next/program_processors/runners/dace/workflow/common.py
+12 −6 src/gt4py/next/program_processors/runners/dace/workflow/decoration.py
+24 −8 src/gt4py/next/program_processors/runners/dace/workflow/translation.py
+11 −0 src/gt4py/next/type_system/type_specifications.py
+2 −2 src/gt4py/storage/cartesian/layout.py
+8 −4 tests/cartesian_tests/integration_tests/multi_feature_tests/stencil_definitions.py
+4 −0 tests/cartesian_tests/integration_tests/multi_feature_tests/test_dace_parsing.py
+71 −0 tests/cartesian_tests/integration_tests/multi_feature_tests/test_math_functions.py
+152 −4 tests/cartesian_tests/unit_tests/frontend_tests/test_gtscript_frontend.py
+3 −1 tests/cartesian_tests/unit_tests/frontend_tests/test_ir_maker.py
+17 −0 tests/cartesian_tests/unit_tests/test_gtc/dace/test_oir_to_tasklet.py
+21 −0 tests/next_tests/integration_tests/feature_tests/ffront_tests/test_execution.py
+44 −1 tests/next_tests/integration_tests/feature_tests/ffront_tests/test_external_local_field.py
+10 −0 tests/next_tests/unit_tests/ffront_tests/test_func_to_foast.py
+63 −0 tests/next_tests/unit_tests/program_processor_tests/runners_tests/dace_tests/test_gtir_to_sdfg.py
+0 −1 ...t_tests/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_constant_substitution.py
+0 −1 ...unit_tests/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_copy_chain_remover.py
+2 −1 ...s/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_horizontal_map_split_fusion.py
+1 −2 ...ests/unit_tests/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_loop_blocking.py
+2 −2 ..._tests/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_map_buffer_elimination.py
+0 −1 ...ts/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_move_dataflow_into_if_body.py
+0 −5 ...next_tests/unit_tests/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_strides.py
+11 −3 ...sts/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_vertical_map_split_fusion.py
+41 −12 uv.lock
5 changes: 5 additions & 0 deletions mkdocs.yml
@@ -9,6 +9,7 @@ theme:

nav:
- Home: index.md
- Quickstart: quickstart.md
- User documentation: user/index.md
- Porting:
- General Concepts: porting/index.md
@@ -24,8 +25,12 @@ markdown_extensions:
- abbr
# support for colored notes / warnings / tips / examples
- admonition
# support for "definition lists" (<dl>)
- def_list
# support for footnotes
- footnotes
# support for emojis
- pymdownx.emoji
# support for syntax highlighting
- pymdownx.highlight:
anchor_linenums: true
3 changes: 2 additions & 1 deletion ndsl/dsl/__init__.py
@@ -41,7 +41,8 @@ def _get_literal_precision(default: Literal["32", "64"] = "64") -> Literal["32",


NDSL_GLOBAL_PRECISION = int(_get_literal_precision())
os.environ["GT4PY_LITERAL_PRECISION"] = str(NDSL_GLOBAL_PRECISION)
os.environ["GT4PY_LITERAL_INT_PRECISION"] = str(NDSL_GLOBAL_PRECISION)
os.environ["GT4PY_LITERAL_FLOAT_PRECISION"] = str(NDSL_GLOBAL_PRECISION)


# Set cache names for default gt backends workflow
12 changes: 12 additions & 0 deletions ndsl/dsl/gt4py/__init__.py
@@ -26,12 +26,18 @@
computation,
cos,
cosh,
erf,
erfc,
exp,
externals,
float32,
float64,
floor,
function,
gamma,
horizontal,
int32,
int64,
interval,
isfinite,
isinf,
@@ -80,12 +86,18 @@
"computation",
"cos",
"cosh",
"erf",
"erfc",
"exp",
"externals",
"float32",
"float64",
"floor",
"function",
"gamma",
"horizontal",
"int32",
"int64",
"interval",
"isfinite",
"isinf",
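
For orientation, the newly exported `erf`, `erfc`, and `gamma` presumably follow the usual mathematical definitions. The sketch below uses Python's standard `math` module as a stand-in to show the identities one would expect; it does not call the stencil functions, which are only usable inside stencil code.

```python
import math

x = 0.5

# erfc is the complement of erf: erf(x) + erfc(x) == 1
assert math.isclose(math.erf(x) + math.erfc(x), 1.0)

# gamma generalizes the factorial: gamma(n) == (n - 1)! for positive integers
assert math.isclose(math.gamma(5.0), 24.0)
```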