NOAA-GFDL · romanc · Aug 13, 2025 · Aug 8, 2025 · Aug 11, 2025 · Aug 11, 2025
diff --git a/docs/index.md b/docs/index.md
@@ -1,102 +1,13 @@
 # NDSL Documentation
 
-NDSL allows atmospheric scientists to write focus on what matters in model development and hides away the complexities of coding for a super computer.
+NDSL is a middleware for climate and weather modelling, developed jointly by NOAA and NASA. By decoupling performance engineering from model development, it lets atmospheric scientists focus on what matters in model development.
 
-## Quick Start
+## Portable performance
 
-Python `3.11.x` is required for NDSL and all its third party dependencies for installation.
+NDSL brings together [GT4Py](https://github.com/GridTools/gt4py/) and [DaCe](https://github.com/spcl/dace/), two libraries built for high performance and portability. On top of these pillars, NDSL provides optimized APIs for common operations, such as halo exchange and domain decomposition, as well as tools to port existing models.
 
-NDSL submodules `gt4py` and `dace` to point to vetted versions, use `git clone --recurse-submodule` to update the git submodules.
+## Batteries-included for FV-based models
 
-NDSL is **NOT** available on `pypi`. Installation of the package has to be local, via `pip install ./NDSL` (`-e` supported). The packages have a few options:
+Historically, NDSL was developed to port the FV3 dynamical core on the cubed-sphere. As a result, the middleware ships with ready-to-use specializations for models on cubed-sphere grids, and for finite-volume (FV) models in particular.
 
-- `ndsl[test]`: installs the test packages (based on `pytest`)
-- `ndsl[develop]`: installs tools for development and tests.
-
-NDSL uses pytest for its unit tests, the tests are available via:
-
-- `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed)
-- `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed)
-
-## Requirements & supported compilers
-
-For CPU backends:
-
-- 3.11.x >= Python < 3.12.x
-- Compilers:
-  - GNU 11.2+
-
-For GPU backends (the above plus):
-
-- CUDA 11.2+
-- Python package:
-  - `cupy` (latest with proper driver support [see install notes](https://docs.cupy.dev/en/stable/install.html))
-- Libraries:
-  - MPI compiled with cuda support
-
-## NDSL installation and testing
-
-NDSL is not available at `pypi`, it uses
-
-```bash
-pip install NDSL
-```
-
-to install NDSL locally.
-
-NDSL has a few options:
-
-- `ndsl[test]`: installs the test packages (based on `pytest`)
-- `ndsl[develop]`: installs tools for development and tests.
-
-Tests are available via:
-
-- `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed)
-- `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed)
-
-## Configurations for Pace
-
-Configurations for Pace to use NDSL with different backend:
-
-- FV3_DACEMODE=Python[Build|BuildAndRun|Run] controls the full program optimizer behavior
-
-  - Python: default, use stencil only, no full program optimization
-
-  - Build: will build the program then exit. This _build no matter what_. (backend must be `dace:gpu` or `dace:cpu`)
-
-  - BuildAndRun: same as above but after build the program will keep executing (backend must be `dace:gpu` or `dace:cpu`)
-
-  - Run: load pre-compiled program and execute, fail if the .so is not present (_no hash check!_) (backend must be `dace:gpu` or `dace:cpu`)
-
-- NDSL_LITERAL_PRECISION=64 controls the floating point precision throughout the program.
-
-Install Pace with different NDSL backend:
-
-- Shell scripts to install Pace using NDSL backend on specific machines such as Gaea can be found in `examples/build_scripts/`.
-- When cloning Pace you will need to update the repository's submodules as well:
-
-```bash
-git clone --recursive https://github.com/ai2cm/pace.git
-```
-
-  or if you have already cloned the repository:
-
-```bash
-git submodule update --init --recursive
-```
-
-- Pace requires GCC > 9.2, MPI, and Python 3.8 on your system, and CUDA is required to run with a GPU backend.
-- We recommend creating a python `venv` or conda environment specifically for Pace.
-
-```bash
-python3 -m venv venv_name
-source venv_name/bin/activate
-```
-
-- Inside of your pace `venv` or conda environment pip install the Python requirements, GT4Py, and Pace:
-
-```bash
-pip3 install -r requirements_dev.txt -c constraints.txt
-```
-
-- There are also separate requirements files which can be installed for linting (`requirements_lint.txt`) and building documentation   (`requirements_docs.txt`).
+Next: get [up and running](./quickstart.md).
diff --git a/docs/quickstart.md b/docs/quickstart.md
@@ -0,0 +1,36 @@
+# Quickstart
+
+Alright, let's get you up and running!
+
+NDSL requires Python version `3.11` and a GNU compiler. We strongly recommend using a conda or virtual environment.
+
+```shell
+# We have submodules for GT4Py and DaCe. Don't forget to pull them
+git clone --recurse-submodules git@github.com:NOAA-GFDL/NDSL.git
+
+cd NDSL/
+
+# Create and activate a virtual environment (or use conda)
+python -m venv .venv/
+source .venv/bin/activate
+
+# [optional] Install MPI if you don't have a system installation.
+pip install openmpi
+
+# Finally, install NDSL (quoted so shells like zsh don't expand the brackets)
+pip install ".[demos]"
+```
+
+Now you can run through the Jupyter notebooks in `examples/NDSL` :rocket:.
+
+Read on in the [user manual](./user/index.md).
+
+!!! note "Supported compilers"
+
+    NDSL currently only works with the GNU compiler. Using `clang` will result in errors related to undefined OpenMP flags.
+
+    On macOS, `gcc` version 14 from Homebrew is known to work.
+
+!!! question "Why clone the repository?"
+
+    NDSL is not available on PyPI, so it has to be installed from a local clone.
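Before running the quickstart steps, it can help to confirm the two requirements it names: Python 3.11 and a GNU compiler. On macOS, the `gcc` command shipped with the Xcode command line tools is Apple clang under another name, which is exactly the case the "Supported compilers" note warns about, while Homebrew installs GCC as versioned commands such as `gcc-14`. The pre-flight script below is a hypothetical sketch, not part of NDSL: `python_ok`, `is_gnu_compiler` and `compiler_version_text` are illustrative names, and recognizing GCC by the "Free Software Foundation" line in its `--version` banner is a heuristic.

```python
"""Hypothetical pre-flight check for an NDSL install (not part of NDSL)."""
import shutil
import subprocess
import sys
from typing import Optional, Sequence


def python_ok(version_info: Sequence[int] = sys.version_info) -> bool:
    # The quickstart asks for Python 3.11.
    return tuple(version_info[:2]) == (3, 11)


def is_gnu_compiler(version_text: str) -> bool:
    # Heuristic: GCC banners mention the Free Software Foundation,
    # while Apple's `gcc` alias reports "Apple clang".
    text = version_text.lower()
    return "clang" not in text and "free software foundation" in text


def compiler_version_text(compiler: str = "gcc") -> Optional[str]:
    # Return the compiler's `--version` output, or None if it is not on PATH.
    path = shutil.which(compiler)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    compiler = sys.argv[1] if len(sys.argv) > 1 else "gcc"
    print("Python 3.11:", "ok" if python_ok() else f"found {sys.version.split()[0]}")
    banner = compiler_version_text(compiler)
    if banner is None:
        print(f"{compiler}: not found on PATH")
    else:
        print(f"{compiler}:", "GNU" if is_gnu_compiler(banner) else "not GNU (clang?)")
```

Saved as, say, `check_env.py`, running `python check_env.py gcc-14` checks Homebrew's GCC 14 instead of the default `gcc`.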
diff --git a/docs/user/index.md b/docs/user/index.md
@@ -1,3 +1,51 @@
 # Usage documentation
 
 This part of the documentation is geared towards users of NDSL.
+
+## Up and running
+
+See our [quickstart guide](../quickstart.md) on how to get up and running.
+
+## Configuration
+
+NDSL aims for sensible defaults. If you want to tweak something, here are some pointers:
+
+### Literal precision (float/int)
+
+Unspecified integer and floating point literals (e.g. `42` and `3.1415`) default to 64-bit precision. This can be changed with the environment variable `PACE_FLOAT_PRECISION`.
+
+For mixed-precision code, you can fix the precision explicitly with type hints and casts, e.g.:
+
+```python
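+from ndsl.dsl.gt4py import PARALLEL, computation, float32, float64, int32, interval
+from ndsl.dsl.typing import FloatField
+
+
+def mixed_precision(out: FloatField):
+    with computation(PARALLEL), interval(...):
+        # Either 32-bit or 64-bit depending on `PACE_FLOAT_PRECISION`
+        my_int = 42
+        my_float = 3.1415
+
+        # Always 32-bit
+        my_int32: int32 = 42
+        my_float32: float32 = 3.1415
+
+        # Explicit 64-bit cast within otherwise unspecified calculation
+        factor = 0.5 * float64(3.1415 + 2.71828)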
+```
+
+### Full program optimizer
+
+The behavior of the full program optimizer is controlled by the environment variable `FV3_DACEMODE`. Valid values are:
+
+`Python`
+
+:   The default. Disables full program optimization and only accelerates stencil code.
+
+`Build`
+
+:   Always build the program, even if a previous build exists, then exit. This mode is only available for backends `dace:gpu` and `dace:cpu`.
+
+`BuildAndRun`
+
+:   Build the program, then run it immediately. This mode is only available for backends `dace:gpu` and `dace:cpu`.
+
+`Run`
+
+:   Load a pre-compiled program and run it. Fails if the pre-compiled program cannot be found. No hash check is performed, so make sure the pre-compiled program matches the current code. This mode is only available for backends `dace:gpu` and `dace:cpu`.
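The `FV3_DACEMODE` values documented above imply a simple consistency rule: every mode other than `Python` needs a `dace:cpu` or `dace:gpu` backend. The sketch below encodes that rule as a pre-flight check. `check_dacemode` is a hypothetical helper, not NDSL's own validation, and it assumes mode names are matched exactly (case-sensitively).

```python
"""Hypothetical pre-flight check for FV3_DACEMODE; not part of NDSL."""
import os
from typing import Mapping, Optional

# Modes listed in the user docs; every mode except "Python" needs a DaCe backend.
DACE_ONLY_MODES = frozenset({"Build", "BuildAndRun", "Run"})
VALID_MODES = DACE_ONLY_MODES | {"Python"}
DACE_BACKENDS = frozenset({"dace:cpu", "dace:gpu"})


def check_dacemode(backend: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the effective FV3_DACEMODE, raising if it cannot work with `backend`."""
    env = os.environ if env is None else env
    # "Python" is the documented default when the variable is unset.
    mode = env.get("FV3_DACEMODE", "Python")
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown FV3_DACEMODE {mode!r}; expected one of {sorted(VALID_MODES)}")
    if mode in DACE_ONLY_MODES and backend not in DACE_BACKENDS:
        raise ValueError(f"FV3_DACEMODE={mode} requires a DaCe backend, got {backend!r}")
    return mode


if __name__ == "__main__":
    print(check_dacemode("dace:cpu", {"FV3_DACEMODE": "BuildAndRun"}))
```

Passing `env` explicitly keeps the check testable; omitting it reads the real process environment.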
diff --git a/external/gt4py b/external/gt4py
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -9,6 +9,7 @@ theme:
 
 nav:
   - Home: index.md
+  - Quickstart: quickstart.md
   - User documentation: user/index.md
   - Porting:
       - General Concepts: porting/index.md
@@ -24,8 +25,12 @@ markdown_extensions:
   - abbr
   # support for colored notes / warnings / tips / examples
   - admonition
+  # support for "definition lists" (<dl>)
+  - def_list
   # support for footnotes
   - footnotes
+  # support for emojis
+  - pymdownx.emoji
   # support for syntax highlighting
   - pymdownx.highlight:
       anchor_linenums: true

diff --git a/ndsl/dsl/__init__.py b/ndsl/dsl/__init__.py
@@ -41,7 +41,8 @@ def _get_literal_precision(default: Literal["32", "64"] = "64") -> Literal["32",
 
 
 NDSL_GLOBAL_PRECISION = int(_get_literal_precision())
-os.environ["GT4PY_LITERAL_PRECISION"] = str(NDSL_GLOBAL_PRECISION)
+os.environ["GT4PY_LITERAL_INT_PRECISION"] = str(NDSL_GLOBAL_PRECISION)
+os.environ["GT4PY_LITERAL_FLOAT_PRECISION"] = str(NDSL_GLOBAL_PRECISION)
 
 
 # Set cache names for default gt backends workflow
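The change above replaces the single `GT4PY_LITERAL_PRECISION` variable with separate integer and float variables, both set from `NDSL_GLOBAL_PRECISION`. A minimal, self-contained sketch of that flow follows. It assumes `_get_literal_precision` reads `PACE_FLOAT_PRECISION`, as the user docs state, and accepts only `"32"` and `"64"`, as its `Literal` type hint suggests; its body is not shown in this diff, and `forward_literal_precision` is a hypothetical name.

```python
"""Sketch: forward one user-facing precision setting to both GT4Py literal variables.

Hypothetical helper, not NDSL's code.
"""
import os
from typing import MutableMapping

VALID_PRECISIONS = ("32", "64")


def forward_literal_precision(env: MutableMapping[str, str], default: str = "64") -> int:
    # Assumption: the user-facing variable is PACE_FLOAT_PRECISION (as in the user docs).
    precision = env.get("PACE_FLOAT_PRECISION", default)
    if precision not in VALID_PRECISIONS:
        raise ValueError(f"PACE_FLOAT_PRECISION must be one of {VALID_PRECISIONS}, got {precision!r}")
    # GT4Py now reads int and float literal precision from two separate variables,
    # so the single setting is written to both.
    env["GT4PY_LITERAL_INT_PRECISION"] = precision
    env["GT4PY_LITERAL_FLOAT_PRECISION"] = precision
    return int(precision)


if __name__ == "__main__":
    # In NDSL the equivalent code runs when `ndsl.dsl` is imported.
    print(forward_literal_precision(os.environ))
```

Because NDSL computes `NDSL_GLOBAL_PRECISION` at module level, `PACE_FLOAT_PRECISION` has to be set in the environment before `ndsl.dsl` is first imported.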

diff --git a/ndsl/dsl/gt4py/__init__.py b/ndsl/dsl/gt4py/__init__.py
@@ -26,12 +26,18 @@
     computation,
     cos,
     cosh,
+    erf,
+    erfc,
     exp,
     externals,
+    float32,
+    float64,
     floor,
     function,
     gamma,
     horizontal,
+    int32,
+    int64,
     interval,
     isfinite,
     isinf,
@@ -80,12 +86,18 @@
     "computation",
     "cos",
     "cosh",
+    "erf",
+    "erfc",
     "exp",
     "externals",
+    "float32",
+    "float64",
     "floor",
     "function",
     "gamma",
     "horizontal",
+    "int32",
+    "int64",
     "interval",
     "isfinite",
     "isinf",
+25 −5		.github/workflows/daily-ci.yml
+16 −1		CHANGELOG.md
+55 −0		docs/development/ADRs/cartesian/frontend-literal-precision.md
+9 −9		pyproject.toml
+10 −4		src/gt4py/cartesian/backend/dace_backend.py
+11 −8		src/gt4py/cartesian/backend/pyext_builder.py
+27 −1		src/gt4py/cartesian/definitions.py
+6 −0		src/gt4py/cartesian/frontend/defir_to_gtir.py
+152 −22		src/gt4py/cartesian/frontend/gtscript_frontend.py
+38 −1		src/gt4py/cartesian/frontend/nodes.py
+37 −0		src/gt4py/cartesian/gtc/common.py
+4 −0		src/gt4py/cartesian/gtc/cuir/cuir_codegen.py
+11 −6		src/gt4py/cartesian/gtc/dace/oir_to_tasklet.py
+11 −9		src/gt4py/cartesian/gtc/dace/utils.py
+3 −1		src/gt4py/cartesian/gtc/debug/debug_codegen.py
+6 −0		src/gt4py/cartesian/gtc/gtcpp/gtcpp_codegen.py
+9 −1		src/gt4py/cartesian/gtc/passes/gtir_upcaster.py
+13 −1		src/gt4py/cartesian/gtc/ufuncs.py
+87 −31		src/gt4py/cartesian/gtscript.py
+1 −2		src/gt4py/cartesian/utils/base.py
+8 −0		src/gt4py/cartesian/utils/meta.py
+1 −1		src/gt4py/next/ffront/field_operator_ast.py
+17 −2		src/gt4py/next/ffront/foast_passes/iterable_unpack.py
+33 −4		src/gt4py/next/ffront/foast_passes/type_deduction.py
+29 −1		src/gt4py/next/ffront/foast_passes/utils.py
+16 −3		src/gt4py/next/ffront/foast_to_gtir.py
+3 −23		src/gt4py/next/ffront/func_to_foast.py
+21 −4		src/gt4py/next/iterator/ir_utils/ir_makers.py
+8 −0		src/gt4py/next/iterator/transforms/collapse_list_get.py
+46 −45		src/gt4py/next/program_processors/runners/dace/gtir_dataflow.py
+14 −6		src/gt4py/next/program_processors/runners/dace/gtir_to_sdfg.py
+0 −3		src/gt4py/next/program_processors/runners/dace/gtir_to_sdfg_concat_where.py
+0 −1		src/gt4py/next/program_processors/runners/dace/gtir_to_sdfg_scan.py
+5 −6		src/gt4py/next/program_processors/runners/dace/transformations/__init__.py
+20 −7		src/gt4py/next/program_processors/runners/dace/transformations/auto_optimize.py
+71 −0		src/gt4py/next/program_processors/runners/dace/transformations/constants.py
+64 −11		src/gt4py/next/program_processors/runners/dace/transformations/gpu_utils.py
+3 −1		src/gt4py/next/program_processors/runners/dace/transformations/local_double_buffering.py
+4 −4		src/gt4py/next/program_processors/runners/dace/transformations/loop_blocking.py
+153 −40		src/gt4py/next/program_processors/runners/dace/transformations/map_fusion_extended.py
+6 −6		src/gt4py/next/program_processors/runners/dace/transformations/map_fusion_utils.py
+0 −1		src/gt4py/next/program_processors/runners/dace/transformations/map_promoter.py
+5 −8		src/gt4py/next/program_processors/runners/dace/transformations/move_dataflow_into_if_body.py
+19 −35		src/gt4py/next/program_processors/runners/dace/transformations/simplify.py
+2 −1		src/gt4py/next/program_processors/runners/dace/transformations/splitting_tools.py
+8 −7		src/gt4py/next/program_processors/runners/dace/workflow/common.py
+12 −6		src/gt4py/next/program_processors/runners/dace/workflow/decoration.py
+24 −8		src/gt4py/next/program_processors/runners/dace/workflow/translation.py
+11 −0		src/gt4py/next/type_system/type_specifications.py
+2 −2		src/gt4py/storage/cartesian/layout.py
+8 −4		tests/cartesian_tests/integration_tests/multi_feature_tests/stencil_definitions.py
+4 −0		tests/cartesian_tests/integration_tests/multi_feature_tests/test_dace_parsing.py
+71 −0		tests/cartesian_tests/integration_tests/multi_feature_tests/test_math_functions.py
+152 −4		tests/cartesian_tests/unit_tests/frontend_tests/test_gtscript_frontend.py
+3 −1		tests/cartesian_tests/unit_tests/frontend_tests/test_ir_maker.py
+17 −0		tests/cartesian_tests/unit_tests/test_gtc/dace/test_oir_to_tasklet.py
+21 −0		tests/next_tests/integration_tests/feature_tests/ffront_tests/test_execution.py
+44 −1		tests/next_tests/integration_tests/feature_tests/ffront_tests/test_external_local_field.py
+10 −0		tests/next_tests/unit_tests/ffront_tests/test_func_to_foast.py
+63 −0		tests/next_tests/unit_tests/program_processor_tests/runners_tests/dace_tests/test_gtir_to_sdfg.py
+0 −1		...t_tests/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_constant_substitution.py
+0 −1		...unit_tests/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_copy_chain_remover.py
+2 −1		...s/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_horizontal_map_split_fusion.py
+1 −2		...ests/unit_tests/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_loop_blocking.py
+2 −2		..._tests/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_map_buffer_elimination.py
+0 −1		...ts/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_move_dataflow_into_if_body.py
+0 −5		...next_tests/unit_tests/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_strides.py
+11 −3		...sts/program_processor_tests/runners_tests/dace_tests/transformation_tests/test_vertical_map_split_fusion.py
+41 −12		uv.lock