95 commits
9c3920d
[Graph] Add qd.checkpoint AST surface (slice 1a)
hughperkins Jun 4, 2026
ec5bdd2
[Graph] Plumb checkpoint_id end-to-end (slice 1b)
hughperkins Jun 4, 2026
1af49f0
[Graph] Add checkpoint IF-gate kernel source (slice 1c, part 1/N)
hughperkins Jun 4, 2026
1e18653
[Graph] Drop sm_110 from checkpoint gate fatbin script (CUDA 12.9 too…
hughperkins Jun 4, 2026
a5c614f
[Graph] GraphManager wires IF nodes per qd.checkpoint (slice 1c, CUDA…
hughperkins Jun 4, 2026
b4612a0
[Graph] Slice 1d (part 1): yield-check + cond-with-yield kernel sourc…
hughperkins Jun 5, 2026
e138cfd
[Graph] Slice 1d (part 2): regenerate fatbins for yield-check + cond-…
Jun 5, 2026
4492b35
[Graph] Slice 1d (part 3): wire yield-check kernel + cond-with-yield …
hughperkins Jun 5, 2026
79ebf57
[Graph] Slice 1d (part 4): tests for yield-check / yield-race / WHILE…
hughperkins Jun 5, 2026
28a2f29
[Graph] Slice 1d (fix): route first launch through launch_cached_grap…
hughperkins Jun 5, 2026
5dedc4e
[Graph] Slice 1d (tests): correct yield-first-wins semantics
hughperkins Jun 5, 2026
977f3f0
[Graph] Slice 1d (docs): document yield mechanism in graph.md + updat…
hughperkins Jun 5, 2026
bf906d7
[Graph] Slice 2 (impl): GraphStatus return value + kernel.resume(from…
hughperkins Jun 5, 2026
dd37b28
[Graph] Slice 2 (fix): move GraphStatus to its own module to avoid ci…
hughperkins Jun 5, 2026
788f607
[Graph] Slice 2 (tests): GraphStatus return + resume(from_checkpoint=…
hughperkins Jun 5, 2026
37ed4e4
[Graph] Slice 2 (docs): GraphStatus + kernel.resume() user guide
hughperkins Jun 5, 2026
4d5bc25
[Graph] Slice 3: port qipc test_resume_offset.cu scenarios A+B + rese…
hughperkins Jun 5, 2026
d7fe3f6
[Graph] Slice 3: regenerate condition kernel fatbin (cond-with-yield …
hughperkins Jun 5, 2026
7123030
[Graph] Slice 3 (fix): rewrite resume-offset tests to use for-loops s…
hughperkins Jun 5, 2026
c685f24
[Graph] Slice 3 (fix): use for _ in range(1) for scalar work inside c…
hughperkins Jun 5, 2026
0c94986
[Graph] Slice 7 (docs polish): backend-support table now lists qd.che…
hughperkins Jun 5, 2026
e35e9cb
[Graph] Slice 6: CPU fallback - host-branch gating for qd.checkpoint …
hughperkins Jun 5, 2026
83a6224
[Graph] Slice 6 (fix): clear resume_from_checkpoint after first WHILE…
hughperkins Jun 5, 2026
b265cdf
[Graph] Slice 6: gate yield bookkeeping on yield-capable kernels + op…
hughperkins Jun 5, 2026
e42c496
[Graph] Slice 7: graph.md final pass for slice 6 CPU coverage
hughperkins Jun 5, 2026
9a369c4
[Graph] Slice 4 (AMDGPU): sub-graph orchestration for qd.checkpoint
hughperkins Jun 5, 2026
22f8d26
[Graph] Slice 4 (AMDGPU): open up yield/resume tests on amdgpu (seque…
hughperkins Jun 5, 2026
22f2069
[Graph] Slice 4 (AMDGPU): WHILE + checkpoint via streaming-launcher h…
hughperkins Jun 5, 2026
a25c241
[Graph] Slice 5 (Vulkan/Metal): GFX-runtime host gating for qd.checkp…
hughperkins Jun 5, 2026
840ac3d
[Docs] Update graph.md for slice 4/5 (AMDGPU + Vulkan + Metal coverage)
hughperkins Jun 5, 2026
68a4d54
[Graph] Slice 4/5/6: arm64 also runs the CPU host-branch gating path
hughperkins Jun 5, 2026
cd2881d
[Graph] Slice 5 rewrite: Vulkan/Metal GPU-side gating via indirect di…
hughperkins Jun 5, 2026
fcc001f
[Graph] Pre-Hopper CUDA: GPU-side qd.checkpoint gating via codegen pr…
hughperkins Jun 5, 2026
559fbbf
amdgpu: WIP GPU-side checkpoint gating via codegen prologue + flat HI…
hughperkins Jun 5, 2026
fded02a
amdgpu graph: pass device ptr (not host vector addr) to yield-check k…
hughperkins Jun 5, 2026
73628e2
amdgpu streaming launcher: GPU-side checkpoint gating for graph_do_wh…
hughperkins Jun 5, 2026
01795ab
docs: AMDGPU qd.checkpoint is now GPU-side (codegen prologue + flat H…
hughperkins Jun 5, 2026
0983393
Merge branch 'main' into hp/graph-checkpoint
hughperkins Jun 11, 2026
5058e34
PR 725 review: tighten graph.md qd.checkpoint section + linter fix
hughperkins Jun 11, 2026
c8bfb4b
PR 725 lint: pre-commit run -a (clang-format, black, trailing whitesp…
hughperkins Jun 11, 2026
7c9b76a
PR 725 review round 2: graph.md tightening + AST detection of the bar…
hughperkins Jun 11, 2026
61ee679
PR 725 review round 3: auto-wrap bare stmts in qd.checkpoint, trim cp…
hughperkins Jun 11, 2026
a0f2b54
PR 725 fixes: pyright tuple, _resume_from_checkpoint compat, clang-ti…
hughperkins Jun 11, 2026
cf4633e
PR 725 CI: feature factorization (3 modules) + restore deleted ration…
hughperkins Jun 11, 2026
b6ca290
PR 725 fix: test_api -- register qd.checkpoint + qd.GraphStatus, hide…
hughperkins Jun 11, 2026
d2afddf
PR 725: rewrap comments to 120c per find_underwrapped.py audit
hughperkins Jun 11, 2026
b6ee9be
PR 725: rewrap comments to 120c per find_underwrapped.py audit (follo…
hughperkins Jun 11, 2026
2a483e7
PR 725 CI: address line wrapping, deleted comments, and SPIR-V test c…
hughperkins Jun 11, 2026
9240ca3
PR 725 CI: additional factorization + restore two more upstream comments
hughperkins Jun 11, 2026
fb799c6
PR 725: rewrap one more comment in checkpoint_yield_check_shader.h
hughperkins Jun 11, 2026
6ea701b
PR 725 fix: offload-pass cp_id propagation through intervening serial…
hughperkins Jun 11, 2026
8b45f47
PR 725 lint: rewrap mixed-width comment paragraphs to a single 120c p…
hughperkins Jun 11, 2026
dd9548b
PR 725 AMDGPU fix: add gfx1010/1011/1012 (RDNA1) to checkpoint yield-…
hughperkins Jun 12, 2026
d741a46
PR 725 AMDGPU fix: clang-format the regenerated HSACO header (and pre…
hughperkins Jun 12, 2026
197a892
qd.checkpoint: fuse adjacent bare statements into one wrapper task
hughperkins Jun 12, 2026
3402601
qd.checkpoint: reject bare top-level statements with a clear compile-…
hughperkins Jun 12, 2026
9e3eae5
PR 725 line-wrap: reflow three 80c comments flagged by the CI agent
hughperkins Jun 12, 2026
10c36a6
PR 725 line-wrap: reflow three more 80c comments flagged by the CI agent
hughperkins Jun 12, 2026
5b3e656
Merge branch 'main' into hp/graph-checkpoint
hughperkins Jun 12, 2026
0e26368
PR 725 line-wrap: reflow the last two 80c comments flagged by the CI …
hughperkins Jun 12, 2026
9aa7c69
Merge branch 'main' into hp/graph-checkpoint
hughperkins Jun 12, 2026
0bb6027
PR 725 line-wrap: reflow three more 80c comments flagged by the CI ag…
hughperkins Jun 12, 2026
935cc37
qd.checkpoint: mark as experimental in user-facing docs and docstrings
hughperkins Jun 13, 2026
7ddee8d
qd.checkpoint experimental note: drop the "pin to a version / file an…
hughperkins Jun 13, 2026
5d1c181
PR 725 line-wrap: bulk reflow of under-wrapped // comment runs across…
hughperkins Jun 13, 2026
7aa1f60
qd.checkpoint v2: auto-wrap top-level for-loops + IntEnum-friendly cp…
hughperkins Jun 17, 2026
b3e726e
qd.checkpoint v2: tests, docs, GraphStatus repr, line-wrapping pass
hughperkins Jun 17, 2026
9698f73
docs: drop converted-from-bare qd.checkpoint() calls from the graph.m…
hughperkins Jun 17, 2026
60290d7
qd.checkpoint: reject bare-Name cp_id (fastcache no-globals safety)
hughperkins Jun 17, 2026
aa408c6
docs: drop "auto-wrap" / "implicit checkpoint" from user-facing surface
hughperkins Jun 17, 2026
9e02885
docs: drop 'and are the expected pattern' from gdw + checkpoint note
hughperkins Jun 17, 2026
a7f198f
qd.checkpoint: re-allow bare-Name cp_id, just don't advertise it
hughperkins Jun 17, 2026
9d91543
docs: drop misleading 'first-yielder-wins-of-several' clause from gra…
hughperkins Jun 17, 2026
24139e0
qd.checkpoint: stop clearing user's yield_on flag in the yield-check
hughperkins Jun 17, 2026
be27aa3
amdgpu: regenerate checkpoint_yield_check HSACO after dropping yield_…
Jun 17, 2026
2db5db8
cuda: regenerate checkpoint_yield_check fatbin after dropping yield_o…
Jun 17, 2026
9a35cb9
docs: drop the skip-past-yielder paragraph (suggested API that doesn'…
hughperkins Jun 17, 2026
b0ae987
docs: spell out that yield_on must be initialised before the first la…
hughperkins Jun 17, 2026
c5fa6c9
docs: drop internal '(skip + yield_on=)' tag from qd.checkpoint row i…
hughperkins Jun 17, 2026
9e2cc90
docs: drop misleading 'kernel pauses at that checkpoint' from yield m…
hughperkins Jun 17, 2026
a3520a0
docs: drop the 'write the flag unconditionally' alternative pattern
hughperkins Jun 17, 2026
14edbf7
docs: GraphStatus returned by kernels with checkpoints=True (not 'at …
hughperkins Jun 17, 2026
5927d31
docs: 'iff a checkpoint' instead of 'iff some checkpoint'
hughperkins Jun 17, 2026
d3559df
docs: 'including from kernel.resume(...)' instead of 'and from'
hughperkins Jun 17, 2026
f21617d
Merge branch 'main' into hp/graph-checkpoint
hughperkins Jun 17, 2026
67da464
address review commetns
hughperkins Jun 17, 2026
a398139
precoommit
hughperkins Jun 17, 2026
e939301
kernel: extract qd.checkpoint plumbing into kernel_checkpoint.py
hughperkins Jun 17, 2026
6544ff6
fix CI failures from kernel_checkpoint extract
hughperkins Jun 17, 2026
0464423
wrap: tighten 3 comment runs flagged by AI bot on 6544ff668
hughperkins Jun 17, 2026
cb4f308
Merge origin/main into hp/graph-checkpoint (nested graph_do_while)
hughperkins Jun 18, 2026
28f630b
address comments
hughperkins Jun 18, 2026
1cf0343
add Resume where setion
hughperkins Jun 18, 2026
35e49d4
precommit
hughperkins Jun 18, 2026
3b84936
factor checkpoint launch helpers out of GfxRuntime::launch_kernel
hughperkins Jun 18, 2026
94 changes: 89 additions & 5 deletions docs/source/user_guide/graph.md
@@ -4,14 +4,11 @@ Graphs reduce kernel launch overhead by capturing a sequence of GPU operations i

## Backend support

Both features run on every backend. They are *hardware accelerated* on CUDA (via CUDA graphs) and AMDGPU (via HIP graphs); `graph_do_while` additionally requires CUDA SM 9.0+ / Hopper for its hardware-accelerated path. On other backends, `graph=True` is silently ignored and the kernel runs via the normal launch path, and `graph_do_while` falls back to a host-side do-while loop that copies the condition value GPU → host each iteration (causing a pipeline stall — see [Caveats](#caveats)).

| Feature | `qd.cuda` SM 9.0+ | `qd.cuda` < SM 9.0 | `qd.amdgpu` | `qd.metal` | `qd.vulkan` | `qd.cpu` |
| --- | --- | --- | --- | --- | --- | --- |
| `graph=True` | hardware accelerated | hardware accelerated | hardware accelerated | runs (no acceleration) | runs (no acceleration) | runs (no acceleration) |
| `graph_do_while` | hardware accelerated | host fallback | host fallback | host fallback | host fallback | host fallback |

AMDGPU `graph_do_while` falls back to the host-side loop because HIP does not currently expose conditional / while graph nodes (as of ROCm 7.2).
| `qd.graph_do_while` | hardware accelerated | host fallback | host fallback | host fallback | host fallback | host fallback |
| `qd.checkpoint` | GPU-side | GPU-side | GPU-side | GPU-side | GPU-side | host-side |

## Basic usage

@@ -154,3 +151,90 @@ Note: the basic `graph=True` path (without `graph_do_while`) does **not** stall
Therefore on unsupported platforms, you might consider creating a second implementation, which works differently. e.g.:
- fixed number of loop iterations, so no dependency on gpu data for kernel launch; combined perhaps with:
- make each kernel 'short-circuit', exit quickly, if the task has already been completed; to avoid running the GPU more than necessary

## Checkpoints with `qd.checkpoint` *(experimental)*

Collaborator Author

Note: after the backwards-incompatible disaster that was the algorithms.md changes for qipc, which kept changing 😅, I think I'd like to mark things as 'experimental' for a few weeks/months, until we are confident the API is stable.

Collaborator Author

Note to AI: this is not a request to you, it is an observation for other human reviewers.

Contributor

If checkpoint is a graph-specific API, it should be prefixed by 'graph_', as for all the other functions.

Contributor

Now that I think about it, we should probably just have a new 'qd.graph.' submodule. That would make everything both simpler and less confusing.

Collaborator Author

Seems reasonable. But perhaps not in this PR?

Contributor

In that case, make sure it is tracked somewhere. Still, it is weird to address this in another PR, since this is literally the PR that introduces this function. But I understand that you want to move faster and that qipc is already relying on this one.


> **Experimental.** `qd.checkpoint`, `qd.GraphStatus`, and `kernel.resume(from_checkpoint=...)` are experimental APIs. The shape of the public surface (the context-manager signature, the `@qd.kernel(checkpoints=True)` flag, the `GraphStatus` fields, the host-side resume loop, the error messages, and the cross-backend lowering details) may change in any future release without a deprecation cycle.

`qd.checkpoint` lets a graph kernel pause partway through, surface a reason to the host, let the host fix things up, and resume from where it paused on the next launch. An example use case is an algorithm implemented as a graph that may need to allocate additional memory partway through. If the graph operations are in-place, and therefore not idempotent, simply retrying the whole graph from the start is not an option.

Contributor

As I already mentioned, "graph operations" is not a standard term: you cannot use it without defining what it means, so either define it or reformulate this sentence to avoid that terminology.

I would suggest defining in parentheses what "idempotent" means in programming. It is not really common knowledge, and understanding what it means in this context is critical.

Collaborator Author

I would argue that idempotent is a standard programming term. It is not compiler-specific. It is not physics-simulation-specific. I've seen the term used by many engineers at my previous company, in standard PRs.

Collaborator Author

  • replaced 'graph operations' with 'operations in the graph'
  • replaced 'not idempotent' with 'cannot be rerun'

@duburcqa duburcqa Jun 18, 2026

Contributor

'cannot be rerun' without altering / corrupting the output


To use checkpoints:

1. Decorate the kernel with `@qd.kernel(graph=True, checkpoints=True)`.
2. Place `with qd.checkpoint(cp_id, yield_on=flag):` around any section of the body where you want to be able to pause and resume.

```python
from enum import IntEnum

import quadrants as qd

class Stage(IntEnum):
    SIM = 0

@qd.kernel(graph=True, checkpoints=True)
def step(
    arr: qd.types.ndarray(qd.f32, ndim=1),
    overflow_flag: qd.types.ndarray(qd.i32, ndim=0),
    newton_cond: qd.types.ndarray(qd.i32, ndim=0),
):
    while qd.graph_do_while(newton_cond):
        for i in range(arr.shape[0]):
            # ...
            pass
        with qd.checkpoint(Stage.SIM, yield_on=overflow_flag):
            for i in range(arr.shape[0]):
                # ...
                pass
        for i in range(arr.shape[0]):
            # ...
            pass
```

The `cp_id` argument is the label you'll use to identify the checkpoint from the host (in `GraphStatus.checkpoint` and `kernel.resume(from_checkpoint=...)`). It must be an int literal or an `IntEnum` value; the framework preserves the value as-is, so `qd.checkpoint(Stage.SIM, ...)` round-trips as `Stage.SIM` rather than the raw int. Labels must be unique within a kernel.
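The round-trip works because `IntEnum` members are `int` subclasses: they compare equal to their raw value but keep their enum identity. A plain-Python sketch of that property, with no Quadrants calls; `check_unique_labels` is a hypothetical helper illustrating the uniqueness rule, not part of the library:

```python
from enum import IntEnum


class Stage(IntEnum):
    SIM = 0
    SOLVE = 1


def check_unique_labels(labels):
    """Hypothetical helper: reject non-int labels and duplicate cp_id labels within one kernel."""
    seen = []
    for label in labels:
        if not isinstance(label, int):
            raise TypeError(f"cp_id must be an int or IntEnum value, got {label!r}")
        # Compare with ==, so Stage.SIM and a literal 0 count as the same label.
        if any(label == other for other in seen):
            raise ValueError(f"duplicate cp_id {label!r}")
        seen.append(label)
    return labels


label = Stage.SIM
print(isinstance(label, int))  # True: IntEnum members are ints
print(label == 0)              # True: equal to the raw value
print(repr(label))             # <Stage.SIM: 0>: the enum identity survives
check_unique_labels([Stage.SIM, Stage.SOLVE])
```

In this sketch `Stage.SIM` and a bare `0` collide as labels, since they compare equal.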

### Yield mechanism

When the body of a checkpoint writes a non-zero value into `yield_on[()]`:

1. Everything after the yielding checkpoint in the same launch is skipped.
2. `qd.checkpoint` will exit any surrounding `qd.graph_do_while`.

The framework never writes into your `yield_on` buffer — you own it end-to-end. That means:

- Before the **first** launch, initialise it to `0` (a freshly allocated `qd.ndarray` is not guaranteed to be zeroed).
- Before each **resume** launch, reset it to `0` (otherwise the body of the same checkpoint sees the stale non-zero value and yields again on the same condition, looping forever).
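To make the skip and loop-exit rules concrete, here is a pure-Python model of one launch whose body is a `qd.graph_do_while` loop containing a checkpoint. It models only the documented behaviour, not the GPU-side lowering; `run_once`, the stage tuples, and the dicts standing in for the `yield_on` and loop-condition buffers are all hypothetical:

```python
def run_once(stages, flag, cond):
    """Model one launch whose body is `while qd.graph_do_while(cond): <stages>`.

    stages: list of (cp_id_or_None, name, fn) in source order; a non-None cp_id marks a checkpoint.
    Returns (yielding cp_id or None, names of the stages that ran).
    """
    ran = []
    while True:  # do-while: the body runs before cond is tested
        for cp_id, name, fn in stages:
            fn()
            ran.append(name)
            if cp_id is not None and flag["value"] != 0:
                return cp_id, ran  # skip the rest of the launch and leave the loop
        if cond["value"] == 0:
            return None, ran


flag = {"value": 0}  # user-owned yield_on buffer, initialised before the first launch
cond = {"value": 3}  # loop condition; prep() decrements it
iters = {"count": 0}


def prep():
    iters["count"] += 1
    cond["value"] -= 1


def sim():
    if iters["count"] == 2:
        flag["value"] = 1  # e.g. "out of memory": ask the host for help


def post():
    pass


stages = [(None, "prep", prep), (0, "sim", sim), (None, "post", post)]
yielded_at, ran = run_once(stages, flag, cond)
print(yielded_at, ran)  # 0 ['prep', 'sim', 'post', 'prep', 'sim']
print(flag["value"])    # 1: nothing in the model clears the flag
```

On the second iteration `post` never runs and the loop exits even though the condition is still non-zero, and the flag stays set until the host clears it.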

Contributor

Could be worth using some ⚠️ tag.

Collaborator Author

wouldn't that be non-ASCII? 🤔

Contributor

No, it uses :warning:


### Host-side yield / resume loop

Kernels annotated with `checkpoints=True` return a `qd.GraphStatus` from every launch (including from `kernel.resume(...)`). The status carries two fields:

- `status.yielded` — `True` iff a checkpoint's `yield_on=` flag was non-zero during this launch.
- `status.checkpoint` — the `cp_id` label of the yielding checkpoint (or `None` when `yielded` is `False`).

Resume by calling `kernel.resume(..., from_checkpoint=label)`. Everything before `label` in source order is skipped on the resume launch; everything from `label` onward runs normally. The canonical host loop:

```python
overflow_flag[()] = 0  # initialise before the first launch
status = step(arr, overflow_flag, newton_cond)
while status.yielded:
    handle_overflow_for(status.checkpoint, ...)
    overflow_flag[()] = 0  # clear before resume, otherwise the same checkpoint yields again
    status = step.resume(arr, overflow_flag, newton_cond,
                         from_checkpoint=status.checkpoint)
```
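The host loop above can be modelled in plain Python to see why the pre-checkpoint work runs only once and why the flag must be cleared before resuming. This is a hypothetical model of the documented semantics for a straight-line body (no `graph_do_while`); `launch`, `Status`, and the dict-based flag are stand-ins, not Quadrants APIs. The checkpoint body is written in the check-then-commit style: it either does all of its work or changes nothing except the flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Status:
    """Stand-in for qd.GraphStatus; the field names follow the docs, the class itself is hypothetical."""

    yielded: bool
    checkpoint: Optional[int]


def launch(stages, flag, from_checkpoint=None):
    """Run stages (list of (cp_id_or_None, fn) in source order); skip everything before from_checkpoint."""
    start = 0
    if from_checkpoint is not None:
        labels = [cp_id for cp_id, _ in stages]
        if from_checkpoint not in labels:
            raise RuntimeError(f"no checkpoint labelled {from_checkpoint!r}")
        start = labels.index(from_checkpoint)
    for cp_id, fn in stages[start:]:
        fn()
        if cp_id is not None and flag["value"] != 0:
            return Status(yielded=True, checkpoint=cp_id)
    return Status(yielded=False, checkpoint=None)


events = []
capacity = {"value": 1}
flag = {"value": 0}  # initialise before the first launch


def prep():
    events.append("prep")  # in-place work that must not run twice


def alloc_and_sort():
    if capacity["value"] < 4:
        flag["value"] = 1  # not enough room: ask the host to grow it, change nothing else
        return
    events.append("sort")


def post():
    events.append("post")


stages = [(None, prep), (0, alloc_and_sort), (None, post)]
status = launch(stages, flag)
while status.yielded:
    capacity["value"] = 4  # host-side fix-up, e.g. grow a buffer
    flag["value"] = 0      # clear before resume, otherwise the same checkpoint yields again
    status = launch(stages, flag, from_checkpoint=status.checkpoint)

print(events)  # ['prep', 'sort', 'post']: prep ran exactly once
```

Dropping the `flag["value"] = 0` line makes this model loop forever: the resumed checkpoint still sees the stale flag and yields again.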

### Restrictions

Contributor

Could you explain / be more explicit about what happens during resume?

Could you state clearly that the entire checkpoint block is re-executed, and that it is the user's responsibility to ensure idempotent behaviour when a checkpoint is needed? Because if the state is altered during the checkpoint block, resuming is not going to save you, I guess? Whatever the answer, it should be very clear in the doc.

Beyond that, how does checkpointing work under the hood? Does it snapshot all the input data by copy before yielding, or does it just return? If no copy is made, this means that resuming must be done "right away", without further altering the data in between; otherwise it is some kind of undefined behaviour.

Another important point: what if I don't want to resume in such a case, and I just want to move on to another kernel and continue from there? Is that supported, or must resume happen?

I think it is essential to clarify all these points in the documentation.

Collaborator Author

checkpoint does NOT require idempotent behavior. This is the entire purpose of checkpoint: to be able to interrupt and resume graphs that are NOT idempotent.

Collaborator Author

ah, the checkpoint block itself. right.

@hughperkins hughperkins Jun 18, 2026

Collaborator Author

the checkpoint block itself actually does not so much require idempotence, as requiring that it is atomic: it either succeeds completely, or fails without changing anything.

@hughperkins hughperkins Jun 18, 2026

Collaborator Author

as an example, in the case of allocation issues, the checkpoint block looks like:

  • do we have enough memory available?
    • no: exit now
    • yes: ok, let's proceed with running the sort etc.

Collaborator Author

added 'resume where' section

Contributor

> the checkpoint block itself actually does not so much require idempotence, as requiring that it is atomic: it either succeeds completely, or fails without changing anything.

Yeah, this is exactly what I meant by « ensure idempotent behaviour when checkpoint is needed »

Collaborator Author

"fails without changing anything" I feel is not idempotent? Idempotent means that calling the function multiple times is identical in effect to calling it once. But if it fails the first time, it would only be idempotent if it always failed thereafter I feel?


- Must be used inside `@qd.kernel(graph=True, checkpoints=True)`. Without the flag, `qd.checkpoint(...)` raises `QuadrantsSyntaxError` at compile time with a fix-it pointing at `checkpoints=True`.
- `cp_id` must be an int literal or an `IntEnum` value, and must be unique across the kernel.
- `yield_on=` must be a kernel parameter that is a 0-d `qd.types.ndarray(qd.i32, ndim=0)`; expressions are not supported.
- Checkpoints cannot be nested inside other checkpoints. Checkpoints inside a `qd.graph_do_while` body are fine.
- The body of a `with qd.checkpoint(...)` block cannot contain bare top-level statements (assignments, augmented assignments, or bare call/expression statements). Every top-level statement must be inside a `for`-loop (or other control-flow construct). A docstring as the first statement is allowed. Bare statements raise `QuadrantsSyntaxError` at compile time with a fix-it pointing at the explicit one-iteration `for`-wrap:

```python
with qd.checkpoint(0, yield_on=flag):
    for _ in range(1):
        c[()] = c[()] + 1
    for i in range(arr.shape[0]):
        arr[i] = arr[i] + 1
```

The restriction is by design: each top-level statement inside a checkpoint becomes its own GPU task / graph node, so silently wrapping bare statements would hide the fact that a sequence of N field writes balloons into N kernel launches. Forcing users to write the `for`-wrap themselves keeps the lowering visible and gives a single obvious place to fuse multiple writes into one task: put them all inside one shared wrapper.
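A rough sketch of the bare-statement check described above, using only the standard `ast` module. It is a hypothetical simplification, not the actual `CheckpointTransformer` logic: `find_bare_statements` flags assignments, augmented assignments, and expression statements at the top level of a `with` body, allows a leading docstring, and (as an assumption) also treats annotated assignments as bare.

```python
import ast

# Statement types that count as "bare" at the top level of a checkpoint body (AnnAssign is an assumption).
BARE_STMT_TYPES = (ast.Assign, ast.AugAssign, ast.AnnAssign, ast.Expr)


def find_bare_statements(with_node: ast.With) -> list:
    """Return top-level statements of a `with` body that are not wrapped in control flow."""
    bare = []
    for index, stmt in enumerate(with_node.body):
        is_docstring = (
            index == 0
            and isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Constant)
            and isinstance(stmt.value.value, str)
        )
        if isinstance(stmt, BARE_STMT_TYPES) and not is_docstring:
            bare.append(stmt)
    return bare


source = """
with qd.checkpoint(0, yield_on=flag):
    c[()] = c[()] + 1
    for i in range(n):
        arr[i] = arr[i] + 1
"""
with_node = ast.parse(source).body[0]
for stmt in find_bare_statements(with_node):
    print(f"line {stmt.lineno}: bare {type(stmt).__name__}")  # line 3: bare Assign
```

Control-flow statements such as `for`, `if`, and `while` pass the check, which is why the one-iteration `for _ in range(1):` wrap satisfies it.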
2 changes: 2 additions & 0 deletions python/quadrants/lang/__init__.py
@@ -35,9 +35,11 @@
    "enums",
    "exception",
    "expr",
    "graph_status",
    "impl",
    "inspect",
    "kernel_arguments",
    "kernel_checkpoint",
    "kernel_impl",
    "matrix",
    "mesh",
46 changes: 46 additions & 0 deletions python/quadrants/lang/_quadrants_callable.py
@@ -96,6 +96,48 @@ def __init__(self, fn: Callable, wrapper: Callable) -> None:
    def __call__(self, *args, **kwargs):
        return self.wrapper.__call__(*args, **kwargs)

    def resume(self, *args, from_checkpoint, **kwargs):
        """Continues a paused graph kernel from the checkpoint labelled ``from_checkpoint``.

        .. warning::

            **Experimental.** ``kernel.resume`` is part of the experimental ``qd.checkpoint`` surface; the signature
            (in particular the ``from_checkpoint=`` kwarg) and behaviour may change in any future release without a
            deprecation cycle.

        Use only on ``@qd.kernel(graph=True, checkpoints=True)`` kernels with at least one
        ``qd.checkpoint(cp_id, yield_on=flag)`` block. ``from_checkpoint`` is a ``cp_id`` label (typically an
        ``IntEnum`` value, often ``status.checkpoint`` from the previous launch): everything before that label in
        source order is skipped on this launch, and execution continues from there. The host loop pattern is::

            from enum import IntEnum

            class Stage(IntEnum):
                SIM = 0

            overflow_flag[()] = 0  # initialise before the first launch
            status = step(arr, overflow_flag, newton_cond)
            while status.yielded:
                handle(status.checkpoint, ...)
                overflow_flag[()] = 0  # the framework never clears your yield_on flag
                status = step.resume(arr, overflow_flag, newton_cond,
                                     from_checkpoint=status.checkpoint)

        Returns the same ``GraphStatus`` shape as the plain call.

        Raises ``RuntimeError`` if invoked on a kernel without any ``yield_on=`` checkpoint, or if ``from_checkpoint``
        does not match any declared ``cp_id`` in the kernel.
        """
        if not isinstance(from_checkpoint, int):
            raise RuntimeError(
                f"from_checkpoint= must be an int or IntEnum value matching a `qd.checkpoint(cp_id=...)` label in "
                f"the kernel (typically `status.checkpoint` from the previous launch's GraphStatus); "
                f"got {from_checkpoint!r}."
            )
        # Smuggle the resume cookie past the AST-mapped kwargs path; `Kernel.__call__` pops it before anything else
        # looks at kwargs.
        return self.wrapper.__call__(*args, _qd_from_checkpoint=from_checkpoint, **kwargs)
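The receiving end of `_qd_from_checkpoint` is not part of this hunk; the comment only says that `Kernel.__call__` pops it before anything else reads kwargs. A minimal, self-contained sketch of that handoff, where `FakeKernel` and `FakeCallable` are hypothetical stand-ins rather than the real Quadrants classes:

```python
_RESUME_KWARG = "_qd_from_checkpoint"


class FakeKernel:
    """Hypothetical stand-in for Kernel: records which checkpoint a launch resumes from."""

    def __call__(self, *args, **kwargs):
        # Pop the cookie first so the regular argument mapping never sees it.
        from_checkpoint = kwargs.pop(_RESUME_KWARG, None)
        return {"args": args, "kwargs": kwargs, "from_checkpoint": from_checkpoint}


class FakeCallable:
    """Hypothetical stand-in for QuadrantsCallable, mirroring the resume() shape in this diff."""

    def __init__(self, wrapper):
        self.wrapper = wrapper

    def __call__(self, *args, **kwargs):
        return self.wrapper(*args, **kwargs)

    def resume(self, *args, from_checkpoint, **kwargs):
        if not isinstance(from_checkpoint, int):
            raise RuntimeError(f"from_checkpoint= must be an int or IntEnum value; got {from_checkpoint!r}.")
        return self.wrapper(*args, _qd_from_checkpoint=from_checkpoint, **kwargs)


step = FakeCallable(FakeKernel())
print(step(1, 2)["from_checkpoint"])                         # None
print(step.resume(1, 2, from_checkpoint=3)["from_checkpoint"])  # 3
```

Because `bool` is a subclass of `int`, an `isinstance(from_checkpoint, int)` check like the one above (and in this diff) also accepts `True` and `False`.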

    def __get__(self, instance, owner):
        if instance is None:
            return self
@@ -125,3 +167,7 @@ def __setattr__(self, k: str, v: Any) -> None:
    def grad(self, *args, **kwargs) -> "Kernel":
        assert self.quadrants_callable._adjoint is not None
        return self.quadrants_callable._adjoint(self.instance, *args, **kwargs)

    def resume(self, *args, from_checkpoint, **kwargs):
        """Bound-method form of `QuadrantsCallable.resume` (see that docstring)."""
        return self.quadrants_callable.resume(self.instance, *args, from_checkpoint=from_checkpoint, **kwargs)
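For context on why the bound wrapper needs its own `resume`: when a kernel is accessed through an instance, `__get__` appears to hand back a wrapper that prepends `self.instance` (as `grad` does), so `resume` has to be forwarded the same way. A plain-Python sketch of that descriptor pattern; `KernelCallable`, `BoundKernelCallable`, and `Solver` are hypothetical names, and only the `quadrants_callable` / `instance` attribute names come from the diff:

```python
class KernelCallable:
    """Hypothetical unbound callable with a resume() method."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *args, **kwargs):
        return self.fn(*args, **kwargs)

    def resume(self, *args, from_checkpoint, **kwargs):
        return ("resume", from_checkpoint, self.fn(*args, **kwargs))

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class: hand back the unbound callable
        return BoundKernelCallable(self, instance)


class BoundKernelCallable:
    def __init__(self, quadrants_callable, instance):
        self.quadrants_callable = quadrants_callable
        self.instance = instance

    def __call__(self, *args, **kwargs):
        return self.quadrants_callable(self.instance, *args, **kwargs)

    def resume(self, *args, from_checkpoint, **kwargs):
        # Prepend the instance and forward from_checkpoint by keyword, as in the diff.
        return self.quadrants_callable.resume(self.instance, *args, from_checkpoint=from_checkpoint, **kwargs)


class Solver:
    def __init__(self, scale):
        self.scale = scale

    @KernelCallable
    def step(self, x):
        return self.scale * x


s = Solver(10)
print(s.step(2))                            # 20
print(s.step.resume(2, from_checkpoint=1))  # ('resume', 1, 20)
```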
29 changes: 28 additions & 1 deletion python/quadrants/lang/ast/ast_transformer.py
@@ -26,6 +26,9 @@
    get_decorator,
)
from quadrants.lang.ast.ast_transformers.call_transformer import CallTransformer
from quadrants.lang.ast.ast_transformers.checkpoint_transformer import (
    CheckpointTransformer,
)
from quadrants.lang.ast.ast_transformers.function_def_transformer import (
    FunctionDefTransformer,
)
@@ -1362,6 +1365,13 @@ def _is_graph_do_while_call(node: ast.expr) -> str | None:
            return node.args[0].id
        return None

    @staticmethod
    def _is_checkpoint_call(node: ast.expr, global_vars: dict):
        """Thin forwarding wrapper around ``CheckpointTransformer.is_checkpoint_call``; the actual logic lives in module
        ``ast_transformers/checkpoint_transformer.py`` to keep this file from growing per-feature. Returns a
        ``CheckpointCallInfo`` or ``None``."""
        return CheckpointTransformer.is_checkpoint_call(node, global_vars)

    @staticmethod
    def build_While(ctx: ASTTransformerFuncContext, node: ast.While) -> None:
        if node.orelse:
@@ -1575,15 +1585,32 @@ def build_With(ctx: ASTTransformerFuncContext, node: ast.With) -> None:
            raise QuadrantsSyntaxError("'with ... as ...' is not supported in Quadrants kernels")
        if not isinstance(item.context_expr, ast.Call):
            raise QuadrantsSyntaxError("'with' in Quadrants kernels requires a call expression")

        checkpoint_info = ASTTransformer._is_checkpoint_call(item.context_expr, ctx.global_vars)
        if checkpoint_info is not None:
            return ASTTransformer._build_checkpoint_with(ctx, node, checkpoint_info)

        if not FunctionDefTransformer._is_stream_parallel_with(node, ctx.global_vars):
            raise QuadrantsSyntaxError("'with' in Quadrants kernels only supports qd.stream_parallel()")
            raise QuadrantsSyntaxError(
                "'with' in Quadrants kernels only supports qd.stream_parallel() or qd.checkpoint()"
            )
        if not ctx.is_kernel:
            raise QuadrantsSyntaxError("qd.stream_parallel() can only be used inside @qd.kernel, not @qd.func")
        ctx.ast_builder.begin_stream_parallel()
        build_stmts(ctx, node.body)
        ctx.ast_builder.end_stream_parallel()
        return None

    @staticmethod
    def _build_checkpoint_with(
        ctx: ASTTransformerFuncContext,
        node: ast.With,
        info,
    ) -> None:
        """Thin forwarding wrapper around ``CheckpointTransformer.build_checkpoint_with``; the actual logic lives in
        ``ast_transformers/checkpoint_transformer.py``."""
        return CheckpointTransformer.build_checkpoint_with(ctx, node, info, build_stmts)
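A simplified sketch of the dispatch order in `build_With` (try `qd.checkpoint` first, then `qd.stream_parallel`). Unlike the real `CheckpointTransformer.is_checkpoint_call`, which receives `global_vars` and returns a `CheckpointCallInfo`, this version matches the `qd.checkpoint` spelling purely syntactically and returns a hypothetical `CallInfo` tuple; the `SyntaxError` type and messages are illustrative:

```python
import ast
from typing import NamedTuple, Optional


class CallInfo(NamedTuple):
    """Hypothetical result type; the real CheckpointCallInfo's fields are not shown in this diff."""

    cp_id_node: ast.expr
    yield_on: Optional[str]


def is_checkpoint_call(node: ast.expr) -> Optional[CallInfo]:
    """Match `qd.checkpoint(cp_id, yield_on=name)` by spelling alone (no global_vars lookup)."""
    if not isinstance(node, ast.Call):
        return None
    func = node.func
    if not (
        isinstance(func, ast.Attribute)
        and func.attr == "checkpoint"
        and isinstance(func.value, ast.Name)
        and func.value.id == "qd"
    ):
        return None
    cp_id = node.args[0] if node.args else None
    yield_on = None
    for kw in node.keywords:
        if kw.arg == "cp_id":
            cp_id = kw.value
        elif kw.arg == "yield_on":
            # The docs require a kernel parameter name here; expressions are rejected.
            if not isinstance(kw.value, ast.Name):
                raise SyntaxError("yield_on= must name a kernel parameter")
            yield_on = kw.value.id
    if cp_id is None:
        raise SyntaxError("qd.checkpoint() requires a cp_id")
    return CallInfo(cp_id, yield_on)


def dispatch_with(stmt: ast.With) -> str:
    """Pick a lowering for a `with` statement, trying qd.checkpoint before qd.stream_parallel."""
    call = stmt.items[0].context_expr
    if is_checkpoint_call(call) is not None:
        return "checkpoint"
    if isinstance(call, ast.Call) and ast.unparse(call.func) == "qd.stream_parallel":
        return "stream_parallel"
    raise SyntaxError("'with' in Quadrants kernels only supports qd.stream_parallel() or qd.checkpoint()")


stmt = ast.parse("with qd.checkpoint(Stage.SIM, yield_on=overflow_flag):\n    pass\n").body[0]
info = is_checkpoint_call(stmt.items[0].context_expr)
print(ast.unparse(info.cp_id_node), info.yield_on)  # Stage.SIM overflow_flag
print(dispatch_with(stmt))  # checkpoint
```

Checking for the checkpoint call first lets the existing `qd.stream_parallel` path stay unchanged apart from its error message.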

    @staticmethod
    def build_Pass(ctx: ASTTransformerFuncContext, node: ast.Pass) -> None:
        return None