Conversation

@rfratto rfratto commented Oct 15, 2025

The new workflow package introduces an abstraction over physical plans:

  • A physical plan is split into several parallelizable units called "tasks"; splits happen at pipeline breakers and at shards generated downstream of Parallelize nodes (chore(engine): add Parallelize hint node #19521).

  • Each task sends or receives data along "streams."

  • The "workflow" is the graph of tasks to execute for a given query. A workflow listens for task results from the root task.

Other engines typically call these units "fragments," with the collection of fragments constituting the "distributed query plan."

We don't use the standard term here since our usage is stateful. Workflows respond to tasks changing state, and will eventually be responsible for deciding when a task should be enqueued for running at all.

The workflow.Runner interface is introduced to represent the mechanism running tasks. A basic implementation is used for testing, but in production, workflow.Runner will be implemented by the scheduler.
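
For context, a minimal sketch of the shape described above. The method set and the Task stand-in here are assumptions, not the PR's actual API:

package workflow

import "context"

// Task is a stand-in for the task type introduced by this package.
type Task struct{ /* ULID, plan fragment, streams, ... */ }

// Runner is the mechanism that executes tasks. In production it is
// implemented by the scheduler; a basic in-process implementation is
// used for testing. (Sketch only; the real method set may differ.)
type Runner interface {
	// Run executes a task, reporting state changes back to the workflow
	// as the task progresses. (Hypothetical signature.)
	Run(ctx context.Context, task *Task) error
}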

@rfratto rfratto requested a review from a team as a code owner October 15, 2025 18:41

func toTreeNode(n Node) *tree.Node {
	treeNode := tree.NewNode(n.Type().String(), "")
	// Context lets the creator of a tree node attach an arbitrary value;
	// here we store the physical plan node itself.
	treeNode.Context = n
rfratto (Member Author) commented:
This is used by workflow.Sprint to allow workflow printing to hook into the tree produced by the physical plan and add additional context for nodes (the streams to write to or read from).

Comment on lines +12 to +13
// ULID is a unique identifier of the Task.
ULID ulid.ULID
@rfratto rfratto (Member Author) commented Oct 15, 2025

This could be UUIDv7, but I picked ULID here because:

  • Its output is smaller (26 vs 36 bytes), and
  • ULIDs have a canonical binary wire representation, unlike UUIDv7, allowing us to safely marshal its binary representation (16 bytes) rather than the text representation.

ULID also requires monotonic counters, guaranteeing that a single instance of a process can't generate collisions. With UUIDv7, monotonicity is recommended but not required, so it depends on the library you use.
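
For illustration, here's how a ULID with monotonic entropy can be generated in Go, assuming the github.com/oklog/ulid/v2 package (consistent with the ulid.ULID field type above):

package main

import (
	"crypto/rand"
	"fmt"
	"time"

	"github.com/oklog/ulid/v2"
)

func main() {
	// Monotonic entropy makes ULIDs generated within the same millisecond
	// strictly increasing, so a single process instance can't collide
	// with itself.
	entropy := ulid.Monotonic(rand.Reader, 0)
	id := ulid.MustNew(ulid.Timestamp(time.Now()), entropy)

	fmt.Println(id.String()) // 26-character text representation
	b, _ := id.MarshalBinary()
	fmt.Println(len(b)) // canonical 16-byte binary representation
}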

@rfratto rfratto force-pushed the thor-scheduler-workflow branch 2 times, most recently from b37ed12 to 356a763 on October 15, 2025 19:14

rfratto commented Oct 16, 2025

I'm going to move this back into draft; I can complete the implementation of workflow splitting now that #19521 is available, and #19524 will be merged soon.

@rfratto rfratto marked this pull request as draft October 16, 2025 16:11
@rfratto rfratto force-pushed the thor-scheduler-workflow branch 3 times, most recently from 49d9f8a to 43f7b67 on October 17, 2025 15:41
The scheduler prototype relied on the tree ID matching the node ID to
inject additional information into the tree (the streams used by that
node).

This commit introduces a workaround by allowing creators of a tree
node to inject arbitrary values as context.

Signed-off-by: Robert Fratto <[email protected]>
As the scheduler prototype will manipulate physical plans and split them
into smaller fragments, utility functions are needed to create physical
plans from DAGs and return the existing DAG for modification.

Signed-off-by: Robert Fratto <[email protected]>
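
A hedged sketch of what those utilities might look like; all types, names, and signatures below are hypothetical stand-ins, not the PR's actual API:

package physical

// Graph is a stand-in for the engine's DAG type.
type Graph struct{ /* nodes, edges */ }

// Plan is a physical plan backed by a DAG.
type Plan struct{ graph *Graph }

// FromDAG wraps an existing DAG in a Plan without copying it.
func FromDAG(g *Graph) *Plan { return &Plan{graph: g} }

// DAG returns the plan's underlying graph so callers, such as the
// scheduler, can split or otherwise modify it in place.
func (p *Plan) DAG() *Graph { return p.graph }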
A shardable node is a physical plan node that supports being split into
multiple smaller nodes.

Currently, ScanSet is the only shardable node, where each shard is one
of its targets (resulting in a DataObjScan node).

Shardable nodes will be used in task planning to create tasks downstream
of Parallelize.

Signed-off-by: Robert Fratto <[email protected]>
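
A hedged sketch of the Shardable concept from the commit above; the method name is an assumption, not the PR's actual API:

package physical

// Node is a stand-in for the engine's physical plan node interface.
type Node interface{}

// Shardable is a physical plan node that supports being split into
// multiple smaller nodes, one per shard. (Hypothetical method set.)
type Shardable interface {
	Node

	// Shards returns the replacement nodes, one per shard. For ScanSet,
	// each shard is one of its targets, yielding a DataObjScan node.
	Shards() []Node
}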
@rfratto rfratto force-pushed the thor-scheduler-workflow branch from 43f7b67 to 382c2c1 on October 17, 2025 16:02

rfratto commented Oct 17, 2025

This is ready for review again: I've integrated #19521 and #19524, and the workflow package now generates the representation of tasks I used in the scheduler prototype.

@rfratto rfratto assigned rfratto and unassigned rfratto Oct 17, 2025
@rfratto rfratto marked this pull request as ready for review October 17, 2025 16:18
@rfratto rfratto requested a review from spiridonov October 17, 2025 16:19

func (wf *Workflow) onTaskChange(ctx context.Context, task *Task, newState TaskState) {
	wf.tasksMut.Lock()
	wf.taskStates[task] = newState
A contributor commented:

Wdyt about forbidding a terminal task state from changing back to a non-terminal one? I think there could be race conditions otherwise. The lock is released on line 220 before the tasks are cancelled, so if a task becomes non-terminal again after the lock is released but before the children are cancelled, it might get stuck (?).

I understand that the above sounds very unlikely to happen, but the whole idea of a task being able to resurrect from a terminal state increases the complexity, so if it's not intended, why not forbid it explicitly? 😅

rfratto (Member Author) replied:

I don't think there's a race condition here; the most important thing is that onTaskChange can't be called recursively while the lock is held, which we prevent by releasing the lock right before canceling tasks. (It's fine if the tasks somehow revive themselves; that invocation of onTaskChange would wait for the previous invocation to exit).

I'm not sure we want onTaskChange to be responsible for rejecting state changes; workflow is more of a responder to whatever the runner decides the task state is. I'd like to revisit this once we have the worker in place and see how it feels, and then we can adjust based on that.
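
A minimal sketch of the locking pattern described above, filling in the rest of the excerpted function under assumptions (IsTerminal, childrenOf, and Cancel are hypothetical helpers). The mutex only guards the state map and is released before cancellation, so a re-entrant onTaskChange serializes behind this one instead of deadlocking:

func (wf *Workflow) onTaskChange(ctx context.Context, task *Task, newState TaskState) {
	wf.tasksMut.Lock()
	wf.taskStates[task] = newState
	terminal := newState.IsTerminal()
	children := wf.childrenOf(task)
	wf.tasksMut.Unlock()

	if !terminal {
		return
	}
	// Canceling children may trigger further onTaskChange calls; since
	// the lock is no longer held, those calls can acquire it in turn.
	for _, child := range children {
		child.Cancel(ctx)
	}
}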

@rfratto rfratto merged commit 6b96e4d into main Oct 21, 2025
65 checks passed
@rfratto rfratto deleted the thor-scheduler-workflow branch October 21, 2025 12:25