Feature request: Durable Actions with pluggable workflow engines #318
FredKSchott
started this conversation in
Feature Request
Replies: 1 comment
Another provider integration to look at: openworkflow.dev
Summary
Explore durable Actions as a follow-up to RFC #312: Actions as reusable finite agent orchestration.
The proposed direction is to add a durable Action variant with a Temporal/Inngest/Cloudflare Workflows-style step API:
A configured durability engine would checkpoint completed steps and resume interrupted Action executions. Without such an engine, the same Action could still run locally through an in-process step implementation, but without restart guarantees.
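As a rough sketch only, a durable Action with a step API might look like the following. The signature of `defineDurableAction`, the `Step` interface, the `localStep` object, and the `summarize` Action are all assumptions made for illustration; none of them is confirmed Flue API. `localStep` stands in for the no-engine case and offers no restart guarantees.

```typescript
// Hypothetical shapes: nothing here is confirmed Flue API.
interface Step {
  // Runs `fn` under a stable step key; an engine may checkpoint the result.
  run<T>(key: string, fn: () => Promise<T>): Promise<T>;
}

interface DurableAction<I, O> {
  name: string;
  durable: true;
  run(input: I, step: Step): Promise<O>;
}

// Assumed helper: marks an Action as written for checkpointed replay.
// It does not choose or configure a provider.
function defineDurableAction<I, O>(
  name: string,
  run: (input: I, step: Step) => Promise<O>,
): DurableAction<I, O> {
  return { name, durable: true, run };
}

// In-process step: executes the callback directly, with no checkpointing.
const localStep: Step = {
  run: (_key, fn) => fn(),
};

const summarize = defineDurableAction(
  "summarize",
  async (input: { topic: string }, step) => {
    const source = await step.run("source", async () => `notes on ${input.topic}`);
    const review = await step.run("review", async () => `reviewed: ${source}`);
    return { review };
  },
);
```

Calling `summarize.run({ topic: "flue" }, localStep)` runs both callbacks in process. A configured engine would supply its own `Step` implementation in place of `localStep`.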
This is a feature direction, not a complete specification. The goal is to validate the model and gather feedback before committing to provider contracts or exact APIs.
Motivation
Flue is intended to be a durable agent framework, but a long-running Action called by an Agent can still be vulnerable to process failure.
For example, an Action may perform 30 minutes of finite orchestration on behalf of an Agent. If the process dies after 20 minutes, Flue needs to know:
The new Action primitive proposed in RFC #312 is a natural place for durable checkpoints. Actions already represent finite orchestration shared by Agents and Workflows, while Agents and Workflows provide different execution roots.
Proposed model
Ordinary Actions
An ordinary Action executes directly without creating a durable-engine run:
This avoids durable execution overhead for small helpers and fast orchestration.
Durable Actions
A durable Action opts into replay-compatible semantics and receives `step`:
`defineDurableAction()` declares that the Action is written for checkpointed replay. It does not itself choose or configure a provider.
The distinction is primarily about execution cost and programming constraints:
- `defineAction()` runs as a normal function through Flue’s Action executor.
- `defineDurableAction()` may create a Cloudflare Workflow, Inngest function run, Temporal execution, or another provider-backed durable execution.
Agents and Workflows should not need their own `durable: true` flags. They use the configured durability engine automatically whenever they invoke a durable Action.
Durability engine
Durability should generally be configured once for the application or deployment target rather than repeated across every Agent and Workflow.
Conceptually:
Or target-specific configuration:
The exact configuration surface is unresolved. It may be implemented through build/target plugins even if “plugin” is not the user-facing mental model.
The important ownership model is:
A future named engine registry could support rare cases where different execution roots require different providers, but this should not shape the initial API without a concrete use case.
Execution semantics
Durable Actions would use checkpoint replay rather than serialized JavaScript continuations.
After an interruption, the engine would:
For example:
On replay, the Action starts again, but completed `source` and `review` steps return their checkpointed outputs instead of rerunning.
This means code outside step callbacks may execute more than once and should be deterministic and free of durable side effects.
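The replay behavior can be illustrated with an in-memory checkpoint store. This is a minimal sketch under stated assumptions: `createReplayStep`, the `Map`-based store, the `publish` step, and the simulated crash are all hypothetical and stand in for whatever a real engine provides.

```typescript
// Illustrative only: an in-memory Map stands in for an engine's checkpoint store.
type Checkpoints = Map<string, string>; // step key -> JSON-serialized output

function createReplayStep(checkpoints: Checkpoints) {
  return {
    async run<T>(key: string, fn: () => Promise<T>): Promise<T> {
      const saved = checkpoints.get(key);
      if (saved !== undefined) {
        // Completed step: return the checkpointed output without rerunning.
        return JSON.parse(saved) as T;
      }
      const result = await fn();
      checkpoints.set(key, JSON.stringify(result));
      return result;
    },
  };
}

const calls: string[] = []; // records which callbacks actually executed

async function action(checkpoints: Checkpoints, crashAfterReview: boolean): Promise<string> {
  const step = createReplayStep(checkpoints);
  const source = await step.run("source", async () => {
    calls.push("source");
    return "draft";
  });
  const review = await step.run("review", async () => {
    calls.push("review");
    return `${source} (reviewed)`;
  });
  if (crashAfterReview) throw new Error("simulated process failure");
  return step.run("publish", async () => {
    calls.push("publish");
    return `published: ${review}`;
  });
}

async function runWithOneCrash(): Promise<string> {
  const checkpoints: Checkpoints = new Map();
  try {
    await action(checkpoints, true); // first attempt dies after "review"
  } catch {
    // A real engine would detect the interruption and schedule a replay.
  }
  return action(checkpoints, false); // replay from the top of the Action
}
```

After `runWithOneCrash()` resolves, `calls` contains each step key once. The replay re-entered `action` from the top, but only the `publish` callback ran during it.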
Step contract
The smallest useful API may look like:
Potential later configuration:
Step identities must be explicit and stable. Completed outputs must be serializable. A replay encountering an existing step key with incompatible inputs or configuration should fail rather than silently return stale data.
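One way to enforce the fail-on-mismatch rule is to store a fingerprint of the inputs next to each checkpointed output. The sketch below is an assumption, not a proposed contract: `StepRecord` and `runStep` are hypothetical names, and `JSON.stringify` is used as a deliberately naive fingerprint that is sensitive to object key order.

```typescript
// Sketch under assumptions: names and record layout are hypothetical.
interface StepRecord {
  inputs: string; // JSON fingerprint of the inputs the step completed with
  output: string; // JSON-serialized output
}

async function runStep<I, O>(
  store: Map<string, StepRecord>,
  key: string,
  inputs: I,
  fn: (inputs: I) => Promise<O>,
): Promise<O> {
  const fingerprint = JSON.stringify(inputs); // naive: sensitive to key order
  const existing = store.get(key);
  if (existing) {
    if (existing.inputs !== fingerprint) {
      // Same key, different inputs: refuse to hand back stale data.
      throw new Error(`step "${key}" was recorded with different inputs`);
    }
    return JSON.parse(existing.output) as O;
  }
  const output = await fn(inputs);
  const serialized = JSON.stringify(output);
  if (serialized === undefined) {
    // Functions, symbols, and undefined have no JSON representation.
    throw new TypeError(`step "${key}" returned a non-serializable value`);
  }
  store.set(key, { inputs: fingerprint, output: serialized });
  return output;
}
```

A production implementation would likely need a canonical hash and would also compare step configuration, which this sketch leaves out.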
Potential future APIs include:
Timers, external events, parallelism, and compensation are larger additions and should not be implied by the first `step.run()` implementation.
Durability guarantees
A checkpointed result may be reused without rerunning its callback, but a generic step callback cannot be promised exactly-once execution.
There is always a possible failure window:
The callback may therefore execute again. Durable Actions should expose a stable idempotency key and clearly document at-least-once callback semantics.
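The failure window and the role of an idempotency key can be sketched as follows. Everything here is assumed for illustration: the `runId:key` format of the idempotency key, the `sendOnce` stand-in for an external service, and the `crashBeforeCheckpoint` flag that simulates a process dying after the callback returns but before its result is persisted.

```typescript
// Sketch only: the key format and the fake external service are assumptions.
const delivered = new Map<string, string>(); // idempotency key -> message ("external system")
const checkpoints = new Map<string, string>(); // step identity -> serialized output
let callbackRuns = 0;

// External side effect that deduplicates on an idempotency key.
function sendOnce(idempotencyKey: string, message: string): void {
  if (!delivered.has(idempotencyKey)) delivered.set(idempotencyKey, message);
}

async function runStep<T>(
  runId: string,
  key: string,
  fn: (ctx: { idempotencyKey: string }) => Promise<T>,
  crashBeforeCheckpoint = false,
): Promise<T> {
  const id = `${runId}:${key}`; // stable across replays of the same run and step
  const saved = checkpoints.get(id);
  if (saved !== undefined) return JSON.parse(saved) as T;
  const result = await fn({ idempotencyKey: id });
  // Failure window: the callback finished but its result is not yet persisted.
  if (crashBeforeCheckpoint) throw new Error("simulated crash before checkpoint");
  checkpoints.set(id, JSON.stringify(result));
  return result;
}

async function notify(runId: string, crash: boolean): Promise<string> {
  return runStep(
    runId,
    "notify",
    async ({ idempotencyKey }) => {
      callbackRuns++;
      sendOnce(idempotencyKey, "build finished");
      return "sent";
    },
    crash,
  );
}
```

Running `notify("run-1", true)` and then `notify("run-1", false)` executes the callback twice, yet `delivered` ends up with a single entry because the external call is keyed on the same idempotency key both times.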
Only serializable values should cross step boundaries. Live values such as these must remain inside callbacks and be reacquired after replay:
Sandbox recovery
Durable Action recovery should reconnect to the existing sandbox rather than silently create a fresh one.
Flue’s intended durability model assumes that an Agent or Workflow can reacquire the same workspace and continue from persisted session and filesystem state. Replaying completed filesystem steps against an empty replacement sandbox would be incorrect: the engine might skip a completed clone or build step even though its files no longer exist.
A durability-capable sandbox therefore needs a stable, persisted lease or identity that the runner can reacquire after interruption.
If the original sandbox cannot be recovered, the Action should fail or enter a blocked state rather than pretending that checkpoint replay is safe. Automatic workspace reconstruction or snapshots would be a separate feature.
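The recover-or-block rule could be expressed roughly like this. The `Sandbox` and `Recovery` types, the `liveSandboxes` registry, and the lease ID format are hypothetical; Flue's actual sandbox interface is not specified in this proposal.

```typescript
// Hypothetical types: Flue's actual sandbox interface is not specified here.
interface Sandbox {
  leaseId: string;
  files: Map<string, string>;
}

type Recovery =
  | { status: "resumed"; sandbox: Sandbox }
  | { status: "blocked"; reason: string };

// Stand-in for a provider that keeps sandboxes alive across runner restarts.
const liveSandboxes = new Map<string, Sandbox>();

function acquireSandbox(): Sandbox {
  const sandbox: Sandbox = {
    leaseId: `lease-${liveSandboxes.size + 1}`,
    files: new Map(),
  };
  liveSandboxes.set(sandbox.leaseId, sandbox);
  return sandbox;
}

// On recovery, reattach by the persisted lease ID; never substitute a fresh sandbox.
function recoverSandbox(persistedLeaseId: string): Recovery {
  const sandbox = liveSandboxes.get(persistedLeaseId);
  if (!sandbox) {
    return {
      status: "blocked",
      reason: `sandbox ${persistedLeaseId} is no longer available`,
    };
  }
  return { status: "resumed", sandbox };
}
```

Returning a `blocked` result instead of a new sandbox keeps checkpoint replay from skipping a completed clone or build step whose files no longer exist.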
Provider integrations
Cloudflare Workflows
Cloudflare Workflows appears to be a direct fit:
- `step.run()` maps to `step.do()`.
Flue would still own its Action API, scoped harness behavior, sandbox reconnection, Agent integration, schemas, and normalized events.
Inngest
Inngest is also a close conceptual fit. Inngest functions replay from the beginning and memoize completed `step.run()` results.
A Flue integration could register discovered durable execution roots as Inngest functions and adapt the Flue step facade onto Inngest’s durable step API.
Temporal
Temporal integration appears possible but less direct.
Temporal separates deterministic Workflow code from side-effecting Activities. Arbitrary inline callbacks that capture a Flue harness cannot necessarily be serialized or registered as Activities without build-time support.
Possible approaches include:
Temporal should remain a potential provider, but the initial portable contract should not claim that every provider can execute arbitrary inline closures identically.
Trigger.dev
Trigger.dev is a plausible engine for long-running Action execution and retries. Its task model may map more naturally to whole Actions or generated named subtasks than to inline checkpoint callbacks.
This is still useful, but it may offer different capabilities from Cloudflare Workflows or Inngest. Provider adapters should declare the durable primitives they support rather than Flue silently weakening guarantees.
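Capability declaration might be modeled along these lines. This is a sketch under assumptions: the primitive names, the `EngineAdapter` shape, and both example adapters are invented for illustration and make no claim about which primitives any real provider supports.

```typescript
// Illustrative capability model; the primitive names are assumptions.
type Primitive = "inlineStep" | "namedTask" | "timer" | "externalEvent";

interface EngineAdapter {
  name: string;
  supports: ReadonlySet<Primitive>;
}

// Fail loudly when an Action needs a primitive the engine does not declare,
// rather than silently weakening guarantees.
function assertSupported(adapter: EngineAdapter, required: Primitive[]): void {
  const missing = required.filter((p) => !adapter.supports.has(p));
  if (missing.length > 0) {
    throw new Error(`${adapter.name} does not support: ${missing.join(", ")}`);
  }
}

// Hypothetical adapters, not descriptions of real providers.
const inlineEngine: EngineAdapter = {
  name: "inline-engine",
  supports: new Set<Primitive>(["inlineStep", "timer"]),
};
const taskEngine: EngineAdapter = {
  name: "task-engine",
  supports: new Set<Primitive>(["namedTask"]),
};
```

A check such as `assertSupported(taskEngine, ["inlineStep"])` would throw at registration or build time, before any Action run depends on a primitive the engine lacks.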
Durable Actions called by Agents
The most valuable and difficult use case is an Agent invoking a durable Action as a model tool.
Flue would need to:
A model-invoked durable Action remains an Action invocation, not a Workflow Run. It needs internal execution identity and status, while only a `createWorkflow()` binding creates the public Workflow Run envelope discussed in RFC #312.
This integration should likely follow top-level durable Workflow Actions because it changes Agent tool-call recovery and settlement semantics.
Local behavior without an engine
A durable Action should remain runnable when no external durability engine is configured, especially for local development and reusable packages.
In that mode:
- `step.run()` executes the callback in process;
This lets authors share the same durable Action code regardless of deployment environment.
The local implementation must not imply guarantees it does not provide. Tooling and documentation should clearly identify whether the current runner has a durable engine attached.
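One possible way to keep the local mode honest is for the runner to report what it can guarantee. In this sketch, `createRunner`, `RunnerInfo`, and the `survivesRestart` field are hypothetical names, and treating any attached engine as restart-safe is a simplifying assumption.

```typescript
// Sketch: `createRunner` and `RunnerInfo` are hypothetical names.
interface Step {
  run<T>(key: string, fn: () => Promise<T>): Promise<T>;
}

interface RunnerInfo {
  engine: string | null; // null means no durability engine is attached
  survivesRestart: boolean;
}

interface Runner {
  info: RunnerInfo;
  step: Step;
}

// With no engine, steps run in process and nothing is checkpointed.
function createRunner(engine?: { name: string; step: Step }): Runner {
  if (engine) {
    // Simplifying assumption: an attached engine provides restart guarantees.
    return {
      info: { engine: engine.name, survivesRestart: true },
      step: engine.step,
    };
  }
  const step: Step = {
    async run(_key, fn) {
      return fn(); // executes the callback directly
    },
  };
  return { info: { engine: null, survivesRestart: false }, step };
}
```

Tooling could read `runner.info` to show whether the current run has a durable engine attached, so a local run is never mistaken for a checkpointed one.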
Why not make every Action durable?
Flue should make configured durability broadly available by default across Agents and Workflows, but not every small Action should pay the cost of creating a provider run.
Durable execution introduces real overhead:
The proposed distinction is therefore:
- `defineDurableAction()` opts a particular Action into provider-backed durable execution;
- `defineAction()` remains the lightweight path.
The Action chooses durable-compatible semantics. The deployment chooses the engine and actual guarantees.
Relationship to RFC #312
This proposal builds on RFC #312 rather than changing its core model:
`createWorkflow()`.
Durable Actions add another execution option beneath that model:
The Actions RFC should land first because it establishes runner-owned harnesses, stable Action identities, scoped sessions, and a shared Action executor. Those are prerequisites for correct durable replay.
Possible implementation sequence
`step` implementation for durable Actions.
Goals
Non-goals
Open questions
- `defineDurableAction()` or `defineAction({ durable: true })`?