Skip to content

Conversation

pdoerner
Copy link
Contributor

What changed?
Added a new workflow service API for CompleteNexusOperation

Why?
To support operation completion by non-workflow callers.

@pdoerner pdoerner requested review from a team as code owners July 29, 2025 21:51
@pdoerner pdoerner requested review from bergundy and cretz July 29, 2025 21:51
// The namespace from which the user operation handler is calling.
string namespace = 2;
// A request ID that can be used as an idempotentency key.
string request_id = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not part of the Nexus spec ATM and we don't have a way to leverage this but it could be a nice addition for later to make the API idempotent. Please remove for now and track as an issue in nexus-rpc/api

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 2741 to 2749
// The operation token as generated by the operation handler. Required to support the case when the original caller
// receives the completion request before the response to the start operation call and ignored otherwise.
string operation_token = 7;
// The time the operation was started. Required to support the case when the original caller receives the completion
// request before the response to the start operation call and ignored otherwise.
google.protobuf.Timestamp started_time = 8;
// Links returned when the operation was started. Required to support the case when the original caller receives
// the completion request before the response to the start operation call and ignored otherwise.
repeated temporal.api.nexus.v1.Link links = 9;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this for now since I wouldn't expect users to set this for manual completion. It's already difficult enough for our SDKs to implement this properly.
Generally, I think we may want some logic in the server to delay handling the completion callback internally if the operation hasn't been successfully started yet instead of asking the operation handler to fabricate the start information, that approach is proving to put too much burden on the handler.

Comment on lines 2730 to 2732
// The callback containing the URL where the server should deliver the completion and additional headers to forward
// with the request.
temporal.api.common.v1.Callback.Nexus callback = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit concerned on security of this. What validation is applied to this? Specifically, do we validate everything else here is valid/trusted before firing off to this URL or other task queue? Do we only allow limited URLs? Is there some way we can confirm this came from us on a start call instead of just letting any freeform value in here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CompleteOperation request will always be authorized against the caller namespace. If the target is a task queue in that caller namespace, then we have already validated that they should have permissions to send that request.

If the target is another namespace, then the callback request will be made through an HTTP call which will go through the authorization for that target namespace.

If the URL is external, the caller is responsible for making sure they have the right authorization.

Copy link
Member

@cretz cretz Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consider providing a way to validate that the callback here is the one we provided to start. For example, maybe a signature. That a client (even authenticated) can be used to send an HTTP anywhere unrelated to a real Nexus call is a bit concerning. Ideally we can at least make sure it's a result of a Nexus operation start request and that the value is one we gave to them (as given to us from the caller side). Does not have to be done in this PR, but I am worried we'll lose the concern if not tracked somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bergundy reminded me we have a dynamic config for allowed callback URL template. We would validate these incoming URLs against that as well to ensure we are not sending HTTP requests outside temporal. These callback requests will also need to have a valid callback token generated by Temporal, so we will have high confidence that these callbacks came from Temporal. Those tokens are not encrypted (yet) but do contain at least one UUID so would be nearly impossible to guess.

@pdoerner pdoerner requested review from bergundy and cretz August 5, 2025 17:02
Copy link
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I will probably want to change all of our public facing user provided errors to temporal.api.v1.Failure to simplify working with failure converters and proxies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants