-
Notifications
You must be signed in to change notification settings - Fork 75
Nexus complete operation workflow service API #625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: nexus-external-callers
Are you sure you want to change the base?
Nexus complete operation workflow service API #625
Conversation
// The namespace from which the user operation handler is calling. | ||
string namespace = 2; | ||
// A request ID that can be used as an idempotentency key. | ||
string request_id = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not part of the Nexus spec ATM and we don't have a way to leverage this but it could be a nice addition for later to make the API idempotent. Please remove for now and track as an issue in nexus-rpc/api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The operation token as generated by the operation handler. Required to support the case when the original caller | ||
// receives the completion request before the response to the start operation call and ignored otherwise. | ||
string operation_token = 7; | ||
// The time the operation was started. Required to support the case when the original caller receives the completion | ||
// request before the response to the start operation call and ignored otherwise. | ||
google.protobuf.Timestamp started_time = 8; | ||
// Links returned when the operation was started. Required to support the case when the original caller receives | ||
// the completion request before the response to the start operation call and ignored otherwise. | ||
repeated temporal.api.nexus.v1.Link links = 9; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove this for now since I wouldn't expect users to set this for manual completion. It's already difficult enough for our SDKs to implement this properly.
Generally, I think we may want some logic in the server to delay handling the completion callback internally if the operation hasn't been successfully started yet instead of asking the operation handler to fabricate the start information, that approach is proving to put too much burden on the handler.
// The callback containing the URL where the server should deliver the completion and additional headers to forward | ||
// with the request. | ||
temporal.api.common.v1.Callback.Nexus callback = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit concerned on security of this. What validation is applied to this? Specifically, do we validate everything else here is valid/trusted before firing off to this URL or other task queue? Do we only allow limited URLs? Is there some way we can confirm this came from us on a start call instead of just letting any freeform value in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CompleteOperation request will always be authorized against the caller namespace. If the target is a task queue in that caller namespace, then we have already validated that they should have permissions to send that request.
If the target is another namespace, then the callback request will be made through an HTTP call which will go through the authorization for that target namespace.
If the URL is external, the caller is responsible for making sure they have the right authorization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should consider providing a way to validate that the callback here is the one we provided to start. For example, maybe a signature. That a client (even authenticated) can be used to send an HTTP anywhere unrelated to a real Nexus call is a bit concerning. Ideally we can at least make sure it's a result of a Nexus operation start request and that the value is one we gave to them (as given to us from the caller side). Does not have to be done in this PR, but I am worried we'll lose the concern if not tracked somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bergundy reminded me we have a dynamic config for allowed callback URL template. We would validate these incoming URLs against that as well to ensure we are not sending HTTP requests outside temporal. These callback requests will also need to have a valid callback token generated by Temporal, so we will have high confidence that these callbacks came from Temporal. Those tokens are not encrypted (yet) but do contain at least one UUID so would be nearly impossible to guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I will probably want to change all of our public facing user provided errors to temporal.api.v1.Failure
to simplify working with failure converters and proxies.
What changed?
Added a new workflow service API for
CompleteNexusOperation
Why?
To support operation completion by non-workflow callers.