-
Notifications
You must be signed in to change notification settings - Fork 297
[do not merge] potential fixes for deferred response error handling #7453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
I'm not sure if this will always work - need to test nested queries looks like a possible fix for #2329
after the changes to the on_graphql_error selector, we need to make sure the event selector passes as well
@carodewig, please consider creating a changeset entry in |
✅ Docs preview readyThe preview is ready to be viewed. View the preview File Changes 5 new, 30 changed, 17 removed
Build ID: 32829321deb5d21eafb01224 URL: https://www.apollographql.com/docs/deploy-preview/32829321deb5d21eafb01224 |
// Always return `true` for `on_graphql_error` on the `response` selector. | ||
// Each chunk of a response (if a stream) or the full request should also be checked | ||
// with `on_response_event`. | ||
// TODO: this could well be a terrible idea but it's currently my only idea for how | ||
// to do error handling on deferred errors | ||
Some(opentelemetry::Value::Bool(true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not work when using condition
with this selector. For instruments or events for example. I'm not sure, I'm suspicious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe you could just ignore this selector in on_response
and also remove this selector from Stage::Response
in the is_active
method I think that could do the job
This has all unraveled after I pulled an innocuous-looking string. There are two semi-related changes in this PR that should probably be split into separate PRs. But as I'd like some early feedback on my approach to both of them, I'm raising this draft PR.
on_graphql_error
selector doesn't worksupergraph
stage (notably for coprocessors, but I suspect this is also true for logging etc).Issue 1
This arises because of the way we filter errors within
split_incremental_response
-error_path.starts_with(&path)
never returns true (given theerror_path
andpath
values I've observed), so all errors are filtered out.Current state:
I believe the fix for this is to remove the trailing
index
frompath
, but I need to test this on more complex deferred queries. My concern is a path like[Key("top"), Index(0), Key("value"), Index(1)]
is possible, and I don't know ifIndex(0)
will be present inerror.path
. (f913ee9)Issue 2
The
on_graphql_error
selector currently relies onCONTAINS_GRAPHQL_ERROR
within the response context. That value isn't actually set until the telemetry layer of the supergraph stage, so it works for router coprocessors but not for supergraph coprocessors.I'm generally concerned about using a global context value for
on_graphql_error
, since once issue 1 is fixed we can surface errors at any point in a deferred response. I'm not yet sure how to handle it at the router stage (where we're dealing with bytes), but at the supergraph stage I think it would be much better to make decisions viaresponse.errors
andresponse.incremental[*].errors
rather than relying onCONTAINS_GRAPHQL_ERROR
. (a6d801f and 4b7fdcd)I don't like my tentative solution of always returning
true
foron_graphql_error
on_response
and relying on callers to know to useon_event_response
.. but don't have a better idea at the moment.Bonus Issues / TBD
on_graphql_error
selector at the router and supergraph stages. This will need to either be fixed or be thoroughly documented.on_event_response
, so that will need to be updated or else everything will be logged onon_graphql_error: true
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩