[WIP] Refactor open telemetry plugin #328
@@ -0,0 +1,67 @@
// This must be run first. Node uses patching on implementations to inject telemetry so must run before the packages
This would not live inside the plugin, as it is server setup for OTel.
OTel is set up by patching libraries such as fetch. For this reason the setup needs to run outside of the plugin and be imported before most other packages.
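To illustrate why the import order matters, here is a minimal sketch of the general patching problem. It does not use the OTel instrumentation packages; `httpLib`, `instrument` and `tracedCalls` are hypothetical names made up for this example, and real instrumentations hook module loading rather than a plain object.

```typescript
// Hypothetical stand-in for a library that instrumentation would patch.
const httpLib = {
  request(url: string): string {
    return `GET ${url}`;
  },
};

// URLs seen by the instrumentation wrapper.
const tracedCalls: string[] = [];

// A consumer loaded BEFORE patching: it has captured the original function.
const earlyRequest = httpLib.request;

// Instrumentation: replace the method with a wrapper that records each call.
function instrument(lib: typeof httpLib): void {
  const original = lib.request;
  lib.request = (url: string): string => {
    tracedCalls.push(url);
    return original(url);
  };
}

instrument(httpLib);

// A consumer loaded AFTER patching picks up the wrapper.
const lateRequest = httpLib.request;

earlyRequest("https://example.com/early"); // bypasses the wrapper
lateRequest("https://example.com/late"); // goes through the wrapper

// Only the late call has been recorded.
console.log(tracedCalls);
```

Anything that grabbed a reference before the patch ran is invisible to the instrumentation, which is why the setup file has to be the first import on the server.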
api.context.setGlobalContextManager(contextManager);

const exporter = new OTLPTraceExporter({
  url: "todo:configuration",
We would need to drive this from configuration. I have done this within our gateway implementation, but that takes a different approach to configuration than hive-gateway, so I'm not sure of the best approach.
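One possible shape for this, as a rough sketch only: resolve the URL from an explicit option first, then fall back to the standard OpenTelemetry environment variables `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` and `OTEL_EXPORTER_OTLP_ENDPOINT`. The `TracingOptions` type and `resolveTraceExporterUrl` helper are hypothetical and are not part of hive-gateway's configuration.

```typescript
// Hypothetical plugin options; NOT an existing hive-gateway config shape.
interface TracingOptions {
  exporterUrl?: string;
}

// Environment is passed in explicitly so the helper is easy to test.
type Env = Record<string, string | undefined>;

// Default OTLP/HTTP collector endpoint from the OpenTelemetry specification.
const DEFAULT_OTLP_HTTP_ENDPOINT = "http://localhost:4318";

function resolveTraceExporterUrl(options: TracingOptions, env: Env): string {
  // 1. An explicit option wins.
  if (options.exporterUrl) return options.exporterUrl;

  // 2. The traces-specific variable is used as-is.
  const tracesUrl = env["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"];
  if (tracesUrl) return tracesUrl;

  // 3. The generic endpoint is a base URL; the traces path is appended.
  const base = env["OTEL_EXPORTER_OTLP_ENDPOINT"] || DEFAULT_OTLP_HTTP_ENDPOINT;
  return `${base.replace(/\/+$/, "")}/v1/traces`;
}
```

The result could then be passed as `url` to `new OTLPTraceExporter({ url })`, for example via `resolveTraceExporterUrl(options, process.env)`.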
@@ -0,0 +1,45 @@
import {ASTNode, DocumentNode, Kind, visit} from "graphql/index";
todo: test this
  ],
  logging: false,
});
it('query should add spans', async () => {
TODO: more extensive testing.
I'm not sure of the best approach to testing this as a gateway, i.e. testing that it calls a subgraph.
Query: {
  hello: () => 'World',
  ping: () => {
    expect(api.context.active()).not.toEqual(api.ROOT_CONTEXT); // proves that the context is propagated
Might want to figure out a better way to test this.
Hi @darren-west, we've just merged #532. @EmrysMyrddin did amazing work on refactoring the internals of the plugin system and gw core, so the new OTEL plugin now relies on the core instrumentation and OTEL spans are fully correlated with the gw runtime. It also includes fixes for how we handle terminations, retries, batched requests, request OTEL context and many other areas where the previous implementation failed to deliver. @EmrysMyrddin took a lot of inspiration from this PR, thanks @darren-west! Upcoming PRs are going to address most of the open issues related to OTEL. We are also experimenting with the new v2 of the otel-js SDK (#875). I'm going to close this one. @darren-west, please let me know if we missed anything; we can create follow-up issues and plan them in our next iteration on the OTEL plugin 🙏
Summary
This is a refactor to make the OpenTelemetry plugin in the Hive Gateway conform more closely to the OpenTelemetry API spec, specifically around how context is propagated within the Node SDK.
The goal is to let the gateway take advantage of the instrumentations available in the OTel libraries, for example http, dns and grpc. The big benefit is that custom plugins can use these instrumentations too, which is a requirement for us: we create a gRPC request within the context phase of GraphQL.
I put some effort into removing potentially sensitive data, for example error messages, from traces. I'm not sure of the best approach; perhaps an option could be passed in to support this if it's a requirement for others.
OTel for Node propagates context by wrapping functions using async local storage. This PR wraps the different phases of GraphQL so that every span has the correct trace parent. This can be seen when using the HTTP instrumentation: because the execute phase is wrapped using api.context.with, the subsequent fetch has the correct trace parent.
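To make the propagation mechanism concrete, here is a minimal sketch using Node's `AsyncLocalStorage` directly rather than the OTel API. It is not the code in this PR: `FakeSpan`, `withSpan` and `instrumentedFetch` are hypothetical stand-ins, with `withSpan` playing the role of wrapping a phase and `instrumentedFetch` playing the role of an auto-instrumented HTTP client.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Simplified stand-in for a span; only the name and parent name are tracked.
interface FakeSpan {
  name: string;
  parent: string | undefined;
}

const activeSpan = new AsyncLocalStorage<FakeSpan>();
const recordedSpans: FakeSpan[] = [];

// Start a span whose parent is whatever span is currently active, then run
// `fn` with the new span active. Hypothetical helper, not an OTel API.
function withSpan<T>(name: string, fn: () => T): T {
  const span: FakeSpan = { name, parent: activeSpan.getStore()?.name };
  recordedSpans.push(span);
  return activeSpan.run(span, fn);
}

// Stand-in for an instrumented fetch: it knows nothing about GraphQL and only
// reads the active span from async local storage.
async function instrumentedFetch(url: string): Promise<string> {
  await Promise.resolve(); // cross an async boundary before starting the span
  return withSpan(`fetch ${url}`, () => `response from ${url}`);
}

// The execute phase is wrapped, so the fetch inside it gets the right parent.
const wrapped = withSpan("graphql.execute", () =>
  instrumentedFetch("http://subgraph/graphql"),
);

// Without a wrapping phase the fetch span has no parent.
const unwrapped = instrumentedFetch("http://subgraph/other");

Promise.all([wrapped, unwrapped]).then(() => console.log(recordedSpans));
```

The fetch never receives the parent span as an argument. It finds it through the async context, which is why each GraphQL phase has to be wrapped for the library instrumentations to attach to the right trace.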
Questions to answer.
TODO: