Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Refactor open telemetry plugin #328

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

darren-west
Copy link

@darren-west darren-west commented Dec 13, 2024

Summary

This is a refactor to make the opentelemetry plugin in the hive gateway conform closer to the opentelemetry api spec specifically around how context is propagated within the node sdk.

The goal is to allow the gateway to take advantage of the instrumentations available in the otel libraries, for example http, dns, grpc etc. The big benefit of this is it allows custom plugins to benefit from the instrumentations which is a requirement that we have, where we create a grpc request within the context phase of GraphQL.

I put some effort into removing potentially sensitive data from traces, for example error messages. Im not sure on the best approach perhaps an option could be passed in to support this if its a requirement for others.

In otel for node it uses async local storage to wrap functions to propagate the context, this PR introduces the wrapping of the different phases of GraphQL so that any span has the correct trace parent. This can be seen when using http instrumentation, because the execute phase is wrapped using api.with the subsequent fetch will have the correct trace parent.

Questions to answer.

  • The open telemetry instrumentations predominantly work by patching, which means that the bootstrapping code needs to run before packages are imported. An example is the patching of gRPC or fetch. This makes it not fit with running this logic in a plugins init phase as its to late.
  • How do we configure the bootstrapping of otel within the hive-gateways configuration.
  • Testing this as a gateway, whats the best approach for instance how can I test the onSubgraphExecute function?

TODO:

  • Refactor and add tests, reintroducing existing features.
  • Figure out where best to run the bootstrapping of the otel library.

@@ -0,0 +1,67 @@
// This must be run first. Node uses patching on implementations to inject telemetry so must run before the packages
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would not live inside the plugin as it is server setup for otel.
The way otel is setup is by patching of libraries such as fetch, for this reason it needs to run outside of the plugin and be one imported before most packages.

api.context.setGlobalContextManager(contextManager);

const exporter = new OTLPTraceExporter({
url: "todo:configuration",
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to drive this from configuration. I have done this within our gateway implementation but that has a different approach to the hive-gateway to config so Im not sure of the best approach.

@@ -0,0 +1,45 @@
import {ASTNode, DocumentNode, Kind, visit} from "graphql/index";
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: test this

],
logging: false,
});
it('query should add spans', async () => {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: more extensive testing.

Im not sure on the best approach to testing this as a gateway, i.e. testing it calls a subgraph.

Query: {
hello: () => 'World',
ping: () => {
expect(api.context.active()).not.toEqual(api.ROOT_CONTEXT); // proves that the context is propagated
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to figure out a better way to test this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant