Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Span Export fails with OpenTelemetry 1.32.0 #68

Closed
Traderjoe95 opened this issue Nov 30, 2023 · 2 comments
Closed

Span Export fails with OpenTelemetry 1.32.0 #68

Traderjoe95 opened this issue Nov 30, 2023 · 2 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@Traderjoe95
Copy link

Version

The issue is certainly present in Vert.x 4.5.0, I haven't tried to reproduce it on older versions.

Context

The OpenTelemetry Span exporters (reproduced with Jaeger gRPC and OTLP gRPC) started throwing NullPointerException after updating to OpenTelemetry 1.32.0. The exception states that the vertxCtx is null.

java.lang.NullPointerException: Cannot invoke "io.vertx.core.Context.getLocal(Object)" because "vertxCtx" is null
    at io.vertx.tracing.opentelemetry.VertxContextStorageProvider$VertxContextStorage.attach(VertxContextStorageProvider.java:27) ~[vertx-opentelemetry-4.5.0.jar:4.5.0]
    at io.vertx.tracing.opentelemetry.VertxContextStorageProvider$VertxContextStorage.attach(VertxContextStorageProvider.java:23) ~[vertx-opentelemetry-4.5.0.jar:4.5.0]
    at io.opentelemetry.context.Context.makeCurrent(Context.java:202) ~[opentelemetry-context-1.32.0.jar:1.32.0]
    at io.opentelemetry.context.Context.lambda$wrap$1(Context.java:211) ~[opentelemetry-context-1.32.0.jar:1.32.0]
    at io.opentelemetry.exporter.internal.InstrumentationUtil.suppressInstrumentation(InstrumentationUtil.java:28) ~[opentelemetry-exporter-common-1.32.0.jar:1.32.0]
    at io.opentelemetry.exporter.sender.okhttp.internal.OkHttpGrpcSender.send(OkHttpGrpcSender.java:116) ~[opentelemetry-exporter-sender-okhttp-1.32.0.jar:1.32.0]
    at io.opentelemetry.exporter.internal.grpc.GrpcExporter.export(GrpcExporter.java:61) ~[opentelemetry-exporter-common-1.32.0.jar:1.32.0]
    at io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter.export(OtlpGrpcSpanExporter.java:73) ~[opentelemetry-exporter-otlp-1.32.0.jar:1.32.0]
    at io.opentelemetry.sdk.trace.export.SimpleSpanProcessor.onEnd(SimpleSpanProcessor.java:81) ~[opentelemetry-sdk-trace-1.32.0.jar:1.32.0]
    at io.opentelemetry.sdk.trace.SdkSpan.endInternal(SdkSpan.java:451) ~[opentelemetry-sdk-trace-1.32.0.jar:1.32.0]
    at io.opentelemetry.sdk.trace.SdkSpan.end(SdkSpan.java:431) ~[opentelemetry-sdk-trace-1.32.0.jar:1.32.0]
    at io.vertx.tracing.opentelemetry.OpenTelemetryTracer.end(OpenTelemetryTracer.java:97) ~[vertx-opentelemetry-4.5.0.jar:4.5.0]
    at io.vertx.tracing.opentelemetry.OpenTelemetryTracer.receiveResponse(OpenTelemetryTracer.java:143) ~[vertx-opentelemetry-4.5.0.jar:4.5.0]
    at io.vertx.tracing.opentelemetry.OpenTelemetryTracer.receiveResponse(OpenTelemetryTracer.java:31) ~[vertx-opentelemetry-4.5.0.jar:4.5.0]
    at io.vertx.kafka.client.common.tracing.ProducerTracer$StartedSpan.finish(ProducerTracer.java:89) ~[vertx-kafka-client-4.5.0.jar:4.5.0]
    at io.vertx.kafka.client.producer.impl.KafkaWriteStreamImpl.lambda$null$2(KafkaWriteStreamImpl.java:111) ~[vertx-kafka-client-4.5.0.jar:4.5.0]
    at org.apache.kafka.clients.producer.KafkaProducer$AppendCallbacks.onCompletion(KafkaProducer.java:1505) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:270) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:234) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.ProducerBatch.complete(ProducerBatch.java:180) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:692) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:663) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Sender.lambda$null$1(Sender.java:589) ~[kafka-clients-3.5.0.jar:?]
    at java.util.ArrayList.forEach(Unknown Source) ~[?:?]
    at org.apache.kafka.clients.producer.internals.Sender.lambda$handleProduceResponse$2(Sender.java:576) ~[kafka-clients-3.5.0.jar:?]
    at java.lang.Iterable.forEach(Unknown Source) ~[?:?]
    at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:576) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:850) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:154) [kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:594) [kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:586) [kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328) [kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243) [kafka-clients-3.5.0.jar:?]
    at java.lang.Thread.run(Unknown Source) [?:?]

Do you have a reproducer?

No, but I pinned down the issue pretty clearly. The stack trace above occurs on the Kafka Producer Network Thread. This thread is obviously not a Vert.x thread (also none with virtual threads support, as added in Vert.x 4.5.0), so it is not expected to have a Vert.x context. Since OpenTelemetry 1.32.0, a new utility is added, which suppresses instrumentation for certain blocks of code. This is achieved by putting a special OpenTelemetry Context into scope, which contains a flag to disable instrumentation:

open-telemetry/opentelemetry-java@v1.31.0...v1.32.0#diff-e0f86f9c669c5dfdee741804e40a62f9b122ec756c2df1acc56a8d7a9ff3247fR28

This utility is used around the network calls of the span exporters, and thus tries to bring the special context into scope. Because there is no Vert.x context in this thread, the code at VertxContextStorageProvider.java:27 throws the NullPointerException.

As an additional remark, this could also hint at an issue in the Vert.x Kafka Client. I would expect the KafkaWriteStreamImpl to delegate back to the Event Loop right in between these stack frames:

    at io.vertx.kafka.client.producer.impl.KafkaWriteStreamImpl.lambda$null$2(KafkaWriteStreamImpl.java:111) ~[vertx-kafka-client-4.5.0.jar:4.5.0]
    at org.apache.kafka.clients.producer.KafkaProducer$AppendCallbacks.onCompletion(KafkaProducer.java:1505) ~[kafka-clients-

but that doesn't appear to happen. Would this be worth investigating?

Proposed Solution

I would suggest that VertxContextStorage delegates to OpenTelemetry's default ThreadLocalContextStorage when called on a thread where there's no Vert.x context. This would enable OpenTelemetry tracing also on non-Vert.x threads while not violating any Vert.x guarantees. What do you think?

If you agree, I am open to submit a PR.

@Traderjoe95 Traderjoe95 added the bug Something isn't working label Nov 30, 2023
@tsegismont tsegismont self-assigned this Dec 1, 2023
@tsegismont tsegismont added this to the 4.5.1 milestone Dec 1, 2023
@tsegismont
Copy link
Contributor

Seems related to #62

@tsegismont
Copy link
Contributor

Fixed by #72

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants