Enrich root spans that represent a dependency #125

Open: wants to merge 4 commits into main
34 changes: 34 additions & 0 deletions enrichments/trace/internal/elastic/span.go
@@ -189,6 +189,7 @@ func (s *spanEnrichmentContext) Enrich(span ptrace.Span, cfg config.Config) {
func (s *spanEnrichmentContext) enrich(span ptrace.Span, cfg config.Config) {
	if s.isTransaction {
		s.enrichTransaction(span, cfg.Transaction)
		s.enrichExitSpanTransaction(span, cfg)
Member

Feel free to ignore; maybe this is personal preference. But the code may be a bit easier to follow if the if condition is pulled into this method, so that it's clear that only transactions of kind client or producer get enriched.
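A minimal Go sketch of this suggestion, using simplified stand-in types rather than the real pdata API (the Span struct, the SpanKind constants, and the exit_enriched marker attribute are hypothetical). The guard clause lives inside the method, so the call site no longer needs its own condition.

```go
package main

import "fmt"

// SpanKind is a simplified stand-in for ptrace.SpanKind (assumption).
type SpanKind int

const (
	SpanKindServer SpanKind = iota
	SpanKindClient
	SpanKindProducer
	SpanKindConsumer
)

// Span is a hypothetical, simplified span with a string attribute map.
type Span struct {
	Kind  SpanKind
	Attrs map[string]string
}

// enrichExitSpanTransaction owns its precondition: the guard clause makes it
// obvious that only client and producer root spans are enriched.
func enrichExitSpanTransaction(span *Span) {
	if span.Kind != SpanKindClient && span.Kind != SpanKindProducer {
		return
	}
	// Placeholder for the real exit-span enrichment (span.type, service.target, ...).
	span.Attrs["exit_enriched"] = "true"
}

func main() {
	client := &Span{Kind: SpanKindClient, Attrs: map[string]string{}}
	server := &Span{Kind: SpanKindServer, Attrs: map[string]string{}}
	// The caller no longer checks the kind itself.
	enrichExitSpanTransaction(client)
	enrichExitSpanTransaction(server)
	fmt.Println(client.Attrs["exit_enriched"], len(server.Attrs)) // true 0
}
```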

Member

I'm wondering whether we need a dedicated method for this at all.

Couldn't we just add a heuristic method that identifies whether a span is an exit span, use it as an additional condition here, and then call the existing enrichSpan() method?

Contributor Author

@gregkalapos, Nov 29, 2024

I still needed to extend enrichSpan() a bit, but I've now removed enrichExitSpanTransaction, so there are only enrichSpan and enrichTransaction, as we originally had.

The 2 main differences in enrichExitSpanTransaction were:

  • Not setting processor.event to span
  • Setting transaction.type

We now handle these with a condition.

I also moved the check up to this line.
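The resulting structure can be sketched in Go as follows. This is an illustration under assumptions, not the PR's code: the types are simplified stand-ins for ptrace.Span, spanTypeSubtype is a hypothetical and much-reduced version of the real type detection, and attribute keys are plain strings. Root client/producer spans go through the shared enrichSpan with a rootExit flag that covers the two differences listed above.

```go
package main

import "fmt"

// SpanKind and Span are simplified stand-ins for the pdata types (assumption).
type SpanKind int

const (
	SpanKindServer SpanKind = iota
	SpanKindClient
	SpanKindProducer
)

type Span struct {
	Kind  SpanKind
	Attrs map[string]string
}

// spanTypeSubtype is a hypothetical, heavily simplified version of the
// semconv-based span.type/span.subtype detection.
func spanTypeSubtype(span *Span) (string, string) {
	if system, ok := span.Attrs["db.system"]; ok {
		return "db", system
	}
	if _, ok := span.Attrs["url.full"]; ok {
		return "external", "http"
	}
	return "app", ""
}

func enrichTransaction(span *Span) {
	span.Attrs["processor.event"] = "transaction"
}

// enrichSpan is shared by child spans and root exit spans. rootExit covers the
// two differences: no processor.event=span, and a transaction.type is set.
func enrichSpan(span *Span, rootExit bool) {
	spanType, subtype := spanTypeSubtype(span)
	span.Attrs["span.type"] = spanType
	if subtype != "" {
		span.Attrs["span.subtype"] = subtype
	}
	if !rootExit {
		span.Attrs["processor.event"] = "span"
		return
	}
	txType := spanType
	if subtype != "" {
		txType += "." + subtype
	}
	span.Attrs["transaction.type"] = txType
}

// enrich keeps the original two-branch shape; the SpanKind check is the
// additional condition that routes root exit spans through enrichSpan.
func enrich(span *Span, isTransaction bool) {
	if !isTransaction {
		enrichSpan(span, false)
		return
	}
	enrichTransaction(span)
	if span.Kind == SpanKindClient || span.Kind == SpanKindProducer {
		enrichSpan(span, true)
	}
}

func main() {
	root := &Span{Kind: SpanKindClient, Attrs: map[string]string{"url.full": "http://localhost:8080"}}
	enrich(root, true)
	fmt.Println(root.Attrs["processor.event"], root.Attrs["transaction.type"]) // transaction external.http
}
```

Sharing one enrichSpan means root exit spans receive the same span fields as child exit spans, so there is a single place to maintain them.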

	} else {
		s.enrichSpan(span, cfg.Span)
	}
@@ -237,6 +238,39 @@ func (s *spanEnrichmentContext) enrichTransaction(
	}
}

// In OTel, a root span can represent an outgoing call or a producer span.
// In such cases, the span is still mapped into a transaction, but it is enriched
// with additional attributes that are specific to the outgoing call or producer span.
func (s *spanEnrichmentContext) enrichExitSpanTransaction(
	span ptrace.Span,
	cfg config.Config,
) {
	if span.Kind() == ptrace.SpanKindClient || span.Kind() == ptrace.SpanKindProducer {
Member

I'm not sure we can rely on that condition, tbh.

Since the entire situation we're trying to handle here is itself an instrumentation bug (IN THEORY, a single span should never be entry and exit at the same time; IN PRACTICE, that happens in some situations), I don't think we should rely on SpanKind at all, tbh.

For example, what if the SpanKind == SpanKindServer, but the span still represents entry and exit at the same time?

How about we use heuristics, similar to how we derive span.type, to identify whether a span is an exit span? For example, by checking for certain attributes that only make sense on exit spans but not on entry spans.
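A sketch of this suggestion in Go, under stated assumptions: the attribute list is illustrative (based on where the OTel semantic conventions normally place these keys), not an agreed-upon set, and attributes are modeled as a plain string map rather than pcommon.Map. The check ignores SpanKind entirely.

```go
package main

import "fmt"

// exitOnlyAttributes lists attribute keys that the OTel semantic conventions
// normally place on outgoing (client-side) spans. Illustrative, not exhaustive.
var exitOnlyAttributes = []string{
	"db.system",    // database client spans
	"url.full",     // HTTP client spans (server spans use url.path instead)
	"peer.service", // logical name of the remote service being called
}

// looksLikeExitSpan reports whether any exit-only attribute is present,
// without looking at SpanKind at all.
func looksLikeExitSpan(attrs map[string]string) bool {
	for _, key := range exitOnlyAttributes {
		if _, ok := attrs[key]; ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(looksLikeExitSpan(map[string]string{"url.full": "http://localhost:8080"})) // true
	fmt.Println(looksLikeExitSpan(map[string]string{"url.path": "/orders"}))               // false
}
```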

Contributor Author

This is a valid point, and I was thinking about it as well. My idea was to be on the safe side and only enrich real exit spans with the necessary attributes.

I understand we have this issue with nginx proxy spans, where a single span covers both the incoming and the outgoing call. I don't know what the span kind is there, but if it's SpanKindServer, then indeed we would not enrich it.

But that use case is already a bug, and I'm not sure how far we want to go in unwinding such bugs.

> For example, what if the SpanKind == SpanKindServer, but the span still represents entry and exit at the same time?

I think that's a bug in the instrumentation. SemConv says:

> For an HTTP client span, SpanKind MUST be Client.

and

> For an HTTP server span, SpanKind MUST be Server.

So for HTTP, that goes against the spec, and I'm not sure how much we should fight these bugs.

> How about, we use heuristics similar to how we do it for deriving span.type to identify whether a span is an exit span or not. For example, checking for certain attributes that would only make sense on exit spans but not on entry spans.

We could explore that, but I'm not sure how reliable it would be. The existing check for whether something is an exit span is a bit different: it doesn't distinguish between incoming and outgoing spans, because incoming spans are already assumed to be transactions. All those cases are already filtered out at that point, since the check only runs for spans.

I just took a quick look at HTTP spans. From what I see, url.path is required only on server spans and is not present on client spans, while url.full is only on the client side. But if the span kind is already invalid, I'm not sure how much we can rely on attributes. For example, what if both are present?

Overall, I feel we'd end up with a messy heuristic that may not work in all cases and may not even unwind these bugs the way we'd want.

If we relax this check, I think the risk is that we may categorize some incoming calls as exit spans and start calculating dependencies for them. That would be very bad. If we can come up with something that avoids this, I'd feel more confident relaxing the check.
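To make the concern concrete, here is a hypothetical Go sketch of the HTTP-only heuristic discussed above, combined with the conservative rule this comment argues for: only a span that is clearly an exit span (url.full without url.path) would be enriched, and the ambiguous both-present case is left alone. Names such as classifyHTTP and Direction are invented for illustration; nothing like this exists in the PR.

```go
package main

import "fmt"

// Direction is the heuristic verdict for a root span.
type Direction int

const (
	Unknown   Direction = iota // no HTTP markers at all
	Entry                      // only the server-side marker (url.path)
	Exit                       // only the client-side marker (url.full)
	Ambiguous                  // both present: the case raised in this thread
)

// classifyHTTP applies the url.full / url.path observation from this thread.
func classifyHTTP(attrs map[string]string) Direction {
	_, hasFull := attrs["url.full"]
	_, hasPath := attrs["url.path"]
	switch {
	case hasFull && hasPath:
		return Ambiguous
	case hasFull:
		return Exit
	case hasPath:
		return Entry
	default:
		return Unknown
	}
}

// shouldEnrichAsExit errs on the safe side: anything that is not clearly an
// exit span is left alone, so an incoming call is never treated as a dependency.
func shouldEnrichAsExit(attrs map[string]string) bool {
	return classifyHTTP(attrs) == Exit
}

func main() {
	both := map[string]string{"url.full": "http://a/b", "url.path": "/b"}
	fmt.Println(classifyHTTP(both) == Ambiguous, shouldEnrichAsExit(both)) // true false
}
```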

		if cfg.Span.TypeSubtype.Enabled {
			s.setSpanTypeSubtype(span)
		}
		if cfg.Span.ServiceTarget.Enabled {
Member

@gregkalapos What about what we discussed regarding having two ProcessorEvent values in this case?

Contributor Author

I tested it, and that triggers an error when the waterfall chart is generated. These transactions are also returned by all queries that look for spans, and some span-related fields were missing from them.

@axw had a related idea: we could just remove the filter on processor.event from the relevant queries in Kibana. I'd rather explore that option and potentially get rid of processor.event in those queries.

Contributor Author

I discussed this with @AlexanderWert, and we agreed that the correct way would be to still set both span and transaction, and that it'd be nice not to have to update Kibana.

In 3def749 I changed this.

Now that we share the enrichSpan() method, there are no missing-field errors when the waterfall chart is generated. With the above commit, this seems to work:

[Screenshot 2024-12-09 at 19 28 32]

However, order matters here. In the Kibana code that builds the chart, if the value is [span transaction], the document is still categorized as a span and not as a transaction, which leads to an empty waterfall chart.

So it seems we can set both span and transaction on such root exit spans, but I'd say that in the mid to long term we should still update Kibana to make it more robust.
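The order sensitivity can be illustrated with a small Go sketch. It assumes, based on the description above, that the consumer categorizes a document by the first processor.event value only. firstEventWins and rootExitProcessorEvent are hypothetical helpers, not Kibana or PR code, and the transaction-first ordering is an inference from the description, not a confirmed detail of the commit.

```go
package main

import "fmt"

// firstEventWins is a hypothetical consumer that, like the behavior described
// above, categorizes a document by the first processor.event value only.
func firstEventWins(processorEvent []string) string {
	if len(processorEvent) == 0 {
		return ""
	}
	return processorEvent[0]
}

// rootExitProcessorEvent returns both values with "transaction" first, so an
// order-sensitive consumer still treats the document as a transaction.
func rootExitProcessorEvent() []string {
	return []string{"transaction", "span"}
}

func main() {
	fmt.Println(firstEventWins([]string{"span", "transaction"})) // span (the empty-waterfall case)
	fmt.Println(firstEventWins(rootExitProcessorEvent()))       // transaction
}
```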

			s.setServiceTarget(span)
		}
		if cfg.Span.DestinationService.Enabled {
			s.setDestinationService(span)
		}
		if cfg.Span.Name.Enabled {
			span.Attributes().PutStr(AttributeSpanName, span.Name())
		}
		if cfg.Transaction.Type.Enabled {
			spanTypeAttr, hasType := span.Attributes().Get(AttributeSpanType)
			if hasType {
				transactionType := spanTypeAttr.Str()
				if spanSubtypeAttr, hasSubType := span.Attributes().Get(AttributeSpanSubtype); hasSubType {
					transactionType += "." + spanSubtypeAttr.Str()
Contributor Author

I'd be happy to hear opinions on this one.

The APM UI groups transactions by transaction type. The idea here is that for such root exit spans we'd want to use a different transaction type, so they don't end up in the same group as incoming transactions.

[Screenshot 2024-11-28 at 16 55 18]

For example, the default transaction type for incoming HTTP is request; with this change, outgoing calls would use external.http. A service may have a GET both as an outgoing root span and as an incoming transaction, and this way those 2 transactions will have different types.

As an alternative to the . separator, we could also use /, e.g. external/http.
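A minimal sketch of the type derivation and its effect on grouping, assuming a simplified signature: transactionType is hypothetical (the diff reads span.type and span.subtype from span attributes instead), and the separator is a parameter so both options, . and /, can be compared. Unlike the diff, an empty subtype is treated as absent (assumption).

```go
package main

import "fmt"

// transactionType roughly mirrors the diff: span.type, plus the subtype when
// one is present, joined by sep ("." in the PR, "/" as the alternative).
func transactionType(spanType, subtype, sep string) string {
	if subtype == "" {
		return spanType
	}
	return spanType + sep + subtype
}

func main() {
	// Group transactions by type, as the APM UI does (heavily simplified).
	groups := map[string]int{}
	groups["request"]++                                // incoming GET: default type for incoming HTTP
	groups[transactionType("external", "http", ".")]++ // outgoing root GET
	fmt.Println(len(groups), transactionType("external", "http", "/")) // 2 external/http
}
```

Either separator keeps the two GET transactions in separate groups; the choice only affects how the type string reads in the UI.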

Member

Will that impact other things, like the service map?

Contributor Author

So far, I haven't seen any impact. From what I know, the service map is mostly driven by the target.* and destination.* fields.

The dependency list has icons based on this span, but they aren't driven by transaction.type either, so they're not impacted.

				}
				span.Attributes().PutStr(AttributeTransactionType, transactionType)
			}
		}
	}
}

func (s *spanEnrichmentContext) enrichSpan(
	span ptrace.Span,
	cfg config.ElasticSpanConfig,
140 changes: 140 additions & 0 deletions enrichments/trace/internal/elastic/span_test.go
@@ -386,6 +386,146 @@ func TestElasticTransactionEnrich(t *testing.T) {
	}
}

// Tests root spans that represent a dependency and are mapped to a transaction.
func TestRootSpanAsDependencyEnrich(t *testing.T) {
	for _, tc := range []struct {
		name              string
		input             ptrace.Span
		config            config.Config
		enrichedAttrs     map[string]any
		expectedSpanLinks *ptrace.SpanLinkSlice
	}{
		{
			name: "outgoing_http_root_span",
			input: func() ptrace.Span {
				span := ptrace.NewSpan()
				span.SetName("rootClientSpan")
				span.SetSpanID([8]byte{1})
				span.SetKind(ptrace.SpanKindClient)
				span.Attributes().PutStr(semconv.AttributeHTTPMethod, "GET")
				span.Attributes().PutStr(semconv.AttributeHTTPURL, "http://localhost:8080")
				span.Attributes().PutInt(semconv.AttributeHTTPResponseStatusCode, 200)
				span.Attributes().PutStr(semconv.AttributeNetworkProtocolVersion, "1.1")
				return span
			}(),
			config: config.Enabled(),
			enrichedAttrs: map[string]any{
				AttributeTimestampUs: int64(0),
				AttributeTransactionName: "rootClientSpan",
				AttributeProcessorEvent: "transaction",
				AttributeSpanType: "external",
				AttributeSpanSubtype: "http",
				AttributeSpanDestinationServiceResource: "localhost:8080",
				AttributeSpanName: "rootClientSpan",
				AttributeEventOutcome: "success",
				AttributeSuccessCount: int64(1),
				AttributeServiceTargetName: "localhost:8080",
				AttributeServiceTargetType: "http",
				AttributeTransactionID: "0100000000000000",
				AttributeTransactionDurationUs: int64(0),
				AttributeTransactionRepresentativeCount: float64(1),
				AttributeTransactionResult: "HTTP 2xx",
				AttributeTransactionType: "external.http",
				AttributeTransactionSampled: true,
				AttributeTransactionRoot: true,
			},
		},
		{
			name: "db_root_span",
			input: func() ptrace.Span {
				span := ptrace.NewSpan()
				span.SetName("rootClientSpan")
				span.SetSpanID([8]byte{1})
				span.SetKind(ptrace.SpanKindClient)
				span.Attributes().PutStr(semconv.AttributeDBSystem, "mssql")

				span.Attributes().PutStr(semconv.AttributeDBName, "myDb")
				span.Attributes().PutStr(semconv.AttributeDBOperation, "SELECT")
				span.Attributes().PutStr(semconv.AttributeDBStatement, "SELECT * FROM wuser_table")
				return span
			}(),
			config: config.Enabled(),
			enrichedAttrs: map[string]any{
				AttributeTimestampUs: int64(0),
				AttributeTransactionName: "rootClientSpan",
				AttributeProcessorEvent: "transaction",
				AttributeSpanType: "db",
				AttributeSpanSubtype: "mssql",
				AttributeSpanDestinationServiceResource: "mssql",
				AttributeSpanName: "rootClientSpan",
				AttributeEventOutcome: "success",
				AttributeSuccessCount: int64(1),
				AttributeServiceTargetName: "myDb",
				AttributeServiceTargetType: "mssql",
				AttributeTransactionID: "0100000000000000",
				AttributeTransactionDurationUs: int64(0),
				AttributeTransactionRepresentativeCount: float64(1),
				AttributeTransactionResult: "Success",
				AttributeTransactionType: "db.mssql",
				AttributeTransactionSampled: true,
				AttributeTransactionRoot: true,
			},
		},
		{
			name: "producer_messaging_span",
			input: func() ptrace.Span {
				span := ptrace.NewSpan()
				span.SetName("rootClientSpan")
				span.SetSpanID([8]byte{1})
				span.SetKind(ptrace.SpanKindProducer)

				span.Attributes().PutStr(semconv.AttributeServerAddress, "myServer")
				span.Attributes().PutStr(semconv.AttributeServerPort, "1234")
				span.Attributes().PutStr(semconv.AttributeMessagingSystem, "rabbitmq")
				span.Attributes().PutStr(semconv.AttributeMessagingDestinationName, "T")
				span.Attributes().PutStr(semconv.AttributeMessagingOperation, "publish")
				span.Attributes().PutStr(semconv.AttributeMessagingClientID, "a")
				return span
			}(),
			config: config.Enabled(),
			enrichedAttrs: map[string]any{
				AttributeTimestampUs: int64(0),
				AttributeTransactionName: "rootClientSpan",
				AttributeProcessorEvent: "transaction",
				AttributeSpanType: "messaging",
				AttributeSpanSubtype: "rabbitmq",
				AttributeSpanDestinationServiceResource: "rabbitmq/T",
				AttributeSpanName: "rootClientSpan",
				AttributeEventOutcome: "success",
				AttributeSuccessCount: int64(1),
				AttributeServiceTargetName: "T",
				AttributeServiceTargetType: "rabbitmq",
				AttributeTransactionID: "0100000000000000",
				AttributeTransactionDurationUs: int64(0),
				AttributeTransactionRepresentativeCount: float64(1),
				AttributeTransactionResult: "Success",
				AttributeTransactionType: "messaging.rabbitmq",
				AttributeTransactionSampled: true,
				AttributeTransactionRoot: true,
			},
		},
	} {
		t.Run(tc.name, func(t *testing.T) {
			expectedSpan := ptrace.NewSpan()
			tc.input.CopyTo(expectedSpan)

			// Merge with the expected attributes and override the span links.
			for k, v := range tc.enrichedAttrs {
				expectedSpan.Attributes().PutEmpty(k).FromRaw(v)
			}
			// Override span links
			if tc.expectedSpanLinks != nil {
				tc.expectedSpanLinks.CopyTo(expectedSpan.Links())
			} else {
				expectedSpan.Links().RemoveIf(func(_ ptrace.SpanLink) bool { return true })
			}

			EnrichSpan(tc.input, tc.config)
			assert.NoError(t, ptracetest.CompareSpan(expectedSpan, tc.input))
		})
	}
}

// Tests the enrichment logic for elastic's span definition.
func TestElasticSpanEnrich(t *testing.T) {
	now := time.Unix(3600, 0)