Skip to content

Conversation

@agocke
Copy link
Member

@agocke agocke commented Jul 5, 2025

Most cases were fairly simple, but a feature switch was needed for deserialization from a stream. I've audited all the call sites and it doesn't look like there are many types (if any) currently supported for event tracing that would be affected by disabling the feature switch.

agocke added 2 commits July 4, 2025 22:14
Fixes up simple annotation issues and APIs that have an easy AOT-safe
alternative.
@agocke
Copy link
Member Author

agocke commented Jul 5, 2025

cc @hoyosjs @tommcdon I think this is needed for making dotnet-gcdump AOT-compatible

@brianrob
Copy link
Member

brianrob commented Jul 6, 2025

I've had a chance to look through this PR. I'd like to have a conversation about what the goal is here and have some follow-up questions on several of the changes. My initial thought is that this change brings in several things that I don't want to maintain and/or I don't believe will work for TraceEvent all-up. To preview a few things:

  • I really do not want to add a new TFM for these libraries. We used to target multiple TFMs and this was much more complex for a variety of reasons.
  • I will need help understanding the requirement for PolySharp and how this impacts runtime execution. PerfView embeds its own dependencies, and so we'd need to understand the impact here.
  • Primitive-only deserialization will not work - This will break arbitrary trace files.

I will likely have more questions if we can resolve these issues, but these are the big rocks.

int bytes = _ctfStream.EventHeader.GetSize();
}

private bool ReadStruct<T>(out T result) where T : struct
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method appears unused

{
_log.WriteLine("Trying to generate the file {0}.", target);
var toolsDir = Path.GetDirectoryName(typeof(SourceFile).GetTypeInfo().Assembly.ManifestModule.FullyQualifiedName);
var archToolsDir = Path.Combine(toolsDir, NativeDlls.ProcessArchitectureDirectory);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable looks unused

<ItemGroup>
<PackageReference Include="Microsoft.Diagnostics.Tracing.TraceEvent.SupportFiles" Version="$(MicrosoftDiagnosticsTracingTraceEventSupportFilesVersion)" />
<PackageReference Include="Microsoft.Win32.Registry" Version="$(MicrosoftWin32RegistryVersion)" />
<PackageReference Include="PolySharp" Version="1.14.1">
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This package polyfills attributes for netstandard2.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use latest version 1.15.0? Any breaking change that prevents usage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably works fine? I just did nuget add package and it did this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wired that this command adds on older version and not the most recent one. Try the update and look what happens.

#if NET
manifestStream.ReadExactly(serializedManifest, 0, len);
#else
manifestStream.Read(serializedManifest, 0, len);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost certainly a bug

@agocke
Copy link
Member Author

agocke commented Jul 6, 2025

I'd like to have a conversation about what the goal is here

The main goal is to support Native AOT for a number of diagnostic tools that use this library. Most choices are downstream of that goal:

I really do not want to add a new TFM for these libraries.

There isn't really a choice here. netstandard2.0 isn't annotated for trimming/AOT, so in order to see the incompatibilities you have to multi-target.

I will need help understanding the requirement for PolySharp and how this impacts runtime execution. PerfView embeds its own dependencies, and so we'd need to understand the impact here.

Polysharp just adds attributes so we can avoid ifdef'ing every usage of RequiresUnreferencedCode or SupportedOS. It shouldn't have any runtime impact.

Primitive-only deserialization will not work - This will break arbitrary trace files.

This is the key part of a "feature switch" -- it changes behavior when intent to trim (via PublishAot/PublishTrimmed/IsTrimmable/IsAotCompatible) is enabled. This will produce different behavior when trimming, but the old behavior is categorically impossible to support when trimming, so it's kind of the best middle-ground on offer. The alternative is to simply disallow use of the API entirely, which is problematic because it is pervasive in this library.

@brianrob
Copy link
Member

brianrob commented Jul 7, 2025

The main goal is to support Native AOT for a number of diagnostic tools that use this library. Most choices are downstream of that goal:

I think that we should get together to discuss the goals and priority here. As it stands, I am not inclined to take this change - this puts a burden on TraceEvent and its maintainers that isn't free and creates another row in the test matrix (AOT). I think we need to understand the business need here.

@agocke
Copy link
Member Author

agocke commented Jul 7, 2025

@tommcdon @steveisok @noahfalk can you provide some guidance here? My understanding was that it’s a long-term goal to move the diagnostics tools to aot to eliminate the dependency on a shared runtime in diagnostics collection scenarios. Should trace event be part of that scenario, or is removing the dependency an option?

@noahfalk
Copy link
Member

noahfalk commented Jul 8, 2025

@tommcdon @steveisok @noahfalk can you provide some guidance here? My understanding was that it’s a long-term goal to move the diagnostics tools to aot to eliminate the dependency on a shared runtime in diagnostics collection scenarios. Should trace event be part of that scenario, or is removing the dependency an option?

I'd agree its a long-term goal. So far it hasn't been high enough priority that we've put any effort into it and I wasn't thinking of it as pressing. Is there anything creating urgency or could we just park this and revisit when converting the tools is a higher priority?

In terms of the dependency I think we have a mixture of:

  • some tools don't use TraceEvent at all
  • some tool features need potentially constrained subsets like NetTrace parsing and gcdump formatting
  • some tool features like 'dotnet-trace report topN' probably have much broader code usage within TraceEvent. We could potentially cut or modify features like this to remove the dependency but that has user impacts and development cost.
    It would probably be hard to remove the dependency if we want to keep all the features.

@agocke
Copy link
Member Author

agocke commented Jul 8, 2025

I'm skeptical that there's going to be a future time where we have more time and resources to devote to this compared to now.

Can you say a bit more about what you would expect to change in the future? What new information do we need? What kind of thing would change that would shift our decision?

@agocke
Copy link
Member Author

agocke commented Jul 8, 2025

One more thing: almost every library in .NET multi-targets. It is a recommended strategy for most of our modern .NET scenarios and will likely continue to be popular far into the future. I don't see a reason why EventTrace would be different in this regard. What makes it special compared to every other .NET library?

@noahfalk
Copy link
Member

noahfalk commented Jul 8, 2025

I'm skeptical that there's going to be a future time where we have more time and resources to devote to this compared to now.

Can you say a bit more about what you would expect to change in the future? What new information do we need? What kind of thing would change that would shift our decision?

I largely think of it as priority and available devs to work on it. Andy, was it your plan that you were going to convert all the dotnet-* tools to NativeAOT yourself? If yes then we should try to figure this out now. If not and it fell to the diagnostics team to do the conversion then I am guessing it wouldn't happen until either:

  • priority of converting the tools went up (more customer dissat about acquisition experience or some broader scenario requires it)
  • other work priorities go down/get completed (SFI, engineering, multiple runtime convergence efforts, playing catchup to mandatory runtime features, new diagnostic tools and features, numerous bugs)

Of course if there is something I'm overlooking that makes the conversion more urgent/valuable than I realize do let me know. So far it just wasn't looming particularly large on my radar.

@agocke
Copy link
Member Author

agocke commented Jul 8, 2025

Andy, was it your plan that you were going to convert all the dotnet-* tools to NativeAOT yourself?

I was going to see how much work it would be. I started with dotnet-gcdump. Looks like this was the only blocker -- after I fixed TraceEvent I think dotnet-gcdump just works. I haven't yet looked at the rest of the tools

@brianrob
Copy link
Member

brianrob commented Jul 9, 2025

@noahfalk and I chatted about this offline yesterday. When I look at this change, I am most worried about it from the perspective of support cost. That's both in terms of additional development-time and testing complexity (e.g. new TFM) and behavior complexity (e.g. different behavior across different runtimes).

As a path forward, I think that I would like to step back and understand what the various options are for carving out the set of functionality that is capable of being AOT'd. My read of the current PR (I could be wrong), is that everything would be available for use in NativeAOT, though some behavior would be different. I'd like to take a different approach where we assert that all APIs available in NativeAOT have consistent behavior with non-AOT usage. If that's not possible for an API, then by default, I'd like it to not be available for usage in NativeAOT'd apps. This may or may not produce an issue for the current scenario (diagnostic tools), but I'd much rather have a conversation and find a path forward, knowing that we have API behavior consistency.

I would like to start this exercise by assuming that no APIs are available for NativeAOT, and then let's bring in the minimal set required for the diagnostic tools. As that's happening, if there are APIs that can't be supported, that's when we have a discussion about how to move forward. I can imagine potentially building new APIs that are more NativeAOT friendly so that we can be very clear about what works and what doesn't, and we'd be forcing folks that want to build NativeAOT'd apps that use TraceEvent to make changes to their code when an API that they currently use doesn't work with NativeAOT. I imagine that we have a good amount of experience doing this kind of work, and so I'm hoping that we won't be re-inventing the wheel here.

Regarding adding a new TFM, I'd like to understand what other options exist here if any. Is the issue that we're not targeting netstandard21, or just that no version of netstandard is sufficient for NativeAOT? I'm thinking that if we need to add a new TFM, I'm not sure if I prefer to just add the TFM to TraceEvent, or if I'd rather create a new DLL that contains the AOT'able subset of TraceEvent and allow folks to consume that.

I'm sure that we can find a good path forward here so that we end up in the right place. If we aren't sure how far we're going to get with converting the diagnostic tools to NativeAOT, then it might also be worth sticking with what you have for now, continuing down the path of converting the tools against a private branch of TraceEvent, and if all goes well, then we can come back to how we implement NativeAOT in TraceEvent. If that's too burdensome, we can certainly start to delve into the answers to these questions now.


// TODO this is very inefficient for blitable types. Optimize that.
var ret = Array.CreateInstance(elementType, arrayCount);
var ret = new object[arrayCount];
Copy link
Member

@noahfalk noahfalk Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the dynamic return type of the API and likely breaks consumers. Although the method return is statically typed as object I don't think it is really free to return an arbitrary type.

I believe arrayInfo.Element.Type is a small closed set of potential types so a helper method with a switch case could create this correctly.

@agocke
Copy link
Member Author

agocke commented Jul 25, 2025

OK, I've refactored this to completely remove the usage of System.Type. @noahfalk you were right. There was no dynamic usage of Type in this code and it could be completely replaced with simple switch statements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants