Skip to content

[RuntimeAsync] Switch the implementation to the new shape of runtime async methods. #2818

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Dec 14, 2024

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Nov 30, 2024

Re: https://github.com/dotnet/runtime/blob/main/docs/design/specs/runtime-async.md

  • updates the Roslyn reference to the version that emits async2 methods according to the new IL API
  • updates the VM part to understand the new IL API.
  • I assume that the async marker is a MethodImpl bit as in [RuntimeAsync] Make Async marker a bit in MethodImpl flags runtime#110274
    It would not be too hard to switch to an ordinary attribute if we decide otherwise, but dealing with MethodImpl is somewhat simpler, so I made an assumption that we will use MethodImpl.

Before this change we had the notion of "Thunk" that covered two concepts

  • a synthetic helper method that provides an entry with a signature not present in metadata
  • a method with synthetic body that forwards to another flavor.

Before this change these were one thing - "Thunk". A Thunk was always serving both purposes.

The most laborious part of this change was teasing apart a single Thunk concept into a Helper that provides a different signature and a Thunk - a method that forwards to another.
These things are not necessarily the same now. We can have a method with a signature provided in metadata yet its method body would forward to a synthetic entry point. The other way is possible too - A helper with synthetic Async2 signature, but with a body constructed by JIT-ting actual IL.

@VSadov VSadov requested review from jakobbotsch and davidwrighton and removed request for jakobbotsch November 30, 2024 03:10
@VSadov
Copy link
Member Author

VSadov commented Nov 30, 2024

All tests work again, except async\simple-eh and async\collectible-alc

These two hit some asserts in JIT. I suspect wrapped/unwrapped mismatches in return values, but it could take me quite some time to trace why that happens. Like something is picking up Task<T> instead of T now.

@jakobbotsch - could you take a look what is going on with these tests?

@@ -55,17 +55,27 @@ EXTERN_C VOID STDCALL NDirectImportThunk();
#define METHOD_TOKEN_RANGE_BIT_COUNT (24 - METHOD_TOKEN_REMAINDER_BIT_COUNT)
#define METHOD_TOKEN_RANGE_MASK ((1 << METHOD_TOKEN_RANGE_BIT_COUNT) - 1)

enum class AsyncMethodType
enum class AsyncMethodKind
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, but why do you prefer "kind" over "type"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think inside the type system implementation referring to someting as "type" when it is not a type adds extra confusion.
"Kind" seems slightly better, especially when we end up with fields/locals called "type".

};

struct AsyncMethodData
{
AsyncMethodType type;
AsyncMethodKind type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be renamed as well then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I missed a few places. Mostly was looking at test failures after things started failing after the change.

{
LIMITED_METHOD_DAC_CONTRACT;
if (!HasAsyncMethodData())
return false;
AsyncMethodType asyncType = GetAddrOfAsyncMethodData()->type;
return asyncType == AsyncMethodType::AsyncToTask || asyncType == AsyncMethodType::TaskToAsync;
auto asyncType = GetAddrOfAsyncMethodData()->type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto asyncType = GetAddrOfAsyncMethodData()->type;
AsyncMethodKind asyncType = GetAddrOfAsyncMethodData()->type;

and potentially a further rename of the variable.

// We use "async2" for runtime async methods that return "Unwrapped" values (i.e. T instead of Task<T>)
// The type of promise is typically captured in a modreq.
// CONSIDER: We probably need a better name for the concept, but it is hard to beat shortness of "async2"
inline bool IsAsync2Method() const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see good reason to have both this and IsAsyncHelperMethod. It seems like one of them should just be removed.

Copy link
Member Author

@VSadov VSadov Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Thunk" and "Helper" were the same thing before this change and then had to be sorted out on what we actually want in every place where "IsThunk" was used.

Now we have the same situation with "IsHelper" and "IsAsync2". They are the same in current implementation, but fundamentally do not need to be. We may end up with a scenario where we have to declare something that is an "IsAsync2" but not a helper. - perhaps for internal helpers like UnsafeAwaitAwaiterFromRuntimeAsync

There are quite a few calls to IsAsync2Method and even more to IsAsyncHelperMethod

I'd like to keep these two separate for now. It is easy to fold then into one when we completely sure.


return asyncType == AsyncMethodType::Async;
return asyncType == AsyncMethodKind::Async;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be AsyncImplHelper, since you changed the meaning of Async?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is a bug.

Comment on lines 3590 to 3591
_ASSERTE(false);
pNewMethod->SetAsyncMethodKind(AsyncMethodKind::Async);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like some kind of TODO?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is dead code now. I tried to limit diffs initially - to see what has changed, but this can be now removed.

CorElementType etype;
IfFailThrow(pSig.PeekElemType(&etype));

if ((etype & ELEMENT_TYPE_GENERICINST) != 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ((etype & ELEMENT_TYPE_GENERICINST) != 0)
if (etype == ELEMENT_TYPE_GENERICINST)

This is not a flag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can it be an assert? I think it should be impossible for this not to be a Task<T> or ValueTask<T> instantiation here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I thought of trying to keep the capability of parsing the old (modreq'd) signatures - in case we need that for some internal reasons, but checking for ELEMENT_TYPE_GENERICINST here is not the way. I think it would not work if we have a modreq'd signature that returns Task<T> as we would unwrap both modreq and Task wrappers.

Yes. It would be simpler if this is just an assert.
If we need to parse something else (I have more and more doubts) we would just have a different helper.

@jakobbotsch
Copy link
Member

These two hit some asserts in JIT. I suspect wrapped/unwrapped mismatches in return values, but it could take me quite some time to trace why that happens. Like something is picking up Task<T> instead of T now.

@jakobbotsch - could you take a look what is going on with these tests?

It looks like Roslyn is producing illegal IL. E.g. for simple-eh, Roslyn is storing an int result of the Throw call into a Task<int> local:

// Token: 0x06000003 RID: 3 RVA: 0x000020B4 File Offset: 0x000002B4
.method public hidebysig static 
	class [System.Runtime]System.Threading.Tasks.Task`1<int32> Handler () cil managed securitymitigations 
{
	// Header Size: 12 bytes
	// Code Size: 20 (0x14) bytes
	// LocalVarSig Token: 0x11000002 RID: 2
	.maxstack 1
	.locals init (
		[0] class [System.Runtime]System.Threading.Tasks.Task`1<int32>
	)

	.try
	{
		/* 0x000002C0 1F2A         */ IL_0000: ldc.i4.s  42
		/* 0x000002C2 281600000A   */ IL_0002: call      int32 modreq([System.Runtime]System.Threading.Tasks.Task`1)  Async2SimpleEH::Throw(int32)
		/* 0x000002C7 0A           */ IL_0007: stloc.0                           // <- this stores int into Task<int>
		/* 0x000002C8 DE08         */ IL_0008: leave.s   IL_0012
	} // end .try
	catch Async2SimpleEH/IntegerException
	{
		/* 0x000002CA 7B01000004   */ IL_000A: ldfld     int32 Async2SimpleEH/IntegerException::Value
		/* 0x000002CF 0A           */ IL_000F: stloc.0
		/* 0x000002D0 DE00         */ IL_0010: leave.s   IL_0012
	} // end handler

	/* 0x000002D2 06           */ IL_0012: ldloc.0
	/* 0x000002D3 2A           */ IL_0013: ret
} // end of method Async2SimpleEH::Handler

@VSadov
Copy link
Member Author

VSadov commented Dec 2, 2024

It looks like Roslyn is producing illegal IL. E.g. for simple-eh, Roslyn is storing an int result of the Throw call into a Task local:

Right. These locals are introduced very late - in emit phase, after async rewrite would have happened, so they pick the wrapped return type.
Thanks!!

@@ -17,14 +17,20 @@ public static void TestEntryPoint()
AsyncEntryPoint().Wait();
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static void CollectOnce()
Copy link
Member Author

@VSadov VSadov Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a workaround for an assert failure.
The assert is caused by this PR (as in - it does not happen without the changes, but I am not familiar enough with JIT codebase to tell right away how it is related to the changes. Seems like something related to PInvoke setup in an async2 method . . .

without the workaround:

Assert failure(PID 55592 [0x0000d928], Thread: 16608 [0x40e0]): Assertion failed 'comp->fgFirstBBisScratch()' in 'Async2CollectibleAlc:AsyncEntryPoint()' during 'Lowering nodeinfo' (IL size 49; hash 0x76f7a2c6; FullOpts)

    File: E:\dotnetA2\runtimelab\src\coreclr\jit\lower.cpp:5905
    Image: E:\dotnetA2\runtimelab\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\corerun.exe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a new issue, it reproduces also before this PR with DOTNET_TieredCompilation=0, so we can ignore this particular issue.

However, the fact that it reproduces here means that tiered compilation is not working with the new scheme. The new thunks seem to be handled differently from metadata functions with respect to PGO, optimizations, etc.. Can you take a look at that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix for the pinvoke issue is up in #2847.

Copy link
Member Author

@VSadov VSadov Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new thunks seem to be handled differently from metadata functions with respect to PGO, optimizations, etc.. Can you take a look at that?

The thunks were explicitly excluded from tiering. perhaps because thunk methods are too small.
Exclusion of thunks is one of a few places that should not switch to IsAsyncHelperMethod() and should continue using IsAsyncThunkMethod.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default everything that was using IsAsyncThunkMethod() switched to IsAsyncHelperMethod(), but in this case it was incorrect and effectively disabled tiering of runtime async methods.

Copy link
Member

@jakobbotsch jakobbotsch Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to validate that PGO works now, but the new Roslyn prototype fails to compile my test example:

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Xunit;

public class Async2Pgo
{
    [Fact]
    public static void EntryPoint()
    {
        AsyncEntryPoint().Wait();
    }
    
    internal static async2 Task<int> AsyncEntryPoint()
    {
        int[] arr = Enumerable.Range(0, 100_000).ToArray();

        int sum = 0;
        for (int i = 0; i < 4; i++)
        {
            for (int j = 0; j < 100; j++)
                sum += await AggregateDelegateAsync(arr, new AggregateSum(), 0);
                
            await Task.Delay(100);
        }
        
        return sum;
    }
    
    private class AggregateSum : I<int>
    {
#pragma warning disable CS1998
        public async2 Task<int> Aggregate(int a, int b) => a + b;
    }
    
    public interface I<T>
    {
        public Task<T> Aggregate(T seed, T val);
    }
    
    public static async2 Task<T> AggregateDelegateAsync<T>(T[] arr, I<T> aggregate, T seed)
    {
        foreach (T val in arr)
            seed = await aggregate.Aggregate(seed, val);
            
        return seed;
    }
}
C:\Users\Jakob\.nuget\packages\microsoft.net.compilers.toolset\4.13.0-async-11\tasks\netcore\Microsoft.CSharp.Core.targets(89,5): er
ror : Process terminated. System.NullReferenceException: Object reference not set to an instance of an object. [C:\dev\dotnet\runtim
elab\src\tests\async\pgo.csproj]
C:\Users\Jakob\.nuget\packages\microsoft.net.compilers.toolset\4.13.0-async-11\tasks\netcore\Microsoft.CSharp.Core.targets(89,5): er
ror :    at Microsoft.CodeAnalysis.CSharp.Symbols.Async2ThunkForAsyncMethod.get_ReturnTypeWithAnnotations() in C:\dev\dotnet\async-r
oslyn-repo\src\Compilers\CSharp\Portable\Symbols\AsyncThunks.cs:line 83 [C:\dev\dotnet\runtimelab\src\tests\async\pgo.csproj]
C:\Users\Jakob\.nuget\packages\microsoft.net.compilers.toolset\4.13.0-async-11\tasks\netcore\Microsoft.CSharp.Core.targets(89,5): er
ror :    at Microsoft.CodeAnalysis.CSharp.Symbols.MethodSymbol.get_ReturnType() in C:\dev\dotnet\async-roslyn-repo\src\Compilers\CSh
arp\Portable\Symbols\MethodSymbol.cs:line 251 [C:\dev\dotnet\runtimelab\src\tests\async\pgo.csproj]
C:\Users\Jakob\.nuget\packages\microsoft.net.compilers.toolset\4.13.0-async-11\tasks\netcore\Microsoft.CSharp.Core.targets(89,5): er
ror :    at Microsoft.CodeAnalysis.CSharp.Symbols.MethodSymbol.Microsoft.Cci.ISignature.GetType(EmitContext context) in C:\dev\dotne
t\async-roslyn-repo\src\Compilers\CSharp\Portable\Emitter\Model\MethodSymbolAdapter.cs:line 245 [C:\dev\dotnet\runtimelab\src\tests\
async\pgo.csproj]
...

It works fine with the previous prototype, and PGO kicks in with full hoisting and opts:

G_M19365_IG03:  ;; offset=0x0018
       test     rbx, rbx
       je       SHORT G_M19365_IG07
       mov      r8, 0x7FF8320A6DB0      ; Async2Pgo+AggregateSum
       cmp      qword ptr [rbx], r8
       jne      SHORT G_M19365_IG07
       add      rdx, 16
       align    [0 bytes for IG04]
                                                ;; size=24 bbWeight=0.99 PerfScore 5.72
G_M19365_IG04:  ;; offset=0x0030
       mov      r11d, dword ptr [rdx]
       add      r9d, r11d
       add      rdx, 4
       dec      ecx
       jne      SHORT G_M19365_IG04
                                                ;; size=14 bbWeight=165.40 PerfScore 620.24
G_M19365_IG05:  ;; offset=0x003E
       mov      eax, r9d
       xor      ecx, ecx
                                                ;; size=5 bbWeight=1.00 PerfScore 0.50
G_M19365_IG06:  ;; offset=0x0043
       add      rsp, 32
       pop      rbx
       pop      rsi
       pop      rdi
       ret

@jkotas jkotas added the area-async Runtime generate async state machines label Dec 3, 2024
jakobbotsch
jakobbotsch previously approved these changes Dec 6, 2024
@jakobbotsch jakobbotsch dismissed their stale review December 6, 2024 12:16

Found a non-working case

@VSadov
Copy link
Member Author

VSadov commented Dec 6, 2024

I am still looking at reenabling tiered compilation. The part where we go back from native code -> method def -> method desc is not right for async methods. We start with implementation code and arrive to the method desc for a thunk, thus tiering up fails.

The problem is understood, but need to figure what/where the right fix would be.

@VSadov
Copy link
Member Author

VSadov commented Dec 6, 2024

Tiered compilation works again. All current tests pass.

I will look at Roslyn prototype issues with the PGO sample above next.

@VSadov
Copy link
Member Author

VSadov commented Dec 7, 2024

  • merged upstream into the Roslyn prototype
  • fixed a prototype bug with encoding underlying type of nongeneric async methods at call sites

@jakobbotsch - the PGO test should build now. I've added a test in the Roslyn prototype branch, and it builds, but I have not yet try running it. All existing tests pass though.

@VSadov VSadov requested a review from jakobbotsch December 11, 2024 23:46
@VSadov
Copy link
Member Author

VSadov commented Dec 12, 2024

@jakobbotsch I have added your sample as a testcase. It looks like tiering, PGO and OSR are working (if I read the following correctly).

; Assembly listing for method Async2Pgo:AsyncEntryPoint():int (Tier1-OSR)
; Emitting BLENDED_CODE for X64 with AVX512 - Windows
; Tier1-OSR code
; OSR variant for entry point 0x32
; async2
; optimized code
; optimized using Dynamic PGO
; rsp based frame
; fully interruptible
; with Dynamic PGO: fgCalledCount is 116.0628
; 5 inlinees with PGO data; 15 single block inlinees; 2 inlinees without PGO data
; Final local variable assignments
;
;  V00 AsyncCont    [V00,T10] (  6,    3.86)     ref  ->  rcx         single-def "Async continuation arg"
;  V01 loc1         [V01,T08] (  3,   98.56)     ref  ->  rdi         class-hnd <int[]>
;  V02 loc2         [V02,T07] (  5,  197.13)     int  ->  rsi
;  V03 loc3         [V03,T13] (  5,    4.32)     int  ->  rbp
;  V04 loc4         [V04,T04] (  5,  298.56)     int  ->  rbx
;* V05 loc5         [V05    ] (  0,    0   )     int  ->  zero-ref
;  V06 OutArgs      [V06    ] (  1,    1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V07 tmp1         [V07    ] (  0,    0   )     ref  ->  zero-ref    class-hnd exact "NewObj constructor temp" <Async2Pgo+AggregateSum>
;* V08 tmp2         [V08    ] (  0,    0   )     int  ->  zero-ref    "Inline return value spill temp"
;  V09 tmp3         [V09,T05] (  3,  295.10)     ref  ->  rax         class-hnd "Inline stloc first use temp" <int[]>
;* V10 tmp4         [V10    ] (  0,    0   )     int  ->  zero-ref    "Inline stloc first use temp"
;  V11 tmp5         [V11,T03] (  2,32932.47)     int  ->   r8         "Inline stloc first use temp"
;  V12 tmp6         [V12,T00] (  4,66259.19)     int  ->  rcx         "Inlining Arg"
;* V13 tmp7         [V13    ] (  0,    0   )  struct ( 8) zero-ref    ld-addr-op "Inline stloc first use temp" <System.Runtime.CompilerServices.TaskAwaiter>
;* V14 tmp8         [V14    ] (  0,    0   )  struct ( 8) zero-ref    ld-addr-op "Inline ldloca(s) first use temp" <System.Threading.CancellationToken>
;* V15 tmp9         [V15    ] (  0,    0   )  struct ( 8) zero-ref    "Inlining Arg" <System.Threading.CancellationToken>
;  V16 tmp10        [V16,T12] (  7,    5.47)     ref  ->  r14         class-hnd "Inline return value spill temp" <System.Threading.Tasks.Task>
;* V17 tmp11        [V17    ] (  0,    0   )  struct ( 8) zero-ref    ld-addr-op "Inlining Arg" <System.Threading.CancellationToken>
;  V18 tmp12        [V18,T11] (  2,    5.76)     ref  ->  rbx         class-hnd exact "Inlining Arg" <System.TimeProvider+SystemTimeProvider>
;* V19 tmp13        [V19    ] (  0,    0   )     ref  ->  zero-ref    class-hnd exact "NewObj constructor temp" <System.Threading.Tasks.Task+DelayPromiseWithCancellation>
;  V20 tmp14        [V20,T09] (  3,    8.64)     ref  ->  r14         class-hnd exact "NewObj constructor temp" <System.Threading.Tasks.Task+DelayPromise>
;* V21 tmp15        [V21    ] (  0,    0   )   ubyte  ->  zero-ref    "Inline return value spill temp"
;* V22 tmp16        [V22    ] (  0,    0   )     ref  ->  zero-ref    class-hnd "Inlining Arg" <System.Threading.CancellationTokenSource>
;* V23 tmp17        [V23    ] (  0,    0   )  struct ( 8) zero-ref    ld-addr-op "NewObj constructor temp" <System.Runtime.CompilerServices.TaskAwaiter>
;* V24 tmp18        [V24    ] (  0,    0   )   ubyte  ->  zero-ref    "Inlining Arg"
;* V25 tmp19        [V25    ] (  0,    0   )   ubyte  ->  zero-ref    "Inlining Arg"
;* V26 tmp20        [V26    ] (  0,    0   )     ref  ->  zero-ref    class-hnd "Inlining Arg" <System.Threading.Tasks.Task>
;* V27 tmp21        [V27    ] (  0,    0   )     int  ->  zero-ref    "Inlining Arg"
;* V28 tmp22        [V28    ] (  0,    0   )     ref  ->  zero-ref    class-hnd "Inlining Arg" <System.Threading.Tasks.Task>
;* V29 tmp23        [V29    ] (  0,    0   )     ref  ->  zero-ref    "field V13.m_task (fldOffset=0x0)" P-INDEP
;* V30 tmp24        [V30    ] (  0,    0   )     ref  ->  zero-ref    "field V14._source (fldOffset=0x0)" P-INDEP
;* V31 tmp25        [V31    ] (  0,    0   )     ref  ->  zero-ref    "field V15._source (fldOffset=0x0)" P-INDEP
;* V32 tmp26        [V32    ] (  0,    0   )     ref  ->  zero-ref    "field V17._source (fldOffset=0x0)" P-INDEP
;  V33 tmp27        [V33,T14] (  2,    2.88)     ref  ->  rcx         "field V23.m_task (fldOffset=0x0)" P-INDEP
;  V34 cse0         [V34,T06] (  3,  295.10)     int  ->  rdx         "CSE #01: aggressive"
;  V35 rat0         [V35,T01] (  4,49496.68)   byref  ->  rax         "Strength reduced derived IV"
;  V36 rat1         [V36,T02] (  4,49496.68)     int  ->  rdx         "Trip count IV"
;  V37 rat2         [V37,T15] (  3,    2.30)     ref  ->  rcx         "returned continuation"
;  V38 rat3         [V38,T16] (  6,    0   )     ref  ->  rbx         "new continuation"
;  V39 rat4         [V39,T18] (  3,    0   )     ref  ->  r15         "object[] for continuation"
;  V40 rat5         [V40,T17] (  4,    0   )     ref  ->  rax         "byte[] for continuation"
;  V41 rat6         [V41,T19] (  3,    0   )     ref  ->  rdx         "byte[] for continuation"
;  V42 rat7         [V42,T20] (  3,    0   )     ref  ->  rcx         "object[] for continuation"
;
; Lcl frame size = 40

G_M18330_IG01:  ;; offset=0x0000
       sub      rsp, 88
       mov      qword ptr [rsp+0x138], r15
       mov      qword ptr [rsp+0x130], r14
       mov      qword ptr [rsp+0x128], rdi
       mov      qword ptr [rsp+0x120], rsi
       mov      qword ptr [rsp+0x118], rbx
       mov      rcx, gword ptr [rsp+0x150]
       mov      rdi, gword ptr [rsp+0x100]
       mov      esi, dword ptr [rsp+0xFC]
       mov      ebp, dword ptr [rsp+0xF8]
       mov      ebx, dword ptr [rsp+0xF4]
                                                ;; size=81 bbWeight=1 PerfScore 15.25
G_M18330_IG02:  ;; offset=0x0051
       test     rcx, rcx
       je       SHORT G_M18330_IG06
                                                ;; size=5 bbWeight=1 PerfScore 1.25
G_M18330_IG03:  ;; offset=0x0056
       mov      rdx, gword ptr [rcx+0x10]
       cmp      dword ptr [rdx+0x10], 0
       jl       SHORT G_M18330_IG06
       jmp      G_M18330_IG16
                                                ;; size=15 bbWeight=0.86 PerfScore 6.89
G_M18330_IG04:  ;; offset=0x0065
       mov      ecx, dword ptr [r14+0x34]
       and      ecx, 0x11000000
       cmp      ecx, 0x1000000
       jne      G_M18330_IG13
                                                ;; size=22 bbWeight=1.44 PerfScore 5.04
G_M18330_IG05:  ;; offset=0x007B
       inc      ebp
       cmp      ebp, 100
       jge      G_M18330_IG14
       xor      ebx, ebx
                                                ;; size=13 bbWeight=1.44 PerfScore 2.52
G_M18330_IG06:  ;; offset=0x0088
       cmp      ebx, 100
       jge      SHORT G_M18330_IG11
                                                ;; size=5 bbWeight=1.44 PerfScore 1.80
G_M18330_IG07:  ;; offset=0x008D
       xor      ecx, ecx
       mov      rax, rdi
       mov      edx, dword ptr [rax+0x08]
       test     edx, edx
       jle      SHORT G_M18330_IG10
                                                ;; size=12 bbWeight=98.56 PerfScore 369.60
G_M18330_IG08:  ;; offset=0x0099
       add      rax, 16
       align    [3 bytes for IG09]
                                                ;; size=7 bbWeight=97.97 PerfScore 48.99
G_M18330_IG09:  ;; offset=0x00A0
       mov      r8d, dword ptr [rax]
       add      ecx, r8d
       add      rax, 4
       dec      edx
       jne      SHORT G_M18330_IG09
                                                ;; size=14 bbWeight=16466.24 PerfScore 61748.39
G_M18330_IG10:  ;; offset=0x00AE
       add      esi, ecx
       inc      ebx
       cmp      ebx, 100
       jl       SHORT G_M18330_IG07
                                                ;; size=9 bbWeight=98.56 PerfScore 172.48
G_M18330_IG11:  ;; offset=0x00B7
       mov      rcx, 0x2464F000238      ; const ptr
       mov      rbx, gword ptr [rcx]
       mov      rcx, 0x7FF814DFAD00      ; System.Threading.Tasks.Task+DelayPromise
       call     CORINFO_HELP_NEWSFAST
       mov      r14, rax
       mov      rcx, r14
       mov      r8, rbx
       mov      edx, 1
       call     [System.Threading.Tasks.Task+DelayPromise:.ctor(uint,System.TimeProvider):this]
       mov      rcx, r14
       test     dword ptr [rcx+0x34], 0x1600000
       jne      G_M18330_IG04
                                                ;; size=64 bbWeight=1.44 PerfScore 16.91
G_M18330_IG12:  ;; offset=0x00F7
       mov      rcx, r14
       call     [System.Runtime.CompilerServices.RuntimeHelpers:UnsafeAwaitAwaiterFromRuntimeAsync[System.Runtime.CompilerServices.TaskAwaiter](System.Runtime.CompilerServices.TaskAwaiter)]
       test     rcx, rcx
       jne      SHORT G_M18330_IG17
       jmp      G_M18330_IG04
                                                ;; size=19 bbWeight=1.15 PerfScore 7.49
G_M18330_IG13:  ;; offset=0x010A
       mov      rcx, r14
       xor      edx, edx
       call     [System.Runtime.CompilerServices.TaskAwaiter:HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task,int)]
       jmp      G_M18330_IG05
                                                ;; size=16 bbWeight=0.00 PerfScore 0.01
G_M18330_IG14:  ;; offset=0x011A
       mov      eax, esi
       xor      ecx, ecx
                                                ;; size=4 bbWeight=0.01 PerfScore 0.00
G_M18330_IG15:  ;; offset=0x011E
       add      rsp, 280
       pop      rbx
       pop      rsi
       pop      rdi
       pop      r14
       pop      r15
       pop      rbp
       ret
                                                ;; size=16 bbWeight=0.01 PerfScore 0.04
G_M18330_IG16:  ;; offset=0x012E
       mov      rdx, gword ptr [rcx+0x10]
       mov      esi, dword ptr [rdx+0x18]
       mov      ebp, dword ptr [rdx+0x1C]
       mov      rcx, gword ptr [rcx+0x18]
       mov      rdi, gword ptr [rcx+0x10]
       mov      r14, gword ptr [rcx+0x18]
       jmp      G_M18330_IG04
                                                ;; size=27 bbWeight=0 PerfScore 0.00
G_M18330_IG17:  ;; offset=0x0149
       mov      edx, 2
       mov      r8d, 16
       call     CORINFO_HELP_ALLOC_CONTINUATION
       mov      rbx, rax
       mov      rcx, 0x7FF814B7A850      ; code for (dynamicClass):IL_STUB_AsyncResume_AsyncEntryPoint_Tier1OSR(System.Object):System.Object
       mov      qword ptr [rbx+0x20], rcx
       mov      rcx, 0x400000000
       mov      qword ptr [rbx+0x28], rcx
       mov      r15, gword ptr [rbx+0x18]
       lea      rcx, bword ptr [r15+0x10]
       mov      rdx, rdi
       call     CORINFO_HELP_ASSIGN_REF
       lea      rcx, bword ptr [r15+0x18]
       mov      rdx, r14
       call     CORINFO_HELP_ASSIGN_REF
       mov      rax, gword ptr [rbx+0x10]
       mov      dword ptr [rax+0x10], 50
       mov      dword ptr [rax+0x18], esi
       mov      dword ptr [rax+0x1C], ebp
       mov      rcx, rbx
                                                ;; size=95 bbWeight=0 PerfScore 0.00
G_M18330_IG18:  ;; offset=0x01A8
       add      rsp, 280
       pop      rbx
       pop      rsi
       pop      rdi
       pop      r14
       pop      r15
       pop      rbp
       ret
                                                ;; size=16 bbWeight=0 PerfScore 0.00

; Total bytes of code 440, prolog size 81, PerfScore 62396.66, instruction count 109, allocated bytes for code 440 (MethodHash=022db865) for method Async2Pgo:AsyncEntryPoint():int (Tier1-OSR)
; ============================================================

@@ -719,6 +719,15 @@ namespace

COR_ILMETHOD_DECODER* pHeader = NULL;
COR_ILMETHOD* ilHeader = pConfig->GetILHeader();

Copy link
Member Author

@VSadov VSadov Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we do here:
when pConfig tries to match the native code to global IL it will follow native code->methoddef->methoddesc->IL chain.
However, we have two methoddescs associated with the methoddef and the one that it the "primary" one (as in - stored in the def->desc mapping dictionary) is the thunk. It's signature matches metadata, but its IL is runtime-provided, thus there would be no global IL for it.

In a case if we get NULL IL, we check if the original methoddesc was actually an async "implementation" helper that is not a wrapper/thunk. In such case we ask the methoddesc directly for its IL as that would be the IL we are looking for.

@@ -0,0 +1,52 @@
// Licensed to the .NET Foundation under one or more agreements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other test files in this directory are lower-case. Can you please make this match?

for (int j = 0; j < 100; j++)
sum += await AggregateDelegateAsync(arr, new AggregateSum(), 0);

await Task.Delay(1);
Copy link
Member

@jakobbotsch jakobbotsch Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wait might not be enough to get us out of the tiering start-up mode (the tiering system has heuristic to wait with starting the tier-up while it thinks the app is in start-up mode).

The original test has i < 4 to potentially tier up through 4 tiers (in reality it should only need 3), but the longer wait to make sure the tiering system kicks in between each round of calls to the previous tier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not see OSR scenario to happen with just 4 iterations. Perhaps not enough call count?
With 100 iterations it does happen, but to make the test run in reasonable time, an individual iteration needs to be sorter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to worry about testing OSR in this test. We already have coverage of OSR from other tests in the tree. Perhaps we could write a more explicit one too, but I think it would look different from this one. (No sleep is necessary, but OSR triggers after 10000 hits to a backedge, so a simple loop can do it).
This test just needs to ensure that PGO information is properly saved and restored from tier0-instrumentation into tier1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can revert to the original 4 iteration/100 millisecond

@jakobbotsch
Copy link
Member

@jakobbotsch I have added your sample as a testcase. It looks like tiering, PGO and OSR are working (if I read the following correctly).

Can you add NoInlining to AggregateDelegateAsync? It looks like inlining kicked in, hence there is no PGO necessary to do devirtualization here.

Also it looks like you haven't pushed the changes to Roslyn to https://github.com/dotnet/roslyn/commits/demos/async2-experiment1/ yet.

@VSadov
Copy link
Member Author

VSadov commented Dec 12, 2024

Also it looks like you haven't pushed the changes to Roslyn to https://github.com/dotnet/roslyn/commits/demos/async2-experiment1/ yet.

I thought I did. Something did not work though. It was a big change since I merged in changes from main.
I pushed again. Should be in now

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now! I validated that I see what I expect in the PGO test.

Not that familiar with the VM bits, so you may want to ping @davidwrighton offline to get a review.

@VSadov
Copy link
Member Author

VSadov commented Dec 14, 2024

Not that familiar with the VM bits, so you may want to ping @davidwrighton offline to get a review.

The most material change on the VM side is that for async2 methods, while we keep the invariant that the "primary" methoddesc is the one declared in metadata, it is now the Task-returning one.
For the most part of VM the dual methoddescs are used symmetrically, so the swap does not matter. The only case where one methoddesc is really distinct from another is the methoddef->methoddesc map. That is what basically makes the "primary" methoddesc special, since the mapping records the "primary" one. And since it is still the one that matches metadata, it still fits the existing logic.

What bothered me a bit is the number of cases that would throw/assert/bail if they see the non-primary/helper methoddesc. Some of those are probably never supposed to see such scenarios, but some may be just NYI and may eventually need some support. Reviewing all those places at some point could be a good idea.

I will definitely talk to David about this and further steps.

@VSadov
Copy link
Member Author

VSadov commented Dec 14, 2024

Thanks for reviewing!

I think I will merge this, so it does not block any further changes, and if any more follow up is needed, that would be separate changes.

@VSadov VSadov merged commit 7624a38 into dotnet:feature/async2-experiment Dec 14, 2024
1 of 7 checks passed
@VSadov VSadov deleted the newAPI01 branch December 14, 2024 03:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-async Runtime generate async state machines
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants