@majocha majocha commented Oct 7, 2025

Fixes excessive StackGuard thread jumps by using RuntimeHelpers.TryEnsureSufficientExecutionStack instead of arbitrary guessed limits.

Should improve #18970
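
For readers unfamiliar with the technique, here is a minimal sketch of a guard that asks the runtime whether enough stack remains and only moves work to a fresh thread when it does not. `SimpleStackGuard`, `Guard`, `Jumps`, and `depth` are hypothetical names for illustration; this is not the compiler's actual StackGuard, and it does not propagate exceptions thrown on the helper thread.

```fsharp
open System.Runtime.CompilerServices
open System.Threading

/// Hypothetical, simplified guard; not the compiler's actual StackGuard.
type SimpleStackGuard() =
    let mutable jumps = 0

    /// Number of times work was moved to a fresh thread.
    member _.Jumps = jumps

    /// Runs f on the current thread when the runtime reports enough stack,
    /// otherwise on a new thread that starts with an empty stack.
    /// Assumption: f does not throw (exceptions are not marshalled back here).
    member _.Guard(f: unit -> 'T) : 'T =
        if RuntimeHelpers.TryEnsureSufficientExecutionStack() then
            f ()
        else
            jumps <- jumps + 1
            let result = ref (Unchecked.defaultof<'T>)
            let thread = Thread(ThreadStart(fun () -> result.Value <- f ()))
            thread.Start()
            thread.Join()
            result.Value

let guard = SimpleStackGuard()

/// Deliberately non-tail-recursive function; every level goes through the guard.
let rec depth (n: int) : int =
    if n = 0 then 0 else 1 + guard.Guard(fun () -> depth (n - 1))

printfn "depth = %d, thread jumps = %d" (depth 100000) guard.Jumps
```

How many jumps occur depends on the platform's stack size and frame layout, so the jump count is printed rather than assumed.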

ComputationExpressionBenchmarks benchmark with EmptyCache = true:

current main:

| Method    | Source            | Mean       | Error    | StdDev   | Median     | Gen0       | Gen1      | Gen2      | Allocated  |
|---------- |------------------ |-----------:|---------:|---------:|-----------:|-----------:|----------:|----------:|-----------:|
| CheckCE   | CE100xnest1.fs    |   196.5 ms |  2.86 ms |  4.28 ms |   196.3 ms |  2000.0000 | 1000.0000 |         - |  157.34 MB |
| CompileCE | CE100xnest1.fs    |   352.4 ms |  6.90 ms |  8.21 ms |   353.8 ms |  9000.0000 | 4000.0000 | 2000.0000 |   846.1 MB |
| CheckCE   | CE100xnest5.fs    |   757.8 ms | 12.36 ms | 10.95 ms |   761.9 ms |  7000.0000 | 3000.0000 | 1000.0000 |  399.12 MB |
| CompileCE | CE100xnest5.fs    | 2,244.6 ms | 28.86 ms | 27.00 ms | 2,256.1 ms | 31000.0000 | 5000.0000 | 2000.0000 | 3365.74 MB |
| CheckCE   | CE1xnest15.fs     |   107.7 ms |  4.25 ms | 11.92 ms |   102.3 ms |  2000.0000 | 1000.0000 |         - |  112.56 MB |
| CompileCE | CE1xnest15.fs     |   133.3 ms |  3.16 ms |  8.66 ms |   130.1 ms |  3000.0000 | 1000.0000 |         - |  149.34 MB |
| CheckCE   | CEwCO100xnest5.fs |   179.1 ms |  2.96 ms |  4.25 ms |   178.5 ms |  3000.0000 | 1000.0000 |         - |  211.57 MB |
| CompileCE | CEwCO100xnest5.fs |   245.3 ms |  4.87 ms |  9.95 ms |   240.9 ms |  7000.0000 | 3000.0000 | 1000.0000 |  289.73 MB |
| CheckCE   | CEwCO500xnest1.fs |   128.7 ms |  4.43 ms | 12.65 ms |   125.6 ms |  1000.0000 |         - |         - |  132.02 MB |
| CompileCE | CEwCO500xnest1.fs | 1,379.5 ms | 21.61 ms | 19.16 ms | 1,376.8 ms | 27000.0000 | 3000.0000 | 1000.0000 |  592.42 MB |

this PR:

| Method    | Source            | Mean     | Error   | StdDev   | Median   | Gen0       | Gen1      | Gen2      | Allocated |
|---------- |------------------ |---------:|--------:|---------:|---------:|-----------:|----------:|----------:|----------:|
| CheckCE   | CE100xnest1.fs    | 132.0 ms | 3.92 ms | 11.11 ms | 132.0 ms |  1000.0000 |         - |         - | 154.24 MB |
| CompileCE | CE100xnest1.fs    | 183.3 ms | 4.25 ms | 12.13 ms | 182.3 ms |  6000.0000 | 3000.0000 | 1000.0000 | 245.16 MB |
| CheckCE   | CE100xnest5.fs    | 319.7 ms | 5.92 ms |  5.25 ms | 318.5 ms |  3000.0000 | 1000.0000 |         - |  377.1 MB |
| CompileCE | CE100xnest5.fs    | 520.9 ms | 7.56 ms |  7.07 ms | 521.6 ms | 13000.0000 | 5000.0000 | 2000.0000 | 754.65 MB |
| CheckCE   | CE1xnest15.fs     | 100.8 ms | 1.89 ms |  4.45 ms | 100.1 ms |  2000.0000 | 1000.0000 |         - | 113.85 MB |
| CompileCE | CE1xnest15.fs     | 114.2 ms | 2.36 ms |  6.45 ms | 111.7 ms |  2000.0000 | 1000.0000 |         - | 149.68 MB |
| CheckCE   | CEwCO100xnest5.fs | 169.8 ms | 3.36 ms |  4.71 ms | 168.4 ms |  3000.0000 | 1000.0000 |         - | 211.11 MB |
| CompileCE | CEwCO100xnest5.fs | 216.9 ms | 4.33 ms | 11.84 ms | 216.4 ms |  7000.0000 | 3000.0000 | 1000.0000 | 289.32 MB |
| CheckCE   | CEwCO500xnest1.fs | 113.9 ms | 3.87 ms | 10.66 ms | 110.7 ms |  1000.0000 |         - |         - |  131.5 MB |
| CompileCE | CEwCO500xnest1.fs | 358.1 ms | 7.12 ms | 10.65 ms | 357.3 ms | 10000.0000 | 3000.0000 | 1000.0000 | 547.89 MB |

Additionally, StackGuard stats are improved. If StackGuard does any thread switches, the minimal depth that triggered it is recorded. Visible in binlog when compiled with --times option:

| caller |        source        | jumps | min depth |
|--------|----------------------|-------|-----------|
| exprF  | TypedTreeOps.fs:7448 |    23 |      1601 | 
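
One way such per-caller statistics could be accumulated is sketched below. `JumpStats`, `stats`, and `recordJump` are hypothetical names, and the depths in the usage lines are made-up illustration values, not measurements from this PR.

```fsharp
open System.Collections.Generic

/// Hypothetical record: how often a call site jumped threads and the
/// smallest recursion depth at which that happened.
type JumpStats = { Jumps: int; MinDepth: int }

/// Keyed by (caller, source location).
let stats = Dictionary<string * string, JumpStats>()

/// Records one thread switch for the given call site.
let recordJump (caller: string) (source: string) (depth: int) =
    let key = (caller, source)
    match stats.TryGetValue key with
    | true, s -> stats.[key] <- { Jumps = s.Jumps + 1; MinDepth = min s.MinDepth depth }
    | false, _ -> stats.[key] <- { Jumps = 1; MinDepth = depth }

// Illustrative values only.
recordJump "exprF" "TypedTreeOps.fs:7448" 1700
recordJump "exprF" "TypedTreeOps.fs:7448" 1601
recordJump "exprF" "TypedTreeOps.fs:7448" 1650

for KeyValue((caller, source), s) in stats do
    printfn "| %s | %s | %d | %d |" caller source s.Jumps s.MinDepth
```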

This effort is kindly sponsored by AmplifyingF# 🚀

github-actions bot commented Oct 7, 2025

❗ Release notes required


✅ Found changes and release notes in following paths:

| Change path | Release notes path | Description |
|-------------|--------------------|-------------|
| src/Compiler | docs/release-notes/.FSharp.Compiler.Service/11.0.0.md | |

majocha commented Oct 7, 2025

I added some benchmark results to the description.
The more nested recursion, the bigger gain there is.
I skipped the very long-running items from the comparison:

- CE100xnest10.fs, which on main takes around 17 s per iteration,
- CE200xnest5.fs, which takes 50 s per iteration.

Those look like this with this PR:

| Method    | Source          | Mean       | Error    | StdDev   | Gen0       | Gen1      | Gen2      | Allocated  |
|---------- |---------------- |-----------:|---------:|---------:|-----------:|----------:|----------:|-----------:|
| CheckCE   | CE100xnest10.fs |   582.8 ms |  8.80 ms |  7.80 ms |  4000.0000 | 1000.0000 |         - |  673.37 MB |
| CompileCE | CE100xnest10.fs | 1,975.5 ms | 25.50 ms | 23.85 ms | 26000.0000 | 6000.0000 | 2000.0000 | 2359.46 MB |
| CheckCE   | CE200xnest5.fs  |   642.5 ms |  9.57 ms |  8.48 ms |  5000.0000 | 2000.0000 |         - |  823.69 MB |
| CompileCE | CE200xnest5.fs  | 5,060.3 ms | 56.14 ms | 52.51 ms | 97000.0000 | 6000.0000 | 2000.0000 | 5605.73 MB |
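
The observation that gains grow with nesting depth can be checked with a little arithmetic on the CompileCE means reported in this thread. The sketch below assumes the approximate "around 17 s" figure for CE100xnest10.fs on main; the other numbers are taken from the benchmark tables. `compileMeans` and `speedups` are names chosen for illustration.

```fsharp
/// (source file, mean on main in ms, mean with this PR in ms), CompileCE rows only.
let compileMeans =
    [ "CE100xnest1.fs", 352.4, 183.3
      "CE100xnest5.fs", 2244.6, 520.9
      "CE100xnest10.fs", 17000.0, 1975.5 ] // main figure is the approximate "around 17 s"

/// Speedup factor = mean on main / mean with this PR.
let speedups =
    compileMeans |> List.map (fun (source, main, pr) -> source, main / pr)

for (source, ratio) in speedups do
    printfn "%-16s %.1fx" source ratio
```

This works out to roughly 1.9x, 4.3x, and 8.6x for nesting levels 1, 5, and 10.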

majocha commented Oct 7, 2025

Another informal benchmark I did was to compare build times of IcedTasks.Tests, which are heavy with resumable state machine CEs:

released compiler

❯ dotnet build .\tests\IcedTasks.Tests\ -f net9.0  -c Release -bl --no-incremental
  IcedTasks net9.0 succeeded (5.1s) → src\IcedTasks\bin\Release\net9.0\IcedTasks.dll
  IcedTasks.Tests net9.0 succeeded (18.6s) → tests\IcedTasks.Tests\bin\Release\net9.0\IcedTasks.Tests.dll

Build succeeded in 23.9s

|Optimizations | 16.8319| 13.6376| 1021| 75| 20| 4| 450| 65|

this PR:

❯ dotnet build .\tests\IcedTasks.Tests\ -f net9.0  -c Release -bl --no-incremental -p:DotnetFscCompilerPath=E:\repos\fsharp\artifacts\bin\fsc\Release\net10.0\fsc.dll
  IcedTasks net9.0 succeeded (5.6s) → src\IcedTasks\bin\Release\net9.0\IcedTasks.dll
  IcedTasks.Tests net9.0 succeeded (13.4s) → tests\IcedTasks.Tests\bin\Release\net9.0\IcedTasks.Tests.dll

Build succeeded in 19.2s

|Optimizations | 10.4479| 6.7095| 1059| 41| 10| 3| 451| 65|

The optimizations phase is significantly faster (6.7 s vs 13.6 s) and there are no stack guard thread switches at all when there were thousands before.

@majocha majocha changed the title from "WIP: use EnsureSufficientExecutionStack instead of arbitrary limits in StackGuard" to "Fix excessive StackGuard thread jumps" Oct 7, 2025
@majocha majocha marked this pull request as ready for review October 7, 2025 20:23
@majocha majocha requested a review from a team as a code owner October 7, 2025 20:23
majocha commented Oct 7, 2025

As usual, I'm installing this in VS to also test the IDE behavior. The intermittent macOS test failure (StackOverflowRepro.fs does not actually stack overflow) is a coincidence, I believe. It seems dotnet on macOS has a larger stack now?

majocha commented Oct 8, 2025

Per Vlad's suggestion I made the old, throwing version conditional, only for netstandard2.0.

T-Gro commented Oct 8, 2025

> As usual I'm installing this in VS to also test the IDE behavior. The MacOS intermittent test fail (StackOverflowRepro.fs does not actually stack overflow) is a coincidence, I believe. It seems dotnet on MacOS has a larger stack now?

Yes, this is an environmental change, not caused by you.
(It's just a "signaling" test to let us know if/when a StackOverflow happens.)

@T-Gro T-Gro left a comment

This is well isolated and brings huge compilation speed gains 🥇, especially in the optimizer phase.

It is already too late to make it into the first GA releases of .NET 10 and VS 2026, but I am eager to port this to a subsequent release 👍.

EDIT:
A bit more context on the process: While it is technically possible to convince the .NET team to accept last-minute additions or backport PRs, such changes would not benefit from any preview/RC1/RC2 testing.

The nature of this change directly affects the likelihood of stack overflows. For in-process editor tooling, that risk translates to crashing the entire IDE. In my view, this change improves the situation compared to the current approach of relying on hardcoded, estimated numbers. However, the API we are now using is not fully guaranteed. The documentation notes:

“The artificial stack limit is chosen by the common language runtime to ensure that enough space remains to throw an exception safely… [and that] stack space is large enough to execute the average .NET function.”

Having a preview period would allow us to gain much more confidence through real-world testing across F# programs and across different OSes and platforms.

@github-project-automation github-project-automation bot moved this from New to In Progress in F# Compiler and Tooling Oct 8, 2025
majocha commented Oct 9, 2025

I tested it for a day in VS. Works fine, no problems, good performance, no crashes.

If anything, the built-in limit is in some cases way more conservative than what we had.
We could probably get away with some threshold before we defer to TryEnsureSufficientExecutionStack, for example

if depth.Value < 40 || StackGuard.IsStackSufficient() then
...

but that would be more risk for very little gain.

There's a #if NETSTANDARD2_0 now. We don't usually do them in FCS code base, at least I can't find any other. Yet, I think it is focused and for a good reason.
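
As a rough illustration of what such a conditional can look like, the sketch below falls back to the throwing `RuntimeHelpers.EnsureSufficientExecutionStack` on netstandard2.0, where `TryEnsureSufficientExecutionStack` is not available. `isStackSufficient` is a hypothetical free function for illustration and is not necessarily how the PR structures it.

```fsharp
open System
open System.Runtime.CompilerServices

/// Returns true when the runtime reports enough remaining stack to continue.
/// Hypothetical helper; the real code in the PR may differ.
let isStackSufficient () : bool =
#if NETSTANDARD2_0
    // Older API surface: the check signals insufficient stack by throwing.
    try
        RuntimeHelpers.EnsureSufficientExecutionStack()
        true
    with :? InsufficientExecutionStackException -> false
#else
    // Newer targets: the non-throwing variant returns a bool directly.
    RuntimeHelpers.TryEnsureSufficientExecutionStack()
#endif

printfn "stack sufficient at top level: %b" (isStackSufficient ())
```

Keeping the throwing path behind the conditional means newer targets avoid the cost of raising and catching an exception each time the stack runs low.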

T-Gro commented Oct 9, 2025

> If anything, the built-in limit is in some cases way more conservative than what we had. We could probably get away with some threshold before we defer to TryEnsureSufficientExecutionStack, for example

I would not do it, I prefer the change as it is 👍 .

Btw, I will also remove the StackOverflow reproduction test (I will keep the test that checks we can avoid a stack overflow, and remove the one confirming a stack overflow in a specific constellation).

Removed StackOverflow reproduction test due to host not crashing anymore reliably (which is not a bad thing).
@T-Gro T-Gro enabled auto-merge (squash) October 9, 2025 07:27
@T-Gro T-Gro merged commit 4c68fe8 into dotnet:main Oct 9, 2025
38 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in F# Compiler and Tooling Oct 9, 2025