-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Process input files in parallel #26
Conversation
Signed-off-by: ricekot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change! Much needed. I recently did something similar for the Ruby counterpart. Just have one request on the moving of a logging line I observed
@@ -218,6 +211,8 @@ private static void _SummaryForDLLFile(FileInfo filePath, string? exclusionRegex | |||
return; | |||
} | |||
|
|||
_logger?.LogInformation("Parsing file {fileName}", fullPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for moving this? Would this not end up logging all types of files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the existing behaviour too I believe, see line 192 which logs the file name in the loop.
https://github.com/joernio/DotNetAstGen/pull/26/files#diff-600f8abdf835e81e081c02e55f3bed18ecb1fdd2c80333c77184fefc67c6ad92L192
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see the looping has changed, my bad. Looks all good to me then
@@ -218,6 +211,8 @@ private static void _SummaryForDLLFile(FileInfo filePath, string? exclusionRegex | |||
return; | |||
} | |||
|
|||
_logger?.LogInformation("Parsing file {fileName}", fullPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see the looping has changed, my bad. Looks all good to me then
Uses new version of `DotNetAstGen` that uses a parallelized version. Credits to @ricekot joernio/DotNetAstGen#26
Process input source files in parallel to reduce the total AST generation time. The time taken to generate an AST is cut down by more than half with parallel processing.
About the changes
A PLINQ query is used to parallelize the
foreach
loops that iterate over files in the input directory one by one.Since the
AsParallel
method is called on anIEnumerable<FileInfo>
, PLINQ should opt for chunk partitioning (dynamic batch sizes) by default, as opposed to a range partitioning (fixed batch sizes) approach as suggested in #11.Refs:
Benchmark
A fresh clone of the bitwarden/server repo was created locally (at commit
4adcecb
).Then, a scan was run against the cloned repo twice, once from the
main
branch of this repo and once with the code changes in this PR.time dotnet run --project DotNetAstGen/ -i /tmp/bitwarden-server -o /tmp/bitwarden-server-ast/
The results of the scans are added below:
Without the changes from this PR (sequential processing):
With the changes from this PR (parallel processing)
Thus, AST generation with parallel processing was 48.69 seconds or about 55% faster.
I did multiple runs and the result was similar each time.
For reference these runs were done on a machine running macOS 15.0.1 with Apple M3 Pro and 36GiB RAM.