Evaluate additional compiler optimizations like PGO #80

Open
zamazan4ik opened this issue Oct 30, 2024 · 8 comments

Comments

@zamazan4ik
Contributor

Hi!

I just read your article about the tool. I highly appreciate it, since I am also interested in static analysis in various domains. Nicely done! I have several potentially interesting ideas for improving this tool, and I am sharing them here since Discussions are disabled for the repo.

Link-Time Optimization (LTO)

I noticed that in the Cargo.toml file Link-Time Optimization (LTO) for the project is not enabled. I suggest switching it on since it will reduce the binary size (always a good thing to have) and will likely improve the application's performance a bit.

I suggest enabling LTO only for Release builds so as not to sacrifice the developer experience while working on the project, since LTO adds to the compilation time. If you think that a regular Release build should not be affected by such a change either, then I suggest adding a separate dist or release-lto profile that enables LTO on top of the regular release optimizations. Such a change simplifies life for maintainers and anyone else interested in the project who wants to build the most performant version of the application. By enabling LTO in the Cargo.toml file, we can deliver an LTO-optimized version of the tool to users who install it with cargo install. Using ThinLTO should also help to reduce the build-time overhead of LTO. E.g., check the cargo-outdated Release profile.

Basically, it can be enabled with the following lines:

[profile.release]
lto = true

I ran quick tests (Fedora 40) by adding lto = true to the Release profile. The binary size drops from 7.6 MiB to 5.8 MiB. You may also be interested in tweaking other compiler options like codegen-units.
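For illustration, the separate-profile variant mentioned above could look roughly like the sketch below. The profile name release-lto and the specific values are assumptions chosen for the example, not settings taken from this repository; `inherits`, `lto`, and `codegen-units` are standard Cargo profile keys.

```toml
# Hypothetical opt-in profile; the name "release-lto" is only an example.
[profile.release-lto]
inherits = "release"   # start from the regular release settings
lto = "fat"            # same as `lto = true`; "thin" trades some gains for faster builds
codegen-units = 1      # fewer codegen units: slower build, more optimization opportunities
```

Such a profile would be selected explicitly with `cargo build --profile release-lto`, leaving the default release build untouched.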

Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO)

According to my benchmarks, PGO measurably improves the performance (more precisely, the CPU efficiency) of applications in several domains, especially compilers, static analyzers, linters, etc. - see Clang, Rustc, clang-tidy, Rust Analyzer, and many other examples. That's why I think applying PGO (and a similar PLO technique via LLVM BOLT) could help to optimize Zizmor further.

I see that the project is in its early stages, so investing resources (time) into PGO and PLO may not be worth it yet. However, once more functionality is implemented and more "polishing" time is available, PGO can definitely be useful.
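As a rough sketch of what a manual PGO cycle involves, the two build stages can be expressed through rustc's `-Cprofile-generate` and `-Cprofile-use` flags, for example in a local `.cargo/config.toml`. The paths below are placeholders, and the choice of training workload is an assumption; nothing here has been benchmarked for Zizmor.

```toml
# Stage 1 (instrumented build): the binary writes raw profiles into the given
# directory when it is run on representative inputs.
# The directory /tmp/pgo-data is a placeholder.
[build]
rustflags = ["-Cprofile-generate=/tmp/pgo-data"]

# Between the stages, the raw profiles are merged with LLVM's tool:
#   llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# Stage 2 (optimized build): replace the rustflags line above with this one.
# rustflags = ["-Cprofile-use=/tmp/pgo-data/merged.profdata"]
```

The quality of the result depends on how closely the training runs resemble real usage, so the instrumented binary would need to be run on a realistic set of inputs.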

Thank you.

@woodruffw
Owner

Thanks for the detailed issue @zamazan4ik! I greatly appreciate it.

I'd be happy to merge a PR that enables LTO for release builds, if you're interested in sending one.

PGO and PLO are promising for future squeezing, but this tool is still in a pretty early stage and there's a lot of just plain inefficient stuff in it already (for example, it doesn't currently bother to cache and reuse the filtered lists of steps that it repeatedly loops over). So I'd be happy to start with just LTO, and then consider PGO/PLO further down the line.

@woodruffw changed the title from "Enable Link-Time Optimization (LTO) and evaluate additional compiler optimizations like PGO" to "Evaluate additional compiler optimizations like PGO" on Oct 30, 2024
@woodruffw
Owner

Merged LTO in release builds with #81, so I've updated the issue title to reflect the remaining items.

@zamazan4ik
Contributor Author

Yep, thanks a lot!

@carlocab

Just curious -- what kind of workloads allow you to see a difference from LTO/PGO? LTO can reduce binary size (curious by how much it did here too), not sure if PGO can do the same these days (but seems unlikely from principle).

@zamazan4ik
Contributor Author

> Just curious -- what kind of workloads allow you to see a difference from LTO/PGO?

Actually, many of them benefit from PGO - see these. This list includes static analyzers. Some of the cases include LTO benchmarks too.

> LTO can reduce binary size (curious by how much it did here too)

I ran quick tests (Fedora 40) by adding lto = true to the Release profile. The binary size drops from 7.6 MiB to 5.8 MiB.

> ... not sure if PGO can do the same these days (but seems unlikely from principle).

If you mean the binary size reduction, it depends on the workload: in some cases PGO reduces the amount of inlining, which makes the binary smaller. Whether that holds in practice varies from case to case.

@carlocab

Sorry, I was unclear -- what kind of workloads for zizmor allow you to see this difference?

I'm aware that for large applications it makes a substantial difference, but I was wondering specifically about this project.

@zamazan4ik
Contributor Author

I haven't run PGO benchmarks for Zizmor yet. In the original post it was only an idea, without actual benchmarks.

@carlocab

carlocab commented Oct 31, 2024

Cool. Would be interested to know what kind of workflows it would be reading for LTO/PGO to actually make a noticeable performance difference.
