add CI to detect performance regressions #53
This is bugging me 😡 Locally, I invoke the same commands on the exact same machine (dual-booted) and get roughly 12 seconds on Windows and 55 seconds on Linux.

In the CI workflow (which uses ubuntu-latest), it takes around 165 seconds to run! At least the runtime is consistent between pure-python and pure-rust. It does not seem to matter whether I use clang v18 or v14.
Compares two release builds of the cpp-linter binary and the pure-python package:

1. the previous commit (for push events) or the base branch of a PR
2. the newest commit on the branch
3. the latest v1.x release of the pure-python cpp-linter package

Caching is enabled to reduce CI runtime. Results are output to the CI workflow's job summary. This CI does not (currently) fail when a regression is detected.
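The job-summary step described above could look roughly like the Python below. This is a minimal illustration, not the contents of the PR's `perf_annotate.py`: the helper names `render_summary` and `write_job_summary` and the three-column table layout are hypothetical. The only Actions feature it relies on is the `GITHUB_STEP_SUMMARY` environment variable, which holds the path of a file whose Markdown is rendered on the run's summary page.

```python
import os
from typing import Optional, Sequence, Tuple


def render_summary(rows: Sequence[Tuple[str, float, float]]) -> str:
    """Render (label, mean seconds, std-dev seconds) rows as a Markdown table."""
    lines = [
        "| Build | Mean (s) | Std. dev. (s) |",
        "|:------|---------:|--------------:|",
    ]
    for label, mean, stddev in rows:
        lines.append(f"| {label} | {mean:.2f} | {stddev:.2f} |")
    return "\n".join(lines) + "\n"


def write_job_summary(markdown: str, path: Optional[str] = None) -> bool:
    """Append Markdown to the job summary; return False when no summary file is set."""
    # On GitHub Actions, GITHUB_STEP_SUMMARY holds the path of the summary file.
    target = path or os.environ.get("GITHUB_STEP_SUMMARY")
    if not target:
        return False
    with open(target, "a", encoding="utf-8") as summary:
        summary.write(markdown)
    return True
```

Appending (mode `"a"`) rather than overwriting lets several steps contribute to the same summary.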
Walkthrough: A new performance regression testing workflow has been added.
I give up for now. I may play with this later when I experiment with improving the async performance in Rust.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
.github/workflows/perf-test.yml (2)
22-63: LGTM: Build job is well-structured with efficient caching.
The build job is well-designed, using a matrix strategy to build both current and previous commits efficiently. The caching mechanism is implemented correctly, which should significantly reduce CI runtime across runs.
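One way to get the behaviour praised here, where an unchanged source tree yields a cache hit, is to key the cache on a digest of the source files. The sketch below assumes the workflow keys its cache on source contents (the actual key is not shown in this review). `source_cache_key` is a hypothetical helper that is only loosely analogous to the `hashFiles()` expression function in Actions; it does not reproduce that function's exact output.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def source_cache_key(prefix: str, files: Iterable[Path]) -> str:
    """Derive a cache key that changes only when the given files (or their paths) change."""
    digest = hashlib.sha256()
    # Sort so the key does not depend on the order the files were listed in.
    for path in sorted(Path(f) for f in files):
        digest.update(path.as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator so path and contents cannot run together
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()}"
```

Because the previous and current builds would get identical keys when no source file changed, both matrix entries can hit the cache in that case.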
One minor suggestion for improvement:
Consider adding a step to validate the cached binaries, ensuring they are not corrupted or incomplete. This could be a simple check like verifying the file size or running a quick command with the binary.
Example:
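```yaml
- name: Validate cached binary
  id: validate-cache
  if: steps.cache.outputs.cache-hit == 'true'
  run: |
    # Writing to $GITHUB_OUTPUT sets an output of *this* step; it does not
    # overwrite steps.cache.outputs.cache-hit. The build step must also check
    # steps.validate-cache.outputs.cache-hit == 'false' to trigger a rebuild.
    if [ ! -x target/release/cpp-linter ] || ! target/release/cpp-linter --version; then
      echo "Cached binary is invalid, rebuilding..."
      echo "cache-hit=false" >> "$GITHUB_OUTPUT"
    fi
```

🧰 Tools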
🪛 actionlint
50-50: shellcheck reported issue in this script: SC2086:info:1:58: Double quote to prevent globbing and word splitting (shellcheck)
54-54: shellcheck reported issue in this script: SC2086:info:1:58: Double quote to prevent globbing and word splitting (shellcheck)
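The cached-binary check suggested above could also live in a small helper script. This sketch is hypothetical (`binary_is_usable` is not part of the PR) and assumes, as the suggestion does, that the cpp-linter binary exits with status 0 for `--version`.

```python
import os
import subprocess
from typing import Sequence


def binary_is_usable(path: str, probe_args: Sequence[str] = ("--version",),
                     timeout: float = 30.0) -> bool:
    """Return True if `path` is an executable file and the probe command exits with 0."""
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        return False
    try:
        result = subprocess.run(
            [path, *probe_args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # e.g. "Exec format error" for a corrupt file, or a hung process.
        return False
    return result.returncode == 0
```

A workflow step could call it on `target/release/cpp-linter` and rebuild whenever it returns False.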
50-50: Minor: Address shellcheck warnings for improved script robustness.
There are a few instances where shellcheck has identified potential issues with environment variable usage. While these are minor, addressing them can improve the robustness of the script.
Apply the following changes to address the shellcheck warnings:

```diff
- run: echo "is-cached=${{ steps.cache.outputs.cache-hit }}" >> $GITHUB_OUTPUT
+ run: echo "is-cached=${{ steps.cache.outputs.cache-hit }}" >> "$GITHUB_OUTPUT"

- run: echo "is-cached=${{ steps.cache.outputs.cache-hit }}" >> $GITHUB_OUTPUT
+ run: echo "is-cached=${{ steps.cache.outputs.cache-hit }}" >> "$GITHUB_OUTPUT"

- run: cat ${{ runner.temp }}/benchmark.md >> $GITHUB_STEP_SUMMARY
+ run: cat "${{ runner.temp }}/benchmark.md" >> "$GITHUB_STEP_SUMMARY"
```

These changes add double quotes around the environment variables to prevent potential issues with word splitting or globbing.
Also applies to: 54-54, 121-121
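To see why the quotes matter for `${{ runner.temp }}` in particular: Actions substitutes `${{ ... }}` expressions into the script text before the shell runs, so a value containing spaces becomes several shell words unless it is quoted. The sketch below models that with Python's `shlex.split`, which follows POSIX-shell splitting rules. The path containing a space is made up for illustration (hosted runners do not normally use one), and the sketch does not model the separate word splitting the shell applies to an unquoted `$GITHUB_OUTPUT` after parameter expansion.

```python
import shlex
from typing import Dict, List


def split_after_substitution(script: str, context: Dict[str, str]) -> List[str]:
    """Textually substitute ${{ expr }} placeholders, then split like a POSIX shell."""
    for expr, value in context.items():
        script = script.replace("${{ " + expr + " }}", value)
    return shlex.split(script)


# Hypothetical runner.temp value containing a space, for illustration only.
context = {"runner.temp": "/home/runner/my temp"}
unquoted = split_after_substitution("cat ${{ runner.temp }}/benchmark.md", context)
quoted = split_after_substitution('cat "${{ runner.temp }}/benchmark.md"', context)
print(unquoted)  # ['cat', '/home/runner/my', 'temp/benchmark.md']
print(quoted)    # ['cat', '/home/runner/my temp/benchmark.md']
```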
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- .github/workflows/perf-test.yml (1 hunks)
- .github/workflows/perf_annotate.py (1 hunks)
🧰 Additional context used
🪛 actionlint
.github/workflows/perf-test.yml
50-50: shellcheck reported issue in this script: SC2086:info:1:58: Double quote to prevent globbing and word splitting (shellcheck)
54-54: shellcheck reported issue in this script: SC2086:info:1:58: Double quote to prevent globbing and word splitting (shellcheck)
121-121: shellcheck reported issue in this script: SC2086:info:1:40: Double quote to prevent globbing and word splitting (shellcheck)
🔇 Additional comments (4)
.github/workflows/perf-test.yml (4)
1-21: LGTM: Workflow trigger configuration is well-defined.
The workflow trigger configuration is appropriate for performance regression testing. It runs on pushes and pull requests to the main branch, with specific path filters to ensure it only executes when relevant files are changed. This approach helps to optimize CI resources by avoiding unnecessary runs.
130-135: LGTM: report-no-src-changes job is simple and effective.
The report-no-src-changes job serves its purpose well. It runs only when both builds are cached, indicating no changes to the cpp-linter source code. The simple notice it logs provides clear feedback in the GitHub Actions UI, which is helpful for developers and maintainers.
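The gating condition for report-no-src-changes amounts to "every build reported a cache hit". A sketch of that predicate, assuming each build's `is-cached` output (the value written by the `echo "is-cached=..."` steps) is collected into a mapping keyed by hypothetical build labels:

```python
from typing import Mapping


def should_report_no_src_changes(is_cached: Mapping[str, str]) -> bool:
    """True only when every build reported a cache hit.

    `is_cached` maps a build label to the string value of its is-cached output.
    """
    # Only the exact string 'true' counts as a hit; '', 'false' or anything
    # else is a miss, so any rebuilt binary means this job is skipped.
    return bool(is_cached) and all(value == "true" for value in is_cached.values())
```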
1-135: Overall: Well-structured workflow with room for addressing environment-specific performance concerns.
The workflow is well-designed and aligns with the PR objectives of implementing CI for performance regression detection. It covers building, benchmarking, and reporting, with efficient use of caching to optimize CI resources.
However, there's an important concern raised in the PR comments that this workflow doesn't address:
The significant performance discrepancies observed between local and CI environments (12s on Windows, 55s on Linux locally, vs 165s in CI) could lead to unreliable performance regression detection. To address this:
- Consider adding environment information logging to the benchmark job. This could include details like CPU model, available memory, and any other relevant system information. For example:

  ```yaml
  - name: Log environment information
    run: |
      echo "CPU Info:"
      lscpu
      echo "Memory Info:"
      free -h
      echo "Disk Info:"
      df -h
  ```

- Investigate the possibility of using a self-hosted runner that more closely matches the development environment. This could provide more consistent and relevant performance metrics.
- Add a step to compare the benchmark results against known baselines for different environments. This could help in normalizing the results and making them more comparable across different setups.
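A portable counterpart to the `lscpu`/`free`/`df` step, which would also run on the Windows side of the dual-boot comparison, can be written with the Python standard library. This is a sketch and `runner_environment` is a hypothetical name. The total-memory lookup uses POSIX `sysconf` and reports "unknown" where that is unavailable (for example on Windows).

```python
import os
import platform
import shutil
from typing import Dict, Optional


def total_memory_bytes() -> Optional[int]:
    """Physical memory via POSIX sysconf, or None where it is unavailable."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    return page_size * pages if page_size > 0 and pages > 0 else None


def runner_environment(path: str = ".") -> Dict[str, str]:
    """Collect a few portable facts about the machine running the benchmark."""
    memory = total_memory_bytes()
    disk = shutil.disk_usage(path)
    return {
        "os": f"{platform.system()} {platform.release()}",
        "machine": platform.machine(),
        "processor": platform.processor() or "unknown",
        "logical_cpus": str(os.cpu_count() or "unknown"),
        "memory_gib": f"{memory / 2**30:.1f}" if memory else "unknown",
        "disk_free_gib": f"{disk.free / 2**30:.1f}",
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in runner_environment().items():
        print(f"{key}: {value}")
```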
By addressing these points, you can improve the reliability and usefulness of the performance regression detection in your CI process.
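For the baseline suggestion, one option is to express each runtime as a ratio of a reference command measured in the same job, such as the pure-python v1.x release. This rests on the assumption that a slower machine slows every command by roughly the same factor, and the ratio is only as trustworthy as that assumption. A minimal sketch with a hypothetical `normalize_to_reference` helper:

```python
from typing import Dict


def normalize_to_reference(means: Dict[str, float], reference: str) -> Dict[str, float]:
    """Express each mean runtime as a multiple of the reference command's mean."""
    base = means[reference]
    if base <= 0:
        raise ValueError("reference mean must be positive")
    return {name: mean / base for name, mean in means.items()}
```

A ratio of 0.25 for a build would then read as "four times faster than the reference on this runner", whatever the runner's absolute speed.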
65-128: 🛠️ Refactor suggestion
LGTM: Benchmark job is comprehensive, with room for minor improvements.
The benchmark job is well-structured and provides a comprehensive performance comparison between different builds. The use of hyperfine for benchmarking and the comparison with the pure-python implementation are excellent choices.
Suggestions for improvement:
- Consider increasing the number of runs in the hyperfine command (currently set to 2) for more statistically significant results. A value between 5 and 10 might provide a better balance between accuracy and execution time.
- Add warm-up runs to the hyperfine command to reduce the impact of cold starts. For example:

  ```sh
  hyperfine --warmup 2 --runs 5 # ... (rest of the command)
  ```

- Consider adding a step to compare the benchmark results with a predefined threshold to automatically detect significant performance regressions. This could be done using a custom script or a GitHub Action specifically designed for benchmark comparisons.
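The threshold check from the last suggestion could read hyperfine's JSON export (`--export-json`), which lists one entry per benchmarked command under `results`, each with a `mean` runtime in seconds. The helpers below are a hypothetical sketch: the 10% default threshold is an arbitrary placeholder, and entries are looked up by whatever `command` string the export contains.

```python
import json
from typing import Dict


def load_means(export_json: str) -> Dict[str, float]:
    """Map each benchmarked command to its mean runtime (seconds) in a hyperfine JSON export."""
    data = json.loads(export_json)
    return {entry["command"]: entry["mean"] for entry in data["results"]}


def is_regression(means: Dict[str, float], baseline: str, candidate: str,
                  threshold: float = 0.10) -> bool:
    """True if `candidate` is more than `threshold` (a fraction) slower than `baseline`."""
    return means[candidate] > means[baseline] * (1.0 + threshold)
```

Since this CI intentionally does not fail on regressions, a step built on this might emit a `::warning::` workflow command rather than exit non-zero.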
To ensure the benchmarking process is consistent and reliable, let's verify the existence and contents of the perf_annotate.py script:
✅ Verification successful
LGTM: Benchmark job verification successful.
The benchmarking setup is correctly implemented with the presence of perf_annotate.py, ensuring performance results are appropriately annotated. The benchmark job configuration is comprehensive and reliable.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Verify the existence and contents of perf_annotate.py
if [ -f .github/workflows/perf_annotate.py ]; then
  echo "perf_annotate.py exists. Contents:"
  cat .github/workflows/perf_annotate.py
else
  echo "perf_annotate.py does not exist in the expected location."
fi
```

Length of output: 1837