
add CI to detect performance regressions #53

Merged
2bndy5 merged 3 commits into main from add-perf-ci on Oct 4, 2024

Conversation

@2bndy5 (Collaborator) commented Oct 3, 2024

Compares two release builds of the cpp-linter binary along with the pure-python package:

  1. the previous commit (for push events) or the base branch of a PR
  2. the newest commit on the branch
  3. the latest v1.x release of the pure-python cpp-linter package

Caching is enabled to reduce CI runtime.

Results are output to the CI workflow's job summary. This CI does not (currently) fail when a regression is detected.
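
(For illustration only: a minimal sketch of how the benchmark and summary steps could be wired together, assuming hyperfine's --export-markdown/--export-json options and the standard GITHUB_STEP_SUMMARY file. The benchmarked commands are placeholders, not the actual cpp-linter invocations from perf-test.yml.)

- name: Benchmark previous vs. current build
  run: |
    # <args> is a placeholder for the real lint invocation used by the workflow
    hyperfine \
      --runs 2 \
      --export-markdown "${{ runner.temp }}/benchmark.md" \
      --export-json "${{ runner.temp }}/benchmark.json" \
      './previous/cpp-linter <args>' \
      './current/cpp-linter <args>'

- name: Publish results to the job summary
  run: cat "${{ runner.temp }}/benchmark.md" >> "$GITHUB_STEP_SUMMARY"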

Summary by CodeRabbit

  • New Features

    • Introduced a new GitHub Actions workflow for automated performance regression testing of the cpp-linter.
    • Added a script for analyzing performance benchmarks, providing insights on performance changes between builds.
  • Bug Fixes

    • Improved handling of performance regression detection with clear output messages for users.

@2bndy5 added the enhancement (New feature or request) label on Oct 3, 2024
@2bndy5 force-pushed the add-perf-ci branch 5 times, most recently from a58ee67 to a215876 on October 3, 2024 22:17
@2bndy5 (Collaborator, Author) commented Oct 4, 2024

This is bugging me 😡

Locally I invoke the same commands (using the exact same machine, dual-booted):

  • on Windows it takes around 12 seconds
  • on Linux it takes around 55 seconds

In the CI workflow (which uses ubuntu-latest), it takes around 165 seconds to run!

At least the runtime is consistent between pure-python and pure-rust. It does not seem to matter if I use clang v18 or v14.

@2bndy5 marked this pull request as ready for review on October 4, 2024 03:46
coderabbitai bot (Contributor) commented Oct 4, 2024

Walkthrough

A new performance regression testing workflow has been added to the cpp-linter project through the introduction of the perf-test.yml file in the GitHub Actions workflows. This workflow includes three jobs: building the project for current and previous commits, benchmarking performance differences, and reporting when no source changes occur. Additionally, a new script named perf_annotate.py has been created to analyze benchmark results from a JSON file, providing insights into performance changes and potential regressions.

Changes

  • .github/workflows/perf-test.yml: Introduced a new workflow for performance regression testing with jobs for building, benchmarking, and reporting.
  • .github/workflows/perf_annotate.py: Added a script to analyze performance benchmarks from a JSON file, calculating differences and outputting results (see the sketch below).
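
(The script itself is not shown in this summary. As a rough illustration of the analysis described, reading hyperfine's JSON export and reporting the relative change between two builds, a shell/jq equivalent could look like the snippet below; the file path, result ordering, and output wording are assumptions, not taken from perf_annotate.py.)

#!/bin/bash
# Assumption: benchmark.json is hyperfine's --export-json output, with the
# previous build listed first and the current build second in `results`.
prev=$(jq '.results[0].mean' "$RUNNER_TEMP/benchmark.json")
curr=$(jq '.results[1].mean' "$RUNNER_TEMP/benchmark.json")
pct=$(awk -v p="$prev" -v c="$curr" 'BEGIN { printf "%.2f", (c - p) / p * 100 }')
echo "Mean runtime changed by ${pct}% (previous: ${prev}s, current: ${curr}s)" >> "$GITHUB_STEP_SUMMARY"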

Possibly related PRs

  • feat: add optional colored log output #52: Modifications to logging functionality, including a new dependency for colored log output, which may relate to the logging aspects of the performance testing workflow.

Poem

🐰 In the meadow, where benchmarks play,
A new workflow hops in, brightening the day.
With builds and tests, it dances around,
Reporting changes, where performance is found.
So let’s cheer for the code, both swift and spry,
As we measure and analyze, oh me, oh my! 🌼


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 0bdda0a and 409a238.

📒 Files selected for processing (2)
  • .github/workflows/perf-test.yml (1 hunks)
  • .github/workflows/perf_annotate.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/perf-test.yml
  • .github/workflows/perf_annotate.py


@2bndy5 (Collaborator, Author) commented Oct 4, 2024

I give up for now. I may play with this later when I experiment with improving the async performance in rust.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)
.github/workflows/perf-test.yml (2)

22-63: LGTM: Build job is well-structured with efficient caching.

The build job is well-designed, using a matrix strategy to build both current and previous commits efficiently. The caching mechanism is implemented correctly, which should significantly improve performance across runs.

One minor suggestion for improvement:

Consider adding a step to validate the cached binaries, ensuring they are not corrupted or incomplete. This could be a simple check like verifying the file size or running a quick command with the binary.

Example:

- name: Validate cached binary
  if: steps.cache.outputs.cache-hit == 'true'
  run: |
    if [ ! -x target/release/cpp-linter ] || ! target/release/cpp-linter --version; then
      echo "Cached binary is invalid, rebuilding..."
      echo "cache-hit=false" >> $GITHUB_OUTPUT
    fi

50-50: Minor: Address shellcheck warnings for improved script robustness.

There are a few instances where shellcheck has identified potential issues with environment variable usage. While these are minor, addressing them can improve the robustness of the script.

Apply the following changes to address the shellcheck warnings:

- run: echo "is-cached=${{ steps.cache.outputs.cache-hit }}" >> $GITHUB_OUTPUT
+ run: echo "is-cached=${{ steps.cache.outputs.cache-hit }}" >> "$GITHUB_OUTPUT"

- run: echo "is-cached=${{ steps.cache.outputs.cache-hit }}" >> $GITHUB_OUTPUT
+ run: echo "is-cached=${{ steps.cache.outputs.cache-hit }}" >> "$GITHUB_OUTPUT"

- run: cat ${{ runner.temp }}/benchmark.md >> $GITHUB_STEP_SUMMARY
+ run: cat "${{ runner.temp }}/benchmark.md" >> "$GITHUB_STEP_SUMMARY"

These changes add double quotes around the environment variables to prevent potential issues with word splitting or globbing.

Also applies to: 54-54, 121-121

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 023c170 and d7484f6.

📒 Files selected for processing (2)
  • .github/workflows/perf-test.yml (1 hunks)
  • .github/workflows/perf_annotate.py (1 hunks)
🧰 Additional context used
🪛 actionlint
.github/workflows/perf-test.yml

50-50: shellcheck reported issue in this script: SC2086:info:1:58: Double quote to prevent globbing and word splitting

(shellcheck)


54-54: shellcheck reported issue in this script: SC2086:info:1:58: Double quote to prevent globbing and word splitting

(shellcheck)


121-121: shellcheck reported issue in this script: SC2086:info:1:40: Double quote to prevent globbing and word splitting

(shellcheck)

🔇 Additional comments (4)
.github/workflows/perf-test.yml (4)

1-21: LGTM: Workflow trigger configuration is well-defined.

The workflow trigger configuration is appropriate for performance regression testing. It runs on pushes and pull requests to the main branch, with specific path filters to ensure it only executes when relevant files are changed. This approach helps to optimize CI resources by avoiding unnecessary runs.
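
(For illustration, a trigger block of the shape described above might look like the following; the exact path filters are assumptions, not copied from perf-test.yml.)

on:
  push:
    branches: [main]
    paths: ['src/**', 'Cargo.*', '.github/workflows/perf-test.yml']
  pull_request:
    branches: [main]
    paths: ['src/**', 'Cargo.*', '.github/workflows/perf-test.yml']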


130-135: LGTM: report-no-src-changes job is simple and effective.

The report-no-src-changes job serves its purpose well. It runs only when both builds are cached, indicating no changes to the cpp-linter source code. The simple notice it logs provides clear feedback in the GitHub Actions UI, which is helpful for developers and maintainers.
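
(Illustrative only: such a job could be as small as a single notice step gated on the cached-build output mentioned elsewhere in this review. Job and output names below are assumptions, not taken from perf-test.yml.)

report-no-src-changes:
  needs: build
  # Simplified: the real job presumably checks that both builds were cached.
  if: needs.build.outputs.is-cached == 'true'
  runs-on: ubuntu-latest
  steps:
    - run: echo "::notice::No changes to cpp-linter source code; benchmark was skipped."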


1-135: Overall: Well-structured workflow with room for addressing environment-specific performance concerns.

The workflow is well-designed and aligns with the PR objectives of implementing CI for performance regression detection. It covers building, benchmarking, and reporting, with efficient use of caching to optimize CI resources.

However, there's an important concern raised in the PR comments that this workflow doesn't address:

The significant performance discrepancies observed between local and CI environments (12s on Windows, 55s on Linux locally, vs 165s in CI) could lead to unreliable performance regression detection. To address this:

  1. Consider adding environment information logging to the benchmark job. This could include details like CPU model, available memory, and any other relevant system information. For example:
- name: Log environment information
  run: |
    echo "CPU Info:"
    lscpu
    echo "Memory Info:"
    free -h
    echo "Disk Info:"
    df -h
  2. Investigate the possibility of using a self-hosted runner that more closely matches the development environment. This could provide more consistent and relevant performance metrics.

  3. Add a step to compare the benchmark results against known baselines for different environments. This could help in normalizing the results and making them more comparable across different setups.

By addressing these points, you can improve the reliability and usefulness of the performance regression detection in your CI process.
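
(Relatedly, the PR description notes that the workflow does not yet fail when a regression is detected. A hedged sketch of a threshold-based check, reusing hyperfine's JSON export, might look like this; the 10% threshold, file path, and result ordering are assumptions.)

- name: Fail on significant regression (illustrative)
  run: |
    prev=$(jq '.results[0].mean' "$RUNNER_TEMP/benchmark.json")
    curr=$(jq '.results[1].mean' "$RUNNER_TEMP/benchmark.json")
    # Fail the job if the current build is more than 10% slower than the previous one
    awk -v p="$prev" -v c="$curr" 'BEGIN { if ((c - p) / p > 0.10) exit 1 }'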


65-128: 🛠️ Refactor suggestion

LGTM: Benchmark job is comprehensive, with room for minor improvements.

The benchmark job is well-structured and provides a comprehensive performance comparison between different builds. The use of hyperfine for benchmarking and the comparison with the pure-python implementation are excellent choices.

Suggestions for improvement:

  1. Consider increasing the number of runs in the hyperfine command (currently set to 2) for more statistically significant results. A value between 5 and 10 might provide a better balance between accuracy and execution time.

  2. Add warm-up runs to the hyperfine command to reduce the impact of cold starts. For example:

hyperfine \
  --warmup 2 \
  --runs 5 \
  # ... (rest of the command)
  3. Consider adding a step to compare the benchmark results with a predefined threshold to automatically detect significant performance regressions. This could be done using a custom script or a GitHub Action specifically designed for benchmark comparisons.

To ensure the benchmarking process is consistent and reliable, let's verify the existence and contents of the perf_annotate.py script:

✅ Verification successful

LGTM: Benchmark job verification successful.

The benchmarking setup is correctly implemented with the presence of perf_annotate.py, ensuring performance results are appropriately annotated. The benchmark job configuration is comprehensive and reliable.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the existence and contents of perf_annotate.py
if [ -f .github/workflows/perf_annotate.py ]; then
  echo "perf_annotate.py exists. Contents:"
  cat .github/workflows/perf_annotate.py
else
  echo "perf_annotate.py does not exist in the expected location."
fi

Length of output: 1837

@2bndy5 merged commit 1ea0ea6 into main on Oct 4, 2024 (11 checks passed)
@2bndy5 deleted the add-perf-ci branch on October 4, 2024 05:06