"Number of pull requests" data appears inaccurate/misleading #32

Open
howardjohn opened this issue Nov 2, 2023 · 6 comments
Comments

@howardjohn

In the velocity reports, we report "The y-axis is the total number of pull requests and issues".

From the query, this is determined by the total number of PullRequestEvents: https://github.com/cncf/velocity/blob/8e1d1c189b65e2544fae7aec43c6381f9e4b4d82/BigQuery/velocity_cncf.sql#L19C18-L19C34.

In my opinion, a PullRequestEvent does not correspond 1:1 to "a PR" in the way a person would interpret a count of PRs. There are two reasonable things to count (merged PRs or opened PRs, with a strong preference for merged PRs), and this query counts neither.

Per the docs: 'The action that was performed. Can be one of opened, edited, closed, reopened, assigned, unassigned, review_requested, review_request_removed, labeled, unlabeled, and synchronize.' In practice, however, this doesn't seem to be the case. Counting actions for a single day across GitHub:

   1916 reopened
 168129 closed
 193220 opened

Even without the other possible events, we at least appear to be double counting PRs?
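As a rough check of the scale, the arithmetic on the three figures above can be sketched as follows. This assumes each PR contributes exactly one "opened" event, so the "opened" count stands in for the number of PRs created that day; the variable names are illustrative only.

```python
# Action counts for one day of PullRequestEvents, copied from the figures above.
action_counts = {"reopened": 1916, "closed": 168129, "opened": 193220}

# What a raw event count reports: every action is treated as one "pull request".
total_events = sum(action_counts.values())

# Assumption: one "opened" event per PR, so this approximates PRs created that day.
opened_prs = action_counts["opened"]

overcount_ratio = total_events / opened_prs
print(total_events, opened_prs, round(overcount_ratio, 2))
# prints: 363265 193220 1.88
```

Under that assumption, the event count is nearly twice the number of opened PRs for the day.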

@lukaszgryglicki
Member

Hi, I can take a look on October 13th at the earliest; I'll be at KubeCon next week.

@lukaszgryglicki
Member

Actually, I've checked the charts, and we count PR/Issue activities there, not just opened PRs. This is consistent even when we take data from non-GitHub projects (we then count activities on bugs, emails, etc.). Now cc @caniszczyk on what to do:

  1. Keep it as-is (it was decided that way years back), but go through all docs and generated charts and state explicitly that we count PR/Issue activities.
  2. Update the code to actually count PRs/Issues (unique PR/Issue IDs across all activities) everywhere, so from now on the stats will differ from those in previous reports.

This needs a decision: either (1) or (2).
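To make the difference between the two options concrete, here is a minimal sketch on made-up data. The record layout `(repo, item_number, item_kind, action)` and both function names are hypothetical and are not taken from the velocity queries.

```python
from collections import Counter

# Hypothetical activity records: (repo, item_number, item_kind, action).
activities = [
    ("org/repo", 1, "pr", "opened"),
    ("org/repo", 1, "pr", "closed"),
    ("org/repo", 1, "pr", "reopened"),
    ("org/repo", 1, "pr", "closed"),
    ("org/repo", 2, "pr", "opened"),
    ("org/repo", 7, "issue", "opened"),
    ("org/repo", 7, "issue", "closed"),
]


def count_activities(records):
    """Option 1: every activity counts once."""
    return Counter(kind for _, _, kind, _ in records)


def count_unique_items(records):
    """Option 2: each PR/issue counts once, however many activities it has."""
    unique = {(repo, number, kind) for repo, number, kind, _ in records}
    return Counter(kind for _, _, kind in unique)


print(dict(count_activities(activities)))
print(dict(count_unique_items(activities)))
```

In this sample, option 1 reports 5 PR activities and 2 issue activities, while option 2 reports 2 PRs and 1 issue.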

@lukaszgryglicki
Member

This still needs a decision, so I'll keep this open.

@linsun

linsun commented Dec 7, 2023

Hi @lukaszgryglicki, thanks for looking into this and proposing potential options. My vote is for 2, as it is what "number of PRs" really means :)

@lukaszgryglicki
Member

OK, but this needs approval, and new reports will then look different from previous ones. Let's wait for a decision.

@craigbox

craigbox commented Dec 8, 2023

/cc @caniszczyk who I assume is the decider
