Back to the cncf/velocity README.md file

Guide to non-github project processing

Other useful notes

Guide to the CNCF projects chart creation

analysis.rb can be used to create data for a Cloud Native Computing Foundation (CNCF) projects bubble chart, such as this sample chart.

The chart itself can be generated in a Google Sheet.

Chart data

Go to this CNCF page to find a list of current projects.

For every project, find its GitHub repository and add it to the query appropriately: as an org, as a single repo, or both. If a project does not have a GitHub repository or only lists a mirror, skip it for now and add it manually later.

Run the query for a one-year period, for example: ./run_bq.sh cncf 2024-01-01 2025-01-01. It processes about 900 GB of data and costs roughly $15-$25 or more.

It generates a data file, for example: data/data_cncf_projects_20240101_20250101.csv.
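The file names used throughout this guide follow from the query period. A minimal sketch that derives them, assuming the naming patterns seen in the 2024-01-01 to 2025-01-01 examples hold for other periods; period_files is a hypothetical helper, not part of the velocity repository:

```ruby
require 'date'

# Hypothetical helper: builds the run_bq.sh command and the file names this
# guide uses for one query period. The patterns are inferred from the
# 2024-01-01..2025-01-01 examples and are an assumption for other periods.
def period_files(from, to)
  # Date.iso8601 raises on malformed input, which catches date typos early.
  f = Date.iso8601(from)
  t = Date.iso8601(to)
  raise ArgumentError, 'period end must be after its start' unless t > f

  compact = "#{f.strftime('%Y%m%d')}_#{t.strftime('%Y%m%d')}" # e.g. 20240101_20250101
  dashed  = "#{f.iso8601}_#{t.iso8601}"                       # e.g. 2024-01-01_2025-01-01
  {
    bq_command: "./run_bq.sh cncf #{f.iso8601} #{t.iso8601}",
    data:       "data/data_cncf_projects_#{compact}.csv",
    projects:   "projects/projects_cncf_#{compact}.csv",
    update:     "data/data_cncf_update_#{dashed}.csv"
  }
end

period_files('2024-01-01', '2025-01-01').each { |key, value| puts "#{key}: #{value}" }
```

Note that the BigQuery data and projects files use compact dates, while the commits update file keeps the dashed form.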

  • You can optionally compare commit counts from BigQuery to git commit counts via: PG_PASS=... ./shells/get_git_commits_count.sh proj_db YYYY-MM-DD YYYY-MM-DD.
  • You can optionally compare commit counts from BigQuery to DevStats commit counts via: PG_PASS=... ./shells/get_devstats_commits_count.sh proj_db YYYY-MM-DD YYYY-MM-DD.
  • These steps are possible only from a DevStats Kubernetes node or with a local DevStats installation. Normally they are not needed and should be skipped.
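When comparing commit counts from two sources, it helps to list only the repos that differ noticeably. A minimal sketch, assuming you have already collected per-repo counts from each source into hashes (repo => count); the output format of the comparison scripts is not assumed, and commit_count_diffs and its 10% tolerance are hypothetical choices:

```ruby
# Hypothetical comparison of per-repo commit counts from two sources, e.g.
# BigQuery versus git. Repos present in only one source count as 0 in the other.
def commit_count_diffs(bq_counts, other_counts, tolerance: 0.1)
  (bq_counts.keys | other_counts.keys).sort.filter_map do |repo|
    bq = bq_counts.fetch(repo, 0)
    other = other_counts.fetch(repo, 0)
    base = [bq, other].max
    next if base.zero?

    # Relative difference measured against the larger count, in [0, 1].
    rel = (bq - other).abs.to_f / base
    [repo, bq, other, rel.round(3)] if rel > tolerance
  end
end

# Illustrative, made-up numbers only.
bq  = { 'example-org/repo-a' => 120, 'example-org/repo-b' => 1_000 }
git = { 'example-org/repo-a' => 118, 'example-org/repo-b' => 850 }
commit_count_diffs(bq, git).each { |repo, b, g, rel| puts "#{repo}: bq=#{b} git=#{g} diff=#{rel}" }
```

Using the larger count as the base keeps the ratio bounded, so a repo missing from one source shows up with a difference of 1.0.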

Run analysis.rb as follows (if the data CSV lacks a header row, add this one first: org,repo,activity,comments,prs,commits,issues,authors_alt2,authors_alt1,authors,pushes):

[SKIP_TOKENS=''] ruby analysis.rb data/data_cncf_projects_20240101_20250101.csv projects/projects_cncf_20240101_20250101.csv map/hints.csv map/urls.csv map/defmaps.csv map/skip.csv map/ranges_sane.csv
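Adding the missing header row can be scripted. A minimal sketch, assuming the column order listed above; ensure_header is a hypothetical helper, not part of the velocity repository:

```ruby
require 'csv'

# Column order of the BigQuery output, as listed in this guide.
HEADER = %w[org repo activity comments prs commits issues
            authors_alt2 authors_alt1 authors pushes].freeze

# Prepends HEADER to the CSV at `path` unless its first row already equals it.
# Returns true when the file was modified, false otherwise.
def ensure_header(path)
  content = File.read(path)
  first = content.each_line.first
  return false if first && CSV.parse_line(first) == HEADER

  File.write(path, HEADER.join(',') + "\n" + content)
  true
end
```

For example, ensure_header('data/data_cncf_projects_20240101_20250101.csv') before running analysis.rb. Running it twice is harmless: the second call sees the header and leaves the file unchanged.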

Some projects are defined as regexps within one or more orgs: the BigQuery query tracks the whole orgs, and the config specifies which repos belong to which project. You need to remove the remaining repos of those orgs from the report.

Currently this is checked manually: search for oam-dev, layer5io and pixie-labs in the projects/projects_cncf_20240101_20250101.csv file (search pattern /oam-dev\|layer5io\|pixie-labs). Also check for rows whose last column is empty (search pattern /,"",).
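The manual checks above can be approximated with a short script that lists the lines to review. A minimal sketch, assuming no quoted field in the projects CSV spans multiple lines; rows_to_review is a hypothetical helper, and it only flags lines, leaving the actual cleanup to you:

```ruby
require 'csv'

# Orgs whose repos are split into projects by regexp (see the manual check above).
SPLIT_ORGS = /oam-dev|layer5io|pixie-labs/

# Returns [line_number, reasons] pairs for lines of a projects CSV that need
# a manual look. Line-based sketch: assumes no quoted field spans lines.
def rows_to_review(path)
  flagged = []
  File.foreach(path).with_index(1) do |line, lineno|
    text = line.chomp
    next if text.empty?

    reasons = []
    reasons << 'split org' if text.match?(SPLIT_ORGS)
    # Same pattern as the manual search for an empty quoted field.
    reasons << 'empty quoted field' if text.include?(',"",')
    begin
      last = CSV.parse_line(text).last
      reasons << 'empty last column' if last.nil? || last.strip.empty?
    rescue CSV::MalformedCSVError
      reasons << 'malformed CSV line'
    end
    flagged << [lineno, reasons] unless reasons.empty?
  end
  flagged
end
```

For example, rows_to_review('projects/projects_cncf_20240101_20250101.csv') returns the line numbers and reasons, which you can then inspect and fix by hand.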

Update the forks files used for LF and Top30 report generation: ./merge_forks.rb lf_forks.json forks.json and ./merge_forks.rb all_forks.json forks.json.

Now update commit counts to use git data instead of BigQuery data (remember to update devstats:util_sql/only_bots.sql):

  • If you updated the forks JSON file(s), build the devstats-reports Docker image: DOCKER_USER=lukaszgryglicki SKIP_TEST=1 SKIP_PROD=1 SKIP_FULL=1 SKIP_MIN=1 SKIP_GRAFANA=1 SKIP_TESTS=1 SKIP_PATRONI=1 SKIP_STATIC=1 SKIP_API=1 ./images/build_images.sh.
  • Create the devstats-reports pod, open a shell in it and run: ./velocity/update_cncf_projects_commits.sh 2024-01-01 2025-01-01 &>> /update.log &, then monitor it with tail -f /update.log. This takes hours to complete.
  • Download the update: wget https://teststats.cncf.io/backups/data_cncf_update_2024-01-01_2025-01-01.csv, then mv data_cncf_update_2024-01-01_2025-01-01.csv data/. The server can also be devstats.cncf.io instead of teststats.cncf.io.
  • Delete the reporting pod once it is no longer needed: helm delete devstats-prod-reports.
  • ruby update_projects.rb projects/projects_cncf_20240101_20250101.csv data/data_cncf_update_2024-01-01_2025-01-01.csv -1.

If you have all CNCF project databases locally, you can use the old local approach to get commit count updates:

  • PG_PASS=... ./update_cncf_projects_commits.rb 2024-01-01 2025-01-01.

Consider removing the CNCF project: it is not a real CNCF project but an entry for analyzing the CNCF foundation's own internal orgs.

Make a copy of the Google Sheet.

Put the results of the analysis into a file and import them into the 'Data' sheet at cell H1: File -> Import -> Upload, then in the Import location section select the 'Replace data at selected cell' radio button and click Import data.

Select the Chart tab; the chart updates automatically.

A gist describing this process is at https://gist.github.com/lukaszgryglicki/093ced06455a3f14f0e4d25459525207

Use this sheet for the K8s vs. non-K8s comparison.

Update the main README: set the new 'Current reports' and move the previous current reports to 'Past Reports'.

CNCF Projects split by Kubernetes VS rest

To compare CNCF K8s data with non-K8s data, run: ruby analysis.rb data/data_cncf_projects_20240101_20250101.csv projects/projects_cncf_k8s_non_k8s_20240101_20250101.csv map/hints_k8s_non_k8s.csv map/urls_k8s_non_k8s.csv map/defmaps_k8s_non_k8s.csv map/skip.csv map/ranges_sane.csv.

For this case, a new set of map files was created:

  • map/k8s_vs_rest_defmaps.csv - list of orgs found in query
  • map/k8s_vs_rest_urls.csv - definition of k8s vs rest
  • map/k8s_vs_rest_hints.csv - list of repos found in query

Lists of orgs/repos in the map files should contain all values used in any period query.
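To check that a map file covers every org seen in a period's data, you can diff the org column of the data CSV against the orgs you expect. A minimal sketch, assuming the data CSV has the header row described earlier; how you read the known orgs out of a map file is left open, because the map file layout is an assumption not covered here. missing_orgs is a hypothetical helper:

```ruby
require 'csv'
require 'set'

# Returns the orgs present in a BigQuery data CSV (with an `org` header
# column) that are missing from `known_orgs`, sorted alphabetically.
def missing_orgs(data_csv, known_orgs)
  known = known_orgs.to_set
  seen = Set.new
  CSV.foreach(data_csv, headers: true) do |row|
    org = row['org']
    seen << org if org && !org.empty?
  end
  (seen - known).to_a.sort
end
```

For example, missing_orgs('data/data_cncf_projects_201611_201710.csv', known) with known built from your map file lists orgs still to be added.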

Note that, as CNCF grows, new projects are added over time. To get data for 2016, run a query similar to BigQuery/query_cncf_4p_201511_201610.sql; the following year is covered by BigQuery/query_cncf_projects_201611_201710.sql. To prepare an analysis, run a command similar to this:

ruby analysis.rb data/data_cncf_projects_201611_201710.csv projects/projects_cncf_k8s_vs_rest_201611_201710.csv map/k8s_vs_rest_hints.csv map/k8s_vs_rest_urls.csv map/k8s_vs_rest_defmaps.csv map/skip.csv map/ranges_unlimited.csv