trojan11111
.gitattributes          | +3
.github/dependabot.yml  | +12
.gitignore              | +57
ADD_PROJECT.md          | +55
FORBIDDEN_DATA.md       | +12
5 files changed, 139 insertions(+)
@@ -0,0 +1,3 @@
+src/github_users.json filter=lfs diff=lfs merge=lfs -text
+src/stripped.json filter=lfs diff=lfs merge=lfs -text
+src/affiliated.json filter=lfs diff=lfs merge=lfs -text
@@ -0,0 +1,12 @@
+# To get started with Dependabot version updates, you'll need to specify which
+# package ecosystems to update and where the package manifests are located.
+# Please see the documentation for all configuration options:
+# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
+
+version: 2
+updates:
+  - package-ecosystem: "" # See documentation for possible values
+    directory: "/" # Location of package manifests
+    schedule:
+      interval: "weekly"
+
@@ -0,0 +1,57 @@
+*.pyc
+*~
+.venv
+*.swp
+*.swo
+src/clearbit_tools/all_clearbit_queries.csv
+src/clearbit_tools/cncf_enriched.csv
+src/clearbit_tools/input_enriched.csv
+src/clearbit_tools/new_round_enriched.csv
+src/clearbit_tools/unknown_emails_enriched.csv
+all.log
+git.log
+all.log.xz
+#all.txt
+all.csv
+database.dump
+datelc.csv
+errors.txt
+header
+x
+src/ghusers/*
+# Data files
+# *.txt
+# *.csv
+*.out
+# *.json
+*.log
+*.old
+*.dat
+*.db
+*.dump
+err
+out
+out1
+out2
+out.diff
+geodata.tsv
+geodata.tsv.xz
+partial.json
+backup.json
+affiliated.json
+stripped.json
+*.htm*
+src/check_spell
+src/mtp
+src/check_shas
+src/map_orgs
+src/get_aff_files
+src/git_logs/*.log
+src/git_logs/*.1
+src/git_logs/*.2
+src/flist.txt
+/src/git.log_*
+*.secret
+allCountries.zip
+allCountries.txt
+allCountries.tsv
@@ -0,0 +1,55 @@
+# Add a non-CNCF project/org (the project must be open source) to generate affiliations for it
+1. Add the developers of your organization/project who should be affiliated to `./developers_affiliations.txt`, using the format shown below. Then `cd src/`, generate a new `email-map` with `./import_affs.sh`, and move it into place with `mv email-map cncf-config/email-map`.
+    For example:
+    ```
+    developer1: email1@xyz, email2@abc, ...
+        company1
+        company2 until YYYY-MM-DD
+    developer2: email3@xyz, email4@pqr, ...
+        company3
+        company4 until YYYY-MM-DD
+    ```
+2. Clone all repositories of the project into `~/dev/project_name/`. Either use the `cncf/velocity` project and write an SQL query in its BigQuery folder, or create a new shell script named `clone_project_name.sh` in `~/dev/cncf/gitdm/`.
+    For the second option, copy the following code into that file:
+    ```
+    #!/bin/bash
+    mkdir -p ~/dev/project_name
+    cd ~/dev/project_name || exit 1
+    git clone github_repo_clone_url_for_your_project1 || exit 1
+    git clone github_repo_clone_url_for_your_project2 || exit 1
+    # ... add one `git clone` line for each remaining repository
+    echo "All project_name repos cloned"
+    ```
+    Fill in the clone URL of every repository manually.
+    Save the file, make it executable with `chmod +x ./clone_project_name.sh`,
+    and then run `./clone_project_name.sh`. This clones all repositories into `~/dev/project_name/`.
+
+    **Note**: replace `project_name` with your GitHub organization name.
+
+3. Generate the `git.log` file with `./all_repos_log.sh ~/dev/project_name/*`, then make sure its entries are unique (`uniq`).
+
+4. Run `cncf/gitdm` on the generated `git.log` file: `~/dev/cncf/gitdm/cncfdm.py -i git.log -r "^vendor/|/vendor/|^Godeps/" -R -n -b ./src/ -t -z -d -D -U -u -o all.txt -x all.csv -a all_affs.csv > all.out`
+
+5. Generate human-readable text affiliation files: `SKIP_COMPANIES="(Unknown)" ./gen_aff_files.sh`
+
+6. If you update via `ghusers.sh` or `ghusers_cached.sh` (see step 7), first update the `repos` array in `./ghusers.rb` with your org/project repository list, and run `generate_actors.sh` as well. Before that, make sure you have set up DevStats, and update `./generate_actors.sh` so that the command after its first line reads `sudo -u postgres psql -tA your_pg_database_name < ~/dev/go/src/devstats/util_sql/all_actors.sql > actors.txt`. Then run `./generate_actors.sh`.
+
+7. Consider running `./ghusers_cached.sh` or `./ghusers.sh`. If you run `./ghusers.sh`, copy the resulting JSON somewhere and take the 0-committers from the previous version to save GitHub API points. Sometimes you should just run `./ghusers.sh` without the cache.
+
+8. `ghusers_partially_cached.sh` refetches repository metadata and commits but takes user data from `github_users.json`, so you can save a lot of API points.
+
+9. To update (enhance) `github_users.json` with new affiliations, run `./enchance_json.sh`.
+
+10. To merge data from multiple GitHub logins (for example, to propagate a known affiliation to entries for the same GitHub login that are unknown or not found), run `./merge_github_logins.sh`.
+11. Because this can find new affiliations, you can now use `./import_from_github_users.sh` to import them back from `github_users.json` and then restart from step 3.
+
+12. Run `./correlation.sh` and examine its output, `correlations.txt`, to try to normalize company names: remove common suffixes such as Ltd. and Corp., and merge names that differ only in upper/lower case.
+
+13. Run `./lookup_json.sh` and examine the JSON files it outputs: those GitHub profiles have some useful data directly available, which saves you some manual research work.
+
+14. ALWAYS run `./handle_forbidden_data.sh` before any commit to GitHub to remove any forbidden affiliations; see also `FORBIDDEN_DATA.md`.
+
+15. You can use `./clear_affiliations_in_json.sh` to clear all affiliations in a generated `github_users.json`.
+
+16. You can create a smaller final JSON for `cncf/devstats` using `./strip_json.sh github_users.json stripped.json; cp stripped.json ~/dev/go/src/devstats/github_users.json`.
+
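
Step 2 of `ADD_PROJECT.md` hard-codes one `git clone` line per repository. As an alternative, the clone step can be driven from a plain list of URLs. The sketch below is a hypothetical helper and is not part of gitdm. Three things in it are assumptions: the function name `clone_all`, the one-URL-per-line input file, and the rule that repositories already cloned are skipped.

```bash
#!/bin/bash
# Hypothetical helper (not part of cncf/gitdm): clone every repository listed
# in a text file (one clone URL per line) into a destination directory.
# Blank lines and lines starting with '#' are ignored; repositories whose
# target directory already exists are skipped, so re-running is safe.
clone_all() {
  local dest=$1 list=$2 url name
  mkdir -p "$dest" || return 1
  # The "|| [[ -n $url ]]" part also handles a last line without a newline.
  while IFS= read -r url || [[ -n $url ]]; do
    # Skip empty lines and comments.
    [[ -z $url || $url == \#* ]] && continue
    # Same directory name git would pick: last path component without ".git".
    name=$(basename "$url" .git)
    if [[ -d $dest/$name ]]; then
      echo "skip: $name already cloned"
      continue
    fi
    git clone --quiet "$url" "$dest/$name" || return 1
  done < "$list"
  echo "All repos from $list cloned into $dest"
}
```

For example, `clone_all ~/dev/project_name repos.txt` (with `repos.txt` a hypothetical file of clone URLs) fills the directory that step 3 later passes to `./all_repos_log.sh ~/dev/project_name/*`.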
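
Step 12 relies on spotting company names that differ only by letter case or a trailing legal suffix. The sketch below illustrates that idea in a hypothetical form; it does not reproduce how `correlation.sh` works. The function names and the suffix list (`ltd`, `limited`, `corp`, `corporation`, `inc`, `llc`, `gmbh`) are assumptions.

```bash
#!/bin/bash
# Hypothetical sketch (not correlation.sh): normalize one company name per
# input line by lowercasing it and dropping a trailing legal suffix.
company_key() {
  tr '[:upper:]' '[:lower:]' |
    sed -E 's/[ ,]+(ltd|limited|corp|corporation|inc|llc|gmbh)\.?$//; s/[ .,]+$//'
}

# Read company names (one per line) and print, one line per group, the
# distinct spellings that collapse to the same normalized key.
find_name_variants() {
  local name
  while IFS= read -r name || [[ -n $name ]]; do
    [[ -z $name ]] && continue
    printf '%s\t%s\n' "$(printf '%s\n' "$name" | company_key)" "$name"
  done | LC_ALL=C sort -u |
    awk -F'\t' '
      # Join every distinct spelling seen for a key.
      { names[$1] = ($1 in names) ? names[$1] " | " $2 : $2; count[$1]++ }
      END { for (k in count) if (count[k] > 1) print names[k] }' |
    LC_ALL=C sort
}
```

Piping a list of company names into `find_name_variants` prints candidate groups such as `Acme Corp.` and `ACME` for manual review. It deliberately only reports candidates and does not rewrite any names, because a suffix list this short will miss some variants and may over-merge others.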
@@ -0,0 +1,12 @@
+# How to remove affiliations data
+
+If you do not want your personal data, such as names and/or emails, to be listed, do the following:
+
+- Clone `cncf/gitdm` locally.
+- `cd src/`
+- Run `./add_forbidden_data.rb 'youremail!domain.com'` or `./add_forbidden_data.rb 'YourName' 'youremail2!domain.com' 'your!email.com'`.
+- The phrase to be removed should not contain the characters `,`, `;`, `'`, `"`, `/`, `\`.
+- The program generates SHA256 hashes of the data provided as command-line arguments and adds them to the `cncf-config/forbidden.csv` file.
+- Create a PR with the updated `cncf-config/forbidden.csv` file. Because that file contains only hashes, your sensitive data won't be visible in the PR.
+- We will run `./handle_forbidden_data.sh` on your PR, which generates a report of the files containing that information.
+- We will remove the requested information and merge your PR.
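
The hashing step in `FORBIDDEN_DATA.md` can be illustrated with a short shell sketch. It covers two things: the character check and why a SHA256 digest does not reveal the original phrase. `hash_forbidden_phrase` is a hypothetical name, not a gitdm script, and the sketch assumes GNU coreutils' `sha256sum`. `add_forbidden_data.rb` may normalize its input before hashing, so these digests are not guaranteed to match the entries in `cncf-config/forbidden.csv`.

```bash
#!/bin/bash
# Illustrative sketch only: NOT add_forbidden_data.rb, which may normalize
# input (e.g. case) before hashing. Requires GNU coreutils' sha256sum
# (on macOS, `shasum -a 256` is the usual equivalent).
hash_forbidden_phrase() {
  local phrase=$1 c
  # Characters that FORBIDDEN_DATA.md says a phrase should not contain.
  local forbidden=(',' ';' "'" '"' '/' '\')
  if [[ -z $phrase ]]; then
    echo "error: empty phrase" >&2
    return 1
  fi
  for c in "${forbidden[@]}"; do
    # The quoted "$c" is matched literally, not as a glob.
    if [[ $phrase == *"$c"* ]]; then
      echo "error: phrase contains forbidden character: $c" >&2
      return 1
    fi
  done
  # Hash the exact bytes of the phrase (no trailing newline) and print only
  # the 64-character hex digest; the phrase itself cannot be read back from it.
  printf '%s' "$phrase" | sha256sum | cut -d' ' -f1
}
```

For example, `hash_forbidden_phrase 'youremail!domain.com'` prints a 64-character hex digest, while a phrase such as `'a,b'` is rejected with an error.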