Track daily GitHub repository traffic (views and clones) and store metrics in Supabase (PostgreSQL/TimescaleDB).
Read the Blog: How to Track GitHub Repo Traffic Beyond 14 Days
- Fetches daily traffic metrics from GitHub REST API
- Stores metrics in Supabase (cloud Postgres) or a local database
- Visualizes metrics with Jupyter Notebook, Python script, or Grafana
This project uses Supabase as a cloud-hosted PostgreSQL database. This means:
- Your data is not tied to your local machine—you won't lose metrics if you change laptops or work from multiple devices.
- You can always access your GitHub traffic history in the future.
- Alternative: You can use a local PostgreSQL database if you prefer, but you'll need to update the connection string in your `.env` and ensure your local DB is running when you run the scripts.
New to Supabase? Learn how to set up Postgres here.
Repositories to track are configured in a separate `repos.yaml` file. Example:

```yaml
repos:
  - owner: MinimalDevops
    repo: MD-Youtube-Summarizer
  # - owner: another_owner
  #   repo: another_repo
```
You can add as many repositories as you like. The script will fetch metrics for all listed repos.
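For reference, here is a minimal sketch of how the repo list can be loaded with PyYAML (variable names are illustrative, not necessarily the ones used in `github_traffic_tracker.py`):

```python
import yaml  # PyYAML, included in requirements.txt

# Load the repository list from repos.yaml in the project root
with open("repos.yaml") as f:
    config = yaml.safe_load(f)

for entry in config["repos"]:
    print(f"{entry['owner']}/{entry['repo']}")
```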
- Create a GitHub Personal Access Token (read-only scope)
- Add it to your `.env` file as `GITHUB_TOKEN`
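Under the hood, the tracker pulls from GitHub's repository traffic endpoints (`/traffic/views` and `/traffic/clones`). A minimal sketch of such a call with `requests` (the project's script may structure this differently):

```python
import os

import requests

headers = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

# GitHub only retains about 14 days of traffic data, which is why
# this project persists each day's numbers to a database.
url = "https://api.github.com/repos/MinimalDevops/MD-Youtube-Summarizer/traffic/views"
resp = requests.get(url, headers=headers, timeout=10)
resp.raise_for_status()

for day in resp.json()["views"]:
    print(day["timestamp"], day["count"], day["uniques"])
```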
Create a `.env` file in your project root with the following content (replace the placeholder values with your actual credentials):
```env
GITHUB_TOKEN=your_github_token_here
SUPABASE_HOST=your_supabase_host
SUPABASE_DB=your_database_name
SUPABASE_USER=your_db_user
SUPABASE_PASSWORD=your_db_password
SUPABASE_PORT=5432
```
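Loading these values in Python typically looks like the sketch below, assuming python-dotenv and psycopg2 as the driver (check `requirements.txt` for what the project actually uses):

```python
import os

import psycopg2
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

conn = psycopg2.connect(
    host=os.environ["SUPABASE_HOST"],
    dbname=os.environ["SUPABASE_DB"],
    user=os.environ["SUPABASE_USER"],
    password=os.environ["SUPABASE_PASSWORD"],
    port=os.environ["SUPABASE_PORT"],
    sslmode="require",  # Supabase connections require SSL
)
```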
- Create a Supabase project and database (or use a local PostgreSQL instance)
- Use the following SQL to create the table:
```sql
CREATE TABLE github_traffic (
    repo_name TEXT NOT NULL,
    timestamp DATE NOT NULL,
    unique_views INTEGER,
    total_views INTEGER,
    unique_clones INTEGER,
    total_clones INTEGER,
    PRIMARY KEY (repo_name, timestamp)
);
```
- Add your DB credentials to `.env`
All required packages (including Jupyter, pandas, matplotlib, plotly, PyYAML, etc.) are included in `requirements.txt`.

```bash
pip install -r requirements.txt
```
Or, if using uv:

```bash
uv pip install -r requirements.txt
```
- Configure the repositories you want to track in `repos.yaml` (see above)
- Run daily via cron or GitHub Actions

```bash
python github_traffic_tracker.py
```
To ensure you never miss a day of GitHub traffic data, set up a cronjob to run the tracker script automatically every day (e.g., at 9AM). This will keep your metrics up to date without manual intervention.
Example cron line (runs daily at 9AM):

```bash
0 9 * * * cd /path/to/your/project && /path/to/your/project/.venv/bin/python /path/to/your/project/github_traffic_tracker.py >> /path/to/your/project/cron.log 2>&1
```
- Adjust the paths as needed for your environment.
- This will log output to `cron.log` in your project directory.
- Use `crontab -e` to add the line to your crontab without removing existing jobs.
Automating the fetch ensures you always have the latest metrics, even if you forget to run the script manually.
- Register your virtual environment as a Jupyter kernel:

```bash
python -m ipykernel install --user --name=github-traffic-venv --display-name "Python (github-traffic-venv)"
```
- (Recommended) Install the Jupyter extension in VSCode or your IDE for best experience.
- Run Jupyter Notebook:

```bash
jupyter notebook
```
- Open `visualize_metrics.ipynb` and run all cells.
- Select the kernel named `Python (github-traffic-venv)`.
- You'll see line plots for views and clones over time.
- The notebook will plot all repositories by default. You can filter repos in the notebook by editing the code to select specific `repo_name` values, as shown below.
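For example, adding a line like this before the plotting cell keeps a single repo (assuming the notebook loads its data into a DataFrame named `df`):

```python
# Keep only rows for one repository before plotting
df = df[df["repo_name"] == "MinimalDevops/MD-Youtube-Summarizer"]
```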
- Run the provided script:

```bash
python visualize.py
```
- This will display matplotlib plots for views and clones.
- You can now filter which repository to plot using a command-line argument:

```bash
python visualize.py --repo MinimalDevops/MD-Youtube-Summarizer
```
- Omit `--repo` to plot all repos together.
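A rough sketch of how such a `--repo` flag can be wired up with `argparse` and pandas; the DataFrame below is stand-in data, and `visualize.py` itself may differ:

```python
import argparse

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--repo", help="owner/repo to plot; omit to plot all repos")
args = parser.parse_args()

# Stand-in for rows loaded from the github_traffic table
df = pd.DataFrame({
    "repo_name": ["MinimalDevops/MD-Youtube-Summarizer"],
    "total_views": [42],
})

if args.repo:
    # Case- and whitespace-insensitive match, as described in Troubleshooting
    wanted = args.repo.strip().lower()
    df = df[df["repo_name"].str.strip().str.lower() == wanted]

print(df)
```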
- Install Grafana:
  - On macOS (Homebrew):

    ```bash
    brew install grafana
    brew services start grafana
    ```

  - Or see Grafana Downloads for other platforms.
- Open Grafana:
  - Go to http://localhost:3000
  - Login (default: `admin`/`admin`)
- Add PostgreSQL Data Source:
  - Use your Supabase credentials (host, db, user, password, port, SSL required)
- Create a Dashboard:
- Generating Grafana Queries for Multiple Repos

To easily generate a custom SQL query for your Grafana dashboard (with separate lines for each repo and metric), use the provided `grafana_query.py` script. This script reads your `repos.yaml` and outputs a SQL query tailored to your current repositories.
```bash
python grafana_query.py
```
- The script will print a SQL query to your terminal.
- Copy and paste this query into your Grafana panel's query editor.
- This will create separate lines for each repo's views and clones, with clear labels.
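As a rough idea of what such a generator can look like (the SQL that `grafana_query.py` actually emits may differ; this sketch assumes Grafana's PostgreSQL `$__timeFilter` macro and its convention of labeling series via a `metric` column):

```python
import yaml

with open("repos.yaml") as f:
    repos = yaml.safe_load(f)["repos"]

names = ", ".join(f"'{r['owner']}/{r['repo']}'" for r in repos)

# One line per repo: Grafana's PostgreSQL data source uses the "metric"
# column as the series label. Clones would be a second, analogous SELECT.
print(f"""\
SELECT timestamp AS time,
       repo_name || ' views' AS metric,
       total_views AS value
FROM github_traffic
WHERE repo_name IN ({names})
  AND $__timeFilter(timestamp)
ORDER BY time;""")
```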
Benefit:
- No need to manually edit SQL for each new repo: just update `repos.yaml` and rerun the script.
- Makes multi-repo dashboards easy and dynamic.

Ensure:
- Set visualization to Time series/Line chart.
- Adjust time range (top right) to include your data (e.g., "Last 30 days").
This project includes 3 Python scripts to generate different types of Grafana queries for your GitHub traffic data:
Generates time series queries for individual repositories with separate lines for each repo and metric.
```bash
python grafana_query.py
```
What it does:
- Creates separate lines for each repository's views and clones
- Shows time series data for each repo individually
- Perfect for line charts showing trends over time
- Each repo gets its own line in the visualization
Use case: When you want to see how each repository performs over time with separate trend lines.
Generates queries for all-time totals across all repositories combined.
```bash
python grafana_aggregated_totals_query.py
```
What it does:
- Shows total views and clones from the beginning until today
- Ignores the time range selected in Grafana
- Perfect for stat panels, gauges, or single value displays
- Gives you the grand total across all your repositories
Use case: When you want to show total traffic across all repositories as a single number (like a KPI card).
Generates queries showing totals for each individual repository.
```bash
python grafana_total_count_query.py
```
What it does:
- Shows totals for each individual repository
- Great for table panels to see which repos perform best
- Allows you to compare repositories side by side
- Perfect for ranking repositories by traffic
Use case: When you want to see which repositories are your top performers in a table or bar chart format.
- Run the appropriate script based on what you want to visualize
- Copy the generated SQL query from the terminal output
- Paste it into your Grafana panel's query editor
- Choose the right visualization type:
- Time series queries → Line charts, area charts
- Aggregated totals → Stat panels, gauges, single value
- Per-repo breakdown → Tables, bar charts
Benefits:
- Automatically includes all repositories from your `repos.yaml`
- No need to manually edit SQL when adding new repos
- Each script serves a different visualization need
- Easy to regenerate queries when your repo list changes
- For Jupyter, ensure you select the correct Python kernel (your virtual environment) in VSCode or Jupyter Lab.
- If you have special characters in your DB password, URL-encode them in your `.env` (see the snippet after this list).
- For Grafana, always check the time range and use Table view to debug queries.
- You can use a local PostgreSQL database instead of Supabase if you prefer, but your data will only be available on that machine.
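For instance, the standard library's `urllib.parse.quote_plus` produces a safely encoded value you can paste into `.env` (the password shown is made up):

```python
from urllib.parse import quote_plus

print(quote_plus("p@ss/word"))  # -> p%40ss%2Fword
```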
This project uses an upsert strategy when writing to the database:
- For each repository and day, the script attempts to insert a new row.
- If a row for the same `repo_name` and `timestamp` already exists, it is updated with the latest values from GitHub.
- This is achieved using the SQL statement (a Python sketch follows this list):
```sql
INSERT INTO github_traffic (...) VALUES (...) ON CONFLICT (repo_name, timestamp) DO UPDATE SET ...
```
- Result: No duplicate rows for the same repo and day, and your data is always up-to-date.
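A minimal sketch of that upsert from Python, assuming psycopg2 (the actual script may batch rows or structure this differently):

```python
import os

import psycopg2

conn = psycopg2.connect(  # same credentials as in .env (see setup above)
    host=os.environ["SUPABASE_HOST"], dbname=os.environ["SUPABASE_DB"],
    user=os.environ["SUPABASE_USER"], password=os.environ["SUPABASE_PASSWORD"],
    port=os.environ["SUPABASE_PORT"], sslmode="require",
)

UPSERT_SQL = """
INSERT INTO github_traffic
    (repo_name, timestamp, unique_views, total_views, unique_clones, total_clones)
VALUES (%s, %s, %s, %s, %s, %s)
ON CONFLICT (repo_name, timestamp) DO UPDATE SET
    unique_views  = EXCLUDED.unique_views,
    total_views   = EXCLUDED.total_views,
    unique_clones = EXCLUDED.unique_clones,
    total_clones  = EXCLUDED.total_clones;
"""

with conn.cursor() as cur:
    # Example row: one repo, one day (values are illustrative)
    cur.execute(UPSERT_SQL, ("MinimalDevops/MD-Youtube-Summarizer",
                             "2025-06-01", 5, 12, 1, 3))
conn.commit()
```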
Unlike Grafana, the Jupyter notebook and Python script do not have built-in interactive date range filtering. If you want to plot data for a specific date range, you can filter the DataFrame in the code. For example:
```python
df = df[(df['timestamp'] >= '2025-06-01') & (df['timestamp'] <= '2025-06-03')]
```
Add this line before plotting to restrict the visualization to a certain date range.
- If your plot is empty or missing data:
- Check the debug print output to see what data is being loaded and filtered.
- Make sure your `--repo` argument (for the script) matches the repo name; matching is case- and whitespace-insensitive.
- Check for leading/trailing spaces or case mismatches in repo names.
- Use print statements to inspect the DataFrame content before plotting (see the example after this list).
- If you only see a legend but no points, you may have only one data point or all values may be zero/NaN.
- With more than one data point, line plots give a continuous view.
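For example (assuming `df` is the DataFrame the notebook or script has loaded):

```python
print(df.shape)                  # how many rows survived filtering?
print(df["repo_name"].unique())  # exact repo names, including stray spaces
print(df.head())
```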
This project is licensed under the MIT License. You are free to use, modify, and distribute it as you wish.