A Python application that fetches analytics data for shortened URLs from the Tiny.cc API and stores it in Google BigQuery for further analysis.
This tool automates the collection of click statistics for Tiny.cc URLs. It:
- Fetches all shortened URLs from your Tiny.cc account
- Filters URLs created within the last 12 months
- Retrieves detailed click statistics for each URL
- Uploads the data to Google BigQuery
- Tracks which URLs have been processed to avoid duplicates
- Pagination Support: Handles large collections of URLs by paginating through the API (see the sketch after this list)
- Date-Based Filtering: Separates URLs into "recent" (last 4 months) and "older" (last 12 months) groups
- Deduplication: Avoids re-uploading data that is already in BigQuery
- Progress Tracking: Maintains lists of processed URLs and URLs without click data
- Error Handling: Gracefully handles API errors and rate limits
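As an illustration of what the pagination loop might look like, here is a minimal sketch. The endpoint path, auth scheme, paging parameters, and response shape are assumptions for illustration, not the documented Tiny.cc API:

```python
import os

import requests


def fetch_all_urls():
    """Page through the account's URL list until an empty page comes back."""
    # Endpoint, auth scheme, and paging parameter names are illustrative
    # assumptions, not the documented Tiny.cc API.
    session = requests.Session()
    session.auth = ("your_tinycc_username", os.environ["TINYCC_API_KEY"])
    urls, offset, limit = [], 0, 100
    while True:
        resp = session.get(
            "https://tiny.cc/api/v3/urls",
            params={"offset": offset, "limit": limit},
        )
        resp.raise_for_status()
        page = resp.json().get("urls", [])  # assumed response key
        if not page:
            break
        urls.extend(page)
        offset += limit
    return urls
```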
You will need:

- Python 3.6+
- A Google Cloud Platform account with BigQuery access
- A Tiny.cc Pro account with API access
To install and configure the tool:

- Clone this repository:

  ```bash
  git clone https://github.com/yourusername/tinycc-analytics-collector.git
  cd tinycc-analytics-collector
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file in the project root with your Tiny.cc API key:

  ```
  TINYCC_API_KEY=your_api_key_here
  ```

- Set up Google Cloud credentials:
  - Create a service account with BigQuery access
  - Download the JSON key file
  - Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to point to your key file:

    ```bash
    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/keyfile.json"
    ```
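Once both are in place, reading the key at startup typically looks like this. This is a minimal sketch assuming the project uses python-dotenv to load `.env`; if not, a plain `os.environ` lookup works once the variable is exported:

```python
import os

from dotenv import load_dotenv      # pip install python-dotenv
from google.cloud import bigquery   # pip install google-cloud-bigquery

load_dotenv()                        # pulls TINYCC_API_KEY out of .env
API_KEY = os.environ["TINYCC_API_KEY"]

# The BigQuery client finds GOOGLE_APPLICATION_CREDENTIALS on its own,
# so no extra wiring is needed once the variable is exported.
client = bigquery.Client()
```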
Edit `config.py` to set your specific configuration:

- `TINY_USERNAME`: Your Tiny.cc username
- `BASE_URL`: The Tiny.cc API base URL (shouldn't need to change)
- `BQ_PROJECT_ID`: Your Google Cloud project ID
- `BQ_DATASET_ID`: The BigQuery dataset to use
- `BQ_TABLE_ID`: The BigQuery table to store the data
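A filled-in `config.py` might look like the following. All values are placeholders; only the variable names come from the list above, and the `BASE_URL` shown is a guess rather than the documented endpoint:

```python
# config.py -- all values below are placeholders; replace with your own.
TINY_USERNAME = "your_tinycc_username"   # Tiny.cc account username
BASE_URL = "https://tiny.cc/api/v3"      # assumed API base URL; keep the repo's default
BQ_PROJECT_ID = "your-gcp-project"       # Google Cloud project ID
BQ_DATASET_ID = "tinycc_analytics"       # BigQuery dataset to use
BQ_TABLE_ID = "url_clicks"               # BigQuery table to store the data
```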
Run the main script to fetch and upload URL statistics:

```bash
python main.py
```

To only fetch the list of URLs without processing stats:

```bash
python fetch_hashes.py
```

This will create `older_hashes.txt` and `recent_hashes.txt` files.
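The recent/older split presumably works along these lines. This is a sketch only: the record fields, date format, and function name are assumptions, not the actual `fetch_hashes.py` internals:

```python
from datetime import datetime, timedelta


def split_hashes(urls, now=None):
    """Split URL hashes into 'recent' (last 4 months) and 'older' (last 12 months).

    Per the tracking-file descriptions below, 'older' covers everything from
    the last 12 months, so it is a superset of 'recent'.
    """
    now = now or datetime.utcnow()
    four_months_ago = now - timedelta(days=4 * 30)
    twelve_months_ago = now - timedelta(days=12 * 30)
    recent, older = [], []
    for url in urls:
        # Assumed fields: 'hash' for the short-URL tag, 'created' as YYYY-MM-DD.
        created = datetime.strptime(url["created"], "%Y-%m-%d")
        if created >= twelve_months_ago:
            older.append(url["hash"])
            if created >= four_months_ago:
                recent.append(url["hash"])
    return recent, older
```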
The data uploaded to BigQuery has the following schema:
| Column | Description |
|---|---|
| `tag` | The Tiny.cc URL hash/identifier |
| `record_date` | The date of the click statistics |
| `daily_total_clicks` | Total clicks for that specific date |
| `daily_unique_clicks` | Unique clicks for that specific date |
| `total_clicks` | Cumulative total clicks for the URL |
| `unique_clicks` | Cumulative unique clicks for the URL |
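If you need to create the table by hand, an equivalent schema definition with the official google-cloud-bigquery client could look like this. The field types are inferred from the column descriptions above, not taken from the repo:

```python
from google.cloud import bigquery

from config import BQ_PROJECT_ID, BQ_DATASET_ID, BQ_TABLE_ID

# Types are inferred from the column descriptions; adjust if bq_utils.py differs.
SCHEMA = [
    bigquery.SchemaField("tag", "STRING"),
    bigquery.SchemaField("record_date", "DATE"),
    bigquery.SchemaField("daily_total_clicks", "INTEGER"),
    bigquery.SchemaField("daily_unique_clicks", "INTEGER"),
    bigquery.SchemaField("total_clicks", "INTEGER"),
    bigquery.SchemaField("unique_clicks", "INTEGER"),
]

client = bigquery.Client()
table = bigquery.Table(f"{BQ_PROJECT_ID}.{BQ_DATASET_ID}.{BQ_TABLE_ID}", schema=SCHEMA)
client.create_table(table, exists_ok=True)  # no-op if the table already exists
```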
The project is organized as follows:

- `main.py`: Main script to run the full pipeline
- `fetch_hashes.py`: Utilities to fetch and filter URL hashes from Tiny.cc
- `config.py`: Configuration variables
- `bq_utils.py`: BigQuery helper functions
- `requirements.txt`: Python dependencies
- `hashes_temp_storage/`: Directory storing tracking files for processed URLs
The script maintains several files to track progress:
- `uploaded_hashes.txt`: URLs that have been successfully processed
- `empty_hashes.txt`: URLs that returned no click data
- `older_hashes.txt`: All URLs from the last 12 months
- `recent_hashes.txt`: URLs from the last 4 months
- The Tiny.cc API has rate limits; if you hit them, the script stops processing (see the sketch below).
- The tool only collects data for URLs created in the last 12 months.
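Concretely, "stop on rate limit" amounts to checking each response for HTTP 429. A sketch of such a guard, assuming the requests library and an illustrative wrapper name:

```python
import requests


def get_json_or_stop(session, url, **params):
    """Fetch one API response; abort the run if the server reports a rate limit."""
    resp = session.get(url, params=params)
    if resp.status_code == 429:  # HTTP "Too Many Requests"
        raise SystemExit("Tiny.cc rate limit reached; stopping until the window resets.")
    resp.raise_for_status()
    return resp.json()
```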
Contributions are welcome! Please feel free to submit a Pull Request.
[Specify your license here]
- Tiny.cc for providing the API
- Google Cloud Platform for BigQuery services