Crypto Price Data Pipeline with Apache Airflow, dbt, PostgreSQL & Docker

A fully containerized modern data pipeline that ingests real-time cryptocurrency prices, loads them into PostgreSQL, transforms the data using dbt, and orchestrates the workflow with Apache Airflow.

Overview

This project demonstrates how to build a real-world, production-grade ELT pipeline using a modern data engineering stack. It extracts hourly cryptocurrency price data from the CoinGecko API, loads the raw data into PostgreSQL, and transforms it into analytics-ready models using dbt, all orchestrated by Airflow and running inside Docker containers.
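For orientation, here is a minimal sketch of what the extract-and-load step could look like. The table name, column names, and connection settings are illustrative assumptions, not necessarily what `fetch_crypto.py` in this repository actually does:

```python
# Sketch of an extract-and-load step against CoinGecko's /simple/price
# endpoint. Table/column names and connection settings are assumptions.
import requests
import psycopg2

COINGECKO_URL = "https://api.coingecko.com/api/v3/simple/price"
COINS = ["bitcoin", "ethereum", "solana"]

def fetch_and_load():
    # CoinGecko returns a payload like {"bitcoin": {"usd": 63210.5}, ...}
    resp = requests.get(
        COINGECKO_URL,
        params={"ids": ",".join(COINS), "vs_currencies": "usd"},
        timeout=10,
    )
    resp.raise_for_status()
    prices = resp.json()

    conn = psycopg2.connect(
        host="postgres", dbname="airflow", user="airflow", password="airflow"
    )
    # "with conn" commits the transaction on success and rolls back on error
    with conn, conn.cursor() as cur:
        for coin in COINS:
            cur.execute(
                "INSERT INTO crypto_prices (coin, price_usd, fetched_at) "
                "VALUES (%s, %s, NOW())",
                (coin, prices[coin]["usd"]),
            )
    conn.close()

if __name__ == "__main__":
    fetch_and_load()
```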

Architecture

```mermaid
graph LR
    subgraph DAG_crypto_price_etl
        A(run_etl) --> B(run_dbt) --> C(run_dbt_tests)
    end

    %% Colors:
    %% Airflow tasks: #4a90e2 (blue)
    %% Python script, Postgres, dbt models and tests: #7ed321 (green)
    %% Extensions: #f5a623 (orange)

    style A fill:#4a90e2,stroke:#333,stroke-width:1px,color:#fff
    style B fill:#4a90e2,stroke:#333,stroke-width:1px,color:#fff
    style C fill:#4a90e2,stroke:#333,stroke-width:1px,color:#fff

    style D fill:#7ed321,stroke:#333,stroke-width:1px,color:#000
    style F fill:#7ed321,stroke:#333,stroke-width:1px,color:#000
    style I fill:#7ed321,stroke:#333,stroke-width:1px,color:#000

    style J fill:#f5a623,stroke:#333,stroke-width:1px,color:#000
    style K fill:#f5a623,stroke:#333,stroke-width:1px,color:#000
    style L fill:#f5a623,stroke:#333,stroke-width:1px,color:#000
    style M fill:#f5a623,stroke:#333,stroke-width:1px,color:#000

    A --> D(fetch_crypto.py<br>PostgreSQL: crypto_prices)
    B --> F(dbt models:<br>coin_volatility.sql,<br>price_changes.sql,<br>daily_price_changes.sql)
    C --> I(dbt tests:<br>not_null, unique, etc.)

    F --> J(Dashboard Tool<br>e.g. Superset, Metabase)
    F --> K(Cloud Data Warehouse<br>e.g. Snowflake, BigQuery)
    F --> L(Streaming Layer<br>e.g. Kafka, Spark)
    I --> M(Data Quality Platform<br>e.g. Great Expectations)
```
| Color | Category | Includes |
|---|---|---|
| 🟦 Blue | Airflow Tasks | run_etl, run_dbt, run_dbt_tests (scheduled orchestration in Airflow) |
| 🟩 Green | Core Data Logic | fetch_crypto.py, the crypto_prices table, dbt models (coin_volatility, etc.), dbt tests |
| 🟧 Orange | Optional Extensions | Dashboards (e.g. Superset), cloud warehouses, streaming, data quality tools |
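
The DAG wires the three blue tasks in sequence. Below is a minimal sketch of what crypto_price_etl might look like; the file paths, command strings, and the Airflow 2.4+ `schedule` argument are assumptions, not the repository's actual code:

```python
# dags/crypto_price_etl.py -- a sketch of the three-task DAG shown above.
# Paths and dbt flags are assumptions about the container layout.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="crypto_price_etl",
    start_date=datetime(2025, 1, 1),
    schedule="@hourly",   # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    # Extract + load: pull prices from CoinGecko into Postgres
    run_etl = BashOperator(
        task_id="run_etl",
        bash_command="python /opt/airflow/scripts/fetch_crypto.py",
    )

    # Transform: build the dbt models on top of the raw table
    run_dbt = BashOperator(
        task_id="run_dbt",
        bash_command="dbt run --project-dir /opt/airflow/crypto_dbt",
    )

    # Validate: run the dbt schema tests (not_null, unique, ...)
    run_dbt_tests = BashOperator(
        task_id="run_dbt_tests",
        bash_command="dbt test --project-dir /opt/airflow/crypto_dbt",
    )

    run_etl >> run_dbt >> run_dbt_tests
```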

Tech Stack

| Layer | Tool | Purpose |
|---|---|---|
| Orchestration | Apache Airflow | DAG scheduling, task execution, retries, logging |
| Programming | Python | Extracts and loads data |
| Transformation | dbt (data build tool) | SQL models, testing, documentation |
| Storage | PostgreSQL | Stores both raw and transformed data |
| Infrastructure | Docker + Compose | Containerized local dev environment |

Project Structure

```text
crypto_airflow_project/
├── dags/                     # Airflow DAG definition
│   └── crypto_price_etl.py
├── scripts/                  # Python ETL script
│   └── fetch_crypto.py
├── crypto_dbt/               # dbt project (models, sources, tests)
│   ├── models/
│   └── dbt_project.yml
├── docker-compose.yml        # Orchestrates all services
├── Dockerfile                # Custom Airflow image with dbt installed
├── profiles.yml              # dbt connection config
├── .gitignore
└── README.md
```

Features

  • Extracts real-time prices for Bitcoin, Ethereum, and Solana
  • Loads raw data into a normalized crypto_prices table
  • dbt transforms the raw data into:
    • clean staging models
    • aggregated hourly metrics
    • dashboard-ready marts
  • dbt tests ensure data quality (no nulls, unique timestamps)
  • Airflow schedules the pipeline every hour with UI, logs, and retries
  • Fully dockerized, easy to run and deploy

How to Run This Project

1. Clone the Repository

```bash
git clone https://github.com/yourusername/crypto-airflow-dbt-pipeline.git
cd crypto-airflow-dbt-pipeline
```

2. Start the Environment

```bash
docker-compose up --build
```

Wait for all services to boot (especially the Airflow webserver and scheduler).

3. Access the Airflow UI

Go to: http://localhost:8080

  • Username: admin
  • Password: admin

4. Trigger the DAG

  • Enable crypto_price_etl DAG
  • Manually trigger it (from the UI, or via the REST API sketch below) or wait for the schedule (@hourly)
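
If you prefer to script this step, Airflow's stable REST API can trigger a run. This sketch assumes the basic-auth API backend is enabled in the container (AIRFLOW__API__AUTH_BACKENDS=airflow.api.auth.backend.basic_auth), which may not hold for every setup:

```python
# Trigger the DAG via Airflow's stable REST API using the default
# admin/admin credentials from step 3.
import requests

resp = requests.post(
    "http://localhost:8080/api/v1/dags/crypto_price_etl/dagRuns",
    auth=("admin", "admin"),
    json={"conf": {}},  # empty run configuration
)
resp.raise_for_status()
print(resp.json()["dag_run_id"])
```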

Example dbt Models

Some of the dbt models you’ll find include:

  • coin_volatility.sql – standard deviation of price per day
  • price_changes.sql – % change in price from the previous day/hour
  • daily_avg_prices.sql – aggregated fact table for daily average crypto prices, which could power a dashboard or weekly insights (see the query sketch below)
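
As a hint of that dashboard use case, a downstream notebook or BI tool might query the mart like this. The column names are assumptions, not confirmed from the repository:

```python
# Read the daily_avg_prices mart into a DataFrame; column names
# (coin, price_date, avg_price_usd) are assumed for illustration.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://airflow:airflow@localhost:5432/airflow")

df = pd.read_sql(
    """
    SELECT coin, price_date, avg_price_usd
    FROM daily_avg_prices
    ORDER BY price_date DESC, coin
    """,
    engine,
)
print(df.head())  # most recent daily averages per coin
```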

Example output of the daily_avg_prices model is included as a screenshot in the repository.

Each model has documentation and tests (e.g., not_null, unique).

Tests + Validation

| Tool | What’s Tested |
|---|---|
| dbt tests | No null price_usd; unique timestamp per coin |
| Airflow logs | Full task traceability and retry behavior |
| PostgreSQL | Data validation via psql or a GUI client |
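
For ad-hoc validation outside dbt, the same checks can be mirrored directly against Postgres. Connection settings and column names in this sketch are assumptions:

```python
# Sanity checks mirroring the dbt not_null and unique tests,
# run directly against the raw crypto_prices table.
import psycopg2

conn = psycopg2.connect(
    host="localhost", dbname="airflow", user="airflow", password="airflow"
)
with conn, conn.cursor() as cur:
    # not_null: no missing prices in the raw table
    cur.execute("SELECT COUNT(*) FROM crypto_prices WHERE price_usd IS NULL")
    print("null prices:", cur.fetchone()[0])

    # unique: at most one row per (coin, timestamp)
    cur.execute(
        "SELECT coin, fetched_at, COUNT(*) FROM crypto_prices "
        "GROUP BY coin, fetched_at HAVING COUNT(*) > 1"
    )
    print("duplicate rows:", cur.fetchall())
conn.close()
```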

Potential Extensions

  • Add Metabase or Superset for dashboarding
  • Deploy to AWS MWAA or GCP Cloud Composer
  • Integrate Great Expectations for deep data validation
  • Add GitHub Actions for CI/CD (dbt docs, tests, deploy)

License

Copyright © 2025 Hardy Fenam. All Rights Reserved. Unauthorized use, reproduction, modification, or distribution of any part of this codebase is strictly prohibited.
