A fully containerized modern data pipeline that ingests real-time cryptocurrency prices, loads them into PostgreSQL, transforms the data using dbt, and orchestrates the workflow with Apache Airflow.
This project demonstrates how to build a real-world, production-grade ELT pipeline using a modern data engineering stack. It extracts hourly cryptocurrency price data from the CoinGecko API, loads the raw data into PostgreSQL, and transforms it into analytics-ready models using dbt, all orchestrated by Airflow and running inside Docker containers.
```mermaid
graph LR
    subgraph DAG_crypto_price_etl
        A(run_etl) --> B(run_dbt) --> C(run_dbt_tests)
    end

    A --> D(fetch_crypto.py<br>PostgreSQL: crypto_prices)
    B --> F(dbt models:<br>coin_volatility.sql,<br>price_changes.sql,<br>daily_price_changes.sql)
    C --> I(dbt tests:<br>not_null, unique, etc.)
    F --> J(Dashboard Tool<br>e.g. Superset, Metabase)
    F --> K(Cloud Data Warehouse<br>e.g. Snowflake, BigQuery)
    F --> L(Streaming Layer<br>e.g. Kafka, Spark)
    I --> M(Data Quality Platform<br>e.g. Great Expectations)

    %% Colors:
    %% Airflow tasks: #4a90e2 (blue)
    %% Core data logic (Python script, Postgres, dbt models and tests): #7ed321 (green)
    %% Extensions: #f5a623 (orange)
    style A fill:#4a90e2,stroke:#333,stroke-width:1px,color:#fff
    style B fill:#4a90e2,stroke:#333,stroke-width:1px,color:#fff
    style C fill:#4a90e2,stroke:#333,stroke-width:1px,color:#fff
    style D fill:#7ed321,stroke:#333,stroke-width:1px,color:#000
    style F fill:#7ed321,stroke:#333,stroke-width:1px,color:#000
    style I fill:#7ed321,stroke:#333,stroke-width:1px,color:#000
    style J fill:#f5a623,stroke:#333,stroke-width:1px,color:#000
    style K fill:#f5a623,stroke:#333,stroke-width:1px,color:#000
    style L fill:#f5a623,stroke:#333,stroke-width:1px,color:#000
    style M fill:#f5a623,stroke:#333,stroke-width:1px,color:#000
```
| Color | Category | Includes |
|---|---|---|
| 🟦 Blue | Airflow Tasks | `run_etl`, `run_dbt`, `run_dbt_tests` (scheduled orchestration in Airflow) |
| 🟩 Green | Core Data Logic | `fetch_crypto.py`, the `crypto_prices` table, dbt models (`coin_volatility`, etc.), dbt tests |
| 🟧 Orange | Optional Extensions | Dashboards (e.g. Superset), cloud warehouses, streaming, data quality tools |
| Layer | Tool | Purpose |
|---|---|---|
| Orchestration | Apache Airflow | DAG scheduling, task execution, retries, logging |
| Programming | Python | Extracts and loads data |
| Transformation | dbt (data build tool) | SQL models, testing, documentation |
| Storage | PostgreSQL | Stores both raw and transformed data |
| Infrastructure | Docker + Compose | Containerized local dev environment |
```text
crypto_airflow_project/
├── dags/                  # Airflow DAG definition
│   └── crypto_price_etl.py
├── scripts/               # Python ETL script
│   └── fetch_crypto.py
├── crypto_dbt/            # dbt project (models, sources, tests)
│   ├── models/
│   └── dbt_project.yml
├── docker-compose.yml     # Orchestrates all services
├── Dockerfile             # Custom Airflow image with dbt installed
├── profiles.yml           # dbt connection config
├── .gitignore
└── README.md
```
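The DAG itself can be as small as three chained tasks. Below is a minimal sketch of what `dags/crypto_price_etl.py` might look like, assuming Airflow 2.x, `BashOperator` tasks, and container paths under `/opt/airflow`; the actual commands, paths, and dbt flags in the repo may differ.

```python
# Minimal sketch of dags/crypto_price_etl.py (illustrative; paths and flags are assumptions)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="crypto_price_etl",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@hourly",  # hourly schedule, as described above
    catchup=False,
) as dag:
    # Extract prices from the CoinGecko API and load them into crypto_prices
    run_etl = BashOperator(
        task_id="run_etl",
        bash_command="python /opt/airflow/scripts/fetch_crypto.py",
    )

    # Build the dbt models on top of the raw table
    run_dbt = BashOperator(
        task_id="run_dbt",
        bash_command="dbt run --project-dir /opt/airflow/crypto_dbt --profiles-dir /opt/airflow",
    )

    # Run the dbt tests (not_null, unique, ...) against the built models
    run_dbt_tests = BashOperator(
        task_id="run_dbt_tests",
        bash_command="dbt test --project-dir /opt/airflow/crypto_dbt --profiles-dir /opt/airflow",
    )

    run_etl >> run_dbt >> run_dbt_tests
```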
- Extracts real-time prices for Bitcoin, Ethereum, and Solana
- Loads raw data into a normalized `crypto_prices` table (sketched below)
- dbt transforms raw data into:
  - Clean staging models
  - Aggregated hourly metrics
  - Dashboard-ready marts
- dbt tests ensure data quality (no nulls, unique timestamps)
- Airflow schedules the pipeline every hour, with a UI, logs, and retries
- Fully dockerized, easy to run and deploy
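The extract-and-load step is a single Python script. As a rough sketch of what `scripts/fetch_crypto.py` does (the table columns, connection settings, and error handling here are assumptions, not the script's actual code):

```python
# Rough sketch of scripts/fetch_crypto.py (column names and credentials are assumptions)
from datetime import datetime, timezone

import psycopg2
import requests

COINS = ["bitcoin", "ethereum", "solana"]
API_URL = "https://api.coingecko.com/api/v3/simple/price"


def fetch_prices() -> dict:
    """Fetch current USD prices for the configured coins from the CoinGecko API."""
    resp = requests.get(
        API_URL,
        params={"ids": ",".join(COINS), "vs_currencies": "usd"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"bitcoin": {"usd": 64000.0}, ...}


def load_prices(prices: dict) -> None:
    """Insert one row per coin into the raw crypto_prices table."""
    fetched_at = datetime.now(timezone.utc)
    conn = psycopg2.connect(
        host="postgres", dbname="airflow", user="airflow", password="airflow"  # assumed credentials
    )
    with conn, conn.cursor() as cur:  # commits on success
        for coin in COINS:
            cur.execute(
                "INSERT INTO crypto_prices (coin, price_usd, fetched_at) VALUES (%s, %s, %s)",
                (coin, prices[coin]["usd"], fetched_at),
            )
    conn.close()


if __name__ == "__main__":
    load_prices(fetch_prices())
```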
```bash
git clone https://github.com/yourusername/crypto-airflow-dbt-pipeline.git
cd crypto-airflow-dbt-pipeline
docker-compose up --build
```
Wait for all services to boot (especially the Airflow webserver and scheduler).

Go to: http://localhost:8080

- Username: `admin`
- Password: `admin`
- Enable the `crypto_price_etl` DAG
- Trigger it manually or wait for the `@hourly` schedule

Some of the dbt models you’ll find include:
- `coin_volatility.sql` – standard deviation of price per day
- `price_changes.sql` – % change in price from the previous day/hour
- `daily_avg_prices.sql` – aggregated fact table of daily average crypto prices; this could power a dashboard or weekly insights
Each model has documentation and tests (e.g., `not_null`, `unique`).
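To make the model descriptions concrete, here is one way a `price_changes.sql`-style model could be written with a window function. This is an illustrative sketch only; the dbt source name (`crypto`) and the column names (`fetched_at`, `price_usd`) are assumptions and may not match the project's actual definitions.

```sql
-- Illustrative sketch of a price_changes-style model (source and column names are assumptions)
with ordered as (
    select
        coin,
        fetched_at,
        price_usd,
        lag(price_usd) over (partition by coin order by fetched_at) as prev_price_usd
    from {{ source('crypto', 'crypto_prices') }}
)

select
    coin,
    fetched_at,
    price_usd,
    100.0 * (price_usd - prev_price_usd) / nullif(prev_price_usd, 0) as pct_change
from ordered
where prev_price_usd is not null
```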
| Tool | What's Tested |
|---|---|
| dbt tests | No null `price_usd`; unique timestamp per coin |
| Airflow logs | Full task traceability and retry behavior |
| PostgreSQL | Data validation via `psql` or a GUI client |
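If you would rather script the PostgreSQL check than poke around in `psql` or a GUI, a small helper like the one below mirrors the two dbt tests directly against the raw table. The connection settings and column names here are assumptions.

```python
# Hypothetical helper: re-checks the not_null and unique conditions against crypto_prices
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432, dbname="airflow", user="airflow", password="airflow"  # assumed
)
with conn, conn.cursor() as cur:
    # price_usd should never be null
    cur.execute("SELECT COUNT(*) FROM crypto_prices WHERE price_usd IS NULL")
    print("null price_usd rows:", cur.fetchone()[0])

    # each coin should have at most one row per timestamp
    cur.execute(
        """
        SELECT coin, fetched_at, COUNT(*)
        FROM crypto_prices
        GROUP BY coin, fetched_at
        HAVING COUNT(*) > 1
        """
    )
    print("duplicate (coin, timestamp) pairs:", cur.fetchall())
conn.close()
```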
- Add Metabase or Superset for dashboarding
- Deploy to AWS MWAA or GCP Cloud Composer
- Integrate Great Expectations for deep data validation
- Add GitHub Actions for CI/CD (dbt docs, tests, deploy)
Copyright © 2025 Hardy Fenam. All Rights Reserved. Unauthorized use, reproduction, modification, or distribution of any part of this codebase is strictly prohibited.