This project implements a complete ETL pipeline containerized with Docker. It extracts data from a Source Database via a FastAPI service, transforms it using Dagster assets, and loads it into a Target Database.
- source_db: PostgreSQL (Pre-loaded with sample data).
- target_db: PostgreSQL (Destination for transformed data).
- fastapi: Middleware API that queries source_db.
- dagster: Orchestrator that pulls from fastapi and writes to target_db.
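Conceptually, each Dagster run is an extract-transform-load of API rows into the target database. A minimal sketch of the transform step as a pure function (the column names `ts`, `power_w`, `timestamp`, and `power_kw` are illustrative assumptions, not the project's actual schema):

```python
def transform(rows):
    """Rename API fields and convert units for loading into target_db.

    `rows` is the JSON payload pulled from the FastAPI service; the
    field names here are hypothetical placeholders for this sketch.
    """
    return [
        {"timestamp": r["ts"], "power_kw": r["power_w"] / 1000}
        for r in rows
    ]
```

Keeping the transform a pure function of its input makes it trivially unit-testable outside the containers.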
my-etl-project/
├── api/ # The FastAPI Service
│ ├── api.py
│ ├── Dockerfile
│ └── requirements.txt
├── etl/ # The Dagster Service
│ ├── src/
│ │ ├── extract.py
│ │ ├── load.py
│ │ ├── logging_config.py
│ │ └── transform.py
│   ├── dagster.yaml # (Optional config to set up a persistent Postgres-backed Dagster instance)
│ ├── definitions.py
│ ├── Dockerfile
│ ├── main.py # Python script version of ETL
│ ├── requirements.txt
│   └── wait-for-services.sh # Waits for dependencies so services start in the right order
├── scripts/ # Database init scripts
│ └── create-source.sql
├── .env # Secrets (DB user, pass, etc.)
└── docker-compose.yml
- Docker Desktop (or Docker Engine + Compose plugin)
Create a .env file in the project root:
# Target Database (Postgres) Configuration
TARGET_DB=delfos_target
# Both Databases Credentials
DB_USER=postgres
DB_PASSWORD=postgres
DB_PORT=5432
NOTICE: The database credentials in .env must belong to a superuser.
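The services read these variables from the environment at startup. A minimal sketch of how a connection string could be assembled from them (the `target_db` host name assumes it matches the Compose service name; the helper itself is illustrative, not the project's code):

```python
import os

def build_dsn(host: str, db_key: str = "TARGET_DB") -> str:
    """Compose a Postgres DSN from the .env variables shown above.

    Defaults mirror the sample .env; in the containers, Compose
    injects the real values.
    """
    user = os.environ.get("DB_USER", "postgres")
    password = os.environ.get("DB_PASSWORD", "postgres")
    port = os.environ.get("DB_PORT", "5432")
    db = os.environ.get(db_key, "delfos_target")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```
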
Run the stack using Docker Compose:
docker compose up --build -d
Wait ~15 seconds for the databases to initialize and the API to become healthy.
Open the Dagster UI: http://localhost:3000
- Click Overview or Assets to see your asset graph.
- Click Materialize All (top right) to run the full pipeline.
- Once finished, a green status indicates the data has been loaded into target_db.
You can run the ETL logic manually via a standalone script included in the container. This bypasses the Dagster scheduler and runs immediately.
docker compose exec dagster python main.py 'DD-MM-YYYY'
This executes etl/main.py inside the running environment, using the same logic and connections as the Dagster job.
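Since the script takes its run date as a 'DD-MM-YYYY' argument, it needs to validate that string before querying anything. A minimal sketch of that parsing step (illustrative; the actual main.py may handle it differently):

```python
from datetime import datetime

def parse_run_date(arg: str) -> datetime:
    """Parse the 'DD-MM-YYYY' CLI argument.

    strptime raises ValueError on malformed input, which surfaces
    a clear error before any database work begins.
    """
    return datetime.strptime(arg, "%d-%m-%Y")
```
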
Open your browser to: http://localhost:8000/docs
- You should see the Swagger UI.
- Try the /health endpoint to see the API status (online/offline).
- Try the /data endpoint to confirm it can read from the Source DB.
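The /health response can also be checked programmatically, e.g. from a readiness probe or a test. A minimal sketch, assuming the endpoint returns a JSON body like `{"status": "online"}` (the payload shape is an assumption, not documented API):

```python
import json

def is_healthy(body: bytes) -> bool:
    """Return True if a /health response body reports the API as online.

    Tolerates malformed or unexpected payloads by returning False
    instead of raising.
    """
    try:
        return json.loads(body).get("status") == "online"
    except (ValueError, AttributeError):
        return False
```
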
View Logs: if something isn't working, check the logs for a specific service:
docker compose logs -f dagster
# or
docker compose logs -f fastapi
Check Database Content: to verify data inside the containers without installing local tools:
# Check Source DB
docker compose exec source_db psql -U postgres -d source_db -c "\dt"
# Check Target DB (verify ETL results)
docker compose exec target_db psql -U postgres -d delfos_target -c "SELECT * FROM power_data LIMIT 5;"
Full Reset (fixes most errors): if your database schema changes or startup scripts fail, perform a clean reset:
# Stops containers and DELETES database volumes
docker compose down -v
# Rebuilds and starts fresh
docker compose up --build -d
Permission Denied: ./wait-for-services.sh
If you see a permission error on startup:
- Mac/Linux: Run chmod +x etl/wait-for-services.sh locally and rebuild.
- Windows: Ensure your git client didn't convert line endings to CRLF.
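For reference, the core of what wait-for-services.sh does is simply polling TCP ports until the databases and API accept connections. An illustrative Python equivalent (not the actual script):

```python
import socket
import time

def wait_for(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until `host:port` accepts a TCP connection, or time out.

    Returns True as soon as a connection succeeds, False if the
    deadline passes without one.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```

Waiting on the port rather than the container state matters because a Postgres container reports "running" before the server is actually ready to accept queries.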