Setup Guide

This guide will walk you through setting up your local development environment for the Modern Data Stack Bootcamp.

Table of Contents

  1. Installing Python
  2. Setting Up Virtual Environment
  3. Installing Dependencies
  4. Configuring dbt
  5. Downloading Chinook Database
  6. Running Your First dbt Command
  7. Troubleshooting

Installing Python

Check Your Python Version

First, verify you have Python 3.9 or higher (on macOS and Linux the command may be python3 instead of python):

python --version
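The same check can be scripted, which helps when several interpreters are installed and it is unclear which one a command resolves to. This is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of the bootcamp code.

```python
import sys

MINIMUM = (3, 9)  # minimum version stated in this guide


def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```

Run it with the interpreter you plan to use for the bootcamp, so the result reflects that interpreter rather than whichever one happens to be first on your PATH.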

If you need to install or upgrade Python:

  • macOS: Use Homebrew: brew install python@3.11
  • Windows: Download from python.org
  • Linux: Use your package manager: sudo apt install python3.9

Setting Up Virtual Environment

A virtual environment isolates your project dependencies from the system-wide Python installation.

Create Virtual Environment

# Navigate to project directory
cd modern-data-stack-bootcamp

# Create virtual environment
python -m venv venv

Activate Virtual Environment

macOS/Linux:

source venv/bin/activate

Windows (Command Prompt; in PowerShell, run venv\Scripts\Activate.ps1 instead):

venv\Scripts\activate

You should see (venv) in your terminal prompt.
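If the prompt does not change, or you want to confirm activation from inside Python, you can compare `sys.prefix` with `sys.base_prefix`. The sketch below assumes the environment was created with `python -m venv` as shown above; the function name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

When the environment is active, the printed interpreter path should sit inside the `venv` directory of the project.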

Installing Dependencies

With your virtual environment activated, install the requirements:

pip install -r requirements.txt

This will install:

  • DuckDB
  • dbt-core and dbt-duckdb
  • Great Expectations
  • pandas, matplotlib
  • jupyter
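To confirm that the install succeeded, you can ask Python which versions it sees. This sketch uses only the standard library; the distribution names in `EXPECTED` are assumed from the list above and may differ from what requirements.txt actually pins.

```python
from importlib import metadata

# Distribution names as published on PyPI (assumed to match requirements.txt).
EXPECTED = [
    "duckdb",
    "dbt-core",
    "dbt-duckdb",
    "great-expectations",
    "pandas",
    "matplotlib",
    "jupyter",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(EXPECTED).items():
        print(f"{name:20} {version or 'NOT INSTALLED'}")
```

Any line that reports NOT INSTALLED usually means pip ran outside the virtual environment or the install stopped early.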

Configuring dbt

1. Create dbt Profiles Directory

mkdir -p ~/.dbt

Windows users: Replace ~/.dbt with %USERPROFILE%\.dbt

2. Copy and Configure Profiles

Copy the example profiles file:

cp profiles.yml.example ~/.dbt/profiles.yml

3. Update Profiles File

Edit ~/.dbt/profiles.yml and update the path:

bootcamp:
  outputs:
    dev:
      type: duckdb
      path: '/absolute/path/to/bootcamp.duckdb'  # Update this path
      extensions:
        - sqlite_scanner
  target: dev
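If you are unsure what the absolute path should be, Python can print it for you. The sketch below assumes the database file lives in the project directory and is named bootcamp.duckdb, as in the example above; adjust both if your layout differs. The function name is illustrative.

```python
from pathlib import Path


def duckdb_path_for_profile(project_dir=".", filename="bootcamp.duckdb"):
    """Return the absolute path of `filename` inside `project_dir`."""
    # resolve() makes the path absolute; as_posix() emits forward slashes,
    # which need no escaping in any YAML quoting style.
    return (Path(project_dir).expanduser().resolve() / filename).as_posix()


if __name__ == "__main__":
    # Run from the project directory, then paste the line into profiles.yml.
    print(f"path: '{duckdb_path_for_profile()}'")
```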

Downloading Chinook Database

The Chinook database is a sample SQLite database. You can download it:

# Using curl (macOS/Linux); -L follows GitHub's redirect to the raw file
curl -L -O https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite

# Using wget (Linux)
wget https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite

Or visit: https://github.com/lerocha/chinook-database
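A download can fail quietly, for example by saving an empty file or an HTML page under the .sqlite name. A quick sanity check is to look for the SQLite file header and list the tables. This is a sketch using only the standard library; it assumes the file was saved as Chinook_Sqlite.sqlite in the current directory, and the helper names are illustrative.

```python
import sqlite3
from pathlib import Path

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 file


def is_sqlite_file(path):
    """Return True if `path` exists and starts with the SQLite 3 header."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(16) == SQLITE_MAGIC


def list_tables(path):
    """Return the table names stored in the SQLite file, sorted by name."""
    conn = sqlite3.connect(str(path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = "Chinook_Sqlite.sqlite"
    if is_sqlite_file(db):
        print(list_tables(db))
    else:
        print(f"{db} is missing or is not a SQLite database")
```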

Running Your First dbt Command

Initialize dbt (if needed)

If this is your first dbt project in the directory:

dbt init

(You can skip this if the directory structure already exists.)

Verify Configuration

Run dbt debug to check your setup:

dbt debug

You should see output like:

Running with dbt=1.7.0
dbt version: 1.7.0
python version: 3.11.0
python path: /path/to/python
os info: macOS-13.0.0
Using profiles dir at /Users/you/.dbt
Using profiles.yml file at /Users/you/.dbt/profiles.yml

Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [OK found and valid]

Required dependencies:
 - git [OK found]

Connection:
  Connection: "bootcamp" has been tested.
  
All checks passed!

Troubleshooting

dbt Command Not Found

Problem: dbt: command not found or dbt: No such file or directory

Solution:

  1. Make sure your virtual environment is activated
  2. Reinstall dbt: pip install dbt-core dbt-duckdb
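To see which dbt executable, if any, your shell would run, you can look it up on PATH from Python. This is a small sketch built on the standard library's `shutil.which`; the wrapper name `locate` is illustrative.

```python
import shutil
import sys


def locate(command="dbt"):
    """Return the full path of `command` on PATH, or None if it is not found."""
    return shutil.which(command)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("dbt:", locate("dbt") or "not on PATH")
```

If the dbt path is not inside the project's venv directory, or nothing is found, the virtual environment is probably not active in the current terminal.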

Connection Error

Problem: Catalog Error: Failed to initialize

Solution:

  1. Verify your DuckDB file path in profiles.yml is correct
  2. Make sure the database file exists at that location
  3. Check that you have read and write permissions on the file (dbt writes its models into it)
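The checks above can be run in one pass. The sketch below is a hypothetical helper, not part of dbt or DuckDB: it only inspects the filesystem and does not open the database. The write-permission check is an added assumption, on the grounds that dbt needs to write to the file.

```python
import os
from pathlib import Path


def check_duckdb_path(path):
    """Return a list of human-readable problems found with `path`."""
    path = Path(path).expanduser()
    problems = []
    if not path.is_absolute():
        problems.append(f"{path} is not an absolute path")
    if not path.parent.is_dir():
        problems.append(f"directory {path.parent} does not exist")
    elif not path.exists():
        problems.append(f"{path} does not exist")
    elif not path.is_file():
        problems.append(f"{path} is not a regular file")
    else:
        if not os.access(path, os.R_OK):
            problems.append(f"{path} is not readable by the current user")
        if not os.access(path, os.W_OK):
            problems.append(f"{path} is not writable by the current user")
    return problems


if __name__ == "__main__":
    # Replace with the value of `path` from your profiles.yml.
    for problem in check_duckdb_path("/absolute/path/to/bootcamp.duckdb"):
        print("problem:", problem)
```

An empty result means the path looks plausible from the filesystem's point of view; it does not prove that dbt can connect.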

Import Error for dbt-duckdb

Problem: ImportError: cannot import name 'dbt' from 'dbt.adapters.duckdb'

Solution:

pip uninstall dbt-duckdb
pip install dbt-duckdb

Next Steps

Once setup is complete:

  1. Open the Phase 0 notebook: phase_0_modern_data_stack.ipynb
  2. Follow along with the instructional videos
  3. Complete the TODO exercises
  4. Build your first dbt models!

Getting Help

If you're still stuck:

  1. Check the Troubleshooting Guide
  2. Review common issues in our FAQ
  3. Ask in the course discussion forum

Good luck! 🚀