1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -0,0 +1 @@
+* @dbt-labs/dx
1 change: 1 addition & 0 deletions .gitignore
@@ -14,3 +14,4 @@ profiles.yml

 .ruff_cache
 __pycache__
+dbt_internal_packages/
2 changes: 1 addition & 1 deletion .sqlfluff
@@ -1,5 +1,5 @@
 [sqlfluff]
-templater = dbt
+templater = dbt-cloud
 dialect = snowflake
 runaway_limit = 10
 max_line_length = 80
42 changes: 21 additions & 21 deletions README.md
@@ -90,14 +90,10 @@ You're now ready to start developing with dbt Cloud! Choose a path below (either

 There are a few ways to load the data for the project:

-- **Using the sample data in the repo**. Add `"jaffle-data"` to the `seed-paths` config in your `dbt_project.yml` as below. This means that when dbt is scanning folders for `seeds` to load it will look in both the `seeds` folder as is default, but _also_ the `jaffle-data` folder which contains a sample of the project data. Seeds are static data files in CSV format that dbt will upload, usually for reference models, like US zip codes mapped to country regions for example, but in this case the feature is hacked to do some data ingestion. This is not what seeds are meant to be used for (dbt is not a data loading tool), but it's useful for this project to give you some data to get going with quickly. Run a `dbt seed` and when it's done either delete the `jaffle-data` folder, remove `jaffle-data` from the `seed-paths` list, or ideally, both.
-
-```yaml dbt_project.yml
-seed-paths: ["seeds", "jaffle-data"]
-```
+- **Using the sample data in the repo**. Seeds are static data files in CSV format that dbt will upload, usually for reference models, like US zip codes mapped to country regions for example, but in this case the feature is hacked to do some data ingestion. This is not what seeds are meant to be used for (dbt is not a data loading tool), but it's useful for this project to give you some data to get going with quickly. Run the command below and when it's done either delete the `seeds/jaffle-data` folder, remove `jaffle-data` config from the `dbt_project.yml`, or ideally, both.

 ```bash
-dbt seed
+dbt seed --full-refresh --vars '{"load_source_data": true}'
 ```

 - **Load the data via S3**. If you'd prefer a larger dataset (6 years instead of 1), and are working via the dbt Cloud IDE and your platform's web interface, you can also copy the data from a public S3 bucket to your warehouse into a schema called `raw` in your `jaffle_shop` database. [This is discussed here](#-load-the-data-from-s3).
@@ -108,9 +104,7 @@ dbt seed

 Once your development platform of choice and dependencies are set up, use the following steps to get the project ready for whatever you'd like to do with it.

-1. Ensure that you've deleted the `jaffle-data` folder or removed it from the `seed-paths` list in your `dbt_project.yml` (or, ideally, both) if you used the seed method to load the data. This is important, if you don't do this, `dbt build` will re-run the seeds unnecessarily and things will get messy.
-
-2. Run a `dbt build` to build the project.
+1. Run a `dbt build` to build the project.

 ### 🏁 Checkpoint

@@ -199,12 +193,14 @@ There are two ways to work with a larger dataset than the default one year of da

 To load the data from S3, consult the [dbt Documentation's Quickstart Guides](https://docs.getdbt.com/guides) for your data platform to see how to copy data from an S3 bucket to your warehouse. The S3 bucket URIs of the tables you want to copy into your `raw` schema are:

-- `raw_customers`: `s3://jaffle-shop-raw/raw_customers.csv`
-- `raw_orders`: `s3://jaffle-shop-raw/raw_orders.csv`
-- `raw_order_items`: `s3://jaffle-shop-raw/raw_order_items.csv`
-- `raw_products`: `s3://jaffle-shop-raw/raw_products.csv`
-- `raw_supplies`: `s3://jaffle-shop-raw/raw_supplies.csv`
-- `raw_stores`: `s3://jaffle-shop-raw/raw_stores.csv`
+| table name | S3 URI | Direct Download Link | Schema |
+|-------------------|------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
+| `raw_customers` | `s3://dbt-tutorial-public/long_term_dataset/raw_customers.csv` | [Download](https://dbt-tutorial-public.s3.us-west-2.amazonaws.com/long_term_dataset/raw_customers.csv) | `(id text, name text)` |
+| `raw_orders` | `s3://dbt-tutorial-public/long_term_dataset/raw_orders.csv` | [Download](https://dbt-tutorial-public.s3.us-west-2.amazonaws.com/long_term_dataset/raw_orders.csv) | `(id text, customer text, ordered_at datetime, store_id text, subtotal int, tax_paid int, order_total int)` |
+| `raw_order_items` | `s3://dbt-tutorial-public/long_term_dataset/raw_order_items.csv` | [Download](https://dbt-tutorial-public.s3.us-west-2.amazonaws.com/long_term_dataset/raw_order_items.csv) | `(id text, order_id text, sku text)` |
+| `raw_products` | `s3://dbt-tutorial-public/long_term_dataset/raw_products.csv` | [Download](https://dbt-tutorial-public.s3.us-west-2.amazonaws.com/long_term_dataset/raw_products.csv) | `(sku text, name text, type text, price int, description text)` |
+| `raw_supplies` | `s3://dbt-tutorial-public/long_term_dataset/raw_supplies.csv` | [Download](https://dbt-tutorial-public.s3.us-west-2.amazonaws.com/long_term_dataset/raw_supplies.csv) | `(id text, name text, cost int, perishable boolean, sku text)` |
+| `raw_stores` | `s3://dbt-tutorial-public/long_term_dataset/raw_stores.csv` | [Download](https://dbt-tutorial-public.s3.us-west-2.amazonaws.com/long_term_dataset/raw_stores.csv) | `(id text, name text, opened_at datetime, tax_rate float)` |

 #### 🌱 Generate via `jafgen` and seed the data with dbt Core

@@ -239,24 +235,28 @@ task install DB=[name of warehouse] # e.g. task install DB=bigquery
 > [!NOTE]
 > Because you have an active virtual environment, this new install of `dbt` should take precedence in your [`$PATH`]($PATH`). If you're not familiar with the `PATH` environment variable, just think of this as the order in which your computer looks for commands to run. What's important is that it will look in your active virtual environment first, so when you run `dbt`, it will use the `dbt` you just installed in your virtual environment.

-5. Add `jaffle-data` to your `seed-paths` config in your `dbt-project.yml` as [detailed here](#-load-the-data), then run `jafgen` and `seed` the data it generates.
+5. Run `jafgen` and `seed` the data it generates.

+To generate 6 years of data:
+
 ```bash
-jafgen [number of years to generate] # e.g. jafgen 6
-dbt seed
+jafgen 6
+rm -rf seeds/jaffle-data
+mv jaffle-data seeds
+dbt seed --full-refresh --vars '{"load_source_data": true}'
 ```

 **OR**

 ```bash
-task gen YEARS=[integer of years to generate] # e.g. task gen YEARS=6
+task gen YEARS=6
 task seed
 ```

-6. Remove the `jaffle-data` folder, then uninstall the temporary dbt Core installation. Again, this was to allow you to seed the large data files, you don't need it for the rest of the project which will use the dbt Cloud CLI. You can then delete your `profiles.yml` file and the configuration in your `dbt_project.yml` file. You should also delete the `jaffle-data` path from the `seed-paths` list in your `dbt_project.yml`.
+6. Remove the `jaffle-data` folder, then uninstall the temporary dbt Core installation. Again, this was to allow you to seed the large data files, you don't need it for the rest of the project which will use the dbt Cloud CLI. You can then delete your `profiles.yml` file and the configuration in your `dbt_project.yml` file. You should also delete the `jaffle-data` path from the `seeds:` config in your `dbt_project.yml`.

 ```bash
-rm -rf jaffle-data
+rm -rf seeds/jaffle-data
 python3 -m pip uninstall dbt-core dbt-[your warehouse adapter] # e.g. dbt-bigquery
 ```

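Note on the README change: the new S3 table follows one naming pattern per table, so the URIs and download links can be derived rather than copied by hand. The sketch below is illustrative only and not part of this PR; the bucket, region, and prefix are taken from the table, while the function names (`s3_uri`, `download_url`, `source_locations`) are hypothetical.

```python
# Values copied from the S3 table added to README.md in this PR.
BUCKET = "dbt-tutorial-public"
REGION = "us-west-2"
PREFIX = "long_term_dataset"
TABLES = [
    "raw_customers",
    "raw_orders",
    "raw_order_items",
    "raw_products",
    "raw_supplies",
    "raw_stores",
]


def s3_uri(table):
    """Build the s3:// URI for one raw table (hypothetical helper)."""
    return f"s3://{BUCKET}/{PREFIX}/{table}.csv"


def download_url(table):
    """Build the direct HTTPS download link for one raw table (hypothetical helper)."""
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/{PREFIX}/{table}.csv"


def source_locations(tables=TABLES):
    """Map each table name to its (S3 URI, download link) pair."""
    return {table: (s3_uri(table), download_url(table)) for table in tables}


if __name__ == "__main__":
    for table, (uri, url) in source_locations().items():
        print(f"{table}: {uri} | {url}")
```

The warehouse-specific copy statement is deliberately left out, since it differs per platform and the README defers to the Quickstart Guides for it.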
5 changes: 3 additions & 2 deletions Taskfile.yml
@@ -19,11 +19,12 @@ tasks:

   gen:
     cmds:
-      - source .venv/bin/activate && jafgen {{.YEARS}}
+      - source .venv/bin/activate && jafgen {{.YEARS}} && rm -rf seeds/jaffle-data && mv jaffle-data seeds

   seed:
     cmds:
-      - source .venv/bin/activate && dbt seed
+      - >
+        source .venv/bin/activate && dbt seed --full-refresh --vars '{"load_source_data": true}'

   clean:
     cmds:
6 changes: 4 additions & 2 deletions dbt_project.yml
@@ -15,8 +15,6 @@ test-paths: ["data-tests"]
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]
-
-target-path: "target"
 clean-targets:
   - "target"
   - "dbt_packages"
@@ -27,10 +25,14 @@ vars:
 seeds:
   jaffle_shop:
     +schema: raw
+    jaffle-data:
+      +enabled: "{{ var('load_source_data', false) }}"

 models:
   jaffle_shop:
     staging:
       +materialized: view
     marts:
       +materialized: table
+flags:
+  require_generic_test_arguments_property: true
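Note on the `+enabled` change: the `jaffle-data` seeds are now switched on only when `load_source_data` is passed via `--vars`, and default to off otherwise. The following is a simplified emulation of that lookup, not dbt's implementation; `make_var` and `seed_enabled` are hypothetical names, and the `--vars` payload is parsed as JSON here because the string used in this PR happens to be valid JSON.

```python
import json


def make_var(cli_vars="{}"):
    """Return a var(name, default) lookup over a --vars payload (hypothetical helper)."""
    values = json.loads(cli_vars)

    def var(name, default=None):
        # Fall back to the default when the variable was not supplied,
        # which is the behaviour the config in this PR relies on.
        return values.get(name, default)

    return var


def seed_enabled(cli_vars="{}"):
    """Emulate +enabled: "{{ var('load_source_data', false) }}" for the jaffle-data seeds."""
    var = make_var(cli_vars)
    return bool(var("load_source_data", False))


if __name__ == "__main__":
    # Plain `dbt seed` or `dbt build`: no vars supplied, seeds stay disabled.
    print(seed_enabled())
    # `dbt seed --full-refresh --vars '{"load_source_data": true}'`: seeds are loaded.
    print(seed_enabled('{"load_source_data": true}'))
```

This is why the README no longer asks you to delete the folder before running `dbt build`: without the variable, the sample seeds are skipped.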
6 changes: 4 additions & 2 deletions models/marts/customers.yml
@@ -3,7 +3,8 @@ models:
     description: Customer overview data mart, offering key details for each unique customer. One row per customer.
     data_tests:
       - dbt_utils.expression_is_true:
-          expression: "lifetime_spend_pretax + lifetime_tax_paid = lifetime_spend"
+          arguments:
+            expression: "lifetime_spend_pretax + lifetime_tax_paid = lifetime_spend"
     columns:
       - name: customer_id
         description: The unique key of the orders mart.
@@ -28,7 +29,8 @@ models:
         description: Options are 'new' or 'returning', indicating if a customer has ordered more than once or has only placed their first order to date.
         data_tests:
           - accepted_values:
-              values: ["new", "returning"]
+              arguments:
+                values: ["new", "returning"]

 semantic_models:
   - name: customers
6 changes: 3 additions & 3 deletions models/marts/order_items.yml
@@ -8,9 +8,9 @@ models:
       - name: order_id
         data_tests:
           - relationships:
-              to: ref('orders')
-              field: order_id
-
+              arguments:
+                to: ref('orders')
+                field: order_id
 unit_tests:
   - name: test_supply_costs_sum_correctly
     description: "Test that the counts of drinks and food orders convert to booleans properly."
11 changes: 7 additions & 4 deletions models/marts/orders.yml
@@ -3,9 +3,11 @@ models:
     description: Order overview data mart, offering key details for each order inlcluding if it's a customer's first order and a food vs. drink item breakdown. One row per order.
     data_tests:
       - dbt_utils.expression_is_true:
-          expression: "order_items_subtotal = subtotal"
+          arguments:
+            expression: "order_items_subtotal = subtotal"
       - dbt_utils.expression_is_true:
-          expression: "order_total = subtotal + tax_paid"
+          arguments:
+            expression: "order_total = subtotal + tax_paid"
     columns:
       - name: order_id
         description: The unique key of the orders mart.
@@ -16,8 +18,9 @@ models:
         description: The foreign key relating to the customer who placed the order.
         data_tests:
           - relationships:
-              to: ref('stg_customers')
-              field: customer_id
+              arguments:
+                to: ref('stg_customers')
+                field: customer_id
       - name: order_total
         description: The total amount of the order in USD including tax.
       - name: ordered_at
10 changes: 4 additions & 6 deletions models/staging/__sources.yml
@@ -4,20 +4,18 @@ sources:
   - name: ecom
     schema: raw
     description: E-commerce data for the Jaffle Shop
-    freshness:
-      warn_after:
-        count: 24
-        period: hour
     tables:
       - name: raw_customers
         description: One record per person who has purchased one or more items
       - name: raw_orders
         description: One record per order (consisting of one or more order items)
-        loaded_at_field: ordered_at
+        config:
+          loaded_at_field: ordered_at
       - name: raw_items
         description: Items included in an order
       - name: raw_stores
-        loaded_at_field: opened_at
+        config:
+          loaded_at_field: opened_at
       - name: raw_products
         description: One record per SKU for items sold in stores
       - name: raw_supplies
5 changes: 3 additions & 2 deletions models/staging/stg_order_items.yml
@@ -12,5 +12,6 @@ models:
         data_tests:
           - not_null
           - relationships:
-              to: ref('stg_orders')
-              field: order_id
+              arguments:
+                to: ref('stg_orders')
+                field: order_id
3 changes: 2 additions & 1 deletion models/staging/stg_orders.yml
@@ -3,7 +3,8 @@ models:
     description: Order data with basic cleaning and transformation applied, one row per order.
     data_tests:
       - dbt_utils.expression_is_true:
-          expression: "order_total - tax_paid = subtotal"
+          arguments:
+            expression: "order_total - tax_paid = subtotal"
     columns:
       - name: order_id
         description: The unique key for each order.
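Note on the `data_tests` changes: every YAML edit above applies the same transformation, moving a generic test's keyword arguments under an `arguments:` key, which the new `require_generic_test_arguments_property` flag in `dbt_project.yml` goes with. A minimal sketch of that rewrite on already-parsed test entries follows. It is not a dbt utility; `nest_arguments` is a hypothetical name, and the set of keys treated as test properties rather than arguments (`RESERVED_KEYS`) is an assumption to check against the dbt documentation.

```python
from copy import deepcopy

# Assumption: these keys are test properties and stay at the top level;
# everything else is treated as a test argument.
RESERVED_KEYS = {"name", "description", "config", "arguments"}


def nest_arguments(test):
    """Move a generic test's keyword arguments under `arguments` (hypothetical helper)."""
    if isinstance(test, str):
        # Bare tests such as `not_null` take no arguments and are left alone.
        return test
    migrated = {}
    for test_name, body in test.items():
        body = deepcopy(body) if isinstance(body, dict) else {}
        kept = {k: v for k, v in body.items() if k in RESERVED_KEYS}
        moved = {k: v for k, v in body.items() if k not in RESERVED_KEYS}
        if moved:
            # Merge with any arguments that were already nested.
            arguments = dict(kept.get("arguments", {}))
            arguments.update(moved)
            kept["arguments"] = arguments
        migrated[test_name] = kept
    return migrated


if __name__ == "__main__":
    old = {"relationships": {"to": "ref('stg_orders')", "field": "order_id"}}
    print(nest_arguments(old))
```

Running the function on an entry that is already migrated returns it unchanged, so it can be applied to a whole file without tracking which tests were edited by hand.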
19 changes: 12 additions & 7 deletions package-lock.yml
@@ -1,8 +1,13 @@
 packages:
-  - package: dbt-labs/dbt_utils
-    version: 1.1.1
-  - package: calogica/dbt_date
-    version: 0.10.0
-  - package: dbt-labs/audit_helper
-    version: 0.12.0
-sha1_hash: 974021e878d1894c35a21fb44fb0b6bd04f07078
+  - git: https://github.com/dbt-labs/dbt-audit-helper.git
+    name: audit_helper
+    revision: |
+      014af559efd9dd1fa6551cb348fcc6d4def9a1e7
+    __unrendered__: {}
+  - package: godatadriven/dbt_date
+    name: dbt_date
+    version: 0.10.0
+  - package: dbt-labs/dbt_utils
+    name: dbt_utils
+    version: 1.1.1
+sha1_hash: dd20c07a28bff7dd8db16c5a8a3355aec8fc89a9
2 changes: 1 addition & 1 deletion packages.yml
@@ -1,7 +1,7 @@
 packages:
   - package: dbt-labs/dbt_utils
     version: 1.1.1
-  - package: calogica/dbt_date
+  - package: godatadriven/dbt_date
     version: 0.10.0
   - git: "https://github.com/dbt-labs/dbt-audit-helper.git"
     revision: main
File renamed without changes.
File renamed without changes.