- Set up new Mage project
- Set up DBT project
- Create standard (batch) pipeline
- Create DBT profile for database connections
- Add data loader block to pipeline
- Add DBT model block to pipeline
- Add test for DBT model
- Execute pipeline end-to-end
Read the setup guide to initialize a new project and start the Mage tool locally.
For the rest of this tutorial, we’ll use the project name `demo_project`.
- Open Mage and go to the terminal page: http://localhost:6789/terminal
- Add `dbt-postgres` to your project’s dependencies file (`requirements.txt`) by typing the following into the terminal in your browser:

  ```bash
  echo dbt-postgres > demo_project/requirements.txt
  ```
- Install your project’s dependencies using `pip` by typing the following:

  ```bash
  pip install -r demo_project/requirements.txt
  ```

For more information on installing DBT, read their documentation.
- Open Mage and go to the terminal page: http://localhost:6789/terminal
- Initialize your DBT project using the `init` command (for this tutorial, we’ll use the DBT project name `demo`):

  ```bash
  cd demo_project/dbt
  dbt init demo
  touch demo/profiles.yml
  ```

For more information on creating a DBT project, read their documentation.
- Go to the Mage dashboard, click the `+ New pipeline` button, and select the option labeled `Standard (batch)`.
- Near the top of the page, click the pipeline name and change it to `dbt demo pipeline`.
- On the left side of the page in the file browser, expand the folder `demo_project/dbt/demo/`.
- Click the file named `profiles.yml`.
- Paste the following credentials in that file:

  ```yaml
  demo:
    target: dev
    outputs:
      dev:
        dbname: mage/demo2
        host: db.bit.io
        password: v2_3upzD_eMSdiu5AMjgzSbi3K7KTAuE
        port: 5432
        schema: dbt_demo
        type: postgres
        user: mage
  ```
- Save the `profiles.yml` file by pressing `Command (⌘)` + `S`.
- Close the file by pressing the `X` button on the right side of the file name `dbt/demo/profiles.yml`.
- Click the `+ Data loader` button, select `Python`, then click `API`.
- At the top of the block, on the right of `DATA LOADER`, click the name of the block.
- Change the name to `load data`.
- Paste the following code in that block:

  ```python
  import io

  import pandas as pd
  import requests
  from pandas import DataFrame


  @data_loader
  def load_data_from_api(**kwargs) -> DataFrame:
      url = 'https://raw.githubusercontent.com/mage-ai/datasets/master/restaurant_user_transactions.csv'
      response = requests.get(url)
      return pd.read_csv(io.StringIO(response.text), sep=',')
  ```
- Under the data loader block you just added, click the `DBT model` button, then click the option `Single model`.
- In the file browser that pops up, click the file named `my_second_dbt_model.sql` under the folder `demo/models/example/`.
  - This will add 2 DBT blocks to your pipeline: 1 for the DBT model named `my_first_dbt_model` and the 2nd for the DBT model named `my_second_dbt_model`.
  - The model named `my_first_dbt_model` was added to the pipeline because `my_second_dbt_model` references it.
- In the DBT block named `my_first_dbt_model`, enter `dev` in the input field next to the label `DBT profile target` at the top.
- Paste the following SQL into the DBT model named `my_first_dbt_model`:

  ```sql
  WITH source_data AS (
      SELECT 1 AS id
      UNION ALL
      SELECT 2 AS id
  )

  SELECT *
  FROM source_data
  ```

- Run the DBT model block by pressing the play button on the top right of the block or by pressing `Command` + `Enter`.
- You should see a preview of the query results.
- In the DBT block named `my_second_dbt_model`, enter `dev` in the input field next to the label `DBT profile target` at the top.
- Paste the following SQL into the DBT model named `my_second_dbt_model`:

  ```sql
  SELECT
      a.*
      , b.*
  FROM {{ ref('my_first_dbt_model') }} AS a
  LEFT JOIN {{ source('mage_demo', 'dbt_demo_pipeline_load_data') }} AS b
  ON 1 = 1
  WHERE a.id = 1
  ```

  When a DBT model depends on an upstream block that isn’t a DBT model, a source for that block is automatically added to the `demo_project/dbt/demo/models/example/mage_sources.yml` file. Read more about DBT sources in their documentation.
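As a rough sketch of what that auto-generated sources file might contain (the exact source, schema, and table names Mage generates may differ; this is inferred from the `source('mage_demo', 'dbt_demo_pipeline_load_data')` call above, not copied from a real file):

```yaml
version: 2

sources:
  - name: mage_demo
    schema: mage_demo
    tables:
      - name: dbt_demo_pipeline_load_data
```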
- Run the DBT model block by pressing the play button on the top right of the block or by pressing `Command` + `Enter`.
- You should see a preview of the query results.
- On the right side of the screen, click the tab labeled `Terminal`.
- Create a new DBT test file by running the following command:

  ```bash
  touch demo_project/dbt/demo/tests/test_my_second_dbt_model.sql
  ```

- On the left side of the page in the file browser, expand the folder `demo_project/dbt/demo/tests/` and click the file named `test_my_second_dbt_model.sql`. If you don’t see it, refresh the page.
- Paste the following SQL in the file:

  ```sql
  SELECT id
  FROM {{ ref('my_second_dbt_model') }}
  GROUP BY id
  HAVING (id = 0)
  ```

- Read more about DBT tests in their documentation.
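A singular DBT test like the one above fails when its query returns one or more rows. Here is a minimal Python sketch of that logic, using hypothetical rows (the model’s real output lives in the warehouse):

```python
# Hypothetical output of my_second_dbt_model: every id is 1,
# because the model filters on WHERE a.id = 1.
rows = [{'id': 1}, {'id': 1}, {'id': 1}]

# Mirror of: SELECT id FROM ... GROUP BY id HAVING (id = 0)
distinct_ids = sorted({row['id'] for row in rows})
failing_rows = [i for i in distinct_ids if i == 0]

# A singular test passes when its query returns zero rows.
print(len(failing_rows))  # 0
```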
- Click the name of the pipeline in the header breadcrumbs to go back to the detail page.
- Create a new trigger (you can use any interval you want for this tutorial). For more details, follow these steps.
- After your trigger is created, click the `Start trigger` button at the top of the page.
- The pipeline will eventually fail because a DBT test failed. This means everything is working as expected.
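The failing test is the starter `unique` test on `id` in `schema.yml` (an inference from the SQL above, not something the pipeline output states): the `ON 1 = 1` join pairs the single surviving row of `my_first_dbt_model` with every row of the source table, duplicating `id`. A quick sketch with hypothetical source rows:

```python
# my_first_dbt_model after WHERE a.id = 1 leaves a single id.
model_ids = [1]
# Hypothetical rows loaded by the `load data` block.
source_rows = ['row_a', 'row_b', 'row_c']

# ON 1 = 1 is effectively a cross join: every model row
# pairs with every source row.
joined_ids = [m for m in model_ids for _ in source_rows]

print(joined_ids)                               # [1, 1, 1]
print(len(joined_ids) == len(set(joined_ids)))  # False -> `unique` fails
```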
- Open the file `demo_project/dbt/demo/models/example/schema.yml` and remove the tests named `unique` under both models. Your file should look like this:

  ```yaml
  version: 2

  models:
    - name: my_first_dbt_model
      description: "A starter dbt model"
      columns:
        - name: id
          description: "The primary key for this table"
          tests:
            - not_null

    - name: my_second_dbt_model
      description: "A starter dbt model"
      columns:
        - name: id
          description: "The primary key for this table"
          tests:
            - not_null
  ```
- Click the `Failed` button next to the pipeline run, then click `Retry run`. It should complete successfully after a few minutes.
Congratulations! You’ve created a data pipeline that orchestrates your DBT models.
If you get stuck, run into problems, or just want someone to walk you through these steps, please join our Slack and someone will help you ASAP.