Extended Audit Helper solution 💪
This repository provides a collection of powerful macros designed to enhance data validation workflows that support:
- Historical Logging: Automatically saving detailed validation results into a designated DWH table for comprehensive audit tracking
- Latest Summary Reporting: Maintaining a concise, up-to-date summary table for quick insights into the current state of validations
- Codegen and Scripts: Simplifying workflows, particularly valuable for migration projects by automating repetitive tasks
Data Warehouses:
- ❄️ Snowflake (default)
- ☁️ BigQuery
-
Add to
packages.yml
file:packages: - package: infinitelambda/audit_helper_ext version: [">=0.1.0", "<1.0.0"] # keep an eye on the latest version, and change it accordingly
Or use the latest version from git:
packages: - git: "https://github.com/infinitelambda/dbt-audit-helper-ext" revision: <release version or tag> # 0.1.0
And run
dbt deps
to install the package! -
Initialize the resources:
dbt deps dbt run -s audit_helper_ext
This step will create log table (
validation_log
) and the summary view on top (validation_log_report
) -
Generate the validation macros:
Check
/scripts
directory for all the codegen utilitiesFirstly, we need to determine the location (database and schema) of the source tables:
** If all source tables are in the same location, we can use the environment variable to set these values:
export SOURCE_SCHEMA=MY_SOURCE_SCHEMA export SOURCE_DATABASE=MY_SOURCE_DATABASE
** If having multiple locations, we can start to configure the location inside each dbt models'
config
block:{{ config( ... audit_helper__source_database = 'MY_SOURCE_SCHEMA', audit_helper__source_schema = 'MY_SOURCE_DATABASE' ) }} ...
Then, we can start generating the validation macro files now. Let's say we need to validate all models in
03_mart
directory:python dbt_packages/audit_helper_ext/scripts/create_validation_macros.py models/03_mart
Or just aim to validation a specific model which is
03_mart/dim_sales
:python dbt_packages/audit_helper_ext/scripts/create_validation_macros.py \ models/03_mart \ dim_sales
Finally, check out your dbt project at the directory named
macros/validation
!
This repo contains the useful macros to support for saving the historical validation results into the DWH table (validation_log
), together with the latest summary table (validation_log_report
).
There are 3 main types of validation:
Additionally, we have the 4th type - upstream_row_count
(source) which will be very useful to understand better the validtion context, for example, the result might be up to 100% matched rate but there is 0 updates in the upstream models, hence there no updates in the final table, that means we can't not say surely it was a perfect match.
Depending on projects, it might be vary in the strategy of validation. Therefore, in this package, we're suggesting 1 first approach that we've used successfully in the real-life migration project (Informatica to dbt).
Context: Our dbt project has 3 layers (staging, intermediate, and mart). Each mart model will have the independant set of upstream models, or it is the isolated pipeline for each mart model. We want to validate mart models only.
Goal: 100% matched rate ✅, >=99% is still good 🟡, and below 99% is unacceptable ❌
Pre-requisites: 2 consecutive snapshots (e.g. Day1, Day2) of both source data and mart tables
Flow:
- Freeze the source data, so we have
source__YYYYMMD1
andsource__YYYYMMD2
,mart__YYYYMMD1
andmart__YYYYMMD2
- Scenario 1: Validate the fresh run against D1
- Configure source yml to use
source__YYYYMMD1
- Run dbt to build mart tables, callled
mart_dbt
- Run validation macros to compare between
mart_dbt
vsmart__YYYYMMD1
👍
- Configure source yml to use
- Scenario 2: Validate the incremental run against D2 based on D1
- Configure source yml to use
source__YYYYMMD2
- Clone
mart__YYYYMMD1
tomart_dbt
to mimic that dbt should have the D1 data already (e.g. clone_relation) - Run incrementally dbt to build mart tables
- Run validation macros to compare between
mart_dbt
vsmart__YYYYMMD2
👍👍
- Configure source yml to use
Finnally, check the validation log report, and decide what to do next steps:
🛩️ Sample report table on Snowflake:
💡 Optionally, let's build the Sheet to communicate the outcome with clent, here is the BigQuery+GGSheet sample:
dbt-audit-helper-ext
is an open-source dbt package. Whether you are a seasoned open-source contributor or a first-time committer, we welcome and encourage you to contribute code, documentation, ideas, or problem statements to this project.
👉 See CONTRIBUTING guideline
🌟 And finally, kudos to our beloved OG Contributors who orginally developed the macros and scripts in this package: @William, @Duc, @Csabi, @Adrien & @Dat
Infinite Lambda is a cloud and data consultancy. We build strategies, help organizations implement them, and pass on the expertise to look after the infrastructure.
We are an Elite Snowflake Partner, a Platinum dbt Partner, and a two-time Fivetran Innovation Partner of the Year for EMEA.
Naturally, we love exploring innovative solutions and sharing knowledge, so go ahead and:
🔧 Take a look around our Git
✏️ Browse our tech blog
We are also chatty, so:
👀 Follow us on LinkedIn
👋🏼 Or just get in touch