Commit b4e4111: Initial commit

Author: Hugo Rialan
Parents: 0

17 files changed (+557 lines, -0 lines)

.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
* @hrialan
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
name: Publish package to GitHub Packages
on:
  release:
    types: [published]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20.x'
          registry-url: 'https://npm.pkg.github.com'
          scope: '@hrialan'
      - run: npm ci
      - run: npm publish
        env:
          NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@

.df-credentials.json
node_modules/

.pre-commit-config.yaml

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: fix-byte-order-marker
      - id: check-merge-conflict
      - id: check-json
      - id: check-yaml
      - id: check-added-large-files
      - id: no-commit-to-branch
        args: [--branch, main]
  - repo: https://github.com/devoteamgcloud/pre-commit-dataform
    rev: v1.0.2
    hooks:
      - id: dataform_format

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Devoteam G Cloud

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Dataform Assertions

This Dataform package provides a set of assertions for testing the data in your warehouse. It includes assertions for data freshness, unique keys, row conditions, and data completeness.

Contributions are welcome! If you have an idea for a new assertion, please open an issue or submit a pull request.

## Contents

- [Installation](#installation)
- [Usage](#usage)
- [Available assertions](#available-assertions)
- [License](#license)

## Installation

Follow the instructions in the [Dataform documentation](https://cloud.google.com/dataform/docs/install-package) to install this package. Here is a quick summary:

1. In the `dependencies` of your `package.json` file, add the following line, replacing `[RELEASE_VERSION]` with the release tag you want to use:
   ```json
   "dataform-assertions": "https://github.com/devoteamgcloud/dataform-assertions/archive/refs/tags/[RELEASE_VERSION].tar.gz"
   ```
2. Click `Install packages` in the Dataform web UI, or run the `dataform install` CLI command in a terminal.
3. You are ready to go!

## Usage

Create a JavaScript file in the `definitions/` folder of your Dataform project and add the following code with the desired parameters:

```javascript
const commonAssertions = require("dataform-assertions");

const commonAssertionsResult = commonAssertions({
  globalAssertionsParams: {
    // If not provided, the default values will be used
    "database": "your-database",
    "schema": "your-schema",
    "location": "your-location",
    "tags": ["your-tags"],
    "disabledInEnvs": ["your-disabled-environments"]
  },
  rowConditions: {
    "your-table": {
      "your-condition": "your-SQL-condition"
    }
  }
});
```

You can find a more complete example in [`definitions/example.js`](./definitions/example.js).


## Available assertions

This package includes the following types of assertions:

- **Row conditions**: Check that every row in a table satisfies a given SQL condition.
- **Unique key conditions**: Check that a given key (a single column or a set of columns) is not duplicated in a table.
- **Data freshness conditions**: Check that the data in a table is recent enough, given a date column, a time unit, and a maximum delay (see `dataFreshnessConditions` in the example).
- **Data completeness conditions**: Check that a column has less than a given percentage of null values.

## Supported warehouses

This package has been tested with BigQuery only; it has not been tested with other warehouses.

## License

This project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for details.
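Dataform runs an assertion as a query and treats it as failed if that query returns any rows. The package's `index.js` is not part of this view, so the following is only a minimal sketch of how the row-condition and unique-key checks described in the README could be expressed as such queries. `rowConditionQuery` and `uniqueKeyQuery` are hypothetical names, not the package's API.

```javascript
// Hypothetical helpers (not the package API): each returns a BigQuery SQL
// string selecting the offending rows, so an empty result means "pass".

// Row condition: report every row for which the condition is FALSE.
// Rows where the condition evaluates to NULL are NOT reported, which is why
// NULL checks such as "id IS NOT NULL" are declared as separate conditions.
function rowConditionQuery(table, conditionName, sqlCondition) {
  return [
    `SELECT '${conditionName}' AS failing_row_condition, *`,
    `FROM ${table}`,
    `WHERE NOT (${sqlCondition})`,
  ].join("\n");
}

// Unique key: report every key value (single column or set of columns)
// that appears more than once.
function uniqueKeyQuery(table, keyColumns) {
  const keys = keyColumns.join(", ");
  return [
    `SELECT ${keys}, COUNT(*) AS row_count`,
    `FROM ${table}`,
    `GROUP BY ${keys}`,
    `HAVING COUNT(*) > 1`,
  ].join("\n");
}

console.log(rowConditionQuery("first_table", "id_strict_positive", "id > 0"));
console.log(uniqueKeyQuery("second_table", ["id"]));
```

Because `GROUP BY` puts all NULL keys into a single group, this sketch would report the four NULL `id` values of `second_table` as one duplicated key; whether the package itself does so depends on its actual implementation.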

dataform.json

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
{
  "warehouse": "bigquery",
  "defaultSchema": "dataform",
  "assertionSchema": "dataform_assertions",
  "defaultDatabase": "sandbox-hrialan",
  "defaultLocation": "EU",
  "vars": {
    "env": "dv"
  }
}
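`dataform.json` defines a project variable `env` (here `"dv"`), which `definitions/example.js` reads as `dataform.projectConfig.vars.env` to build the assertion schema name, and whose values the commented-out `disabledInEnvs` option lists. A minimal sketch of such an environment gate, assuming assertions are simply skipped when the current environment is listed; `isAssertionEnabled` is a hypothetical helper, not part of the package, and `"prod"` is an illustrative environment name.

```javascript
// Hypothetical gate (not the package API): assertions are enabled unless the
// current environment appears in disabledInEnvs.
function isAssertionEnabled(currentEnv, disabledInEnvs = []) {
  return !disabledInEnvs.includes(currentEnv);
}

// Stand-in for dataform.projectConfig.vars, using the values from dataform.json.
const projectVars = { env: "dv" };

console.log(isAssertionEnabled(projectVars.env, ["dv", "qa"])); // false
console.log(isAssertionEnabled(projectVars.env)); // true
// "prod" is an illustrative environment name, not one defined in this project.
console.log(isAssertionEnabled("prod", ["dv", "qa"])); // true
```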

definitions/example.js

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
const commonAssertions = require("../index");

const commonAssertionsResult = commonAssertions({
  globalAssertionsParams: {
    "database": "sandbox-hrialan",
    "schema": "assertions_" + dataform.projectConfig.vars.env,
    "location": "EU",
    "tags": ["assertions"],
    // Data quality may be poor in some environments (e.g. development),
    // so the assertions can be disabled in those environments.
    // "disabledInEnvs": ["dv", "qa"]
  },
  rowConditions: {
    "first_table": {
      "id_not_null": "id IS NOT NULL",
      "id_strict_positive": "id > 0"
    },
    "second_table": {
      "id_not_null": "id IS NOT NULL"
    }
  },
  uniqueKeyConditions: {
    "first_table": ["id"],
    "second_table": ["id"]
  },
  dataFreshnessConditions: {
    "first_table": {
      "dateColumn": "updated_date",
      "timeUnit": "DAY",
      "delayCondition": 1,
    },
    "second_table": {
      "dateColumn": "updated_date",
      "timeUnit": "MONTH",
      "delayCondition": 3,
    }
  },
  dataCompletenessConditions: {
    "first_table": {
      // Format: "column": allowedPercentageNull
      "updated_date": 1, // 1% of null values allowed in the updated_date column
      "id": 20
    },
    "second_table": {
      "id": 30
    }
  }
});

/*
 * ASSERTIONS AUDIT TABLE EXAMPLE
 * The following code snippet publishes the list of created assertions (name and type) to a table for audit purposes.
 * The result is a table with the following columns:
 * | assertion_name | assertion_type |
 * |----------------|----------------|
 * | id_not_null    | row_condition  |
 * | ...            | ...            |
 */

const selectClauses = [];

for (const key in commonAssertionsResult) {
  if (commonAssertionsResult.hasOwnProperty(key)) {
    const commonAssertionsResultForKey = commonAssertionsResult[key];
    if (commonAssertionsResultForKey.length > 0) {
      const selectClause = commonAssertionsResultForKey.map(assertion => {
        return `SELECT "${assertion.proto.target.name}" AS assertion_name, '${key}' AS assertion_type`;
      }).join("\n UNION ALL \n");

      selectClauses.push(selectClause);
    }
  }
}

const sqlQuery = selectClauses.join("\n UNION ALL \n");

publish("assertions_audit", {
  type: "table"
}).query(
  (ctx) => sqlQuery
);
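In `definitions/example.js`, `sqlQuery` is an empty string when no assertion is configured, which would give the `assertions_audit` table an invalid query. A minimal sketch of the same audit loop as a pure, testable function with a guard for that case. `buildAuditQuery` is a hypothetical name, and the input shape (type key mapped to an array of objects exposing `proto.target.name`) is assumed from what the example loop relies on; the sample keys and names below are made up.

```javascript
// Hypothetical pure variant of the audit loop in definitions/example.js.
// `result` is assumed to map an assertion type key to an array of assertion
// objects exposing `proto.target.name`, as that loop relies on.
function buildAuditQuery(result) {
  const selects = [];
  // Object.entries visits own enumerable keys, like for...in + hasOwnProperty.
  for (const [assertionType, assertions] of Object.entries(result)) {
    for (const assertion of assertions) {
      selects.push(
        `SELECT '${assertion.proto.target.name}' AS assertion_name, ` +
          `'${assertionType}' AS assertion_type`
      );
    }
  }
  // A UNION ALL of zero SELECTs is not valid SQL: return a typed, empty result.
  if (selects.length === 0) {
    return "SELECT CAST(NULL AS STRING) AS assertion_name, " +
      "CAST(NULL AS STRING) AS assertion_type LIMIT 0";
  }
  return selects.join("\nUNION ALL\n");
}

// Illustrative input; the key names and assertion name are made up.
const fakeResult = {
  rowConditions: [{ proto: { target: { name: "first_table_id_not_null" } } }],
  uniqueKeyConditions: [],
};
console.log(buildAuditQuery(fakeResult));
console.log(buildAuditQuery({}));
```

In a Dataform file, the function could be wired in as `publish("assertions_audit", { type: "table" }).query(() => buildAuditQuery(commonAssertionsResult))`.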

definitions/first_table.sqlx

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
config {
  type: "table"
}

SELECT
  1 AS id,
  CURRENT_DATE() AS updated_date
UNION ALL
SELECT
  2 AS id,
  CURRENT_DATE() AS updated_date
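`first_table` stamps every row with `CURRENT_DATE()`, while `definitions/example.js` requires it to be at most 1 `DAY` old. A minimal sketch of a freshness query, assuming `dateColumn` is a `DATE` column and that the check fails when the gap between today and the latest date exceeds `delayCondition` units of `timeUnit`. Under that assumption `first_table` passes, and `second_table` (latest date 1970-01-01, limit 3 `MONTH`) fails. `freshnessQuery` is hypothetical and not necessarily how the package implements the check.

```javascript
// Hypothetical helper (not the package API). Assumes `dateColumn` is a DATE
// column; BigQuery's DATE_DIFF(a, b, part) returns a - b in `part` units.
// The query returns one row (a failure) only when the data is too old.
function freshnessQuery(table, { dateColumn, timeUnit, delayCondition }) {
  return [
    "SELECT *",
    "FROM (",
    `  SELECT DATE_DIFF(CURRENT_DATE(), MAX(${dateColumn}), ${timeUnit}) AS delay`,
    `  FROM ${table}`,
    ")",
    `WHERE delay > ${delayCondition}`,
  ].join("\n");
}

console.log(
  freshnessQuery("first_table", {
    dateColumn: "updated_date",
    timeUnit: "DAY",
    delayCondition: 1,
  })
);
```

One design caveat of this sketch: on an empty table `MAX(...)` is NULL, so `delay` is NULL, no row is returned, and the check passes silently.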

definitions/second_table.sqlx

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
config {
  type: "table"
}

SELECT
  NULL AS id,
  DATE(1970, 1, 1) AS updated_date
UNION ALL
SELECT
  2 AS id,
  DATE(1970, 1, 1) AS updated_date
UNION ALL
SELECT
  NULL AS id,
  DATE(1970, 1, 1) AS updated_date
UNION ALL
SELECT
  NULL AS id,
  DATE(1970, 1, 1) AS updated_date
UNION ALL
SELECT
  3 AS id,
  NULL AS updated_date
UNION ALL
SELECT
  NULL AS id,
  DATE(1970, 1, 1) AS updated_date
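`second_table` is deliberately dirty: four of its six rows have a NULL `id`. For the completeness rule configured in `definitions/example.js` (`"id": 30`), the NULL share is 4 / 6 ≈ 66.7%, above the 30% allowance, so the check should fail. The sketch below reproduces that arithmetic in JavaScript on an in-memory copy of the rows; in BigQuery the same figure could be computed as `100 * COUNTIF(id IS NULL) / COUNT(*)`. `nullPercentage` and `completenessFailures` are hypothetical helpers, and treating a share exactly equal to the allowance as passing is an assumption.

```javascript
// In-memory copy of the six rows produced by definitions/second_table.sqlx
// (dates kept as strings for simplicity).
const secondTable = [
  { id: null, updated_date: "1970-01-01" },
  { id: 2, updated_date: "1970-01-01" },
  { id: null, updated_date: "1970-01-01" },
  { id: null, updated_date: "1970-01-01" },
  { id: 3, updated_date: null },
  { id: null, updated_date: "1970-01-01" },
];

// Percentage of NULL values in a column (an empty table counts as 0%).
function nullPercentage(rows, column) {
  if (rows.length === 0) return 0;
  const nulls = rows.filter((row) => row[column] === null).length;
  return (100 * nulls) / rows.length;
}

// Hypothetical check: returns the columns whose NULL share exceeds the
// allowed percentage (a share equal to the allowance is assumed to pass).
function completenessFailures(rows, allowedPercentages) {
  return Object.entries(allowedPercentages)
    .filter(([column, maxPct]) => nullPercentage(rows, column) > maxPct)
    .map(([column]) => column);
}

console.log(nullPercentage(secondTable, "id").toFixed(1)); // 66.7
console.log(completenessFailures(secondTable, { id: 30 })); // [ 'id' ]
```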
