Commit

Document group by example for Soda Core with failed rows check (#1984)
* Added group by example for Soda Core with failed rows check

* Adjusted SQL per suggestion

* Corrected SQL
janet-can authored Jan 5, 2024
1 parent 2bde90c commit c3c9521
Showing 5 changed files with 42 additions and 9 deletions.
Binary file added docs/assets/images/group-by-1.png
Binary file added docs/assets/images/group-by-2.png
Binary file added docs/assets/images/group-by-3.png
39 changes: 39 additions & 0 deletions examples/group-by.md
@@ -0,0 +1,39 @@
# Group check results by category with Soda Core

With Soda Core, you can use a SQL query in a failed rows check to group failed check results by one or more categories.

Use a SQL editor to build and test a SQL query with your data source, then add the query to a failed rows check to execute it during a Soda scan.

The following example illustrates how to build a query that identifies the countries where the average age of people is less than 25.

1. Begin with a basic query; its output shows the data this example works with.
```sql
SELECT * FROM Customers;
```
![group-by-1](/docs/assets/images/group-by-1.png){:height="600px" width="600px"}
2. Build a query to select groups with the relevant aggregations.
```sql
SELECT country, AVG(age) as avg_age
FROM Customers
GROUP BY country
```
![group-by-2](/docs/assets/images/group-by-2.png){:height="600px" width="600px"}
3. Identify the "bad" group (where the average age is less than 25) from among the grouped results.
```sql
SELECT country, AVG(age) as avg_age
FROM Customers
GROUP BY country
HAVING AVG(age) < 25
```
![group-by-3](/docs/assets/images/group-by-3.png){:height="600px" width="600px"}
4. Now that the query yields the expected results, add the query to a failed rows check, as per the following example.
```yaml
checks for dim_customers:
  - failed rows:
      name: Average age of citizens is less than 25
      fail query: |
        SELECT country, AVG(age) as avg_age
        FROM Customers
        GROUP BY country
        HAVING AVG(age) < 25
```
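
To try the grouping query without a data source, here is a minimal sketch that runs it against an in-memory SQLite database using Python's standard `sqlite3` module. The sample rows, the `name` column, and the `failing_groups` helper are hypothetical and are not part of the example dataset or of Soda Core. The `ORDER BY` clause is added only to make the output order deterministic.

```python
import sqlite3

# Hypothetical sample data; the real Customers table is only shown in the screenshots.
ROWS = [
    ("Alice", 22, "USA"),
    ("Bob", 24, "USA"),
    ("Chen", 31, "UK"),
    ("Dara", 45, "UK"),
    ("Eli", 19, "UAE"),
]

# Same query as in the failed rows check, plus ORDER BY for a stable result order.
FAIL_QUERY = """
SELECT country, AVG(age) AS avg_age
FROM Customers
GROUP BY country
HAVING AVG(age) < 25
ORDER BY country
"""


def failing_groups(rows):
    """Load rows into an in-memory table and return the groups the query flags."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Customers (name TEXT, age INTEGER, country TEXT)")
        conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", rows)
        return conn.execute(FAIL_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for country, avg_age in failing_groups(ROWS):
        print(country, avg_age)
```

Each returned row corresponds to one "bad" group, which is what the failed rows check would surface as failed rows during a scan.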
12 changes: 3 additions & 9 deletions examples/postgres_example.md
@@ -71,21 +71,15 @@ checks for dim_customer:
name: No duplicate phone numbers
- freshness(date_first_purchase) < 7d:
name: Data in this dataset is less than 7 days old
- schema:
warn:
when schema changes: any
name: Columns have not been added, removed, or changed
EOT
# run the scan!
# run the scan
soda scan -d adventureworks -c configuration.yml checks.yml
# note that an error is thrown for one test, as change-over-time checks
# require you to connect to Soda Cloud
```

1 comment on commit c3c9521

@franciscojmfo

Would be great to have one check result for each failed group having the fields of the group by clause informed as attributes or similar.