new macro enhancements related to audit_helper.compare_and_classify_query_results, based on a threshold parameter #119

ddumasdd · 2025-01-17T17:01:53Z

Describe the feature

A clear and concise description of what you want to happen.
audit_helper.compare_and_classify_query_results is great, but analyzing the diifs is very very time consuming

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.
This query below gives an example of what I do to get the "diffs that I care about, above a threshold". If you could build a macro that does this, HUGE benefit.

The below is done on a model (table = theResults) materialized from audit_helper.compare_and_classify_query_results

--- get the rows not identical. This will be the driver cte at the bottom
WITH CTE_ AS (
SELECT * FROM theResults
WHERE
DBT_AUDIT_ROW_STATUS <> 'identical'
ORDER BY dbt_audit_surrogate_key
),

--- get the differing hashes on the primary key
cte_min_max AS
(
SELECT
dbt_audit_surrogate_key,
min(DBt_AUDIT_ROW_HASH) AS min_DBt_AUDIT_ROW_HASH,
max(DBt_AUDIT_ROW_HASH) AS max_DBt_AUDIT_ROW_HASH
FROM CTE_
GROUP BY all
)

SELECT
cte_min_max.Dbt_audit_surrogate_key

 -- these are the measures that I needed for my example... they are the same used in audit_helper.compare_and_classify_query_results
 
,cte_1.CONVERTED_AMOUNT AS CONVERTED_AMOUNT1
,cte_2.CONVERTED_AMOUNT AS CONVERTED_AMOUNT2	
,(nvl(cte_1.CONVERTED_AMOUNT,0) - nvl(cte_2.CONVERTED_AMOUNT,0))::int AS CONVERTED_AMOUNT_diff

,cte_1.CONVERTED_AMOUNT_DEBIT AS CONVERTED_AMOUNT_DEBIT1
,cte_2.CONVERTED_AMOUNT_DEBIT AS CONVERTED_AMOUNT_DEBIT2	
,(nvl(cte_1.CONVERTED_AMOUNT_DEBIT,0) - nvl(cte_2.CONVERTED_AMOUNT_DEBIT,0))::int AS CONVERTED_AMOUNT_DEBIT_diff	

,cte_1.CONVERTED_AMOUNT_CREDIT AS CONVERTED_AMOUNT_CREDIT1
,cte_2.CONVERTED_AMOUNT_CREDIT AS CONVERTED_AMOUNT_CREDIT2	
,(nvl(cte_1.CONVERTED_AMOUNT_CREDIT,0) - nvl(cte_2.CONVERTED_AMOUNT_CREDIT,0))::int AS CONVERTED_AMOUNT_CREDIT_diff

FROM
cte_min_max
LEFT JOIN CTE_ AS cte_1 ON (cte_min_max.dbt_audit_surrogate_key = cte_1.dbt_audit_surrogate_key AND cte_min_max.min_DBt_AUDIT_ROW_HASH = cte_1.DBt_AUDIT_ROW_HASH)
LEFT JOIN CTE_ AS cte_2 ON (cte_min_max.dbt_audit_surrogate_key = cte_1.dbt_audit_surrogate_key AND cte_min_max.max_DBt_AUDIT_ROW_HASH = cte_2.DBt_AUDIT_ROW_HASH)
WHERE
--ignore identical hashed rows
cte_min_max.min_DBt_AUDIT_ROW_HASH <> cte_min_max.max_DBt_AUDIT_ROW_HASH
--just show the diffs > than a threshold. this could be a parameter for the macro
AND
(
abs(CONVERTED_AMOUNT_diff) > 1
or abs(CONVERTED_AMOUNT_DEBIT_diff) > 1
or abs(CONVERTED_AMOUNT_CREDIT_diff) > 1
)
ORDER BY cte_min_max.Dbt_audit_surrogate_key

Additional context

Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.
NO, will work on all databases

Who will this benefit?

What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.
This will benefit everyone using audit_helper.compare_and_classify_query_results

Are you interested in contributing this feature? No - I am not a python person

The text was updated successfully, but these errors were encountered:

ddumasdd added enhancement New feature or request triage labels Jan 17, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

new macro enhancements related to audit_helper.compare_and_classify_query_results, based on a threshold parameter #119

new macro enhancements related to audit_helper.compare_and_classify_query_results, based on a threshold parameter #119

ddumasdd commented Jan 17, 2025 •

edited

Loading

new macro enhancements related to audit_helper.compare_and_classify_query_results, based on a threshold parameter #119

new macro enhancements related to audit_helper.compare_and_classify_query_results, based on a threshold parameter #119

Comments

ddumasdd commented Jan 17, 2025 • edited Loading

Describe the feature

Describe alternatives you've considered

--- get the differing hashes on the primary key cte_min_max AS ( SELECT dbt_audit_surrogate_key, min(DBt_AUDIT_ROW_HASH) AS min_DBt_AUDIT_ROW_HASH, max(DBt_AUDIT_ROW_HASH) AS max_DBt_AUDIT_ROW_HASH FROM CTE_ GROUP BY all )

Additional context

Who will this benefit?

Are you interested in contributing this feature? No - I am not a python person

ddumasdd commented Jan 17, 2025 •

edited

Loading

--- get the differing hashes on the primary key
cte_min_max AS
(
SELECT
dbt_audit_surrogate_key,
min(DBt_AUDIT_ROW_HASH) AS min_DBt_AUDIT_ROW_HASH,
max(DBt_AUDIT_ROW_HASH) AS max_DBt_AUDIT_ROW_HASH
FROM CTE_
GROUP BY all
)