new macro enhancements related to audit_helper.compare_and_classify_query_results, based on a threshold parameter #119

Open
ddumasdd opened this issue Jan 17, 2025 · 0 comments
Labels
enhancement New feature or request triage

ddumasdd commented Jan 17, 2025

Describe the feature

A clear and concise description of what you want to happen.
audit_helper.compare_and_classify_query_results is great, but analyzing the diffs is very time-consuming.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.
The query below is an example of what I do to get the diffs that I care about, i.e. those above a threshold. If you could build a macro that does this, it would be a HUGE benefit.

The query runs against a model (table = theResults) materialized from the output of audit_helper.compare_and_classify_query_results.

--- get the rows not identical. This will be the driver cte at the bottom
WITH CTE_ AS (
SELECT * FROM theResults
WHERE
DBT_AUDIT_ROW_STATUS <> 'identical'
ORDER BY dbt_audit_surrogate_key
),

--- get the differing hashes on the primary key
cte_min_max AS
(
SELECT
dbt_audit_surrogate_key,
min(DBt_AUDIT_ROW_HASH) AS min_DBt_AUDIT_ROW_HASH,
max(DBt_AUDIT_ROW_HASH) AS max_DBt_AUDIT_ROW_HASH
FROM CTE_
GROUP BY all
)

SELECT
cte_min_max.Dbt_audit_surrogate_key

 -- these are the measures that I needed for my example... they are the same used in audit_helper.compare_and_classify_query_results
 
,cte_1.CONVERTED_AMOUNT AS CONVERTED_AMOUNT1
,cte_2.CONVERTED_AMOUNT AS CONVERTED_AMOUNT2	
,(nvl(cte_1.CONVERTED_AMOUNT,0) - nvl(cte_2.CONVERTED_AMOUNT,0))::int AS CONVERTED_AMOUNT_diff

,cte_1.CONVERTED_AMOUNT_DEBIT AS CONVERTED_AMOUNT_DEBIT1
,cte_2.CONVERTED_AMOUNT_DEBIT AS CONVERTED_AMOUNT_DEBIT2	
,(nvl(cte_1.CONVERTED_AMOUNT_DEBIT,0) - nvl(cte_2.CONVERTED_AMOUNT_DEBIT,0))::int AS CONVERTED_AMOUNT_DEBIT_diff	

,cte_1.CONVERTED_AMOUNT_CREDIT AS CONVERTED_AMOUNT_CREDIT1
,cte_2.CONVERTED_AMOUNT_CREDIT AS CONVERTED_AMOUNT_CREDIT2	
,(nvl(cte_1.CONVERTED_AMOUNT_CREDIT,0) - nvl(cte_2.CONVERTED_AMOUNT_CREDIT,0))::int AS CONVERTED_AMOUNT_CREDIT_diff	

FROM
cte_min_max
LEFT JOIN CTE_ AS cte_1 ON (cte_min_max.dbt_audit_surrogate_key = cte_1.dbt_audit_surrogate_key AND cte_min_max.min_DBt_AUDIT_ROW_HASH = cte_1.DBt_AUDIT_ROW_HASH)
LEFT JOIN CTE_ AS cte_2 ON (cte_min_max.dbt_audit_surrogate_key = cte_2.dbt_audit_surrogate_key AND cte_min_max.max_DBt_AUDIT_ROW_HASH = cte_2.DBt_AUDIT_ROW_HASH)
WHERE
--ignore identical hashed rows
cte_min_max.min_DBt_AUDIT_ROW_HASH <> cte_min_max.max_DBt_AUDIT_ROW_HASH
--just show the diffs greater than a threshold. This could be a parameter for the macro
AND
(
abs(CONVERTED_AMOUNT_diff) > 1
or abs(CONVERTED_AMOUNT_DEBIT_diff) > 1
or abs(CONVERTED_AMOUNT_CREDIT_diff) > 1
)
ORDER BY cte_min_max.Dbt_audit_surrogate_key
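
To make the threshold idea easy to try outside a warehouse, here is a minimal sketch that runs a portable variant of the query above against an in-memory SQLite table. It is not a proposed audit_helper macro. Several things are assumptions for illustration only: the helper name `diffs_above_threshold`, the toy rows, the `'modified'` status value, and the use of a single measure (`converted_amount`). The SQL swaps `nvl` for `COALESCE`, spells out the `GROUP BY` column, repeats the diff expression in `WHERE` instead of referencing its alias, and drops the `::int` cast, so small fractional diffs are compared as-is.

```python
import sqlite3

# Portable variant of the query above, with the threshold as a bind parameter.
THRESHOLD_QUERY = """
WITH non_identical AS (
    SELECT * FROM theResults
    WHERE dbt_audit_row_status <> 'identical'
),
hash_pairs AS (
    SELECT
        dbt_audit_surrogate_key,
        MIN(dbt_audit_row_hash) AS min_hash,
        MAX(dbt_audit_row_hash) AS max_hash
    FROM non_identical
    GROUP BY dbt_audit_surrogate_key
)
SELECT
    p.dbt_audit_surrogate_key,
    r1.converted_amount AS converted_amount1,
    r2.converted_amount AS converted_amount2,
    COALESCE(r1.converted_amount, 0) - COALESCE(r2.converted_amount, 0) AS converted_amount_diff
FROM hash_pairs AS p
LEFT JOIN non_identical AS r1
    ON p.dbt_audit_surrogate_key = r1.dbt_audit_surrogate_key
   AND p.min_hash = r1.dbt_audit_row_hash
LEFT JOIN non_identical AS r2
    ON p.dbt_audit_surrogate_key = r2.dbt_audit_surrogate_key
   AND p.max_hash = r2.dbt_audit_row_hash
WHERE p.min_hash <> p.max_hash
  AND ABS(COALESCE(r1.converted_amount, 0) - COALESCE(r2.converted_amount, 0)) > :threshold
ORDER BY p.dbt_audit_surrogate_key
"""


def diffs_above_threshold(conn, threshold):
    """Return the key, both values, and the diff for rows whose diff exceeds threshold."""
    return conn.execute(THRESHOLD_QUERY, {"threshold": threshold}).fetchall()


# Toy stand-in for the materialized comparison results (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE theResults (
        dbt_audit_surrogate_key TEXT,
        dbt_audit_row_status TEXT,
        dbt_audit_row_hash TEXT,
        converted_amount REAL
    )
    """
)
conn.executemany(
    "INSERT INTO theResults VALUES (?, ?, ?, ?)",
    [
        ("k1", "identical", "h0", 100.0),  # identical in both sources, ignored
        ("k1", "identical", "h0", 100.0),
        ("k2", "modified", "ha", 100.0),   # differs by 0.5
        ("k2", "modified", "hb", 100.5),
        ("k3", "modified", "hc", 200.0),   # differs by 50
        ("k3", "modified", "hd", 250.0),
    ],
)

rows = diffs_above_threshold(conn, 1)
print(rows)
```

With a threshold of 1 only key `k3` comes back; lowering the threshold to 0.1 also returns `k2`. A real macro would presumably take the list of measure columns and the threshold as arguments and generate one diff expression per column.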

Additional context

Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.
No, this should work on all databases.

Who will this benefit?

What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.
This will benefit everyone using audit_helper.compare_and_classify_query_results

Are you interested in contributing this feature? No - I am not a python person

@ddumasdd ddumasdd added enhancement New feature or request triage labels Jan 17, 2025