You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A clear and concise description of what you want to happen.
audit_helper.compare_and_classify_query_results is great, but analyzing the diifs is very very time consuming
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
This query below gives an example of what I do to get the "diffs that I care about, above a threshold". If you could build a macro that does this, HUGE benefit.
The below is done on a model (table = theResults) materialized from audit_helper.compare_and_classify_query_results
--- get the rows not identical. This will be the driver cte at the bottom
WITH CTE_ AS (
SELECT * FROM theResults
WHERE
DBT_AUDIT_ROW_STATUS <> 'identical'
ORDER BY dbt_audit_surrogate_key
),
--- get the differing hashes on the primary key
cte_min_max AS
(
SELECT
dbt_audit_surrogate_key,
min(DBt_AUDIT_ROW_HASH) AS min_DBt_AUDIT_ROW_HASH,
max(DBt_AUDIT_ROW_HASH) AS max_DBt_AUDIT_ROW_HASH
FROM CTE_
GROUP BY all
)
SELECT
cte_min_max.Dbt_audit_surrogate_key
-- these are the measures that I needed for my example... they are the same used in audit_helper.compare_and_classify_query_results
,cte_1.CONVERTED_AMOUNT AS CONVERTED_AMOUNT1
,cte_2.CONVERTED_AMOUNT AS CONVERTED_AMOUNT2
,(nvl(cte_1.CONVERTED_AMOUNT,0) - nvl(cte_2.CONVERTED_AMOUNT,0))::int AS CONVERTED_AMOUNT_diff
,cte_1.CONVERTED_AMOUNT_DEBIT AS CONVERTED_AMOUNT_DEBIT1
,cte_2.CONVERTED_AMOUNT_DEBIT AS CONVERTED_AMOUNT_DEBIT2
,(nvl(cte_1.CONVERTED_AMOUNT_DEBIT,0) - nvl(cte_2.CONVERTED_AMOUNT_DEBIT,0))::int AS CONVERTED_AMOUNT_DEBIT_diff
,cte_1.CONVERTED_AMOUNT_CREDIT AS CONVERTED_AMOUNT_CREDIT1
,cte_2.CONVERTED_AMOUNT_CREDIT AS CONVERTED_AMOUNT_CREDIT2
,(nvl(cte_1.CONVERTED_AMOUNT_CREDIT,0) - nvl(cte_2.CONVERTED_AMOUNT_CREDIT,0))::int AS CONVERTED_AMOUNT_CREDIT_diff
FROM
cte_min_max
LEFT JOIN CTE_ AS cte_1 ON (cte_min_max.dbt_audit_surrogate_key = cte_1.dbt_audit_surrogate_key AND cte_min_max.min_DBt_AUDIT_ROW_HASH = cte_1.DBt_AUDIT_ROW_HASH)
LEFT JOIN CTE_ AS cte_2 ON (cte_min_max.dbt_audit_surrogate_key = cte_1.dbt_audit_surrogate_key AND cte_min_max.max_DBt_AUDIT_ROW_HASH = cte_2.DBt_AUDIT_ROW_HASH)
WHERE
--ignore identical hashed rows
cte_min_max.min_DBt_AUDIT_ROW_HASH <> cte_min_max.max_DBt_AUDIT_ROW_HASH
--just show the diffs > than a threshold. this could be a parameter for the macro
AND
(
abs(CONVERTED_AMOUNT_diff) > 1
or abs(CONVERTED_AMOUNT_DEBIT_diff) > 1
or abs(CONVERTED_AMOUNT_CREDIT_diff) > 1
)
ORDER BY cte_min_max.Dbt_audit_surrogate_key
Additional context
Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.
NO, will work on all databases
Who will this benefit?
What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.
This will benefit everyone using audit_helper.compare_and_classify_query_results
Are you interested in contributing this feature? No - I am not a python person
The text was updated successfully, but these errors were encountered:
Describe the feature
A clear and concise description of what you want to happen.
audit_helper.compare_and_classify_query_results is great, but analyzing the diifs is very very time consuming
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
This query below gives an example of what I do to get the "diffs that I care about, above a threshold". If you could build a macro that does this, HUGE benefit.
The below is done on a model (table = theResults) materialized from audit_helper.compare_and_classify_query_results
--- get the rows not identical. This will be the driver cte at the bottom
WITH CTE_ AS (
SELECT * FROM theResults
WHERE
DBT_AUDIT_ROW_STATUS <> 'identical'
ORDER BY dbt_audit_surrogate_key
),
--- get the differing hashes on the primary key
cte_min_max AS
(
SELECT
dbt_audit_surrogate_key,
min(DBt_AUDIT_ROW_HASH) AS min_DBt_AUDIT_ROW_HASH,
max(DBt_AUDIT_ROW_HASH) AS max_DBt_AUDIT_ROW_HASH
FROM CTE_
GROUP BY all
)
SELECT
cte_min_max.Dbt_audit_surrogate_key
FROM
cte_min_max
LEFT JOIN CTE_ AS cte_1 ON (cte_min_max.dbt_audit_surrogate_key = cte_1.dbt_audit_surrogate_key AND cte_min_max.min_DBt_AUDIT_ROW_HASH = cte_1.DBt_AUDIT_ROW_HASH)
LEFT JOIN CTE_ AS cte_2 ON (cte_min_max.dbt_audit_surrogate_key = cte_1.dbt_audit_surrogate_key AND cte_min_max.max_DBt_AUDIT_ROW_HASH = cte_2.DBt_AUDIT_ROW_HASH)
WHERE
--ignore identical hashed rows
cte_min_max.min_DBt_AUDIT_ROW_HASH <> cte_min_max.max_DBt_AUDIT_ROW_HASH
--just show the diffs > than a threshold. this could be a parameter for the macro
AND
(
abs(CONVERTED_AMOUNT_diff) > 1
or abs(CONVERTED_AMOUNT_DEBIT_diff) > 1
or abs(CONVERTED_AMOUNT_CREDIT_diff) > 1
)
ORDER BY cte_min_max.Dbt_audit_surrogate_key
Additional context
Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.
NO, will work on all databases
Who will this benefit?
What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.
This will benefit everyone using audit_helper.compare_and_classify_query_results
Are you interested in contributing this feature? No - I am not a python person
The text was updated successfully, but these errors were encountered: