Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quick_are_queries_identical macros is not working as expected when there is a empty query result/table is involved #117

Open
1 of 5 tasks
mahilaeshwar opened this issue Dec 7, 2024 · 0 comments
Labels
bug Something isn't working triage

Comments

@mahilaeshwar
Copy link

mahilaeshwar commented Dec 7, 2024

Describe the bug

quick_are_queries_identical macros is not working as expected when there is a empty query result/table is involved.

Steps to reproduce

Scenario 1:
when both the query_a and query_b returns no result(empty output) then the quick_are_queries_identical returns false but ideally it should return true since both the query returns empty result.

Expected results True

Actual results False

{% set old_query %}
  select
    *
from `bigquery-public-data.faa.us_airports` LIMIT 0
{% endset %}
{% set new_query %}
  select
    *
  from `bigquery-public-data.faa.us_airports` LIMIT 0
{% endset %}
{{ 
  audit_helper.quick_are_queries_identical(
    old_query, 
    new_query, 
    columns=['object_id','global_id','faa_identifier','name','latitude','longitude','airport_geom','elevation','icao_id','airport_type','service_city','state_abbreviation','country','oper_status','airport_use','iap_exists','dod_hiflip','far_91','far_93','mil_code','airspace_analysis','us_high','us_low','ak_high','ak_low','us_area','pacific']
  )
}}

Screenshots and log output

image

Scenario 2:
when one of the query_a and query_b returns no result(empty output) then the quick_are_queries_identical returns true but ideally it should return false since one of the query returns value and other return empty result.

Expected results False

Actual results True

{% set old_query %}
  select
    *
from `bigquery-public-data.faa.us_airports`
{% endset %}
{% set new_query %}
  select
    *
  from `bigquery-public-data.faa.us_airports` LIMIT 0
{% endset %}
{{ 
  audit_helper.quick_are_queries_identical(
    old_query, 
    new_query, 
    columns=['object_id','global_id','faa_identifier','name','latitude','longitude','airport_geom','elevation','icao_id','airport_type','service_city','state_abbreviation','country','oper_status','airport_use','iap_exists','dod_hiflip','far_91','far_93','mil_code','airspace_analysis','us_high','us_low','ak_high','ak_low','us_area','pacific']
  )
}}

Screenshots and log output

image

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/dbt_external_tables
    version: [">=0.8.0", "<0.9.0"]
  
  - package: dbt-labs/dbt_utils
    version: [">=0.9.0", "<1.0.0"]

  - package: dbt-labs/codegen
    version: [">=0.8.0", "<=0.9.0"]

  - package: dbt-labs/audit_helper
    version: [">=0.6.0", "<0.12.1"]

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

latest

Additional context

I believe the reason this is happening is because bit_xor of no result will be NULL and when we do count(distinct hash_result) the NULL value won't be taken into account.

    select count(distinct hash_result) = 1 as are_tables_identical
    from (
        select bit_xor(farm_fingerprint(to_json_string(query_a))) as hash_result
        from query_a

        union all
        
        select bit_xor(farm_fingerprint(to_json_string(query_b))) as hash_result
        from query_b
    ) as hashes

to handle this we can add IFNULL() to the hash_result

Are you interested in contributing the fix?

Yes!

@mahilaeshwar mahilaeshwar added bug Something isn't working triage labels Dec 7, 2024
mahilaeshwar added a commit to mahilaeshwar/dbt-audit-helper that referenced this issue Dec 8, 2024
@mahilaeshwar mahilaeshwar changed the title quick_are_queries_identical macros is not working as expected when there is a empty query result/table is involved quick_are_queries_identical macros is not working as expected when there is a empty query result/table is involved Dec 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage
Projects
None yet
Development

No branches or pull requests

1 participant