Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

partition_by should not be applied in __dbt_tmp for incremental model #231

Open
kenho811 opened this issue May 29, 2024 · 2 comments
Open
Labels
enhancement New feature or request

Comments

@kenho811
Copy link

kenho811 commented May 29, 2024

Describe the enhancement requested

I observe that when running a incremental model,

1.) __dbt_tmp is created;
2.) data in __dbt_tmp is inserted into destination table.

It is all good, until I add a partition_by to the model.

What I want is to have partition in the destination table. However, I see that when doing incremental load, the __dbt_tmp is also partitioned by the column.

It caused the undesirable effect where my incremental data in __dbt_tmp got split into a lot of partitions (because of partition_by). When I load data from __dbt_tmp into the destination table, it took a long time.

For the incremental load, I only load last day's data. And I want partition to be done on another key in the destination table .

Is this possible?

=========

Example

I have the below model's config

{{
    config(
        enabled=True,
        materialized='incremental',
        incremental_strategy='append',
        unique_key=['pkey'],
        partition_by=['sehk_code'],
        localsort_by=['as_of_date']
    )
}}

dbt-dremio created the below __dbt_tmp

/* {"app": "dbt", "dbt_version": "1.7.13", "profile_name": "dremio", "target_name": "prod", "node_id": "model.dremio.rollup_daily_hk_stock_shareholding_change"} */

  
    
  

  create table "Minio"."finance"."l1"."rollup_daily_hk_stock_shareholding_change__dbt_tmp"
   partition by ("sehk_code")
  
  localsort by ("as_of_date")

....

I believe this is unnecessary?

Should this partition by only applied to the destination table? (and not the __dbt_tmp) ??

@kenho811
Copy link
Author

I still don't quite understand why the partition_by is applied on the __dbt_tmp (source table), instead of the actual model I am creating (i.e. the destination table)

@bcmeireles bcmeireles added the enhancement New feature or request label Dec 4, 2024
@wabu
Copy link

wabu commented Dec 8, 2024

I did stumbled on the same issue, and it seems that the partition_by is not at all applied to the destination table.

@kenho811, did you get partitioning on the destination at all or only on __dbt_tmp?

If @kenho811 can confirm that partitioning is only applied on __dbt_tmp and not at the destination table at all, I'd rather call it a bug than an enhancement, as the documentation cleary states that it should be possible to get partitioning on the incremental model and it won't be possible to create incremental models with partitioning at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

No branches or pull requests

3 participants