Skip to content

Add an option in the generate-model-yml macro to preserve data type case of the target database. #252

Open
@fionayz

Description

@fionayz

Describe the feature

Add an option in the generate-model-yml macro to preserve data type of the target database.

  • proposed option name: case_sensitive_data_types
  • Optional or Required: Optional
  • proposed default value: False . When set to True, do not turn data types to lower case

Additional context

When I use the macro to create model yaml with ClickHouse as the target database. It turns the data types into lower case as the following. This causes failure to dbt run once we enforce contract to the model as it looks for exact match to the data types returned from the database.

version

dbt-core = "1.8.0"
dbt-clickhouse = "1.8.0"

example schema created from current generate-model-yml with ClickHouse target

version: 2
models:
- name: foundation__inventory_dimension
  config:
    contract:
      enforced: true
  description: ''
  columns:
  - name: shop_id
    data_type: datetime64(3)
    description: ''
  - name: inventory_item_id
    data_type: uint64
    description: ''
  - name: inventory_group_id
    data_type: uint64
    description: ''
  - name: inventory_updated_at
    data_type: datetime64(3)
    description: ''
  - name: inventory_quantity
    data_type: int64
    description: ''
  - name: _row_version
    data_type: datetime64(3)
    description: ''
  - name: _is_deleted
    data_type: bool
    description: ''

Contract failed example

  This model has an enforced contract that failed.
  Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition.
  
  | column_name          | definition_type | contract_type | mismatch_reason    |
  | -------------------- | --------------- | ------------- | ------------------ |
  | _is_deleted          | Bool            | bool          | data type mismatch |
  | _row_version         | DateTime64(3)   | datetime64(3) | data type mismatch |
  | inventory_group_id   | UInt64          | uint64        | data type mismatch |
  | inventory_item_id    | UInt64          | uint64        | data type mismatch |
  | inventory_quantity   | Int64           | int64         | data type mismatch |
  | inventory_updated_at | DateTime64(3)   | datetime64(3) | data type mismatch |
  | shop_id              | UInt64          | uint64        | data type mismatch |

Describe alternatives you've considered

Alternatively, I think

  1. we can probably change the contract assert behaviour in the dbt-core to bear with case difference on data types (here?). But the scope of impact can be large.
  2. we can probably also change the dbt-clickhouse adapter to optional return lower case. But this doesn't feel like a nature expectation to each individual adapter.

I think to provide option from generate-model-yml is the most natural, flexible and less interruptive (to the core or adapters) way.

Who will this benefit?

What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.

Our project currently needs this feature so that we can generate model yml to 100+ models , save manual toil fixing the cases on data type in order to enable data contract.

This option should also benefit all the users of this macro.

Are you interested in contributing this feature?

Yes! PR on its way.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions