dlt version
1.5.0
Describe the problem
Our pipeline state tables have grown to upwards of 20 GB after only a month of operation, which leads to heavy I/O on Snowflake and considerable slowdowns in our pipelines: restoring the state takes at least 15 minutes across the board. We haven't found a way to pass `restore_from_destination` in an Airflow context, and it is unclear whether that is the intended way forward or a bug that needs to be addressed. We don't add anything to the state ourselves; to the best of our understanding, it should only contain incremental information, which we may not even need, since we are looking into using the Airflow context anyway.
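To make the attempt concrete, here is a minimal sketch of what we have in mind, assuming `restore_from_destination` can be switched off through dlt's environment-based config resolution before the pipeline object is created; the exact key and the pipeline/schema names are assumptions to verify against the dlt configuration docs:

```python
import os

import dlt

# Assumption: dlt resolves RESTORE_FROM_DESTINATION from the environment
# into the pipeline configuration's restore_from_destination flag. It must
# be set before dlt.pipeline() is called (e.g. at the top of the Airflow task).
os.environ["RESTORE_FROM_DESTINATION"] = "false"

pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",   # hypothetical name
    destination="snowflake",
    dataset_name="my_schema",      # hypothetical schema
)
```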
Expected behavior
The state should be relatively small if it only contains the incremental ranges.
Steps to reproduce
Use dlt with Snowflake as the destination and an incremental loading strategy.
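A minimal reproduction sketch; the table name `my_table` and cursor column `updated_at` are hypothetical, and credentials are expected to resolve from `secrets.toml` or the environment as usual:

```python
import dlt
from dlt.sources.sql_database import sql_database

pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",
    destination="snowflake",
    dataset_name="my_schema",
)

# Load one table incrementally; the cursor value is kept in the pipeline
# state, which dlt serializes into _DLT_PIPELINE_STATE on Snowflake.
source = sql_database().with_resources("my_table")
source.my_table.apply_hints(
    incremental=dlt.sources.incremental("updated_at")
)

pipeline.run(source)
```

After repeated scheduled runs, inspect the size of the _DLT_PIPELINE_STATE table.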
Operating system
Linux
Runtime environment
Google Cloud Composer
Python version
3.11
dlt data source
SQLAlchemy (Snowflake)
dlt destination
Snowflake
Other deployment details
We are using a GCS external stage for loading to the Snowflake destination.
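For completeness, a sketch of how such a staged setup is typically wired up in dlt; the bucket and names are hypothetical, and the bucket URL would normally live in config.toml rather than the environment:

```python
import os

import dlt

# Hypothetical bucket: dlt's filesystem staging writes load packages here,
# and Snowflake then ingests them from the external stage. A named stage
# can additionally be configured via DESTINATION__SNOWFLAKE__STAGE_NAME.
os.environ["DESTINATION__FILESYSTEM__BUCKET_URL"] = "gs://my-staging-bucket"

pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",
    destination="snowflake",
    staging="filesystem",  # route load packages through the GCS bucket
    dataset_name="my_schema",
)
```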
Additional information
For reference, the state-restore query that dlt issues against Snowflake on pipeline start:

SELECT "VERSION", "ENGINE_VERSION", "PIPELINE_NAME", "STATE", "CREATED_AT", "_DLT_LOAD_ID"
FROM "MY_SCHEMA"."_DLT_PIPELINE_STATE" AS s
JOIN "MY_SCHEMA"."_DLT_LOADS" AS l ON l."LOAD_ID" = s."_DLT_LOAD_ID"
WHERE "PIPELINE_NAME" = 'my_pipeline' AND l."STATUS" = 0
ORDER BY "LOAD_ID" DESC