IBM watsonx.data destination connector: improve performance with partitioning, metadata cleanup, and increased retries #606
base: main
…e partitioning, periodic metadata cleanup, and increased connection retries
   "id" varchar,
   "record_id" varchar,
   "parent_id" varchar
 )
 WITH (
   delete_mode = 'copy-on-write',
   format = 'PARQUET',
-  format_version = '2'
+  format_version = '2',
+  partitioning = ARRAY['record_id']
@potter-potter We decided not to use partitioning, right? We can probably remove this line.
But I still think we should include a section about partitioning. It could simply say what partitioning is used for and that it can slightly improve performance (3-4 sentences), with a link to the partitioning docs and to the Presto docs.
@potter-potter bump
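To illustrate what the partitioning = ARRAY['record_id'] property is for: in an Iceberg table this is an identity partition on record_id, so all rows produced from one source record share a partition, and a query or delete filtered on record_id can skip every other partition (partition pruning). The Python below is a toy in-memory model under that assumption, not Iceberg's or the connector's implementation; the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def partition_by_record_id(rows):
    """Group rows by record_id, mimicking an identity partition transform."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["record_id"]].append(row)
    return dict(partitions)


def delete_unpartitioned(rows, record_id):
    """Without partitioning, every row must be checked against the filter."""
    scanned = len(rows)
    kept = [r for r in rows if r["record_id"] != record_id]
    return kept, scanned


def delete_partitioned(partitions, record_id):
    """With partitioning, only the matching partition's rows are read.

    Iterating partition keys here stands in for reading table metadata,
    not data files; other partitions are left untouched.
    """
    scanned = len(partitions.get(record_id, []))
    kept = {key: rows for key, rows in partitions.items() if key != record_id}
    return kept, scanned


# Hypothetical data: 3 source records, 4 rows (elements) each.
rows = [
    {"id": f"{rec}-{i}", "record_id": rec, "parent_id": None}
    for rec in ("doc-a", "doc-b", "doc-c")
    for i in range(4)
]
partitions = partition_by_record_id(rows)
_, full_scan = delete_unpartitioned(rows, "doc-b")
_, pruned_scan = delete_partitioned(partitions, "doc-b")
print(full_scan, pruned_scan)  # 12 4
```

Deleting one record scans all 12 rows without partitioning but only that record's 4 rows with it. One trade-off to keep in mind: every distinct record_id becomes its own partition, so a high-cardinality key can leave many small data files.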
   format = 'PARQUET',
-  format_version = '2'
+  format_version = '2',
+  partitioning = ARRAY['record_id']
We should probably state somewhere that this SQL command uses Presto SQL syntax:
https://prestodb.io/docs/current/connector/iceberg.html
See the updated CREATE TABLE statement, the new Python script, and the additions to Max Connection Retries and Max Retries in https://unstructured-53-ibm-watsonxdata-2025-04-22.mintlify.app/ui/destinations/ibm-watsonxdata
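The linked docs describe the new Max Connection Retries and Max Retries settings; their exact semantics are not shown in this thread. As a hypothetical sketch of the general pattern such settings control, the Python below retries a failing call with exponential backoff. call_with_retries, its parameters, and the backoff schedule are assumptions for illustration, not the connector's API.

```python
import time


def call_with_retries(func, max_retries=3, base_delay=1.0,
                      retry_on=(ConnectionError,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper: max_retries counts retries after the first attempt,
    so func() runs at most max_retries + 1 times.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            sleep(base_delay * (2 ** attempt))


# Usage: a hypothetical connection that fails twice before succeeding.
attempts = []


def flaky_connect():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("catalog not reachable yet")
    return "connected"


delays = []
result = call_with_retries(flaky_connect, max_retries=5, base_delay=0.5,
                           sleep=delays.append)
print(result, delays)  # connected [0.5, 1.0]
```

Injecting sleep keeps the sketch testable without real waiting; raising the retry count mainly helps with transient failures such as a catalog or storage endpoint that is briefly unavailable.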