IBM watsonx.data destination connector: improve performance with partitioning, metadata cleanup, and increased retries #606

Open
wants to merge 2 commits into
base: main
from

Conversation

Paul-Cornell
Collaborator

@Paul-Cornell Paul-Cornell commented Apr 23, 2025

See the updated CREATE TABLE statement, the new Python script, and additions to Max Connection Retries and Max Retries, in https://unstructured-53-ibm-watsonxdata-2025-04-22.mintlify.app/ui/destinations/ibm-watsonxdata

…e partitioning, periodic metadata cleanup, and increased connection retries
@Paul-Cornell Paul-Cornell changed the title from "[Hold][WIP] IBM watsonx.data destination connector: improve performance with table partitioning, periodic metadata cleanup, and increased connection retries" to "IBM watsonx.data destination connector: improve performance with partitioning, metadata cleanup, and increased retries" Apr 23, 2025
@Paul-Cornell Paul-Cornell marked this pull request as ready for review April 23, 2025 20:16
"id" varchar,
"record_id" varchar,
"parent_id" varchar
)
WITH (
delete_mode = 'copy-on-write',
format = 'PARQUET',
- format_version = '2'
+ format_version = '2',
+ partitioning = ARRAY['record_id']

@mpolomdeepsense mpolomdeepsense Apr 24, 2025

@potter-potter We decided not to use partitioning, right? If so, we can probably remove this line.

I still think we should include a section about partitioning, though. It could simply say what partitioning is used for and that it can slightly improve performance (3-4 sentences). We should also add links to the docs about partitioning and to the Presto docs.

Comment on lines 194 to +196
format = 'PARQUET',
- format_version = '2'
+ format_version = '2',
+ partitioning = ARRAY['record_id']

We should probably state somewhere that this SQL command is using Presto SQL syntax.
https://prestodb.io/docs/current/connector/iceberg.html
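
To make the partitioning question concrete, here is a minimal sketch of how the statement under review differs with and without the `partitioning` property. It is not the script from this PR: the `build_create_table` helper and the `my_catalog.my_schema.elements` table name are hypothetical, and only the three columns visible in the diff excerpt are used. The `WITH` properties are copied from the diff and follow the Presto Iceberg connector syntax linked above.

```python
def build_create_table(table, columns, partition_by=None):
    """Return a Presto-style CREATE TABLE statement for an Iceberg table.

    Hypothetical helper for illustration only. The WITH properties mirror
    the ones shown in the diff; `partition_by` is optional so the statement
    can be generated with or without the `partitioning` property.
    """
    column_lines = ",\n".join(f'  "{name}" {type_}' for name, type_ in columns)
    properties = [
        "delete_mode = 'copy-on-write'",
        "format = 'PARQUET'",
        "format_version = '2'",
    ]
    if partition_by:
        # Presto's Iceberg connector takes partition columns as an ARRAY of strings.
        quoted = ", ".join(f"'{column}'" for column in partition_by)
        properties.append(f"partitioning = ARRAY[{quoted}]")
    property_lines = ",\n".join(f"  {prop}" for prop in properties)
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        f"WITH (\n{property_lines}\n)"
    )


# Only the columns visible in the diff excerpt; the real table has more.
columns = [("id", "varchar"), ("record_id", "varchar"), ("parent_id", "varchar")]

# Assumed table name, with partitioning on record_id as proposed in the diff.
print(build_create_table("my_catalog.my_schema.elements", columns, ["record_id"]))
```

Calling the helper without `partition_by` produces the statement as it stood before this change, which may be a convenient way to document both variants if partitioning ends up being optional.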

2 participants