SNOW-1904571 Add option to configure v2 cleaner interval #1055

Open
wants to merge 1 commit into
base: master

sfc-gh-mbobowski
Contributor

@sfc-gh-mbobowski sfc-gh-mbobowski commented Feb 11, 2025

Overview

SNOW-1904571

For the Snowpipe-based connector, when a topic has multiple partitions with no data produced, the cleaner is never started because the pipe does not exist:

                // cleaner starts along with the partition task, but until table, stage and pipe
                // aren't created - there is no point in querying the stage.
                if (isFirstRun.get()
                    && checkPreRequisites() != CleanerPrerequisites.PIPE_COMPATIBLE) {
                  LOGGER.debug(
                      "neither table {} nor stage {} nor pipe {} have been initialized yet,"
                          + " skipping cycle...",
                      tableName,
                      stageName,
                      pipeName);
                  return;
                }

This generates additional describe table / stage / pipe calls that some users find costly and noisy.

This PR adds the snowflake.snowpipe.v2CleanerIntervalSeconds configuration option to run the cleaner at a lower frequency.
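As a rough illustration of how the new option might be set, here is a minimal sketch that builds a Kafka Connect style payload in Python. Only the property name comes from this PR; the connector name, topic, and the 300 second value are made-up examples, and the helper functions are hypothetical. The default interval is not stated here, so none is assumed.

```python
import json

# Property name taken from the PR description above.
CLEANER_INTERVAL_PROPERTY = "snowflake.snowpipe.v2CleanerIntervalSeconds"


def build_connector_config(name, topics, cleaner_interval_seconds):
    """Hypothetical helper: return a connector payload with the cleaner interval set."""
    if cleaner_interval_seconds <= 0:
        raise ValueError("cleaner interval must be a positive number of seconds")
    return {
        "name": name,
        "config": {
            "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
            "topics": ",".join(topics),
            # Kafka Connect passes configuration values as strings.
            CLEANER_INTERVAL_PROPERTY: str(cleaner_interval_seconds),
        },
    }


def cleaner_cycles_per_hour(interval_seconds):
    """Upper bound on cleaner cycles (and prerequisite checks) per hour per partition."""
    return 3600 // interval_seconds


if __name__ == "__main__":
    # Example values only; pick an interval that suits your workload.
    payload = build_connector_config("example_sink", ["orders"], 300)
    print(json.dumps(payload, indent=2))
    print(cleaner_cycles_per_hour(300))
```

A longer interval lowers the number of cycles, and therefore the number of prerequisite checks, that each idle partition can trigger.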

@sfc-gh-mbobowski sfc-gh-mbobowski requested a review from a team as a code owner February 11, 2025 14:42
@@ -84,7 +84,8 @@ def __init__(self, kafkaAddress, schemaRegistryAddress, kafkaConnectAddress, cre
             }
         else:
             self.client_config = {
-                "bootstrap.servers": kafkaAddress
+                "bootstrap.servers": kafkaAddress,
+                "broker.address.family": "v4",
Contributor

Is it related to the change?

Contributor Author

Good point - yes it was intended :)
After upgrading to Sequoia I couldn't run the e2e tests locally because Kafka producers started to resolve the Kafka address as IPv6 by default. Adding this option fixes the local setup and does no harm to CI execution.
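For readers unfamiliar with the option, a minimal sketch of the idea follows. `broker.address.family` is a librdkafka client property that accepts `any`, `v4`, or `v6`. The helper name and the `force_ipv4` flag are hypothetical and not part of the test file; the sketch only builds the dictionary and assumes it would then be handed to a librdkafka-based client.

```python
def build_client_config(kafka_address, force_ipv4=True):
    """Hypothetical helper: build a librdkafka-style client config dict."""
    config = {"bootstrap.servers": kafka_address}
    if force_ipv4:
        # Restrict broker address resolution to IPv4 so that a hostname
        # such as "localhost" is not resolved to an IPv6 address first.
        config["broker.address.family"] = "v4"
    return config


if __name__ == "__main__":
    # Example address only.
    print(build_client_config("localhost:9092"))
```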

Contributor

@sfc-gh-akowalczyk sfc-gh-akowalczyk left a comment

All good, we probably just modified the test file unnecessarily.

2 participants