When using s3 as a staging destination for Redshift, allow users to specify region of s3 bucket #2349

Open
nsnider-fabric opened this issue Feb 25, 2025 · 2 comments · May be fixed by #2389
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@nsnider-fabric

Feature description

Are you a dlt user?

Yes, I run dlt in production.

Use case

When copying data from S3 to Redshift, if the S3 bucket is in a different region than the Redshift cluster, you must specify the REGION of the S3 bucket in the COPY statement. This is documented in the AWS docs: https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-source-s3.html#copy-region

With the current dlt <> Redshift integration, there is no way for the user to specify the region in the COPY statement. This causes load jobs to fail permanently when the S3 bucket and the Redshift cluster are in different regions.
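
For illustration, here is a minimal sketch of what a COPY statement with a REGION clause looks like when built from Python. This is not dlt's code: `build_copy_sql`, the table name, the bucket URI, and the IAM role ARN are all hypothetical, and only the `REGION 'aws-region'` parameter itself comes from the AWS docs linked above.

```python
from typing import Optional


def build_copy_sql(
    table: str, s3_uri: str, iam_role: str, region: Optional[str] = None
) -> str:
    """Return a Redshift COPY statement, adding REGION only when one is given."""
    parts = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",
        f"IAM_ROLE '{iam_role}'",
        "FORMAT AS CSV",
    ]
    if region:
        # Needed when the bucket lives in a different region than the cluster.
        parts.append(f"REGION '{region}'")
    return "\n".join(parts) + ";"


# Hypothetical values, for illustration only.
print(
    build_copy_sql(
        "events",
        "s3://example-staging-bucket/load/",
        "arn:aws:iam::123456789012:role/ExampleCopyRole",
        region="eu-central-1",
    )
)
```

Without the `region` argument the clause is omitted, which matches the same-region case where COPY already works today.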

Proposed solution

I believe the COPY statement is templated here:

https://github.com/dlt-hub/dlt/blob/devel/dlt/destinations/impl/redshift/redshift.py#L107

The goal would be to let the user configure the REGION through something like config.toml or environment variables.
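
A rough sketch of the lookup I have in mind, assuming an environment variable overrides a value from a config file. The variable name `STAGING_S3_REGION`, the `region` key, and the helper `resolve_staging_region` are made up for this example and are not existing dlt settings.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name, chosen for illustration only.
REGION_ENV_VAR = "STAGING_S3_REGION"


def resolve_staging_region(
    config: Mapping[str, str], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the configured bucket region, or None when nothing is set.

    The environment variable takes precedence over the config mapping.
    """
    for candidate in (env.get(REGION_ENV_VAR), config.get("region")):
        if candidate and candidate.strip():
            return candidate.strip()
    return None


# `config` stands in for values parsed from a config.toml section.
config = {"region": "us-west-2"}
print(resolve_staging_region(config, env={}))
print(resolve_staging_region(config, env={REGION_ENV_VAR: "eu-central-1"}))
```

Returning `None` when nothing is configured would let the COPY statement stay unchanged for users whose bucket and cluster share a region.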

Related issues

No response

@sh-rp sh-rp added good first issue Good for newcomers enhancement New feature or request labels Feb 26, 2025
@djudjuu djudjuu self-assigned this Mar 3, 2025
@sh-rp sh-rp moved this from Todo to Planned in dlt core library Mar 3, 2025
@sh-rp
Collaborator

sh-rp commented Mar 4, 2025

@djudjuu it's probably enough to add the region from the staging credentials to the RedshiftCopyFileLoadJob SQL statement, so a one-line code change. To test it, you'd have to create an S3 bucket in a different region than the one where our Redshift database resides and see if it works. You can also ask the user to check it out; they'll have to specify different regions for the staging destination (filesystem) and the end destination (Redshift).
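
Roughly, the change could look like the sketch below. It is a standalone illustration, not the actual RedshiftCopyFileLoadJob code: `StagingCredentials` is a stand-in class, the `region_name` field name is an assumption, and the format check is an extra guard I added for the example rather than something the job is known to do.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Loose shape check for names like "us-east-1" or "us-gov-west-1".
# This pattern is an assumption for the sketch, not an official AWS rule.
_REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


@dataclass
class StagingCredentials:
    """Stand-in for the staging (filesystem) credentials object."""

    region_name: Optional[str] = None


def region_clause(credentials: StagingCredentials) -> str:
    """Return the REGION fragment for COPY, or "" when no region is set."""
    region = credentials.region_name
    if not region:
        return ""
    if not _REGION_PATTERN.match(region):
        # Reject odd values before they are interpolated into SQL.
        raise ValueError(f"unexpected region value: {region!r}")
    return f"REGION '{region}'"


print(region_clause(StagingCredentials(region_name="eu-central-1")))
print(repr(region_clause(StagingCredentials())))
```

The fragment would then be appended to the existing COPY template, and stays empty when the staging credentials carry no region.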

@yannik207

Hey @sh-rp,
I updated the class locally and it seems to work for me.
Should the region have a default value like 'us-east-1', or what would you suggest?
How about error handling? Should this be implemented too?

I would be happy to contribute to this issue :)

yannik207 pushed a commit to yannik207/dlt that referenced this issue Mar 9, 2025
@yannik207 yannik207 linked a pull request Mar 9, 2025 that will close this issue
Projects
Status: Planned
4 participants