S3 URL support for image-service and pipeline #183

Open
sat01a opened this issue Jul 10, 2023 · 3 comments

@sat01a

sat01a commented Jul 10, 2023

From time to time we get datasets with images that are not on the web but still need to be ingested into ALA.
The easiest approach could be to upload them into a subfolder of the DR on an S3 bucket and put the paths in the DwCA files.
This may require translating s3:// URLs to https:// or adding s3:// support to image-service.
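A minimal sketch of the s3:// to https:// translation, assuming virtual-hosted-style URLs (https://<bucket>.s3.amazonaws.com/<key>) and DNS-compatible bucket names. The class S3UrlTranslator and its methods are hypothetical and not part of image-service. A translated URL can only be fetched without credentials if the object is public; private objects would still need presigned URLs or authenticated access.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical helper: rewrites s3://bucket/key into a virtual-hosted-style
 * HTTPS URL. Assumes a DNS-compatible bucket name. The result only works
 * without credentials if the object is publicly readable.
 */
public class S3UrlTranslator {

    /** Splits an s3://bucket/key string into {bucket, key}. */
    static String[] parseS3Url(String s3Url) {
        if (!s3Url.startsWith("s3://")) {
            throw new IllegalArgumentException("Not an s3:// URL: " + s3Url);
        }
        String rest = s3Url.substring("s3://".length());
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("Expected s3://bucket/key: " + s3Url);
        }
        return new String[] {rest.substring(0, slash), rest.substring(slash + 1)};
    }

    /** Builds https://<bucket>.<endpoint>/<key>, percent-encoding the key. */
    static String toHttpsUrl(String s3Url, String endpoint) {
        String[] parts = parseS3Url(s3Url);
        String host = parts[0] + "." + endpoint;
        try {
            // The multi-argument URI constructor quotes characters that are
            // illegal in a path (e.g. a space becomes %20).
            return new URI("https", host, "/" + parts[1], null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL for " + s3Url, e);
        }
    }

    /** Uses the global s3.amazonaws.com endpoint (an assumption). */
    static String toHttpsUrl(String s3Url) {
        return toHttpsUrl(s3Url, "s3.amazonaws.com");
    }

    public static void main(String[] args) {
        System.out.println(toHttpsUrl("s3://example-bucket/dr123/images/photo 1.jpg"));
        // prints https://example-bucket.s3.amazonaws.com/dr123/images/photo%201.jpg
    }
}
```

Passing a regional endpoint, e.g. toHttpsUrl(url, "s3.ap-southeast-2.amazonaws.com"), yields a region-specific host; which endpoint is appropriate depends on where the bucket lives.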

Another workaround (for a limited number of images) would be to upload them to image-service first and then link them back in the DwCA.

This needs more discussion to find the best solution.

@sadeghim
Member

sadeghim commented Aug 9, 2023

Hi @sat01a, do we have any update on this? Loading actual image files is currently tedious and involves a database update and an image reindex. It would be very good if we could give this some priority.

@sbearcsiro
Contributor

@sadeghim You should already be able to address S3 objects via an HTTP URL. If it's a public object and you have the S3 client library available, you should be able to use s3Client.getUrl(bucket, path) (or its equivalent) to get that URL. For private objects you can supply presigned URLs, like so:

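            import com.amazonaws.HttpMethod;
            import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;
            import java.net.URL;

            // expiration is a java.util.Date after which the URL stops working
            GeneratePresignedUrlRequest generatePresignedUrlRequest =
                    new GeneratePresignedUrlRequest(bucket, path)
                            .withMethod(HttpMethod.GET)
                            .withExpiration(expiration);
            URL presignedUrl = s3Client.generatePresignedUrl(generatePresignedUrlRequest);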
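A presigned URL carries its own expiry, so one stored in a DwCA record stops working once that window passes. For Signature Version 4 URLs the window is encoded in the X-Amz-Date (signing time, UTC) and X-Amz-Expires (seconds, at most 604800, i.e. 7 days) query parameters. The stdlib-only sketch below reads those two parameters to compute when a given URL expires. The class PresignedUrlExpiry is hypothetical, and it assumes a SigV4 URL; older SigV2 URLs use a single Expires epoch parameter instead.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

public class PresignedUrlExpiry {

    // SigV4 timestamp format used by X-Amz-Date, e.g. 20230809T101500Z
    private static final DateTimeFormatter AMZ_DATE =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'");

    /** Decodes the query string of a URL into a name -> value map. */
    static Map<String, String> queryParams(String url) {
        Map<String, String> params = new HashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    /** Returns the instant after which a SigV4 presigned URL is rejected. */
    static Instant expiresAt(String presignedUrl) {
        Map<String, String> p = queryParams(presignedUrl);
        String date = p.get("X-Amz-Date");
        String expires = p.get("X-Amz-Expires");
        if (date == null || expires == null) {
            throw new IllegalArgumentException("Not a SigV4 presigned URL");
        }
        Instant signedAt = LocalDateTime.parse(date, AMZ_DATE).toInstant(ZoneOffset.UTC);
        return signedAt.plus(Duration.ofSeconds(Long.parseLong(expires)));
    }

    public static void main(String[] args) {
        String url = "https://example-bucket.s3.amazonaws.com/dr123/photo.jpg"
                + "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20230809T000000Z"
                + "&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=abc";
        System.out.println(expiresAt(url)); // prints 2023-08-16T00:00:00Z
    }
}
```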

@peggynewman

@sbearcsiro We understand that we can't have public buckets or objects. However, I don't want to put temporary presigned URLs for objects into the data; that is messy and unmanageable. Is there another option?
