Modify download APIs to return file location in minio mounted FS #116

Open
Vismayak opened this issue Jan 9, 2025 · 1 comment · May be fixed by #117
Vismayak commented Jan 9, 2025

Modify the pyclowder API for systems where the Clowder filesystem is mounted
Objective: Adapt pyclowder to read from a mounted Clowder filesystem instead of downloading files from MinIO.

Implementation Plan:

1. Introduce a flag (e.g., mounted=True) to toggle between mounted filesystem usage and standard MinIO operations.
2. For file downloads, when the mounted flag is set, return the file’s path on the mounted filesystem instead of downloading.
3. For dataset downloads, when the mounted flag is set, return a list of file paths (directories) from the mounted filesystem.
We can use pyclowder.datasets.get_file_list to list the dataset's files and reuse the modified file download function for each of them.
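
A minimal sketch of the toggle described in the plan above, not the actual pyclowder code. The names get_file and get_dataset_files, the download callable, and the mount_root parameter are hypothetical, and it assumes the object for a file id sits directly under the mount root, which may not match the real bucket layout.

```python
import os
from typing import Callable, Iterable, List


def get_file(file_id: str,
             download: Callable[[str], str],
             mount_root: str = "",
             mounted: bool = False) -> str:
    """Return a local path for file_id (hypothetical helper).

    With mounted=True nothing is downloaded; the path on the mounted
    filesystem is returned instead.
    """
    if mounted:
        # Assumption: the object is stored as <mount_root>/<file_id>.
        return os.path.join(mount_root, file_id)
    # Standard behaviour: delegate to the existing download routine,
    # which is expected to return the path of the downloaded copy.
    return download(file_id)


def get_dataset_files(file_ids: Iterable[str],
                      download: Callable[[str], str],
                      mount_root: str = "",
                      mounted: bool = False) -> List[str]:
    """Return one local path per file in a dataset (hypothetical helper)."""
    return [get_file(fid, download, mount_root, mounted) for fid in file_ids]
```

The dataset helper simply reuses the per-file function, so the mounted flag only needs to be handled in one place.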

Similar to how it was previously done for a locally mounted filesystem, we will modify _check_for_local_file in connectors. We will introduce a new environment variable, MINIO_MOUNT, analogous to MOUNTED_PATHS. The function will then look up the file id in the MinIO mount and return that path instead. This will involve editing the extractors.py and connectors.py files.
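
Roughly, the lookup could look like the sketch below. It is only an illustration: find_on_minio_mount is a hypothetical name, not the existing _check_for_local_file, and it assumes MINIO_MOUNT holds a single directory in which each file is stored under its file id.

```python
import os
from typing import Mapping, Optional


def find_on_minio_mount(file_id: str,
                        env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the path of file_id under MINIO_MOUNT, or None (hypothetical helper)."""
    mount_root = env.get("MINIO_MOUNT")
    if not mount_root:
        # Variable not set: the caller falls back to a normal download.
        return None
    # Assumption: the object is stored as <MINIO_MOUNT>/<file_id>.
    candidate = os.path.join(mount_root, file_id)
    return candidate if os.path.isfile(candidate) else None
```

Returning None when the variable is unset or the file is missing keeps the existing download path as the fallback.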

Testing Ground

In the extractors-huggingface repo, the two extractors pipeline-image-classification-file and pipeline-image-classification-dataset will be the best place to test the modified APIs.

Mount the minio bucket to a clowderfs directory in the VM, then test the pyclowder APIs with the extractors locally.

@Vismayak Vismayak self-assigned this Jan 9, 2025
Vismayak commented Jan 9, 2025

The recent code pushed to the branch can read locally mounted files. I have run inference with pipeline-image-classification-file.
The code for dataset reading still needs to be completed. For now I can print all the file paths, but I still need to work out how the extractor can use them.

@Vismayak Vismayak linked a pull request Jan 17, 2025 that will close this issue