Modify download APIs to return file location in minio mounted FS #116

Open
@Vismayak

Description

Modify the pyclowder API for systems where the Clowder filesystem is mounted.
Objective: Adapt pyclowder to read from a mounted Clowder filesystem instead of downloading files from MinIO.

Implementation Plan:

1. Introduce a flag (e.g., mounted=True) to toggle between mounted filesystem usage and standard MinIO operations.
2. For file downloads, when the mounted flag is set, return the file’s path on the mounted filesystem instead of downloading.
3. For dataset downloads, when the mounted flag is set, return a list of file paths (directories) from the mounted filesystem.
   We can use pyclowder.datasets.get_file_list to enumerate the dataset's files and reuse the modified file download function for each one.
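
The plan above could be sketched roughly as follows. This is not pyclowder code: `resolve_file`, `resolve_dataset`, the `fetch` callback (standing in for the existing download call), and the assumption that a file sits directly under the mount root keyed by its file id are all hypothetical and would need to match the real bucket layout.

```python
import os
from typing import Callable, Iterable, List


def resolve_file(file_id: str, mount_root: str, mounted: bool,
                 fetch: Callable[[str], str]) -> str:
    """Return a local path for file_id.

    With mounted=True no download happens; the path on the mount is returned.
    Otherwise the (hypothetical) fetch callback downloads the file and
    returns the path it wrote to.
    """
    if mounted:
        # Assumption: objects are stored under the mount root by file id.
        return os.path.join(mount_root, file_id)
    return fetch(file_id)


def resolve_dataset(file_ids: Iterable[str], mount_root: str, mounted: bool,
                    fetch: Callable[[str], str]) -> List[str]:
    """Return one local path per file in a dataset, reusing resolve_file."""
    return [resolve_file(fid, mount_root, mounted, fetch) for fid in file_ids]
```

In practice the list of file ids would come from pyclowder.datasets.get_file_list, and `fetch` would wrap the current download function, so existing callers keep their behavior when the flag is off.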

Similar to how it was previously done for a locally mounted filesystem, we will modify _check_for_local_file in connectors. We will introduce a new environment variable, MINIO_MOUNT, analogous to MOUNTED_PATHS. The function will then look up the file id under the MinIO mount and return that path instead of downloading. This will involve editing the extractors.py and connectors.py files.
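
A minimal sketch of the lookup, assuming MINIO_MOUNT holds a single directory and that files appear under it by file id. The helper name `find_in_minio_mount` is hypothetical, and the real _check_for_local_file has its own signature and logic that this does not reproduce.

```python
import os
from typing import Mapping, Optional


def find_in_minio_mount(file_id: str,
                        env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the path of file_id under MINIO_MOUNT, or None if unavailable.

    Returning None lets the caller fall back to the normal download path.
    """
    mount = env.get("MINIO_MOUNT", "")
    if not mount:
        # Variable unset or empty: the mount is not configured.
        return None
    # Assumption: the object is stored directly under the mount by file id.
    candidate = os.path.join(mount, file_id)
    return candidate if os.path.isfile(candidate) else None
```

Checking that the file actually exists before returning the path keeps the fallback to a regular download intact when the mount is missing or stale.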

Testing Ground

In the extractors-huggingface repo, the two extractors pipeline-image-classification-file and pipeline-image-classification-dataset will be the best place to test the modified APIs.

Mount the MinIO bucket to a clowderfs directory on the VM, then test the pyclowder APIs with the extractors locally.
