To modify pyclowder API for systems where the Clowder filesystem is mounted
Objective: Adapt pyclowder to leverage a mounted Clowder filesystem instead of downloading files from MinIO
Implementation Plan:
1. Introduce a flag (e.g., mounted=True) to toggle between mounted filesystem usage and standard MinIO operations.
2. For file downloads, when the mounted flag is set, return the file’s path on the mounted filesystem instead of downloading.
3. For dataset downloads, when the mounted flag is set, return a list of file paths (directories) from the mounted filesystem.
We can use pyclowder.datasets.get_file_list to enumerate the files in the dataset and continue to use the modified file download function for each one.
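The dataset case in step 3 could look roughly like the sketch below. It assumes that each entry returned by pyclowder.datasets.get_file_list is a dict with an "id" key, and that objects sit directly under the mount root, named by their file id. Both assumptions need to be checked against the actual bucket layout. dataset_file_paths is a hypothetical helper name, not part of pyclowder.

```python
import os
from typing import Dict, Iterable, List


def dataset_file_paths(file_entries: Iterable[Dict[str, str]], mount_root: str) -> List[str]:
    """Map dataset file entries to paths on the mounted filesystem.

    Assumption: each entry has an "id" key and the object is stored at
    <mount_root>/<id>. Entries with no matching file are skipped.
    """
    paths = []
    for entry in file_entries:
        candidate = os.path.join(mount_root, entry["id"])
        if os.path.exists(candidate):
            paths.append(candidate)
    return paths
```

Skipping missing entries instead of raising lets the caller compare the result length with the file list and decide whether to fall back to a normal download.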
Similar to how it was previously done for a locally mounted filesystem, we will modify _check_for_local_file in connectors. We will introduce a new environment variable, MINIO_MOUNT, analogous to MOUNTED_PATHS. The function will then look up the file id under the MinIO mount and return that path instead of downloading. This will involve editing extractors.py and connectors.py.
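A minimal sketch of the lookup described above, written as a standalone function rather than as the real _check_for_local_file. It assumes MINIO_MOUNT holds a single directory path and that a file is stored at <MINIO_MOUNT>/<file id>. resolve_mounted_file is a hypothetical name used only for illustration.

```python
import os
from typing import Optional


def resolve_mounted_file(file_id: str) -> Optional[str]:
    """Return the path of file_id under the MinIO mount, or None.

    Assumption: MINIO_MOUNT is a single directory and objects are
    named by file id. None tells the caller to download as usual.
    """
    mount_root = os.environ.get("MINIO_MOUNT")
    if not mount_root:
        return None  # variable unset or empty: mounted mode is off
    candidate = os.path.join(mount_root, file_id)
    return candidate if os.path.isfile(candidate) else None
```

Returning None when the variable is unset keeps the standard MinIO download as the default behaviour, so existing extractors are unaffected unless the mount is configured.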
Testing Ground
In the extractors-huggingface repo, the two extractors pipeline-image-classification-file and pipeline-image-classification-dataset will be the best place to test the modified APIs.
Mount the MinIO bucket to a clowderfs directory in the VM, then test the pyclowder APIs with the extractors locally.
With the recent pushes to the branch, the code can read locally mounted files. I have run inference with pipeline-image-classification-file.
The dataset reading code still needs to be completed. For now I can print all the file paths, but I need to see how the extractor can use them.