Slow performance when using CloudBucketMount as a dataset for model training #1839
Comments
Hey, thanks for the issue report.
If the code is simple it'd be great to have it for a reproduction.
We have been using this as of a few days ago 🙂. The cache is not preserved across Function executions though, so I'd expect the caching benefit to apply only to the 2nd and subsequent reads of a file, not the first. I'm guessing you ran this test within the last 24 hours and thus should have had the caching? Is your dataset loader using readahead? Here's an example of what I mean, where it's called "predownload": https://docs.mosaicml.com/projects/streaming/en/latest/_modules/streaming/base/dataset.html.
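To make the readahead idea concrete, here is a minimal sketch using only the Python standard library. It is not the MosaicML "predownload" implementation and not anything Modal ships; the names `readahead`, `depth`, and `workers` are made up for illustration. It assumes the dataset is a list of file paths on the mount and keeps a bounded number of reads in flight ahead of the consumer, so the training loop is less likely to wait on a cold first read.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def read_bytes(path: Path) -> bytes:
    # One blocking read; on a bucket mount this is where the network latency is paid.
    return path.read_bytes()


def readahead(
    paths: Iterable[Path], depth: int = 8, workers: int = 8
) -> Iterator[Tuple[Path, bytes]]:
    """Yield (path, data) in input order while up to `depth` reads are pending."""
    pending = deque()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path in paths:
            # Schedule the read before the consumer asks for it.
            pending.append((path, pool.submit(read_bytes, path)))
            if len(pending) >= depth:
                done_path, future = pending.popleft()
                yield done_path, future.result()
        # Drain whatever is still in flight once the input is exhausted.
        while pending:
            done_path, future = pending.popleft()
            yield done_path, future.result()
```

A data loader could iterate over `readahead(image_paths)` instead of opening each file on demand. Whether this helps depends on how much of the slowdown comes from per-file latency rather than raw throughput.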
The training still seems very slow; I've tested on both L4 and A100.
For the dataset, you can use the COCO dataset: https://cocodataset.org/#download and store it in S3. For the ultralytics fork, you can use:
P.S. I tried removing the Neptune code, but it didn't make training noticeably faster.
Firstly, thanks for the splendid work the team has done for this project, amazing stuff.
I was trying to figure out a way to train my ML models using Modal. My dataset is in S3 (10k+ high-res images), so I thought it would be interesting to try out the CloudBucketMount to read from the dataset. It worked, but it was considerably slower than using an L4 on another server (an AWS server finishes training the yolov8n model in about 3 minutes; on Modal it timed out at 50 minutes with less than 50% progress). I was wondering, would it be faster if Mountpoint caching were used to cache object data, in case it isn't used so far? This should lead to fewer requests for the files and faster performance.
I suspect it would be better for me to just sync my S3 data to a connected Modal volume on startup.
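As a rough sketch of that sync-on-startup idea, the following uses only the Python standard library to copy a directory tree from the mounted bucket path to a faster local or volume path before training starts. The function names and the example paths are hypothetical, and it is not a Modal API. It assumes the bucket is already visible as a directory and skips files whose size already matches at the destination.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def _copy_if_needed(src_file: Path, dst_file: Path) -> bool:
    # Skip files that already exist with the same size (a cheap, imperfect check).
    if dst_file.exists() and dst_file.stat().st_size == src_file.stat().st_size:
        return False
    dst_file.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src_file, dst_file)
    return True


def sync_tree(src: Path, dst: Path, workers: int = 16) -> int:
    """Copy every file under `src` to the same relative path under `dst`.

    Returns the number of files actually copied.
    """
    files = [p for p in src.rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        copied = pool.map(
            lambda p: _copy_if_needed(p, dst / p.relative_to(src)), files
        )
        return sum(copied)


# Hypothetical usage at the start of a training function:
# sync_tree(Path("/bucket/coco"), Path("/tmp/coco"))
```

Copying in parallel matters here because a dataset of many small images is usually bound by per-object latency, so a single-threaded copy would hit the same bottleneck as the training loop itself.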