Releases: Lightning-AI/litData
0.2.19
What's Changed
- Fix: failing tests due to future warning related to torch.load(weights_only=True) by @deependujha in #272
- support downloading from azure blob storage by @jaehwana2z in #262
- Bump Lightning-AI/utilities from 0.11.5 to 0.11.6 by @dependabot in #274
- resolved downloading data from azure blob storage by @mohanreddypmr in #275
- Fix filename in merging compressed datasets by @bhimrazy in #277
- Bad overriding of thread._delete by @tchaton in #278
- Bump version 0.2.19 by @tchaton in #279
New Contributors
- @jaehwana2z made their first contribution in #262
- @mohanreddypmr made their first contribution in #275
Full Changelog: v0.2.18...v0.2.19
0.2.18
What's Changed
- Always send the rank when broadcasting by @awaelchli in #257
- fix: Handle missing 'encryption' field in legacy dataset by @csy1204 in #259
- Update map() and optimize() documentation by @senarvi in #264
- Correct README.md for CombinedStreamingDataset with proportions by @hiyyg in #266
- Update README.md by @tchaton in #267
- Update README.md by @tchaton in #268
- Update README.md by @tchaton in #269
- Bump version 0.2.18 by @tchaton in #270
New Contributors
Full Changelog: v0.2.17...v0.2.18
v0.2.17
This release contains new features and fixes for distributed training.
Important: This release fixes hangs in distributed training by ensuring that the same number of batches is returned on each rank (#237). However, this and other fixes change how samples are assigned to ranks and are therefore a breaking change. If you are using the stateful data loader feature, resuming from checkpoints created with an older version of LitData will not be valid.
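To illustrate why uneven batch counts matter: collective operations in distributed training wait for every rank, so a rank that runs out of batches early can leave the others blocked. The sketch below is not LitData's actual assignment logic; `even_rank_assignment` and `num_batches` are hypothetical helpers that show one simple way to give every rank the same number of full batches, by dropping the samples that do not divide evenly.

```python
def even_rank_assignment(num_samples, world_size, batch_size):
    """Hypothetical helper: give each rank an equal, batch-aligned slice of indices.

    Samples that do not fit evenly are dropped (an assumption made for this
    sketch; a real loader may pad or redistribute them instead).
    """
    per_rank = num_samples // world_size
    per_rank -= per_rank % batch_size  # keep only full batches on each rank
    return [
        list(range(rank * per_rank, (rank + 1) * per_rank))
        for rank in range(world_size)
    ]


def num_batches(indices, batch_size):
    """Number of full batches a rank would produce from its indices."""
    return len(indices) // batch_size


# 103 samples over 4 ranks with batch size 8: each rank keeps 24 samples.
assignment = even_rank_assignment(num_samples=103, world_size=4, batch_size=8)
counts = [num_batches(indices, 8) for indices in assignment]
print(counts)
```

Because the assignment of indices to ranks depends on this kind of arithmetic, changing it also changes which samples a given rank has already seen, which is why a loader state saved under the old assignment cannot be resumed safely.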
What's Changed
- Feat: Updates readme and a few nitpicks by @deependujha in #223
- docs: add `Specify cache directory` by @csy1204 in #229
- Enable compatibility with Numpy 2.0 by @weiji14 in #230
- Fix typo in resolver.py by @lud-ds in #239
- Feature: Add support for encryption and decryption of data at chunk/sample level by @bhimrazy in #219
- Fix uneven batches in distributed dataloading by @awaelchli in #237
- feat: add a custom storage options param by @csy1204 in #246
- Fix index errors on world size > 0 by @awaelchli in #252
New Contributors
- @csy1204 made their first contribution in #229
- @weiji14 made their first contribution in #230
- @lud-ds made their first contribution in #239
Full Changelog: v0.2.16...v0.2.17
v0.2.16
What's Changed
- Feat: adds support for reading mosaic mds written dataset by @bhimrazy in #210
- Fix: local path issue in distributed optimize method by @deependujha in #214
- Fix resuming dataset state by @awaelchli in #217
Full Changelog: v0.2.15...v0.2.16
v0.2.15
What's Changed
- Fix: dataloader state dict indexerror by @esivonxay-cognitiv in #198
- Bump coverage from 7.5.3 to 7.5.4 by @dependabot in #200
- Bump pytest from 8.2.1 to 8.2.2 by @dependabot in #203
- Bump lightning-cloud from 0.5.69 to 0.5.70 by @dependabot in #202
- Feat: checkpoint optimize function to restart after crash by @deependujha in #206
- Fix: magic serializer issue by @deependujha in #207
- Bump version 0.2.15 by @deependujha in #208
New Contributors
- @esivonxay-cognitiv made their first contribution in #198
Full Changelog: v0.2.14...v0.2.15
Release 0.2.14
What's Changed
- Add utility to merge datasets together by @tchaton in #190
- Fix: unexpected behaviours (bugs) in train_test_split fixed by @deependujha in #192
- Release LitData 0.2.14 by @tchaton in #194
Full Changelog: v0.2.13...v0.2.14
Release 0.2.13
What's Changed
- Update README with config for MinIO by @bhimrazy in #174
- Bump pypa/gh-action-pypi-publish from 1.8.14 to 1.9.0 by @dependabot in #177
- Fix: error while splitting dataset with `splits=[0.1, 0.2, 0.7]` and support split of 0.0 by @deependujha in #187
- Feat: Append data to pre-optimize dataset by @deependujha in #184
- Fix: Resolve the default weights of the combined dataset by @tchaton in #188
- Bump version 0.2.13 by @tchaton in #189
Full Changelog: v0.2.12...v0.2.13
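As a rough illustration of the splitting fix in #187, the sketch below turns a list of fractions such as `splits=[0.1, 0.2, 0.7]` into per-split item counts and accepts a fraction of 0.0. It is not LitData's implementation: `split_lengths` is a hypothetical helper, and the small tolerance used to absorb floating-point rounding is an assumption of this sketch.

```python
import math

EPS = 1e-9  # tolerance for floating-point rounding (assumption of this sketch)


def split_lengths(num_items, splits):
    """Hypothetical helper: convert fractional splits into item counts."""
    if any(s < 0 for s in splits):
        raise ValueError("split fractions must be non-negative")
    if sum(splits) > 1.0 + EPS:
        raise ValueError("split fractions must not sum to more than 1")
    # A fraction of 0.0 simply yields an empty split.
    return [math.floor(num_items * s + EPS) for s in splits]


lengths = split_lengths(100, [0.1, 0.2, 0.7])
with_empty = split_lengths(100, [0.0, 1.0])
print(lengths, with_empty)
```

Comparing the sum against `1.0 + EPS` rather than exactly `1.0` avoids rejecting valid inputs whose fractions do not add up exactly in binary floating point.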
Release v0.2.12
What's Changed
- Resolve num_workers when the user provides 0 by @tchaton in #173
- Add feature to slice, subsample and split dataset by @deependujha in #161
- Release version 0.2.12 by @tchaton in #176
New Contributors
- @deependujha made their first contribution in #161
Full Changelog: v0.2.11...v0.2.12
Release v0.2.11
Release v0.2.10
What's Changed
- Fix litdata on colab by @tchaton in #166
- Add first draft for multi modal model training text & image by @rakro101 in #160
- Bump coverage from 7.5.0 to 7.5.3 by @dependabot in #149
- Bump lightning-cloud from 0.5.68 to 0.5.69 by @dependabot in #150
- Bump version 0.2.10 by @tchaton in #167
New Contributors
Full Changelog: v0.2.9...v0.2.10