Releases: MLMI2-CSSI/foundry
v0.7.2 -- Download using HTTPS by default
Summary
Updated dataset download functionality to use HTTPS by default, so a user no longer has to set up a Globus endpoint in order to download or load a dataset. This should lower the barrier to entry for using Foundry datasets and improve the overall user experience. Documentation was updated accordingly.
Added minor updates to fix broken links in documentation and improve/standardize docstrings.
Fixed a broken GitBook integration such that changes made in our documentation are now reflected in the GitHub repo.
Changes made in line with JOSS paper submission review.
What's Changed
- docstring updates by @ascourtas in #415
- Add a Dependabot config to autoupdate GitHub action versions by @kurtmckee in #393
- Joss https default by @ascourtas in #416
New Contributors
- @kurtmckee made their first contribution in #393
Full Changelog: v0.7.1...v0.7.2
v0.7.1 - Code fixes, splits specification, and metadata validation and handling
This release addresses some previous bugs with loading datasets on init(); that functionality has been removed for the time being, in favor of a more robust refactor in future releases.
In addition to code fixes, we've added the following functionality and improvements:
Users can now download only specific splits when calling load_data(); this reduces the time and RAM required when only part of a dataset is needed.
Ex: tr = f.load_data(split="train")
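To illustrate why requesting a single split saves time and memory, here is a minimal sketch in plain Python. It does not use Foundry itself: `SPLIT_FILES`, `read_split`, and this `load_data` are hypothetical stand-ins, and the in-memory lists take the place of files that would otherwise be downloaded and parsed.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the files behind each split of a dataset.
SPLIT_FILES: Dict[str, List[int]] = {
    "train": [1, 2, 3, 4],
    "test": [5, 6],
}

# Records which splits were actually read, to make the saving visible.
reads: List[str] = []


def read_split(name: str) -> List[int]:
    """Pretend to download and parse one split."""
    reads.append(name)
    return list(SPLIT_FILES[name])


def load_data(split: Optional[str] = None) -> Dict[str, List[int]]:
    """Load every split, or only the requested one."""
    if split is None:
        names = list(SPLIT_FILES)
    elif split in SPLIT_FILES:
        names = [split]
    else:
        raise ValueError(
            f"Unknown split {split!r}; expected one of {sorted(SPLIT_FILES)}"
        )
    return {name: read_split(name) for name in names}


# Only the "train" split is read; "test" is never touched.
tr = load_data(split="train")
```

In this sketch, `reads` contains only `"train"` after the call, which is the saving the split argument is meant to provide.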
Also, dataset metadata are now validated with appropriate error handling, so a user publishing a new dataset is instantly notified if any part of their specification is incompatible with the metadata schema. This will improve the user experience for both dataset publishers and consumers.
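The kind of early validation described above can be sketched with the standard library alone. The required fields and the `validate_metadata` helper below are assumptions made for illustration; they are not Foundry's actual schema or validation code.

```python
from typing import Any, Dict, List

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS: Dict[str, type] = {
    "short_name": str,
    "data_type": str,
    "splits": list,
}


def validate_metadata(metadata: Dict[str, Any]) -> None:
    """Raise ValueError listing every problem found, so all issues surface at once."""
    problems: List[str] = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in metadata:
            problems.append(f"missing required field '{field}'")
        elif not isinstance(metadata[field], expected):
            problems.append(
                f"field '{field}' should be {expected.__name__}, "
                f"got {type(metadata[field]).__name__}"
            )
    if problems:
        raise ValueError("Invalid dataset metadata: " + "; ".join(problems))


# A well-formed record passes silently.
validate_metadata({"short_name": "demo", "data_type": "tabular", "splits": []})

# A malformed record is rejected with a message naming each problem.
try:
    validate_metadata({"short_name": 42})
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```

Collecting all problems before raising, rather than stopping at the first, is one design choice that lets a publisher fix a specification in a single pass.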
Additionally, this release includes code cleanup, docs improvements, and new applied AI examples.
What's Changed
- Metadata error handling by @blue442 in #377
- remove breaking changes that load on init by @ascourtas in #406
- Split specification by @blue442 in #344
- automating api documentation using github action by @blue442 in #342
- Removing remnants of XTract by @blaiszik in #355
- Merge Split specification from dev into main (#344) by @blaiszik in #356
- Update README.md with contributing instructions by @ascourtas in #357
- add jingrui examples by @ascourtas in #363
- Adds note for quickstart globus set to false by @marshallmcdonnell in #374
Full Changelog: v0.7.0...v0.7.1
[YANKED] v0.7.0 -- New feature: loading split specifications
Yanked due to mistakenly deploying "Load on init" changes before they were ready
Overview
Users can now download only specific splits when calling load_data(); this reduces the time and RAM required when only part of a dataset is needed.
Ex: tr = f.load_data(split="train")
Additionally, this release includes code cleanup, docs improvements, and new applied AI examples.
What's Changed
- Split specification by @blue442 in #344
- automating api documentation using github action by @blue442 in #342
- Removing remnants of XTract by @blaiszik in #355
- Merge Split specification from dev into main (#344) by @blaiszik in #356
- Update README.md with contributing instructions by @ascourtas in #357
- add jingrui examples by @ascourtas in #363
- Load on init by @blue442 in #358
- Adds note for quickstart globus set to false by @marshallmcdonnell in #374
New Contributors
- @marshallmcdonnell made their first contribution in #374
Full Changelog: v0.6.3...v0.7.0
v0.6.3 -- Patch for requirements
Patch to update the base requirements for DLHub SDK to 2.0.3
v0.6.2 -- Swap default download method to HTTPS
- This release changes the default download path to HTTPS from Globus. Users can still download using Globus by setting the flag globus=True in the Foundry load() function.
v0.6.0 -- Add ability to upload via HTTPS
Users now have the option to upload datasets via HTTPS instead of using Globus Connect Personal; this saves users time and energy.
Includes function name changes as well as updates for compatibility with DLHub SDK >= 2.0.x (container-service compat release).
Other changes include:
- minor bugfixes
- data publishing notebook updates
v0.5.1 -- Fix PyPI installation, improve documentation
What's Changed
- Improve HTTPS downloads by @WardLT in #277
- Update README.md by @Aadit-Ambadkar in #275
- updated model pub notebook by @kjschmidt913 in #284
- Set Header Images to an absolute URL via raw.githubusercontent by @cyschneck in #283
- remove extraneous "no" by @sgbaird in #299
- Add website link by @WardLT in #301
New Contributors
- @WardLT made their first contribution in #277
- @cyschneck made their first contribution in #283
- @sgbaird made their first contribution in #299
Full Changelog: v0.5.0...v0.5.1
v0.5.0 -- Publish models to DLHub and improved dataset functionality
Overview
New functionality includes:
- publish models to DLHub using publish_model()
- search() capability for datasets
- generate a BibTeX citation using get_citation()
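As a rough illustration of what generating a BibTeX citation involves, the sketch below formats a small metadata record into a `@misc` entry. The `format_bibtex` helper, the citation key, and the field names are hypothetical; Foundry's `get_citation()` may use different fields and a different entry type.

```python
from typing import Dict, List, Union

Field = Union[str, int, List[str]]


def format_bibtex(key: str, record: Dict[str, Field]) -> str:
    """Render a metadata record as a BibTeX @misc entry."""
    lines = [f"@misc{{{key},"]
    for name, value in record.items():
        if isinstance(value, list):
            # BibTeX separates multiple authors with the word "and".
            value = " and ".join(value)
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Made-up record used purely for demonstration.
entry = format_bibtex(
    "example_dataset",
    {
        "author": ["Doe, Jane", "Roe, Richard"],
        "title": "An Example Dataset",
        "year": 2022,
    },
)
print(entry)
```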
We also improved info logging, added testing for Python 3.10, parallelized HTTPS downloads for faster dataset loading, and added buttons in our example notebooks to open them in Google Colab.
Also includes code cleanup, minor bug fixes, and improved testing capabilities.
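Parallelizing downloads is typically done with a thread pool, since the work is I/O-bound. The sketch below shows that pattern with `concurrent.futures`; `fetch` is a stand-in that only simulates a download, the URLs are placeholders, and neither `fetch` nor `download_all` reflects Foundry's internal code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def fetch(url: str) -> bytes:
    """Stand-in for an HTTPS request; sleeps briefly instead of touching the network."""
    time.sleep(0.05)
    return url.encode("utf-8")


def download_all(urls: List[str], max_workers: int = 4) -> Dict[str, bytes]:
    """Fetch all URLs concurrently and return their contents keyed by URL."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with urls.
        contents = list(pool.map(fetch, urls))
    return dict(zip(urls, contents))


# Placeholder URLs; nothing is actually requested.
urls = [f"https://example.org/file_{i}.csv" for i in range(8)]
results = download_all(urls)
```

With four workers, the eight simulated downloads overlap instead of running one after another, which is where the speedup for multi-file datasets comes from.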
What's Changed
- Fix Logging and Remove unnecessary code by @Aadit-Ambadkar in #228
- address code style (Issue #245) by @isaac-darling in #246
- Blaiszik patch 1 by @blaiszik in #238
- add Python 3.10 tests and flake8 error checking by @isaac-darling in #231
- Add search by @blaiszik in #256
- Create README.md by @kjschmidt913 in #261
- Open in Colab Buttons by @Aadit-Ambadkar in #253
- Moving some functions to utils by @blaiszik in #262
- added the publish wrapper by @isaac-darling in #235
- Update README.md by @kjschmidt913 in #269
- Update README.md by @kjschmidt913 in #270
- Update read logic by @blaiszik in #271
- Added new search test. Removed some stray commented-out code. by @blaiszik in #272
- Parallelize HTTPS downloads. Remove joblib and six requirements by @blaiszik in #273
- Get citation function by @blaiszik in #274
- Dev by @ascourtas in #265
Full Changelog: v0.4.0...v0.5.0
v0.4.0 - Search functionality, plus updates to logging and testing
What's Changed
- Added search() functionality to search through datasets, in addition to listing them all with list()
- Improved logging capabilities and error messages
- Added flake8 linting as mandatory
- Made style changes per flake8 feedback
- Added additional testing functionality
Full Changelog: v0.3.0...v0.4.0
v0.3.0 - Minor release for compatibility with newest FuncX and DLHub
Overview
Compatibility with FuncX 1.0.x and DLHub SDK 1.0.0. Also includes testing clean up and improvements, the ability for a user to pass the funcX endpoint ID into run(), initial PyTorch Dataset support, and minor changes.
What's Changed
- test python package dependency caching optimization by @BraedenCu in #191
- add option to pass in funcx_endpoint in run() by @ascourtas in #194
- Add Convert to Pytorch Dataset Functionality by @Aadit-Ambadkar in #200
- Replace PNG with SVG by @Aadit-Ambadkar in #202
- Fix .load function by @Aadit-Ambadkar in #227
- fix unused imports, code style, and update syntax by @isaac-darling in #229
- merge tests into single file by @isaac-darling in #233
- To tf dataset by @Aadit-Ambadkar in #201
- Funcx 1.0.x compat by @ascourtas in #249
- Dev - funcx and dlhub 1.0.0 compatibility by @ascourtas in #250
New Contributors
- @BraedenCu made their first contribution in #191
- @isaac-darling made their first contribution in #229
Full Changelog: 0.2.2...v0.3.0