Releases: ohmeow/blurr
v.1.0.0
The official v.1 release of ohmeow-blurr
This is a massive refactoring of the previous iterations of blurr, including namespace changes that will make it easier for us to add support for vision, audio, and other transformers in the future. If you've used any previous version of blurr, or the development build we covered in part 2 of the W&B study group, please make sure you read the docs and note the namespace changes.
To get up to speed with how to use this library, check out the W&B x fastai x Hugging Face study group. The docs are your friend and full of examples as well. I'll be working on updating the other examples floating around the internet as I have time.
If you have any questions, please ask in the hf-fastai channel of the fastai Discord or open a GitHub issue. As always, any and all PRs are welcome.
0.0.26 release
Check out the README for more info.
This release fixes a couple of issues and also includes a few breaking changes. Make sure you update fastai to >= 2.3.1 and Hugging Face transformers to >= 4.5.x.
Goodbye 2020 release!
- Updated the Seq2Seq models to use some of the latest huggingface bits like tokenizer.prepare_seq2seq_batch.
- Separated out the Seq2Seq and Token Classification metrics into metrics-specific callbacks for a better separation of concerns. As a best practice, you should now pass them as callbacks to fit_one_cycle (and friends) rather than attach them to your Learner.
- NEW: Translation models are now available in blurr, joining causal language modeling and summarization in our core Seq2Seq stack.
- NEW: Integration of huggingface's Seq2Seq metrics (rouge, bertscore, meteor, bleu, and sacrebleu). Plenty of info on how to set this up in the docs.
- NEW: Added default_text_gen_kwargs, a method that, given a huggingface config, model, and (optionally) a task, returns the default/recommended kwargs for any text generation model.
- A lot of code cleanup (e.g., refactored naming and removal of redundant code into classes/methods)
- More model support and more tests across the board! Check out the docs for more info.
- Misc. validation improvements and bug fixes.
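The metrics-callback split can be sketched generically. Everything below (Trainer, RougeMetricCallback, fit) is an illustrative toy, not blurr's or fastai's actual API; it just shows why passing metric callbacks per-fit gives a cleaner separation of concerns than attaching them permanently to the Learner.

```python
# Illustrative sketch (not blurr's actual API): metric callbacks scoped
# to a single fit call instead of living on the trainer forever.

class RougeMetricCallback:
    """Hypothetical metric callback; runs its metric after each epoch."""
    def after_epoch(self, trainer):
        trainer.logs.append("rouge computed")

class Trainer:
    def __init__(self):
        self.logs = []

    def fit(self, epochs, callbacks=()):
        # Callbacks exist only for this fit call, so expensive Seq2Seq
        # metrics don't slow down unrelated training runs on the same trainer.
        for _ in range(epochs):
            for cb in callbacks:
                cb.after_epoch(self)

trainer = Trainer()
trainer.fit(2, callbacks=[RougeMetricCallback()])
print(trainer.logs)  # ['rouge computed', 'rouge computed']
```

Because the callback is an argument to fit rather than part of the trainer's state, a later fit call without it pays no metric-computation cost.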
See the docs for each task for more info!
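As a rough illustration of the idea behind default_text_gen_kwargs, a helper like it can read generation defaults off a Hugging Face-style config and fall back to sensible values. This sketch is hypothetical (it omits the model argument and is not blurr's actual implementation):

```python
# Hypothetical sketch of a default_text_gen_kwargs-style helper
# (NOT blurr's real implementation): collect text-generation kwargs
# from a config object, with fallbacks when attributes are absent.

def default_text_gen_kwargs(config, task=None):
    kwargs = {
        "max_length": getattr(config, "max_length", 20),
        "num_beams": getattr(config, "num_beams", 1),
        "early_stopping": getattr(config, "early_stopping", False),
    }
    # An optional task hint could adjust defaults, e.g. summarization
    # models typically benefit from beam search.
    if task == "summarization":
        kwargs["num_beams"] = max(kwargs["num_beams"], 4)
    return kwargs

class FakeConfig:
    """Stand-in for a Hugging Face model config."""
    max_length = 142
    num_beams = 4

print(default_text_gen_kwargs(FakeConfig(), task="summarization"))
# {'max_length': 142, 'num_beams': 4, 'early_stopping': False}
```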
PyTorch 1.7 and fast.ai 2.1.x compliant release
Makes blurr PyTorch 1.7 and fast.ai 2.1.x compliant.
Added a new examples section.
Misc. improvements/fixes.
On-the-fly Batch-Time Tokenization Release
This release simplifies the API and introduces a new on-the-fly tokenization feature whereby all tokenization happens during mini-batch creation. There are several upsides to this approach. First, it gets you training sooner, since there is no up-front tokenization pass over the dataset. Second, it reduces RAM utilization while reading your raw data (especially nice with very large datasets that give folks problems on platforms like Colab). Lastly, I believe the approach provides some flexibility to include data augmentation and/or build adversarial models, among other things.
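The batch-time tokenization idea can be sketched with a toy whitespace tokenizer. Everything here (toy_tokenize, collate, the pad id) is illustrative and not blurr's actual code: the point is that the dataset holds raw strings, and ids plus padding are produced only when a mini-batch is assembled, and only to the width of that batch.

```python
# Minimal sketch of on-the-fly batch-time tokenization (toy tokenizer,
# not blurr's): raw strings stay in memory, and tokenization + padding
# happen per mini-batch, which keeps RAM usage low for large corpora.

def toy_tokenize(text, vocab):
    # Map each whitespace token to an id, adding unseen words to the vocab.
    return [vocab.setdefault(w, len(vocab)) for w in text.split()]

def collate(batch_texts, vocab, pad_id=-1):
    ids = [toy_tokenize(t, vocab) for t in batch_texts]
    width = max(len(seq) for seq in ids)
    # Pad every sequence only to the longest sequence in *this* batch.
    return [seq + [pad_id] * (width - len(seq)) for seq in ids]

vocab = {}
raw = ["hello world", "hello there general kenobi"]
print(collate(raw, vocab))
# [[0, 1, -1, -1], [0, 2, 3, 4]]
```

In a real pipeline this role is played by the DataLoader's batch-creation step, which is also the natural place to hook in text augmentation before tokenization.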