NVIDIA Neural Modules 1.15.0
Highlights
NeMo ASR
- HybridTransducer-CTC ASR
- Greedy timestamp decoding with inference script
- MHA adapters
- Conformer local attention (longformer)
- High level beam search API
- Multiblank transducer
- Multi-channel audio processing model
- AIstore for ASR datasets
NeMo Megatron
- ALiBi position embeddings support for T5
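For readers unfamiliar with ALiBi: instead of adding position embeddings to the inputs, it adds a head-specific linear penalty on token distance to the attention scores. The sketch below is plain Python and is not NeMo's implementation. The function names `alibi_slopes` and `alibi_bias` are hypothetical, the head count is assumed to be a power of two, and the symmetric `|i - j|` distance is an assumption for a bidirectional encoder.

```python
def alibi_slopes(num_heads):
    """Per-head slopes as a geometric sequence (assumes num_heads is a power of two)."""
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Return bias[h][i][j] = -slope_h * |i - j|, to be added to attention logits.

    The symmetric distance is an assumption for bidirectional attention;
    a causal decoder would only use positions j <= i.
    """
    slopes = alibi_slopes(num_heads)
    return [
        [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]
        for slope in slopes
    ]


if __name__ == "__main__":
    bias = alibi_bias(num_heads=8, seq_len=4)
    # Each head penalizes distant tokens at its own rate.
    for h, head in enumerate(bias[:2]):
        print(f"head {h}: first row = {head[0]}")
```

Heads with larger slopes attend mostly to nearby tokens, while heads with smaller slopes keep a longer effective context.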
NeMo TTS
- Chinese TTS pipeline with polyphone disambiguation
NeMo Core
- Optimizer based EMA
- MLFlow logger support
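As background for the EMA item above, an exponential moving average keeps a smoothed copy of the model weights that is updated after each optimizer step. The following is a minimal sketch of that update rule only. The name `ema_update` and the list-of-floats weight representation are hypothetical, and how NeMo wires this into the optimizer is not shown here.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """One EMA step: ema = decay * ema + (1 - decay) * current weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


if __name__ == "__main__":
    weights = [1.0, -2.0]
    ema = list(weights)  # start the average from the initial weights
    for step in range(3):
        # Stand-in for an optimizer step that changes the weights.
        weights = [w + 0.1 for w in weights]
        ema = ema_update(ema, weights, decay=0.9)
        print(step, weights, ema)
```

A decay close to 1 makes the averaged weights change slowly, which is why the EMA copy is typically used for evaluation rather than training.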
Models
- stt_eo_conformer_ctc_large (HF, NGC) Esperanto ASR model.
- stt_eo_conformer_transducer_large (HF, NGC) Esperanto ASR model.
Detailed Changelogs
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.12
ASR
Changelog
- optimized loop and bugfix by @Jorjeous :: PR: #5573
- Update torchmetrics by @nithinraok :: PR: #5566
- Add an option to defer data setup from init to setup by @anteju :: PR: #5569
- AIStore for ASR datasets by @anteju :: PR: #5462
- Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
- Update documentation and tutorials for Adapters by @titu1994 :: PR: #5610
- Conformer local attention by @sam1373 :: PR: #5525
- Add core classes and functions for online clustering diarizer part 1 by @tango4j :: PR: #5526
- [Add] ASR+VAD Inference Pipeline by @stevehuang52 :: PR: #5575
- [ASR] Audio processing base, multi-channel enhancement models by @anteju :: PR: #5356
- Expose ClusteringDiarizer device by @SeanNaren :: PR: #5681
- Add Beam Search support to ASR transcribe() by @titu1994 :: PR: #5443
- Multiblank Transducer by @hainan-xv :: PR: #5527
- pin torchmetrics version by @nithinraok :: PR: #5720
- Update torchaudio dependency version for tutorials by @titu1994 :: PR: #5781
- update torchmetrics to latest version by @nithinraok :: PR: #5801
- Fix transducer and question answering tutorial bugs by @Zhilin123 :: PR: #5809
- [BugFix] Updated CTC decoders installation in tutorial by @vsl9 :: PR: #5833
- update torchmetrics args confusionmatrix by @nithinraok :: PR: #5853
- indentation fix by @nithinraok :: PR: #5861
- Fix wrong label mapping in batch_inference for label_model by @fayejf :: PR: #5767
TTS
Changelog
- Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
- [TTS] fix ranges of char set for accented letters. by @XuesongYang :: PR: #5607
- [TTS] add type hints and change variable names for tokenizers and g2p by @XuesongYang :: PR: #5602
- Fixed RadTTS unit test by @borisfom :: PR: #5572
- [TTS][ZH] Disambiguate polyphones with augmented dict and Jieba segmenter for Chinese FastPitch by @yuekaizhang :: PR: #5541
- Add duration padding support for RADTTS inference by @kevjshih :: PR: #5650
- [TTS] add tts dict cust notebook by @ekmb :: PR: #5662
- [TN/TTS docs] TN customization, g2p docs moved to tts by @ekmb :: PR: #5683
- typo and link fixed by @ekmb :: PR: #5741
- link fixed by @ekmb :: PR: #5745
- Update Tacotron2 NGC checkpoint load to latest version by @redoctopus :: PR: #5760
- Docs g2p update by @ekmb :: PR: #5769
- [TTS][ZH] bugfix import jieba errors. by @XuesongYang :: PR: #5776
NLP / NMT
Changelog
- Text generation improvement (UI client, data parallel support) by @yidong72 :: PR: #5437
- O2 style amp for gpt3 ptuning by @JimmyZhang12 :: PR: #5246
- Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
- Bert interleaved by @shanmugamr1992 :: PR: #5556
- Port stateless timer to exp manager by @MaximumEntropy :: PR: #5584
- Add interface for making amax reduction optional for FP8 by @ksivaman :: PR: #5447
- Propagate attention_dropout flag for GPT-3 by @mikolajblaz :: PR: #5669
- Enc-Dec model size reporting fixes by @MaximumEntropy :: PR: #5623
- Add prompt learning tests by @arendu :: PR: #5649
- Fix missing torchelastic fixes for PTL 1.8 by @MaximumEntropy :: PR: #5691
- ALiBi Positional Embeddings by @michalivne :: PR: #5467
- Megatron export triton update by @Davood-M :: PR: #5766
- Fix transducer and question answering tutorial bugs by @Zhilin123 :: PR: #5809
- Update description for question answering tutorial by @Zhilin123 :: PR: #5814
- TPMLP for T5-based models by @Davood-M :: PR: #5840
- Megatron positional encoding alibi fix by @michalivne :: PR: #5808
Export
Changelog
General Improvements
Changelog
- Update to pytorch 22.12 container by @ericharper :: PR: #5694
- optimized loop and bugfix by @Jorjeous :: PR: #5573
- Expose ClusteringDiarizer device by @SeanNaren :: PR: #5681
- remove useless files. by @XuesongYang :: PR: #5580
- [Fix] setup_multiple validation/test data by @anteju :: PR: #5585
- Move to optimizer based EMA implementation by @SeanNaren :: PR: #5169
- [Temp workaround] Disable test with cache_audio to unblock CI by @anteju :: PR: #5615
- [EMA] Change success message to reduce confusion by @SeanNaren :: PR: #5621
- Temporarily disable prompt learning CI tests by @ericharper :: PR: #5633
- [Dockerfile] Remove AIS archive from docker image by @anteju :: PR: #5629
- [workflow] add exclude labels option to ignore cherry-picks in releas… by @XuesongYang :: PR: #5645
- Add DLLogger support to exp_manager by @milesial :: PR: #5658
- Fix EMA restart by allowing device to be set by the class init by @SeanNaren :: PR: #5668
- Remove SDP (moved to separate repo) - merge to main by @erastorgueva-nv :: PR: #5630
- temp disable speaker recognition CI test by @fayejf :: PR: #5696
- Don't print exp_manager warning when max_steps == -1 by @milesial :: PR: #5725
- Add tabular data generation documents to the index file by @yidong72 :: PR: #5733
- fix token id bug by @yidong72 :: PR: #5777
- Update numpy requirements from 1.21 to 1.22 by @Zhilin123 :: PR: #5785
- Fix setuptools to usable version by @titu1994 :: PR: #5798
- add apt-get upgrade -y in dockerfile by @fayejf :: PR: #5817
- Update NeMo Multi-Run docs by @titu1994 :: PR: #5844
- add ambernet to readme by @fayejf :: PR: #5872
- update apex install instructions for 1.15 by @ericharper :: PR: #5901