NVIDIA Neural Modules 1.16.0
Highlights
NeMo ASR
- ASR Evaluator
- Multi-channel dereverberation algorithm
- Hybrid ASR-TTS Models
- Flashlight Decoder Beam Search
- FastConformer Encoder with 8x subsampling
NeMo TTS
- SSL Voice Conversion
- Spectrogram Enhancer
- VITS
NeMo Megatron
- Per microbatch dataloader for GPT and BERT
- Adapters compatible with Faster Transformer
NeMo Core
- Nested model support
NeMo Tools
- NeMo Forced Aligner
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:23.01
ASR
Changelog
- Fix for incorrect computation of batched alignment in transducers by @Kipok :: PR: #5692
- Set the stream position to 0 for pydub by @jonghwanhyeon :: PR: #5752
- [Fix] ConformerEncoder forward when length is None by @anteju :: PR: #5761
- ASR evaluator by @fayejf :: PR: #5728
- [ASR][Test] Enable test for cache audio with a single worker by @anteju :: PR: #5763
- Flashlight Decoder for NeMo by @trias702 :: PR: #5790
- Fix data simulator by @stevehuang52 :: PR: #5813
- [ASR] Mask-based dereverb algorithm by @anteju :: PR: #5693
- Concat dataset and aistore support for label models by @Kipok :: PR: #5826
- Adding new features and speed up for multi-speaker data simulator by @tango4j :: PR: #5846
- Add Esperanto ASR example by @andrusenkoau :: PR: #5772
- Fix memory allocation of NeMo Multi-speaker Data Simulator by @stevehuang52 :: PR: #5864
- [ASR] Separate Audio-to-Text (BPE, Char) dataset construction by @artbataev :: PR: #5774
- Reduce memory usage in getMultiScaleCosAffinityMatrix function by @gabitza-tech :: PR: #5876
- Hybrid ASR-TTS models by @artbataev :: PR: #5659
- Set providers for onnxruntime inference session by @athitten :: PR: #5903
- [ASR] Configurable metrics for audio-to-audio + removed experimental decorators by @anteju :: PR: #5827
- Correct doc for RNNT transcribe() function by @titu1994 :: PR: #5904
- Update isort to the latest version by @artbataev :: PR: #5895
- FilterbankFeaturesTA to match FilterbankFeatures by @msis :: PR: #5913
- Fix hybridasr bug by @VahidooX :: PR: #5950
- replace symbols by @nithinraok :: PR: #5974
- fast conformer configs and doc by @bmwshop :: PR: #5970
- Update TitaNet-L and MSDD models by @nithinraok :: PR: #6023
- Fix enhancer usage by @artbataev :: PR: #6059
- update librosa args by @nithinraok :: PR: #6086
- Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
- Fix k2 and torchaudio installation (Docker, macOS). Cherry-pick (#6094) by @artbataev :: PR: #6124
TTS
Changelog
- [TTS] Update Spanish TTS model to 1.15 by @rlangman :: PR: #5742
- [TTS][DE] refine grapheme-based tokenizer and fastpitch training recipe on thorsten's neutral datasets. by @XuesongYang :: PR: #5753
- No-script TS export, prepared for ONNX export by @borisfom :: PR: #5653
- Fixing masking in RadTTS bottleneck layer by @borisfom :: PR: #5771
- Port Riva's mel cepstral distortion w/ dynamic time warping notebook by @redoctopus :: PR: #5778
- Update radtts' infer path by @blisc :: PR: #5788
- [TTS][DE] Augment tokenization/G2P to preserve capitalization of words and mix phonemes with word-level graphemes for an input text. by @XuesongYang :: PR: #5805
- [TTS] porting VITS implementation by @treacker :: PR: #5600
- [TTS][DE] updated IPA dictionary and heteronyms by @XuesongYang :: PR: #5860
- [TTS] GAN-based spectrogram enhancer by @racoiaws :: PR: #5565
- TTS inference with Heteronym classification model, hc model inference refactoring by @ekmb :: PR: #5768
- Remove MCD_DTW tarball by @redoctopus :: PR: #5889
- Hybrid ASR-TTS models by @artbataev :: PR: #5659
- Moved eval notebook data to aws by @redoctopus :: PR: #5911
- [G2P] fixed typos and broken import library. by @XuesongYang :: PR: #5978
- [G2P] backward compatibility for english tokenizer and bugfix by @XuesongYang :: PR: #5980
- fix links, add missing file by @ekmb :: PR: #6044
- [TTS] Spectrogram Enhancer: correct dim for length when loading data by @racoiaws :: PR: #6048
- [TTS] bugfix for fastpitch German tutorial by @XuesongYang :: PR: #6051
- [TTS] bugfix Chinese Fastpitch tutorial by @XuesongYang :: PR: #6055
- Fix enhancer usage by @artbataev :: PR: #6059
- [TTS] Spectrogram Enhancer: support arbitrary input length by @racoiaws :: PR: #6060
- Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
- [TTS] Spectrogram Enhancer: add option to zero out the initial tensor by @racoiaws :: PR: #6136
NLP / NMT
Changelog
- Fix P-Tuning Truncation by @vadam5 :: PR: #5663
- Adithyare/prompt learning seed by @arendu :: PR: #5749
- Add extra data args to support proper finetuning of HF converted T5 checkpoints by @MaximumEntropy :: PR: #5719
- Don't add output directory twice when creating shared sentencepiece tokenizer by @pks :: PR: #5737
- add constraint info on batch size for tar dataset by @yzhang123 :: PR: #5812
- remove transformer version upper bound by @Zhilin123 :: PR: #5831
- Adithyare/adapter new placement by @arendu :: PR: #5791
- Add SSL import functionality for Audio Lexical PNC Models by @trias702 :: PR: #5834
- validation batch sizing and drop_last controls by @arendu :: PR: #5830
- Remove ending newlines when encoding strings w/ sentencepiece tokenizer by @pks :: PR: #5739
- Fix segmenting for pcla inference by @jubick1337 :: PR: #5849
- RETRO model finetuning by @yidong72 :: PR: #5800
- Optimizing distributed Adam when running with one work queue by @timmoon10 :: PR: #5560
- Add option to disable distributed parameters in distributed Adam optimizer by @timmoon10 :: PR: #5685
- set max_steps for lr decay through config by @anmolgupt :: PR: #5780
- Fix Prompt text space issue by @aklife97 :: PR: #5983
- Add batch_size to prompt_learning generate by @aklife97 :: PR: #6091
NeMo Tools
Changelog
- [Tools] NeMo Forced Aligner by @erastorgueva-nv :: PR: #5571
- [Tools] Fix ctc segmentation: exclude audacity files by @ekmb :: PR: #6009
Export
Changelog
General Improvements
Changelog
- Pin lightning version less than 1.9.0 by @SeanNaren :: PR: #5822
- Davidm/cherrypick r1.16.0 by @Davood-M :: PR: #6082
- Update files for lightning 1.9.0 by @SeanNaren :: PR: #5823
- Tn doc 16 by @yzhang123 :: PR: #5954
- Ensure EMA checkpoints are also deleted when normal checkpoints are by @SeanNaren :: PR: #5724
- [Fix] ConformerEncoder forward when length is None by @anteju :: PR: #5761
- Fix EMA topk checkpoint deletion by @SeanNaren :: PR: #5758
- [BugFix] decoder timestamp count has a mismatch when an unknown token is decoded by @tango4j :: PR: #5825
- Update 00_NeMo_Primer.ipynb by @schaltung :: PR: #5740
- Sanitize params before DLLogger log_hyperparams by @milesial :: PR: #5736
- NeMo Forced Aligner by @erastorgueva-nv :: PR: #5571
- Add EMA Docs, fix common collection documentation by @SeanNaren :: PR: #5757
- Add container info to main page by @fayejf :: PR: #5816
- CommonVoice support for script by @SeanNaren :: PR: #5797
- Support nested NeMo models by @artbataev :: PR: #5671
- fix max len generation t5 by @ekmb :: PR: #5852
- NFA samples fix by @erastorgueva-nv :: PR: #5856
- fix(readme): fix typo by @jqueguiner :: PR: #5883
- Block large files from being merged into NeMo main by @SeanNaren :: PR: #5898
- Pin isort version by @artbataev :: PR: #5914
- fixed missing long_description_content_type by @XuesongYang :: PR: #5909
- Update container to 23.01 by @ericharper :: PR: #5917
- remove conda pynini install by @ekmb :: PR: #5921
- Update align.py by @Slyne :: PR: #6043
- Fixing data simulator argument and bash scripting error by @tango4j :: PR: #6112
- Update apex commit by @ericharper :: PR: #6148