NVIDIA Neural Modules 1.17.0
Highlights
NeMo ASR
- Online Clustering Diarizer
- High Level Diarization API
- PyCTC Decode Beam Search Support
- RNNT Beam Search Alignment Extraction
- InterCTC Loss
- AIStore Documentation
- ASR & AWS Multi-node Integration
- Convolution Invariant SDR losses
NeMo TTS
NeMo Megatron
- SquaredReLU, SwiGLU, No-Dropout
- Rotary Position Embedding
- Untie word embeddings and output projection
NeMo Core
- Dynamic freezing of modules during training
- NeMo Multi-Run Documentation
- ClearML Logging
- Early Stopping
- Experiment Manager Docs Update
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:23.02
Detailed Changelogs
ASR
Changelog
- Support Alignment Extraction for all RNNT Beam decoding methods by @titu1994 :: PR: #5925
- Use module-based k2 import guard by @artbataev :: PR: #6006
- Default RNNT loss to int64 targets by @titu1994 :: PR: #6011
- Added documentation section for ASR datasets from AIStore by @anteju :: PR: #6008
- Change perturb rng for reproducing results easily by @fayejf :: PR: #6042
- InterCTC loss and stochastic depth implementation by @Kipok :: PR: #6013
- Add pyctcdecode to high level beam search API by @titu1994 :: PR: #6026
- Convert esperanto into a notebook by @SeanNaren :: PR: #6070
- [ASR] Added a script for evaluating metrics for audio-to-audio by @anteju :: PR: #5971
- [ASR] Convolution-invariant SDR loss + unit tests by @anteju :: PR: #5992
- Adjust stochastic depth dropout probability calculation by @anteju :: PR: #6120
- Add file class based inference API for diarization by @SeanNaren :: PR: #5945
- Ngram by @karpnv :: PR: #6063
- remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
- Streaming conformer CTC export by @messiaen :: PR: #5837
- [TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155
- Ngram lm fusion for RNNT maes decoding by @andrusenkoau :: PR: #6118
- ASR Beam search documentation by @titu1994 :: PR: #6244
TTS
Changelog
- [TTS][ZH] added new NGC model cards with polyphone disambiguation. by @XuesongYang :: PR: #5940
- [TTS] deprecate AudioToCharWithPriorAndPitchDataset. by @XuesongYang :: PR: #5959
- [TTS][G2P] deprecate add_symbols by @XuesongYang :: PR: #5961
- Added list_available_models by @treacker :: PR: #5967
- Update Fastpitch energy bug by @blisc :: PR: #5969
- removed WHATEVER(1) ˌhwʌˈtɛvɚ from scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.10.txt by @MikyasDesta :: PR: #5869
- ONNX export for RadTTS by @borisfom :: PR: #5880
- Add some info about FastPitch SSL model by @redoctopus :: PR: #5994
- Vits doc by @treacker :: PR: #5989
- Ragged batching changes for RadTTS, some refactoring by @borisfom :: PR: #6020
- Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
- [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
- [TTS] Add Spanish IPA dictionaries and heteronyms by @rlangman :: PR: #6037
- [TTS] Separate TTS tokenization and g2p util to fix circular import by @rlangman :: PR: #6080
- [TTS][refactor] Part 7 - move module from model file. by @XuesongYang :: PR: #6098
- [TTS][refactor] Part 1 - nemo.collections.tts.data by @XuesongYang :: PR: #6099
- [TTS][refactor] Part 2 - nemo.collections.tts.parts by @XuesongYang :: PR: #6105
- [TTS][refactor] Part 6 - remove nemo.collections.tts.torch.README.md and tts_dataset.yaml by @XuesongYang :: PR: #6103
- [TTS][refactor] Part 3 - nemo.collections.tts.g2p.models by @XuesongYang :: PR: #6113
- [TTS] update German NGC models trained on Thorsten Datasets by @XuesongYang :: PR: #6125
- [TTS] remove old waveglow model that relies on torch_stft. by @XuesongYang :: PR: #6128
- [TTS] Move Spanish polyphones from heteronym to dictionary by @rlangman :: PR: #6123
- [TTS][refactor] Part 8 - added model inference tests to safeguard changes. by @XuesongYang :: PR: #6129
- remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
- [TTS][refactor] update tutorial import paths. by @XuesongYang :: PR: #6176
- [TTS] Add univnet scheduler by @ArtyomZemlyak :: PR: #6157
- [TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155
NLP / NMT
Changelog
- add new languages to doc by @yzhang123 :: PR: #5939
- Distributed Adam optimizer overlaps param all-gather with forward compute by @timmoon10 :: PR: #5684
- Refactor the retrieval services for microservice architecture by @yidong72 :: PR: #5910
- make validation accuracy reporting optional for adapters/ptuning by @arendu :: PR: #5843
- Add BERT support for overlapping forward compute with distopt communication by @timmoon10 :: PR: #6024
- [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
- adding early stop callback to ptuning by @arendu :: PR: #6028
- Pr doc tn by @yzhang123 :: PR: #6041
- Adds several configurable flags for Megatron GPT models by @MaximumEntropy :: PR: #5991
- P-tuning refactor Part 1/N by @arendu :: PR: #6054
- Fast glu activations by @MaximumEntropy :: PR: #6058
- P-tuning refactor Part 2/N by @arendu :: PR: #6056
- P-tuning refactor Part 3/N by @arendu :: PR: #6106
- Explicitly check for united embeddings when logging params by @MaximumEntropy :: PR: #6085
- Add flag to get attention from fusion by @ericharper :: PR: #6049
- Improving text memmap generated index files error messages by @michalivne :: PR: #6093
- Megatron Encoder-Decoder Sampler Function by @michalivne :: PR: #6095
- Sentence piece legacy false compatibility by @arendu :: PR: #6154
- convert Megatron LM ckpt to NeMo PP support. by @yidong72 :: PR: #6159
- Avoid multiple warnings for loss mask by @mikolajblaz :: PR: #6062
- Propagate LayerNorm1P to TE by @mikolajblaz :: PR: #6061
- Filter p-tuning by example length by @arendu :: PR: #6182
- Add sequence parallel support to Rope positional embedding by @yidong72 :: PR: #6178
- Use a separate communicator for DP AMAX reduction by @erhoo82 :: PR: #6022
- Add persistent workers to GPT by @ericharper :: PR: #6205
- Micro batch loader for bert model by @shanmugamr1992 :: PR: #6046
- GPT P tuning Eval changes (#5952) by @aklife97 :: PR: #6272
- add template for taskname=taskname by @Zhilin123 :: PR: #6283
- added RPE + fixed RMSNorm by @Davood-M :: PR: #6304
- simplified notebook for p-tuning by @arendu :: PR: #6326
- Added num decoder blocks in megatron export by @Davood-M :: PR: #6331
Text Normalization / Inverse Text Normalization
Export
Changelog
- ONNX export for RadTTS by @borisfom :: PR: #5880
- Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
- Update docs for ExpManager and Exportable frameworks by @titu1994 :: PR: #6165
- Streaming conformer CTC export by @messiaen :: PR: #5837
- MixedFusedRMSNorm Export Fix by @Davood-M :: PR: #6296
- Added num decoder blocks in megatron export by @Davood-M :: PR: #6331
Bugfixes
Changelog
- Fix bug where GPT always enabled distopt overlapped param sync by @timmoon10 :: PR: #5995
- CS bugfix by @bmwshop :: PR: #6122
- RNNT patch by @titu1994 :: PR: #6231
- Notebook fixes by @titu1994 :: PR: #6212
- Small fixes for flashlight decoder by @trias702 :: PR: #6071
- Various fixes in docs and RNNT by @titu1994 :: PR: #6156
- Fix k2 and torchaudio installation (Docker, macOS) by @artbataev :: PR: #6094
- update and deprecate warning for Mic notebook by @fayejf :: PR: #6307
- small bugfix and add asr evaluator to doc by @fayejf :: PR: #6229
- Bug fixing for bucketing dataset by @VahidooX :: PR: #6191
- Fix character beam decoding algorithm with vocab index map by @titu1994 :: PR: #6140
- fix typo in asr evaluator readme by @fayejf :: PR: #6053
- Fix typos by @titu1994 :: PR: #6241
- [ASR]:fixed augmentor arguments for transcribe functionality of Hybrid CTC-RNNT model by @KunalDhawan :: PR: #6290
- Fix hybrid transcribe by @ArtyomZemlyak :: PR: #6003
- Fix bucketing seeding by @VahidooX :: PR: #6254
- Fix for CTC decoder setup by @vsl9 :: PR: #6303
- Fix RNNT Joint narrow() by @titu1994 :: PR: #6336
- Fix bugs with interctc mixin by @Kipok :: PR: #6228
- Update IPA dict path in tutorial by @redoctopus :: PR: #6208
- [TTS] fix broken tutorial for Tacotron2 by @XuesongYang :: PR: #6199
- [TTS] fix bugs for Chinese and German tutorials. by @XuesongYang :: PR: #6216
- Fix radtts sort r17 by @borisfom :: PR: #6344
- Quick Fix for RadTTS test by @blisc :: PR: #6034
- Disabling radtts tests until we have real model by @borisfom :: PR: #6036
- fix val loss computation in megatron by @anmolgupt :: PR: #5871
- Fix incomplete batches by @mikolajblaz :: PR: #6083
- Avoid unnecessarily accessing data loader with pipeline parallelism by @timmoon10 :: PR: #6164
- bugfix: file handlers are not closed. by @XuesongYang :: PR: #5956
- Fix Silence Sampling Algorithm for ASR Multi-speaker Data Simulator by @stevehuang52 :: PR: #5897
- Fix Windows bug with save_restore_connector by @trias702 :: PR: #5919
- fix broken link by @ericharper :: PR: #5968
- Fix torchaudio installation by @artbataev :: PR: #5850
- Fix reinstall.sh dependencies by @titu1994 :: PR: #6027
- Adding changes to fix the mv error by @tango4j :: PR: #6087
- Fix README by @flx42 :: PR: #6137
- Fix typos in voiceapp notebook by @titu1994 :: PR: #6262
- [BugFix] Fix diarization result path errors in tutorial notebook for r1.17.0 by @tango4j :: PR: #6234
- [BugFix] Fix the wrong branch name in speaker diarization inference notebook by @tango4j :: PR: #6301
General Improvements
Changelog
- Dynamic freezing in NeMo by @trias702 :: PR: #5879
- Move settings to . Remove deprecated by @artbataev :: PR: #5947
- update container info in readme by @fayejf :: PR: #5981
- Update PUBLICATIONS.md by @titu1994 :: PR: #5963
- [G2P] backward compatibility for english tokenizer and bugfix by @github-actions[bot] :: PR: #5984
- replace symbols by @github-actions[bot] :: PR: #5990
- correct bash style according to SC2236. by @XuesongYang :: PR: #6025
- Update align.py by @github-actions[bot] :: PR: #6045
- Add Customization Dataset Preparation Tool by @Zhilin123 :: PR: #6029
- Updated data simulator config part in Speaker_Diarization_Training.ipynb by @tango4j :: PR: #6072
- Add citation by @ericharper :: PR: #6077
- [TTS] Spectrogram Enhancer: correct dim for length when loading data by @github-actions[bot] :: PR: #6074
- Add ClearML Logging by @ArtyomZemlyak :: PR: #6014
- update readme with new badges by @XuesongYang :: PR: #6110
- [CI] Set readthedocs python version to 3.8 by @SeanNaren :: PR: #6079
- Update dataset preparation tool to fix bug relating to non jsonl input file by @Zhilin123 :: PR: #6147
- update finetune configs by @nithinraok :: PR: #6152
- Added ckpt to nemo for T5/T0 models by @Davood-M :: PR: #6141
- Save model parallel .nemo in ExpManager by @arendu :: PR: #6115
- Upgrade setuptools by @fayejf :: PR: #6163
- Update container version in main readme by @fayejf :: PR: #6171
- metric update by @arendu :: PR: #6169
- Upgrade base container to PyTorch 23.02 by @ericharper :: PR: #6162
- Link to nm launcher by @ericharper :: PR: #6226
- Make AIS CLI installation optional by @anteju :: PR: #6314
- remove pinned numba version in Dockerfile by @fayejf :: PR: #6341
- Cherry-pick recent distopt commits by @timmoon10 :: PR: #6343
- Update readme by @ericharper :: PR: #6363