Highlights

NeMo ASR

Online Clustering Diarizer
High Level Diarization API
PyCTC Decode Beam Search Support
RNNT Beam Search Alignment Extraction
InterCTC Loss
AIStore Documentation
ASR & AWS Multi-node Integration
Convolution Invariant SDR losses

NeMo TTS

NeMo Megatron

SquaredReLU, SwiGLU, No-Dropout
Rotary Position Embedding
Untie word embeddings and output projection
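
For readers unfamiliar with the names in the NeMo Megatron highlights, the following is a minimal pure-Python sketch of the standard formulas behind Squared ReLU, SwiGLU, and rotary position embedding. It is not NeMo's implementation: the function names are hypothetical, the inputs are plain floats and lists rather than tensors, and the rotary sketch assumes the interleaved pairing and base of 10000 from the original formulation, which may differ from the layout NeMo uses.

```python
import math


def squared_relu(x: float) -> float:
    """Squared ReLU: max(0, x) ** 2."""
    r = max(0.0, x)
    return r * r


def silu(x: float) -> float:
    """SiLU / Swish: x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def swiglu(gate: float, value: float) -> float:
    """SwiGLU gating on already-projected inputs: SiLU(gate) * value.

    In a transformer MLP, `gate` and `value` would come from two separate
    linear projections of the same hidden state (assumed here, not shown).
    """
    return silu(gate) * value


def rope(vec, position, base=10000.0):
    """Rotary position embedding for one vector at one position.

    Each pair (vec[i], vec[i + 1]) is rotated by the angle
    position * base ** (-i / d), where d is the vector length.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def dot(a, b):
    """Plain dot product, used to illustrate the relative-position property."""
    return sum(x * y for x, y in zip(a, b))
```

The property that makes the rotary scheme attractive is that dot(rope(q, m), rope(k, n)) depends only on the offset m - n, so attention scores encode relative position without a learned position table.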

NeMo Core

Dynamic freezing of modules during training
NeMo Multi-Run Documentation
ClearML Logging
Early Stopping
Experiment Manager Docs Update

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.02

Detailed Changelogs

ASR

Changelog

Support Alignment Extraction for all RNNT Beam decoding methods by @titu1994 :: PR: #5925
Use module-based k2 import guard by @artbataev :: PR: #6006
Default RNNT loss to int64 targets by @titu1994 :: PR: #6011
Added documentation section for ASR datasets from AIStore by @anteju :: PR: #6008
Change perturb rng for reproducing results easily by @fayejf :: PR: #6042
InterCTC loss and stochastic depth implementation by @Kipok :: PR: #6013
Add pyctcdecode to high level beam search API by @titu1994 :: PR: #6026
Convert esperanto into a notebook by @SeanNaren :: PR: #6070
[ASR] Added a script for evaluating metrics for audio-to-audio by @anteju :: PR: #5971
[ASR] Convolution-invariant SDR loss + unit tests by @anteju :: PR: #5992
Adjust stochastic depth dropout probability calculation by @anteju :: PR: #6120
Add file class based inference API for diarization by @SeanNaren :: PR: #5945
Ngram by @karpnv :: PR: #6063
remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
Streaming conformer CTC export by @messiaen :: PR: #5837
[TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155
Ngram lm fusion for RNNT maes decoding by @andrusenkoau :: PR: #6118
ASR Beam search documentation by @titu1994 :: PR: #6244

TTS

Changelog

[TTS][ZH] added new NGC model cards with polyphone disambiguation. by @XuesongYang :: PR: #5940
[TTS] deprecate AudioToCharWithPriorAndPitchDataset. by @XuesongYang :: PR: #5959
[TTS][G2P] deprecate add_symbols by @XuesongYang :: PR: #5961
Added list_available_models by @treacker :: PR: #5967
Update Fastpitch energy bug by @blisc :: PR: #5969
removed WHATEVER(1) ˌhwʌˈtɛvɚ from scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.10.txt by @MikyasDesta :: PR: #5869
ONNX export for RadTTS by @borisfom :: PR: #5880
Add some info about FastPitch SSL model by @redoctopus :: PR: #5994
Vits doc by @treacker :: PR: #5989
Ragged batching changes for RadTTS, some refactoring by @borisfom :: PR: #6020
Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
[TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
[TTS] Add Spanish IPA dictionaries and heteronyms by @rlangman :: PR: #6037
[TTS] Separate TTS tokenization and g2p util to fix circular import by @rlangman :: PR: #6080
[TTS][refactor] Part 7 - move module from model file. by @XuesongYang :: PR: #6098
[TTS][refactor] Part 1 - nemo.collections.tts.data by @XuesongYang :: PR: #6099
[TTS][refactor] Part 2 - nemo.collections.tts.parts by @XuesongYang :: PR: #6105
[TTS][refactor] Part 6 - remove nemo.collections.tts.torch.README.md and tts_dataset.yaml by @XuesongYang :: PR: #6103
[TTS][refactor] Part 3 - nemo.collections.tts.g2p.models by @XuesongYang :: PR: #6113
[TTS] update German NGC models trained on Thorsten Datasets by @XuesongYang :: PR: #6125
[TTS] remove old waveglow model that relies on torch_stft. by @XuesongYang :: PR: #6128
[TTS] Move Spanish polyphones from heteronym to dictionary by @rlangman :: PR: #6123
[TTS][refactor] Part 8 - added model inference tests to safeguard changes. by @XuesongYang :: PR: #6129
remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
[TTS][refactor] update tutorial import paths. by @XuesongYang :: PR: #6176
[TTS] Add univnet scheduler by @ArtyomZemlyak :: PR: #6157
[TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155

NLP / NMT

Changelog

add new languages to doc by @yzhang123 :: PR: #5939
Distributed Adam optimizer overlaps param all-gather with forward compute by @timmoon10 :: PR: #5684
Refactor the retrieval services for microservice architecture by @yidong72 :: PR: #5910
make validation accuracy reporting optional for adapters/ptuning by @arendu :: PR: #5843
Add BERT support for overlapping forward compute with distopt communication by @timmoon10 :: PR: #6024
[TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
adding early stop callback to ptuning by @arendu :: PR: #6028
Pr doc tn by @yzhang123 :: PR: #6041
Adds several configurable flags for Megatron GPT models by @MaximumEntropy :: PR: #5991
P-tuning refactor Part 1/N by @arendu :: PR: #6054
Fast glu activations by @MaximumEntropy :: PR: #6058
P-tuning refactor Part 2/N by @arendu :: PR: #6056
P-tuning refactor Part 3/N by @arendu :: PR: #6106
Explicitly check for united embeddings when logging params by @MaximumEntropy :: PR: #6085
Add flag to get attention from fusion by @ericharper :: PR: #6049
Improving text memmap generated index files error messages by @michalivne :: PR: #6093
Megatron Encoder-Decoder Sampler Function by @michalivne :: PR: #6095
Sentence piece legacy false compatibility by @arendu :: PR: #6154
convert Megatron LM ckpt to NeMo PP support. by @yidong72 :: PR: #6159
Avoid multiple warnings for loss mask by @mikolajblaz :: PR: #6062
Propagate LayerNorm1P to TE by @mikolajblaz :: PR: #6061
Filter p-tuning by example length by @arendu :: PR: #6182
Add sequence parallel support to Rope positional embedding by @yidong72 :: PR: #6178
Use a separate communicator for DP AMAX reduction by @erhoo82 :: PR: #6022
Add persistent workers to GPT by @ericharper :: PR: #6205
Micro batch loader for bert model by @shanmugamr1992 :: PR: #6046
GPT P tuning Eval changes (#5952) by @aklife97 :: PR: #6272
add template for taskname=taskname by @Zhilin123 :: PR: #6283
added RPE + fixed RMSNorm by @Davood-M :: PR: #6304
simplified notebook for p-tuning by @arendu :: PR: #6326
Added num decoder blocks in megatron export by @Davood-M :: PR: #6331

Text Normalization / Inverse Text Normalization

Changelog

[TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982

Export

Changelog

ONNX export for RadTTS by @borisfom :: PR: #5880
Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
Update docs for ExpManager and Exportable frameworks by @titu1994 :: PR: #6165
Streaming conformer CTC export by @messiaen :: PR: #5837
MixedFusedRMSNorm Export Fix by @Davood-M :: PR: #6296
Added num decoder blocks in megatron export by @Davood-M :: PR: #6331

Bugfixes

Changelog

Fix bug where GPT always enabled distopt overlapped param sync by @timmoon10 :: PR: #5995
CS bugfix by @bmwshop :: PR: #6122
RNNT patch by @titu1994 :: PR: #6231
Notebook fixes by @titu1994 :: PR: #6212
Small fixes for flashlight decoder by @trias702 :: PR: #6071
Various fixes in docs and RNNT by @titu1994 :: PR: #6156
Fix k2 and torchaudio installation (Docker, macOS) by @artbataev :: PR: #6094
update and deprecate warning for Mic notebook by @fayejf :: PR: #6307
small bugfix and add asr evaluator to doc by @fayejf :: PR: #6229
Bug fixing for bucketing dataset by @VahidooX :: PR: #6191
Fix character beam decoding algorithm with vocab index map by @titu1994 :: PR: #6140
fix typo in asr evaluator readme by @fayejf :: PR: #6053
Fix typos by @titu1994 :: PR: #6241
[ASR]:fixed augmentor arguments for transcribe functionality of Hybrid CTC-RNNT model by @KunalDhawan :: PR: #6290
Fix hybrid transcribe by @ArtyomZemlyak :: PR: #6003
Fix bucketing seeding by @VahidooX :: PR: #6254
Fix for CTC decoder setup by @vsl9 :: PR: #6303
Fix RNNT Joint narrow() by @titu1994 :: PR: #6336
Fix bugs with interctc mixin by @Kipok :: PR: #6228
Update IPA dict path in tutorial by @redoctopus :: PR: #6208
[TTS] fix broken tutorial for Tacotron2 by @XuesongYang :: PR: #6199
[TTS] fix bugs for chinese and german tutorials. by @XuesongYang :: PR: #6216
Fix radtts sort r17 by @borisfom :: PR: #6344
Quick Fix for RadTTS test by @blisc :: PR: #6034
Disabling radtts tests until we have real model by @borisfom :: PR: #6036
fix val loss computation in megatron by @anmolgupt :: PR: #5871
Fix incomplete batches by @mikolajblaz :: PR: #6083
Avoid unnecessarily accessing data loader with pipeline parallelism by @timmoon10 :: PR: #6164
bugfix: file handlers are not closed. by @XuesongYang :: PR: #5956
Fix Silence Sampling Algorithm for ASR Multi-speaker Data Simulator by @stevehuang52 :: PR: #5897
Fix Windows bug with save_restore_connector by @trias702 :: PR: #5919
fix broken link by @ericharper :: PR: #5968
Fix torchaudio installation by @artbataev :: PR: #5850
Fix reinstall.sh dependencies by @titu1994 :: PR: #6027
Adding changes to fix the mv error by @tango4j :: PR: #6087
Fix README by @flx42 :: PR: #6137
Fix typos in voiceapp notebook by @titu1994 :: PR: #6262
[BugFix] Fix diarization result path errors in tutorial notebook for r1.17.0 by @tango4j :: PR: #6234
[BugFix] Fix the wrong branch name in speaker diarization inference notebook by @tango4j :: PR: #6301

General Improvements

Changelog

Dynamic freezing in Nemo by @trias702 :: PR: #5879
Move settings to . Remove deprecated by @artbataev :: PR: #5947
update container info in readme by @fayejf :: PR: #5981
Update PUBLICATIONS.md by @titu1994 :: PR: #5963
[G2P] backward compatibility for english tokenizer and bugfix by @github-actions[bot] :: PR: #5984
replace symbols by @github-actions[bot] :: PR: #5990
correct bash style according to SC2236. by @XuesongYang :: PR: #6025
Update align.py by @github-actions[bot] :: PR: #6045
Add Customization Dataset Preparation Tool by @Zhilin123 :: PR: #6029
Updated data simulator config part in Speaker_Diarization_Training.ipynb by @tango4j :: PR: #6072
Add citation by @ericharper :: PR: #6077
[TTS] Spectrogram Enhancer: correct dim for length when loading data by @github-actions[bot] :: PR: #6074
Add ClearML Logging by @ArtyomZemlyak :: PR: #6014
update readme with new badges by @XuesongYang :: PR: #6110
[CI] Set readthedocs python version to 3.8 by @SeanNaren :: PR: #6079
Update dataset preparation tool to fix bug relating to non jsonl input file by @Zhilin123 :: PR: #6147
update finetune configs by @nithinraok :: PR: #6152
Added ckpt to nemo for T5/T0 models by @Davood-M :: PR: #6141
Save model parallel .nemo in ExpManager by @arendu :: PR: #6115
Upgrade setuptools by @fayejf :: PR: #6163
Update container version in main readme by @fayejf :: PR: #6171
metric update by @arendu :: PR: #6169
Upgrade base container to PyTorch 23.02 by @ericharper :: PR: #6162
Link to nm launcher by @ericharper :: PR: #6226
Make AIS CLI installation optional by @anteju :: PR: #6314
remove pinned numba version in Dockerfile by @fayejf :: PR: #6341
Cherry-pick recent distopt commits by @timmoon10 :: PR: #6343
Update readme by @ericharper :: PR: #6363

NVIDIA Neural Modules 1.17.0
