Highlights

NeMo ASR

ASR Evaluator
Multi-channel dereverberation algorithm
Hybrid ASR-TTS Models
Flashlight Decoder Beam Search
FastConformer Encoder with 8x subsampling

NeMo TTS

SSL Voice Conversion
Spectrogram Enhancer
VITS

NeMo Megatron

Per-microbatch dataloader for GPT and BERT
Adapters compatible with FasterTransformer

NeMo Core

Nested model support

NeMo Tools

NeMo Forced Aligner

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.01

ASR

Changelog

Fix for incorrect computation of batched alignment in transducers by @Kipok :: PR: #5692
Set the stream position to 0 for pydub by @jonghwanhyeon :: PR: #5752
[Fix] ConformerEncoder forward when length is None by @anteju :: PR: #5761
ASR evaluator by @fayejf :: PR: #5728
[ASR][Test] Enable test for cache audio with a single worker by @anteju :: PR: #5763
Flashlight Decoder for Nemo by @trias702 :: PR: #5790
Fix data simulator by @stevehuang52 :: PR: #5813
[ASR] Mask-based dereverb algorithm by @anteju :: PR: #5693
Concat dataset and aistore support for label models by @Kipok :: PR: #5826
Adding new features and speed up for multi-speaker data simulator by @tango4j :: PR: #5846
Add Esperanto ASR example by @andrusenkoau :: PR: #5772
Fix memory allocation of NeMo Multi-speaker Data Simulator by @stevehuang52 :: PR: #5864
[ASR] Separate Audio-to-Text (BPE, Char) dataset construction by @artbataev :: PR: #5774
Reduce memory usage in getMultiScaleCosAffinityMatrix function by @gabitza-tech :: PR: #5876
Hybrid ASR-TTS models by @artbataev :: PR: #5659
Set providers for onnxruntime inference session by @athitten :: PR: #5903
[ASR] Configurable metrics for audio-to-audio + removed experimental decorators by @anteju :: PR: #5827
Correct doc for RNNT transcribe() function by @titu1994 :: PR: #5904
Update isort to the latest version by @artbataev :: PR: #5895
FilterbankFeaturesTA to match FilterbankFeatures by @msis :: PR: #5913
Fix hybridasr bug by @VahidooX :: PR: #5950
replace symbols by @nithinraok :: PR: #5974
fast conformer configs and doc by @bmwshop :: PR: #5970
Update TitaNet-L and MSDD models by @nithinraok :: PR: #6023
Fix enhancer usage by @artbataev :: PR: #6059
update librosa args by @nithinraok :: PR: #6086
Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
Fix k2 and torchaudio installation (Docker, macOS). Cherry-pick (#6094) by @artbataev :: PR: #6124

TTS

Changelog

[TTS] Update Spanish TTS model to 1.15 by @rlangman :: PR: #5742
[TTS][DE] refine grapheme-based tokenizer and fastpitch training recipe on thorsten's neutral datasets. by @XuesongYang :: PR: #5753
No-script TS export, prepared for ONNX export by @borisfom :: PR: #5653
Fixing masking in RadTTS bottleneck layer by @borisfom :: PR: #5771
Port Riva's mel cepstral distortion w/ dynamic time warping notebook by @redoctopus :: PR: #5778
Update radtts' infer path by @blisc :: PR: #5788
[TTS][DE] Augment tokenization/G2P to preserve capitalization of words and mix phonemes with word-level graphemes for an input text. by @XuesongYang :: PR: #5805
[TTS] porting VITS implementation by @treacker :: PR: #5600
[TTS][DE] updated IPA dictionary and heteronyms by @XuesongYang :: PR: #5860
[TTS] GAN-based spectrogram enhancer by @racoiaws :: PR: #5565
TTS inference with Heteronym classification model, hc model inference refactoring by @ekmb :: PR: #5768
Remove MCD_DTW tarball by @redoctopus :: PR: #5889
Hybrid ASR-TTS models by @artbataev :: PR: #5659
Moved eval notebook data to aws by @redoctopus :: PR: #5911
[G2P] fixed typos and broken import library. by @XuesongYang :: PR: #5978
[G2P] backward compatibility for english tokenizer and bugfix by @XuesongYang :: PR: #5980
fix links, add missing file by @ekmb :: PR: #6044
[TTS] Spectrogram Enhancer: correct dim for length when loading data by @racoiaws :: PR: #6048
[TTS] bugfix for fastpitch German tutorial by @XuesongYang :: PR: #6051
[TTS] bugfix Chinese Fastpitch tutorial by @XuesongYang :: PR: #6055
Fix enhancer usage by @artbataev :: PR: #6059
[TTS] Spectrogram Enhancer: support arbitrary input length by @racoiaws :: PR: #6060
Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
[TTS] Spectrogram Enhancer: add option to zero out the initial tensor by @racoiaws :: PR: #6136

NLP / NMT

Changelog

Fix P-Tuning Truncation by @vadam5 :: PR: #5663
Adithyare/prompt learning seed by @arendu :: PR: #5749
Add extra data args to support proper finetuning of HF converted T5 checkpoints by @MaximumEntropy :: PR: #5719
Don't add output directory twice when creating shared sentencepiece tokenizer by @pks :: PR: #5737
add constraint info on batch size for tar dataset by @yzhang123 :: PR: #5812
remove transformer version upper bound by @Zhilin123 :: PR: #5831
Adithyare/adapter new placement by @arendu :: PR: #5791
Add SSL import functionality for Audio Lexical PNC Models by @trias702 :: PR: #5834
validation batch sizing and drop_last controls by @arendu :: PR: #5830
Remove ending newlines when encoding strings w/ sentencepiece tokenizer by @pks :: PR: #5739
Fix segmenting for pcla inference by @jubick1337 :: PR: #5849
RETRO model finetuning by @yidong72 :: PR: #5800
Optimizing distributed Adam when running with one work queue by @timmoon10 :: PR: #5560
Add option to disable distributed parameters in distributed Adam optimizer by @timmoon10 :: PR: #5685
set max_steps for lr decay through config by @anmolgupt :: PR: #5780
Fix Prompt text space issue by @aklife97 :: PR: #5983
Add batch_size to prompt_learning generate by @aklife97 :: PR: #6091

NeMo Tools

Changelog

[Tools] NeMo Forced Aligner by @erastorgueva-nv :: PR: #5571
[Tools] Fix ctc segmentation: exclude audacity files by @ekmb :: PR: #6009

Export

Changelog

No-script TS export, prepared for ONNX export by @borisfom :: PR: #5653
Set providers for onnxruntime inference session by @athitten :: PR: #5903
Add segmentation export to Audacity label file by @Ca-ressemble-a-du-fake :: PR: #5857

General Improvements

Changelog

Pin lightning version less than 1.9.0 by @SeanNaren :: PR: #5822
Davidm/cherrypick r1.16.0 by @Davood-M :: PR: #6082
Update files for lightning 1.9.0 by @SeanNaren :: PR: #5823
Tn doc 16 by @yzhang123 :: PR: #5954
Ensure EMA checkpoints are also deleted when normal checkpoints are by @SeanNaren :: PR: #5724
[Fix] ConformerEncoder forward when length is None by @anteju :: PR: #5761
Fix EMA topk checkpoint deletion by @SeanNaren :: PR: #5758
[BugFix] decoder timestamp count has a mismatch when <unk> is decoded by @tango4j :: PR: #5825
Update 00_NeMo_Primer.ipynb by @schaltung :: PR: #5740
Sanitize params before DLLogger log_hyperparams by @milesial :: PR: #5736
NeMo Forced Aligner by @erastorgueva-nv :: PR: #5571
Add EMA Docs, fix common collection documentation by @SeanNaren :: PR: #5757
Add container info to main page by @fayejf :: PR: #5816
CommonVoice support for script by @SeanNaren :: PR: #5797
Support nested NeMo models by @artbataev :: PR: #5671
fix max len generation t5 by @ekmb :: PR: #5852
NFA samples fix by @erastorgueva-nv :: PR: #5856
fix(readme): fix typo by @jqueguiner :: PR: #5883
Block large files from being merged into NeMo main by @SeanNaren :: PR: #5898
Pin isort version by @artbataev :: PR: #5914
fixed missing long_description_content_type by @XuesongYang :: PR: #5909
Update container to 23.01 by @ericharper :: PR: #5917
remove conda pynini install by @ekmb :: PR: #5921
Update align.py by @Slyne :: PR: #6043
Fixing data simulator argument and bash scripting error by @tango4j :: PR: #6112
Update apex commit by @ericharper :: PR: #6148

NVIDIA Neural Modules 1.16.0