Releases: NVIDIA/NeMo

NVIDIA Neural Modules 1.20.0

04 Aug 19:50
2baef81

Highlights

Models

NeMo ASR

  • Graph-RNN-T #6168
  • WildCard-RNN-T #6168
  • Confidence Ensembles for ASR
  • Token-and-Duration Transducer (TDT) #6536
  • Spellchecking ASR #6179
  • Numba FP16 RNNT Loss #6991

NeMo TTS

  • TTS Adapter Customization
  • TTS Dataloader Framework

NeMo Framework

  • LoRA for T5 and mT5 #6612
  • Flash Attention integration #6666
  • Mosaic 7B compatibility
  • Models with LongContext (32K) #6666, #6687, #6773

NeMo Tools

  • Speech Data Explorer: Utterance level ASR model comparison #6669
  • Speech Data Processor: Spanish P&C
  • NeMo Forced Aligner: Large sequence alignment + memory reduction #6695

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.06

Detailed Changelogs

ASR

Changelog

TTS

Changelog
  • [TTS] Add callback for saving audio during FastPitch training by @rlangman :: PR: #6665
  • [TTS] Add script for text preprocessing by @rlangman :: PR: #6541
  • [TTS] Fix adapter duration issue by @hsiehjackson :: PR: #6697
  • [TTS] Filter out silent audio files during preprocessing by @rlangman :: PR: #6716
  • [TTS] fix inconsistent type hints for IpaG2p by @XuesongYang :: PR: #6733
  • [TTS] relax hardcoded prefix for phonemes and tones and infer phoneme set through dict by @XuesongYang :: PR: #6735
  • [TTS] corrected misleading deprecation warnings. by @XuesongYang :: PR: #6702
  • Fix TTS adapter tutorial by @hsiehjackson :: PR: #6741
  • [TTS][zh] refine hardcoded lowercase for ASCII letters. by @XuesongYang :: PR: #6781
  • [TTS] Append pretrained FastPitch & SpectrogramEnhancer pair to available models by @racoiaws :: PR: #7012

NLP / NMT

Changelog

NeMo Tools

Changelog

Bugfixes

Changelog

General Improvements

Changelog

NVIDIA Neural Modules 1.19.1

13 Jul 20:42

This release is a small patch that fixes compatibility with torchmetrics.

  • Remove deprecated arg compute_on_step. See #6979.

NVIDIA Neural Modules 1.19.0

15 Jun 23:46
2331b06

Highlights

NeMo ASR

  • Sharded Manifests for Tarred Datasets #6395
  • Frame-VAD model + datasets support #6441
  • Noise Norm Perturbation #6445
  • Code Switched Dataset with IID Sampling #6448

NeMo TTS

NeMo Megatron

  • Batch size rampup #6424
  • Unify dataset and model classes for all PEFT #6391
  • LoRA for GPT #6391
  • Convert interleaved pipeline model to non-interleaved #6498
  • Dialog Dataset for SFT #6654
  • Dynamic length batches for GPT SFT #6510
  • Merge LoRA weights into base model #6597

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.04

Detailed Changelogs

ASR

Changelog

TTS

Changelog

NLP / NMT

Changelog

Bugfixes

Changelog

General Improvements

Changelog

NVIDIA Neural Modules 1.18.1

17 May 19:09

Highlights

For the complete release notes, please see the NeMo 1.18.0 Release Notes.

Bugfix

This patch release fixes a major bug in ASR Bucketing datasets that was introduced in r1.17.0 by PR #6191. With this bug, each bucket was still randomly shuffled before selection on each rank, but only a single bucket would loop indefinitely, without continuing on to subsequent buckets.

Effect: Significantly worse WER would be obtained since not all buckets would be used.

This has been patched and should work correctly in 1.18.1 onwards.
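
The failure mode described above can be illustrated with a minimal sketch. This is not NeMo's bucketing code: the function names, the `max_items` cap, and the toy buckets are all hypothetical, and the real dataset classes may be structured quite differently. It only shows why looping on one bucket starves the others.

```python
import random
from typing import Iterator, List, Sequence


def iterate_buckets_buggy(
    buckets: Sequence[List[str]], seed: int, max_items: int
) -> Iterator[str]:
    """Hypothetical reproduction: the bucket order is shuffled, but iteration
    never advances past the first selected bucket."""
    rng = random.Random(seed)
    order = list(range(len(buckets)))
    rng.shuffle(order)
    first = buckets[order[0]]
    if not first:
        return
    emitted = 0
    # The cap exists only so this demo terminates; the buggy loop had no exit.
    while emitted < max_items:
        for item in first:
            if emitted >= max_items:
                return
            yield item
            emitted += 1


def iterate_buckets_fixed(buckets: Sequence[List[str]], seed: int) -> Iterator[str]:
    """Expected behavior: visit every bucket once per pass, in shuffled order."""
    rng = random.Random(seed)
    order = list(range(len(buckets)))
    rng.shuffle(order)
    for idx in order:
        yield from buckets[idx]


if __name__ == "__main__":
    toy_buckets = [["a1", "a2"], ["b1", "b2"], ["c1"]]
    print(list(iterate_buckets_buggy(toy_buckets, seed=0, max_items=6)))
    print(list(iterate_buckets_fixed(toy_buckets, seed=0)))
```

In the buggy variant every emitted utterance comes from one bucket, so a model would only see a narrow slice of utterance durations, which is consistent with the degraded WER noted above.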

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.03

NVIDIA Neural Modules 1.18.0

12 May 17:49

Highlights

Models

NeMo ASR

  • Hybrid Autoregressive Transducer (HAT) #6260
  • Apple MPS Support for ASR Inference #6289
  • InterCTC Support for Hybrid ASR Models #6215
  • RNNT N-Gram Fusion with mAES algo #6118
  • ASR + Apple M2 CPU/GPU MPS #6289

NeMo TTS

  • TTS directory structure refactor
  • User-set symbol vocabulary #6172

NeMo Megatron

  • Model parallelism from Megatron Core #6393
  • Continued training for P-tuning #6273
  • SFT for GPT-3 #6210
  • Tensor and pipeline model parallel conversion #6218
  • Megatron NMT Export to Riva

NeMo Core

Detailed Changelogs

ASR

Changelog

TTS

Changelog

NLP / NMT

Changelog

Export

Changelog

Bugfixes

Changelog
  • Fix the GPT SFT datasets loss mask bug by @yidong72 :: PR: #6409
  • [BugFix] Fix multi-processing bug in data simulator by @tango4j :: PR: #6310
  • Fix cache aware hybrid bugs by @VahidooX :: PR: #6466
  • [BugFix] Force _get_batch_preds() to keep logits in decoder timestamp… by @tango4j :: PR: #6500
  • Fixing bug in unsort_tensor by @borisfom :: PR: #6320
  • Bugfix for BF16 grad reductions with distopt by @timmoon10 :: PR: #6340
  • Limit urllib3 version to patch issue with RTD by @aklife97 :: PR: #6568

General Improvements

Changelog

NVIDIA Neural Modules 1.17.0

05 Apr 00:10
d3017e4

Highlights

NeMo ASR

  • Online Clustering Diarizer
  • High Level Diarization API
  • PyCTC Decode Beam Search Support
  • RNNT Beam Search Alignment Extraction
  • InterCTC Loss
  • AIStore Documentation
  • ASR & AWS Multi-node Integration
  • Convolution Invariant SDR losses

NeMo TTS

NeMo Megatron

  • SquaredReLU, SwiGLU, No-Dropout
  • Rotary Position Embedding
  • Untie word embeddings and output projection

NeMo Core

  • Dynamic freezing of modules during training
  • NeMo Multi-Run Documentation
  • ClearML Logging
  • Early Stopping
  • Experiment Manager Docs Update

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.02

Detailed Changelogs

ASR

Changelog
  • Support Alignment Extraction for all RNNT Beam decoding methods by @titu1994 :: PR: #5925
  • Use module-based k2 import guard by @artbataev :: PR: #6006
  • Default RNNT loss to int64 targets by @titu1994 :: PR: #6011
  • Added documentation section for ASR datasets from AIStore by @anteju :: PR: #6008
  • Change perturb rng for reproducing results easily by @fayejf :: PR: #6042
  • InterCTC loss and stochastic depth implementation by @Kipok :: PR: #6013
  • Add pyctcdecode to high level beam search API by @titu1994 :: PR: #6026
  • Convert esperanto into a notebook by @SeanNaren :: PR: #6070
  • [ASR] Added a script for evaluating metrics for audio-to-audio by @anteju :: PR: #5971
  • [ASR] Convolution-invariant SDR loss + unit tests by @anteju :: PR: #5992
  • Adjust stochastic depth dropout probability calculation by @anteju :: PR: #6120
  • Add file class based inference API for diarization by @SeanNaren :: PR: #5945
  • Ngram by @karpnv :: PR: #6063
  • remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
  • Streaming conformer CTC export by @messiaen :: PR: #5837
  • [TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155
  • Ngram lm fusion for RNNT maes decoding by @andrusenkoau :: PR: #6118
  • ASR Beam search documentation by @titu1994 :: PR: #6244

TTS

Changelog
  • [TTS][ZH] added new NGC model cards with polyphone disambiguation. by @XuesongYang :: PR: #5940
  • [TTS] deprecate AudioToCharWithPriorAndPitchDataset. by @XuesongYang :: PR: #5959
  • [TTS][G2P] deprecate add_symbols by @XuesongYang :: PR: #5961
  • Added list_available_models by @treacker :: PR: #5967
  • Update Fastpitch energy bug by @blisc :: PR: #5969
  • removed WHATEVER(1) ˌhwʌˈtɛvɚ from scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.10.txt by @MikyasDesta :: PR: #5869
  • ONNX export for RadTTS by @borisfom :: PR: #5880
  • Add some info about FastPitch SSL model by @redoctopus :: PR: #5994
  • Vits doc by @treacker :: PR: #5989
  • Ragged batching changes for RadTTS, some refactoring by @borisfom :: PR: #6020
  • Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
  • [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
  • [TTS] Add Spanish IPA dictionaries and heteronyms by @rlangman :: PR: #6037
  • [TTS] Separate TTS tokenization and g2p util to fix circular import by @rlangman :: PR: #6080
  • [TTS][refactor] Part 7 - move module from model file. by @XuesongYang :: PR: #6098
  • [TTS][refactor] Part 1 - nemo.collections.tts.data by @XuesongYang :: PR: #6099
  • [TTS][refactor] Part 2 - nemo.collections.tts.parts by @XuesongYang :: PR: #6105
  • [TTS][refactor] Part 6 - remove nemo.collections.tts.torch.README.md and tts_dataset.yaml by @XuesongYang :: PR: #6103
  • [TTS][refactor] Part 3 - nemo.collections.tts.g2p.models by @XuesongYang :: PR: #6113
  • [TTS] update German NGC models trained on Thorsten Datasets by @XuesongYang :: PR: #6125
  • [TTS] remove old waveglow model that relies on torch_stft. by @XuesongYang :: PR: #6128
  • [TTS] Move Spanish polyphones from heteronym to dictionary by @rlangman :: PR: #6123
  • [TTS][refactor] Part 8 - added model inference tests to safeguard changes. by @XuesongYang :: PR: #6129
  • remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
  • [TTS][refactor] update tutorial import paths. by @XuesongYang :: PR: #6176
  • [TTS] Add univnet scheduler by @ArtyomZemlyak :: PR: #6157
  • [TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog
  • [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982

Export

Changelog

Bugfixes

Changelog

NVIDIA Neural Modules 1.16.0

08 Mar 04:35
1631118

Highlights

NeMo ASR

  • ASR Evaluator
  • Multi-channel dereverberation algorithm
  • Hybrid ASR-TTS Models
  • Flashlight Decoder Beam Search
  • FastConformer Encoder with 8x subsampling

NeMo TTS

  • SSL Voice Conversion
  • Spectrogram Enhancer
  • VITS

NeMo Megatron

  • Per microbatch dataloader for GPT and BERT
  • Adapters compatible with Faster Transformer

NeMo Core

  • Nested model support

NeMo Tools

  • NeMo Forced Aligner

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.01

ASR

Changelog

TTS

Changelog
  • [TTS] Update Spanish TTS model to 1.15 by @rlangman :: PR: #5742
  • [TTS][DE] refine grapheme-based tokenizer and fastpitch training recipe on thorsten's neutral datasets. by @XuesongYang :: PR: #5753
  • No-script TS export, prepared for ONNX export by @borisfom :: PR: #5653
  • Fixing masking in RadTTS bottleneck layer by @borisfom :: PR: #5771
  • Port Riva's mel cepstral distortion w/ dynamic time warping notebook by @redoctopus :: PR: #5778
  • Update radtts' infer path by @blisc :: PR: #5788
  • [TTS][DE] Augment tokenization/G2P to preserve capitalization of words and mix phonemes with word-level graphemes for an input text. by @XuesongYang :: PR: #5805
  • [TTS] porting VITS implementation by @treacker :: PR: #5600
  • [TTS][DE] updated IPA dictionary and heteronyms by @XuesongYang :: PR: #5860
  • [TTS] GAN-based spectrogram enhancer by @racoiaws :: PR: #5565
  • TTS inference with Heteronym classification model, hc model inference refactoring by @ekmb :: PR: #5768
  • Remove MCD_DTW tarball by @redoctopus :: PR: #5889
  • Hybrid ASR-TTS models by @artbataev :: PR: #5659
  • Moved eval notebook data to aws by @redoctopus :: PR: #5911
  • [G2P] fixed typos and broken import library. by @XuesongYang :: PR: #5978
  • [G2P] backward compatibility for english tokenizer and bugfix by @XuesongYang :: PR: #5980
  • fix links, add missing file by @ekmb :: PR: #6044
  • [TTS] Spectrogram Enhancer: correct dim for length when loading data by @racoiaws :: PR: #6048
  • [TTS] bugfix for fastpitch German tutorial by @XuesongYang :: PR: #6051
  • [TTS] bugfix Chinese Fastpitch tutorial by @XuesongYang :: PR: #6055
  • Fix enhancer usage by @artbataev :: PR: #6059
  • [TTS] Spectrogram Enhancer: support arbitrary input length by @racoiaws :: PR: #6060
  • Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
  • [TTS] Spectrogram Enhancer: add option to zero out the initial tensor by @racoiaws :: PR: #6136

NLP / NMT

Changelog
  • Fix P-Tuning Truncation by @vadam5 :: PR: #5663
  • Adithyare/prompt learning seed by @arendu :: PR: #5749
  • Add extra data args to support proper finetuning of HF converted T5 checkpoints by @MaximumEntropy :: PR: #5719
  • Don't add output directory twice when creating shared sentencepiece tokenizer by @pks :: PR: #5737
  • add constraint info on batch size for tar dataset by @yzhang123 :: PR: #5812
  • remove transformer version upper bound by @Zhilin123 :: PR: #5831
  • Adithyare/adapter new placement by @arendu :: PR: #5791
  • Add SSL import functionality for Audio Lexical PNC Models by @trias702 :: PR: #5834
  • validation batch sizing and drop_last controls by @arendu :: PR: #5830
  • Remove ending newlines when encoding strings w/ sentencepiece tokenizer by @pks :: PR: #5739
  • Fix segmenting for pcla inference by @jubick1337 :: PR: #5849
  • RETRO model finetuning by @yidong72 :: PR: #5800
  • Optimizing distributed Adam when running with one work queue by @timmoon10 :: PR: #5560
  • Add option to disable distributed parameters in distributed Adam optimizer by @timmoon10 :: PR: #5685
  • set max_steps for lr decay through config by @anmolgupt :: PR: #5780
  • Fix Prompt text space issue by @aklife97 :: PR: #5983
  • Add batch_size to prompt_learning generate by @aklife97 :: PR: #6091

NeMo Tools

Changelog

Export

Changelog

General Improvements

Changelog

NVIDIA Neural Modules 1.15.0

02 Feb 00:49
8c785ec

Highlights

NeMo ASR

  • HybridTransducer-CTC ASR
  • Greedy timestamp decoding with inference script
  • MHA adapters
  • Conformer local attention (longformer)
  • High level beam search API
  • Multiblank transducer
  • Multi-channel audio processing model
  • AIStore for ASR datasets

NeMo Megatron

  • ALiBi position embeddings support for T5

NeMo TTS

  • Chinese TTS pipeline with polyphone disambiguation

NeMo Core

  • Optimizer based EMA
  • MLFlow logger support

Models

  • stt_eo_conformer_ctc_large (HF, NGC) Esperanto ASR model.
  • stt_eo_conformer_transducer_large (HF, NGC) Esperanto ASR model.

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.12

ASR

Changelog

TTS

Changelog
  • Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
  • [TTS] fix ranges of char set for accented letters. by @XuesongYang :: PR: #5607
  • [TTS] add type hints and change variable names for tokenizers and g2p by @XuesongYang :: PR: #5602
  • Fixed RadTTS unit test by @borisfom :: PR: #5572
  • [TTS][ZH] Disambiguate polyphones with augmented dict and Jieba segmenter for Chinese FastPitch by @yuekaizhang :: PR: #5541
  • Add duration padding support for RADTTS inference by @kevjshih :: PR: #5650
  • [TTS] add tts dict cust notebook by @ekmb :: PR: #5662
  • [TN/TTS docs] TN customization, g2p docs moved to tts by @ekmb :: PR: #5683
  • typo and link fixed by @ekmb :: PR: #5741
  • link fixed by @ekmb :: PR: #5745
  • Update Tacotron2 NGC checkpoint load to latest version by @redoctopus :: PR: #5760
  • Docs g2p update by @ekmb :: PR: #5769
  • [TTS][ZH] bugfix import jieba errors. by @XuesongYang :: PR: #5776

NLP / NMT

Changelog

Export

Changelog
  • Add keep_initializers_as_inputs to _export method by @pks :: PR: #5731
  • Megatron export triton update by @Davood-M :: PR: #5766

General Improvements

Changelog

NVIDIA Neural Modules 1.14.0

24 Dec 02:49

Highlights

NeMo ASR

  • Hybrid CTC + Transducer loss ASR #5364
  • Sampled Softmax RNNT (Enables large vocab RNNT, for speech translation and multilingual ASR) #5216
  • ASR Adapters hyper parameter search scripts #5159
  • RNNT {ONNX, TorchScript} x GPU export infer #5248
  • Exportable MelSpectrogram (TorchScript) #5512
  • Audio To Audio Dataset Processor #5196
  • Multi Channel Audio Transcription #5479
  • Silence Augmentation #5476

NeMo Megatron

  • Support for the Mixture of Experts for T5
  • Fix PTL model size output for GPT-3 and BERT
  • BERT with Tensor Parallelism & Pipeline Parallel Support

NeMo Core

  • Hydra Multirun core support + NeMo HP optim in YAML #5159

NeMo Models

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.11

ASR

Changelog
  • [Tools][ASR] Tool for generating data using simulated RIRs by @anteju :: PR: #5158
  • Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
  • Add Gradio App to ASR Docs by @titu1994 :: PR: #5270
  • Add support for Sampled Softmax for RNNT Joint by @titu1994 :: PR: #5216
  • Speed up HF data processing script for ASR by @titu1994 :: PR: #5330
  • bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
  • Add cpWER for evaluation of ASR with diarization by @tango4j :: PR: #5279
  • Fix for getting tokenizer in character-based ASR models when using tarred dataset by @jonghwanhyeon :: PR: #5442
  • Refactor/unify ASR offline and buffered inference by @fayejf :: PR: #5440
  • Standalone diarization+ASR evaluation script by @tango4j :: PR: #5439
  • [ASR] Transcribe for multi-channel signals by @anteju :: PR: #5479
  • Add Silence Augmentation by @fayejf :: PR: #5476
  • add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
  • add RNN-T loss implemented by PyTorch and test code by @hainan-xv :: PR: #5312
  • [ASR] AudioToAudio datasets and related test by @anteju :: PR: #5196
  • Add StreamingFeatureBufferer class for real-life streaming decoding by @tango4j :: PR: #5534
  • Pool stats with padding by @1-800-BAD-CODE :: PR: #5403
  • Adding Hybrid RNNT-CTC model by @VahidooX :: PR: #5364
  • Fix ASR Buffered inference scripts by @titu1994 :: PR: #5552
  • Add wer details - insertion, deletion, substitution rate by @fayejf :: PR: #5557
  • Add support for Time Stamp calculation using transcribe_speech.py by @titu1994 :: PR: #5568
  • [STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models by @andrusenkoau :: PR: #5639

TTS

Changelog
  • [TTS] Fastpitch energy condition and refactoring by @subhankar-ghosh :: PR: #5218
  • [TTS] HiFi-TTS Download Script by @oleksiivolk :: PR: #5241
  • [TTS] Add Mandarin/English Bilingual Recipe for Training Fastpitch Models by @yuekaizhang :: PR: #5208
  • [TTS] fixed type of filepath and rename openslr. by @XuesongYang :: PR: #5276
  • [TTS] replace obsolete torch_tts unit test marker with run_only_on('CPU') by @XuesongYang :: PR: #5307
  • [TTS] bugfix IPAG2P and refactor to remove duplicate process. by @XuesongYang :: PR: #5304
  • Update path to get_data.py in TTS tutorial by @redoctopus :: PR: #5311
  • [TTS] Replace IPA lambda arguments with locale string by @rlangman :: PR: #5298
  • [TTS] expand to support flexible dictionary entry formats in IPAG2P. by @XuesongYang :: PR: #5318
  • [TTS] update organization of model checkpoints and their pointers. by @XuesongYang :: PR: #5327
  • [TTS] bugfix for the script of generating mels from fastpitch. by @XuesongYang :: PR: #5344
  • [TTS] Add Spanish model documentation by @rlangman :: PR: #5390
  • [TTS] Add Spanish FastPitch training configs by @rlangman :: PR: #5383
  • [TTS] replace pitch normalization params with ??? by @XuesongYang :: PR: #5392
  • [TTS] Create script for processing TTS training audio by @rlangman :: PR: #5262
  • [TTS] remove useless logic for set_tokenizer. by @XuesongYang :: PR: #5430
  • [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue by @borisfom :: PR: #5358
  • JOC Optimization in FastPitch by @subhankar-ghosh :: PR: #5450
  • [TTS] Support speaker level pitch normalization by @rlangman :: PR: #5455
  • TTS tutorial update: use speaker 9017 instead of 6097 by @redoctopus :: PR: #5532
  • [TTS] Remove unused TTS eval function by @redoctopus :: PR: #5605
  • [TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. by @XuesongYang :: PR: #5596
  • [TTS][DOC] add notes about automatic conversion to target sampling ra… by @XuesongYang :: PR: #5624
  • [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
  • [TTS][ZH] bugfix for ngc cli installation. by @XuesongYang :: PR: #5652
  • [TTS][ZH] fix broken link for the script. by @XuesongYang :: PR: #5666

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog
  • [ITN] fix year date graph, cardinals extension for hundreds by @ekmb :: PR: #5435
  • [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414

Export

Changelog

General Improvements

Changelog

NVIDIA Neural Modules 1.13.0

07 Dec 21:14

Highlights

NeMo ASR

  • Spoken Language Understanding (SLU) models based on Conformer encoder and transformer decoder
  • Support for codeswitched manifests during training
  • Support for Language ID during inference for ML models
  • Support of cache-aware streaming for offline models
  • Word confidence estimation for CTC & RNNT greedy decoding

NeMo Megatron

  • Interleaved Pipeline schedule
  • Transformer Engine for GPT
  • HF T5v1.1 -> NeMo-Megatron conversion and finetuning/p-tuning
  • IA3 and Adapter Tuning (Tensor + Pipeline Parallel)
  • Pipeline Parallel Support for T5 Prompt Learning
  • MegatronNMT export

NeMo TTS

  • TTS introductory tutorial
  • Phonemizer/espeak removal (Spanish/German)
  • Char-only support for Spanish/German models
  • Documentation Refactor

NeMo Core

  • Upgrade to NGC PyTorch 22.09 container
  • Add pre-commit hooks
  • Exponential moving average (EMA) of weights during training

NeMo Models

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.09

Known Issues

Issues
  • The pytest tests for RadTTSModel_export_to_torchscript fail intermittently due to random input values. Fixed in main.

ASR

Changelog

TTS

Changelog

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog

NeMo Tools

Changelog

Export

Changelog
  • Fix export bug by @VahidooX :: PR: #5009
  • RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
  • Support TorchScript export for Squeezeformer by @titu1994 :: PR: #5164
  • Expose keep_initializers_as_inputs to Exportable class by @pks :: PR: #5052
  • Fix the self-attention export bug for cache-aware streaming Conformer by @VahidooX :: PR: #5114
  • replace ColumnParallelLinear with nn.Linear in export_utils by @arendu :: PR: #5217
  • Megatron Export Update by @Davood-M :: PR: #5343
  • Fix Conformer Export in 1.13.0 (cherry-pick from main) by @artbataev :: PR: #5446
  • export_utils bugfix by @Davood-M :: PR: #5480
  • Export fixes for Riva by @borisfom :: PR: #5496

General Improvements and Bugfixes

Changelog