Bilingual Japanese / English zipformer recipe (multi_ja_en) and MLS English recipe (mls_english) #2015
Open: kinanmartin wants to merge 145 commits into k2-fsa:master from reazon-research:multi_ja_en_mls_english_clean
Commits
28f6545 WIP v0 MLS English recipe (kinanmartin)
ac0c0ed update prepare.sh, fix asr_datamodule.py (kinanmartin)
a1fc642 change default path (kinanmartin)
defc71b replace file (kinanmartin)
efe015d cleaned-up version of recipe (kinanmartin)
8c1c710 symlink copied files to librispeech recipe dir (kinanmartin)
8985259 separate transcript prep stage from bpe train stage (kinanmartin)
a34d34a pre-commit hooks (kinanmartin)
ce44150 readme (kinanmartin)
68e3cea instead of on-the-fly features, precompute fbank and manifests in pre… (kinanmartin)
d6e3c98 move compute_fbank_mls_english.py, add validate_manifest.py, add shar… (kinanmartin)
4ca8ee9 adjusted prepare.sh to only calculate fbank and manifest together; ad… (kinanmartin)
59519a4 fix validation manifest name (kinanmartin)
f2e0171 fix stage 2 and 3 (kinanmartin)
fa84782 optimize with num_jobs on save_audios (kinanmartin)
abebb6a new version of multi_ja_en prepare.sh script which swaps Librispeech … (kinanmartin)
c83b115 add fbank (baileyeet)
61e81bf Revert "add fbank" (baileyeet)
3751441 deprecate params.bilingual=0, replace ReazonSpeechAsrDataModule for M… (kinanmartin)
6d71d9c remove bilingual tag from train.py (baileyeet)
5417e09 restore version of mls_english compute_fbank_mls_english.py and prepa… (kinanmartin)
782e1fb fix stage 5 output pathing (kinanmartin)
f4b2987 switch mls_english clone from https to ssh (kinanmartin)
a8ecb16 use huggingface_hub library to download mls_english (kinanmartin)
3307836 Combined updates. Changed BBPE path structure, changed dataset path s… (kinanmartin)
2f1c611 fix decode script data module usage (kinanmartin)
eafbd64 add utility file for updating the storage_path of cutsets for use in … (kinanmartin)
b167ac7 add utility file for creating subsets of mls english. must be fixed t… (kinanmartin)
ad1be22 Parametrize dev and test split sizes. (kinanmartin)
78ee595 Add failsafe for MLS English dev set key alternate name as validation (kinanmartin)
fd3fbe6 Update README.md to reflect MLS English dataset (kinanmartin)
c77a847 add step 4: display manifest stats to mls_eng (baileyeet)
cdf246c update manifest dir path (baileyeet)
f3e59df add stage 6 - update cutset paths to prepare (baileyeet)
ddc2daa remove commented out codels (baileyeet)
f6ad423 changes to train script - no need for limiting utterance length here (baileyeet)
19b62c0 remove unused local scripts (baileyeet)
5f2f684 make prepare.sh symlinks relative (kinanmartin)
70a7940 changes to asr_datamodule for musan support (baileyeet)
df923f3 typos (baileyeet)
5ec9389 commenting (baileyeet)
de35cc2 remove comment (baileyeet)
f51621b resolve typos and import issues (baileyeet)
4e92879 update musan path (baileyeet)
093a035 update musan paths (baileyeet)
0f700ed update musan symlinks (baileyeet)
d5cc030 attempt to fix musan paths (baileyeet)
aee7b87 working changes for musan mixing (baileyeet)
310aaec Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py (baileyeet)
542620c Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py (baileyeet)
f7fec4a Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py (baileyeet)
154ef43 Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py (baileyeet)
6012edb black and isort formatting (baileyeet)
dc4db37 PR review suggestions implemented (baileyeet)
9d93d63 Update RESULTS.md (baileyeet)
aed139f Musan implementation for ReazonSpeech (#1988) (baileyeet)
dbd8977 Manually fix merge conflict in multi_ja_en/ASR/zipformer/train.py (kinanmartin)
1c5d792 Validate generated manifest files. (#338) (csukuangfj)
c92c606 WIP v0 MLS English recipe (kinanmartin)
ba6d8e8 update prepare.sh, fix asr_datamodule.py (kinanmartin)
0ab0274 change default path (kinanmartin)
1b8a306 replace file (kinanmartin)
e76b749 cleaned-up version of recipe (kinanmartin)
313afea symlink copied files to librispeech recipe dir (kinanmartin)
c532a50 separate transcript prep stage from bpe train stage (kinanmartin)
24db8c1 pre-commit hooks (kinanmartin)
996334f readme (kinanmartin)
fe88d1d instead of on-the-fly features, precompute fbank and manifests in pre… (kinanmartin)
a8f45bc move compute_fbank_mls_english.py, add validate_manifest.py, add shar… (kinanmartin)
eb2168b adjusted prepare.sh to only calculate fbank and manifest together; ad… (kinanmartin)
2504b23 fix validation manifest name (kinanmartin)
73dea24 fix stage 2 and 3 (kinanmartin)
0e86ef8 optimize with num_jobs on save_audios (kinanmartin)
06e4291 new version of multi_ja_en prepare.sh script which swaps Librispeech … (kinanmartin)
7d462aa add fbank (baileyeet)
31a37c7 Revert "add fbank" (baileyeet)
99db0e4 deprecate params.bilingual=0, replace ReazonSpeechAsrDataModule for M… (kinanmartin)
8b035a0 remove bilingual tag from train.py (baileyeet)
7bea23e restore version of mls_english compute_fbank_mls_english.py and prepa… (kinanmartin)
2265e1a fix stage 5 output pathing (kinanmartin)
5682978 switch mls_english clone from https to ssh (kinanmartin)
1093e78 use huggingface_hub library to download mls_english (kinanmartin)
1b1a317 Combined updates. Changed BBPE path structure, changed dataset path s… (kinanmartin)
68bff93 fix decode script data module usage (kinanmartin)
b25254f add utility file for updating the storage_path of cutsets for use in … (kinanmartin)
d136086 add utility file for creating subsets of mls english. must be fixed t… (kinanmartin)
b6d43a4 Parametrize dev and test split sizes. (kinanmartin)
9c318da Add failsafe for MLS English dev set key alternate name as validation (kinanmartin)
065ca31 Update README.md to reflect MLS English dataset (kinanmartin)
0a4ed5e add step 4: display manifest stats to mls_eng (baileyeet)
1ddd3cd update manifest dir path (baileyeet)
606789b add stage 6 - update cutset paths to prepare (baileyeet)
76bae70 remove commented out codels (baileyeet)
ac94174 changes to train script - no need for limiting utterance length here (baileyeet)
9c91775 remove unused local scripts (baileyeet)
694ecb9 make prepare.sh symlinks relative (kinanmartin)
d7ee48e Validate generated manifest files. (#338) (csukuangfj)
1996507 changes to asr_datamodule for musan support (baileyeet)
ed2c0a4 typos (baileyeet)
5fb4bdf commenting (baileyeet)
1cf544b remove comment (baileyeet)
c610c6d resolve typos and import issues (baileyeet)
6272827 update musan path (baileyeet)
aeffb15 update musan paths (baileyeet)
4475815 update musan symlinks (baileyeet)
a310d8f attempt to fix musan paths (baileyeet)
60f326b working changes for musan mixing (baileyeet)
95f58e6 Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py (baileyeet)
865b859 Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py (baileyeet)
b19929c Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py (baileyeet)
2f1f419 Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py (baileyeet)
7b4abba black and isort formatting (baileyeet)
8dd2c0f PR review suggestions implemented (baileyeet)
94cf8c3 support left pad for make_pad_mask (#1990) (yfyeung)
0ca7595 Update RESULTS.md (baileyeet)
11df2a8 Musan implementation for ReazonSpeech (#1988) (baileyeet)
2d8e3fd Fix transformer decoder layer (#1995) (csukuangfj)
f15a783 Validate generated manifest files. (#338) (csukuangfj)
c23af2e musan implementation for mls_english (baileyeet)
ed79fa3 revert unrelated transformer.py diffs from rebase (baileyeet)
636121c remove bilingual tag from train.py (baileyeet)
0967f5f Manually fix merge conflict in multi_ja_en/ASR/zipformer/train.py (kinanmartin)
f210002 Validate generated manifest files. (#338) (csukuangfj)
ee2a6d6 remove bilingual tag from train.py (baileyeet)
dee07de Validate generated manifest files. (#338) (csukuangfj)
f9ceead Validate generated manifest files. (#338) (csukuangfj)
4e05d70 fix stash commit (baileyeet)
130c2a5 Merge branch 'multi_ja_en_mls_english_clean' into musan-mls-clean-final (baileyeet)
5400f43 training and decoding compatibility changes (baileyeet)
8c08c9c Create RESULTS.md (baileyeet)
8e18616 Update RESULTS.md (baileyeet)
556a3f0 Update README.md (baileyeet)
36fc1f1 Merge pull request #4 from reazon-research/musan-mls-clean-final (kinanmartin)
7231cf4 Remove changes to files outside of relevant recipes (kinanmartin)
a4c1db5 reformat (baileyeet)
2859c22 Update RESULTS.md (baileyeet)
9a940c3 Update RESULTS.md (baileyeet)
f64a706 Update egs/multi_ja_en/ASR/RESULTS.md (kinanmartin)
ef7664e Update egs/mls_english/ASR/local/utils/asr_datamodule.py (kinanmartin)
bc2560c Update training commands and decode.py accuracy values, add streaming… (kinanmartin)
ecbe985 Update streaming train and export commands (kinanmartin)
a30e80c Remove accidentally added submodule musan-k2-v2-reazonspeech-medium (baileyeet)
9d389cd Update egs/reazonspeech/ASR/local/compute_fbank_musan.py (baileyeet)
8c84639 Update egs/mls_english/ASR/zipformer/streaming_decode.py (baileyeet)
d74e232 Merge branch 'master' into multi_ja_en_mls_english_clean (baileyeet)
Since your dataset is large, can you also try --iter instead of --epoch?
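For context on the suggestion to use --iter: with a large dataset, a single epoch spans many training batches, so checkpoints saved only at epoch boundaries give few, widely spaced candidates for model averaging. Checkpoints saved every fixed number of batches give finer-grained candidates. The Python sketch below is a hypothetical illustration of that selection, not the recipe's actual decode logic: it assumes checkpoints are named `checkpoint-<iteration>.pt`, and `select_iter_checkpoints` is an invented helper that returns the `avg` newest checkpoints saved at or before a given iteration.

```python
import re
from typing import List

# Assumed naming convention for iteration-based checkpoints (hypothetical).
_CKPT_RE = re.compile(r"checkpoint-(\d+)\.pt$")


def select_iter_checkpoints(filenames: List[str], iteration: int, avg: int) -> List[str]:
    """Return up to `avg` checkpoints saved at or before `iteration`, newest first.

    Hypothetical helper for illustration only; files that do not match the
    assumed `checkpoint-<iteration>.pt` pattern (e.g. epoch checkpoints) are ignored.
    """
    found = []
    for name in filenames:
        m = _CKPT_RE.search(name)
        if m is not None and int(m.group(1)) <= iteration:
            found.append((int(m.group(1)), name))
    # Sort numerically by iteration (not lexicographically), newest first.
    found.sort(reverse=True)
    return [name for _, name in found[:avg]]


if __name__ == "__main__":
    ckpts = [
        "exp/checkpoint-4000.pt",
        "exp/checkpoint-8000.pt",
        "exp/checkpoint-12000.pt",
        "exp/checkpoint-16000.pt",
        "exp/epoch-1.pt",
    ]
    print(select_iter_checkpoints(ckpts, iteration=12000, avg=2))
```

Sorting on the parsed integer matters here: a plain string sort would place `checkpoint-12000.pt` before `checkpoint-8000.pt` in ascending order and pick the wrong set to average.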