Open
1275 commits
027132a
Move self-attention weights input to after ff1.
danpovey Feb 24, 2026
ecf8f1f
Merge remote-tracking branch 'zengwei_zapformer/deterministic_inverti…
danpovey Mar 11, 2026
6f7999e
Separate streaming and non-streaming versions of SequenceNorm and rem…
danpovey Mar 11, 2026
1f908bf
Make min_factor simply added linearly (not affect progress) and incre…
danpovey Mar 11, 2026
a217e3d
Increase cubic_decay_proportion=0.75 back to cubic_decay_proportion=0…
danpovey Mar 12, 2026
e274669
Increase cubic_decay_proportion from .8 to .85
danpovey Mar 12, 2026
af73215
Implement min_factor and max_factor in cosine scheduler via changing …
danpovey Mar 12, 2026
d343e0a
Revert "Move self-attention weights input to after ff1."
danpovey Mar 13, 2026
f0410f6
Documentation changes.
danpovey Mar 13, 2026
4ee3199
Merge branch 'deterministic_invertible2182conv' into deterministic_in…
danpovey Mar 13, 2026
1b32247
Large amount of code cleanup and removal.
danpovey Mar 13, 2026
075b702
Increase CorrelationLimiter limit from .35 to .45
danpovey Mar 14, 2026
f8db837
Replace CosineLRScheduler with HalfCosineLRScheduler
danpovey Mar 14, 2026
f3fd4d8
Implement InterpCosineLRScheduler
danpovey Mar 16, 2026
3cd1645
Some configuration changes; CorrelationLimiter power 0.45->0.4, cubic…
danpovey Mar 16, 2026
56d283d
Change where padding is done in ConvolutionModule and round up sequen…
danpovey Mar 16, 2026
8b1ed2e
Simplify the interface of model.py, moving SpecAug augmentation out i…
danpovey Mar 17, 2026
39a6647
Bug fix
danpovey Mar 17, 2026
315471b
fix import
danpovey Mar 17, 2026
ba0d20b
Bug fixes
danpovey Mar 17, 2026
145d2cc
Merge branch 'deterministic_invertible2191conv' into deterministic_in…
danpovey Mar 19, 2026
d2ad0bb
Change defaults and test code in optim.py, will not affect our runs.
danpovey Mar 19, 2026
2f227b3
Move code to batched_rubik, rubik instead of optim.py
danpovey Mar 19, 2026
096b914
add commonvoice dataset
yaozengwei Mar 19, 2026
27424b4
Replace FftConv with BasisConv
danpovey Mar 21, 2026
c3921f9
Fix wrong class names in super()
danpovey Mar 21, 2026
957b23b
Implement WeightedMean to bypass convolutions; this breaks streaming…
danpovey Mar 21, 2026
a429e48
Bug fix re src_key_padding_mask, use it.
danpovey Mar 21, 2026
fd8147c
Use 4, not 2, copies of the data.
danpovey Mar 24, 2026
bc7d0b6
Fix assertion.
danpovey Mar 24, 2026
c3e0e8c
Changes to multicopy_dataset.py and asr_datamodule.py; max-duration i…
danpovey Mar 25, 2026
47f059e
Various bug fixes
danpovey Mar 25, 2026
bcf578f
Bug fixes.
danpovey Mar 25, 2026
8533627
Make learning rate of scale be propto scale and make bias scale_limit…
danpovey Mar 30, 2026
286d09a
Remove specifiable weight decay from rubik, merge into bias and weigh…
danpovey Mar 30, 2026
85d6b32
Initialize scales to the actual parameter scales.
danpovey Mar 30, 2026
f7ef39f
Remove weight decay arg.
danpovey Mar 30, 2026
0eb4057
Change BasisConv to normal convolution, with bias=False; still in par…
danpovey Mar 30, 2026
ef3233b
Halve bias_scale_limits from (0.2,1.0) to (0.1,0.5).
danpovey Mar 31, 2026
0dd7b95
Do not respect pairs of sequences in AlternatingSpecAugment, use invi…
danpovey Mar 31, 2026
3a8d333
Add more diagnostic code in test.
danpovey Apr 1, 2026
0230f0b
Apply weighted_mean in self-attention also
danpovey Apr 1, 2026
dbe49d6
Bug fixes RE CV test and --max-duration/--num-copies types from 2236.
danpovey Apr 1, 2026
0602f1f
Add input sigmoid gating in self-attn.
danpovey Apr 2, 2026
3041218
Actually apply mask in weighted_mean; put sigmoid gating before weigh…
danpovey Apr 2, 2026
4255790
Increase value-head-dim from 64 to 98; reduce central num layers from…
danpovey Apr 2, 2026
dc689e7
Merge branch 'deterministic_invertible2240conv' into deterministic_in…
danpovey Apr 3, 2026
894149c
Remove depthwise_conv.lr_scale = 0.66
danpovey Apr 3, 2026
9485532
Restore depthwise_conv.lr_scale = 0.66
danpovey Apr 3, 2026
ad3c07e
Merge remote-tracking branch 'upstream/master' into deterministic_inv…
danpovey Apr 4, 2026
e1ada1f
Remove zapformer_denoise directory
danpovey Apr 4, 2026
9fb0cad
Remove zapformer2 directory
danpovey Apr 4, 2026
6961d60
Remove unnecessary change to get_parameter_groups_with_lrs() from ice…
danpovey Apr 4, 2026
b958bf1
Set bias_scale_limits to be the same as weight_scale_limits: (0.05,0.25)
danpovey Apr 1, 2026
5c5e21b
Change the num hours to num cuts in the weights of subsets.
danpovey Apr 4, 2026
af080b9
Bug fix regarding length of libri cuts
danpovey Apr 5, 2026
7a8d0db
Increase central num layers from 12 to 14.
danpovey Apr 3, 2026
a65c190
Add min_factor = 0.05 to InterpCosineLRScheduler, applied via interpo…
danpovey Apr 5, 2026
230eb01
Merge branch 'deterministic_invertible2274conv' into deterministic_in…
danpovey Apr 5, 2026
b1d77fb
Remove unnecessarily num_copies-related code and comments.
danpovey Apr 5, 2026
92bb285
Initialize out_proj scales of submodules to zero.
danpovey Apr 6, 2026
d8614fc
Revert zero-out-proj-scales and instead decrease out_proj initial sca…
danpovey Apr 6, 2026
e9b2079
Make num_copies rise linearly with epoch, starting from 1, to --max-c…
danpovey Apr 6, 2026
9e3c29a
Implement more accurate LR schedule with variable_combined_scheduler
danpovey Apr 6, 2026
0e865c4
Fix schedule of num_copies, start from 1 not 2.
danpovey Apr 6, 2026
4758c68
Bug fix RE adjust_factor
danpovey Apr 6, 2026
53ad727
Invert adjust_factor
danpovey Apr 6, 2026
7856b73
Fix f-string
danpovey Apr 6, 2026
fab7cde
Add torch.distributed.barrier() around anything that might call fix_r…
danpovey Apr 6, 2026
3c4b5bf
Re-set random seed after creating dataloader
danpovey Apr 6, 2026
b91c255
fix streaming decoding
yaozengwei Apr 7, 2026
5583735
Make LR schedule linearly decreasing rather than InterpCosine
danpovey Apr 7, 2026
78c285b
Merge remote-tracking branch 'zengwei_zapformer/deterministic_inverti…
danpovey Apr 7, 2026
526ce4a
Introduce min-copies; use no-speed-perturb copy of libr train data.
danpovey Apr 7, 2026
44401be
Revert optimizer schedule to status in 2285.
danpovey Apr 7, 2026
aaae98e
Cosmetic fix.
danpovey Apr 7, 2026
7a3daf6
Make num-copies rise exponentially with epoch rather than linearly.
danpovey Apr 8, 2026
bc2e4e8
Make LR depend on epoch not batch, as in 2284, and use linear decay w…
danpovey Apr 8, 2026
061afac
Use try-except for importing dist_barrier.
danpovey Apr 8, 2026
93c0efe
Reduce min_factor of LinearLRScheduler from 0.05 to 0.025
danpovey Apr 8, 2026
da056e3
Replace LinearLRScheduler with HalfCosineLRScheduler
danpovey Apr 8, 2026
538d589
Use VariableCombinedLRScheduler of linear-decay type; make num-copies…
danpovey Apr 8, 2026
140f275
Change LinearLRScheduler to InterpCosineLRScheduler, still with min_f…
danpovey Apr 8, 2026
5c817c1
squared_scale=0.75 in InterpCosineLRScheduler, make its final linear …
danpovey Apr 8, 2026
93c6405
Make InterpCosineLRScheduler more general to include linear function;…
danpovey Apr 9, 2026
182752b
Use the speed-perturb, not the nosp, version of the librispeech data.
danpovey Apr 9, 2026
566d591
Change code to get copies_per_epoch to be reversed, to minimize round…
danpovey Apr 9, 2026
c08b95b
Add debugging print statements to show model param values and random …
danpovey Apr 10, 2026
91ac712
Change debug statements for balance
danpovey Apr 10, 2026
2a8498f
Update debug statement
danpovey Apr 10, 2026
99b0412
Desynchronize the torch rng's; and rely on only the torch rng in time…
danpovey Apr 10, 2026
b7e1a95
Add comment
danpovey Apr 10, 2026
c5aaca0
Merge branch 'deterministic_invertible2297conv_debug' into determinis…
danpovey Apr 10, 2026
9c4e419
Bug fix
danpovey Apr 10, 2026
e11db7f
Set cuda seed in a more complete way
danpovey Apr 10, 2026
6263776
Remove debugging statements.
danpovey Apr 10, 2026
f981352
Do random seeding and initialization of sampler differently; do not m…
danpovey Apr 10, 2026
de8b0af
Bug fix importing numpy as np
danpovey Apr 10, 2026
7746732
Make limit_param_value non-randomized; also remove un-used python files.
danpovey Apr 10, 2026
0871a46
Move time_warp to alternating_spec_augment.py
danpovey Apr 10, 2026
e0fd17f
Fix import
danpovey Apr 10, 2026
0a94ca3
Do not require CTC
danpovey Apr 10, 2026
bc202c2
Make depthwise_conv non-central weights 10 times smaller
danpovey Apr 10, 2026
5319173
use try-except when importing time_warp to avoid multi-job problem
danpovey Apr 10, 2026
ff60d77
Merge branch 'deterministic_invertible3002conv' into deterministic_in…
danpovey Apr 10, 2026
a1b6764
Improve code regarding random number generators, for greater clarity …
danpovey Apr 11, 2026
c64b00e
Remove correlation limiter.
danpovey Apr 12, 2026
1d06aa8
Refactor AngularFreqBasis for caching and reuse
yaozengwei Apr 13, 2026
565ece0
minor update
yaozengwei Apr 13, 2026
486a368
Restore correlation_limiter but with power reduced from .4 to .35
danpovey Apr 14, 2026
936c82c
Merge remote-tracking branch 'zengwei_zapformer/deterministic_inverti…
danpovey Apr 14, 2026
21da053
Introduce scale of query_head_dim**-0.5 to keys
danpovey Apr 14, 2026
f26a443
Add code to compute and print projection overlap
danpovey Apr 15, 2026
39a56ff
Introduce a new loss term that makes the projection-overlap be at lea…
danpovey Apr 15, 2026
fdd72e6
take some changes to the metric from deterministic_invertible3021conv…
danpovey Apr 18, 2026
219a672
Remove correlation limiter loss
danpovey Apr 18, 2026
2e847d5
Fix bug in rubik.py about unnecessarily unsqueezing
danpovey Apr 18, 2026
9dc7235
Use fourth_power_rms in normalizing step size in batched_rubik
danpovey Apr 18, 2026
b0a1b50
Add model.encoder.compute_projection_overlap(verbose=True) every epoch
danpovey Apr 19, 2026
74ca295
Fix bug with compute_projection_overlap loss not being scaled.
danpovey Apr 19, 2026
0887c1d
Fix printing of projection overlap per epoch, w.r.t. DDP
danpovey Apr 19, 2026
71fdfaa
Change factor in setting alpha from .5 to .25, bigger safety factor t…
danpovey Apr 20, 2026
2d5697d
Do the remaining part of the shrinkage as linear shrinkage.
danpovey Apr 20, 2026
6582899
Revert safety factor on alpha from .25 to .5
danpovey Apr 20, 2026
6e391a6
Increase safety factor from 0.5 to 0.66
danpovey Apr 20, 2026
956a015
Incorporate refactoring of rubik, with safety_factor=0.66 and use the…
danpovey Apr 20, 2026
4bbd3eb
Reduce safety_factor from 0.66 to 0.5
danpovey Apr 20, 2026
28fb712
Bug fix in sign of linear decay in rubik
danpovey Apr 21, 2026
c4c8879
Take zapformer.py from 3045, reducing min of final residual_scale fro…
danpovey Apr 21, 2026
94f7bbd
Add random embedding projections to tensorboard.
danpovey Apr 21, 2026
09667e6
Print random state for debugging consistency.
danpovey Apr 21, 2026
de57093
Print augmented features sum to check consistency
danpovey Apr 21, 2026
fa3194d
project full grad, not just sign
danpovey Apr 21, 2026
f9d6db6
Move param-random-proj code to debug_params() function, separate from…
danpovey Apr 22, 2026
da575d8
Restore correlation limiter but with enormous limit, of 0.25.
danpovey Apr 22, 2026
912c605
Make debug printout of correlations more frequent and print the limit.
danpovey Apr 22, 2026
655dca3
Introduce adafactor_beta1=0.9, add conventional momentum into direct …
danpovey Apr 22, 2026
f640f6e
Increase direct from 0.15 to 0.25
danpovey Apr 22, 2026
0895abb
Reduce direct scale from .25 to .15
danpovey Apr 22, 2026
f325df4
Change adafactor_beta1 to -0.5 and direct to 0.05
danpovey Apr 23, 2026
d0cae9b
Print debug_grad less frequently.
danpovey Apr 23, 2026
9462cc5
Decrease direct from .05 to .01, adafactor_beta1 from -0.5 to -0.9 an…
danpovey Apr 24, 2026
8428671
take debugging-only changes to train.py from 3078.
danpovey Apr 24, 2026
ad80d42
Plot grad_proj for 50 out of every 1000 steps so we can get a sense f…
danpovey Apr 24, 2026
673c6b6
Make direct interpreted as a learning rate, not a scale, set it to 0.…
danpovey Apr 24, 2026
7f7b85f
Make adafactor update fully cancel, subtract it all on the next step …
danpovey Apr 24, 2026
ae82911
Increase direct scale from .00015 to .0015.
danpovey Apr 24, 2026
6c7bf06
Reduce beta2 used in no_momentum_step from 0.98 to 0.9.
danpovey Apr 24, 2026
a500f0e
Reduce beta2 used in no_momentum_step from 0.9 to 0.0
danpovey Apr 24, 2026
ccf5cfb
Warm up cancellation of direct gradient over 4k batches.
danpovey Apr 24, 2026
3eff001
Reduce direct=0.0015 to direct=0.0005.
danpovey Apr 24, 2026
730220e
Change adafactor_beta1 from 0.0 to -0.5 (beta1 used in no_momentum_step)
danpovey Apr 24, 2026
313bf5e
Have direct grad immediately normalized and have it warm down over 5k…
danpovey Apr 25, 2026
6815809
Have direct lr warm down to 0.2 of its initial value, not zero.
danpovey Apr 26, 2026
c9b8fa8
Change batched_rubik to ignore direct grad term and just use nesterov…
danpovey Apr 27, 2026
918f935
Remove lr_scale from OrthogonalLinear and replace it with weight_rms …
danpovey Apr 27, 2026
7bf24f7
Introduce factor of 0.5 in decay that comes from the math; reduce sca…
danpovey Apr 27, 2026
f9c6cb9
Code cleanups, and propagate recent updates to batched_rubik.py to ru…
danpovey Apr 27, 2026
e4e7234
Change factor in beta_ceil from 0.2 to 0.1 for slower warmup of beta.
danpovey Apr 28, 2026
6684bc4
Merge branch 'deterministic_invertible3105conv' into deterministic_in…
danpovey Apr 28, 2026
2cd1fc0
Remove all special things from initialization and training of depthwi…
danpovey Apr 28, 2026
29cddb1
Normalize direct grad separately from moving_grad
danpovey Apr 28, 2026
6aaa46c
Fix comment
danpovey Apr 28, 2026
d9a96ab
Bug fix, use moving_stats
danpovey Apr 28, 2026
70c4178
Apply nesterov also to adam_step and scaling_step
danpovey Apr 28, 2026
e8c4694
Merge branch 'deterministic_invertible3110conv' into deterministic_in…
danpovey Apr 29, 2026
d2d2f81
Restore code to down weight non-central depthwise_conv weights on ini…
danpovey May 1, 2026
91a2ef4
Make compute_projection_overlap more efficient.
danpovey May 3, 2026
14a9106
Remove parameter names from batched_rubik (not functional anyway), si…
danpovey May 5, 2026
733a896
Bug fixes and properly sync rubik.py with batched_rubik.py
danpovey May 5, 2026
20e3f4f
Fix various comments.
danpovey May 5, 2026
dd07263
Remove input sigmoid-scaling in ConvolutionModule.
danpovey May 6, 2026
e2cfc29
Bug fix
danpovey May 6, 2026
7711135
Update RESULTS.md to add basic zapformer recipe
danpovey May 13, 2026
03d4a70
Fix formula with linear_alpha having the wrong sign; add some debug c…
danpovey May 16, 2026
40b58c0
Increase cubic_decay_proportion from 0.8 to 1.0.
danpovey May 16, 2026
8e89313
Increase overlap minimum from .66 to .70
danpovey May 14, 2026
7a24ff9
Double nesterov scale.
danpovey May 17, 2026
d6139af
Take simpler version of batched_rubik.py that only has one set of stats.
danpovey May 17, 2026
1c80a84
Revert nesterov scale to 1.
danpovey May 17, 2026
857971a
Double nesterov scale to 2.0.
danpovey May 17, 2026
c4e2806
Decrease nesterov scale to 0.66.
danpovey May 17, 2026
f02f152
Remove linear decay; nesterov_scale 0.66->1.0; set cubic_decay_propor…
danpovey May 17, 2026
d19196f
Change test configuration
danpovey May 17, 2026
6c2e1d3
Make cubic_decay_proportion (actually scale) be rank**-0.25, as in na…
danpovey May 18, 2026
7f72684
Remove cubic_decay_proportion arg from train.py.
danpovey May 18, 2026
ff5fc52
Do not scale up step by more than one.
danpovey May 18, 2026
cbb2acf
Do sqrt() on the scale that normalizes the step size.
danpovey May 18, 2026
3f3d732
Have cubic_alpha be computed by quadratic formula for exact decay amo…
danpovey May 18, 2026
9f362a6
Completely remove the scaling in cubic_decay_step.
danpovey May 18, 2026
60bd8ec
Propagate the changes from batched_rubik.py to rubik.py.
danpovey May 18, 2026
a7bae34
Limit norm_grad to -3..3.
danpovey May 18, 2026
2422703
Remove clamping and instead fully normalize scale.
danpovey May 18, 2026
aa486ee
Update stats twice, once after time averaging.
danpovey May 19, 2026
5e18b1e
Remove unnecessary dist reduce
danpovey May 19, 2026
8e82824
Introduce beta2b_scale=0.1 to make stats dominated by grad not moving…
danpovey May 19, 2026
c2fc785
take changes from nanochat setup for memory usage.
danpovey May 20, 2026
26bdc2a
Remove beta2b, use just beta.
danpovey May 20, 2026
fd5a116
Extra printout, alpha_ratio
danpovey May 20, 2026
9de596b
Remove warmup for second beta2.
danpovey May 20, 2026
76d93f5
Clamp to -4..4.
danpovey May 20, 2026
b2908cd
Use invP as preconditioning for compute_alpha invocation.
danpovey May 21, 2026
10ac67a
Increase weight_rms in OrthogonalLinear to slow down learning of proj…
danpovey May 21, 2026
44ddf89
Go back to using beta2b, like reverse of 3178,
danpovey May 21, 2026
feb7711
Do conventional beta1 decay for first 200 steps.
danpovey May 21, 2026
9a348d6
Clamp zapformer layer output to -5..5 to prevent divergence early on …
danpovey May 21, 2026
aac8998
Finish changing input_scale of conv module to be inside conv_module.
danpovey May 21, 2026
b33113a
Merge branch 'deterministic_invertible3185conv' into deterministic_in…
danpovey May 21, 2026
d4d81a0
Propagate recent changes to batched_rubik.py to rubik.py
danpovey May 21, 2026
b8347d3
Introduce safety_factor=0.5 in alpha computation.
danpovey May 21, 2026
a595c61
Merge branch 'deterministic_invertible3186conv' into deterministic_in…
danpovey May 21, 2026
a56ebde
Merge branch 'deterministic_invertible3187conv' into deterministic_in…
danpovey May 21, 2026
f7148d0
fix merge issue in rubik.py
danpovey May 21, 2026
69da97a
remove clamp(-4,4) from rubik.
danpovey May 21, 2026
52593f8
Copy muon-core code from rubik_baseline_tb_dan_largeinit_simpler42 an…
danpovey May 22, 2026
903576b
Take muon-core rubik from rubik_baseline_tb_dan_largeinit_simpler45.
danpovey May 22, 2026
a0e8c17
Remove use of invP in computing alpha.
danpovey May 23, 2026
655be80
Introduce alpha_power = 0.5, making alpha closer to 1.
danpovey May 23, 2026
44f6639
Increase alpha_power from 0.5 to 0.75.
danpovey May 23, 2026
0692a87
Fixes suggested by AI from https://github.com/k2-fsa/icefall/pull/2082
danpovey May 24, 2026
cae74c4
Further fix
danpovey May 24, 2026
ba47201
Fix loading state dict dtype issue
danpovey May 25, 2026
909cd2d
Reduce beta1 in BatchedRubik[muon-core] from .99 to .98.
danpovey May 25, 2026
f2a811e
Changed muon-core update to have symmetric row-or-col normalization b…
danpovey May 25, 2026
2ef9d39
copy export-onnx.py from zipformer
pkufool May 25, 2026
27aea4e
Revert beta1 from 0.98 to 0.99.
danpovey May 25, 2026
c3fc7d6
fix to onnx exporting, not working yet
pkufool May 28, 2026
b638c5a
export transducer works
pkufool May 28, 2026
6d0080c
fix onnx inference
pkufool May 28, 2026
d284af8
minor fixes
pkufool May 28, 2026
9cb74d0
minor fix
pkufool May 28, 2026
f9869c1
merge kangwei's branch zapformer_preview
danpovey May 30, 2026
de5a49f
Fix some dtypes in optimizer.
danpovey May 30, 2026
3d67a58
Update the results.
danpovey May 30, 2026
ee43592
Set base-lr to 0.02.
danpovey May 30, 2026
6c2e9b6
Remove unnecessary state_dict/load_state_dict members.
danpovey May 30, 2026
881a8ed
Fix issue in matrix_shape() pointed out by AI on https://github.com/k…
danpovey May 30, 2026
3e78592
fix streaming jit export
pkufool Jun 1, 2026
f96e36e
Fix from master for ctc_loss bug in torch
danpovey Jun 1, 2026
916a250
Take zipformer/model.py from master.
danpovey Jun 1, 2026
ae69eea
fix streaming export and pretrained inference
pkufool Jun 1, 2026
8f94d85
Use batched_rubik optimizer [muon-core] in zipformer, with interp-cos…
danpovey Jun 1, 2026
8896e65
Merge remote-tracking branch 'kangwei/zapformer_preview' into determi…
danpovey Jun 2, 2026
60bcaa7
Add giga/cv test sets for zipformer
pkufool Jun 8, 2026
be8a101
Make code more robust w.r.t. COMPUTE_DTYPE.
danpovey Jun 9, 2026
73c7579
Merge changes from origin/zapformer3127
danpovey Jun 9, 2026
a1de0b2
Remove comment.
danpovey Jun 9, 2026
209e1c7
Merge branch 'deterministic_invertible3242conv' into deterministic_in…
danpovey Jun 11, 2026
bc6955d
Remove muon.py
danpovey Jun 11, 2026
7e077af
take zipformer/train.py from master, move this train.py to train_newo…
danpovey Jun 11, 2026
25 changes: 25 additions & 0 deletions egs/librispeech/ASR/RESULTS.md
@@ -1,5 +1,30 @@
## Results

### zapformer (zapformer + pruned-transducer w/ CTC)

Note: --num-real-epochs 40 takes about the same time as 20 epochs of the zipformer CR-CTC recipe
(where each epoch is really 3 epochs because of speed-perturb), so the training time will be roughly
40% of that of the old zipformer recipe. The "--epoch 13" reported below is the last epoch; the smaller
number of epochs is a consequence of --min-copies and --max-copies, which we will document in this
report later. (Later epochs take more real computation time because they make different SpecAug
copies of the data.)

# (non-streaming)
./zapformer/train.py --world-size 4 \
--min-copies 1 --max-copies 8 --num-real-epochs 40 \
--base-lr=0.023 --batches-per-epoch 2400 --start-epoch 1 --use-fp16 1 \
--exp-dir zapformer/exp \
--use-ctc 1 --use-transducer 1 \
--base-dim 64 --ctc-loss-scale 0.2 \
--full-libri 1 --max-duration 1200 --master-port 43039

| decoding method | test-clean | test-other | comment |
|--------------------------------------|------------|------------|---------------------|
| greedy_search | 1.81 | 3.73 | --epoch 13 --avg 3 |

Note on other results: dev-clean=1.73, dev-other=3.55, giga test=16.69, giga dev=1.733 (i.e. on the model trained with Libri only).
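
The note above says that later epochs cost more real computation because each one sees more SpecAug copies of the data. As a rough illustration only, the sketch below shows one way the number of copies could rise geometrically from --min-copies to --max-copies over the epochs. The function name `copies_per_epoch`, the geometric formula and the use of 13 epochs are assumptions made for this example; they are not taken from the recipe's code, which may round or schedule the copies differently.

```python
from typing import List


def copies_per_epoch(min_copies: float, max_copies: float, num_epochs: int) -> List[float]:
    """Hypothetical schedule: geometric interpolation from min_copies
    (first epoch) to max_copies (last epoch). Not the recipe's actual code."""
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")
    if num_epochs == 1:
        return [float(max_copies)]
    ratio = max_copies / min_copies
    return [
        min_copies * ratio ** (epoch / (num_epochs - 1))
        for epoch in range(num_epochs)
    ]


# Values mirror the flags in the command above; 13 is the reported last epoch.
schedule = copies_per_epoch(min_copies=1, max_copies=8, num_epochs=13)
# Summing the schedule gives the total number of passes over the data, in
# units of one un-copied epoch, under this assumed schedule.
total_passes = sum(schedule)

for epoch, copies in enumerate(schedule, start=1):
    print(f"epoch {epoch:2d}: {copies:.2f} copies")
print(f"total passes over the data: {total_passes:.1f}")
```

Under this assumed schedule the per-epoch cost grows by a constant factor each epoch, so most of the computation is spent in the last few epochs.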


### zipformer (zipformer + pruned-transducer w/ CR-CTC)

See <https://github.com/k2-fsa/icefall/pull/1766> for more details.
112 changes: 112 additions & 0 deletions egs/librispeech/ASR/tdnn_lstm_ctc/asr_datamodule.py
@@ -18,13 +18,21 @@

import argparse
import inspect
import glob
import logging
import re

from functools import lru_cache
from pathlib import Path
from typing import Any, Dict, Optional

import numpy as np # to set its random seed

import torch
import lhotse

from lhotse import CutSet, Fbank, FbankConfig, load_manifest, load_manifest_lazy

from lhotse.dataset import (  # noqa F401 for PrecomputedFeatures
    CutConcatenate,
    CutMix,
@@ -497,3 +505,107 @@ def gigaspeech_dev_cuts(self) -> CutSet:
    def gigaspeech_test_cuts(self) -> CutSet:
        logging.info("About to get Gigaspeech test cuts")
        return load_manifest_lazy(self.args.manifest_dir / "cuts_TEST.jsonl.gz")


class GigaSpeech:
    def __init__(self, manifest_dir: str):
        """
        Args:
          manifest_dir:
            It is expected to contain the following files:

            - gigaspeech_XL_split_2000/gigaspeech_cuts_XL.*.jsonl.gz
            - gigaspeech_cuts_L.jsonl.gz
            - gigaspeech_cuts_M.jsonl.gz
            - gigaspeech_cuts_S.jsonl.gz
            - gigaspeech_cuts_XS.jsonl.gz
            - gigaspeech_cuts_DEV.jsonl.gz
            - gigaspeech_cuts_TEST.jsonl.gz
        """
        self.manifest_dir = Path(manifest_dir)

    def train_XL_cuts_split(self) -> CutSet:
        logging.info("About to get train-XL cuts")

        filenames = list(
            glob.glob(
                f"{self.manifest_dir}/gigaspeech_XL_split_2000/gigaspeech_cuts_XL.*.jsonl.gz"  # noqa
            )
        )

        pattern = re.compile(r"gigaspeech_cuts_XL.([0-9]+).jsonl.gz")
        idx_filenames = [(int(pattern.search(f).group(1)), f) for f in filenames]
        idx_filenames = sorted(idx_filenames, key=lambda x: x[0])

        sorted_filenames = [f[1] for f in idx_filenames]

        logging.info(f"Loading {len(sorted_filenames)} splits")

        return lhotse.combine(lhotse.load_manifest_lazy(p) for p in sorted_filenames)

    def train_XL_cuts(self) -> CutSet:
        f = self.manifest_dir / "gigaspeech_cuts_XL.jsonl.gz"
        logging.info(f"About to get train-XL cuts from {f}")
        return CutSet.from_jsonl_lazy(f)

    def train_L_cuts(self) -> CutSet:
        f = self.manifest_dir / "gigaspeech_cuts_L.jsonl.gz"
        logging.info(f"About to get train-L cuts from {f}")
        return CutSet.from_jsonl_lazy(f)

    def train_M_cuts(self) -> CutSet:
        f = self.manifest_dir / "gigaspeech_cuts_M.jsonl.gz"
        logging.info(f"About to get train-M cuts from {f}")
        return CutSet.from_jsonl_lazy(f)

    def train_S_cuts(self) -> CutSet:
        f = self.manifest_dir / "gigaspeech_cuts_S.jsonl.gz"
        logging.info(f"About to get train-S cuts from {f}")
        return CutSet.from_jsonl_lazy(f)

    def train_XS_cuts(self) -> CutSet:
        f = self.manifest_dir / "gigaspeech_cuts_XS.jsonl.gz"
        logging.info(f"About to get train-XS cuts from {f}")
        return CutSet.from_jsonl_lazy(f)

    def test_cuts(self) -> CutSet:
        f = self.manifest_dir / "gigaspeech_cuts_TEST.jsonl.gz"
        logging.info(f"About to get TEST cuts from {f}")
        return load_manifest_lazy(f)

    def dev_cuts(self) -> CutSet:
        f = self.manifest_dir / "gigaspeech_cuts_DEV.jsonl.gz"
        logging.info(f"About to get DEV cuts from {f}")
        return load_manifest_lazy(f)


class CommonVoice:
    def __init__(self, manifest_dir: str):
        """
        Args:
          manifest_dir:
            It is expected to contain the following files:

            - cv22-en_cuts_train.jsonl.gz
            - cv22-en_cuts_dev.jsonl.gz
            - cv22-en_cuts_test.jsonl.gz
        """
        self.manifest_dir = Path(manifest_dir)

    def train_cuts(self) -> CutSet:
        logging.info("CommonVoice: About to get train cuts")
        return load_manifest_lazy(
            self.manifest_dir / "cv22-en_cuts_train.jsonl.gz"
        )

    def dev_cuts(self) -> CutSet:
        logging.info("CommonVoice: About to get dev cuts")
        return load_manifest_lazy(
            self.manifest_dir / "cv22-en_cuts_dev.jsonl.gz"
        )

    def test_cuts(self) -> CutSet:
        logging.info("CommonVoice: About to get test cuts")
        return load_manifest_lazy(
            self.manifest_dir / "cv22-en_cuts_test.jsonl.gz"
        )
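
Review note on `GigaSpeech.train_XL_cuts_split` in the diff above: the method sorts the split manifests by the integer index embedded in each file name rather than by the raw string. The standalone sketch below shows why that matters. The helper name `sort_by_split_index` and the sample paths are made up for this example, and the pattern here escapes the literal dots, which is slightly stricter than the pattern in the diff.

```python
import re
from typing import List

# Stricter variant of the pattern in the diff: literal dots are escaped.
SPLIT_PATTERN = re.compile(r"gigaspeech_cuts_XL\.([0-9]+)\.jsonl\.gz")


def sort_by_split_index(filenames: List[str]) -> List[str]:
    """Order split manifests by their numeric index (hypothetical helper)."""
    indexed = []
    for name in filenames:
        match = SPLIT_PATTERN.search(name)
        if match is None:
            # The code in the diff would raise AttributeError here instead.
            raise ValueError(f"unexpected manifest name: {name}")
        indexed.append((int(match.group(1)), name))
    return [name for _, name in sorted(indexed, key=lambda pair: pair[0])]


# Made-up paths, for illustration only.
names = [
    "data/gigaspeech_cuts_XL.10.jsonl.gz",
    "data/gigaspeech_cuts_XL.2.jsonl.gz",
    "data/gigaspeech_cuts_XL.1.jsonl.gz",
]

numeric_order = sort_by_split_index(names)
lexicographic_order = sorted(names)

print(numeric_order)        # splits 1, 2, 10
print(lexicographic_order)  # splits 1, 10, 2
```

A plain string sort puts split 10 before split 2, so the numeric key keeps the splits in a stable, meaningful order from run to run.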
1 change: 1 addition & 0 deletions egs/librispeech/ASR/zapformer/.gitignore
@@ -0,0 +1 @@
swoosh.pdf