DASB TTS Implementation #52

Draft: wants to merge 186 commits into base branch DASB
Commits (186)
0fafc1c
Tokotron LJSpeech: Update to work with the new tokenizer pipeline
flexthink Dec 30, 2024
e66a00e
Tokotron: Add Tokotron integration for LibriTTS (multi-speaker recipes)
flexthink Jan 12, 2025
54dab67
Tokotron: Fixes
flexthink Jan 13, 2025
2552b06
DASB: Tokotron: Cosmetic changes
flexthink Jan 14, 2025
f982325
DASB: More cosmetic changes from linters
flexthink Jan 15, 2025
cc9f6cc
Merge branch 'DASB' into DASB-tts
flexthink Jan 16, 2025
1357ff1
DASB: Tokotron: Relative paths
flexthink Jan 16, 2025
958ee87
DASB: Tokotron: Add choices for the model type
flexthink Jan 16, 2025
043eb9c
DASB: Tokotron: more clean-up
flexthink Jan 16, 2025
900481d
DASB: Tokotron: Updates for hyperparameter fitting
flexthink Jan 17, 2025
4dcd1d3
DASB: Batch size updates, device fixes
flexthink Jan 17, 2025
fc08f58
DASB: Tokotron: Fixes
flexthink Jan 17, 2025
fcb37c7
DASB: Ensure UTMOS is maximized rather than minimized!
flexthink Jan 17, 2025
9563cd5
DASB: Tokotron: Fixes
flexthink Jan 20, 2025
4442b44
DASB: Tokotron: Fixes
flexthink Jan 20, 2025
1e18ead
DASB: VALL-E: Initial import
flexthink Jan 20, 2025
3ce4d4d
DASB: Fixes
flexthink Jan 20, 2025
57d68cf
DASB: Fixes
flexthink Jan 20, 2025
c1c3b52
DASB: Add a "brokenness check" to ensure that tokens runs that produc…
flexthink Jan 21, 2025
123e124
DASB: Tokotron/VALL-E: Work in progress
flexthink Jan 21, 2025
0eb8d53
Merge branch 'DASB' into DASB-tts
flexthink Jan 21, 2025
b1ca7ad
DASB: Tokotron: Implement SQCodec, Mimi and WavTokenizer (single-spea…
flexthink Jan 21, 2025
2daeaa5
DASB: Cosmetic changes (pre-commit hooks)
flexthink Jan 21, 2025
99395f8
DASB: Update sample rates
flexthink Jan 22, 2025
0d70359
Merge branch 'DASB-tts-tmp-valle' into DASB-tts
flexthink Jan 22, 2025
3b6a99f
Merge branch 'DASB' into DASB-tts
flexthink Jan 23, 2025
3d3e04c
DASB: Tokotron: Add validation batch size customization (to avoid OOM)
flexthink Jan 23, 2025
16912d5
DASB: Tokotron: Minor fixes
flexthink Jan 23, 2025
ec47b0d
DASB: Fixes
flexthink Jan 23, 2025
5dba59d
DASB: Tokotron: Update priors
flexthink Jan 24, 2025
f7116a8
DASB: Fixes
flexthink Jan 24, 2025
199a37c
DASB: Tokotron: Fixes
flexthink Jan 27, 2025
dd7f3d3
DASB: Tokotron: Fix layer selection for Discrete SSL
flexthink Jan 27, 2025
46c8ba4
DASB: VALL-E: Add LibriTTS
flexthink Jan 28, 2025
ba6bddb
DASB: VALL-E: Fixes/Updates
flexthink Jan 28, 2025
d8a720c
DASB: VALL-E: Fixes
flexthink Jan 28, 2025
11c427b
DASB: VALL-E: Fixes
flexthink Jan 28, 2025
2d1a46a
DASB: Fix ST extraction
flexthink Jan 29, 2025
e53d7c6
DASB: Add support for using Orion Trial IDs instead of randomness
flexthink Jan 30, 2025
1d0aec0
DASB: Disable random directory name generation for the final test phase
flexthink Jan 30, 2025
e0bb265
DASB: Fixed the codebook count
flexthink Jan 31, 2025
5f5105f
DASB: Extraction fixes/updates
flexthink Jan 31, 2025
d02e870
DASB: Clean-up
flexthink Jan 31, 2025
0f2561d
DASB: Tokotron: Config updates
flexthink Jan 31, 2025
c9578e8
DASB: Cosmetic changes (pre-commit hooks)
flexthink Jan 31, 2025
7270d4e
DASB: Add the ability to turn off evaluation for debugging purposes.
flexthink Jan 31, 2025
2b22169
DASB: Add the ability to turn off evaluation
flexthink Jan 31, 2025
6eaa206
DASB: Tokotron: SQCodec update to use ternary coding
flexthink Feb 3, 2025
a99fddb
DASB: Device fix
flexthink Feb 3, 2025
650cf2e
DASB: Tokotron: Add the ability to add an "initialization model" when…
flexthink Feb 3, 2025
b43b565
DASB: A small fix for cases where strides are not compatible (not nece…
flexthink Feb 3, 2025
693d499
DASB: Extra logging
flexthink Feb 4, 2025
7b79ffc
DASB: Fix maximum validation set size
flexthink Feb 4, 2025
24bebfe
DASB: Add the ability to change the saved folder for Encodec
flexthink Feb 4, 2025
7ede118
DASB: Fixes
flexthink Feb 5, 2025
123248d
DASB: Tokotron: Fixes
flexthink Feb 5, 2025
0b11188
DASB: Tokotron: Fixes
flexthink Feb 5, 2025
4eaa7cd
DASB: Fixes
flexthink Feb 5, 2025
60e7d9e
DASB: Tokotron: Fixes
flexthink Feb 5, 2025
54df7ed
DASB: Tokotron LibriTTS: Fixes
flexthink Feb 5, 2025
3aa7de3
DASB: Fixes
flexthink Feb 6, 2025
10f8202
DASB: Fixes
flexthink Feb 6, 2025
2cd7c6a
DASB: Fixes
flexthink Feb 6, 2025
7e1bf0f
DASB: Fixes
flexthink Feb 6, 2025
2c72caf
DASB: Fixes
flexthink Feb 6, 2025
4b51644
VALL-E: Cosmetic changes, hparams updates
flexthink Feb 6, 2025
748cc86
DASB: Fixes
flexthink Feb 6, 2025
858b5d4
DASB: Fixes
flexthink Feb 6, 2025
7a5ea84
DASB: Fixes
flexthink Feb 6, 2025
30ee0c0
DASB: Fix prefix masking for VALL-E
flexthink Feb 6, 2025
3d89d2d
DASB: Update loss calculation to match ESPNet
flexthink Feb 6, 2025
779bf99
DASB: VALL-E: Fixes
flexthink Feb 6, 2025
92c40b6
VALL-E: Hyperparameter updates
flexthink Feb 6, 2025
5618797
DASB: Fix the sample rate
flexthink Feb 7, 2025
71cd316
DASB: Fixes
flexthink Feb 7, 2025
9e4c550
DASB: Encodec: Small fix
flexthink Feb 7, 2025
165eaac
DASB: Add Mimi, fix defaults for VALL-E Encodec
flexthink Feb 7, 2025
c1b30db
DASB: mimi fixes
flexthink Feb 7, 2025
c3b647e
DASB: add init_from
flexthink Feb 7, 2025
f27ebad
DASB: small updates
flexthink Feb 7, 2025
9840824
DASB: small updates
flexthink Feb 7, 2025
b4afc68
DASB: Add support for alignments
flexthink Feb 9, 2025
cbea7f7
DASB: Fixed
flexthink Feb 9, 2025
e48a91f
VALL-E: Fixes, add encodec
flexthink Feb 10, 2025
45d6130
DASB: Add encodec
flexthink Feb 10, 2025
e1635df
DASB: fixes
flexthink Feb 10, 2025
64b73e7
DASB: Fixes
flexthink Feb 10, 2025
79ca7a6
DASB: Vall-E: Multi-GPU inference fix
flexthink Feb 10, 2025
c6c6cf6
DASB: Fixes
flexthink Feb 10, 2025
e25d146
DASB: Fixes
flexthink Feb 10, 2025
45b3d1b
DASB: CPU/GPU fixes
flexthink Feb 10, 2025
370ab8e
DASB: Minor fixes
flexthink Feb 10, 2025
256fa35
DASB: Fixes
flexthink Feb 11, 2025
9f27332
DASB: Review debugging code
flexthink Feb 11, 2025
bad8999
VALL-E: Update token sequence initialization to account for special t…
flexthink Feb 11, 2025
39ddfd1
DASB: hparam file updates, new hparams for additional tokenizers
flexthink Feb 11, 2025
5acd1d3
VALL-E: Add files for multiple configurations
flexthink Feb 12, 2025
a78f011
DASB: Add Lifeteng-style curriculum, some config updates
flexthink Feb 13, 2025
953540b
DASB: Add init_from
flexthink Feb 13, 2025
f8b9a67
DASB: Add init_from
flexthink Feb 13, 2025
4f8cc9c
DASB: VALL-E: Implement checkpoint retention based on dWER
flexthink Feb 13, 2025
856df20
DASB: ESPNet Encodec support
flexthink Feb 13, 2025
be174df
DASB: Inference mode, remove an unused evaluator
flexthink Feb 14, 2025
750f3a4
DASB: Add customization for the validation batch size
flexthink Feb 14, 2025
55fc383
DASB: VALL-E: Add ESPNET Encodec
flexthink Feb 14, 2025
0730254
DASB: Add the ability to skip resampling
flexthink Feb 14, 2025
41afc01
DASB: Add the switch for LM head training
flexthink Feb 15, 2025
f529e62
DASB: Undo the gradient change - it did not help
flexthink Feb 15, 2025
554e52a
DASB: VALL-E: Add the ability to disable fixed batches, add the abili…
flexthink Feb 16, 2025
e2d7440
DASB: Fixes
flexthink Feb 16, 2025
d7fc323
DASB: Update wav2vec2
flexthink Feb 16, 2025
8a9e873
DASB: Add back LM head freezing (with a toggle)
flexthink Feb 17, 2025
a1f5e94
DASB: Fix for data parallel
flexthink Feb 17, 2025
e752146
DASB: Fix padding
flexthink Feb 17, 2025
c6d5883
DASB: VALL-E: Fix a crash
flexthink Feb 17, 2025
99588e3
DASB: VALL-E: Add LM head freezing
flexthink Feb 17, 2025
dad02cb
DASB: Vall-E: Fix data-parallel
flexthink Feb 18, 2025
63e9972
DASB: VALL-E: Update hyperparameters
flexthink Feb 18, 2025
bacc9f9
DASB: VALL-E: Add data scaling support
flexthink Feb 18, 2025
b1e270a
DASB: Tokotron: Add scaling + selection based on dWER (for comparison)
flexthink Feb 18, 2025
ef35a2f
DASB: Fixes
flexthink Feb 19, 2025
a6073f5
DASB: Add support for test set filtering
flexthink Feb 19, 2025
1be28c7
DASB: Add support for test set filtering
flexthink Feb 19, 2025
4e5f4eb
DASB: Add filtering (useful when some samples aren't present, e.g. wh…
flexthink Feb 19, 2025
8dadf96
DASB: Fixes
flexthink Feb 19, 2025
5272a73
DASB: Fixes
flexthink Feb 20, 2025
b0df9ac
DASB: VALL-E: Fixes for WavTokenizer (AR-only)
flexthink Feb 21, 2025
cf24b23
DASB: VALL-E: Update/add test stage logging
flexthink Feb 24, 2025
b6224d6
DASB: Fix extraction for clusters with no internet connection on comp…
flexthink Feb 24, 2025
d0900e0
DASB: VALL-E: Add layer selection, hpopt updates
flexthink Feb 24, 2025
c5a3f3a
DASB: Add support for eval_run flags
flexthink Feb 24, 2025
3ddbc57
DASB: VALL-E: Fixes
flexthink Feb 25, 2025
e1bfb7e
DASB: VALL-E: Fixes
flexthink Feb 25, 2025
851bd7d
DASB: VALL-E: Update max length
flexthink Feb 25, 2025
7463474
DASB: Fix WavTokenizer
flexthink Feb 25, 2025
05f8014
DASB: VALL-E: Add speaker prompt resampling
flexthink Feb 25, 2025
f94c61b
DASB: VALL-E: Add SQCodec
flexthink Feb 25, 2025
398304e
DASB: Tokotron: Update SQ-Codec ternary coding
flexthink Feb 25, 2025
c90037c
DASB: Add the ability to disable test runs
flexthink Feb 26, 2025
131eea3
DASB: Tokotron: Update ternary loss aggregation
flexthink Feb 27, 2025
7c5e82f
DASB: Fix an issue with contiguous tensors
flexthink Feb 27, 2025
7046db0
DASB: Tokotron: SQ-Codec Add the ability to bypass additional ternary…
flexthink Feb 28, 2025
ebe1811
DASB: Tokotron: Fixes
flexthink Mar 1, 2025
dae8bcb
DASB: Fixes: SQ-Codec refactoring (decouple from Tokotron, simplify)
flexthink Mar 1, 2025
9b09d20
DASB: VALL-E: Fixes
flexthink Mar 4, 2025
4c4663d
DASB: Update VALL-E for SQCodec
flexthink Mar 5, 2025
6af2d83
DASB: Fixes / clean-up
flexthink Mar 5, 2025
8c6a886
DASB: SQ-Codec: Make the special loss optional
flexthink Mar 5, 2025
583f42a
DASB: SQ Codec: Fixes
flexthink Mar 6, 2025
7a011eb
DASB: SQCodec: Fixes
flexthink Mar 6, 2025
24a4014
DASB: VALL-E: SQ-Codec updates
flexthink Mar 6, 2025
7e5d15d
DASB: SQCodec: Fixes
flexthink Mar 6, 2025
0f14a23
DASB: SQ-Codec: Fully implement ternary mode
flexthink Mar 6, 2025
10f8fdb
DASB: Fix SpeechTokenizer
flexthink Mar 6, 2025
c00962e
Fixes for SQCodec: Make offsets optional, align the shift with ternary
flexthink Mar 7, 2025
50ef659
DASB: SQ-Codec: Add chunking to avoid OOM
flexthink Mar 7, 2025
08b14ff
DASB: SQ-Codec: Update LibriTTS
flexthink Mar 8, 2025
d1ce08a
DASB: Add a multitrack ternary language model head (a separate proje…
flexthink Mar 8, 2025
15f096c
DASB: Vall-E: Multitrack fixes
flexthink Mar 8, 2025
981fe93
DASB: SQ-Codec: Fixes
flexthink Mar 9, 2025
de4aaaa
DASB: SQ-Codec: Remove the multi-track ternary head (it did not help)
flexthink Mar 10, 2025
e8af899
DASB: VALL-E Fix ternary loss masking
flexthink Mar 10, 2025
851eb84
DASB: SQCodec: Fixes
flexthink Mar 10, 2025
38ed432
DASB: Add the ability to filter priors
flexthink Mar 10, 2025
d5aea40
DASB: Removed debugging code
flexthink Mar 11, 2025
6cef549
DASB: VALL-E: SQ-Codec fixes
flexthink Mar 11, 2025
9fe48e4
DASB: SQ-Codec: Fix the sample rate
flexthink Mar 11, 2025
263f8b5
VALL-E: SQ-Codec: Add target dropout (optional, disabled by default)
flexthink Mar 11, 2025
fb2d573
DASB: SQ-Codec updates
flexthink Mar 12, 2025
51438b9
DASB: SQ-Codec: Add argmax mode
flexthink Mar 12, 2025
acbcfcf
DASB: SQ-Codec: Add argmax mode
flexthink Mar 12, 2025
b38c1cc
DASB: Fixes
flexthink Mar 12, 2025
44e93fd
SQCodec: Fixes
flexthink Mar 14, 2025
f875cd9
DASB: SQCodec: Fixes
flexthink Mar 14, 2025
69b346b
DASB: SQCodec: Update to predict everything autoregressively
flexthink Mar 15, 2025
f51b3a8
DASB: VALL-E: Fixes
flexthink Mar 15, 2025
1f05e76
DASB: SQCodec: Fixes, add LibriTTS
flexthink Mar 16, 2025
9011781
DASB: SQCodec updates
flexthink Mar 16, 2025
9a75652
DASB: VALL-E fixes
flexthink Mar 16, 2025
add349a
DASB: Fixes
flexthink Mar 17, 2025
331bad0
DASB: Train dataset data loader fix
flexthink Mar 18, 2025
3e60df4
DASB: VALL-E: SQ-Codec: Implemented token mode - with shifts.
flexthink Mar 20, 2025
797b430
DASB: SQ-Codec: Fixes
flexthink Mar 20, 2025
8a5137b
DASB: SQ-Codec: Add hybrid embeddings (learned for text, ternary for …
flexthink Mar 20, 2025
98651cb
DASB: Hybrid fixes
flexthink Mar 20, 2025
0b2877b
DASB: SQ-Codec: Fixed allocation
flexthink Mar 20, 2025
43 changes: 2 additions & 41 deletions benchmarks/DASB/LJSpeech/TTS/tokotron/evaluate.py
@@ -51,17 +51,7 @@ def __init__(self, hparams, create_waveform_fn, device):
         else:
             self.evaluators = {}
 
-        bulk_evaluators = getattr(self.hparams, "bulk_evaluators", {})
-        if bulk_evaluators:
-            self.bulk_evaluators = {
-                key: evaluator_f()
-                for key, evaluator_f in bulk_evaluators.items()
-                if key in self.enabled_evaluators
-            }
-        else:
-            self.bulk_evaluators = {}
-
-        if not self.evaluators and not self.bulk_evaluators:
+        if not self.evaluators:
             logger.warn(
                 "No evaluators were defined - this run will produce samples only"
             )
@@ -98,9 +88,7 @@ def on_evaluate_start(self, stage, epoch):
         self.create_reports()
         self.modules.model.show_inference_progress = False
         self.item_ids = []
-        details_keys = list(self.evaluators.keys()) + list(
-            self.bulk_evaluators.keys()
-        )
+        details_keys = list(self.evaluators.keys())
         self.details = {evaluator_key: [] for evaluator_key in details_keys}
         self.sample_text = []
         self.sample_file_names = []
@@ -141,7 +129,6 @@ def on_evaluate_end(self):
         dataset : speechbrain.dataio.dataset.DynamicItemDataset
             a dataset
         """
-        self.evaluate_bulk()
         self.write_summary()
         logger.info("Evaluation done")
 
@@ -182,19 +169,6 @@ def get_report_columns(self, evaluator_key):
                 wavs_ref=bogus_wavs,
                 length_ref=bogus_length,
             )
-        else:
-            bogus_file_name = self.output_folder / "bogus.wav"
-            evaluator = self.bulk_evaluators[evaluator_key]
-            sb.dataio.dataio.write_audio(
-                str(bogus_file_name),
-                bogus_wavs[0].cpu(),
-                samplerate=self.hparams.model_sample_rate,
-            )
-            result = evaluator.evaluate_files(
-                file_names=[bogus_file_name],
-                text=["BOGUS"],
-                file_names_ref=[bogus_file_name],
-            )
 
         return ["uttid"] + list(result.details.keys())
@@ -228,19 +202,6 @@ def evaluate_batch(self, batch):
             self.write_result(evaluator_key, batch.uttid, details)
             self.details[evaluator_key].extend(details)
 
-    def evaluate_bulk(self):
-        """Runs all configured bulk evaluators, which evaluate a directory
-        of files - rather than one file at a time"""
-        for evaluator_key, evaluator in self.bulk_evaluators.items():
-            result = evaluator.evaluate_files(
-                file_names=self.sample_file_names,
-                text=self.sample_text,
-                file_names_ref=self.ref_file_names,
-            )
-            self.details[evaluator_key].append(result.details)
-            details = undo_batch(result.details)
-            self.write_result(evaluator_key, self.item_ids, details)
-
     def write_result(self, evaluator_key, uttid, details):
         """Outputs the result details to the report for the specified evaluator
 
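With bulk evaluators removed, every metric now flows through the same per-batch path. Below is a minimal sketch of the surviving flow, assuming the evaluator interface implied by the calls visible in the diff (an evaluate method returning a result with a details field); the class name and method signatures are illustrative, not the PR's actual code:

# Sketch only: the single evaluation path left after this change.
# The evaluate() signature and result.details are assumptions based
# on the calls visible in the diff above.
import logging

logger = logging.getLogger(__name__)


class EvaluatorFlowSketch:
    def __init__(self, hparams, enabled_evaluators):
        self.enabled_evaluators = enabled_evaluators
        evaluators = getattr(hparams, "evaluators", {})
        # !name: entries in the YAML resolve to zero-argument factories
        self.evaluators = {
            key: evaluator_f()
            for key, evaluator_f in evaluators.items()
            if key in self.enabled_evaluators
        }
        if not self.evaluators:
            logger.warning(
                "No evaluators were defined - this run will produce samples only"
            )
        self.details = {key: [] for key in self.evaluators}

    def evaluate_batch(self, batch, wavs, length):
        # Score one batch of synthesized audio with every enabled evaluator
        # and accumulate per-utterance details for the summary report.
        for evaluator_key, evaluator in self.evaluators.items():
            result = evaluator.evaluate(wavs=wavs, length=length)
            self.details[evaluator_key].extend(result.details)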
76 changes: 41 additions & 35 deletions benchmarks/DASB/LJSpeech/TTS/tokotron/hparams/eval.yaml
@@ -1,50 +1,56 @@
 # ############################################################################
 # Evaluation Hyperparameters
 # Common to old models, appended to main hyperparameters
 #
 # Authors: Artem Ploujnikov
 # ############################################################################
 
 eval_enabled: True
 eval_sample_rate: 16000
 eval_samples: null
 eval_interval: 1
-eval_asr_type: whisper
-eval_asr_source: !apply:speechbrain.utils.hparams.choice
-    value: !ref <eval_asr_type>
-    choices:
-        encoder_decoder: speechbrain/asr-transformer-transformerlm-librispeech
-        whisper: openai/whisper-small
+eval_asr_source: openai/whisper-small
 evaluations: utmos,asr
-tmp_folder: null
-utmos_batch_size: 8
-utmos_model_path: ./utmos
-utmos_ckpt_name: epoch=3-step=7459.ckpt
-utmos_ckpt_path: !ref <utmos_model_path>/<utmos_ckpt_name>
-utmos_use_python: True
-utmos_script: predict.py
-
-
-eval_asr: !apply:speechbrain.utils.hparams.choice
-    value: !ref <eval_asr_type>
-    choices:
-        encoder_decoder: !name:eval.EncoderDecoderASRSpeechEvaluator
-            source: !ref <eval_asr_source>
-            sample_rate: !ref <eval_sample_rate>
-            overrides:
-                lm_weight: 0.0
-        whisper: !name:eval.WhisperASRSpeechEvaluator
-            source: !ref <eval_asr_source>
-            sample_rate: !ref <eval_sample_rate>
-            savedir: !ref <pretrained_model_save_folder>
+eval_utmos_source: chaanks/wav2vec2-small
+eval_utmos_save_path: !ref <pretrained_model_save_folder>/utmos
+eval_utmos_model_name: utmos.ckpt
+eval_utmos_model_url: https://huggingface.co/chaanks/UTMOS/resolve/main
+eval_utmos_domain_id: null
+eval_utmos_judge_id: null
+eval_perf: False
+
+
+eval_utmos: !name:eval.UTMOSSpeechEvaluator
+    source: !ref <eval_utmos_source>
+    save_path: !ref <eval_utmos_save_path>
+    model_name: !ref <eval_utmos_model_name>
+    model_url: !ref <eval_utmos_model_url>
+    domain_id: !ref <eval_utmos_domain_id>
+    judge_id: !ref <eval_utmos_judge_id>
+
+eval_asr: !name:eval.WhisperASRSpeechEvaluator
+    source: !ref <eval_asr_source>
+    sample_rate: !ref <eval_sample_rate>
+    savedir: !ref <pretrained_model_save_folder>
 
 evaluators:
+    utmos: !ref <eval_utmos>
     asr: !ref <eval_asr>
-
-bulk_evaluators:
-    utmos: !name:eval.UTMOSSpeechEvaluator
-        model_path: !ref <utmos_model_path>
-        output_folder: !ref <output_folder>
-        ckpt_path: !ref <utmos_ckpt_path>
-        batch_size: !ref <utmos_batch_size>
-        script: !ref <utmos_script>
-        use_python: !ref <utmos_use_python>
-        tmp_folder: !ref <tmp_folder>
+
+eval_summary:
+    asr:
+        descriptive: ["wer", "cer", "wer_ref", "cer_ref", "dwer", "dcer"]
+    utmos:
+        descriptive: ["utmos"]
+
+eval_summary_log:
+    utmos: utmos_utmos_mean
+    dwer: asr_dwer_median
+
+eval_threshold:
+    dwer_max: 90.0
+
+eval_threshold_set:
+    utmos: 0.0
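Since this file is appended to a recipe's main hyperparameters, !ref targets such as <pretrained_model_save_folder> normally come from the parent YAML. A hedged sketch of resolving it standalone with HyperPyYAML follows, assuming the recipe's eval.py module is importable and using an illustrative override value:

# Sketch only: resolve eval.yaml by supplying the keys it borrows from
# the main hyperparameters, then build the enabled evaluator instances.
from hyperpyyaml import load_hyperpyyaml

with open("hparams/eval.yaml") as f:
    hparams = load_hyperpyyaml(
        f, overrides={"pretrained_model_save_folder": "./save/pretrained"}
    )

# "evaluations: utmos,asr" selects which evaluators are instantiated.
enabled = set(hparams["evaluations"].split(","))
evaluators = {
    key: make_evaluator()  # !name: entries resolve to callables
    for key, make_evaluator in hparams["evaluators"].items()
    if key in enabled
}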
79 changes: 41 additions & 38 deletions benchmarks/DASB/LJSpeech/TTS/tokotron/hparams/train_dac.yaml
@@ -8,18 +8,23 @@ experiment_name: tokotron/dac
 # Seed needs to be set at top of yaml, before objects with parameters are made
 seed: 74443
 __set_seed: !apply:torch.manual_seed [!ref <seed>]
+run_name: !PLACEHOLDER
 output_folder: !ref results/<experiment_name>/<seed>
 save_folder: !ref <output_folder>/save
 train_log: !ref <output_folder>/train_log.txt
+testing: True # If set to True, the test evaluation is done, otherwise skipped.
 
 
 token_model_src: "facebook/encodec_24khz"
 g2p_src: flexthink/soundchoice-g2p
-vocoder_type: encodec
-vocoder_src: "charactr/vocos-encodec-24khz"
+
+# Model type
+representation_mode: discrete
+
 # Data files
-data_folder: !PLACEHOLDER # e.g., /path/to/LibriSpeech
-prepare_save_folder: !ref <data_folder>/prepared/dac
+data_folder: !PLACEHOLDER
+cached_data_folder: !PLACEHOLDER
+prepare_save_folder: !ref <cached_data_folder>
 pretrained_model_save_folder: !ref <prepare_save_folder>
 prepare_archive_path: null
 prepare_skip_ignore_folders: False
@@ -29,16 +34,27 @@ test_json: !ref <prepare_save_folder>/test.json
 frozen_split_path: null
 sample_path: null
 progress_folder: !ref <output_folder>/progress
-progress_archive: !ref <progress_folder>/progress.tar
 progress_current: !ref <progress_folder>/current
 progress_meta: !ref <progress_folder>/meta.yaml
 num_audio_samples: 32
 samples_interval: 5
 
-splits: ["train", "valid", "test"]
-split_ratio: [90, 5, 5]
+
+tokens_folder: !PLACEHOLDER # Path to the folder where extracted tokens are saved.
+
+tokens_loader: !new:utils.tokens.TokensLoader
+    data_path: !ref <tokens_folder>
+
 token_model_kwargs:
     n_quantizers: !ref <audio_tokens_per_step>
 
+splits: ["train", "valid", "test"]
+split_ratio: [90, 5, 5]
+ckpt_key: dwer
+ckpt_key_kind: min
+ckpt_keep: 2
+test_key: null
+test_key_kind: min
 ckpt_interval_minutes: 30 # save checkpoint every N min
 
 # Training parameters
@@ -61,7 +77,7 @@ bos_index: 0
 bos_width: 1
 
 # stages related parameters
-lr: 0.001
+lr: 0.001 # @orion_step1: --lr~"loguniform(0.00001,0.005)"
 lr_warmup_steps: 10000
 lr_annealing_mode: step
 guided_attention_weight: 50.0
@@ -85,33 +101,22 @@ model_bitrate: 8kbps
 
 # Label encoder
 label_encoder: !new:speechbrain.dataio.encoder.TextEncoder
-token_list_file_text: ./hparams/char_en.txt
-token_list_file_phn: ./hparams/arpabet.txt
+token_list_file_text: char_en.txt
+token_list_file_phn: arpabet.txt
 token_list_file: !apply:speechbrain.utils.hparams.choice
     value: !ref <input>
     choices:
         text: !ref <token_list_file_text>
        phonemes: !ref <token_list_file_phn>
 
 # Gate offset
-gate_offset: !apply:Tokotron.distance_diff_loss_ramp
+gate_offset: !apply:model.Tokotron.distance_diff_loss_ramp
     beta: !ref <gate_loss_beta>
     gamma: !ref <gate_loss_gamma>
     max_weight: !ref <gate_loss_max_weight>
 
 silence_padding: !ref <gate_offset>
 
-# Token model (pretrained)
-dac: !new:speechbrain.lobes.models.discrete.dac.DAC
-    sample_rate: !ref <model_sample_rate>
-    model_type: !ref <model_type>
-    model_bitrate: !ref <model_bitrate>
-    load_pretrained: True
-
-# Token model (pretrained)
-token_model: !new:Tokotron.DACFeatureExtractor
-    dac: !ref <dac>
-    n_quantizers: !ref <audio_tokens_per_step>
 
 # Dataloader options
 train_dataloader_opts:
@@ -143,20 +148,13 @@ sample_dataloader_opts:
     padding_kwargs:
         value: !ref <pad_index>
 
-extract_features_opts:
-    dataloader_opts:
-        batch_size: !ref <batch_size>
-    token_model: !ref <token_model>
-    sample_rate: !ref <sample_rate>
-    model_sample_rate: !ref <model_sample_rate>
-
 
 ####################### Model parameters ###########################
 # Transformer
 d_model: 512
 nhead: 4
-enc_num_layers: 6
-dec_num_layers: 12
+enc_num_layers: 6 # @orion_step1: --enc_num_layers~"choices([3, 6, 12])"
+dec_num_layers: 12 # @orion_step1: --dec_num_layers~"choices([3, 6, 12])"
 d_ffn: 2048
 transformer_dropout: 0.2
 target_dropout: 0.2
@@ -165,6 +163,7 @@ audio_num_tokens: 1024
 audio_emb_size: 1024
 audio_emb_freeze: False
 audio_emb_pretrained: False
+audio_token_offsets: False
 text_num_tokens: 39
 phn_num_tokens: 52
 input_num_tokens: !apply:speechbrain.utils.hparams.choice
@@ -178,7 +177,7 @@ attention_type: regularMHA
 
 ############################## models ################################
 
-model: !new:Tokotron.TokotronTransformerModel # yamllint disable-line rule:line-length
+model: !new:model.Tokotron.TokotronTransformerModel # yamllint disable-line rule:line-length
     input_num_tokens: !ref <input_num_tokens>
     audio_num_tokens: !ref <audio_num_tokens>
     audio_tokens_per_step: !ref <audio_tokens_per_step>
@@ -198,15 +197,23 @@ model: !new:Tokotron.TokotronTransformerModel # yamllint disable-line rule:line-length
     max_audio_length: !ref <max_audio_length>
     infer_max_audio_length: !ref <infer_max_audio_length>
 
+tokenizer: !new:utils.tokenizer_interface.DACTokenizer
+    model_type: !ref <model_type>
+    model_bitrate: !ref <model_bitrate>
+    n_codebooks: !ref <audio_tokens_per_step>
+    load_pretrained: True
+    tag: latest
+
+
 modules:
     model: !ref <model>
-    dac: !ref <dac>
+    tokenizer: !ref <tokenizer>
 
 # define two optimizers here for two-stage training
 opt_class: !name:torch.optim.Adam
     lr: !ref <lr>
 
-compute_cost: !new:Tokotron.TokotronLoss
+compute_cost: !new:model.Tokotron.TokotronLoss
     guided_attention_weight: !ref <guided_attention_weight>
     guided_attention_sigma: !ref <guided_attention_sigma>
     gate_weight: !ref <gate_loss_weight>
@@ -226,10 +233,6 @@ checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
     lr_scheduler: !ref <lr_annealing>
    counter: !ref <epoch_counter>
-
-freezer: !new:preparation.Freezer
-    save_path: !ref <prepare_save_folder>
-    archive_path: !ref <prepare_archive_path>
 
 epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
     limit: !ref <number_of_epochs>
 
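Several values in this file now carry # @orion_step1: comments (lr, enc_num_layers, dec_num_layers). Each comment holds an Orion search-space prior for the first stage of hyperparameter optimization, meant to be appended to the training command line. A minimal sketch of scraping these annotations follows; the regex and the way DASB's hpopt scripts actually consume them are assumptions, while the priors themselves are quoted from the file:

# Sketch only: collect "@orion_step1" priors from a hyperparameter file.
import re

STEP1_PATTERN = re.compile(r"#\s*@orion_step1:\s*(.+)$")


def collect_step1_priors(yaml_path):
    """Return entries such as '--lr~"loguniform(0.00001,0.005)"'."""
    priors = []
    with open(yaml_path) as f:
        for line in f:
            match = STEP1_PATTERN.search(line)
            if match:
                priors.append(match.group(1).strip())
    return priors


print(collect_step1_priors("hparams/train_dac.yaml"))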