
Adding multilingual support via fine-tuning code #23

Open
loretoparisi opened this issue Oct 22, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@loretoparisi

While both the tiny and base models achieve impressive WER 💯 on academic datasets compared to the de facto standard (e.g. Whisper), in real-world scenarios a multilingual model would address many more use cases.
Releasing fine-tuning code would let the community add that support.
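
For anyone who wants to experiment in the meantime, here is a minimal sketch of what a single fine-tuning step could look like. Since this project has not published a training API, it uses `openai/whisper-tiny` from Hugging Face transformers as a stand-in encoder-decoder ASR model; the audio, target transcript, and hyperparameters are all illustrative, not the maintainers' method.

```python
# Minimal sketch of one fine-tuning step for a Whisper-style encoder-decoder
# ASR model, using openai/whisper-tiny from Hugging Face transformers as a
# stand-in; this project's own training API is not public, so everything
# below is illustrative.
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Dummy example: one second of silence at 16 kHz paired with a target
# transcription in the new language. A real run would iterate in batches
# over a labeled multilingual corpus (e.g. Common Voice).
audio = [0.0] * 16000
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer("hola mundo", return_tensors="pt").input_ids

# The model computes the cross-entropy loss internally when labels are passed.
loss = model(input_features=inputs.input_features, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"loss: {loss.item():.4f}")
```

A real training script would wrap this in a DataLoader, add a learning-rate schedule, and track validation WER per language.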

@evmaki
Contributor

evmaki commented Oct 22, 2024

Yes, I think releasing some fine-tuning code would be beneficial. I can add it to our TODO list.

@bil-ash

bil-ash commented Oct 23, 2024

@evmaki Please provide the fine-tuning code; it would be beneficial for extending the model to other languages.

evmaki changed the title from "Multilingual support" to "Multilingual support via fine-tuning code" Oct 23, 2024
evmaki added the enhancement label Oct 23, 2024
evmaki changed the title from "Multilingual support via fine-tuning code" to "Adding multilingual support via fine-tuning code" Oct 23, 2024
evmaki mentioned this issue Oct 25, 2024
@zalastone

Korean and Mandarin support would boost its popularity in ASR use cases. It's great work, guys!

@pprobst

pprobst commented Oct 28, 2024

+1

@wredan

wredan commented Oct 30, 2024

Results seem promising. Hoping for multilingual support or some fine-tuning code soon. Thank you for your work :)

@bil-ash

bil-ash commented Dec 4, 2024

Any updates on this?

@loretoparisi
Author

The reason why this (multilingual support) is extremely important can be seen from this simple chart of Whisper fine-tuning on the downstream tasks of transcription (WER) and translation (BLEU), and how each correlates with the amount of audio (in hours) transcribed or translated, respectively. For example, Spanish (ES) exhibits the best WER (2.5) for speech recognition but the lowest BLEU score (24) for translation; German (DE) has balanced performance (WER: 4, BLEU: 35); and, vice versa, Portuguese (PT) shows the highest BLEU score (39) for translation but a comparatively higher WER (4) for speech recognition.
[Chart: per-language WER (transcription) and BLEU (translation) for fine-tuned Whisper. Approximate data derived from "Robust Speech Recognition via Large-Scale Weak Supervision" (the Whisper paper).]
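
For reference, the two metrics in the chart above are straightforward to compute with the standard open-source implementations; a tiny illustration, assuming the jiwer (WER) and sacrebleu (BLEU) libraries, with made-up strings not taken from the chart:

```python
# Quick illustration of the two metrics discussed above, computed with the
# jiwer (WER) and sacrebleu (BLEU) libraries; the strings are invented.
import jiwer
import sacrebleu

# WER: word-level edit distance between reference and hypothesis transcripts.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
wer = jiwer.wer(reference, hypothesis)  # 2 substitutions / 9 reference words
print(f"WER: {wer:.3f}")

# BLEU: n-gram overlap between a translation hypothesis and its reference(s).
hyps = ["the cat sits on the mat"]
refs = [["the cat is sitting on the mat"]]
bleu = sacrebleu.corpus_bleu(hyps, refs)
print(f"BLEU: {bleu.score:.1f}")
```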
