snakers4_silero-models_stt.md
layout: hub_detail
background-class: hub-background
body-class: hub
category: researchers
title: Silero Speech-To-Text Models
summary: A set of compact enterprise-grade pre-trained STT Models for multiple languages.
image: silero_logo.jpg
author: Silero AI Team
tags: [audio, scriptable]
github-link:
github-id: snakers4/silero-models
featured_image_1: silero_stt_model.jpg
featured_image_2: silero_imagenet_moment.png
accelerator: cuda-optional
demo-model-link:
# This assumes that a suitable version of PyTorch is already installed.
pip install -q torchaudio omegaconf soundfile
import torch
import zipfile
import torchaudio
from glob import glob

device = torch.device('cpu')  # GPU also works, but the models are fast enough on CPU

model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en',  # 'de' and 'es' are also available
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see the function signatures for details

# download a single file in any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
# named model_input to avoid shadowing the built-in input()
model_input = prepare_model_input(read_batch(batches[0]),
                                  device=device)

output = model(model_input)
for example in output:
    print(decoder(example.cpu()))
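
The batching step in the snippet above can be pictured as plain list chunking. The following is a minimal sketch, assuming a hypothetical `chunk_paths` helper written only for illustration; it is not the `split_into_batches` utility returned by `torch.hub.load`, whose exact behavior should be checked against its function signature. The file names are made up.

```python
from typing import List


def chunk_paths(paths: List[str], batch_size: int = 10) -> List[List[str]]:
    """Split a list of file paths into consecutive batches.

    Hypothetical stand-in for illustration; the last batch may be shorter.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


files = [f"clip_{i}.wav" for i in range(23)]  # hypothetical file names
batches = chunk_paths(files, batch_size=10)
print([len(b) for b in batches])  # [10, 10, 3]
```

With a single file, as in the example above, this yields one batch containing one path, which is why only `batches[0]` is passed on to the model.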

๋ชจ๋ธ ์„ค๋ช…

Silero Speech-To-Text ๋ชจ๋ธ์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์‚ฌ์šฉ๋˜๋Š” ์—ฌ๋Ÿฌ ์–ธ์–ด์— ๋Œ€ํ•ด ์†Œํ˜• ํผ ํŒฉํ„ฐ ํ˜•ํƒœ๋กœ ์—”ํ„ฐํ”„๋ผ์ด์ฆˆ๊ธ‰ STT๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ๊ธฐ์กด ASR ๋ชจ๋ธ๊ณผ ๋‹ฌ๋ฆฌ ๋‹ค์–‘ํ•œ ๋ฐฉ์–ธ, ์ฝ”๋ฑ, ๋„๋ฉ”์ธ, ๋…ธ์ด์ฆˆ, ๋‚ฎ์€ ์ƒ˜ํ”Œ๋ง ์†๋„์— ๊ฐ•์ธํ•ฉ๋‹ˆ๋‹ค(๋‹จ์ˆœํ™”๋ฅผ ์œ„ํ•ด ์˜ค๋””์˜ค๋Š” 16kHz๋กœ ๋‹ค์‹œ ์ƒ˜ํ”Œ๋งํ•ด์•ผ ํ•จ). ๋ชจ๋ธ์€ ์ƒ˜ํ”Œ ํ˜•ํƒœ์˜ ์ •๊ทœํ™”๋œ ์˜ค๋””์˜ค(์ฆ‰, [-1, 1] ๋ฒ”์œ„๋กœ์˜ ์ •๊ทœํ™”๋ฅผ ์ œ์™ธํ•œ ์–ด๋–ค ์ „์ฒ˜๋ฆฌ ์—†์ด)์™€ ํ† ํฐ ํ™•๋ฅ ์ด ์žˆ๋Š” ์ถœ๋ ฅ ํ”„๋ ˆ์ž„์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๋‹จ์ˆœํ™”๋ฅผ ์œ„ํ•ด ๋””์ฝ”๋” ๋„๊ตฌ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ ์ž์ฒด์— ํฌํ•จํ•  ์ˆ˜ ์žˆ์ง€๋งŒ ์ž๋ง‰์ด ๊ฒฐํ•ฉ๋œ ๋ชจ๋“ˆ์€, ํŠน์ •ํ•œ ๋‚ด๋ณด๋‚ด๊ธฐ ์ƒํ™ฉ์—์„œ ๋ ˆ์ด๋ธ”๊ฐ™์€ ๋ชจ๋ธ์˜ ์ƒ์„ฑ๋ฌผ์„ ์ €์žฅํ•  ๋•Œ ๋ฌธ์ œ๊ฐ€ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.
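
The input and output conventions described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Silero implementation: `pcm16_to_float` and `greedy_decode` are hypothetical helpers, the three-symbol label set is made up, and the decoding assumes a CTC-style output with a blank token, which this page does not state. The audio is assumed to be at 16 kHz already.

```python
from typing import List, Sequence


def pcm16_to_float(samples: Sequence[int]) -> List[float]:
    """Scale signed 16-bit PCM samples into the [-1, 1) range."""
    return [s / 32768.0 for s in samples]


def greedy_decode(frames: Sequence[Sequence[float]],
                  labels: Sequence[str],
                  blank: int = 0) -> str:
    """Greedy CTC-style decoding (an assumption, see the lead-in).

    Takes the most probable token per frame, merges consecutive
    repeats, and drops the blank token.
    """
    out = []
    prev = None
    for probs in frames:
        best = max(range(len(probs)), key=probs.__getitem__)
        if best != prev and best != blank:
            out.append(labels[best])
        prev = best
    return "".join(out)


# Hypothetical label set: index 0 is the blank token.
labels = ["_", "h", "i"]
# Hypothetical per-frame token probabilities (one row per output frame).
frames = [
    [0.1, 0.8, 0.1],    # "h"
    [0.2, 0.7, 0.1],    # "h" again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # "i"
]

print(pcm16_to_float([-32768, 0, 16384]))  # [-1.0, 0.0, 0.5]
print(greedy_decode(frames, labels))       # hi
```

In practice the `decoder` returned alongside the model should be used, since it carries the label set that matches each checkpoint.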

Speech์—์„œ Open-STT์™€ Silero Models์— ๋Œ€ํ•œ ๋…ธ๋ ฅ์ด ImageNet ๊ฐ™์€ ์ˆœ๊ฐ„์— ๋‹ค๊ฐ€๊ฐ€๊ธธ ๋ฐ”๋ž๋‹ˆ๋‹ค.

Supported Languages and Formats

The following languages are supported:

  • English
  • German
  • Spanish

To see the always up-to-date language list, please visit the repo and see the yml file for all available checkpoints.

Additional Examples and Benchmarks

For additional examples and other model formats, please visit this link. For quality and performance benchmarks, please see the wiki. These resources are updated from time to time.

References