layout: hub_detail
background-class: hub-background
body-class: hub
title: Tacotron 2
summary: The Tacotron 2 model for generating mel spectrograms from text
category: researchers
image: nvidia_logo.png
author: NVIDIA
tags: audio
github-link:
github-id: NVIDIA/DeepLearningExamples
featured_image_1: tacotron2_diagram.png
featured_image_2: no-image
accelerator: cuda
order: 10
demo-model-link:

λͺ¨λΈ μ„€λͺ…

Together, the Tacotron 2 and WaveGlow models form a text-to-speech system that synthesizes natural-sounding speech from raw text without any additional prosody information. The Tacotron 2 model uses an encoder-decoder architecture to generate mel spectrograms from the input text. WaveGlow (also available via torch.hub) is a flow-based model that generates speech from those mel spectrograms.

The pretrained Tacotron 2 model is implemented differently from the one described in the paper. The model provided here uses Dropout instead of Zoneout to regularize the LSTM layers.
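To make the difference between the two regularizers concrete, here is a minimal pure-Python sketch on plain lists. It only illustrates the general idea of each technique and is not how this implementation applies Dropout inside the model; the function names `dropout` and `zoneout` and the toy hidden-state values are assumptions made for illustration.

```python
import random

def dropout(h_new, p, rng):
    """Inverted dropout: zero each unit with probability p, rescale the survivors."""
    keep = 1.0 - p
    return [(x / keep) if rng.random() < keep else 0.0 for x in h_new]

def zoneout(h_prev, h_new, p, rng):
    """Zoneout: with probability p a unit keeps its value from the previous step."""
    return [old if rng.random() < p else new for old, new in zip(h_prev, h_new)]

rng = random.Random(0)
h_prev = [0.5, -0.2, 0.1, 0.9]   # toy hidden state at step t-1 (illustrative values)
h_new = [0.4, 0.3, -0.6, 0.8]    # toy candidate hidden state at step t

print("dropout:", dropout(h_new, 0.5, rng))
print("zoneout:", zoneout(h_prev, h_new, 0.5, rng))
```

Dropout discards information by zeroing units, whereas Zoneout preserves it by carrying the previous value forward, so the two can behave differently on recurrent state.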

Example

In the example below:

  • The pretrained Tacotron2 and Waveglow models are loaded from torch.hub.
  • Given a tensor representation of the input text ("Hello world, I missed you so much"), Tacotron2 generates a mel spectrogram as shown in the figure.
  • Waveglow generates sound from the mel spectrogram.
  • The output sound is saved to the file 'audio.wav'.

To run the example, you need a few extra Python packages installed. They are needed for preprocessing the text and audio, as well as for display and input/output.

pip install numpy scipy librosa unidecode inflect
apt-get update
apt-get install -y libsndfile1
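Before loading the models, it can help to confirm that the packages above are actually importable in the current environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the package list simply mirrors the pip command above.

```python
import importlib.util

# Packages from the pip command above.
REQUIRED = ["numpy", "scipy", "librosa", "unidecode", "inflect"]

def missing_packages(names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only checks that each module can be found; it does not verify versions or the system library libsndfile1.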

Load the Tacotron2 model pretrained on the LJ Speech dataset and prepare it for inference:

import torch
tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')
tacotron2 = tacotron2.to('cuda')
tacotron2.eval()

Load the pretrained WaveGlow model:

waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp16')
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow = waveglow.to('cuda')
waveglow.eval()

λͺ¨λΈμ΄ λ‹€μŒκ³Ό 같이 λ§ν•˜κ²Œ ν•©μ‹œλ‹€.

text = "Hello world, I missed you so much."

Format the input using the utility methods:

utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
sequences, lengths = utils.prepare_input_sequence([text])
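For intuition about what a "tensor representation of the input text" means, the sketch below encodes characters as integer IDs and pads a batch to a common length, using plain Python lists. It is an illustration under stated assumptions, not the behavior of `prepare_input_sequence`: the symbol table `SYMBOLS`, the padding ID, and the helpers `encode` and `pad_batch` are all hypothetical, and the real utility uses its own text cleaning and symbol set.

```python
# Hypothetical symbol table for illustration only; index 0 is used as padding.
SYMBOLS = "_ abcdefghijklmnopqrstuvwxyz!',.?"
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}
PAD_ID = SYMBOL_TO_ID["_"]

def encode(text):
    """Lowercase the text and map each known character to an integer ID."""
    return [SYMBOL_TO_ID[c] for c in text.lower() if c in SYMBOL_TO_ID]

def pad_batch(texts):
    """Encode a batch and right-pad every sequence to the longest one."""
    encoded = [encode(t) for t in texts]
    lengths = [len(seq) for seq in encoded]
    max_len = max(lengths)
    padded = [seq + [PAD_ID] * (max_len - len(seq)) for seq in encoded]
    return padded, lengths

toy_sequences, toy_lengths = pad_batch(["Hello world, I missed you so much.", "Hi."])
print(toy_lengths)
print(toy_sequences[1])
```

Keeping the original lengths alongside the padded batch lets a model ignore the padding positions, which is why a utility of this kind typically returns both.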

μ—°κ²°λœ λͺ¨λΈμ„ μ‹€ν–‰ν•©λ‹ˆλ‹€.

with torch.no_grad():
    mel, _, _ = tacotron2.infer(sequences, lengths)
    audio = waveglow.infer(mel)
audio_numpy = audio[0].data.cpu().numpy()
rate = 22050

You can save the result to a file and listen to it:

from scipy.io.wavfile import write
write("audio.wav", rate, audio_numpy)
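If a 16-bit PCM file is preferred, or scipy is not available, the samples can also be written with Python's standard `wave` module. The following is a minimal sketch, assuming the audio is a sequence of floats in the range [-1.0, 1.0]; the helper `write_pcm16` is hypothetical, and a synthetic 440 Hz tone stands in for the model output so the block runs on its own.

```python
import math
import struct
import wave

def write_pcm16(path, rate, samples):
    """Write float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV file."""
    # Clip out-of-range values so the integer conversion cannot overflow.
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    frames = struct.pack("<%dh" % len(ints), *ints)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(frames)

# Stand-in signal: one second of a 440 Hz tone instead of model output.
sample_rate = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
write_pcm16("tone.wav", sample_rate, tone)
```

With the variables from the example above, the equivalent call would be `write_pcm16("audio.wav", rate, audio_numpy.tolist())`.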

λ˜λŠ” IPython이 μžˆλŠ” λ…ΈνŠΈλΆμ—μ„œ λ°”λ‘œ λ“€μ–΄λ³Ό 수 μžˆμŠ΅λ‹ˆλ‹€.

from IPython.display import Audio
Audio(audio_numpy, rate=rate)

Details

For more detailed information on model input and output, training recipes, inference, and performance, see GitHub and/or NGC.

References