Whisper-asr-colab is a package that combines speech-to-text and speaker diarization, with an example notebook for Google Colab.
The package provides the following main features:
- Speech-to-text (transcription), powered by faster-whisper
- Diarization, powered by pyannote-audio
- Online audio downloading, powered by yt-dlp
- Writing diarization results in docx format, powered by python-docx
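
To illustrate how transcription and diarization results can be combined, here is a minimal sketch that labels each transcript segment with the speaker whose turn overlaps it most. It is not taken from this package: the `Segment` and `Turn` classes, the `assign_speakers` function, and the `"UNKNOWN"` fallback label are all hypothetical names used for illustration, and the package's actual merging logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed piece of speech (hypothetical shape, times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: one speaker talking over a time span (hypothetical shape)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Return the length in seconds shared by two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Pair each segment with the speaker that overlaps it for the longest total time."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Fall back to a placeholder label when no turn overlaps the segment.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg.text))
    return labeled
```

Summing the overlap per speaker, rather than taking the first overlapping turn, keeps a segment that straddles a speaker change attached to whoever spoke for most of it.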
Open whisper_asr_colab.ipynb on Google Colab or use the modules as shown below.
from whisper_asr_colab.worker import Worker
from whisper_asr_colab.audio import Audio

audio = "audiofile.m4a"      # path to the audio to process
model_size = "turbo"         # Whisper model size
hf_token = "your hf token"   # Hugging Face access token

worker = Worker(
    audio=Audio.from_path_or_url(audio),
    model_size=model_size,
    hugging_face_token=hf_token,
)
worker.run()
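
Hard-coding a token in a script or notebook makes it easy to leak. As a hedged sketch, the token can be read from an environment variable instead. The helper name `read_hf_token` and the choice of `HF_TOKEN` as the variable name are assumptions for illustration and are not part of this package.

```python
import os


def read_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the token stored in `var_name`, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Hugging Face token."
        )
    return token
```

The returned value can then be passed as `hugging_face_token=read_hf_token()` when constructing the `Worker`.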