About • Installation • How To Use • Credits • License
## About

This repository contains an implementation of an intelligent voice assistant. The solution combines Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and LLM models.

See the LauzHack Workshop for a discussion of how to create intelligent voice assistants.
## Installation

To install the assistant, follow these steps:
- (Optional) Create and activate a new environment using `conda` or `venv` (+ `pyenv`).

  a. `conda` version:

  ```bash
  # create env
  conda create -n project_env python=PYTHON_VERSION

  # activate env
  conda activate project_env
  ```

  b. `venv` (+ `pyenv`) version:

  ```bash
  # create env
  ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env

  # alternatively, using default python version
  python3 -m venv project_env

  # activate env
  source project_env/bin/activate
  ```
- Install all required packages:

  ```bash
  pip install -r requirements.txt
  ```
- (Optional) Install `pre-commit`:

  ```bash
  pre-commit install
  ```
- Create an API key in Groq. Create a new file named `.env` in the root directory and copy-paste your API key into it (see the sketch below).
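The exact environment variable name the code reads is not stated here; `GROQ_API_KEY` is a common convention and an assumption in this sketch, so check the project's config or source for the actual name:

```bash
# .env — hypothetical variable name; verify against the project source
GROQ_API_KEY=your_groq_api_key_here
```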
## How To Use

To record and play sound, you need to define your hardware settings. See the PyTorch documentation (the information about `ffmpeg`, specifically) and this tutorial for more details. Usually, the format is `alsa` for Linux systems and `avfoundation` for macOS systems. One way to list the available devices is shown in the sketch below.
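As a minimal sketch, assuming `ffmpeg` and the standard ALSA utilities are installed, you can list the capture devices on your machine (the device names you see will differ):

```bash
# macOS: list avfoundation capture devices (standard ffmpeg flags)
ffmpeg -f avfoundation -list_devices true -i ""

# Linux: list ALSA capture devices
arecord -l
```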
When the hardware is known, you can start AI AudioBot using this command:

```bash
python3 run.py stream_reader.source=YOUR_MICROPHONE \
    stream_reader.format=YOUR_FORMAT \
    stream_writer.format=YOUR_FORMAT
```
You can also change other parameters via Hydra options. See `src/configs/audio_bot.yaml`.
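As a sketch of the standard Hydra command-line workflow (the override keys beyond those shown above live in `src/configs/audio_bot.yaml` and are not listed here):

```bash
# print the composed config without running the bot (built-in Hydra flag)
python3 run.py --cfg job

# any key from the printed config can be overridden the same way as above
python3 run.py stream_reader.source=YOUR_MICROPHONE stream_reader.format=alsa
```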
## Credits

HuggingFace was used for the ASR and TTS models (Spectrogram Generator and Vocoder). The Groq API with the `llama-3-8b-8192` model was used for the LLM. The KWS model is taken from the 2022 version of the HSE DLA Course.