This project is meant to eventually be a webapp, but the general idea is to take an .mp3 (or other audio format) file of a lecture or a meeting, generate a text transcript of that audio file, and create notes for that meeting or lecture.
Both the OpenAI and open-source versions work well. My main goal now is to build a web interface for both iterations. After that, I can work on improving the format of the notes the model produces, and eventually add a chat feature.
- Ensure you are in the `OpenAI-Notetaker` directory, then install dependencies:

```bash
pip install -r requirements.txt
```
`OpenAI-Notetaker/app.py` takes the following command-line arguments:

- `--api-key`: A key generated for the OpenAI API; required to run the program. (Eventually this should support an environment variable, but it does not at the moment; see the sketch below.)
- `--audio-file`: The audio file you want transcribed and turned into notes. Give either a path relative to the current working directory or an absolute path.
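Since environment-variable support is noted as future work, here is a minimal sketch of how the key lookup could fall back to an `OPENAI_API_KEY` variable; the fallback behavior is an assumption, not what the app currently does:

```python
# Sketch: fall back to the OPENAI_API_KEY environment variable when
# --api-key is omitted. This is proposed behavior, not current behavior.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--api-key", default=os.environ.get("OPENAI_API_KEY"))
parser.add_argument("--audio-file", required=True)
args = parser.parse_args()

if args.api_key is None:
    parser.error("provide --api-key or set OPENAI_API_KEY")
```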
Usage:

```bash
python app.py --api-key "YOUR_API_KEY" --audio-file "path/to/audio/file"
```
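Under the hood, the flow is transcription followed by note generation. A minimal sketch of that two-step pipeline with the `openai` Python package (v1+ client); the prompt wording is illustrative, not the app's actual prompt:

```python
# Sketch: transcribe with the Whisper API, then ask a chat model to turn
# the transcript into notes. The prompt text here is illustrative only.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

with open("path/to/audio/file", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

notes = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Turn this transcript into organized meeting notes."},
        {"role": "user", "content": transcript.text},
    ],
)
print(notes.choices[0].message.content)
```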
- You must be on a Linux machine with a CUDA-enabled GPU
- The CUDA toolkit must be installed
```bash
export CUDA_HOME="<path/to/cuda>"

# There are redundancies between these commands and requirements.txt,
# but this just ensures a proper setup
pip install packaging
pip install wheel
pip install ninja
pip install torch torchvision torchaudio
pip install setuptools
pip install flash-attn --no-build-isolation
pip install -r requirements.txt
```
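After installation, a quick sanity check can confirm the setup (a minimal sketch; it only verifies that PyTorch sees the GPU and that `flash-attn` imports cleanly):

```python
# Sanity check: confirm PyTorch detects a CUDA GPU and flash-attn built correctly.
import torch
import flash_attn

assert torch.cuda.is_available(), "No CUDA-enabled GPU detected"
print(torch.cuda.get_device_name(0))
print("flash-attn version:", flash_attn.__version__)
```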
`OpenSource-Notetaker/app.py` takes the following command-line arguments:

- `--whisper`: An optional string specifying which Whisper model to load.
- `--audio-file`: The audio file you want transcribed and turned into notes. Give either a path relative to the current working directory or an absolute path.
- `--model`: The Hugging Face path to the LLM you want to use for note-taking. Default is `microsoft/Phi-3-mini-128k-instruct`.
- There is currently no option for customizing the language model to be used; this could be an addition in a later iteration, but `Phi3-mini-128k` is used by default.
- Suggested requirements for the default implementation: an RTX 3080 Ti, or any NVIDIA GPU with >= 12 GB of VRAM.
```bash
# From the root directory (/Notetaker)
python OpenSource-Notetaker/app.py --audio-file "path/to/audio/file"
```
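For reference, a minimal sketch of what this flow looks like, assuming the `openai-whisper` and `transformers` packages (plus `accelerate` for `device_map="auto"`); the prompt and generation settings are illustrative, not the app's actual code:

```python
# Sketch: transcribe locally with open-source Whisper, then generate notes
# with the default Phi-3 model through a transformers text-generation pipeline.
import whisper
from transformers import pipeline

asr = whisper.load_model("base")  # "base" is an example; --whisper selects the size
transcript = asr.transcribe("path/to/audio/file")["text"]

notetaker = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-128k-instruct",
    device_map="auto",
    trust_remote_code=True,
)
prompt = f"Turn this transcript into organized meeting notes:\n{transcript}"
print(notetaker(prompt, max_new_tokens=512)[0]["generated_text"])
```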
I need to figure out how to:

- Split a single audio file into workable batches (see the chunking sketch below)
- Ensure those batches have a dimension of 1
- Figure out model training (none of this is needed with Whisper)

Turns out Whisper is open source: https://github.com/openai/whisper

- Implement open-source Whisper and then use Hugging Face models for the note-taking portion
- Fix issues (see above)
- Create a pipeline for taking the transcription and creating (hopefully formatted) notes
- Turn it into a webapp (Flask)
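For the chunking to-do above (not needed with Whisper, per the note, but kept for reference), a sketch of splitting an audio file into fixed-length pieces with `pydub`; the 60-second length and file naming are arbitrary choices:

```python
# Sketch: split an audio file into 60-second chunks using pydub.
from pydub import AudioSegment

audio = AudioSegment.from_file("path/to/audio/file")
chunk_ms = 60 * 1000  # chunk length in milliseconds; arbitrary choice

for i, start in enumerate(range(0, len(audio), chunk_ms)):
    chunk = audio[start:start + chunk_ms]  # pydub slices by milliseconds
    chunk.export(f"chunk_{i}.mp3", format="mp3")
```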
- Following this tutorial to add auth: https://www.digitalocean.com/community/tutorials/how-to-add-authentication-to-your-app-with-flask-login#step-7-setting-up-the-authorization-function
- Need to create something that encrypts API keys in the database and decrypts them upon login (see the first sketch after this list)
- Need an upload page (see the second sketch after this list)
- Make use of environment variables for encryption secret key, database stuff, etc.
- Need to format output files so they can be utilized by RAG later
- Add a chat interface (implement chat functionality with GPT-3.5 and later open source transformers)
- Implement RAG that accesses current user's transcriptions
- Add GraphRAG from Microsoft
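For the key-encryption and environment-variable items above, a minimal sketch using the `cryptography` package's Fernet; the `NOTETAKER_SECRET_KEY` variable name is hypothetical:

```python
# Sketch: symmetric encryption of stored API keys with Fernet.
# NOTETAKER_SECRET_KEY is a hypothetical env var holding a key
# generated once with Fernet.generate_key().
import os
from cryptography.fernet import Fernet

fernet = Fernet(os.environ["NOTETAKER_SECRET_KEY"])

encrypted = fernet.encrypt(b"sk-the-users-api-key")  # store this in the database
decrypted = fernet.decrypt(encrypted).decode()       # recover it upon login
```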
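And for the upload page, a minimal Flask sketch; the route, form field name, and `uploads/` folder are all assumptions:

```python
# Sketch: a bare-bones Flask upload endpoint for audio files.
import os
from flask import Flask, request, redirect
from werkzeug.utils import secure_filename

app = Flask(__name__)
UPLOAD_DIR = "uploads"  # hypothetical destination folder
os.makedirs(UPLOAD_DIR, exist_ok=True)

@app.route("/upload", methods=["POST"])
def upload():
    f = request.files["audio-file"]  # assumed form field name
    f.save(os.path.join(UPLOAD_DIR, secure_filename(f.filename)))
    return redirect("/")
```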
- TED-LIUM (https://www.openslr.org/51)
- LibriSpeech ASR (https://openslr.org/12)
- Audio-MNIST (https://github.com/soerenab/AudioMNIST)
- Create a virtual environment with Python 3.8 and then enter the following commands:

```bash
brew install ffmpeg  # or use apt on Linux
pip3 install -r requirements.txt
```