
Log45/Notetaker


NoteTaker

This project is meant to eventually become a web app. The general idea: take an .mp3 (or other audio format) file of a lecture or meeting, generate a text transcript of that audio, and create notes for that meeting or lecture.

Status:

Both the OpenAI and open-source versions work well. The main goal now is to create a web interface for both iterations. After that, I can work on optimizing the format of the notes the model produces, and eventually add a chat feature.

OpenAI Iteration:

Getting Started

  • Ensure you are in the OpenAI-Notetaker directory
pip install -r requirements.txt
  • OpenAI-Notetaker/app.py takes the following command-line arguments:
    • api-key: A key generated by the OpenAI API; required to run the program. (Eventually this should support an environment variable, but it does not at the moment.)
    • audio-file: The audio file you want transcribed and notes taken on. Give either a path relative to the current working directory or an absolute path.
python app.py --api-key "YOUR_API_KEY" --audio-file "path/to/audio/file"
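The note above says the API key cannot yet come from an environment variable. A minimal sketch of how that fallback could work with argparse (this is illustrative, not the actual app.py code; the OPENAI_API_KEY variable name is an assumption):

```python
import argparse
import os

def parse_args(argv=None):
    """Parse CLI arguments, falling back to the OPENAI_API_KEY env var."""
    parser = argparse.ArgumentParser(
        description="Transcribe an audio file and generate notes")
    parser.add_argument(
        "--api-key",
        default=os.environ.get("OPENAI_API_KEY"),
        help="OpenAI API key (or set OPENAI_API_KEY in the environment)")
    parser.add_argument(
        "--audio-file", required=True,
        help="Relative or absolute path to the audio file")
    args = parser.parse_args(argv)
    if not args.api_key:
        parser.error("an API key is required via --api-key or OPENAI_API_KEY")
    return args
```

With this in place, an explicit --api-key still wins, and the environment variable is only a fallback.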

Open Source Iteration:

Getting Started

  • You must be on a Linux machine with a CUDA-enabled GPU
  • The CUDA toolkit must be installed
  • export CUDA_HOME=<path/to/cuda>
# There are redundancies between these commands and the requirements.txt but this just ensures proper setup
pip install packaging
pip install wheel
pip install ninja
pip install torch torchvision torchaudio
pip install setuptools
pip install flash-attn --no-build-isolation
pip install -r requirements.txt
  • OpenSource-Notetaker/app.py takes the following command-line arguments:
    • whisper: An optional string naming which Whisper model to load.
    • audio-file: The audio file you want transcribed and notes taken on. Give either a path relative to the current working directory or an absolute path.
    • model: The Hugging Face path of the LLM you want to use for note-taking. Default is microsoft/Phi-3-mini-128k-instruct.
  • Phi-3-mini-128k-instruct is used by default; other language models can be selected with the model argument.
  • Suggested hardware for the default implementation: an RTX 3080 Ti, or any NVIDIA GPU with >= 12 GB of VRAM
# From root directory (/Notetaker)
python OpenSource-Notetaker/app.py --audio-file "path/to/audio/file"
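The open-source pipeline invoked above can be sketched roughly as follows. This is an illustrative outline under stated assumptions (the openai-whisper and transformers packages, and made-up function names), not the actual app.py:

```python
def build_note_prompt(transcript: str) -> str:
    """Wrap a raw transcript in an instruction prompt for the notes model."""
    return (
        "Summarize the following lecture/meeting transcript into "
        "well-organized notes with headings and bullet points.\n\n"
        f"Transcript:\n{transcript}"
    )

def transcribe_and_summarize(audio_path,
                             whisper_size="base",
                             model_id="microsoft/Phi-3-mini-128k-instruct"):
    """Transcribe with open-source Whisper, then generate notes with an HF model."""
    import whisper                      # pip install openai-whisper
    from transformers import pipeline   # pip install transformers

    # Whisper returns a dict; "text" holds the full transcript.
    transcript = whisper.load_model(whisper_size).transcribe(audio_path)["text"]

    generator = pipeline("text-generation", model=model_id,
                         device_map="auto", trust_remote_code=True)
    out = generator(build_note_prompt(transcript), max_new_tokens=1024)
    return out[0]["generated_text"]
```

Separating the prompt construction from the model calls makes it easy to iterate on the note format later without touching the transcription step.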

Issues:

I need to figure out how to:

  1. Split a single audio file into workable batches
  2. Ensure those batches have a dimension of 1
  3. Figure out model training. (None of this is needed with Whisper.)
  4. It turns out Whisper is open source: https://github.com/openai/whisper
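For issue 1, a common approach is to cut the audio into fixed-length windows with a small overlap so words sliced at a boundary are not lost (open-source Whisper itself processes 30-second windows internally). A sketch of the boundary computation only; applying the slices to real audio (e.g. with pydub's AudioSegment) is omitted, and the function name is hypothetical:

```python
def chunk_bounds(total_ms, chunk_ms=30_000, overlap_ms=1_000):
    """Yield (start, end) millisecond windows covering the audio.

    Consecutive windows overlap by overlap_ms so a word cut at a
    boundary appears in both chunks and can be deduplicated later.
    """
    if chunk_ms <= overlap_ms:
        raise ValueError("chunk_ms must exceed overlap_ms")
    start = 0
    while start < total_ms:
        end = min(start + chunk_ms, total_ms)
        yield start, end
        if end == total_ms:
            break
        start = end - overlap_ms
```

Each (start, end) pair could then index into a loaded AudioSegment to produce one transcription batch.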

Todo:

  1. Implement open-source Whisper and then use Hugging Face models for the note-taking portion.
  2. Fix issues (see above)
  3. Create a pipeline for taking the transcription and creating (hopefully formatted) notes
  4. Turn it into a webapp (Flask)
  5. Make use of environment variables for encryption secret key, database stuff, etc.
  6. Format output files so they can be utilized by RAG later
  7. Add a chat interface (implement chat functionality with GPT-3.5 and later open source transformers)
  8. Implement RAG that accesses current user's transcriptions
  9. Add GraphRAG from Microsoft
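Todo item 5 (environment variables for the secret key and database settings) could start from a small config loader like this once the Flask app exists; the NOTETAKER_* variable names and defaults here are hypothetical:

```python
import os

def load_config(env=os.environ):
    """Pull secrets from the environment instead of hard-coding them.

    Fails fast if the secret key is missing; the database URL falls
    back to a local SQLite file for development.
    """
    try:
        secret = env["NOTETAKER_SECRET_KEY"]
    except KeyError:
        raise RuntimeError("NOTETAKER_SECRET_KEY must be set") from None
    return {
        "SECRET_KEY": secret,
        "DATABASE_URL": env.get("NOTETAKER_DATABASE_URL",
                                "sqlite:///notetaker.db"),
    }
```

A Flask app factory could then do app.config.update(load_config()), keeping secrets out of source control.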

Datasets to Train On (Out of Scope)

Instructions (Outdated)

  • Create a virtual environment with Python 3.8 and then enter the following commands:
brew install ffmpeg # or use apt for linux
pip3 install -r requirements.txt
