Extract audio from PowerPoint presentations and transcribe it to text
Requires Python 3. Install the requirements with:
pip install -r requirements.txt
- Copy a `.pptx` file into the `py_extractor` directory and run `python extractor.py`.
- The extracted audio files can be found in `results/<PPTX_FILE_NAME>`.
- After `extractor.py` finishes, the extracted audio files directory will contain `joinall.py`. Run `python joinall.py`.
- `joined.txt` then contains the transcribed audio, numbered with the corresponding slide numbers.
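For context on where the audio comes from: a `.pptx` file is a ZIP archive, embedded media lives under `ppt/media/`, and each slide's relationships file (`ppt/slides/_rels/slideN.xml.rels`) lists the media that slide references. The sketch below is not the code in `extractor.py`; it is a hypothetical helper (`audio_by_slide`) that uses only the standard library to report which slide files reference audio. The set of audio extensions is an assumption.

```python
import re
import zipfile
from pathlib import PurePosixPath
from xml.etree import ElementTree as ET

# Assumed audio extensions; extractor.py may handle a different set.
AUDIO_EXTS = {".m4a", ".mp3", ".wav", ".wma", ".aac"}
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
SLIDE_RELS = re.compile(r"^ppt/slides/_rels/slide(\d+)\.xml\.rels$")


def audio_by_slide(pptx):
    """Hypothetical helper: map slide file number -> audio parts it references.

    `pptx` may be a path or a binary file-like object.
    """
    result = {}
    with zipfile.ZipFile(pptx) as zf:
        for name in zf.namelist():
            match = SLIDE_RELS.match(name)
            if not match:
                continue
            root = ET.fromstring(zf.read(name))
            media = set()
            for rel in root.iter(RELS_NS + "Relationship"):
                # Linked (external) media is not stored inside the archive.
                if rel.get("TargetMode") == "External":
                    continue
                # Targets are relative to ppt/slides/, e.g. "../media/media1.m4a".
                target = PurePosixPath(rel.get("Target", ""))
                if target.suffix.lower() in AUDIO_EXTS:
                    media.add("ppt/media/" + target.name)
            if media:
                result[int(match.group(1))] = sorted(media)
    return dict(sorted(result.items()))
```

The `N` in `slideN.xml` is the slide's part name, which usually but not necessarily matches its position in the deck (the display order is stored in `ppt/presentation.xml`), so treat the result as a starting point when working out which slides lack audio.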
If the PowerPoint presentation doesn't have audio on every slide, you have to add the numbers of the slides without audio to the skip list in `joinall.py`. There is currently no automation for this.
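The skip list is needed because transcripts come out in order, one per audio file, so they have to be realigned with the slides they came from. The sketch below illustrates that idea under stated assumptions; it is not the actual `joinall.py`. The function name `number_transcripts` and the `"N. text"` output format are hypothetical.

```python
def number_transcripts(transcripts, skip):
    """Hypothetical sketch: label in-order transcripts with slide numbers.

    Slide numbers listed in `skip` (slides without audio) are passed over,
    so each transcript lands on the next slide that actually has audio.
    """
    skip = set(skip)
    slide = 0
    lines = []
    for text in transcripts:
        slide += 1
        while slide in skip:  # jump over slides that have no audio
            slide += 1
        lines.append(f"{slide}. {text}")
    return "\n".join(lines)
```

For example, with three transcripts and slides 2 and 3 in the skip list, `number_transcripts(["a", "b", "c"], [2, 3])` labels them as slides 1, 4 and 5. If the skip list is incomplete, every later transcript is attributed to the wrong slide, which is why it has to be filled in correctly.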