Merge pull request #50 from NavodPeiris/dev
add flexible dependency versioning
NavodPeiris authored Oct 9, 2024
2 parents 66dd0f6 + 7eb565d commit fcd9ccb
Showing 8 changed files with 42 additions and 22 deletions.
10 changes: 9 additions & 1 deletion .gitignore
@@ -2,4 +2,12 @@ venv
build
dist
speechlib.egg-info
.env
.env

*.swp
*.swo

# By default do not include these files for version control
# Override this by using 'git add -f'
*.wav
*.mp3
13 changes: 9 additions & 4 deletions README.md
@@ -69,7 +69,9 @@ Transcriptor method takes 7 arguments.

4. model size ("tiny", "small", "medium", "large", "large-v1", "large-v2", "large-v3")

5. ACCESS_TOKEN: huggingface access token (also get permission to access `pyannote/[email protected]`)
5. ACCESS_TOKEN: huggingface access token
1. Permission to access `pyannote/[email protected]` and `pyannote/segmentation`
2. Token requires permission for 'Read access to contents of all public gated repos you can access'

6. voices_folder (contains speaker voice samples for speaker recognition)

@@ -86,13 +88,16 @@ transcript will also indicate the timeframe in seconds where each speaker speaks
### Transcription example:

```
import os
from speechlib import Transcriptor
file = "obama_zach.wav" # your audio file
voices_folder = "" # voices folder containing voice samples for recognition
language = "en" # language code
log_folder = "logs" # log folder for storing transcripts
modelSize = "tiny" # size of model to be used [tiny, small, medium, large-v1, large-v2, large-v3]
quantization = False # setting this 'True' may speed up the process but lower the accuracy
ACCESS_TOKEN = "your hf key" # get permission to access pyannote/[email protected] on huggingface
ACCESS_TOKEN = "huggingface api key" # get permission to access pyannote/[email protected] on huggingface
# quantization only works on faster-whisper
transcriptor = Transcriptor(file, log_folder, language, modelSize, ACCESS_TOKEN, voices_folder, quantization)
@@ -110,7 +115,7 @@ res = transcriptor.custom_whisper("D:/whisper_tiny_model/tiny.pt")
res = transcriptor.huggingface_model("Jingmiao/whisper-small-chinese_base")
# use assembly ai model
res = transcriptor.assemby_ai_model("your api key")
res = transcriptor.assemby_ai_model("assemblyAI api key")
res --> [["start", "end", "text", "speaker"], ["start", "end", "text", "speaker"]...]
```
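The `res` format documented above lends itself to simple post-processing. A minimal sketch of rendering it as a readable transcript (the sample data here is hypothetical, not actual library output):

```python
# Hypothetical result in the [start, end, text, speaker] format shown above
res = [[0.0, 4.2, "hello everyone", "Obama"], [4.2, 6.0, "hi there", "Zach"]]

for start, end, text, speaker in res:
    # render one transcript line per segment, with its timeframe in seconds
    print(f"[{start:.1f}s - {end:.1f}s] {speaker}: {text}")
```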
@@ -211,4 +216,4 @@ This library uses following huggingface models:

#### https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb
#### https://huggingface.co/Ransaka/whisper-tiny-sinhala-20k-8k-steps-v2
#### https://huggingface.co/pyannote/speaker-diarization
#### https://huggingface.co/pyannote/speaker-diarization
3 changes: 2 additions & 1 deletion examples/.gitignore
@@ -9,4 +9,5 @@ greek_convo_short.mp3
greek_convo_short.wav
my_test.py
greek_convo.mp3
greek_convo.wav
greek_convo.wav
.env
5 changes: 3 additions & 2 deletions examples/transcribe.py
@@ -1,3 +1,4 @@
import os
from speechlib import Transcriptor

file = "obama_zach.wav" # your audio file
@@ -6,7 +7,7 @@
log_folder = "logs" # log folder for storing transcripts
modelSize = "tiny" # size of model to be used [tiny, small, medium, large-v1, large-v2, large-v3]
quantization = False # setting this 'True' may speed up the process but lower the accuracy
ACCESS_TOKEN = "your hf key" # get permission to access pyannote/[email protected] on huggingface
ACCESS_TOKEN = "huggingface api key" # get permission to access pyannote/[email protected] on huggingface

# quantization only works on faster-whisper
transcriptor = Transcriptor(file, log_folder, language, modelSize, ACCESS_TOKEN, voices_folder, quantization)
@@ -24,4 +25,4 @@
res = transcriptor.huggingface_model("Jingmiao/whisper-small-chinese_base")

# use assembly ai model
res = transcriptor.assemby_ai_model("your api key")
res = transcriptor.assemby_ai_model("assemblyAI api key")
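The `import os` added at the top of this example, together with the new `.env` entries in the ignore files, suggests the keys are meant to come from the environment rather than being hard-coded. One possible pattern — the variable names `HF_TOKEN` and `ASSEMBLYAI_API_KEY` are assumptions for illustration, not names the library defines:

```python
import os

# Assumed convention: export these in your shell, or keep them in an
# untracked .env file (this commit adds .env to .gitignore)
ACCESS_TOKEN = os.environ.get("HF_TOKEN", "")
ASSEMBLYAI_KEY = os.environ.get("ASSEMBLYAI_API_KEY", "")

# Warn early with a clear message instead of a confusing auth error later
if not ACCESS_TOKEN:
    print("warning: HF_TOKEN is not set; gated pyannote models will fail to load")
```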
9 changes: 7 additions & 2 deletions library.md
@@ -70,13 +70,16 @@ transcript will also indicate the timeframe in seconds where each speaker speaks
### Transcription example:

```
import os
from speechlib import Transcriptor
file = "obama_zach.wav" # your audio file
voices_folder = "" # voices folder containing voice samples for recognition
language = "en" # language code
log_folder = "logs" # log folder for storing transcripts
modelSize = "tiny" # size of model to be used [tiny, small, medium, large-v1, large-v2, large-v3]
quantization = False # setting this 'True' may speed up the process but lower the accuracy
ACCESS_TOKEN = "your hf key" # get permission to access pyannote/[email protected] on huggingface
ACCESS_TOKEN = "huggingface api key" # get permission to access pyannote/[email protected] on huggingface
# quantization only works on faster-whisper
transcriptor = Transcriptor(file, log_folder, language, modelSize, ACCESS_TOKEN, voices_folder, quantization)
@@ -94,7 +97,9 @@ res = transcriptor.custom_whisper("D:/whisper_tiny_model/tiny.pt")
res = transcriptor.huggingface_model("Jingmiao/whisper-small-chinese_base")
# use assembly ai model
res = transcriptor.assemby_ai_model("your api key")
res = transcriptor.assemby_ai_model("assemblyAI api key")
res --> [["start", "end", "text", "speaker"], ["start", "end", "text", "speaker"]...]
```

#### if you don't want speaker names: keep voices_folder as an empty string ""
18 changes: 9 additions & 9 deletions requirements.txt
@@ -1,9 +1,9 @@
transformers==4.36.2
torch==2.1.2
torchaudio==2.1.2
pydub==0.25.1
pyannote.audio==3.1.1
speechbrain==0.5.16
accelerate==0.26.1
faster-whisper==0.10.1
openai-whisper==20231117
transformers>=4.36.2, <5.0.0
torch>=2.1.2, <3.0.0
torchaudio>=2.1.2, <3.0.0
pydub>=0.25.1, <1.0.0
pyannote.audio>=3.1.1, <4.0.0
speechbrain>=0.5.16, <1.0.0
accelerate>=0.26.1, <1.0.0
faster-whisper>=0.10.1, <1.0.0
openai-whisper>=20231117, <20240927
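Each pin above follows the same `>=lowest-tested, <next-breaking` pattern. A rough sketch of the comparison semantics behind such a range, using plain tuple comparison (real resolvers like pip implement the full PEP 440 rules, which also cover pre-releases and local versions):

```python
def parse(version):
    # "2.1.2" -> (2, 1, 2); date-style versions like "20231117" become (20231117,)
    return tuple(int(part) for part in version.split("."))

def in_range(version, lower, upper):
    # True when lower <= version < upper, i.e. the ">=lower, <upper" pattern above
    return parse(lower) <= parse(version) < parse(upper)

print(in_range("2.2.0", "2.1.2", "3.0.0"))   # True: a compatible upgrade
print(in_range("3.0.0", "2.1.2", "3.0.0"))   # False: the next major is excluded
```

Capping below the next major (or, for `openai-whisper`, the next dated release) is what lets installs pick up bug-fix releases without silently absorbing breaking changes.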
4 changes: 2 additions & 2 deletions setup.py
@@ -5,7 +5,7 @@

setup(
name="speechlib",
version="1.1.9",
version="1.1.10",
description="speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names. This library also contain audio preprocessor functions.",
packages=find_packages(),
long_description=long_description,
@@ -19,7 +19,7 @@
"Programming Language :: Python :: 3.10",
"Operating System :: OS Independent",
],
install_requires=["transformers", "torch", "torchaudio", "pydub", "pyannote.audio", "speechbrain==0.5.16", "accelerate", "faster-whisper", "openai-whisper", "assemblyai"],
install_requires=["transformers>=4.36.2, <5.0.0", "torch>=2.1.2, <3.0.0", "torchaudio>=2.1.2, <3.0.0", "pydub>=0.25.1, <1.0.0", "pyannote.audio>=3.1.1, <4.0.0", "speechbrain>=0.5.16, <1.0.0", "accelerate>=0.26.1, <1.0.0", "faster-whisper>=0.10.1, <1.0.0", "openai-whisper>=20231117, <20240927", "assemblyai"],
python_requires=">=3.8",
)

2 changes: 1 addition & 1 deletion setup_instruction.md
@@ -9,7 +9,7 @@ for publishing:
pip install twine

for install locally for testing:
pip install dist/speechlib-1.1.9-py3-none-any.whl
pip install dist/speechlib-1.1.10-py3-none-any.whl

finally run:
twine upload dist/*