JVM library for speech generation written in Kotlin and based on the C++ libraries bark.cpp and piper
- Stretch to change the speed of audio without changing the pitch
- Voice Activity Detection to extract speech from audio
- Speech recognition to transcribe audio to text
- Text generation to generate text from a prompt
- Noise reduction to remove noise from audio

- Generates PCM speech audio data from a string
- Supports any sampling rate and number of channels due to resampling and downmixing
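
The README does not say how the library resamples or downmixes, so the following is only a minimal sketch of what those two operations can look like on raw PCM samples. The function names `downmixToMono` and `resampleLinear` are hypothetical and are not part of the library; linear interpolation is chosen for brevity, not because the library is known to use it.

```kotlin
// Illustrative only: NOT the library's implementation.

/** Averages interleaved multi-channel samples into a single mono channel. */
fun downmixToMono(interleaved: FloatArray, channels: Int): FloatArray {
    require(channels > 0 && interleaved.size % channels == 0) { "Invalid channel layout" }
    val frames = interleaved.size / channels
    return FloatArray(frames) { frame ->
        var sum = 0f
        for (channel in 0 until channels) sum += interleaved[frame * channels + channel]
        sum / channels
    }
}

/** Resamples mono audio by linear interpolation between neighbouring samples. */
fun resampleLinear(input: FloatArray, fromRate: Int, toRate: Int): FloatArray {
    require(fromRate > 0 && toRate > 0) { "Sample rates must be positive" }
    if (input.isEmpty() || fromRate == toRate) return input.copyOf()
    val outputSize = (input.size.toLong() * toRate / fromRate).toInt()
    val step = fromRate.toDouble() / toRate
    return FloatArray(outputSize) { i ->
        val position = i * step
        val left = position.toInt().coerceAtMost(input.size - 1)
        val right = (left + 1).coerceAtMost(input.size - 1)
        val fraction = (position - left).toFloat()
        input[left] + (input[right] - input[left]) * fraction
    }
}

fun main() {
    // Two stereo frames: (L=1.0, R=0.0) and (L=0.5, R=0.5)
    val mono = downmixToMono(floatArrayOf(1f, 0f, 0.5f, 0.5f), channels = 2)
    println(mono.joinToString())

    // Double the sample rate of a two-sample ramp
    val upsampled = resampleLinear(floatArrayOf(0f, 1f), fromRate = 1, toRate = 2)
    println(upsampled.joinToString())
}
```

Production resamplers usually apply a low-pass filter when reducing the rate to avoid aliasing; this sketch skips that step.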
- Download the latest release
- Add the library dependency
    dependencies {
        implementation(files("/path/to/jar"))
    }
- Unzip the binaries
- Add dependencies
    dependencies {
        implementation("com.microsoft.onnxruntime:onnxruntime:1.20.0")
        implementation("com.google.code.gson:gson:2.11.0")
    }
- Download one of the voices here or use any other compatible voice
See the example module for implementation details
- Call `generate` to process the input string and get the generated speech
- Load binaries
  - Bark
    - CPU
        SpeechGeneration.Bark.loadCPU(
            ggmlBase = "/path/to/ggml-base",
            ggmlCpu = "/path/to/ggml-cpu",
            ggml = "/path/to/ggml",
            speechGenerationBark = "/path/to/speech-generation-bark",
        )
    - CUDA
        SpeechGeneration.Bark.loadCUDA(
            ggmlBase = "/path/to/ggml-base",
            ggmlCpu = "/path/to/ggml-cpu",
            ggmlCuda = "/path/to/ggml-cuda",
            ggml = "/path/to/ggml",
            speechGenerationBark = "/path/to/speech-generation-bark",
        )
  - Piper
      SpeechGeneration.Piper.load(
          espeak = "/path/to/espeak-ng",
          speechGenerationPiper = "/path/to/speech-generation-piper",
      )
- Create an instance
  - Bark
      SpeechGeneration.Bark.create(
          modelPath = "/path/to/model",
      )
  - Piper
      SpeechGeneration.Piper.create(
          modelPath = "/path/to/model",
          configurationPath = "/path/to/configuration",
      )
- Call `sampleRate` to get the sample rate of the audio producer
- Call `generate` to process the input string and get the generated speech
- Call `close` to release resources
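
The call order above (`sampleRate`, `generate`, `close`) can be sketched without the native binaries by using a stand-in. Everything below is an assumption made for illustration: the `SpeechGenerator` interface and `SilentGenerator` class are hypothetical, only the three member names come from this README, and the PCM layout (16-bit signed little-endian mono in a `ByteArray`) is a guess that should be checked against the example module. The WAV writing uses only the JDK's `javax.sound.sampled` API.

```kotlin
import java.io.ByteArrayInputStream
import java.io.File
import javax.sound.sampled.AudioFileFormat
import javax.sound.sampled.AudioFormat
import javax.sound.sampled.AudioInputStream
import javax.sound.sampled.AudioSystem

// Hypothetical stand-in for the instance returned by create(...).
// Member names follow the README; the types are assumptions.
interface SpeechGenerator {
    val sampleRate: Int
    fun generate(text: String): ByteArray
    fun close()
}

// Fake generator that returns silence, so the sketch runs without a model.
class SilentGenerator(override val sampleRate: Int = 22_050) : SpeechGenerator {
    // 50 ms of 16-bit mono silence per input character
    override fun generate(text: String) = ByteArray(text.length * (sampleRate / 20) * 2)
    override fun close() = Unit
}

/** Wraps raw PCM (assumed 16-bit signed little-endian mono) in a WAV container. */
fun writeWav(pcm: ByteArray, sampleRate: Int, target: File) {
    val format = AudioFormat(sampleRate.toFloat(), 16, 1, true, false)
    val frames = (pcm.size / format.frameSize).toLong()
    AudioInputStream(ByteArrayInputStream(pcm), format, frames).use { stream ->
        AudioSystem.write(stream, AudioFileFormat.Type.WAVE, target)
    }
}

fun main() {
    // With the real library this would be the instance obtained from create(...)
    val generator: SpeechGenerator = SilentGenerator()
    val output = File.createTempFile("speech", ".wav")
    try {
        val pcm = generator.generate("Hello")
        writeWav(pcm, generator.sampleRate, output)
        println("Wrote ${output.length()} bytes to ${output.absolutePath}")
    } finally {
        generator.close() // release resources even if generation fails
    }
}
```

Wrapping the calls in `try`/`finally` ensures `close` runs on every path, which matters when an instance holds native memory.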
- JVM version 9 or higher
This project is licensed under the Apache License 2.0