numq/speech-generation

Speech generation

A JVM library for speech generation, written in Kotlin and based on the C++ libraries bark.cpp and piper.

Features

  • Generates PCM speech audio data from a string
  • Supports any output sampling rate and number of channels by resampling and downmixing the generated audio
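
To illustrate what resampling and downmixing mean for PCM data, here is a minimal Kotlin sketch. It is not the library's implementation: the function names `downmixToMono` and `resampleLinear` are hypothetical, the samples are assumed to be 16-bit signed and interleaved, and plain linear interpolation is used for brevity (a production resampler would also apply a low-pass filter before downsampling to limit aliasing).

```kotlin
import kotlin.math.roundToInt

// Illustrative only: hypothetical helpers, not part of the library's API.

/** Averages each interleaved frame down to a single channel. */
fun downmixToMono(interleaved: ShortArray, channels: Int): ShortArray {
    require(channels > 0 && interleaved.size % channels == 0) { "incomplete frame" }
    return ShortArray(interleaved.size / channels) { frame ->
        var sum = 0
        for (c in 0 until channels) sum += interleaved[frame * channels + c]
        (sum / channels).toShort()
    }
}

/** Resamples mono samples by linear interpolation between neighbouring samples. */
fun resampleLinear(samples: ShortArray, fromRate: Int, toRate: Int): ShortArray {
    require(fromRate > 0 && toRate > 0) { "rates must be positive" }
    if (samples.isEmpty() || fromRate == toRate) return samples.copyOf()
    val outSize = (samples.size.toLong() * toRate / fromRate).toInt()
    val step = fromRate.toDouble() / toRate
    return ShortArray(outSize) { i ->
        val pos = i * step
        // Clamp both neighbours so the last output samples reuse the final input sample
        val left = pos.toInt().coerceAtMost(samples.lastIndex)
        val right = (left + 1).coerceAtMost(samples.lastIndex)
        val t = pos - left
        (samples[left] + (samples[right] - samples[left]) * t).roundToInt().toShort()
    }
}
```

For example, `downmixToMono(shortArrayOf(100, 200, -100, -300), 2)` averages the two stereo frames into `[150, -200]`, and `resampleLinear(shortArrayOf(0, 100), 1, 2)` doubles the rate to give `[0, 50, 100, 100]`.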

Installation

  • Download the latest release

  • Add the library dependency

    dependencies {
        implementation(files("/path/to/jar"))
    }
  • Unzip binaries

Piper

  • Add dependencies

    dependencies {
        implementation("com.microsoft.onnxruntime:onnxruntime:1.20.0")
        implementation("com.google.code.gson:gson:2.11.0")
    }
  • Download one of the available Piper voices, or use any other compatible voice

Usage

TL;DR

See the example module for implementation details

  • Call generate to process the input string and get the generated speech

Step-by-step

  • Load binaries

    • Bark

      • CPU

        SpeechGeneration.Bark.loadCPU(
            ggmlBase = "/path/to/ggml-base", 
            ggmlCpu = "/path/to/ggml-cpu",
            ggml = "/path/to/ggml",
            speechGenerationBark = "/path/to/speech-generation-bark",
        )
      • CUDA

        SpeechGeneration.Bark.loadCUDA(
            ggmlBase = "/path/to/ggml-base", 
            ggmlCpu = "/path/to/ggml-cpu",
            ggmlCuda = "/path/to/ggml-cuda",
            ggml = "/path/to/ggml",
            speechGenerationBark = "/path/to/speech-generation-bark",
        )
    • Piper

      SpeechGeneration.Piper.load(
          espeak = "/path/to/espeak-ng",
          speechGenerationPiper = "/path/to/speech-generation-piper",
      )
  • Create an instance

    • Bark

      SpeechGeneration.Bark.create(
          modelPath = "/path/to/model",
      )
    • Piper

      SpeechGeneration.Piper.create(
          modelPath = "/path/to/model",
          configurationPath = "/path/to/configuration",
      )
  • Call sampleRate to get the sample rate of the audio the instance produces

  • Call generate to process the input string and get the generated speech

  • Call close to release resources
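
The sequence above (create, sampleRate, generate, close) can be sketched as follows. This is a hedged illustration, not the library's API: `SpeechSource` is a hypothetical stand-in for an instance returned by `SpeechGeneration.Bark.create` or `SpeechGeneration.Piper.create`, and only the names `sampleRate`, `generate` and `close` come from this README. The parameter and return types, the 16-bit signed little-endian mono PCM layout, and the helpers `SilentSpeechSource`, `writeWav` and `synthesizeToWav` are assumptions made so the sketch runs without native binaries. See the example module for the actual signatures.

```kotlin
import java.io.ByteArrayInputStream
import java.io.File
import javax.sound.sampled.AudioFileFormat
import javax.sound.sampled.AudioFormat
import javax.sound.sampled.AudioInputStream
import javax.sound.sampled.AudioSystem

// HYPOTHETICAL stand-in for a created speech generation instance.
// The types used here are assumptions, not the library's signatures.
interface SpeechSource : AutoCloseable {
    fun sampleRate(): Int
    fun generate(text: String): ByteArray
}

// Fake generator: emits 10 ms of silence per input character
class SilentSpeechSource(private val rate: Int = 24_000) : SpeechSource {
    var closed = false
        private set

    override fun sampleRate() = rate
    override fun generate(text: String) = ByteArray(text.length * (rate / 100) * 2)
    override fun close() { closed = true }
}

// Wraps raw PCM in a WAV container (assumes 16-bit signed little-endian mono)
fun writeWav(pcm: ByteArray, sampleRate: Int, target: File) {
    val format = AudioFormat(sampleRate.toFloat(), 16, 1, true, false)
    val frames = pcm.size.toLong() / format.frameSize
    AudioInputStream(ByteArrayInputStream(pcm), format, frames).use { stream ->
        AudioSystem.write(stream, AudioFileFormat.Type.WAVE, target)
    }
}

// Generates speech for the text, saves it, and always releases the source
fun synthesizeToWav(source: SpeechSource, text: String, target: File) {
    try {
        val pcm = source.generate(text)
        writeWav(pcm, source.sampleRate(), target)
    } finally {
        source.close()
    }
}
```

Calling close in a finally block keeps the release step from being skipped when generation or file writing throws, which matters for an instance that presumably holds native resources.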

Requirements

  • JVM version 9 or higher

License

This project is licensed under the Apache License 2.0.

Acknowledgments
