🔥 Transformers.js v3.3 — StyleTTS 2 (Kokoro) for state-of-the-art text-to-speech, Grounding DINO for zero-shot object detection

🤖 New models: StyleTTS 2, Grounding Dino
- StyleTTS 2: High-quality speech synthesis
- Grounding DINO: Zero-shot object detection
🛠️ Other improvements
🤗 New contributors

🤖 New models: StyleTTS 2, Grounding DINO

StyleTTS 2 for high-quality speech synthesis

See #1148 for more information and here for the list of supported models.

First, install the kokoro-js library, which uses Transformers.js, from NPM using:

npm i kokoro-js

You can then generate speech as follows:

import { KokoroTTS } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
});

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  // Use `tts.list_voices()` to list all available voices
  voice: "af_bella",
});
audio.save("audio.wav");

Grounding DINO for zero-shot object detection

See #1137 for more information and here for the list of supported models.

Example: Zero-shot object detection with onnx-community/grounding-dino-tiny-ONNX using the pipeline API.

import { pipeline } from "@huggingface/transformers";

const detector = await pipeline("zero-shot-object-detection", "onnx-community/grounding-dino-tiny-ONNX");

const url = "http://images.cocodataset.org/val2017/000000039769.jpg";
const candidate_labels = ["a cat."];
const output = await detector(url, candidate_labels, {
  threshold: 0.3,
});

See example output

[
  { score: 0.45316222310066223, label: "a cat", box: { xmin: 343, ymin: 23, xmax: 637, ymax: 372 } },
  { score: 0.36190420389175415, label: "a cat", box: { xmin: 12, ymin: 52, xmax: 317, ymax: 472 } },
]

🛠️ Other improvements

Add the RawAudio class by @Th3G33k in #682
Update React guide for v3 by @sroussey in #1128
Add option to skip special tokens in TextStreamer by @sroussey in #1139

🤗 New contributors

@sroussey made their first contribution in #1128

Full Changelog: 3.2.4...3.3.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

3.3.0