Skip to content

3.3.0

Compare
Choose a tag to compare
@xenova xenova released this 15 Jan 13:28
· 8 commits to main since this release
e00ff3b

πŸ”₯ Transformers.js v3.3 β€” StyleTTS 2 (Kokoro) for state-of-the-art text-to-speech, Grounding DINO for zero-shot object detection

πŸ€– New models: StyleTTS 2, Grounding DINO

StyleTTS 2 for high-quality speech synthesis

See #1148 for more information and here for the list of supported models.

First, install the kokoro-js library, which uses Transformers.js, from NPM using:

npm i kokoro-js

You can then generate speech as follows:

import { KokoroTTS } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
});

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  // Use `tts.list_voices()` to list all available voices
  voice: "af_bella",
});
audio.save("audio.wav");

Grounding DINO for zero-shot object detection

See #1137 for more information and here for the list of supported models.

Example: Zero-shot object detection with onnx-community/grounding-dino-tiny-ONNX using the pipeline API.

import { pipeline } from "@huggingface/transformers";

const detector = await pipeline("zero-shot-object-detection", "onnx-community/grounding-dino-tiny-ONNX");

const url = "http://images.cocodataset.org/val2017/000000039769.jpg";
const candidate_labels = ["a cat."];
const output = await detector(url, candidate_labels, {
  threshold: 0.3,
});
See example output
[
  { score: 0.45316222310066223, label: "a cat", box: { xmin: 343, ymin: 23, xmax: 637, ymax: 372 } },
  { score: 0.36190420389175415, label: "a cat", box: { xmin: 12, ymin: 52, xmax: 317, ymax: 472 } },
]

πŸ› οΈ Other improvements

πŸ€— New contributors

Full Changelog: 3.2.4...3.3.0