Quick Start Guide - F5-TTS Demo Inference

Fastest Way to Get Started

# 1. List available samples
./demo_infer.sh --list

# 2. Run a quick test
./demo_infer.sh --sample 1 --gen-sample 2

# Done! Audio saved to demo_outputs/

All Available Methods

Method 1: Bash Wrapper (Recommended)

./demo_infer.sh --sample 1 --gen-sample 2

✅ Easiest, works with demo samples, clean output

Method 2: Python CLI

python demo_cli.py --sample 1 --gen-sample 2

✅ Python-based, good for scripting

Method 3: Preset Demos

python demo_inference.py

✅ Runs 4 demos automatically

Common Commands

# List samples
./demo_infer.sh --list

# Basic test
./demo_infer.sh --sample 1 --gen-sample 2

# Custom text (must be Pinyin with tone numbers)
./demo_infer.sh --sample 1 --gen-text "ni3 hao3 shi4 jie4"

# High quality (slower)
./demo_infer.sh --sample 2 --gen-sample 3 --nfe 64 --cfg 2.5

# Fast test (lower quality)
./demo_infer.sh --sample 1 --gen-sample 2 --nfe 16

# Custom output location
./demo_infer.sh --sample 1 --gen-sample 2 --output my_test.wav
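When these commands are driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of the repository: `build_demo_command` is a hypothetical helper, it uses only the flags shown above, and it assumes `--gen-sample` and `--gen-text` are alternatives (the guide never shows them together).

```python
import shlex

def build_demo_command(sample, gen_sample=None, gen_text=None,
                       nfe=None, cfg=None, output=None,
                       script="./demo_infer.sh"):
    """Return the argv list for one demo_infer.sh run (hypothetical helper)."""
    # Assumption: exactly one of --gen-sample / --gen-text is given per run.
    if (gen_sample is None) == (gen_text is None):
        raise ValueError("pass exactly one of gen_sample or gen_text")
    argv = [script, "--sample", str(sample)]
    if gen_sample is not None:
        argv += ["--gen-sample", str(gen_sample)]
    else:
        argv += ["--gen-text", gen_text]
    # Optional flags are added only when set, so the script's defaults apply.
    if nfe is not None:
        argv += ["--nfe", str(nfe)]
    if cfg is not None:
        argv += ["--cfg", str(cfg)]
    if output is not None:
        argv += ["--output", output]
    return argv

cmd = build_demo_command(1, gen_text="ni3 hao3 shi4 jie4", nfe=16)
# shlex.join quotes the Pinyin text so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with multi-word Pinyin text.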

Parameters

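Parameter	Default	Range	Description
`--nfe`	32	16-64	Number of function evaluations (sampling steps); higher gives better quality but is slower
`--cfg`	2.0	1.0-3.0	Classifier-free guidance strength; higher follows the text more closely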
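A small pre-flight check can catch out-of-range values before a slow run starts. This is a sketch under assumptions: the ranges are copied from the table above, `check_params` is a hypothetical name, and whether `demo_infer.sh` enforces these ranges itself is not stated in this guide.

```python
# Documented ranges from the parameter table. Treating them as hard limits
# is an assumption; the script may accept values outside them.
NFE_RANGE = (16, 64)
CFG_RANGE = (1.0, 3.0)

def check_params(nfe=32, cfg=2.0):
    """Return a list of human-readable problems (empty if all values are in range)."""
    problems = []
    if not NFE_RANGE[0] <= nfe <= NFE_RANGE[1]:
        problems.append(f"--nfe {nfe} is outside {NFE_RANGE[0]}-{NFE_RANGE[1]}")
    if not CFG_RANGE[0] <= cfg <= CFG_RANGE[1]:
        problems.append(f"--cfg {cfg} is outside {CFG_RANGE[0]}-{CFG_RANGE[1]}")
    return problems

print(check_params(nfe=64, cfg=2.5))  # the "high quality" settings shown earlier
print(check_params(nfe=8, cfg=3.5))   # both values fall outside the documented ranges
```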

What Are The 5 Demo Samples?

Sample	Duration	Description
1	2.90s	Short utterance
2	3.53s	Medium-short
3	3.92s	Medium (median)
4	4.34s	Medium-long
5	5.62s	Long utterance

All samples are from the Cantonese training dataset with Pinyin transcriptions.

Using Your Own Audio

For custom audio files, use Python directly:

python3 << 'EOF'
import sys
# Repository root; change this to the path of your own checkout
sys.path.append('/home/husrcf/Code/AIAA/AIAA2205-assignment2-F5-TTS/')
import torch, soundfile as sf
from src.f5_tts.infer.utils_infer import load_model, load_vocoder, infer_process
from src.f5_tts.Models.DiT import DiT

# Your audio and text (text MUST be in Pinyin with tone numbers)
ref_audio = "/path/to/your/audio.wav"
ref_text = "your ref text in pinyin"
gen_text = "text to generate in pinyin"

# Use the GPU when one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the trained DiT checkpoint with its vocabulary, then the Vocos vocoder
model = load_model(DiT, dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4),
                   "./ckpts/cantonese_data/model_last.pt",
                   vocab_file="./data/cantonese_data_pinyin/vocab.txt", device=device)
vocoder = load_vocoder("vocos")

# Synthesize; the first two return values are the waveform and its sample rate
audio, sr, _ = infer_process(ref_audio, ref_text, gen_text, model, vocoder,
                             mel_spec_type="vocos", nfe_step=32, cfg_strength=2.0, device=device)
sf.write("output.wav", audio, sr)
print(f"✓ Generated {len(audio)/sr:.2f}s audio → output.wav")
EOF
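Because both the reference text and the generated text must be Pinyin with tone numbers, a quick format check before inference can save a wasted run. The sketch below is illustrative only: `untoned_tokens` is a hypothetical helper, and it assumes a syllable is written as ASCII letters followed by exactly one tone digit, separated by whitespace. Which digits are actually valid depends on the model's vocabulary file.

```python
import re

# Assumption: one syllable = ASCII letters followed by a single tone digit.
SYLLABLE = re.compile(r"[A-Za-z]+[0-9]")

def untoned_tokens(text):
    """Return the whitespace-separated tokens that lack a trailing tone number."""
    bad = []
    for token in text.split():
        # Ignore surrounding punctuation so "hao3," is still accepted.
        core = token.strip(",.!?;:")
        if core and not SYLLABLE.fullmatch(core):
            bad.append(core)
    return bad

print(untoned_tokens("ni3 hao3 shi4 jie4"))  # well-formed input
print(untoned_tokens("ni3 hao shi4 jie4"))   # "hao" is missing its tone number
```

An empty list means every token matched the assumed pattern. It does not guarantee that each syllable exists in the vocabulary.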

Troubleshooting

"CUDA out of memory"

The script automatically falls back to CPU if CUDA fails.
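When calling inference from your own Python code, a similar fallback can be written by hand. This is a sketch of the general pattern, not the script's actual implementation: `run_with_fallback` and `fake_infer` are hypothetical names. It relies on PyTorch reporting CUDA out-of-memory as a `RuntimeError` (or a subclass of it).

```python
def run_with_fallback(infer, devices=("cuda", "cpu")):
    """Call infer(device) on each device in order; return (device, result) on first success."""
    last_error = None
    for device in devices:
        try:
            return device, infer(device)
        except RuntimeError as err:
            # CUDA out-of-memory surfaces as RuntimeError; try the next device.
            last_error = err
    if last_error is None:
        raise ValueError("no devices given")
    raise last_error

# Stand-in for a real inference call, used here so the sketch runs without a GPU.
def fake_infer(device):
    if device == "cuda":
        raise RuntimeError("CUDA out of memory")
    return "audio"

print(run_with_fallback(fake_infer))
```

In real use, `infer` would wrap model loading and the `infer_process` call for the given device, so that nothing stays allocated on the GPU after a failure.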

"Error: Checkpoint not found"

Make sure the model has been trained and the checkpoint file exists: ls -lh ckpts/cantonese_data/model_last.pt

Generated audio sounds weird

Ensure reference text exactly matches the reference audio
Try lowering --cfg to 1.5
Text must be in Pinyin format with tone numbers

Permission denied

chmod +x demo_infer.sh
./demo_infer.sh --list

Note About Training Data

These demos use samples from the training set. This is fine for:

✅ Testing that everything works
✅ Demonstrating model capabilities
✅ Quick sanity checks

But not ideal for:

❌ Evaluating generalization
❌ Claiming model performance on unseen data

For proper evaluation, use a held-out test set.
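One simple way to obtain a held-out set is to split the utterance list deterministically before training. The sketch below is an illustration under assumptions: `split_held_out`, the 10% fraction, and the utterance IDs are all hypothetical, and nothing here reflects how this project's dataset was actually prepared.

```python
import random

def split_held_out(utterance_ids, test_fraction=0.1, seed=0):
    """Return (train_ids, test_ids) with a reproducible random split."""
    # Sort first so the result does not depend on the input order.
    ids = sorted(utterance_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]

# Hypothetical utterance IDs, for illustration only.
all_ids = [f"utt{i:03d}" for i in range(20)]
train_ids, test_ids = split_held_out(all_ids)
print(len(train_ids), len(test_ids))
```

The split only helps if the test utterances are excluded from training, so it has to be made before the model is trained, not after.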

Quick Start: ./demo_infer.sh --list then ./demo_infer.sh --sample 1 --gen-sample 2
