This repository contains voice recording scripts designed for use with Mirako CLI - a command-line tool for creating AI avatars with custom voice cloning capabilities.
These scripts provide carefully curated text content for recording high-quality voice samples needed to create custom voice profiles using Mirako's voice cloning technology. Each script is optimized for capturing the phonetic diversity and emotional range necessary for generating realistic AI voice clones.
Before using these scripts, ensure you have:
- Mirako CLI installed - Follow the installation guide
- Valid API token - Obtain from Mirako Developer Console
- Audio recording setup - Quality microphone and quiet recording environment
- Audio editing software - For cleaning and processing recordings (optional but recommended)
Follow the installation instructions in the Mirako CLI repository.
# Interactive setup
mirako auth login
# Or set environment variable
export MIRAKO_API_TOKEN="your-api-token-here"
- Create organized folder structure (see Folder Organization)
- Choose your script from the available options
- Record each script in a quiet environment
- Clean audio using denoising tools (recommended)
- Organize files according to the naming convention
This repository includes the following voice recording scripts:
- cantonese_40.txt - 40 Cantonese sentences for basic phonetic coverage
- cantonese_mixed_100.txt - 100 mixed Cantonese and English phrases and sentences
- cantonese_word_100.txt - 100 individual Cantonese words
- english_generic_100.txt - 100 English sentences covering general topics
- english_smalltalk_140.txt - 140 English small talk and conversational phrases
- mandarin_150_v2.txt - 150 Mandarin Chinese sentences for comprehensive coverage
- Format: WAV (16-bit, 44.1kHz minimum)
- Bit depth: 16-bit or 24-bit
- Sample rate: 44.1kHz or 48kHz
- Channels: Mono
- Duration: 2-10 seconds per sample
- Total samples: 50-150 samples recommended per language
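The specifications above can be checked automatically before uploading. The following is a minimal sketch using Python's standard `wave` module. The function name `check_wav` and the exact thresholds are illustrative assumptions drawn from the list above, not part of the Mirako CLI.

```python
import wave

# Thresholds taken from the specification list above (illustrative, not enforced by Mirako).
ALLOWED_SAMPLE_WIDTHS = {2, 3}   # bytes per sample: 16-bit or 24-bit
MIN_SAMPLE_RATE = 44100          # Hz
MIN_SECONDS, MAX_SECONDS = 2.0, 10.0


def check_wav(path):
    """Return a list of human-readable problems found in one PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()
        frames = wav.getnframes()

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if width not in ALLOWED_SAMPLE_WIDTHS:
        problems.append(f"expected 16-bit or 24-bit, found {width * 8}-bit")
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    duration = frames / rate if rate else 0.0
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration {duration:.2f}s is outside {MIN_SECONDS}-{MAX_SECONDS}s")
    return problems
```

An empty list means the file matches the listed specifications. The `wave` module reads uncompressed PCM only, so files in other encodings (for example 32-bit float) raise `wave.Error` instead of returning a problem list.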
- Quiet space with minimal background noise
- Consistent distance from microphone (6-12 inches)
- Pop filter or foam cover to reduce plosives
- Consistent volume throughout recording
- Same microphone for all samples
- Natural delivery - speak as you normally would
- Consistent pace - avoid rushing or dragging
- Clear articulation - but maintain natural speech patterns
- Accuracy - speak the text exactly as it appears in the annotation file
- Language consistency - complete one language before switching
Ensure all audio files are:
- Properly named and organized
- Cleaned and denoised
- In the correct format (WAV)
The annotation.list file uses a pipe-delimited format to map audio files to their corresponding text and language. Each line follows this structure:
audio_filename|language_code|text_content
Example entries:
audio_samples_100.wav|yue|我嗰個smart home system真係好方便。
audio_samples_101.wav|yue|嗰個cafe嘅brunch menu好吸引。
audio_samples_102.wav|en|London bridge is falling down.
audio_samples_103.wav|en|Hi there! Welcome to mirako.
audio_samples_104.wav|zh|寻求将能源转化为智能的最优解。
audio_samples_105.wav|zh|但是它本身是不支持访问网络、数据库等外部资源,也不支持执行任何代码。
Format specifications:
- audio_filename: Name of the audio file (without path)
- language_code: Language identifier (yue for Cantonese, en for English, zh for Mandarin)
- text_content: The exact text spoken in the audio file
- Delimiter: Pipe character | separates the three components
- Empty lines: Allowed and will be ignored during processing
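To catch syntax mistakes before submitting, the format can be validated locally. This is a minimal sketch, not the parser used by the Mirako service. The function name `parse_annotations` is hypothetical, and two behaviours are assumptions: whitespace around fields is trimmed, and only the first two pipes are treated as delimiters so that a pipe inside the text is kept.

```python
KNOWN_LANGUAGES = {"yue", "en", "zh"}  # codes listed in the format specification


def parse_annotations(text):
    """Parse annotation text into (filename, language, sentence) tuples.

    Raises ValueError naming the first malformed line.
    """
    entries = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # empty lines are allowed
        # Assumption: split on the first two pipes only.
        parts = line.split("|", 2)
        if len(parts) != 3:
            raise ValueError(f"line {number}: expected 3 pipe-separated fields")
        filename, language, sentence = (p.strip() for p in parts)
        if not filename or "/" in filename or "\\" in filename:
            raise ValueError(f"line {number}: filename must be a bare file name")
        if language not in KNOWN_LANGUAGES:
            raise ValueError(f"line {number}: unknown language code {language!r}")
        if not sentence:
            raise ValueError(f"line {number}: text is empty")
        entries.append((filename, language, sentence))
    return entries


sample = "audio_samples_102.wav|en|London bridge is falling down.\n\n"
entries = parse_annotations(sample)
```

Reading the file with `open("annotation.list", encoding="utf-8").read()` and passing the result to the function is one way to use it. UTF-8 is an assumption here, since the source does not state the expected encoding.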
Create the following directory structure for optimal voice sample management:
some-voice-sample-project/
├── audio-samples/
│ ├── cantonese_50_101.wav
│ ├── cantonese_50_102.wav
│ └── english_generic_101.wav
└── annotation.list # The annotation file
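With this layout, a quick consistency check can confirm that every annotated file exists and that no recording was left out of the annotation file. The sketch below is illustrative only. The function name `cross_check` is hypothetical, the default folder and file names are taken from the tree above, and UTF-8 encoding is an assumption.

```python
from pathlib import Path


def cross_check(project_dir, audio_subdir="audio-samples", annotation_name="annotation.list"):
    """Compare annotated file names with the WAV files actually on disk.

    Returns (missing_audio, unannotated_audio) as two sorted lists.
    """
    project = Path(project_dir)
    annotated = set()
    lines = (project / annotation_name).read_text(encoding="utf-8").splitlines()
    for line in lines:
        if line.strip():
            # The file name is the first pipe-delimited field.
            annotated.add(line.split("|", 1)[0].strip())
    on_disk = {p.name for p in (project / audio_subdir).glob("*.wav")}
    return sorted(annotated - on_disk), sorted(on_disk - annotated)
```

Calling `cross_check("some-voice-sample-project")` returns two lists: names that appear in the annotation file but have no recording, and recordings that have no annotation line. Both should be empty before cloning.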
Here's an example of running the voice cloning process with the Mirako CLI:
mirako voice clone \
--name "My Multilingual Voice" \
--annotations annotation.list \
--audio-dir audio-samples/
You can also use the --clean_data flag to let the voice clone service denoise the audio files. This is useful if your samples are of lower quality (e.g. recorded in a noisy or highly reverberant environment) and you don't have the tools to clean them up. However, this may not always yield the best results, so use it with caution. The best practice is to record in a quiet environment with low noise and reverberation.
mirako voice clone \
--name "My Multilingual Voice" \
--annotations annotations/voice_profile.yml \
--audio-dir recordings/final/ \
--clean_data
# List your voice profiles
mirako voice list
# Use in text-to-speech
mirako speech tts \
--text "Hello, this is my custom voice!" \
--voice [your-voice-profile-id] \
--output hello_custom.wav
# Use in interactive session
mirako interactive start \
--avatar [avatar-id] \
--voice [your-voice-profile-id]
Audio Quality Problems
- Use denoising tools before processing
- Ensure consistent recording environment
- Record multiple takes for selection
Voice Cloning Failures
- Verify audio format compatibility
- Check annotation file syntax
- Ensure sufficient sample variety per language
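When a clone fails or sounds weak in one language, counting samples per language in the annotation file is a quick first check. This is a minimal sketch with hypothetical function names. The 50-150 range comes from the recording specifications in this document and is a recommendation, not a limit known to be enforced by the service.

```python
from collections import Counter

RECOMMENDED_MIN, RECOMMENDED_MAX = 50, 150  # per-language range from the recording specifications


def samples_per_language(annotation_text):
    """Count well-formed annotation lines per language code."""
    counts = Counter()
    for line in annotation_text.splitlines():
        fields = line.strip().split("|", 2)
        if len(fields) == 3:
            counts[fields[1].strip()] += 1
    return counts


def languages_outside_range(counts):
    """Return the languages whose sample count falls outside the recommended range."""
    return {
        lang: n
        for lang, n in counts.items()
        if not RECOMMENDED_MIN <= n <= RECOMMENDED_MAX
    }
```

Malformed lines are skipped rather than reported here, so this check complements a syntax check and does not replace one.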
API Authentication
- Verify API token validity
- Check network connectivity
- Ensure proper configuration
- Sample Diversity: Include varied content types across all languages
- Quality over Quantity: Better to have 50 high-quality samples than 200 poor ones
- Consistent Setup: Use same microphone, position, and environment
- Language Separation: Organize recordings by language for easier processing
- Regular Testing: Test voice quality throughout the recording process
- Mirako CLI Documentation: GitHub Repository
- API Documentation: docs.mirako.ai
- Web Console: mirako.ai
- Voice Cloning Guide: Available in Mirako documentation
For issues with:
- These scripts: Open an issue in this repository
- Mirako CLI: Visit mirako-ai/mirako-cli
- General support: Contact us on the Mirako Discord channel or by email
This script repository is provided under the MIT License. See LICENSE file for details.
The Mirako CLI is a separate product maintained by Mirako AI, also under the MIT License.