skills/frontend-dev/references/minimax-tts-guide.md
shihao 6487becf60 Initial commit: add all skills files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:52:49 +08:00

TTS Guide

# Basic
python scripts/minimax_tts.py "Hello world" -o output.mp3

# Custom voice and speed
python scripts/minimax_tts.py "你好世界" -o hi.mp3 -v female-shaonv --speed 0.9

# WAV format, high quality
python scripts/minimax_tts.py "Welcome" -o out.wav -v male-qn-jingying --format wav --sample-rate 32000

# With emotion (for speech-2.6 models)
python scripts/minimax_tts.py "Great news!" -o happy.mp3 -v female-shaonv --emotion happy --model speech-2.6-hd

Programmatic usage

from minimax_tts import tts

# Basic
audio_bytes = tts("Hello world")

# With options
audio_bytes = tts(
    text="Welcome to our product.",
    voice_id="female-shaonv",
    model="speech-2.8-hd",
    speed=0.9,
    fmt="mp3",
)

# Save to file
with open("output.mp3", "wb") as f:
    f.write(audio_bytes)

Limits

  • Sync TTS: max 10,000 characters per request
  • Pause markers: insert <#1.5#> for a 1.5 s pause (range: 0.01 to 99.99 s)
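
As a rough illustration of the two limits above, the following sketch builds pause markers and checks text length before a request is sent. The helper names (`pause_marker`, `join_with_pauses`, `check_sync_limit`) are hypothetical and not part of `minimax_tts.py`. The sketch assumes whole-second values such as `<#2#>` are accepted, and it counts every character, markers included, toward the limit, which may be stricter than the service itself.

```python
SYNC_CHAR_LIMIT = 10_000          # from the Limits list above
PAUSE_MIN, PAUSE_MAX = 0.01, 99.99


def pause_marker(seconds: float) -> str:
    """Return a pause marker such as <#1.5#> for the given duration."""
    if not PAUSE_MIN <= seconds <= PAUSE_MAX:
        raise ValueError(
            f"pause must be between {PAUSE_MIN} and {PAUSE_MAX} s, got {seconds}"
        )
    # Two decimals at most, trailing zeros dropped: 1.50 -> 1.5, 2.00 -> 2
    formatted = f"{seconds:.2f}".rstrip("0").rstrip(".")
    return f"<#{formatted}#>"


def join_with_pauses(parts: list[str], seconds: float) -> str:
    """Join text fragments, inserting the same pause between each pair."""
    return pause_marker(seconds).join(parts)


def check_sync_limit(text: str, limit: int = SYNC_CHAR_LIMIT) -> None:
    """Raise if the text is too long for a single synchronous request."""
    if len(text) > limit:
        raise ValueError(f"text has {len(text)} characters; limit is {limit}")
```

The string returned by `join_with_pauses(["First point.", "Second point."], 1.5)` can be passed to `tts()` like any other text.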

Model selection

| Model | Best for |
|---|---|
| speech-2.8-hd | Highest quality, auto emotion (recommended) |
| speech-2.8-turbo | Fast, good quality |
| speech-2.6-hd | Manual emotion control needed |
| speech-2.6-turbo | Fast + manual emotion |
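
The table reduces to two questions: is manual emotion control needed, and does speed matter more than quality? A minimal sketch of that decision follows. The `choose_model` helper is hypothetical and only encodes the four rows above.

```python
def choose_model(manual_emotion: bool = False, fast: bool = False) -> str:
    """Pick a model name from the selection table.

    manual_emotion: True when the emotion must be set explicitly,
                    which the table maps to the speech-2.6 family.
    fast:           True to prefer the turbo tier over hd.
    """
    family = "speech-2.6" if manual_emotion else "speech-2.8"
    tier = "turbo" if fast else "hd"
    return f"{family}-{tier}"
```

With no arguments the function returns the recommended `speech-2.8-hd`; the result can be passed as the `model` argument of `tts()`.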

Voice selection

See minimax-voice-catalog.md for the full list.

Common voices:

| Voice ID | Gender | Style |
|---|---|---|
| male-qn-qingse | Male | Young, gentle |
| male-qn-jingying | Male | Elite, authoritative |
| male-qn-badao | Male | Dominant, powerful |
| female-shaonv | Female | Young, bright |
| female-yujie | Female | Mature, elegant |
| female-chengshu | Female | Sophisticated |
| presenter_male | Male | News presenter |
| presenter_female | Female | News presenter |
| audiobook_male_1 | Male | Audiobook narrator |
| audiobook_female_1 | Female | Audiobook narrator |

Best practices

  • Use speech-2.8-hd and let it match emotion automatically; set emotion manually only when needed
  • Use a 32000 Hz sample rate for web audio (a good balance of quality and file size)
  • For long text (>10,000 characters), split it into chunks of at most 10,000 characters, synthesize each chunk, and merge the audio files with FFmpeg
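
The long-text advice can be sketched as follows. This is one possible approach rather than the script's own: `split_text`, `write_concat_list`, and `concat_command` are hypothetical helpers, and the merge step assumes FFmpeg's concat demuxer with stream copy, which works when all parts share the same format and sample rate. The splitter prefers sentence boundaries, falls back to a space, then to a hard cut. It does not guard against cutting inside a pause marker such as `<#1.5#>`.

```python
from pathlib import Path

MAX_CHARS = 10_000                 # sync limit from the Limits section
SENTENCE_ENDS = ".!?。！？\n"       # Latin and CJK sentence endings


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars without losing characters."""
    chunks = []
    start = 0
    while len(text) - start > max_chars:
        window = text[start:start + max_chars]
        # Cut just after the last sentence end inside the window.
        cut = max(window.rfind(ch) for ch in SENTENCE_ENDS) + 1
        if cut == 0:               # no sentence end: try the last space
            cut = window.rfind(" ") + 1
        if cut == 0:               # no space either: hard cut
            cut = max_chars
        chunks.append(window[:cut])
        start += cut
    if start < len(text):
        chunks.append(text[start:])
    return chunks


def write_concat_list(audio_paths: list[str], list_path: str = "parts.txt") -> str:
    """Write the list file read by FFmpeg's concat demuxer."""
    lines = [f"file '{p}'" for p in audio_paths]
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return list_path


def concat_command(list_path: str, output: str) -> list[str]:
    """Build the FFmpeg command; -c copy joins parts without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output]
```

A typical flow would call `tts()` on each chunk from `split_text`, save the results as `part_000.mp3`, `part_001.mp3`, and so on, then pass the command from `concat_command` to `subprocess.run(..., check=True)`.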