Senseaudio voice picker followup #2260
Speech projects using `senseaudio-tts` had no way to discover the
voices a SenseAudio account can synthesise — the only escape hatch
was for the user to paste a raw voice_id into the New Project panel or
accept the daemon's default. Add an ElevenLabs-style picker so the
agent can present a dropdown of the account's available personas and
route to the right variant on dispatch.
Daemon
- `senseaudio-voices.ts` fetches `POST /v1/get_voice`, validates the
base_resp envelope, and shapes the response into
`Record<prefix, { name, description, variants }>` — the prefix
(`male_0028`) keys 1:1 to a persona; colliding prefixes
(`female_0030_*`) get keyed by full voice_id instead. The only
metadata the API does not return — variant suffix → emotion label —
is inlined as a `VARIANT_LABELS` const sourced from
docs.senseaudio.cn. 10-min cache by api-key fingerprint. Shaping is
wrapped in try/catch so an API field rename returns an empty
catalogue (and the prompt falls back to the error path) instead of
crashing the daemon.
- `GET /api/media/providers/senseaudio/voices` exposes the catalogue.
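The shaping step described above can be sketched roughly as follows. This is a minimal illustration, not the code in `senseaudio-voices.ts`: the raw field names (`voices`, `voice_id`, `voice_name`, `description`), the `prefixOf` helper, and the rule that a prefix "collides" when it is shared by voices with different names are all assumptions, and variant labelling is left out.

```typescript
// Hypothetical raw shape: these field names are assumptions, not the real /v1/get_voice schema.
interface RawVoice { voice_id: string; voice_name: string; description?: string }
interface Persona { name: string; description: string; variants: string[] }
type Catalogue = Record<string, Persona>;

// "male_0028_a" -> "male_0028"; ids without a variant suffix are returned unchanged.
function prefixOf(voiceId: string): string {
  const m = /^(.+_\d+)_[a-z0-9]+$/i.exec(voiceId);
  return m?.[1] ?? voiceId;
}

function shapeCatalogue(raw: unknown): Catalogue {
  try {
    const voices = (raw as { voices: RawVoice[] }).voices;
    // Collect the distinct persona names seen under each prefix.
    const namesByPrefix = new Map<string, Set<string>>();
    for (const v of voices) {
      const p = prefixOf(v.voice_id);
      if (!namesByPrefix.has(p)) namesByPrefix.set(p, new Set());
      namesByPrefix.get(p)!.add(v.voice_name);
    }
    const out: Catalogue = {};
    for (const v of voices) {
      const p = prefixOf(v.voice_id);
      // Assumed collision rule: one prefix, several persona names -> key by full voice_id.
      const key = namesByPrefix.get(p)!.size > 1 ? v.voice_id : p;
      const entry = (out[key] ??= {
        name: v.voice_name,
        description: v.description ?? "",
        variants: [],
      });
      entry.variants.push(v.voice_id);
    }
    return out;
  } catch {
    // A renamed or missing field lands here: return an empty catalogue instead of throwing.
    return {};
  }
}
```

Returning `{}` from the catch keeps the failure visible to the caller (an empty catalogue) without letting a schema change take down the process.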
Web
- `apps/web/src/providers/senseaudio-voices.ts` mirrors the daemon
shape with defensive normalisation.
- `ProjectView` wires the fetch through the BYOK compose path so both
daemon and BYOK turns get the same catalogue.
Prompt
- New `senseAudioCatalogue` field on `ComposeInput` (contracts +
daemon mirror). `renderSenseAudioPickerInstructions` emits a short
bullet instruction, fixed `title` / `description` / `submitLabel`
defaults the agent reuses verbatim (localised to the brief
language), per-option label rules, post-submit variant-swap logic,
and the catalogue JSON. Errors are sanitised through a
`formatSenseAudioCatalogueErrorForPrompt` helper that classifies
missing-key vs HTTP status code paths.
- Localisation lives in the agent: option labels and form copy get
translated into the user's brief language at emit time; voice_ids
stay verbatim.
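A sanitising helper of the kind described above might look like the sketch below. It reuses the name `formatSenseAudioCatalogueErrorForPrompt` from the commit message, but the `CatalogueError` input type, the status-code buckets, and the message wording are assumptions made for illustration.

```typescript
// Hypothetical error shape; the real helper's input type is not shown in this PR text.
type CatalogueError =
  | { kind: "missing-key" }
  | { kind: "http"; status: number; body?: string };

function formatSenseAudioCatalogueErrorForPrompt(err: CatalogueError): string {
  if (err.kind === "missing-key") {
    // Missing-key path: steer the user to Settings rather than surfacing an HTTP failure.
    return "SenseAudio voices unavailable: no API key is configured. Point the user to Settings.";
  }
  // HTTP path: only the numeric status reaches the prompt; the raw body is never echoed.
  if (err.status === 401 || err.status === 403) {
    return `SenseAudio voices unavailable: the API key was rejected (HTTP ${err.status}).`;
  }
  if (err.status >= 500) {
    return `SenseAudio voices unavailable: upstream error (HTTP ${err.status}).`;
  }
  return `SenseAudio voices unavailable: request failed (HTTP ${err.status}).`;
}
```

Dropping the response body is the point of the sanitisation: whatever the upstream returns, the prompt only ever sees a fixed sentence plus a status code.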
Tests
- `senseaudio-voices.test.ts` covers shape conversion, prefix
collisions, hardcoded variant labels, the `通用` fallback, the
base_resp error envelope, caching, missing-credentials early exit,
and an "API field rename returns empty catalogue" defence.
- `system-prompt-senseaudio-voices.test.ts` covers picker injection
triggered by `audioModel=senseaudio-tts`, the sanitised error path,
and the missing-key Settings hint.
…-picker # Conflicts: # apps/daemon/src/prompts/system.ts # packages/contracts/src/prompts/system.ts
The variant suffix → emotion label map (e.g. female_0033_b → "开心")
is documented on docs.senseaudio.cn/guides/voice/catalog.md but not
returned by the /v1/get_voice API, so the original PR hardcoded a
50-line table that drifts every time SenseAudio adds a persona.
Replace the hardcoded table with a one-time per-process scraper:
1. fetch docs.senseaudio.cn/guides/voice/catalog.md (24h cached)
2. regex `<voice_id>` `(<label>)` across the page → labels map
3. shapeCatalogue() now consults the scraped map first
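Step 2 above could be sketched as a small pure parser. The exact layout of catalog.md is not shown here, so the pattern below assumes each entry looks like a backticked voice_id followed by a parenthesised label (ASCII or full-width parentheses); `LABEL_PATTERN` and `parseVariantLabels` are hypothetical names.

```typescript
// Assumed doc entry format: `female_0033_b` (开心). The real page layout may differ.
const LABEL_PATTERN = /`([a-z]+_\d+(?:_[a-z0-9]+)?)`\s*[(（]([^)）]+)[)）]/;

function parseVariantLabels(markdown: string): Record<string, string> {
  const labels: Record<string, string> = {};
  // Fresh global regex per call so lastIndex state never leaks between calls.
  const re = new RegExp(LABEL_PATTERN.source, "g");
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    const voiceId = m[1] ?? "";
    const label = (m[2] ?? "").trim();
    if (voiceId && label) labels[voiceId] = label;
  }
  return labels;
}
```

Keeping the parser separate from the fetch makes the "yields zero matches" case easy to detect: an empty result object means the page layout changed or the fetch returned something unexpected.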
Fallback chain when shaping each variant entry:
- primary: doc-scraped label (fresh, authoritative)
- secondary: hardcoded BACKUP_VARIANT_LABELS (used iff doc fetch fails or yields zero matches; cached only 5min so the live doc is retried fast once it recovers)
- per-voice: voice_name from the API (used iff a specific voice_id is missing from both label sources; never a static "通用" placeholder anymore)
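One way to read the fallback chain is sketched below, under assumptions: the scraped map is `null` when the doc fetch failed, the backup replaces it wholesale (rather than entry by entry), and the chosen source carries its own cache lifetime. `pickLabelSource` and `resolveVariantLabel` are hypothetical names, and the single backup entry is only a sample.

```typescript
const DOC_TTL_MS = 24 * 60 * 60 * 1000; // scraped labels: cached for 24h
const BACKUP_TTL_MS = 5 * 60 * 1000;    // backup labels: retry the live doc after 5min

// Sample entry only; the real backup table is much larger.
const BACKUP_VARIANT_LABELS: Record<string, string> = { female_0033_b: "开心" };

interface LabelSource { labels: Record<string, string>; ttlMs: number }

// scraped === null models a failed doc fetch; an empty object models zero regex matches.
function pickLabelSource(scraped: Record<string, string> | null): LabelSource {
  if (scraped !== null && Object.keys(scraped).length > 0) {
    return { labels: scraped, ttlMs: DOC_TTL_MS };
  }
  return { labels: BACKUP_VARIANT_LABELS, ttlMs: BACKUP_TTL_MS };
}

// Per-voice step: a voice_id the active map does not know gets the API's voice_name.
function resolveVariantLabel(
  voiceId: string,
  voiceName: string,
  labels: Record<string, string>,
): string {
  return labels[voiceId] ?? voiceName;
}
```

The short backup lifetime means a transient docs outage costs at most a few minutes of stale labels before the scraper tries the live page again.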
Net effect against today's prod catalogue: doc surfaces 111 voice_id
labels vs 82 in the hardcoded backup, so 29 voices that previously
fell back to "通用" (female_0006_a "深情", male_0023_a "平稳",
male_0004_a "平稳", …, including popular personas) now carry their real
label without anyone needing to update source code.
The picker dropdown showed all ~12 personas in catalogue order with no UX hint about which ones actually fit the user's brief. The user had to read every option's label end-to-end to decide.
Add a REQUIRED step in renderSenseAudioPickerInstructions: before composing the dropdown, the agent scores each persona for fit against the brief (gender, age, register, tone, scenario keywords), then marks the top 3 with prefix glyphs included in the localised label:
  ★ #1 best match
  ◆ #2
  ◇ #3
  (none for the rest)
Top-3 options sort to the front of the dropdown in 1→2→3 order; the remainder follow in original catalogue order. Glyphs are universal (not zh-CN-only) so the localisation rule for the rest of the label keeps working unchanged.
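In the PR this ordering is an instruction the agent follows from the prompt, not daemon code. The sketch below only illustrates the rule itself, assuming the fit scores already exist; `rankOptions`, the `Option` shape, and the `scores` map are hypothetical.

```typescript
interface Option { voiceId: string; label: string }

// scores: hypothetical fit score per voiceId (higher is better); missing ids count as 0.
function rankOptions(
  options: Option[],
  scores: Record<string, number>,
  glyphs: string[] = ["★", "◆", "◇"],
): Option[] {
  const indexed = options.map((option, index) => ({
    option,
    index,
    score: scores[option.voiceId] ?? 0,
  }));
  // Highest score first; ties keep catalogue order.
  const top = [...indexed]
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, glyphs.length);
  const topIndexes = new Set(top.map((t) => t.index));
  // Ranked options get a glyph prefix and move to the front in 1 -> 2 -> 3 order.
  const ranked = top.map((t, rank) => ({
    ...t.option,
    label: `${glyphs[rank]} ${t.option.label}`,
  }));
  // Everything else follows, unprefixed, in original catalogue order.
  const rest = indexed.filter((x) => !topIndexes.has(x.index)).map((x) => x.option);
  return [...ranked, ...rest];
}
```

Because the glyph list is a parameter, swapping in a different prefix set changes only the argument, not the ordering logic.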
Update the variant-label-fallback test to match the new behaviour introduced in the scrape-from-docs commit: voice_ids missing from both the doc-scraped map and the BACKUP_VARIANT_LABELS hardcoded backup now fall back to the persona's voice_name, not the static "通用" placeholder. The fetch mock now serves a 404 for the docs URL so the daemon deterministically takes the backup path during the test, which was the implicit assumption in the previous version.
The previous prefix set (★ ◆ ◇) was too geometrically similar: the filled and open diamonds were hard to tell apart at a glance, and the black star + diamonds combo did not visually communicate "ranking". Switch to the universal medal emojis. They map onto the gold/silver/bronze metaphor users already recognise from sports, awards, and leaderboards, and remain locale-neutral so the rest of the label can still be translated freely.
lefarcen
left a comment
Hey @QWERTY0205! Thanks for picking up the SenseAudio voice-picker follow-up — the changed areas make the intended direction visible, but the PR description is still mostly template placeholders.
Could you fill in ## Why, ## What users will see, ## Surface area, and ## Validation before pool review gets deep into this? This appears to touch UI plus daemon/API-contract voice discovery, so ticking the relevant surface-area boxes and adding the validation commands/screenshots will help reviewers scope the user-facing path quickly. Also, please replace the dangling Fixes # line with a real issue number or remove it if there is no linked issue.
Related: #2044 by @Fl0rencess720 is already open against the same SenseAudio voice-picker area and touches the same 12 files. You two may want to compare approaches; the maintainer team will decide which path lands.
Siri-Ray
left a comment
@QWERTY0205 thanks for the thoughtful follow-up on the SenseAudio picker. I found one BYOK/API-mode consistency issue to consider; it should be straightforward to fix by keeping the two prompt composers in sync.
🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.
    lines.push(' description: "Pick a voice for the read."');
    lines.push(' submitLabel: "Use voice"');
    lines.push('');
    lines.push('For each dropdown option:');
This contracts-side SenseAudio picker text is now missing the top-3 medal-ranking step that was added to the daemon composer in apps/daemon/src/prompts/system.ts (Top-3 highlighting, 🥇 / 🥈 / 🥉, and sorting the ranked options first). That matters because ProjectView imports composeSystemPrompt from @open-design/contracts for the web/BYOK compose path while daemon-mode runs use apps/daemon/src/prompts/system.ts, so SenseAudio projects behave differently depending on mode: daemon users get the ranked picker instructions, but BYOK/API-mode users only get the unranked dropdown guidance here. Please mirror the medal-ranking block in this contracts composer as well, and add/update a contracts prompt test that asserts the 🥇, 🥈, 🥉 prefixes so future follow-ups cannot drift the two prompt copies again.
Fixes #
Why
What users will see
Surface area
- apps/web or apps/desktop (including Electron menu bar)
- od subcommand or flag, new tools-dev / tools-pack / tools-pr flag, or new OD_* env var
- /api/* endpoint, new SSE event, or changed shape in packages/contracts
- skills/, design-systems/, design-templates/, or craft/, or change to the skills protocol
- TRANSLATIONS.md for the locale workflow)
- package.json (dependencies or devDependencies); workspace-package package.json files are out of scope. Include a paragraph on what we get vs. what bytes we ship (see CONTRIBUTING.md → Code style)
Screenshots
Bug fix verification
Validation