Senseaudio voice picker followup #2260

Open
QWERTY0205 wants to merge 7 commits into nexu-io:main from QWERTY0205:senseaudio-voice-picker-followup

Conversation

@QWERTY0205

Fixes #

Why

What users will see

Surface area

  • UI — new page / dialog / panel / menu item / setting / empty state in apps/web or apps/desktop (including Electron menu bar)
  • Keyboard shortcut — new or changed
  • CLI / env var — new od subcommand or flag, new tools-dev / tools-pack / tools-pr flag, or new OD_* env var
  • API / contract — new /api/* endpoint, new SSE event, or changed shape in packages/contracts
  • Extension point — new entry under skills/, design-systems/, design-templates/, or craft/, or change to the skills protocol
  • i18n keys — added new translation keys (see TRANSLATIONS.md for the locale workflow)
  • New top-level dependency — adding any new entry to the root package.json (dependencies or devDependencies); workspace-package package.json files are out of scope. Include a paragraph on what we get vs. what bytes we ship (see CONTRIBUTING.md → Code style)
  • Default behavior change — changes what existing users experience without opting in (default model, default setting, file/SQLite schema, auto-network on startup, auto-install)
  • None — internal refactor, docs, tests, or translation update only

Screenshots

Bug fix verification

Validation

Fl0rencess720 and others added 7 commits May 18, 2026 13:59
Speech projects using `senseaudio-tts` had no way to discover the
voices a SenseAudio account can synthesise — the only escape hatch
was for the user to paste a raw voice_id into the New Project panel or
accept the daemon's default. Add an ElevenLabs-style picker so the
agent can present a dropdown of the account's available personas and
route to the right variant on dispatch.

Daemon
- `senseaudio-voices.ts` fetches `POST /v1/get_voice`, validates the
  base_resp envelope, and shapes the response into
  `Record<prefix, { name, description, variants }>` — the prefix
  (`male_0028`) keys 1:1 to a persona; colliding prefixes
  (`female_0030_*`) get keyed by full voice_id instead. The only
  metadata the API does not return — variant suffix → emotion label —
  is inlined as a `VARIANT_LABELS` const sourced from
  docs.senseaudio.cn. 10-min cache by api-key fingerprint. Shaping is
  wrapped in try/catch so an API field rename returns an empty
  catalogue (and the prompt falls back to the error path) instead of
  crashing the daemon.
- `GET /api/media/providers/senseaudio/voices` exposes the catalogue.
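
A minimal sketch of the shaping step described above, under stated assumptions. The `description` field name, the rule that a "collision" means one prefix shared by voices with different `voice_name` values, and the helper names are all hypothetical; only `voice_id`, `voice_name`, and the target shape come from the commit message.

```typescript
// Hypothetical normalised entry: only voice_id and voice_name are named in the PR text.
interface RawVoice {
  voice_id: string; // e.g. "male_0028_a"
  voice_name: string;
  description: string; // assumed field name
}

interface Persona {
  name: string;
  description: string;
  variants: Record<string, string>; // voice_id -> display label
}

type Catalogue = Record<string, Persona>;

// "male_0028_a" -> "male_0028"; an id with no suffix is its own prefix.
function prefixOf(voiceId: string): string {
  const match = /^(.+_\d+)_[a-z0-9]+$/i.exec(voiceId);
  return match?.[1] ?? voiceId;
}

function shapeCatalogue(raw: unknown): Catalogue {
  if (!Array.isArray(raw)) throw new TypeError("voice list is not an array");
  const groups: Record<string, RawVoice[]> = {};
  for (const item of raw) {
    const voice = item as Partial<RawVoice> | null;
    if (!voice || typeof voice.voice_id !== "string" || typeof voice.voice_name !== "string") {
      throw new TypeError("unexpected voice entry shape");
    }
    const prefix = prefixOf(voice.voice_id);
    const group = groups[prefix] ?? [];
    group.push({
      voice_id: voice.voice_id,
      voice_name: voice.voice_name,
      description: typeof voice.description === "string" ? voice.description : "",
    });
    groups[prefix] = group;
  }

  const catalogue: Catalogue = {};
  for (const prefix of Object.keys(groups)) {
    const group = groups[prefix] ?? [];
    const first = group[0];
    if (!first) continue;
    const samePersona = group.every((v) => v.voice_name === first.voice_name);
    if (samePersona) {
      // One persona per prefix: collect every variant under the prefix key.
      const variants: Record<string, string> = {};
      for (const v of group) variants[v.voice_id] = v.voice_name;
      catalogue[prefix] = { name: first.voice_name, description: first.description, variants };
    } else {
      // Colliding prefix: key each voice by its full voice_id instead.
      for (const v of group) {
        catalogue[v.voice_id] = {
          name: v.voice_name,
          description: v.description,
          variants: { [v.voice_id]: v.voice_name },
        };
      }
    }
  }
  return catalogue;
}

// A renamed or missing field yields an empty catalogue instead of a crash.
function safeShapeCatalogue(raw: unknown): Catalogue {
  try {
    return shapeCatalogue(raw);
  } catch {
    return {};
  }
}
```

The explicit type checks matter here: without them a renamed field would silently produce an `undefined` key rather than reaching the `catch` branch.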

Web
- `apps/web/src/providers/senseaudio-voices.ts` mirrors the daemon
  shape with defensive normalisation.
- `ProjectView` wires the fetch through the BYOK compose path so both
  daemon and BYOK turns get the same catalogue.
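
What "defensive normalisation" could look like on the web side, as a rough sketch only. The function name and the exact drop-or-default rules are assumptions; the PR text states only that the web module mirrors the daemon shape.

```typescript
interface Persona {
  name: string;
  description: string;
  variants: Record<string, string>;
}

type Catalogue = Record<string, Persona>;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: accept whatever the daemon endpoint returned and keep
// only entries that match the expected shape.
function normaliseCatalogue(input: unknown): Catalogue {
  const out: Catalogue = {};
  if (!isRecord(input)) return out;
  for (const key of Object.keys(input)) {
    const entry = input[key];
    if (!isRecord(entry)) continue;
    const name = entry["name"];
    if (typeof name !== "string") continue; // a persona without a name is dropped
    const description = entry["description"];
    const rawVariants = entry["variants"];
    const variants: Record<string, string> = {};
    if (isRecord(rawVariants)) {
      for (const id of Object.keys(rawVariants)) {
        const label = rawVariants[id];
        if (typeof label === "string") variants[id] = label; // skip non-string labels
      }
    }
    out[key] = {
      name,
      description: typeof description === "string" ? description : "",
      variants,
    };
  }
  return out;
}
```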

Prompt
- New `senseAudioCatalogue` field on `ComposeInput` (contracts +
  daemon mirror). `renderSenseAudioPickerInstructions` emits a short
  bullet instruction, fixed `title` / `description` / `submitLabel`
  defaults the agent reuses verbatim (localised to the brief
  language), per-option label rules, post-submit variant-swap logic,
  and the catalogue JSON. Errors are sanitised through a
  `formatSenseAudioCatalogueErrorForPrompt` helper that classifies
  missing-key vs HTTP status code paths.
- Localisation lives in the agent: option labels and form copy get
  translated into the user's brief language at emit time; voice_ids
  stay verbatim.
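
A hedged sketch of the error classification mentioned above. The real `formatSenseAudioCatalogueErrorForPrompt` signature is not shown in this PR text, so the input shape, the function name `formatCatalogueErrorSketch`, and the message wording are all assumptions.

```typescript
// Hypothetical failure shape; the PR does not show what the daemon actually throws.
interface CatalogueFailure {
  code?: "missing-key";
  status?: number; // HTTP status of the voices request, when there was one
  message?: string; // raw upstream text, deliberately never copied into the prompt
}

function formatCatalogueErrorSketch(failure: CatalogueFailure): string {
  if (failure.code === "missing-key") {
    return "SenseAudio API key is not configured. Ask the user to add it in Settings.";
  }
  if (typeof failure.status === "number") {
    return `SenseAudio voice catalogue request failed (HTTP ${failure.status}).`;
  }
  return "SenseAudio voice catalogue is unavailable.";
}
```

Returning fixed strings, rather than echoing `message`, is one way to keep upstream text (which may contain credentials or account details) out of the system prompt.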

Tests
- `senseaudio-voices.test.ts` covers shape conversion, prefix
  collisions, hardcoded variant labels, the `通用` ("general") fallback,
  the base_resp error envelope, caching, missing-credentials early exit,
  and an "API field rename returns empty catalogue" defence.
- `system-prompt-senseaudio-voices.test.ts` covers picker injection
  triggered by `audioModel=senseaudio-tts`, the sanitised error path,
  and the missing-key Settings hint.
…-picker

# Conflicts:
#	apps/daemon/src/prompts/system.ts
#	packages/contracts/src/prompts/system.ts
The variant suffix → emotion label map (e.g. female_0033_b → "开心",
"happy") is documented on docs.senseaudio.cn/guides/voice/catalog.md
but not returned by the /v1/get_voice API, so the original PR hardcoded
a 50-line table that drifts every time SenseAudio adds a persona.

Replace the hardcoded table with a one-time per-process scraper:

  1. fetch docs.senseaudio.cn/guides/voice/catalog.md (24h cached)
  2. regex `<voice_id>` `(<label>)` across the page → labels map
  3. shapeCatalogue() now consults the scraped map first
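
Step 2 could be sketched as follows. The exact layout of the docs page is an assumption: this version expects a backtick-quoted voice_id followed by a parenthesised label and accepts both ASCII and full-width parentheses. The function name is hypothetical.

```typescript
// Extract voice_id -> label pairs from the catalogue markdown.
// Assumed page layout: `female_0033_b` (开心)
function scrapeVariantLabels(markdown: string): Record<string, string> {
  const pattern = /`([a-z]+_\d+(?:_[a-z0-9]+)?)`\s*[(（]([^)）]+)[)）]/gi;
  const labels: Record<string, string> = {};
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const voiceId = match[1];
    const label = match[2];
    if (voiceId && label) labels[voiceId] = label.trim();
  }
  return labels;
}
```

An empty result is a meaningful signal here: the commit treats zero matches the same as a failed fetch and switches to the backup table.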

Fallback chain when shaping each variant entry:

  primary    doc-scraped label                 (fresh, authoritative)
  secondary  BACKUP_VARIANT_LABELS hardcoded   (used iff doc fetch fails
                                                or yields zero matches;
                                                cached only 5min so the
                                                live doc is retried fast
                                                once it recovers)
  per-voice  voice_name from the API           (used iff a specific
                                                voice_id is missing from
                                                both label sources —
                                                never a static "通用"
                                                placeholder anymore)
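
The fallback chain above, as a small sketch. Names are hypothetical. One detail is not stated in the commit message: whether the backup table is also consulted for a single voice_id that is missing from an otherwise successful scrape. This sketch assumes it is not and goes straight to `voice_name`.

```typescript
const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

type LabelMap = Record<string, string>;

interface LabelSource {
  labels: LabelMap;
  ttlMs: number; // how long this source may be cached
  origin: "docs" | "backup";
}

// Primary: doc-scraped labels (24h). Secondary: hardcoded backup (5min),
// used only when the scrape failed (null) or matched nothing.
function pickLabelSource(scraped: LabelMap | null, backup: LabelMap): LabelSource {
  if (scraped !== null && Object.keys(scraped).length > 0) {
    return { labels: scraped, ttlMs: 24 * HOUR_MS, origin: "docs" };
  }
  return { labels: backup, ttlMs: 5 * MINUTE_MS, origin: "backup" };
}

// Per-voice fallback: the API's voice_name, never a static placeholder.
function resolveVariantLabel(voiceId: string, voiceName: string, source: LabelSource): string {
  const label = source.labels[voiceId];
  return typeof label === "string" && label.length > 0 ? label : voiceName;
}
```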

Net effect against today's prod catalogue: the doc surfaces 111 voice_id
labels vs 82 in the hardcoded backup, so 29 voices that previously
fell back to "通用" (female_0006_a "深情" (affectionate), male_0023_a
"平稳" (steady), male_0004_a "平稳", … including popular personas) now
carry their real label without anyone needing to update source code.
The picker dropdown showed all ~12 personas in catalogue order with
no UX hint about which ones actually fit the user's brief. The user
had to read every option's label end-to-end to decide.

Add a REQUIRED step in renderSenseAudioPickerInstructions: before
composing the dropdown, the agent scores each persona for fit against
the brief (gender, age, register, tone, scenario keywords), then
marks the top 3 with prefix glyphs included in the localised label:

  ★    #1 best match
  ◆    #2, #3
  (none for the rest)

Top-3 options sort to the front of the dropdown in 1→2→3 order; the
remainder follow in original catalogue order. Glyphs are universal
(not zh-CN-only) so the localisation rule for the rest of the label
keeps working unchanged.
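
In the PR the scoring and ordering are carried out by the agent following prompt instructions, not by code. The sketch below only illustrates the dropdown order the instructions ask for, given a ranking that has already been decided; the type and function names are hypothetical, and duplicate ids in the ranking are not handled.

```typescript
interface PickerOption {
  id: string;
  label: string;
}

// rankedIds holds the best matches in 1 -> 2 -> 3 order; glyphs are applied by rank.
function orderWithTopThree(
  options: PickerOption[],
  rankedIds: string[],
  glyphs: string[] = ["★", "◆", "◆"],
): PickerOption[] {
  const topIds = rankedIds.slice(0, 3);
  const top: PickerOption[] = [];
  for (let rank = 0; rank < topIds.length; rank++) {
    const id = topIds[rank];
    const match = options.filter((o) => o.id === id)[0];
    if (match) {
      const glyph = glyphs[rank] ?? "";
      top.push({ id: match.id, label: `${glyph} ${match.label}`.trim() });
    }
  }
  // Everything else keeps its original catalogue order and gets no glyph.
  const rest = options.filter((o) => topIds.indexOf(o.id) === -1);
  return top.concat(rest);
}
```

Passing the glyphs as a parameter keeps the ordering logic independent of which prefix set is in use.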
Update the variant-label-fallback test to match the new behaviour
introduced in the scrape-from-docs commit: voice_ids missing from
both the doc-scraped map and the BACKUP_VARIANT_LABELS hardcoded
backup now fall back to the persona's voice_name, not the static
"通用" placeholder.

The fetch mock now serves a 404 for the docs URL so the daemon
deterministically takes the backup path during the test, which was
the implicit assumption in the previous version.
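
A framework-free sketch of the kind of fetch stub described above. The actual test presumably uses the repository's test runner and its mocking helpers; `FetchLike`, `makeFetchStub`, the `https://` scheme on the docs URL, and the payload are assumptions made for illustration.

```typescript
// Narrow stand-in for the parts of the fetch response the daemon would read.
interface StubResponse {
  ok: boolean;
  status: number;
  text: () => Promise<string>;
  json: () => Promise<unknown>;
}

type FetchLike = (url: string) => Promise<StubResponse>;

// The docs URL always answers 404, so label resolution has to take the backup path;
// every other URL answers 200 with the supplied payload.
function makeFetchStub(docsUrl: string, apiPayload: unknown): FetchLike {
  return async (url: string): Promise<StubResponse> => {
    if (url === docsUrl) {
      return { ok: false, status: 404, text: async () => "", json: async () => ({}) };
    }
    return {
      ok: true,
      status: 200,
      text: async () => JSON.stringify(apiPayload),
      json: async () => apiPayload,
    };
  };
}
```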
The previous prefix set (★ ◆ ◇) was too geometrically similar — the
filled and open diamonds were hard to tell apart at a glance, and the
black star + diamonds combo did not visually communicate "ranking".

Switch to the universal medal emojis. They map onto the gold/silver/
bronze metaphor users already recognise from sports, awards, and
leaderboards, and remain locale-neutral so the rest of the label can
still be translated freely.
@lefarcen requested a review from Siri-Ray on May 19, 2026 11:22
@lefarcen added labels on May 19, 2026: size/XL (PR changes 700-1500 lines), risk/high (High risk: apps/desktop, daemon, auth, migration, workflows, package deps), type/enhancement (Enhancement to existing feature)
Contributor

@lefarcen lefarcen left a comment

Hey @QWERTY0205! Thanks for picking up the SenseAudio voice-picker follow-up — the changed areas make the intended direction visible, but the PR description is still mostly template placeholders.

Could you fill in ## Why, ## What users will see, ## Surface area, and ## Validation before pool review gets deep into this? This appears to touch UI plus daemon/API-contract voice discovery, so ticking the relevant surface-area boxes and adding the validation commands/screenshots will help reviewers scope the user-facing path quickly. Also, please replace the dangling Fixes # line with a real issue number or remove it if there is no linked issue.

Related: #2044 by @Fl0rencess720 is already open against the same SenseAudio voice-picker area and touches the same 12 files. You two may want to compare approaches; the maintainer team will decide which path lands.

Contributor

@Siri-Ray Siri-Ray left a comment

@QWERTY0205 thanks for the thoughtful follow-up on the SenseAudio picker. I found one BYOK/API-mode consistency issue to consider; it should be straightforward to fix by keeping the two prompt composers in sync.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

lines.push(' description: "Pick a voice for the read."');
lines.push(' submitLabel: "Use voice"');
lines.push('');
lines.push('For each dropdown option:');
Contributor

This contracts-side SenseAudio picker text is now missing the top-3 medal-ranking step that was added to the daemon composer in apps/daemon/src/prompts/system.ts (Top-3 highlighting, 🥇 / 🥈 / 🥉, and sorting the ranked options first). That matters because ProjectView imports composeSystemPrompt from @open-design/contracts for the web/BYOK compose path while daemon-mode runs use apps/daemon/src/prompts/system.ts, so SenseAudio projects behave differently depending on mode: daemon users get the ranked picker instructions, but BYOK/API-mode users only get the unranked dropdown guidance here. Please mirror the medal-ranking block in this contracts composer as well, and add/update a contracts prompt test that asserts the 🥇, 🥈, 🥉 prefixes so future follow-ups cannot drift the two prompt copies again.
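
The requested assertion could look roughly like the sketch below. It does not call the real `composeSystemPrompt`, whose signature is not shown in this thread; `missingMedals` is a hypothetical helper, and in a real test the prompt text would come from both the contracts composer and the daemon composer.

```typescript
const MEDALS = ["🥇", "🥈", "🥉"];

// Returns the medal prefixes that do not appear anywhere in the prompt text.
function missingMedals(promptText: string): string[] {
  return MEDALS.filter((medal) => promptText.indexOf(medal) === -1);
}

// Stand-in prompt strings; a real test would render both composers instead.
const rankedPrompt = "Mark the top 3 with 🥇, 🥈, 🥉 and sort them first.";
const unrankedPrompt = "For each dropdown option: emit a localised label.";

console.log(missingMedals(rankedPrompt)); // []
console.log(missingMedals(unrankedPrompt)); // [ '🥇', '🥈', '🥉' ]
```

Running the same check against both prompt copies would fail as soon as one of them loses the ranking block.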


4 participants