feat: wire up voice dictation in goose2 via ACP #8565

Open
tulsi-builder wants to merge 1 commit into main from tulsi/voice-input

Conversation

@tulsi-builder
Collaborator

Overview

Category: new-feature
User Impact: Users can now dictate messages using their microphone in the Goose2 desktop app, with support for OpenAI Whisper, Groq, and ElevenLabs transcription providers.

Problem: Goose2 had voice dictation building blocks (hooks, VAD, settings UI) sitting unused in the codebase. The mic button showed "coming soon" and the backend commands couldn't compile because they imported the goose crate directly instead of going through ACP.

Solution: Exposed dictation as ACP custom methods (_goose/dictation/transcribe and _goose/dictation/config) following the same pattern as existing methods like _goose/session/export. Rewrote the Tauri commands to use call_ext_method, wired the frontend hooks into ChatInput, and added a Voice settings page.

Changes

File changes

crates/goose-sdk/src/custom_requests.rs
Added DictationTranscribeRequest/Response, DictationConfigRequest/Response, and DictationProviderStatusEntry types for the ACP custom method protocol.

crates/goose-acp/src/server.rs
Added #[custom_method] handlers for on_dictation_transcribe (routes to OpenAI/Groq/ElevenLabs/Local providers) and on_dictation_config (returns provider statuses with model metadata).

crates/goose-acp/acp-meta.json
Registered the two new dictation methods in the ACP method registry.

crates/goose-acp/Cargo.toml
Added local-inference feature flag and base64 dependency for the transcription handler.

crates/goose-cli/Cargo.toml
Forwarded local-inference feature to goose-acp so local Whisper code paths compile into the binary.

ui/goose2/src-tauri/src/commands/dictation.rs
New file. Two Tauri commands (transcribe_dictation, get_dictation_config) that proxy through GooseAcpManager::call_ext() to ACP.

ui/goose2/src-tauri/src/services/acp/manager.rs
Added generic CallExt command variant and call_ext() public method on GooseAcpManager. Added normalize_ext_method_name() to strip leading underscores (the ACP protocol auto-prefixes _). Includes regression test.

ui/goose2/src-tauri/src/services/acp/manager/command_dispatch.rs
Added match arm for ManagerCommand::CallExt dispatch.

ui/goose2/src/features/chat/ui/ChatInput.tsx
Wired up useDictationRecorder + useVoiceInputPreferences. Handles transcription text insertion, auto-submit on keyword, stops recording on manual send, shows "Listening..."/"Transcribing..." placeholder.

ui/goose2/src/features/chat/ui/ChatInputToolbar.tsx
Replaced disabled "coming soon" mic button with working toggle. Shows recording (red) and transcribing (pulse) states.

ui/goose2/src/features/settings/ui/SettingsModal.tsx
Added Voice nav item with Mic icon, renders VoiceInputSettings.

ui/goose2/src/features/settings/ui/VoiceInputSettings.tsx
New file. Voice settings page with provider selection, API key management, microphone picker, and auto-submit phrase configuration.

ui/goose2/src/features/chat/lib/dictationVad.ts
Fixed return type annotation on advanceVadState (was inferring string instead of VadPhase).

ui/goose2/src/shared/i18n/locales/{en,es}/chat.json
Added voice toolbar strings (recording, transcribing, disabled tooltip).

ui/goose2/src/shared/i18n/locales/{en,es}/settings.json
Added voice settings strings (provider names, API key labels, mic labels, auto-submit labels, local model unavailable message).

ui/goose2/src-tauri/Info.plist
New file. macOS microphone usage description for permission prompt.
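The method-name normalization described for manager.rs can be sketched as follows. This is an illustrative TypeScript rendering of the logic, not the actual Rust `normalize_ext_method_name` implementation: since the ACP protocol auto-prefixes custom method names with `_`, leading underscores are stripped from caller input to avoid a double-prefixed `__goose/...` name on the wire.

```typescript
// Illustrative sketch (the real implementation is Rust's
// normalize_ext_method_name in manager.rs): strip any leading
// underscores so the ACP layer's automatic "_" prefix is applied
// exactly once, regardless of how the caller spelled the method.
function normalizeExtMethodName(method: string): string {
  return method.replace(/^_+/, "");
}
```

With this in place, callers may pass either `_goose/dictation/transcribe` or `goose/dictation/transcribe` and get the same method on the wire.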

Reproduction Steps

  1. Build the goose binary: cargo build --release -p goose-cli
  2. Launch goose2: GOOSE_BIN=./target/release/goose pnpm tauri dev from ui/goose2/
  3. Open Settings → Voice → select OpenAI Whisper (should show as configured if you have an OpenAI API key)
  4. Close settings, click the mic button in the chat toolbar
  5. Speak — text should appear in the input after a brief delay
  6. Say your auto-submit keyword (default: "submit") — message sends and mic turns off
  7. While recording, click send or mic button — recording stops

Known Issues

  • Local Whisper shows "not configured" even when a model is downloaded and config is set. The is_downloaded() path check needs investigation — likely a path resolution mismatch between config and data directories.
  • Keychain popup on first launch when the backend checks API key status. Goes away after clicking "Always Allow".
  • Local model download not available from UI — model management ACP methods not yet implemented. Users can download models via the Goose CLI as a workaround.


@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b085207aea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


setError(null);
try {
  await saveDictationProviderSecret(selectedProvider, apiKeyInput);

P1: Register missing Tauri commands for voice settings

VoiceInputSettings now invokes saveDictationProviderSecret, deleteDictationProviderSecret, and saveDictationModelSelection, but this commit only wires get_dictation_config and transcribe_dictation into the Tauri invoke_handler (ui/goose2/src-tauri/src/lib.rs lines 111-112). As a result, saving/removing API keys or changing models from the new Voice settings screen will fail at runtime with command ... not found, so core settings actions are non-functional.


Comment on lines +183 to +184
const merged = appendTranscribedText(text, fragment);
setText(merged);

P2: Merge dictation chunks with functional text updates

Dictation responses can arrive concurrently because recording flushes chunks with void transcribeChunk(...), but handleTranscription appends each fragment using the closure-captured text value. When multiple transcriptions resolve before React re-renders, later callbacks can overwrite earlier updates and drop dictated words (and potentially interfere with auto-submit matching). Applying fragments via functional state updates against the latest text avoids this race.
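The fix suggested here is the standard React functional-update pattern. A sketch, where `appendTranscribedText` is a stand-in for the real helper rather than its actual implementation:

```typescript
// Stand-in for the real appendTranscribedText helper: joins a fragment
// onto the existing text with a single space, ignoring empty fragments.
function appendTranscribedText(existing: string, fragment: string): string {
  const trimmed = fragment.trim();
  if (!trimmed) return existing;
  return existing ? `${existing} ${trimmed}` : trimmed;
}

// Problematic: captures a stale `text` from the closure, so concurrent
// transcription callbacks can overwrite each other's updates.
//   setText(appendTranscribedText(text, fragment));
//
// Suggested: the updater form always receives the latest state, so
// concurrently resolving fragments compose instead of clobbering.
//   setText((prev) => appendTranscribedText(prev, fragment));
```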


@tulsi-builder force-pushed the tulsi/voice-input branch 3 times, most recently from df1b1c0 to 77ce67b on April 15, 2026 at 21:17

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: 77ce67bece


Comment thread: ui/goose2/src/features/chat/hooks/useDictationRecorder.ts

Commit message:
Add voice dictation support to the goose2 Tauri app by exposing
transcription and config as ACP custom methods, then wiring the
frontend to use them.

Backend (crates/):
- Add DictationTranscribeRequest/Response and DictationConfigRequest/Response
  types to goose-sdk custom_requests.rs with model metadata fields
- Add #[custom_method] handlers in goose-acp server.rs for transcribe
  (OpenAI, Groq, ElevenLabs, Local) and config
- Register methods in acp-meta.json
- Forward local-inference feature from goose-cli to goose-acp

Tauri (ui/goose2/src-tauri/):
- Rewrite dictation.rs to use call_ext_method via ACP instead of
  importing goose crate directly
- Add generic CallExt command to ACP manager with method name
  normalization (strips leading _ to avoid double-prefix)
- Register get_dictation_config and transcribe_dictation commands

Frontend (ui/goose2/src/):
- Wire useDictationRecorder + useVoiceInputPreferences into ChatInput
- Replace placeholder mic button with working toggle (recording/
  transcribing states, auto-submit on keyword)
- Stop recording on manual send and on auto-submit keyword
- Show "Listening..."/"Transcribing..." placeholder in textarea
- Add Voice section to SettingsModal with VoiceInputSettings
- Add all voice i18n strings (en + es)
- Fix pre-existing type errors in dictationVad.ts and VoiceInputSettings

Known issue: Local Whisper reports configured: false despite model being
downloaded and config set. The is_downloaded() path check needs
investigation in a follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: bce05bf054


Comment on lines +79 to +83
onSend(
  merged.trim(),
  selectedPersonaId ?? undefined,
  attachments.length > 0 ? attachments : undefined,
);

P1: Respect send guards before auto-submitting dictation

Auto-submit calls onSend(...) directly without checking the same guard conditions used by manual send (canSend in ChatInput, which blocks when a queued message exists or input is disabled). In the busy/queued state, this can bypass the queue protection and trigger another send while a message is already queued, which risks out-of-order or dropped user messages in ChatView's busy-path logic. Add the same send preconditions (or pass an explicit canSend predicate) before invoking onSend from dictation.
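One way to apply this suggestion is to route both manual and auto-submit sends through a single predicate. The names below are assumptions for illustration; the actual guard is the `canSend` logic inside ChatInput:

```typescript
// Hypothetical shape of the send preconditions ChatInput already
// enforces for manual send; auto-submit should check the same thing
// before invoking onSend.
interface SendState {
  disabled: boolean;          // input disabled (e.g. agent busy)
  hasQueuedMessage: boolean;  // a message is already queued
  isTranscribing: boolean;    // transcription still in flight
}

function canSend(state: SendState, text: string): boolean {
  return (
    !state.disabled &&
    !state.hasQueuedMessage &&
    !state.isTranscribing &&
    text.trim().length > 0
  );
}
```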


Comment on lines +174 to +177
} else if (!flushPending) {
  samplesRef.current = [];
  generationRef.current += 1;
}

P2: Clear transcribing state when canceling dictation

When stopRecording({ flushPending: false }) is used (e.g., after auto-submit), this path invalidates generation but leaves pendingTranscriptionsRef/isTranscribing untouched. Because ChatInput blocks send while isTranscribing is true, users can be unable to send a follow-up message until canceled in-flight requests finish, even though their results are intentionally ignored. Reset or decouple transcribing UI state for canceled generations so cancellation immediately unblocks input.
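A generation-counter approach along the lines the review describes can be sketched as follows. This is a hypothetical model of the state, not the hook's actual code:

```typescript
// Hypothetical sketch: in-flight transcription chunks carry the
// generation they were started under; cancel() bumps the generation so
// stale results are ignored, and clears the transcribing flag
// immediately so the input is unblocked without waiting for them.
class DictationState {
  private generation = 0;
  private pending = 0;
  isTranscribing = false;

  startChunk(): number {
    this.pending += 1;
    this.isTranscribing = true;
    return this.generation;
  }

  finishChunk(gen: number): boolean {
    if (gen !== this.generation) return false; // stale: result ignored
    this.pending -= 1;
    if (this.pending === 0) this.isTranscribing = false;
    return true;
  }

  cancel(): void {
    this.generation += 1; // invalidate all in-flight chunks
    this.pending = 0;
    this.isTranscribing = false; // unblock input right away
  }
}
```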

