feat: wire up voice dictation in goose2 via ACP #8565
tulsi-builder wants to merge 1 commit into main
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b085207aea
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
setError(null);
try {
  await saveDictationProviderSecret(selectedProvider, apiKeyInput);
Register missing Tauri commands for voice settings
VoiceInputSettings now invokes saveDictationProviderSecret, deleteDictationProviderSecret, and saveDictationModelSelection, but this commit only wires get_dictation_config and transcribe_dictation into the Tauri invoke_handler (ui/goose2/src-tauri/src/lib.rs lines 111-112). As a result, saving/removing API keys or changing models from the new Voice settings screen will fail at runtime with command ... not found, so core settings actions are non-functional.
const merged = appendTranscribedText(text, fragment);
setText(merged);
Merge dictation chunks with functional text updates
Dictation responses can arrive concurrently because recording flushes chunks with void transcribeChunk(...), but handleTranscription appends each fragment using the closure-captured text value. When multiple transcriptions resolve before React re-renders, later callbacks can overwrite earlier updates and drop dictated words (and potentially interfere with auto-submit matching). Applying fragments via functional state updates against the latest text avoids this race.
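A minimal sketch of the race and the suggested fix, written without React so it runs on its own. `createState` is a hypothetical stand-in for a `useState` setter that accepts either a value or an updater function, and `appendFragment` is an assumed simplification of `appendTranscribedText`; the real helper may behave differently.

```typescript
// Stand-in for a React state setter: accepts a value or an updater function.
type Updater<T> = T | ((prev: T) => T);

function createState<T>(initial: T) {
  let value = initial;
  return {
    get: (): T => value,
    set: (next: Updater<T>): void => {
      value =
        typeof next === "function"
          ? (next as (prev: T) => T)(value)
          : (next as T);
    },
  };
}

// Hypothetical simplification of appendTranscribedText (assumption, not the PR's code).
function appendFragment(base: string, fragment: string): string {
  const trimmed = fragment.trim();
  if (trimmed === "") return base;
  return base === "" ? trimmed : `${base} ${trimmed}`;
}

// Stale closure: both callbacks captured the same `text` from one render.
const stale = createState("");
const capturedText = stale.get();
stale.set(appendFragment(capturedText, "hello"));
stale.set(appendFragment(capturedText, "world")); // overwrites "hello"
const staleResult = stale.get(); // "world"

// Functional update: each callback appends to the latest value.
const functional = createState("");
functional.set((prev) => appendFragment(prev, "hello"));
functional.set((prev) => appendFragment(prev, "world"));
const functionalResult = functional.get(); // "hello world"
```

In the component this would likely correspond to calling `setText((prev) => appendTranscribedText(prev, fragment))`, with any auto-submit matching performed against the updated value rather than the captured one.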
df1b1c0 to 77ce67b
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77ce67bece
Add voice dictation support to the goose2 Tauri app by exposing transcription and config as ACP custom methods, then wiring the frontend to use them.

Backend (crates/):
- Add DictationTranscribeRequest/Response and DictationConfigRequest/Response types to goose-sdk custom_requests.rs with model metadata fields
- Add #[custom_method] handlers in goose-acp server.rs for transcribe (OpenAI, Groq, ElevenLabs, Local) and config
- Register methods in acp-meta.json
- Forward local-inference feature from goose-cli to goose-acp

Tauri (ui/goose2/src-tauri/):
- Rewrite dictation.rs to use call_ext_method via ACP instead of importing goose crate directly
- Add generic CallExt command to ACP manager with method name normalization (strips leading _ to avoid double-prefix)
- Register get_dictation_config and transcribe_dictation commands

Frontend (ui/goose2/src/):
- Wire useDictationRecorder + useVoiceInputPreferences into ChatInput
- Replace placeholder mic button with working toggle (recording/transcribing states, auto-submit on keyword)
- Stop recording on manual send and on auto-submit keyword
- Show "Listening..."/"Transcribing..." placeholder in textarea
- Add Voice section to SettingsModal with VoiceInputSettings
- Add all voice i18n strings (en + es)
- Fix pre-existing type errors in dictationVad.ts and VoiceInputSettings

Known issue: Local Whisper reports configured: false despite model being downloaded and config set. The is_downloaded() path check needs investigation in a follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
77ce67b to bce05bf
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bce05bf054
onSend(
  merged.trim(),
  selectedPersonaId ?? undefined,
  attachments.length > 0 ? attachments : undefined,
);
Respect send guards before auto-submitting dictation
Auto-submit calls onSend(...) directly without checking the same guard conditions used by manual send (canSend in ChatInput, which blocks when a queued message exists or input is disabled). In the busy/queued state, this can bypass the queue protection and trigger another send while a message is already queued, which risks out-of-order or dropped user messages in ChatView's busy-path logic. Add the same send preconditions (or pass an explicit canSend predicate) before invoking onSend from dictation.
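One way to share the guard between manual send and dictation auto-submit is sketched below. The `SendGuardInput` shape and the `autoSubmitDictation` helper are hypothetical; only the two blocking conditions (disabled input, queued message) come from the comment above, and the non-empty-text check is an added assumption.

```typescript
// Hypothetical guard inputs; the real ChatInput state is not shown in this review.
interface SendGuardInput {
  text: string;
  disabled: boolean;
  hasQueuedMessage: boolean;
}

// Single predicate intended to be used by both manual send and auto-submit.
function canSend({ text, disabled, hasQueuedMessage }: SendGuardInput): boolean {
  return !disabled && !hasQueuedMessage && text.trim().length > 0;
}

// Returns true only when the message was actually handed to onSend.
function autoSubmitDictation(
  state: SendGuardInput,
  onSend: (text: string) => void,
): boolean {
  if (!canSend(state)) return false;
  onSend(state.text.trim());
  return true;
}

const sent: string[] = [];
const record = (text: string): void => {
  sent.push(text);
};

// A queued message blocks auto-submit, matching the manual-send behavior.
const blocked = autoSubmitDictation(
  { text: "ship it", disabled: false, hasQueuedMessage: true },
  record,
);

// With no queued message and enabled input, the trimmed text is sent.
const allowed = autoSubmitDictation(
  { text: " ship it ", disabled: false, hasQueuedMessage: false },
  record,
);
```

Returning a boolean lets the caller decide whether to stop recording and clear the input only after a send has actually happened.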
} else if (!flushPending) {
  samplesRef.current = [];
  generationRef.current += 1;
}
Clear transcribing state when canceling dictation
When stopRecording({ flushPending: false }) is used (e.g., after auto-submit), this path invalidates generation but leaves pendingTranscriptionsRef/isTranscribing untouched. Because ChatInput blocks send while isTranscribing is true, users can be unable to send a follow-up message until canceled in-flight requests finish, even though their results are intentionally ignored. Reset or decouple transcribing UI state for canceled generations so cancellation immediately unblocks input.
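A rough sketch of decoupling the transcribing flag from canceled generations. `TranscriptionTracker` is a hypothetical class standing in for the hook's `generationRef` and `pendingTranscriptionsRef`; the actual hook internals are not shown here and may be structured differently.

```typescript
// Hypothetical tracker modeling a generation counter plus a pending-request count.
class TranscriptionTracker {
  private generation = 0;
  private pending = 0;

  isTranscribing(): boolean {
    return this.pending > 0;
  }

  // Start tracking a request; the returned callback is invoked when it settles.
  begin(): (text: string) => string | null {
    const startedIn = this.generation;
    this.pending += 1;
    return (text: string): string | null => {
      // Results from a canceled generation are ignored and must not touch
      // the pending count, which cancel() has already reset.
      if (startedIn !== this.generation) return null;
      this.pending -= 1;
      return text;
    };
  }

  // Cancel without flushing: invalidate in-flight work and unblock input now.
  cancel(): void {
    this.generation += 1;
    this.pending = 0;
  }
}

const tracker = new TranscriptionTracker();
const settleA = tracker.begin();
const settleB = tracker.begin();
const busyBeforeCancel = tracker.isTranscribing(); // true

tracker.cancel();
const busyAfterCancel = tracker.isTranscribing(); // false, input is unblocked

// Late responses from the canceled generation are dropped.
const lateA = settleA("stale words"); // null
const lateB = settleB("more stale words"); // null

// A request from the new generation is accepted as usual.
const settleC = tracker.begin();
const accepted = settleC("new words"); // "new words"
const busyAtEnd = tracker.isTranscribing(); // false
```

The key design choice is that each request remembers the generation it started in, so a late response can neither append text nor drive the pending count negative.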
Overview
Category: new-feature
User Impact: Users can now dictate messages using their microphone in the Goose2 desktop app, with support for OpenAI Whisper, Groq, and ElevenLabs transcription providers.
Problem: Goose2 had voice dictation building blocks (hooks, VAD, settings UI) sitting unused in the codebase. The mic button showed "coming soon" and the backend commands couldn't compile because they imported the goose crate directly instead of going through ACP.
Solution: Exposed dictation as ACP custom methods (_goose/dictation/transcribe and _goose/dictation/config) following the same pattern as existing methods like _goose/session/export. Rewrote the Tauri commands to use call_ext_method, wired the frontend hooks into ChatInput, and added a Voice settings page.
Changes
File changes
crates/goose-sdk/src/custom_requests.rs
Added DictationTranscribeRequest/Response, DictationConfigRequest/Response, and DictationProviderStatusEntry types for the ACP custom method protocol.
crates/goose-acp/src/server.rs
Added #[custom_method] handlers for on_dictation_transcribe (routes to OpenAI/Groq/ElevenLabs/Local providers) and on_dictation_config (returns provider statuses with model metadata).
crates/goose-acp/acp-meta.json
Registered the two new dictation methods in the ACP method registry.
crates/goose-acp/Cargo.toml
Added local-inference feature flag and base64 dependency for the transcription handler.
crates/goose-cli/Cargo.toml
Forwarded local-inference feature to goose-acp so local Whisper code paths compile into the binary.
ui/goose2/src-tauri/src/commands/dictation.rs
New file. Two Tauri commands (transcribe_dictation, get_dictation_config) that proxy through GooseAcpManager::call_ext() to ACP.
ui/goose2/src-tauri/src/services/acp/manager.rs
Added generic CallExt command variant and call_ext() public method on GooseAcpManager. Added normalize_ext_method_name() to strip leading underscores (the ACP protocol auto-prefixes _). Includes regression test.
ui/goose2/src-tauri/src/services/acp/manager/command_dispatch.rs
Added match arm for ManagerCommand::CallExt dispatch.
ui/goose2/src/features/chat/ui/ChatInput.tsx
Wired up useDictationRecorder + useVoiceInputPreferences. Handles transcription text insertion, auto-submit on keyword, stops recording on manual send, shows "Listening..."/"Transcribing..." placeholder.
ui/goose2/src/features/chat/ui/ChatInputToolbar.tsx
Replaced disabled "coming soon" mic button with working toggle. Shows recording (red) and transcribing (pulse) states.
ui/goose2/src/features/settings/ui/SettingsModal.tsx
Added Voice nav item with Mic icon, renders VoiceInputSettings.
ui/goose2/src/features/settings/ui/VoiceInputSettings.tsx
New file. Voice settings page with provider selection, API key management, microphone picker, and auto-submit phrase configuration.
ui/goose2/src/features/chat/lib/dictationVad.ts
Fixed return type annotation on advanceVadState (was inferring string instead of VadPhase).
ui/goose2/src/shared/i18n/locales/{en,es}/chat.json
Added voice toolbar strings (recording, transcribing, disabled tooltip).
ui/goose2/src/shared/i18n/locales/{en,es}/settings.json
Added voice settings strings (provider names, API key labels, mic labels, auto-submit labels, local model unavailable message).
ui/goose2/src-tauri/Info.plist
New file. macOS microphone usage description for permission prompt.
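The method-name normalization described for manager.rs can be illustrated as follows. This is a TypeScript sketch of the idea only, not the Rust implementation: `normalizeExtMethodName` mirrors the described behavior of normalize_ext_method_name(), and `toWireMethod` is a hypothetical stand-in for the protocol layer that adds the `_` prefix. Stripping every leading underscore (rather than just one) is an assumption.

```typescript
// Strip leading underscores so the protocol layer's own "_" prefix is not doubled.
function normalizeExtMethodName(method: string): string {
  return method.replace(/^_+/, "");
}

// Hypothetical stand-in for the layer that auto-prefixes "_" on extension methods.
function toWireMethod(method: string): string {
  return `_${normalizeExtMethodName(method)}`;
}

// Callers may pass either form; both end up with exactly one leading underscore.
const fromPrefixed = toWireMethod("_goose/dictation/config");
const fromBare = toWireMethod("goose/dictation/transcribe");
```

Without the normalization step, passing an already-prefixed name would produce `__goose/dictation/config`, which is the double-prefix case the regression test is described as covering.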
Reproduction Steps
- cargo build --release -p goose-cli
- GOOSE_BIN=./target/release/goose pnpm tauri dev from ui/goose2/
Known Issues
- Local Whisper reports configured: false despite the model being downloaded and config set. The is_downloaded() path check needs investigation, likely a path resolution mismatch between config and data directories.