feat: wire up voice dictation in goose2 via ACP #8565

Open
tulsi-builder wants to merge 1 commit into main from tulsi/voice-input

Conversation

@tulsi-builder
Collaborator

Overview

Category: new-feature
User Impact: Users can now dictate messages using their microphone in the Goose2 desktop app, with support for OpenAI Whisper, Groq, and ElevenLabs transcription providers.

Problem: Goose2 had voice dictation building blocks (hooks, VAD, settings UI) sitting unused in the codebase. The mic button showed "coming soon" and the backend commands couldn't compile because they imported the goose crate directly instead of going through ACP.

Solution: Exposed dictation as ACP custom methods (_goose/dictation/transcribe and _goose/dictation/config) following the same pattern as existing methods like _goose/session/export. Rewrote the Tauri commands to use call_ext_method, wired the frontend hooks into ChatInput, and added a Voice settings page.

Changes

File changes

crates/goose-sdk/src/custom_requests.rs
Added DictationTranscribeRequest/Response, DictationConfigRequest/Response, and DictationProviderStatusEntry types for the ACP custom method protocol.

crates/goose-acp/src/server.rs
Added #[custom_method] handlers for on_dictation_transcribe (routes to OpenAI/Groq/ElevenLabs/Local providers) and on_dictation_config (returns provider statuses with model metadata).

crates/goose-acp/acp-meta.json
Registered the two new dictation methods in the ACP method registry.

crates/goose-acp/Cargo.toml
Added local-inference feature flag and base64 dependency for the transcription handler.

crates/goose-cli/Cargo.toml
Forwarded local-inference feature to goose-acp so local Whisper code paths compile into the binary.

ui/goose2/src-tauri/src/commands/dictation.rs
New file. Two Tauri commands (transcribe_dictation, get_dictation_config) that proxy through GooseAcpManager::call_ext() to ACP.

ui/goose2/src-tauri/src/services/acp/manager.rs
Added generic CallExt command variant and call_ext() public method on GooseAcpManager. Added normalize_ext_method_name() to strip leading underscores (the ACP protocol auto-prefixes _). Includes regression test.

ui/goose2/src-tauri/src/services/acp/manager/command_dispatch.rs
Added match arm for ManagerCommand::CallExt dispatch.

ui/goose2/src/features/chat/ui/ChatInput.tsx
Wired up useDictationRecorder + useVoiceInputPreferences. Handles transcription text insertion, auto-submit on keyword, stops recording on manual send, shows "Listening..."/"Transcribing..." placeholder.

ui/goose2/src/features/chat/ui/ChatInputToolbar.tsx
Replaced disabled "coming soon" mic button with working toggle. Shows recording (red) and transcribing (pulse) states.

ui/goose2/src/features/settings/ui/SettingsModal.tsx
Added Voice nav item with Mic icon, renders VoiceInputSettings.

ui/goose2/src/features/settings/ui/VoiceInputSettings.tsx
New file. Voice settings page with provider selection, API key management, microphone picker, and auto-submit phrase configuration.

ui/goose2/src/features/chat/lib/dictationVad.ts
Fixed return type annotation on advanceVadState (was inferring string instead of VadPhase).

ui/goose2/src/shared/i18n/locales/{en,es}/chat.json
Added voice toolbar strings (recording, transcribing, disabled tooltip).

ui/goose2/src/shared/i18n/locales/{en,es}/settings.json
Added voice settings strings (provider names, API key labels, mic labels, auto-submit labels, local model unavailable message).

ui/goose2/src-tauri/Info.plist
New file. macOS microphone usage description for permission prompt.
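The method-name normalization described for manager.rs can be sketched as follows. This is an illustrative TypeScript rendering of the logic, not the actual Rust `normalize_ext_method_name` implementation: since the ACP protocol auto-prefixes custom method names with `_`, leading underscores are stripped from caller input to avoid a double-prefixed `__goose/...` name on the wire.

```typescript
// Illustrative sketch (the real implementation is Rust's
// normalize_ext_method_name in manager.rs): strip any leading
// underscores so the ACP layer's automatic "_" prefix is applied
// exactly once, regardless of how the caller spelled the method.
function normalizeExtMethodName(method: string): string {
  return method.replace(/^_+/, "");
}
```

With this in place, callers may pass either `_goose/dictation/transcribe` or `goose/dictation/transcribe` and get the same method on the wire.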

Reproduction Steps

  1. Build the goose binary: cargo build --release -p goose-cli
  2. Launch goose2: GOOSE_BIN=./target/release/goose pnpm tauri dev from ui/goose2/
  3. Open Settings → Voice → select OpenAI Whisper (should show as configured if you have an OpenAI API key)
  4. Close settings, click the mic button in the chat toolbar
  5. Speak — text should appear in the input after a brief delay
  6. Say your auto-submit keyword (default: "submit") — message sends and mic turns off
  7. While recording, click send or mic button — recording stops

Known Issues

  • Local Whisper shows "not configured" even when a model is downloaded and config is set. The is_downloaded() path check needs investigation — likely a path resolution mismatch between config and data directories.
  • Keychain popup on first launch when the backend checks API key status. Goes away after clicking "Always Allow".
  • Local model download not available from UI — model management ACP methods not yet implemented. Users can download models via the Goose CLI as a workaround.


@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b085207aea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


setError(null);
try {
  await saveDictationProviderSecret(selectedProvider, apiKeyInput);

P1: Register missing Tauri commands for voice settings

VoiceInputSettings now invokes saveDictationProviderSecret, deleteDictationProviderSecret, and saveDictationModelSelection, but this commit only wires get_dictation_config and transcribe_dictation into the Tauri invoke_handler (ui/goose2/src-tauri/src/lib.rs lines 111-112). As a result, saving/removing API keys or changing models from the new Voice settings screen will fail at runtime with command ... not found, so core settings actions are non-functional.


Comment on lines +183 to +184
const merged = appendTranscribedText(text, fragment);
setText(merged);

P2: Merge dictation chunks with functional text updates

Dictation responses can arrive concurrently because recording flushes chunks with void transcribeChunk(...), but handleTranscription appends each fragment using the closure-captured text value. When multiple transcriptions resolve before React re-renders, later callbacks can overwrite earlier updates and drop dictated words (and potentially interfere with auto-submit matching). Applying fragments via functional state updates against the latest text avoids this race.
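The fix suggested here is the standard React functional-update pattern. A sketch, where `appendTranscribedText` is a stand-in for the real helper rather than its actual implementation:

```typescript
// Stand-in for the real appendTranscribedText helper: joins a fragment
// onto the existing text with a single space, ignoring empty fragments.
function appendTranscribedText(existing: string, fragment: string): string {
  const trimmed = fragment.trim();
  if (!trimmed) return existing;
  return existing ? `${existing} ${trimmed}` : trimmed;
}

// Problematic: captures a stale `text` from the closure, so concurrent
// transcription callbacks can overwrite each other's updates.
//   setText(appendTranscribedText(text, fragment));
//
// Suggested: the updater form always receives the latest state, so
// concurrently resolving fragments compose instead of clobbering.
//   setText((prev) => appendTranscribedText(prev, fragment));
```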


@tulsi-builder force-pushed the tulsi/voice-input branch 3 times, most recently from df1b1c0 to 77ce67b on April 15, 2026 at 21:17

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: 77ce67bece


Comment thread: ui/goose2/src/features/chat/hooks/useDictationRecorder.ts

Commit message:
Add voice dictation support to the goose2 Tauri app by exposing
transcription and config as ACP custom methods, then wiring the
frontend to use them.

Backend (crates/):
- Add DictationTranscribeRequest/Response and DictationConfigRequest/Response
  types to goose-sdk custom_requests.rs with model metadata fields
- Add #[custom_method] handlers in goose-acp server.rs for transcribe
  (OpenAI, Groq, ElevenLabs, Local) and config
- Register methods in acp-meta.json
- Forward local-inference feature from goose-cli to goose-acp

Tauri (ui/goose2/src-tauri/):
- Rewrite dictation.rs to use call_ext_method via ACP instead of
  importing goose crate directly
- Add generic CallExt command to ACP manager with method name
  normalization (strips leading _ to avoid double-prefix)
- Register get_dictation_config and transcribe_dictation commands

Frontend (ui/goose2/src/):
- Wire useDictationRecorder + useVoiceInputPreferences into ChatInput
- Replace placeholder mic button with working toggle (recording/
  transcribing states, auto-submit on keyword)
- Stop recording on manual send and on auto-submit keyword
- Show "Listening..."/"Transcribing..." placeholder in textarea
- Add Voice section to SettingsModal with VoiceInputSettings
- Add all voice i18n strings (en + es)
- Fix pre-existing type errors in dictationVad.ts and VoiceInputSettings

Known issue: Local Whisper reports configured: false despite model being
downloaded and config set. The is_downloaded() path check needs
investigation in a follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: bce05bf054


Comment on lines +79 to +83
onSend(
  merged.trim(),
  selectedPersonaId ?? undefined,
  attachments.length > 0 ? attachments : undefined,
);

P1: Respect send guards before auto-submitting dictation

Auto-submit calls onSend(...) directly without checking the same guard conditions used by manual send (canSend in ChatInput, which blocks when a queued message exists or input is disabled). In the busy/queued state, this can bypass the queue protection and trigger another send while a message is already queued, which risks out-of-order or dropped user messages in ChatView's busy-path logic. Add the same send preconditions (or pass an explicit canSend predicate) before invoking onSend from dictation.
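One way to apply this suggestion is to route both manual and auto-submit sends through a single predicate. The names below are assumptions for illustration; the actual guard is the `canSend` logic inside ChatInput:

```typescript
// Hypothetical shape of the send preconditions ChatInput already
// enforces for manual send; auto-submit should check the same thing
// before invoking onSend.
interface SendState {
  disabled: boolean;          // input disabled (e.g. agent busy)
  hasQueuedMessage: boolean;  // a message is already queued
  isTranscribing: boolean;    // transcription still in flight
}

function canSend(state: SendState, text: string): boolean {
  return (
    !state.disabled &&
    !state.hasQueuedMessage &&
    !state.isTranscribing &&
    text.trim().length > 0
  );
}
```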


Comment on lines +174 to +177
} else if (!flushPending) {
  samplesRef.current = [];
  generationRef.current += 1;
}

P2: Clear transcribing state when canceling dictation

When stopRecording({ flushPending: false }) is used (e.g., after auto-submit), this path invalidates generation but leaves pendingTranscriptionsRef/isTranscribing untouched. Because ChatInput blocks send while isTranscribing is true, users can be unable to send a follow-up message until canceled in-flight requests finish, even though their results are intentionally ignored. Reset or decouple transcribing UI state for canceled generations so cancellation immediately unblocks input.
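A generation-counter approach along the lines the review describes can be sketched as follows. This is a hypothetical model of the state, not the hook's actual code:

```typescript
// Hypothetical sketch: in-flight transcription chunks carry the
// generation they were started under; cancel() bumps the generation so
// stale results are ignored, and clears the transcribing flag
// immediately so the input is unblocked without waiting for them.
class DictationState {
  private generation = 0;
  private pending = 0;
  isTranscribing = false;

  startChunk(): number {
    this.pending += 1;
    this.isTranscribing = true;
    return this.generation;
  }

  finishChunk(gen: number): boolean {
    if (gen !== this.generation) return false; // stale: result ignored
    this.pending -= 1;
    if (this.pending === 0) this.isTranscribing = false;
    return true;
  }

  cancel(): void {
    this.generation += 1; // invalidate all in-flight chunks
    this.pending = 0;
    this.isTranscribing = false; // unblock input right away
  }
}
```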

