306 changes: 306 additions & 0 deletions client/src/pages/Conversation/hooks/useVoiceConversion.ts
@@ -0,0 +1,306 @@
/**
* Voice Conversion Hook
*
* Provides real-time voice conversion functionality using StreamVC-style
* architecture. Connects to the voice conversion WebSocket endpoint and
* streams audio for conversion.
*
* Usage:
* const { start, stop, status, setTargetVoice } = useVoiceConversion({
* onConvertedAudio: (audio) => playAudio(audio),
* });
*/

import { useState, useCallback, useRef, useEffect } from "react";
import Recorder from "opus-recorder";

export type VCStatus =
| "idle"
| "connecting"
| "awaiting_reference"
| "reference_ready"
| "converting"
| "error"
| "disconnected";

export interface VoiceConversionOptions {
/** Callback when converted audio is received */
onConvertedAudio?: (audioData: ArrayBuffer) => void;
/** Callback when status changes */
onStatusChange?: (status: VCStatus) => void;
/** Callback on error */
onError?: (error: string) => void;
/** Server URL (defaults to current host with /api/vc path) */
serverUrl?: string;
/** Sample rate (default: 24000) */
sampleRate?: number;
}

export interface VoiceConversionResult {
/** Current status */
status: VCStatus;
/** Start voice conversion with optional target voice */
start: (options?: { voice?: string; referenceMode?: boolean }) => Promise<void>;
/** Stop voice conversion */
stop: () => void;
/** Set target voice by name (requires voice embeddings on server) */
setTargetVoice: (voiceName: string) => void;
/** Send reference audio for voice cloning */
sendReferenceAudio: (audioData: ArrayBuffer) => void;
/** Signal end of reference audio collection */
endReferenceCollection: () => void;
/** Whether currently recording/converting */
isActive: boolean;
/** Available voices (fetched from server) */
availableVoices: string[];
/** Fetch available voices from server */
fetchVoices: () => Promise<string[]>;
}

export const useVoiceConversion = (
options: VoiceConversionOptions = {}
): VoiceConversionResult => {
const {
onConvertedAudio,
onStatusChange,
onError,
serverUrl,
sampleRate = 24000,
} = options;

const [status, setStatus] = useState<VCStatus>("idle");
const [availableVoices, setAvailableVoices] = useState<string[]>([]);
const [isActive, setIsActive] = useState(false);

const socketRef = useRef<WebSocket | null>(null);
const recorderRef = useRef<Recorder | null>(null);
const targetVoiceRef = useRef<string | null>(null);

// Update status and notify
const updateStatus = useCallback(
(newStatus: VCStatus) => {
setStatus(newStatus);
onStatusChange?.(newStatus);
},
[onStatusChange]
);

// Get WebSocket URL
const getWsUrl = useCallback(
(voice?: string, referenceMode?: boolean) => {
const base =
serverUrl ||
`${window.location.protocol === "https:" ? "wss:" : "ws:"}//${window.location.host}/api/vc`;

const params = new URLSearchParams();
if (voice) params.set("voice", voice);
if (referenceMode) params.set("reference_mode", "true");

const queryString = params.toString();
return queryString ? `${base}?${queryString}` : base;
},
[serverUrl]
);

// Fetch available voices from server
const fetchVoices = useCallback(async (): Promise<string[]> => {
try {
const baseUrl = serverUrl?.replace(/^wss?:/, "http") || "";
const url = baseUrl
? `${baseUrl.replace("/api/vc", "")}/api/vc/voices`
: "/api/vc/voices";

const response = await fetch(url);
if (!response.ok) {
throw new Error(`Failed to fetch voices: ${response.statusText}`);
}

const data = await response.json();
const voices = data.voices || [];
setAvailableVoices(voices);
return voices;
} catch (error) {
console.error("Failed to fetch voices:", error);
return [];
}
}, [serverUrl]);
Comment on lines +106 to +126

⚠️ Potential issue | 🟡 Minor

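Fragile URL construction breaks for ws:/wss: serverUrls and custom paths.

The URL manipulation at Lines 108-111 has edge cases:

  1. The regex ^wss?: consumes the trailing colon and replaces it with "http", so ws://host becomes http//host (no colon), which fetch treats as a relative path. wss: is also never mapped to https:. An https: serverUrl does not match the regex and passes through unchanged.
  2. If serverUrl doesn't contain /api/vc, the replace is a no-op and /api/vc/voices is appended to whatever path is already there.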
🔧 Proposed fix
 const fetchVoices = useCallback(async (): Promise<string[]> => {
   try {
-    const baseUrl = serverUrl?.replace(/^wss?:/, "http") || "";
-    const url = baseUrl
-      ? `${baseUrl.replace("/api/vc", "")}/api/vc/voices`
-      : "/api/vc/voices";
+    let url = "/api/vc/voices";
+    if (serverUrl) {
+      const wsUrl = new URL(serverUrl, window.location.href);
+      wsUrl.protocol = wsUrl.protocol.replace(/^ws/, "http");
+      wsUrl.pathname = "/api/vc/voices";
+      url = wsUrl.toString();
+    }

     const response = await fetch(url);
🤖 Prompt for AI Agents
In `@client/src/pages/Conversation/hooks/useVoiceConversion.ts` around lines 106 -
126, In useVoiceConversion's fetchVoices, the URL assembly is fragile: parse
serverUrl with the URL constructor (if present), map ws->http and wss->https by
checking protocol, then build the voices endpoint from urlObj.origin + (optional
basePath if your server uses a path prefix) or simply urlObj.origin +
'/api/vc/voices' so you don't rely on string replace of "/api/vc"; fall back to
a relative "/api/vc/voices" when serverUrl is falsy; update variables
baseUrl/url and keep setAvailableVoices(voices) and return semantics unchanged.
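
A minimal sketch of the URL-based approach described above, pulled out as a pure function so it can be exercised outside the hook. The name `voicesUrlFrom` and the `pageHref` parameter are hypothetical (not part of the PR), and it assumes the voices endpoint always lives at `/api/vc/voices` on the same origin as the WebSocket server.

```typescript
// Hypothetical helper, not part of the PR: derives the HTTP(S) voices
// endpoint from an optional WebSocket server URL.
function voicesUrlFrom(serverUrl: string | undefined, pageHref: string): string {
  // No serverUrl: fall back to a relative path, as the original code does.
  if (!serverUrl) return "/api/vc/voices";

  // Resolve against the page URL so relative serverUrls also parse.
  const url = new URL(serverUrl, pageHref);

  // Map ws: -> http: and wss: -> https:; http(s) URLs pass through unchanged.
  if (url.protocol === "ws:") url.protocol = "http:";
  else if (url.protocol === "wss:") url.protocol = "https:";

  // Assumption: the endpoint path is fixed, so any existing path,
  // query string, or fragment on serverUrl is discarded.
  url.pathname = "/api/vc/voices";
  url.search = "";
  url.hash = "";
  return url.toString();
}
```

Inside `fetchVoices` this could be called as `voicesUrlFrom(serverUrl, window.location.href)`. If the server is mounted under a path prefix, the fixed-path assumption would need revisiting.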


// Set target voice
const setTargetVoice = useCallback((voiceName: string) => {
targetVoiceRef.current = voiceName;
}, []);

// Send reference audio
const sendReferenceAudio = useCallback((audioData: ArrayBuffer) => {
if (socketRef.current && socketRef.current.readyState === WebSocket.OPEN) {
// Prepend message type byte (0x01 for audio)
const message = new Uint8Array(audioData.byteLength + 1);
message[0] = 0x01;
message.set(new Uint8Array(audioData), 1);
socketRef.current.send(message.buffer);
}
}, []);

// Signal end of reference collection
const endReferenceCollection = useCallback(() => {
if (socketRef.current && socketRef.current.readyState === WebSocket.OPEN) {
// Send control message (0x03) with "end_reference"
const encoder = new TextEncoder();
const text = encoder.encode("end_reference");
const message = new Uint8Array(text.length + 1);
message[0] = 0x03;
message.set(text, 1);
socketRef.current.send(message.buffer);
}
}, []);

// Start voice conversion
const start = useCallback(
async (startOptions?: { voice?: string; referenceMode?: boolean }) => {
const voice = startOptions?.voice || targetVoiceRef.current || undefined;
const referenceMode = startOptions?.referenceMode || false;

try {
updateStatus("connecting");

// Create WebSocket connection
const wsUrl = getWsUrl(voice, referenceMode);
const ws = new WebSocket(wsUrl);
ws.binaryType = "arraybuffer";

ws.onopen = () => {
console.log("Voice conversion WebSocket connected");
};

ws.onmessage = (event) => {
const data = new Uint8Array(event.data);
if (data.length === 0) return;

const messageType = data[0];
const payload = data.slice(1);

switch (messageType) {
case 0x00: // Handshake
console.log("Voice conversion handshake received");
updateStatus(referenceMode ? "awaiting_reference" : "converting");
break;

case 0x01: // Audio
onConvertedAudio?.(payload.buffer);
break;

case 0x03: // Control
const controlMsg = new TextDecoder().decode(payload);
if (controlMsg === "awaiting_reference") {
updateStatus("awaiting_reference");
} else if (controlMsg === "reference_ready") {
updateStatus("converting");
}
break;

case 0x05: // Error
const errorMsg = new TextDecoder().decode(payload);
onError?.(errorMsg);
updateStatus("error");
break;
}
Comment on lines +182 to +206

⚠️ Potential issue | 🟡 Minor

Wrap switch case declarations in blocks to prevent scope leakage.

The controlMsg (Line 193) and errorMsg (Line 202) declarations inside switch cases without blocks can be accessed by other cases, potentially causing temporal dead zone errors.

🔧 Proposed fix
           case 0x03: // Control
-            const controlMsg = new TextDecoder().decode(payload);
-            if (controlMsg === "awaiting_reference") {
-              updateStatus("awaiting_reference");
-            } else if (controlMsg === "reference_ready") {
-              updateStatus("converting");
-            }
+            {
+              const controlMsg = new TextDecoder().decode(payload);
+              if (controlMsg === "awaiting_reference") {
+                updateStatus("awaiting_reference");
+              } else if (controlMsg === "reference_ready") {
+                updateStatus("converting");
+              }
+            }
             break;

           case 0x05: // Error
-            const errorMsg = new TextDecoder().decode(payload);
-            onError?.(errorMsg);
-            updateStatus("error");
+            {
+              const errorMsg = new TextDecoder().decode(payload);
+              onError?.(errorMsg);
+              updateStatus("error");
+            }
             break;
Suggested change
 switch (messageType) {
   case 0x00: // Handshake
     console.log("Voice conversion handshake received");
     updateStatus(referenceMode ? "awaiting_reference" : "converting");
     break;
   case 0x01: // Audio
     onConvertedAudio?.(payload.buffer);
     break;
   case 0x03: // Control
-    const controlMsg = new TextDecoder().decode(payload);
-    if (controlMsg === "awaiting_reference") {
-      updateStatus("awaiting_reference");
-    } else if (controlMsg === "reference_ready") {
-      updateStatus("converting");
-    }
+    {
+      const controlMsg = new TextDecoder().decode(payload);
+      if (controlMsg === "awaiting_reference") {
+        updateStatus("awaiting_reference");
+      } else if (controlMsg === "reference_ready") {
+        updateStatus("converting");
+      }
+    }
     break;
   case 0x05: // Error
-    const errorMsg = new TextDecoder().decode(payload);
-    onError?.(errorMsg);
-    updateStatus("error");
+    {
+      const errorMsg = new TextDecoder().decode(payload);
+      onError?.(errorMsg);
+      updateStatus("error");
+    }
     break;
 }
🧰 Tools
🪛 Biome (2.3.13)

[error] 193-193: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 202-202: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🤖 Prompt for AI Agents
In `@client/src/pages/Conversation/hooks/useVoiceConversion.ts` around lines 182 -
206, The switch cases in useVoiceConversion.ts leak block-scoped variables
(controlMsg, errorMsg) across cases; wrap each case body in its own block (e.g.,
case 0x03: { ... break; } and case 0x05: { ... break; }) or otherwise ensure
const declarations are scoped locally so controlMsg and errorMsg cannot be
accessed by other cases—keep existing calls to updateStatus, onConvertedAudio,
and onError unchanged and just add braces around the 0x03 and 0x05 case bodies.
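
To illustrate the block-scoping point together with the one-byte type prefix the hook uses, here is a small sketch of a frame encoder and decoder. The type byte values (0x00, 0x01, 0x03, 0x05) come from the code under review; the names `encodeFrame`, `decodeFrame`, and `VCMessage` are hypothetical and not part of the PR.

```typescript
// Message type bytes as used in the hook under review.
const MSG_HANDSHAKE = 0x00;
const MSG_AUDIO = 0x01;
const MSG_CONTROL = 0x03;
const MSG_ERROR = 0x05;

// Hypothetical discriminated union for decoded frames (not part of the PR).
type VCMessage =
  | { kind: "handshake" }
  | { kind: "audio"; payload: Uint8Array }
  | { kind: "control"; text: string }
  | { kind: "error"; text: string }
  | { kind: "unknown"; type: number };

// Prepend the one-byte message type to a payload.
function encodeFrame(type: number, payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(payload.length + 1);
  frame[0] = type;
  frame.set(payload, 1);
  return frame;
}

// Decode a frame; returns null for an empty frame.
function decodeFrame(data: Uint8Array): VCMessage | null {
  if (data.length === 0) return null;
  const type = data[0];
  // slice() copies, so payload.buffer holds only the payload bytes.
  const payload = data.slice(1);

  switch (type) {
    case MSG_HANDSHAKE:
      return { kind: "handshake" };
    case MSG_AUDIO:
      return { kind: "audio", payload };
    case MSG_CONTROL: {
      // Braces keep `text` local to this clause.
      const text = new TextDecoder().decode(payload);
      return { kind: "control", text };
    }
    case MSG_ERROR: {
      // A second `text` is legal here because each clause has its own block.
      const text = new TextDecoder().decode(payload);
      return { kind: "error", text };
    }
    default:
      return { kind: "unknown", type };
  }
}
```

Without the braces, the two `const text` declarations would share the switch body's scope and fail to compile as a redeclaration, which is the same scoping behavior the lint rule guards against.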

};

ws.onclose = () => {
console.log("Voice conversion WebSocket closed");
updateStatus("disconnected");
setIsActive(false);
};

ws.onerror = (error) => {
console.error("Voice conversion WebSocket error:", error);
onError?.("WebSocket connection error");
updateStatus("error");
};

socketRef.current = ws;

// Start microphone recording
const encoderPath = new URL(
"opus-recorder/dist/encoderWorker.min.js",
import.meta.url
).href;

const recorder = new Recorder({
encoderPath,
encoderSampleRate: sampleRate,
encoderFrameSize: 20, // 20ms frames
maxFramesPerPage: 2, // 40ms packets
numberOfChannels: 1,
encoderApplication: 2049, // VOIP mode
encoderComplexity: 0, // Low CPU
});

recorder.ondataavailable = (opusData: ArrayBuffer) => {
if (
socketRef.current &&
socketRef.current.readyState === WebSocket.OPEN &&
status === "converting"
) {
Comment on lines +239 to +244

P1: Avoid stale status gating that blocks audio send

The recorder.ondataavailable callback closes over the status value captured when start() was invoked. After the server handshake updates state to "converting", this callback still sees the old value (e.g., "idle"), so the status === "converting" guard stays false and no audio is ever sent. This means the voice conversion stream never receives microphone data unless start() is called again. Consider storing status in a ref or removing the status gate so the callback sees the latest state.

// Send audio with message type byte
const message = new Uint8Array(opusData.byteLength + 1);
message[0] = 0x01;
message.set(new Uint8Array(opusData), 1);
socketRef.current.send(message.buffer);
}
};
Comment on lines +239 to +251

⚠️ Potential issue | 🔴 Critical

Stale closure bug: status check will always fail.

The ondataavailable callback captures status from the closure when start() is called. That value is fixed for the render that created start (typically "idle"); calling updateStatus("connecting") schedules a re-render but does not change the captured constant. Since the callback isn't replaced when React state changes, the condition status === "converting" stays false and audio is never sent.

Use a ref to track the current status instead:

🐛 Proposed fix

Add a ref to track current status:

 const socketRef = useRef<WebSocket | null>(null);
 const recorderRef = useRef<Recorder | null>(null);
 const targetVoiceRef = useRef<string | null>(null);
+const statusRef = useRef<VCStatus>("idle");

 // Update status and notify
 const updateStatus = useCallback(
   (newStatus: VCStatus) => {
     setStatus(newStatus);
+    statusRef.current = newStatus;
     onStatusChange?.(newStatus);
   },
   [onStatusChange]
 );

Then in the callback:

       recorder.ondataavailable = (opusData: ArrayBuffer) => {
         if (
           socketRef.current &&
           socketRef.current.readyState === WebSocket.OPEN &&
-          status === "converting"
+          statusRef.current === "converting"
         ) {
🤖 Prompt for AI Agents
In `@client/src/pages/Conversation/hooks/useVoiceConversion.ts` around lines 239 -
251, The recorder.ondataavailable handler is using the stale React state
variable status (captured at start) so the check status === "converting" never
becomes true; create a mutable ref (e.g., statusRef) to mirror status, update
statusRef.current whenever status changes, and replace the closure check with
statusRef.current === "converting" (keeping existing socketRef and
recorder.ondataavailable logic) so the callback sees the latest state and will
send audio when converting.
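
The difference between the two guards can be reproduced without React. The sketch below is a simplified stand-in, not the hook itself: `createHandlers` and the local `Ref` interface are hypothetical names, with `Ref` mirroring the `{ current }` shape that `useRef` returns.

```typescript
type VCStatus = "idle" | "connecting" | "converting";

// Same shape as the object React's useRef returns.
interface Ref<T> {
  current: T;
}

// Hypothetical stand-in for one render of the hook: `status` is that
// render's constant snapshot, `statusRef` is a shared mutable container.
function createHandlers(status: VCStatus, statusRef: Ref<VCStatus>) {
  return {
    // Mirrors the original guard: compares the snapshot captured at creation.
    shouldSendViaState: (): boolean => status === "converting",
    // Mirrors the proposed fix: reads the current value on every call.
    shouldSendViaRef: (): boolean => statusRef.current === "converting",
  };
}

// Handlers are created while the status is still "idle"...
const statusRef: Ref<VCStatus> = { current: "idle" };
const handlers = createHandlers("idle", statusRef);

// ...and the handshake later moves the status to "converting".
statusRef.current = "converting";

console.log(handlers.shouldSendViaState()); // false: snapshot is still "idle"
console.log(handlers.shouldSendViaRef()); // true: ref reflects the update
```

The ref works because the handler holds a reference to the container rather than a copy of the value, so later writes are visible on the next read.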


await recorder.start();
recorderRef.current = recorder;
setIsActive(true);

console.log("Voice conversion started");
} catch (error) {
console.error("Failed to start voice conversion:", error);
onError?.(error instanceof Error ? error.message : "Unknown error");
updateStatus("error");
}
},
[getWsUrl, onConvertedAudio, onError, sampleRate, status, updateStatus]
);

// Stop voice conversion
const stop = useCallback(() => {
// Stop recorder
if (recorderRef.current) {
recorderRef.current.stop();
recorderRef.current = null;
}

// Close WebSocket
if (socketRef.current) {
socketRef.current.close();
socketRef.current = null;
}

setIsActive(false);
updateStatus("idle");
console.log("Voice conversion stopped");
}, [updateStatus]);

// Cleanup on unmount
useEffect(() => {
return () => {
stop();
};
}, [stop]);

return {
status,
start,
stop,
setTargetVoice,
sendReferenceAudio,
endReferenceCollection,
isActive,
availableVoices,
fetchVoices,
};
};

export default useVoiceConversion;