
Fix typescript errors, Build errors and compatibility with latest el… #1

Open
Xayaan wants to merge 1 commit into elizaos-plugins:main from Xayaan:main

Conversation


@Xayaan Xayaan commented Jan 23, 2026

Fix typescript errors, Build errors and compatibility with latest elizaos version

Summary by CodeRabbit

  • New Features

    • Added text generation capabilities with support for small and large models
    • Added image description functionality for analyzing images from URLs
    • Added audio transcription support for converting speech to text
    • Added text-to-speech capability for generating audio from text
  • Chores

    • Renamed package from "@elizaos/plugin-local-embedding" to "@elizaos/plugin-local-ai"
  • Tests

    • Added comprehensive test coverage for all new features and core initialization workflows


Greptile Overview

Greptile Summary

This PR implements comprehensive TypeScript compatibility and build fixes for the local-ai plugin to work with the latest elizaos version. The changes add complete model handler implementations for text generation (small/large), embeddings, object generation, image description, transcription, and text-to-speech.

Key improvements:

  • Added lazy initialization for all model types to optimize resource usage
  • Implemented robust error handling with proper TypeScript types
  • Added platform detection for GPU capabilities (CUDA, Metal, DirectML)
  • Created download manager with progress tracking and fallback URLs
  • Added comprehensive test suite with proper mocking
  • Implemented environment configuration validation using Zod
  • Added TypeScript type definitions for external modules (whisper-node)

Architecture changes:

  • Refactored LocalAIManager to use singleton pattern with lazy initialization
  • Split functionality into specialized managers (VisionManager, TranscribeManager, TTSManager, DownloadManager, PlatformManager)
  • Added proper separation of concerns with utility modules
  • Implemented streaming support for TTS output

The implementation follows best practices with proper error handling, logging, and resource cleanup. All model operations now support the full elizaos model API specification.
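To make the pattern concrete, here is a minimal sketch of a promise-gated singleton with lazy model initialization, which is the shape the summary describes; the class, field, and method names below are illustrative and are not taken from the PR's actual src/index.ts.

  // Illustrative sketch only: a singleton manager that defers model loading
  // until the first request and guards against concurrent initialization.
  class LazyModelManager {
    private static instance: LazyModelManager | null = null;
    private initPromise: Promise<void> | null = null;
    private ready = false;

    static getInstance(): LazyModelManager {
      if (!LazyModelManager.instance) {
        LazyModelManager.instance = new LazyModelManager();
      }
      return LazyModelManager.instance;
    }

    private async lazyInit(): Promise<void> {
      if (this.ready) return;
      if (!this.initPromise) {
        this.initPromise = (async () => {
          // download + load the model here (omitted in this sketch)
          this.ready = true;
        })().catch((err) => {
          this.initPromise = null; // clear so a later call can retry
          throw err;
        });
      }
      return this.initPromise;
    }

    async generateText(prompt: string): Promise<string> {
      await this.lazyInit();
      return `generated for: ${prompt}`; // stand-in for a real llama.cpp call
    }
  }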

Confidence Score: 4/5

  • This PR is safe to merge with minor considerations for external dependencies
  • Score reflects comprehensive implementation with proper error handling, extensive test coverage, and good architectural patterns. Deducted 1 point due to reliance on external binary dependencies (FFmpeg, whisper models) that need to be installed separately and could cause runtime failures if not properly configured. The code quality is high with proper TypeScript types, singleton patterns, lazy initialization, and resource cleanup.
  • Pay attention to src/utils/transcribeManager.ts and src/utils/ttsManager.ts as they depend on external binaries (FFmpeg, whisper models) that must be installed separately. Ensure deployment documentation includes these dependencies.

Important Files Changed

Filename: Overview
package.json: Updated dependencies and added test scripts for compatibility with latest elizaos version
src/index.ts: Added comprehensive model handlers for TEXT_SMALL, TEXT_LARGE, TEXT_EMBEDDING, OBJECT_SMALL/LARGE, IMAGE_DESCRIPTION, TRANSCRIPTION, and TEXT_TO_SPEECH with proper lazy initialization, environment configuration, and error handling
src/environment.ts: Added configuration schema with Zod validation for model paths, cache/models directories, and embedding dimensions with sensible defaults
src/utils/downloadManager.ts: Implemented robust download manager with progress tracking, fallback URLs, atomic file operations, and duplicate download prevention
src/utils/platform.ts: Added platform detection for GPU capabilities (CUDA, Metal, DirectML) and system resource detection to optimize model selection
src/utils/visionManager.ts: Implemented vision model manager using Florence-2 with automatic model download, platform-specific optimizations (fp16/fp32), and image description capabilities
src/utils/transcribeManager.ts: Added audio transcription manager with FFmpeg validation, WAV conversion, whisper-node integration, and proper error handling for missing dependencies
src/utils/ttsManager.ts: Implemented text-to-speech manager using Transformers.js with SpeechT5, speaker embeddings, WAV header generation, and streaming output

Sequence Diagram

sequenceDiagram
    participant Runtime as IAgentRuntime
    participant Plugin as localAiPlugin
    participant Manager as LocalAIManager
    participant Env as Environment
    participant Platform as PlatformManager
    participant Download as DownloadManager
    participant Model as LlamaModel/Florence2/Whisper
    
    Runtime->>Plugin: init(config, runtime)
    Plugin->>Manager: initializeEnvironment()
    Manager->>Env: validateConfig()
    Env-->>Manager: Config (paths, model names)
    Manager->>Manager: _postValidateInit()
    Manager->>Download: getInstance(cacheDir, modelsDir)
    Manager->>Platform: checkPlatformCapabilities()
    Platform->>Platform: detectSystemCapabilities()
    Platform-->>Manager: GPU/CPU info
    Manager-->>Plugin: Environment ready
    
    Runtime->>Plugin: useModel(ModelType.TEXT_SMALL, params)
    Plugin->>Manager: generateText(params)
    Manager->>Manager: lazyInitSmallModel()
    Manager->>Download: downloadModel(ModelType.TEXT_SMALL)
    Download->>Download: Check if model exists
    alt Model not found
        Download->>Download: downloadFile(url, path)
        Download-->>Manager: Model downloaded
    end
    Manager->>Model: loadModel(modelPath)
    Model-->>Manager: Model instance
    Manager->>Model: createContext()
    Manager->>Model: prompt(text)
    Model-->>Manager: Generated text
    Manager-->>Plugin: Response text
    Plugin-->>Runtime: Result
    
    Runtime->>Plugin: useModel(ModelType.IMAGE_DESCRIPTION, imageUrl)
    Plugin->>Manager: describeImage(buffer, mimeType)
    Manager->>Manager: lazyInitVision()
    Manager->>Model: Florence2ForConditionalGeneration.from_pretrained()
    Manager->>Model: AutoProcessor.from_pretrained()
    Model-->>Manager: Vision model ready
    Manager->>Model: processImage(imageUrl)
    Model->>Model: generate(inputs)
    Model-->>Manager: Image description
    Manager-->>Plugin: {title, description}
    Plugin-->>Runtime: Result
    
    Runtime->>Plugin: useModel(ModelType.TRANSCRIPTION, audioBuffer)
    Plugin->>Manager: transcribeAudio(buffer)
    Manager->>Manager: lazyInitTranscription()
    Manager->>Manager: ensureFFmpeg()
    Manager->>Model: whisper(wavFile, options)
    Model-->>Manager: Transcription segments
    Manager-->>Plugin: Transcribed text
    Plugin-->>Runtime: Result



coderabbitai bot commented Jan 23, 2026

Walkthrough

This pull request rebrands the plugin from "local-embedding" to "local-ai" and introduces comprehensive multimodal AI capabilities including text generation, image description, audio transcription, and text-to-speech synthesis. It adds extensive test coverage via six new test suites and introduces three new manager utilities for specialized AI workflows.
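Roughly, the resulting plugin wires each model type to a handler that delegates to one of these managers. The sketch below shows only that shape; the actual Plugin interface, handler signatures, and ModelType keys come from @elizaos/core and are assumptions here, not the PR's real code.

  // Shape sketch only; real types and handler signatures live in `@elizaos/core`.
  type ModelHandler = (runtime: unknown, params: unknown) => Promise<unknown>;

  const manager = {
    generateText: async (prompt: string) => `text for: ${prompt}`,
    describeImage: async (url: string) => ({ title: 'Image', description: url }),
    transcribeAudio: async (_audio: Buffer) => 'transcribed text',
    generateSpeech: async (_text: string) => Buffer.alloc(0),
  };

  const localAiPluginSketch: { name: string; models: Record<string, ModelHandler> } = {
    name: 'local-ai',
    models: {
      TEXT_SMALL: async (_rt, params) =>
        manager.generateText((params as { prompt: string }).prompt),
      IMAGE_DESCRIPTION: async (_rt, params) => manager.describeImage(params as string),
      TRANSCRIPTION: async (_rt, params) => manager.transcribeAudio(params as Buffer),
      TEXT_TO_SPEECH: async (_rt, params) => manager.generateSpeech(params as string),
    },
  };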

Changes

Cohort / File(s) Summary
Test Suite
__tests__/initialization.test.ts, __tests__/image-desc.test.ts, __tests__/text-gen.test.ts, __tests__/text-transcribe.test.ts, __tests__/tts.test.ts
Introduces comprehensive Vitest-based test coverage for plugin initialization, image description, text generation, audio transcription, and text-to-speech; each suite includes environment setup, mocks for external dependencies (HuggingFace transformers, whisper-node, node-llama-cpp), and tests for success/failure scenarios.
Test Utilities
__tests__/test-utils.ts
Provides createMockRuntime() fixture with downloadModelMock and TEST_PATHS constants; implements detailed useModel branches for TEXT_SMALL, TEXT_LARGE, TRANSCRIPTION, IMAGE_DESCRIPTION, and TEXT_TO_SPEECH with file I/O, dynamic imports, and error handling.
Package Configuration
package.json
Updates package name to "@elizaos/plugin-local-ai", repository URL, and @elizaos/core dependency to workspace-based resolution.
Environment Configuration
src/environment.ts
Adds DEFAULT_SMALL_MODEL and DEFAULT_LARGE_MODEL filename constants; extends Zod config schema with LOCAL_SMALL_MODEL and LOCAL_LARGE_MODEL fields; refactors logging to use string concatenation instead of object literals.
Core Plugin Implementation
src/index.ts
Major rewrite introducing LocalGenerateTextParams interface and LocalAIManager methods (generateText, describeImage, transcribeAudio, generateSpeech); adds model handlers for TEXT_SMALL, TEXT_LARGE, OBJECT_SMALL, OBJECT_LARGE, IMAGE_DESCRIPTION, TRANSCRIPTION, TEXT_TO_SPEECH; implements lazy initialization, fresh chat contexts per request, and JSON extraction logic; expands plugin test scenarios.
Type Definitions
src/types/whisper-node.d.ts
Adds TypeScript ambient module declaration for whisper-node with WhisperOptions, WhisperConfig, TranscriptSegment interfaces and whisper function signature.
Vision Manager
src/utils/visionManager.ts
Introduces VisionManager singleton for Florence2-based image captioning with platform-aware configuration, automatic initialization, image fetching, processing pipeline, and returns {title, description} objects.
Transcription Manager
src/utils/transcribeManager.ts
Introduces TranscribeManager singleton for FFmpeg-backed audio preprocessing and whisper-node transcription; handles WAV format detection, codec validation, model initialization, temporary file cleanup, and returns {text} results.
Text-to-Speech Manager
src/utils/ttsManager.ts
Introduces TTSManager singleton for Transformers.js-based speech synthesis with WAV header utilities, model caching, speaker embedding support, and returns Readable audio streams.
Utility Refactoring
src/utils/downloadManager.ts, src/utils/platform.ts
Refactors logging from object literals to string concatenation for error messages; no functional changes to control flow or public APIs.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰✨ From embedding bounds we broke free,
Four tools of AI, harmoniously!
Images, voices, transcribed words sing—
A local oracle we now bring! 🎙️📸

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Docstring Coverage (⚠️ Warning): Docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
Title check (❓ Inconclusive): The title is vague and incomplete, using an ellipsis instead of fully stating the intended change. It mentions fixing typescript/build errors and compatibility but cuts off mid-sentence. Resolution: complete the title with the full intended message, e.g., 'Fix typescript errors, build errors and compatibility with latest elizaos version', or describe the primary change this PR introduces more specifically.
✅ Passed checks (1 passed)
Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.



@greptile-apps greptile-apps bot left a comment

No files reviewed, no comments

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 18

🤖 Fix all issues with AI agents
In `@__tests__/image-desc.test.ts`:
- Around line 122-127: The tests named "should describe image from URL
successfully" (and the similar failing cases) currently call out to external
URLs; replace those live network calls by mocking global.fetch (e.g.,
jest.spyOn(global, 'fetch') or a fetch-mock helper) to return deterministic
Response-like objects for success and failure scenarios, return controlled image
bytes/JSON for the success path and a deterministic error/HTTP status for the
failure path, update the assertions to assert the actual error or response
content instead of throwing "Should have failed", and ensure the mock is
restored/cleared after each test (use afterEach/restoreMocks) so tests are
deterministic and CI-friendly.

In `@__tests__/text-gen.test.ts`:
- Around line 79-95: The test "should attempt to download large model when using
TEXT_LARGE" wrongly passes context instead of prompt to mockRuntime.useModel and
is tightly coupled to the mock's canned response; update the call to pass
prompt: 'Debug Mode: Generate a one-sentence response about artificial
intelligence.' (replace the context property with prompt) and loosen the
assertion that checks the returned text: instead of asserting it contains the
exact phrase 'artificial intelligence' (which ties to createMockRuntime's canned
output), assert general properties like defined, string, length > 10, and
optionally that it contains at least one token/word from the prompt (or matches
a regex for letters) so the test no longer depends on
downloadModelMock/createMockRuntime internals; reference symbols:
mockRuntime.useModel, MODEL_SPECS, downloadModelMock, and createMockRuntime.
- Around line 62-77: The test passes a `context` field but the TEXT_SMALL
handler (in src/index.ts) expects `prompt` from GenerateTextParams, causing a
mismatch that hides a real bug; update this test (and any other tests in
__tests__/text-gen.test.ts using `context`) to pass `prompt` instead of
`context` when calling mockRuntime.useModel (and ensure the shape matches
GenerateTextParams), so the call to the TEXT_SMALL handler and the
downloadModelMock assertion reflect the true API contract involving
ModelType.TEXT_SMALL, createMockRuntime/mockRuntime.useModel, and
MODEL_SPECS.small.name.

In `@__tests__/text-transcribe.test.ts`:
- Around line 280-281: The test currently accesses error.message without a type
guard; update the assertions to safely read the message (e.g., use a type cast
or guard). Replace expect(error.message).toContain('Whisper model failed') with
a safe access such as expect((error as Error).message).toContain('Whisper model
failed') or first assert typeof error === 'object' && error !== null &&
'message' in error before checking the value; keep the existing
expect(error).toBeDefined() and only change the message access.
- Line 244: The test is accessing error.message where TypeScript types the catch
variable as unknown; update the catch block in the failing test so the variable
is typed or narrowed before reading .message — for example change the catch
signature to catch (error: any) or catch (error: Error | unknown) and then use
(error as Error).message, or add a type guard like isError(error): error is
Error and use error.message; apply this change around the catch that feeds the
expect(error.message) assertion.

In `@__tests__/tts.test.ts`:
- Around line 128-129: The test currently accesses error.message causing a
TypeScript type error; update the catch assertion in __tests__/tts.test.ts to
guard the error type before accessing message — for example, after catching
assign to a variable `error` and use `if (error instanceof Error) {
expect(error.message).toContain('Audio generation failed'); } else {
fail('Expected an Error'); }` so that the type checker knows `message` exists.
- Line 100: The test's catch block treats the caught variable `error` as having
a `message` property but TypeScript types it as `unknown`; update the assertion
to first narrow or cast `error` to Error before accessing `message` (e.g., use
an `if (error instanceof Error) { expect(error.message).toContain('Failed to
initialize TTS model'); } else { fail('Unexpected non-Error thrown'); }` or use
a type cast `expect((error as Error).message).toContain(...)`) so the TypeScript
error is resolved while keeping the original assertion logic in
__tests__/tts.test.ts.

In `@package.json`:
- Line 25: The package.json currently uses the pnpm workspace protocol for the
dependency "@elizaos/core" ("workspace:*"), which must be replaced with a pinned
semver before publishing; update the dependency entry for "@elizaos/core" to a
specific version range (for example "^0.1.0") in package.json or add a publish
preprocessing step (in your CI workflow) that rewrites "workspace:*" to the
chosen semver so the published package contains an explicit version instead of
the workspace protocol.

In `@src/index.ts`:
- Around line 1086-1108: The extractJSON implementation is duplicated in the
OBJECT_SMALL and OBJECT_LARGE handlers; remove both local copies and centralize
the function (e.g., declare a single function extractJSON(text: string) near the
top of the file or in a shared utils module and export/import it), then update
the OBJECT_SMALL and OBJECT_LARGE handlers to call that shared extractJSON;
ensure the signature and return type remain the same and adjust imports/exports
if you place it in a separate module.
- Around line 1349-1363: The TEXT_TO_SPEECH handler (ModelType.TEXT_TO_SPEECH)
returns a Promise<Buffer> (it converts the Readable from
localAIManager.generateSpeech into a Buffer), but the test asserts the result is
a Readable via "audioStream instanceof Readable"; update the test to expect a
Buffer instead: change the assertion to check Buffer.isBuffer(audio) (or
similar, e.g., expect(typeof audio).toBe('object') && Buffer.isBuffer(audio) &&
audio.length > 0) and rename the test variable (audioStream -> audioBuffer) for
clarity so it matches the handler's return type.
- Around line 616-618: The punishTokensFilter in repeatPenalty always calls
this.smallModel.tokenize(...) which is wrong for TEXT_LARGE; update
punishTokensFilter to pick the tokenizer based on the actual model in use (e.g.
if using TEXT_LARGE call this.mediumModel.tokenize(wordsToPunish.join(' '))
otherwise this.smallModel.tokenize(...)) or otherwise reference the correct
model property instead of always this.smallModel so token IDs are produced by
the appropriate model (see repeatPenalty, punishTokensFilter, smallModel,
mediumModel, and TEXT_LARGE).
- Around line 802-825: lazyInitVision currently marks this.visionInitialized =
true but never actually initializes the model; update the function so the
initialization block calls the vision manager's init method (e.g., await
this.visionManager.initialize() or the correct async init function) before
setting this.visionInitialized and logging success, ensure errors still clear
this.visionInitializingPromise and rethrow, and reference the existing fields
visionInitializingPromise, visionInitialized, and logger to maintain current
promise gating and logging behavior.

In `@src/types/whisper-node.d.ts`:
- Around line 1-31: The declaration currently uses CommonJS export pattern
(export = exports) which conflicts with the package's ES6 default export; update
the module declaration so the default export is the whisper function instead:
export the existing types (WhisperOptions, WhisperConfig, TranscriptSegment) as
before and replace the final export block with a default export signature for
the function whisper (i.e., make whisper the ES6 default export) so consumers
can use import whisper from "whisper-node" without workarounds.

In `@src/utils/transcribeManager.ts`:
- Around line 157-168: The checkFFmpegAvailability function uses a
platform-dependent shell string ('which ffmpeg || where ffmpeg') which can fail
on Windows; change checkFFmpegAvailability to run a cross-platform probe: detect
process.platform and run either 'which ffmpeg' (or 'command -v ffmpeg') on POSIX
or 'where ffmpeg' on win32 using execAsync, and as a fallback try running
'ffmpeg -version' to confirm availability; set ffmpegPath, ffmpegAvailable, and
log via logger consistently based on the successful probe, and keep error
handling around execAsync calls in the checkFFmpegAvailability method to clear
ffmpegPath and set ffmpegAvailable=false on failure.
- Around line 271-348: The preprocessAudio function can leak temporary files if
an error occurs after writing tempInputFile but before the cleanup; wrap the
post-write work in a try/finally (or add a finally block) to always remove
tempInputFile and, on failure, remove tempWavFile as well; specifically modify
preprocessAudio to ensure tempInputFile is unlinked in a finally regardless of
success and that tempWavFile is removed if conversion failed (while preserving
the successful return when tempWavFile was produced and should remain),
referencing tempInputFile, tempWavFile, preprocessAudio, and convertToWav to
locate the code to change.

In `@src/utils/ttsManager.ts`:
- Around line 142-149: The code that reads the cached speaker embedding (using
embeddingPath and assigning this.defaultSpeakerEmbedding) constructs a
Float32Array from the Node Buffer which can have an unaligned byteOffset;
instead create an aligned ArrayBuffer copy of the Buffer's bytes (e.g., via
buffer.buffer.slice(buffer.byteOffset, buffer.byteOffset + buffer.length) or by
copying into a new ArrayBuffer) and then build the Float32Array from that
aligned ArrayBuffer so the Float32Array constructor receives a properly aligned
buffer; update the block that reads the file (where fs.readFileSync, buffer,
embeddingPath, and this.defaultSpeakerEmbedding are referenced) to use the
aligned slice/copy before creating the Float32Array.

In `@src/utils/visionManager.ts`:
- Around line 76-81: Add an in-flight initialization promise to guard against
concurrent initialize() runs: introduce a private field like initPromise:
Promise<void> | null on the VisionManager class, update initialize() to return
immediately if initialized, otherwise if initPromise exists await it and return,
and when starting initialization assign initPromise = (async () => { ...actual
init work... })() so other callers await it; ensure initPromise is cleared (or
assigned a resolved value) after success/failure and set initialized = true only
once; apply the same pattern to the other initialization block referenced around
processImage/lines 231-236 so duplicate downloads are prevented.
- Around line 1-6: The file src/utils/visionManager.ts uses global fetch() and
new Blob(...) (at the usages near lines ~388 and ~429) which will fail on Node
<18; fix by explicitly importing the polyfills from undici (e.g., add an import
for fetch and Blob from 'undici' at the top of visionManager.ts, similar to
ttsManager.ts) or alternatively enforce Node ≥18 by adding an "engines"
constraint in package.json; update visionManager.ts to import the undici symbols
(fetch, Blob) so the calls in the functions that call fetch() and construct
Blobs work reliably across Node versions.
🧹 Nitpick comments (13)
__tests__/text-transcribe.test.ts (1)

285-322: Cleanup test may be fragile due to async timing.

The test assumes temporary files are cleaned up synchronously after useModel returns. If the cleanup happens asynchronously or the mock doesn't write temp files with the temp_ prefix, this test could pass falsely. Consider verifying that the mock runtime actually creates and then cleans up the expected temp files.
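One way to tighten this is to assert directly on the cache directory contents after the call; a sketch, assuming the mock writes its temp_* files under a known cache path (TEST_PATHS.CACHE_DIR below is a hypothetical constant, not necessarily what test-utils exports):

  import fs from 'node:fs';

  // Sketch: list leftover temp files so the cleanup assertion checks real state.
  function listTempFiles(cacheDir: string): string[] {
    if (!fs.existsSync(cacheDir)) return [];
    return fs.readdirSync(cacheDir).filter((name) => name.startsWith('temp_'));
  }

  // Inside the test, after `await mockRuntime.useModel(ModelType.TRANSCRIPTION, audioBuffer)`:
  // expect(listTempFiles(TEST_PATHS.CACHE_DIR)).toHaveLength(0);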

__tests__/tts.test.ts (1)

195-209: Consider strengthening the non-string input test.

The test passes an object { text: 'not-a-string' } which could be confused with TextToSpeechParams. Based on the src/index.ts handler at line 1408, objects with a text property are valid inputs. Consider testing with a truly invalid type like a number or array to properly test the type validation.

Suggested alternative test input
   test('should handle non-string input', async () => {
     logger.info('Starting non-string input test');
-    const invalidInput = { text: 'not-a-string' };
+    const invalidInput = 12345; // Truly non-string input

     try {
-      await mockRuntime.useModel(ModelType.TEXT_TO_SPEECH, invalidInput as unknown as string);
+      await mockRuntime.useModel(ModelType.TEXT_TO_SPEECH, invalidInput as unknown as string);
       throw new Error("Should have failed but didn't");
     } catch (error) {
src/utils/ttsManager.ts (2)

52-73: Consider adding error event forwarding to PassThrough stream.

The prependWavHeader function forwards 'data' and 'end' events but doesn't forward 'error' events from the source readable. If the source stream emits an error, the PassThrough stream won't propagate it.

Proposed fix
 function prependWavHeader(
   readable: Readable,
   audioLength: number,
   sampleRate: number,
   channelCount = 1,
   bitsPerSample = 16
 ): PassThrough {
   const wavHeader = getWavHeader(audioLength, sampleRate, channelCount, bitsPerSample);
   let pushedHeader = false;
   const passThrough = new PassThrough();
   readable.on('data', (data: Buffer) => {
     if (!pushedHeader) {
       passThrough.push(wavHeader);
       pushedHeader = true;
     }
     passThrough.push(data);
   });
   readable.on('end', () => {
     passThrough.end();
   });
+  readable.on('error', (err) => {
+    passThrough.destroy(err);
+  });
   return passThrough;
 }

92-97: Singleton ignores different cacheDir values after first instantiation.

Once TTSManager.instance is created, subsequent calls to getInstance with a different cacheDir will return the existing instance with the original cacheDir. This could lead to unexpected behavior if different parts of the code expect different cache directories.

Option 1: Validate cacheDir matches
   public static getInstance(cacheDir: string): TTSManager {
     if (!TTSManager.instance) {
       TTSManager.instance = new TTSManager(cacheDir);
+    } else if (TTSManager.instance.cacheDir !== path.join(cacheDir, 'tts')) {
+      logger.warn(`TTSManager already initialized with different cacheDir. Using existing: ${TTSManager.instance.cacheDir}`);
     }
     return TTSManager.instance;
   }
Option 2: Document the behavior

Add a JSDoc comment explaining that the first call determines the cache directory.

src/environment.ts (1)

51-55: Consider potential PII exposure in logs.

The configuration is logged with JSON.stringify(configToParse) and JSON.stringify(validatedConfig), which includes file paths. While paths themselves are usually not sensitive, if the directory names contain usernames or other identifying information, this could be a minor privacy concern in shared log systems.

You may want to consider logging only specific non-sensitive fields or redacting path information in production environments.
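If that matters for your log pipeline, a small redaction pass before logging is enough; a sketch only, where the config field names are whatever the Zod schema actually produces:

  import os from 'node:os';

  // Sketch: collapse the home directory in string values before logging a config object.
  function redactPaths(config: Record<string, unknown>): Record<string, unknown> {
    const home = os.homedir();
    return Object.fromEntries(
      Object.entries(config).map(([key, value]) => [
        key,
        typeof value === 'string' && home ? value.split(home).join('~') : value,
      ])
    );
  }

  // logger.info('Validated configuration: ' + JSON.stringify(redactPaths(validatedConfig)));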

__tests__/test-utils.ts (3)

21-23: Side effect at module load time may cause test isolation issues.

Calling process.chdir() at module import time affects all tests that import this module and can cause race conditions in parallel test execution. Consider moving this to a beforeAll hook or making it conditional.

-// During tests, we need to set cwd to agent directory since that's where the plugin runs from in production
-const AGENT_DIR = path.join(WORKSPACE_ROOT, 'packages/project-starter');
-process.chdir(AGENT_DIR);
+// During tests, we need to set cwd to agent directory since that's where the plugin runs from in production
+export const AGENT_DIR = path.join(WORKSPACE_ROOT, 'packages/project-starter');
+
+// Call this in beforeAll() hook in test files that need it
+export function setupTestWorkingDirectory(): void {
+  process.chdir(AGENT_DIR);
+}

96-139: Test mock writes real files and executes system commands.

This mock implementation writes actual files to disk (line 107) and attempts to execute ffmpeg (lines 111-119). This creates:

  1. Test pollution if cleanup fails
  2. Dependency on system-installed FFmpeg
  3. Slow tests due to I/O

For unit tests, consider fully mocking file operations and external processes rather than performing real I/O.
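A sketch of what that could look like with Vitest module mocks; the module specifier and the exact calls stubbed here are assumptions about what the mock runtime touches:

  import { vi } from 'vitest';

  // Sketch: keep the real fs module but stub the calls the mock runtime uses,
  // so TRANSCRIPTION tests never write real files; exec/ffmpeg would be stubbed similarly.
  vi.mock('node:fs', async (importOriginal) => {
    const actual = await importOriginal<typeof import('node:fs')>();
    return {
      ...actual,
      writeFileSync: vi.fn(),
      readFileSync: vi.fn(() => Buffer.from('fake audio')),
      existsSync: vi.fn(() => true),
      unlinkSync: vi.fn(),
    };
  });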


141-196: IMAGE_DESCRIPTION mock performs real network fetch.

The mock calls fetch(imageUrl) (line 152) making actual network requests during tests. This makes tests slow, flaky, and dependent on external services.

♻️ Suggested mock approach
     if (modelType === ModelType.IMAGE_DESCRIPTION) {
       // For image description, we expect a URL as the parameter
       const imageUrl = params as unknown as string;
       if (typeof imageUrl !== 'string') {
         throw new Error('Invalid image URL');
       }

       try {
         logger.info('Attempting to fetch image:', imageUrl);

-        // Mock the fetch and vision processing
-        const response = await fetch(imageUrl);
+        // Return mock result directly without network call
+        const mockResult = {
+          title: 'A test image from Picsum',
+          description:
+            'This is a detailed description of a randomly generated test image.',
+        };
+        logger.info('Generated mock description:', mockResult);
+        return mockResult as R;
src/utils/transcribeManager.ts (2)

9-19: Module-level mutable state for lazy loading.

The whisperModule variable at module scope creates shared mutable state that persists across tests and could cause issues in concurrent environments. The singleton pattern is already in use for the class—consider moving this into the class instance.

♻️ Suggested refactor
-// Lazy load whisper-node to avoid ESM/CommonJS issues
-let whisperModule: any = null;
-async function getWhisper() {
-  if (!whisperModule) {
-    // Dynamic import for CommonJS module
-    const module = await import('whisper-node');
-    // The module exports an object with a whisper property
-    whisperModule = (module as any).whisper;
-  }
-  return whisperModule;
-}

 export class TranscribeManager {
   private static instance: TranscribeManager | null = null;
+  private whisperModule: any = null;
   // ... other fields

+  private async getWhisper() {
+    if (!this.whisperModule) {
+      const module = await import('whisper-node');
+      this.whisperModule = (module as any).whisper;
+    }
+    return this.whisperModule;
+  }

208-213: Singleton getInstance ignores cacheDir on subsequent calls.

If getInstance is called with different cacheDir values, only the first value is used. This silent behavior could lead to unexpected cache locations.

♻️ Suggested fix to warn on mismatch
   public static getInstance(cacheDir: string): TranscribeManager {
     if (!TranscribeManager.instance) {
       TranscribeManager.instance = new TranscribeManager(cacheDir);
+    } else if (TranscribeManager.instance.cacheDir !== path.join(cacheDir, 'whisper')) {
+      logger.warn('TranscribeManager singleton already initialized with different cacheDir');
     }
     return TranscribeManager.instance;
   }
src/index.ts (3)

769-774: Medium model initialization unnecessarily initializes small model.

When only TEXT_LARGE is requested, lazyInitMediumModel calls lazyInitSmallModel just to get the llama instance. This downloads and loads an unneeded model. Consider extracting llama initialization separately.

♻️ Suggested refactor
+  private async ensureLlamaInstance(): Promise<Llama> {
+    if (!this.llama) {
+      this.llama = await getLlama();
+    }
+    return this.llama;
+  }

   private async lazyInitMediumModel(): Promise<void> {
     // ...
-        if (!this.llama) {
-          // Attempt to initialize small model first to get llama instance
-          await this.lazyInitSmallModel();
-        }
+        await this.ensureLlamaInstance();
     // ...
   }

1138-1150: Schema validation only adds missing keys with null, doesn't validate types.

The schema validation logic at lines 1138-1150 and 1250-1262 only checks if keys exist and adds missing ones as null. It doesn't validate types, making the "schema" parameter misleading. If the schema is a Zod schema, consider using schema.safeParse().

Also applies to: 1250-1262
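If callers do pass a real Zod schema, the validation collapses to a safeParse call; a minimal sketch (the surrounding handler wiring in src/index.ts is omitted, and the function name is illustrative):

  import { z } from 'zod';

  // Sketch: validate extracted JSON against a caller-provided Zod schema.
  function validateGeneratedObject<T>(schema: z.ZodType<T>, value: unknown): T {
    const result = schema.safeParse(value);
    if (!result.success) {
      throw new Error(`Generated object failed schema validation: ${result.error.message}`);
    }
    return result.data;
  }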


542-548: Context disposal on every generateText call may leak resources on error.

If ctx.dispose() is called but an error occurs before creating a new context, the manager is left without a valid context. Consider using try/finally for safer resource management.
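One way to close that window, as an alternative to try/finally, is to build the replacement context before disposing the old one; a sketch under the assumption that the manager holds a single chat context (names are illustrative):

  // Sketch: create the new context first so a failure never leaves the manager contextless.
  interface ChatContextLike {
    dispose(): void;
  }

  async function refreshChatContext(
    current: ChatContextLike | undefined,
    createContext: () => Promise<ChatContextLike>
  ): Promise<ChatContextLike> {
    const next = await createContext(); // if this throws, `current` is still usable
    current?.dispose();
    return next;
  }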

Comment on lines +122 to +127
  test('should describe image from URL successfully', async () => {
    logger.info('Starting successful image description test');

    // Using a reliable test image URL
    const imageUrl = 'https://picsum.photos/200/300';
    logger.info('Testing with image URL:', imageUrl);

⚠️ Potential issue | 🟠 Major



Avoid live network dependencies in tests. These tests hit external URLs without mocking fetch; Picsum can return 200 for "invalid" paths, and CI often blocks outbound traffic. Mock fetch to return deterministic responses and assert on the real error instead of throwing "Should have failed" exceptions.

🧪 Example: deterministic fetch stubs
@@
-describe('LocalAI Image Description', () => {
+describe('LocalAI Image Description', () => {
+  const fetchMock = vi.fn();
+  vi.stubGlobal('fetch', fetchMock as unknown as typeof fetch);
@@
   test('should describe image from URL successfully', async () => {
+    fetchMock.mockResolvedValueOnce({
+      ok: true,
+      status: 200,
+      statusText: 'OK',
+      headers: { get: () => 'image/jpeg' },
+      arrayBuffer: async () => new ArrayBuffer(8),
+    } as any);
@@
   test('should handle invalid image URL', async () => {
+    fetchMock.mockResolvedValueOnce({
+      ok: false,
+      status: 404,
+      statusText: 'Not Found',
+      headers: { get: () => null },
+      arrayBuffer: async () => new ArrayBuffer(0),
+    } as any);
@@
   test('should handle non-image content type', async () => {
+    fetchMock.mockResolvedValueOnce({
+      ok: true,
+      status: 200,
+      statusText: 'OK',
+      headers: { get: () => 'text/plain' },
+      arrayBuffer: async () => new ArrayBuffer(8),
+    } as any);

Also applies to: 165-168, 237-240

🤖 Prompt for AI Agents
In `@__tests__/image-desc.test.ts` around lines 122 - 127, The tests named "should
describe image from URL successfully" (and the similar failing cases) currently
call out to external URLs; replace those live network calls by mocking
global.fetch (e.g., jest.spyOn(global, 'fetch') or a fetch-mock helper) to
return deterministic Response-like objects for success and failure scenarios,
return controlled image bytes/JSON for the success path and a deterministic
error/HTTP status for the failure path, update the assertions to assert the
actual error or response content instead of throwing "Should have failed", and
ensure the mock is restored/cleared after each test (use afterEach/restoreMocks)
so tests are deterministic and CI-friendly.

Comment on lines +62 to +77
  test('should attempt to download small model when using TEXT_SMALL', async () => {
    const result = await mockRuntime.useModel(ModelType.TEXT_SMALL, {
      context: 'Generate a test response.',
      stopSequences: [],
      runtime: mockRuntime,
      modelClass: ModelType.TEXT_SMALL,
    });

    expect(downloadModelMock).toHaveBeenCalledTimes(1);
    expect(downloadModelMock.mock.calls[0][0]).toMatchObject({
      name: MODEL_SPECS.small.name,
    });
    expect(result).toBeDefined();
    expect(typeof result).toBe('string');
    expect(result.length).toBeGreaterThan(0);
  });

⚠️ Potential issue | 🟡 Minor

Test uses context but actual API expects prompt.

Looking at the src/index.ts code at lines 1019-1020, the TEXT_SMALL handler destructures { prompt, stopSequences } from GenerateTextParams. The test passes context instead of prompt, which would result in prompt being undefined.

However, since this test uses createMockRuntime() which has its own mock implementation that ignores the params structure, the test passes. This creates a discrepancy between what the test validates and the actual API contract.

Proposed fix
   test('should attempt to download small model when using TEXT_SMALL', async () => {
     const result = await mockRuntime.useModel(ModelType.TEXT_SMALL, {
-      context: 'Generate a test response.',
+      prompt: 'Generate a test response.',
       stopSequences: [],
-      runtime: mockRuntime,
-      modelClass: ModelType.TEXT_SMALL,
     });

Apply the same fix to all test cases in this file that use context instead of prompt.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Original:
  test('should attempt to download small model when using TEXT_SMALL', async () => {
    const result = await mockRuntime.useModel(ModelType.TEXT_SMALL, {
      context: 'Generate a test response.',
      stopSequences: [],
      runtime: mockRuntime,
      modelClass: ModelType.TEXT_SMALL,
    });
    expect(downloadModelMock).toHaveBeenCalledTimes(1);
    expect(downloadModelMock.mock.calls[0][0]).toMatchObject({
      name: MODEL_SPECS.small.name,
    });
    expect(result).toBeDefined();
    expect(typeof result).toBe('string');
    expect(result.length).toBeGreaterThan(0);
  });

Suggested:
  test('should attempt to download small model when using TEXT_SMALL', async () => {
    const result = await mockRuntime.useModel(ModelType.TEXT_SMALL, {
      prompt: 'Generate a test response.',
      stopSequences: [],
    });
    expect(downloadModelMock).toHaveBeenCalledTimes(1);
    expect(downloadModelMock.mock.calls[0][0]).toMatchObject({
      name: MODEL_SPECS.small.name,
    });
    expect(result).toBeDefined();
    expect(typeof result).toBe('string');
    expect(result.length).toBeGreaterThan(0);
  });
🤖 Prompt for AI Agents
In `@__tests__/text-gen.test.ts` around lines 62 - 77, The test passes a `context`
field but the TEXT_SMALL handler (in src/index.ts) expects `prompt` from
GenerateTextParams, causing a mismatch that hides a real bug; update this test
(and any other tests in __tests__/text-gen.test.ts using `context`) to pass
`prompt` instead of `context` when calling mockRuntime.useModel (and ensure the
shape matches GenerateTextParams), so the call to the TEXT_SMALL handler and the
downloadModelMock assertion reflect the true API contract involving
ModelType.TEXT_SMALL, createMockRuntime/mockRuntime.useModel, and
MODEL_SPECS.small.name.

Comment on lines +79 to +95
  test('should attempt to download large model when using TEXT_LARGE', async () => {
    const result = await mockRuntime.useModel(ModelType.TEXT_LARGE, {
      context: 'Debug Mode: Generate a one-sentence response about artificial intelligence.',
      stopSequences: [],
      runtime: mockRuntime,
      modelClass: ModelType.TEXT_LARGE,
    });

    expect(downloadModelMock).toHaveBeenCalledTimes(1);
    expect(downloadModelMock.mock.calls[0][0]).toMatchObject({
      name: MODEL_SPECS.medium.name,
    });
    expect(result).toBeDefined();
    expect(typeof result).toBe('string');
    expect(result.length).toBeGreaterThan(10);
    expect(result.toLowerCase()).toContain('artificial intelligence');
  });

⚠️ Potential issue | 🟡 Minor

Same issue: context should be prompt.

This test also uses context instead of prompt. Additionally, the assertion expect(result.toLowerCase()).toContain('artificial intelligence') relies on the mock returning a canned response containing this phrase, which is defined in createMockRuntime. This creates a tight coupling between the test and the mock implementation.
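A sketch of the loosened version of this test, following the suggestion above (still driven by the mock runtime and constants already imported in this test file):

  test('should attempt to download large model when using TEXT_LARGE', async () => {
    const result = await mockRuntime.useModel(ModelType.TEXT_LARGE, {
      prompt: 'Debug Mode: Generate a one-sentence response about artificial intelligence.',
      stopSequences: [],
    });

    expect(downloadModelMock).toHaveBeenCalledTimes(1);
    expect(downloadModelMock.mock.calls[0][0]).toMatchObject({ name: MODEL_SPECS.medium.name });
    expect(typeof result).toBe('string');
    expect((result as string).length).toBeGreaterThan(10);
    expect(result as string).toMatch(/[a-z]/i); // has letters, without pinning a canned phrase
  });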

🤖 Prompt for AI Agents
In `@__tests__/text-gen.test.ts` around lines 79 - 95, The test "should attempt to
download large model when using TEXT_LARGE" wrongly passes context instead of
prompt to mockRuntime.useModel and is tightly coupled to the mock's canned
response; update the call to pass prompt: 'Debug Mode: Generate a one-sentence
response about artificial intelligence.' (replace the context property with
prompt) and loosen the assertion that checks the returned text: instead of
asserting it contains the exact phrase 'artificial intelligence' (which ties to
createMockRuntime's canned output), assert general properties like defined,
string, length > 10, and optionally that it contains at least one token/word
from the prompt (or matches a regex for letters) so the test no longer depends
on downloadModelMock/createMockRuntime internals; reference symbols:
mockRuntime.useModel, MODEL_SPECS, downloadModelMock, and createMockRuntime.

error: error instanceof Error ? error.message : String(error),
});
expect(error).toBeDefined();
expect(error.message).toContain('Failed to convert audio');

⚠️ Potential issue | 🟡 Minor

Add type annotation to fix TypeScript error.

The error variable in the catch block is typed as unknown by default in TypeScript. Accessing error.message directly will cause a type error.

Proposed fix
     } catch (error) {
       logger.info('Conversion failure test failed as expected:', {
         error: error instanceof Error ? error.message : String(error),
       });
       expect(error).toBeDefined();
-      expect(error.message).toContain('Failed to convert audio');
+      expect(error instanceof Error && error.message).toContain('Failed to convert audio');
     }

Or use a type guard:

     } catch (error) {
       logger.info('Conversion failure test failed as expected:', {
         error: error instanceof Error ? error.message : String(error),
       });
       expect(error).toBeDefined();
+      expect(error).toBeInstanceOf(Error);
+      expect((error as Error).message).toContain('Failed to convert audio');
-      expect(error.message).toContain('Failed to convert audio');
     }
🤖 Prompt for AI Agents
In `@__tests__/text-transcribe.test.ts` at line 244, The test is accessing
error.message where TypeScript types the catch variable as unknown; update the
catch block in the failing test so the variable is typed or narrowed before
reading .message — for example change the catch signature to catch (error: any)
or catch (error: Error | unknown) and then use (error as Error).message, or add
a type guard like isError(error): error is Error and use error.message; apply
this change around the catch that feeds the expect(error.message) assertion.

Comment on lines +280 to +281
expect(error).toBeDefined();
expect(error.message).toContain('Whisper model failed');

⚠️ Potential issue | 🟡 Minor

Same TypeScript type error with error.message.

Apply the same fix as above for the catch block accessing error.message without a type guard.

Proposed fix
     } catch (error) {
       logger.info('Whisper failure test failed as expected:', {
         error: error instanceof Error ? error.message : String(error),
       });
       expect(error).toBeDefined();
-      expect(error.message).toContain('Whisper model failed');
+      expect(error).toBeInstanceOf(Error);
+      expect((error as Error).message).toContain('Whisper model failed');
     }
📝 Committable suggestion


Suggested change

Original:
        expect(error).toBeDefined();
        expect(error.message).toContain('Whisper model failed');

Suggested:
      } catch (error) {
        logger.info('Whisper failure test failed as expected:', {
          error: error instanceof Error ? error.message : String(error),
        });
        expect(error).toBeDefined();
        expect(error).toBeInstanceOf(Error);
        expect((error as Error).message).toContain('Whisper model failed');
      }
🤖 Prompt for AI Agents
In `@__tests__/text-transcribe.test.ts` around lines 280 - 281, The test currently
accesses error.message without a type guard; update the assertions to safely
read the message (e.g., use a type cast or guard). Replace
expect(error.message).toContain('Whisper model failed') with a safe access such
as expect((error as Error).message).toContain('Whisper model failed') or first
assert typeof error === 'object' && error !== null && 'message' in error before
checking the value; keep the existing expect(error).toBeDefined() and only
change the message access.

Comment on lines +157 to +168
  private async checkFFmpegAvailability(): Promise<void> {
    try {
      const { stdout, stderr } = await execAsync('which ffmpeg || where ffmpeg');
      this.ffmpegPath = stdout.trim();
      this.ffmpegAvailable = true;
      logger.info('FFmpeg found at: ' + this.ffmpegPath);
    } catch (error) {
      this.ffmpegAvailable = false;
      this.ffmpegPath = null;
      logger.error('FFmpeg not found in PATH: ' + (error instanceof Error ? error.message : String(error)));
    }
  }

⚠️ Potential issue | 🟡 Minor

Platform-specific command may fail on Windows.

The command which ffmpeg || where ffmpeg relies on shell behavior that differs across platforms. On Windows, which doesn't exist by default, and the || operator behavior in cmd.exe differs from Unix shells.

♻️ Suggested cross-platform approach
   private async checkFFmpegAvailability(): Promise<void> {
     try {
-      const { stdout, stderr } = await execAsync('which ffmpeg || where ffmpeg');
+      const isWindows = process.platform === 'win32';
+      const command = isWindows ? 'where ffmpeg' : 'which ffmpeg';
+      const { stdout } = await execAsync(command);
       this.ffmpegPath = stdout.trim();
       this.ffmpegAvailable = true;
       logger.info('FFmpeg found at: ' + this.ffmpegPath);
📝 Committable suggestion


Suggested change

Original:
  private async checkFFmpegAvailability(): Promise<void> {
    try {
      const { stdout, stderr } = await execAsync('which ffmpeg || where ffmpeg');
      this.ffmpegPath = stdout.trim();
      this.ffmpegAvailable = true;
      logger.info('FFmpeg found at: ' + this.ffmpegPath);
    } catch (error) {
      this.ffmpegAvailable = false;
      this.ffmpegPath = null;
      logger.error('FFmpeg not found in PATH: ' + (error instanceof Error ? error.message : String(error)));
    }
  }

Suggested:
  private async checkFFmpegAvailability(): Promise<void> {
    try {
      const isWindows = process.platform === 'win32';
      const command = isWindows ? 'where ffmpeg' : 'which ffmpeg';
      const { stdout } = await execAsync(command);
      this.ffmpegPath = stdout.trim();
      this.ffmpegAvailable = true;
      logger.info('FFmpeg found at: ' + this.ffmpegPath);
    } catch (error) {
      this.ffmpegAvailable = false;
      this.ffmpegPath = null;
      logger.error('FFmpeg not found in PATH: ' + (error instanceof Error ? error.message : String(error)));
    }
  }
🤖 Prompt for AI Agents
In `@src/utils/transcribeManager.ts` around lines 157 - 168, The
checkFFmpegAvailability function uses a platform-dependent shell string ('which
ffmpeg || where ffmpeg') which can fail on Windows; change
checkFFmpegAvailability to run a cross-platform probe: detect process.platform
and run either 'which ffmpeg' (or 'command -v ffmpeg') on POSIX or 'where
ffmpeg' on win32 using execAsync, and as a fallback try running 'ffmpeg
-version' to confirm availability; set ffmpegPath, ffmpegAvailable, and log via
logger consistently based on the successful probe, and keep error handling
around execAsync calls in the checkFFmpegAvailability method to clear ffmpegPath
and set ffmpegAvailable=false on failure.

Comment on lines +271 to +348
  private async preprocessAudio(audioBuffer: Buffer): Promise<string> {
    if (!this.ffmpegAvailable) {
      throw new Error('FFmpeg is not installed. Please install FFmpeg to use audio transcription.');
    }

    try {
      // Check if the buffer is already a WAV file
      const isWav =
        audioBuffer.length > 4 &&
        audioBuffer.toString('ascii', 0, 4) === 'RIFF' &&
        audioBuffer.length > 12 &&
        audioBuffer.toString('ascii', 8, 12) === 'WAVE';

      // Use appropriate extension based on format detection
      const extension = isWav ? '.wav' : '';
      const tempInputFile = path.join(this.cacheDir, `temp_input_${Date.now()}${extension}`);
      const tempWavFile = path.join(this.cacheDir, `temp_${Date.now()}.wav`);

      // logger.info("Creating temporary files", {
      //   inputFile: tempInputFile,
      //   wavFile: tempWavFile,
      //   bufferSize: audioBuffer.length,
      //   timestamp: new Date().toISOString()
      // });

      // Write buffer to temporary file
      fs.writeFileSync(tempInputFile, audioBuffer);
      // logger.info("Temporary input file created", {
      //   path: tempInputFile,
      //   size: audioBuffer.length,
      //   timestamp: new Date().toISOString()
      // });

      // If already WAV with correct format, skip conversion
      if (isWav) {
        // Check if it's already in the correct format (16kHz, mono, 16-bit)
        try {
          const { stdout } = await execAsync(
            `ffprobe -v error -show_entries stream=sample_rate,channels,bits_per_raw_sample -of json "${tempInputFile}"`
          );
          const probeResult = JSON.parse(stdout);
          const stream = probeResult.streams?.[0];

          if (
            stream?.sample_rate === '16000' &&
            stream?.channels === 1 &&
            (stream?.bits_per_raw_sample === 16 || stream?.bits_per_raw_sample === undefined)
          ) {
            // Already in correct format, just rename
            fs.renameSync(tempInputFile, tempWavFile);
            return tempWavFile;
          }
        } catch (probeError) {
          // If probe fails, continue with conversion
          logger.debug('FFprobe failed, continuing with conversion: ' + (probeError instanceof Error ? probeError.message : String(probeError)));
        }
      }

      // Convert to WAV format
      await this.convertToWav(tempInputFile, tempWavFile);

      // Clean up the input file
      if (fs.existsSync(tempInputFile)) {
        fs.unlinkSync(tempInputFile);
        // logger.info("Temporary input file cleaned up", {
        //   path: tempInputFile,
        //   timestamp: new Date().toISOString()
        // });
      }

      return tempWavFile;
    } catch (error) {
      logger.error('Audio preprocessing failed: ' + (error instanceof Error ? error.message : String(error)));
      throw new Error(
        `Failed to preprocess audio: ${error instanceof Error ? error.message : String(error)}`
      );
    }
  }

⚠️ Potential issue | 🟡 Minor

Temporary file cleanup not guaranteed on all error paths.

In preprocessAudio, if an error occurs after writing tempInputFile but before the cleanup block at lines 333-339, the file remains on disk. The cleanup only runs after successful conversion.

♻️ Suggested fix with finally block
   private async preprocessAudio(audioBuffer: Buffer): Promise<string> {
     // ... validation code ...
+    let tempInputFile: string | null = null;
     try {
       // ... format detection ...
-      const tempInputFile = path.join(this.cacheDir, `temp_input_${Date.now()}${extension}`);
+      tempInputFile = path.join(this.cacheDir, `temp_input_${Date.now()}${extension}`);
       const tempWavFile = path.join(this.cacheDir, `temp_${Date.now()}.wav`);
       // ... rest of processing ...
-      // Clean up the input file
-      if (fs.existsSync(tempInputFile)) {
-        fs.unlinkSync(tempInputFile);
-      }
       return tempWavFile;
     } catch (error) {
       logger.error('Audio preprocessing failed: ...');
       throw new Error(...);
+    } finally {
+      // Clean up input file on all paths
+      if (tempInputFile && fs.existsSync(tempInputFile)) {
+        fs.unlinkSync(tempInputFile);
+      }
     }
   }
📝 Committable suggestion


Suggested change

Original:
  private async preprocessAudio(audioBuffer: Buffer): Promise<string> {
    if (!this.ffmpegAvailable) {
      throw new Error('FFmpeg is not installed. Please install FFmpeg to use audio transcription.');
    }
    try {
      // Check if the buffer is already a WAV file
      const isWav =
        audioBuffer.length > 4 &&
        audioBuffer.toString('ascii', 0, 4) === 'RIFF' &&
        audioBuffer.length > 12 &&
        audioBuffer.toString('ascii', 8, 12) === 'WAVE';
      // Use appropriate extension based on format detection
      const extension = isWav ? '.wav' : '';
      const tempInputFile = path.join(this.cacheDir, `temp_input_${Date.now()}${extension}`);
      const tempWavFile = path.join(this.cacheDir, `temp_${Date.now()}.wav`);
      // logger.info("Creating temporary files", {
      //   inputFile: tempInputFile,
      //   wavFile: tempWavFile,
      //   bufferSize: audioBuffer.length,
      //   timestamp: new Date().toISOString()
      // });
      // Write buffer to temporary file
      fs.writeFileSync(tempInputFile, audioBuffer);
      // logger.info("Temporary input file created", {
      //   path: tempInputFile,
      //   size: audioBuffer.length,
      //   timestamp: new Date().toISOString()
      // });
      // If already WAV with correct format, skip conversion
      if (isWav) {
        // Check if it's already in the correct format (16kHz, mono, 16-bit)
        try {
          const { stdout } = await execAsync(
            `ffprobe -v error -show_entries stream=sample_rate,channels,bits_per_raw_sample -of json "${tempInputFile}"`
          );
          const probeResult = JSON.parse(stdout);
          const stream = probeResult.streams?.[0];
          if (
            stream?.sample_rate === '16000' &&
            stream?.channels === 1 &&
            (stream?.bits_per_raw_sample === 16 || stream?.bits_per_raw_sample === undefined)
          ) {
            // Already in correct format, just rename
            fs.renameSync(tempInputFile, tempWavFile);
            return tempWavFile;
          }
        } catch (probeError) {
          // If probe fails, continue with conversion
          logger.debug('FFprobe failed, continuing with conversion: ' + (probeError instanceof Error ? probeError.message : String(probeError)));
        }
      }
      // Convert to WAV format
      await this.convertToWav(tempInputFile, tempWavFile);
      // Clean up the input file
      if (fs.existsSync(tempInputFile)) {
        fs.unlinkSync(tempInputFile);
        // logger.info("Temporary input file cleaned up", {
        //   path: tempInputFile,
        //   timestamp: new Date().toISOString()
        // });
      }
      return tempWavFile;
    } catch (error) {
      logger.error('Audio preprocessing failed: ' + (error instanceof Error ? error.message : String(error)));
      throw new Error(
        `Failed to preprocess audio: ${error instanceof Error ? error.message : String(error)}`
      );
    }
  }

Suggested:
  private async preprocessAudio(audioBuffer: Buffer): Promise<string> {
    if (!this.ffmpegAvailable) {
      throw new Error('FFmpeg is not installed. Please install FFmpeg to use audio transcription.');
    }
    let tempInputFile: string | null = null;
    try {
      // Check if the buffer is already a WAV file
      const isWav =
        audioBuffer.length > 4 &&
        audioBuffer.toString('ascii', 0, 4) === 'RIFF' &&
        audioBuffer.length > 12 &&
        audioBuffer.toString('ascii', 8, 12) === 'WAVE';
      // Use appropriate extension based on format detection
      const extension = isWav ? '.wav' : '';
      tempInputFile = path.join(this.cacheDir, `temp_input_${Date.now()}${extension}`);
      const tempWavFile = path.join(this.cacheDir, `temp_${Date.now()}.wav`);
      // logger.info("Creating temporary files", {
      //   inputFile: tempInputFile,
      //   wavFile: tempWavFile,
      //   bufferSize: audioBuffer.length,
      //   timestamp: new Date().toISOString()
      // });
      // Write buffer to temporary file
      fs.writeFileSync(tempInputFile, audioBuffer);
      // logger.info("Temporary input file created", {
      //   path: tempInputFile,
      //   size: audioBuffer.length,
      //   timestamp: new Date().toISOString()
      // });
      // If already WAV with correct format, skip conversion
      if (isWav) {
        // Check if it's already in the correct format (16kHz, mono, 16-bit)
        try {
          const { stdout } = await execAsync(
            `ffprobe -v error -show_entries stream=sample_rate,channels,bits_per_raw_sample -of json "${tempInputFile}"`
          );
          const probeResult = JSON.parse(stdout);
          const stream = probeResult.streams?.[0];
          if (
            stream?.sample_rate === '16000' &&
            stream?.channels === 1 &&
            (stream?.bits_per_raw_sample === 16 || stream?.bits_per_raw_sample === undefined)
          ) {
            // Already in correct format, just rename
            fs.renameSync(tempInputFile, tempWavFile);
            return tempWavFile;
          }
        } catch (probeError) {
          // If probe fails, continue with conversion
          logger.debug('FFprobe failed, continuing with conversion: ' + (probeError instanceof Error ? probeError.message : String(probeError)));
        }
      }
      // Convert to WAV format
      await this.convertToWav(tempInputFile, tempWavFile);
      return tempWavFile;
    } catch (error) {
      logger.error('Audio preprocessing failed: ' + (error instanceof Error ? error.message : String(error)));
      throw new Error(
        `Failed to preprocess audio: ${error instanceof Error ? error.message : String(error)}`
      );
    } finally {
      // Clean up input file on all paths
      if (tempInputFile && fs.existsSync(tempInputFile)) {
        fs.unlinkSync(tempInputFile);
      }
    }
  }
🤖 Prompt for AI Agents
In `@src/utils/transcribeManager.ts` around lines 271 - 348, the preprocessAudio
function can leak temporary files if an error occurs after tempInputFile is written
but before the cleanup runs. Wrap the post-write work in a try/finally so that
tempInputFile is always unlinked regardless of success, and remove tempWavFile as
well when conversion fails, while preserving the successful return when tempWavFile
was produced and should remain. Reference tempInputFile, tempWavFile,
preprocessAudio, and convertToWav to locate the code to change.
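
A minimal sketch of that contract, with the WAV fast path omitted for brevity; preprocessAudioSafely is a hypothetical free function (the real code is a TranscribeManager method) and convertToWav is passed in so the example stays self-contained:

import fs from 'node:fs';
import path from 'node:path';

// Sketch only: the finally block always deletes the temporary input copy, and the
// catch removes a partial WAV so neither file can leak when conversion fails.
async function preprocessAudioSafely(
  audioBuffer: Buffer,
  cacheDir: string,
  convertToWav: (input: string, output: string) => Promise<void>
): Promise<string> {
  const isWav =
    audioBuffer.length > 12 &&
    audioBuffer.toString('ascii', 0, 4) === 'RIFF' &&
    audioBuffer.toString('ascii', 8, 12) === 'WAVE';

  const tempInputFile = path.join(cacheDir, `temp_input_${Date.now()}${isWav ? '.wav' : ''}`);
  const tempWavFile = path.join(cacheDir, `temp_${Date.now()}.wav`);

  fs.writeFileSync(tempInputFile, audioBuffer);
  try {
    await convertToWav(tempInputFile, tempWavFile);
    return tempWavFile; // on success the caller owns tempWavFile
  } catch (error) {
    // Conversion failed: remove any partial output so it cannot accumulate.
    if (fs.existsSync(tempWavFile)) fs.unlinkSync(tempWavFile);
    throw new Error(
      `Failed to preprocess audio: ${error instanceof Error ? error.message : String(error)}`
    );
  } finally {
    // The input copy is never returned, so it is always safe to delete.
    if (fs.existsSync(tempInputFile)) fs.unlinkSync(tempInputFile);
  }
}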

Comment on lines +142 to +149
if (fs.existsSync(embeddingPath)) {
logger.info('Loading default speaker embedding from cache...');
const buffer = fs.readFileSync(embeddingPath);
this.defaultSpeakerEmbedding = new Float32Array(
buffer.buffer,
buffer.byteOffset,
buffer.length / Float32Array.BYTES_PER_ELEMENT
);

⚠️ Potential issue | 🟡 Minor

Potential issue with byteOffset when reading cached embedding file.

When a Float32Array is built directly from a Buffer returned by fs.readFileSync, the buffer may be a view into Node's internal buffer pool, so its byteOffset is not guaranteed to be a multiple of 4 (the Float32 element size); when it is not, the Float32Array constructor throws a RangeError. Copying the relevant bytes with buffer.buffer.slice into a fresh ArrayBuffer avoids the problem.

Proposed fix
           if (fs.existsSync(embeddingPath)) {
             logger.info('Loading default speaker embedding from cache...');
             const buffer = fs.readFileSync(embeddingPath);
-            this.defaultSpeakerEmbedding = new Float32Array(
-              buffer.buffer,
-              buffer.byteOffset,
-              buffer.length / Float32Array.BYTES_PER_ELEMENT
-            );
+            // Create a copy to ensure proper alignment
+            const arrayBuffer = buffer.buffer.slice(
+              buffer.byteOffset,
+              buffer.byteOffset + buffer.byteLength
+            );
+            this.defaultSpeakerEmbedding = new Float32Array(arrayBuffer);
             logger.success('Default speaker embedding loaded from cache.');
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
            if (fs.existsSync(embeddingPath)) {
              logger.info('Loading default speaker embedding from cache...');
              const buffer = fs.readFileSync(embeddingPath);
-            this.defaultSpeakerEmbedding = new Float32Array(
-              buffer.buffer,
-              buffer.byteOffset,
-              buffer.length / Float32Array.BYTES_PER_ELEMENT
-            );
+            // Create a copy to ensure proper alignment
+            const arrayBuffer = buffer.buffer.slice(
+              buffer.byteOffset,
+              buffer.byteOffset + buffer.byteLength
+            );
+            this.defaultSpeakerEmbedding = new Float32Array(arrayBuffer);
              logger.success('Default speaker embedding loaded from cache.');
🤖 Prompt for AI Agents
In `@src/utils/ttsManager.ts` around lines 142 - 149, the code that reads the
cached speaker embedding (using embeddingPath and assigning
this.defaultSpeakerEmbedding) constructs a Float32Array directly from the Node
Buffer, which can have an unaligned byteOffset. Instead, create an aligned
ArrayBuffer copy of the Buffer's bytes (e.g., via
buffer.buffer.slice(buffer.byteOffset, buffer.byteOffset + buffer.length), or by
copying into a new ArrayBuffer) and build the Float32Array from that aligned copy.
Update the block that reads the file (where fs.readFileSync, buffer, embeddingPath,
and this.defaultSpeakerEmbedding are referenced) to use the aligned slice/copy
before creating the Float32Array.
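
For reference, a minimal self-contained sketch of the copy-based approach; bufferToFloat32Array is an illustrative helper name, not code from the PR:

// Sketch only: copy the Buffer's bytes into a freshly allocated Float32Array so
// the backing ArrayBuffer is always 4-byte aligned, independent of Node's buffer pooling.
function bufferToFloat32Array(buffer: Buffer): Float32Array {
  if (buffer.byteLength % Float32Array.BYTES_PER_ELEMENT !== 0) {
    throw new Error(`Embedding file length ${buffer.byteLength} is not a multiple of 4 bytes`);
  }
  const floats = new Float32Array(buffer.byteLength / Float32Array.BYTES_PER_ELEMENT);
  new Uint8Array(floats.buffer).set(buffer); // raw byte copy into the aligned buffer
  return floats;
}

// Hypothetical usage in the cached-embedding branch:
// this.defaultSpeakerEmbedding = bufferToFloat32Array(fs.readFileSync(embeddingPath));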

Comment on lines +1 to +6
import { existsSync } from 'node:fs';
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';
import process from 'node:process';
import { logger } from '@elizaos/core';

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check if the file exists and read the relevant sections
head -20 src/utils/visionManager.ts

Repository: elizaos-plugins/plugin-local-embedding

Length of output: 621


🏁 Script executed:

# Also check lines 369-433
sed -n '369,433p' src/utils/visionManager.ts | head -70

Repository: elizaos-plugins/plugin-local-embedding

Length of output: 2668


🏁 Script executed:

# Check package.json for Node engine constraints
cat package.json | grep -A 5 '"engines"'

Repository: elizaos-plugins/plugin-local-embedding

Length of output: 64


🏁 Script executed:

# Search for fetch and Blob usage in the file
rg -n 'fetch\(|new Blob|Blob\(' src/utils/visionManager.ts

Repository: elizaos-plugins/plugin-local-embedding

Length of output: 260


🏁 Script executed:

# Check package.json for engines and dependencies
cat package.json

Repository: elizaos-plugins/plugin-local-embedding

Length of output: 3398


🏁 Script executed:

# Search for undici or any fetch polyfill imports
rg -n 'undici|fetch.*polyfill|ponyfill' src/

Repository: elizaos-plugins/plugin-local-embedding

Length of output: 139


🏁 Script executed:

# Check if there's any runtime check for fetch/Blob availability
rg -n 'typeof fetch|typeof Blob|globalThis' src/utils/visionManager.ts

Repository: elizaos-plugins/plugin-local-embedding

Length of output: 64


Import fetch from undici or enforce Node ≥18. The code uses the global fetch() API and Blob constructor at lines 388 and 429, which are unavailable in Node <18. While undici is already a dependency, it must be explicitly imported (as done in ttsManager.ts); alternatively, add an "engines" constraint in package.json to require Node ≥18.

🤖 Prompt for AI Agents
In `@src/utils/visionManager.ts` around lines 1 - 6, the file uses global fetch()
and new Blob(...) (at the usages near lines ~388 and ~429), which will fail on
Node <18. Fix this by explicitly importing the polyfills (e.g., add imports for
fetch and Blob at the top of visionManager.ts, similar to ttsManager.ts), or
alternatively enforce Node ≥18 by adding an "engines" constraint in package.json.
Update visionManager.ts to import the symbols (fetch, Blob) so the functions that
call fetch() and construct Blobs work reliably across Node versions.
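
A rough sketch of the first option (explicit imports rather than relying on Node ≥18 globals), assuming undici remains a dependency; fetchImageAsBlob is an illustrative helper, not the plugin's actual code:

import { fetch } from 'undici';
import { Blob } from 'node:buffer';

// Sketch only: fetch an image and wrap it in a Blob without relying on the
// global fetch/Blob that only exist on Node >= 18.
async function fetchImageAsBlob(url: string): Promise<Blob> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch image: ${response.status} ${response.statusText}`);
  }
  const bytes = await response.arrayBuffer();
  const type = response.headers.get('content-type') ?? 'application/octet-stream';
  return new Blob([bytes], { type });
}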

Comment on lines +76 to +81
private initialized = false;
private downloadManager: DownloadManager;
private modelDownloaded = false;
private tokenizerDownloaded = false;
private processorDownloaded = false;
private platformConfig: PlatformConfig;

⚠️ Potential issue | 🟠 Major

Guard against concurrent initialization races. Multiple processImage calls can enter initialize() simultaneously, causing duplicate downloads and heavy parallel work. Add an in-flight promise guard.

🔒 Suggested in-flight initialization guard
@@
   private initialized = false;
+  private initializing: Promise<void> | null = null;
@@
-  private async initialize() {
-    try {
-      if (this.initialized) {
-        logger.info('Vision model already initialized, skipping initialization');
-        return;
-      }
+  private async initialize() {
+    if (this.initialized) {
+      logger.info('Vision model already initialized, skipping initialization');
+      return;
+    }
+    if (this.initializing) {
+      await this.initializing;
+      return;
+    }
+    this.initializing = (async () => {
+      try {
       logger.info('Starting vision model initialization...');
@@
-      this.initialized = true;
-      logger.success('Vision model initialization complete');
-    } catch (error) {
-      logger.error('Vision model initialization failed: ' + (error instanceof Error ? error.message : String(error)));
-      throw error;
-    }
+        this.initialized = true;
+        logger.success('Vision model initialization complete');
+      } catch (error) {
+        logger.error('Vision model initialization failed: ' + (error instanceof Error ? error.message : String(error)));
+        throw error;
+      }
+    })();
+    try {
+      await this.initializing;
+    } finally {
+      this.initializing = null;
+    }
   }

Also applies to: 231-236

🤖 Prompt for AI Agents
In `@src/utils/visionManager.ts` around lines 76 - 81, add an in-flight
initialization promise to guard against concurrent initialize() runs. Introduce a
private field such as initPromise: Promise<void> | null on the VisionManager
class; update initialize() to return immediately if already initialized, and
otherwise, if initPromise exists, await it and return. When starting
initialization, assign initPromise = (async () => { ...actual init work... })() so
other callers await it; ensure initPromise is cleared (or assigned a resolved
value) after success or failure, and set initialized = true only once. Apply the
same pattern to the other initialization block referenced around processImage
(lines 231-236) so duplicate downloads are prevented.
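
A condensed, self-contained sketch of the guard the diff and prompt describe; the class and method names here are illustrative, not the PR's:

// Sketch only: an in-flight promise ensures concurrent callers share a single
// initialization run instead of triggering duplicate downloads.
class LazyInitSketch {
  private initialized = false;
  private initializing: Promise<void> | null = null;

  private async initialize(): Promise<void> {
    if (this.initialized) return; // fast path after the first successful run
    if (this.initializing) return this.initializing; // join the in-flight run

    this.initializing = (async () => {
      try {
        await this.loadModels(); // heavy work executes exactly once
        this.initialized = true;
      } finally {
        this.initializing = null; // allow a retry if initialization failed
      }
    })();
    return this.initializing;
  }

  private async loadModels(): Promise<void> {
    // placeholder for the real download / model-loading work
  }

  async processImage(imageUrl: string): Promise<string> {
    await this.initialize();
    return `processed ${imageUrl}`; // placeholder result
  }
}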
