Add TranscriptControl with out-of-process Whisper transcription service #51

Draft
Copilot wants to merge 5 commits into main from copilot/add-transcript-control-usercontrol

Copilot AI commented Dec 21, 2025

Implements audio transcription via Whisper ONNX models using an out-of-process COM server architecture for cross-app reusability.

Architecture

3 new projects:

  • Bookmarkly.Transcription.Abstractions - WinRT contract library (.idl.winmd)
  • Bookmarkly.Transcription - ONNX inference, audio processing, tokenization
  • Bookmarkly.Transcription.Server - out-of-process (OOP) COM server (single instance, reference-counted lifetime)
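
The reference-counted lifetime mentioned for the server can be illustrated with a small language-agnostic sketch. It is written in Python for brevity and is not the PR's code; the class name `ServerLifetime` and its methods are hypothetical. The idea is that each connected client holds one reference, and the server signals shutdown when the last reference is released.

```python
import threading


class ServerLifetime:
    """Hypothetical sketch: tracks live client references for an OOP server."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0
        # The server's main thread would wait on this event before exiting.
        self.shutdown = threading.Event()

    def add_ref(self):
        """Called when a client activates the service; returns the new count."""
        with self._lock:
            self._count += 1
            return self._count

    def release(self):
        """Called when a client disconnects; signals shutdown at zero."""
        with self._lock:
            if self._count == 0:
                raise RuntimeError("release() without a matching add_ref()")
            self._count -= 1
            if self._count == 0:
                self.shutdown.set()
            return self._count
```

A real server would also need to handle a new activation racing with shutdown, which this sketch leaves out.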

WinRT Interface:

interface ITranscriptionService
{
    IAsyncOperation<String> TranscribeAsync(StorageFile audioFile);
    IAsyncOperation<String> TranscribeWithLanguageAsync(StorageFile audioFile, String languageCode);
    IAsyncOperation<IVector<String>> GetSupportedLanguagesAsync();
}

Audio Pipeline:

  1. Resample to 16 kHz mono (NAudio)
  2. Compute mel spectrogram (80 bins, 25 ms windows, 10 ms hop)
  3. ONNX encoder → hidden states
  4. ONNX decoder → tokens → text
  5. Split files longer than 30 s into chunks, then concatenate the results
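
The numbers in the pipeline above translate into sample counts as follows. This is a minimal Python sketch, not the PR's implementation: the function names are hypothetical, the "one frame per hop" count assumes centre-padded framing, and joining chunk transcripts with a single space is an assumption about what "concatenate" means here.

```python
SAMPLE_RATE = 16_000                          # Hz, after resampling (step 1)
WINDOW = SAMPLE_RATE * 25 // 1000             # 25 ms window in samples
HOP = SAMPLE_RATE * 10 // 1000                # 10 ms hop in samples
CHUNK_SECONDS = 30
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # samples per 30 s chunk


def split_into_chunks(samples, chunk_samples=CHUNK_SAMPLES):
    """Split audio into consecutive chunks; the last one may be shorter."""
    return [samples[i:i + chunk_samples]
            for i in range(0, len(samples), chunk_samples)]


def pad_chunk(chunk, chunk_samples=CHUNK_SAMPLES):
    """Zero-pad a short chunk so every chunk has the same length."""
    return list(chunk) + [0.0] * (chunk_samples - len(chunk))


def frame_count(num_samples, hop=HOP):
    """Spectrogram frames for a signal, assuming one frame per hop."""
    return num_samples // hop


def transcribe_long(samples, transcribe_chunk):
    """Transcribe each padded chunk and join the non-empty results."""
    parts = [transcribe_chunk(pad_chunk(c)) for c in split_into_chunks(samples)]
    return " ".join(p.strip() for p in parts if p.strip())
```

Under these assumptions a 30 s chunk is 480,000 samples and yields 3,000 frames of 80 mel bins each; `transcribe_chunk` stands in for steps 2 to 4.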

TranscriptControl UI

Location: Bookmarkly.Views/Controls/TranscriptControl.xaml

Features:

  • Drag-drop + picker for .wav/.mp3/.m4a/.flac/.ogg
  • Shimmer loading animation (5 gradient lines)
  • Language selector (14 languages)
  • Copy-to-clipboard with feedback
  • Scrollable, selectable transcript

Dependency Properties:

  • AudioFile (StorageFile) - input
  • Transcript (string, read-only) - output
  • IsTranscribing (bool) - loading state
  • SelectedLanguage (string) - language code

Usage:

<controls:TranscriptControl AudioFile="{x:Bind ViewModel.AudioFile, Mode=OneWay}" />

Configuration

Package.appxmanifest:

<uap5:Extension Category="windows.activatableClass.outOfProcessServer">
  <uap5:OutOfProcessServer ServerName="Bookmarkly.Transcription.Server">
    <uap5:Path>Bookmarkly.Transcription.Server\Bookmarkly.Transcription.Server.exe</uap5:Path>
    <uap5:ActivatableClass ActivatableClassId="Bookmarkly.Transcription.TranscriptionService" />
  </uap5:OutOfProcessServer>
</uap5:Extension>

NuGet packages:

  • Microsoft.ML.OnnxRuntime + DirectML (1.16.3) - GPU-accelerated inference
  • NAudio (2.2.1) - audio I/O

Implementation Notes

  • Falls back to placeholder mode (returns demo transcripts) when the Whisper models are unavailable
  • One service instance is reused per control rather than created per call
  • Pre-allocated List capacity for audio samples
  • GetMany implementation corrected for IVector
  • Whisper models: https://huggingface.co/onnx-community/whisper-base
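
The placeholder-mode fallback noted above could look roughly like this. It is a Python sketch under stated assumptions, not the PR's code: the model file names, the function names, and the placeholder text are all hypothetical.

```python
from pathlib import Path

# Hypothetical file names; the real names depend on how the models are exported.
REQUIRED_MODEL_FILES = ("encoder_model.onnx", "decoder_model.onnx")


def models_available(model_dir):
    """Return True only if every required model file exists in model_dir."""
    directory = Path(model_dir)
    return all((directory / name).is_file() for name in REQUIRED_MODEL_FILES)


def transcribe(audio_path, model_dir, run_inference):
    """Run real inference when models exist, otherwise return a demo transcript."""
    if not models_available(model_dir):
        return f"[placeholder transcript for {Path(audio_path).name}]"
    return run_inference(audio_path)
```

Checking availability once at service start-up, rather than on every call, would fit the reuse of a single service instance per control.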

Files: 19 changed (12 new), 1660+ lines added

Original prompt

Overview

Build a TranscriptControl UserControl that accepts an audio file and uses the Whisper base model (from https://huggingface.co/onnx-community/whisper-base) to render the transcript. The transcription service should run in a separate out-of-process COM server so it can be reused by other apps from the same publisher.

Architecture Requirements

Project Structure

Create the following new projects in the solution:

  1. Bookmarkly.Transcription.Abstractions (WinRT Contract Library)

    • Contains WinRT interface definitions (.idl files compiled to .winmd)
    • Define ITranscriptionService interface with methods:
      • TranscribeAsync(StorageFile audioFile) - returns transcript text
      • TranscribeWithLanguageAsync(StorageFile audioFile, string languageCode) - transcribe with specific language
      • GetSupportedLanguagesAsync() - returns list of supported languages
    • This project produces a .winmd that both server and client reference
  2. Bookmarkly.Transcription (Class Library)

  3. Bookmarkly.Transcription.Server (Out-of-Process WinRT COM Server EXE)

    • Implements the WinRT runtime class TranscriptionService that implements ITranscriptionService
    • References both Abstractions and Transcription projects
    • Entry point (Program.cs) that:
      • Registers the WinRT activation factory
      • Keeps the server alive while clients are connected
      • Handles graceful shutdown
    • Follow the pattern from https://github.com/roxk/packaged-oop-winrt-server-app-extension for proper OOP WinRT server setup

Package Manifest Updates

Update Bookmarkly.App/Package.appxmanifest to register the out-of-process server:

<Extensions>
  <uap5:Extension Category="windows.activatableClass.outOfProcessServer">
    <uap5:OutOfProcessServer ServerName="Bookmarkly.Transcription.Server"
                             uap5:IdentityType="activateAsPackage"
                             uap5:RunFullTrust="true">
      <uap5:Path>Bookmarkly.Transcription.Server\Bookmarkly.Transcription.Server.exe</uap5:Path>
      <uap5:Instancing>singleInstance</uap5:Instancing>
      <uap5:ActivatableClass ActivatableClassId="Bookmarkly.Transcription.TranscriptionService" />
    </uap5:OutOfProcessServer>
  </uap5:Extension>
</Extensions>

Add required namespace: xmlns:uap5="http://schemas.microsoft.com/appx/manifest/uap/windows10/5"

TranscriptControl UI Requirements

Location

Create in Bookmarkly.Views project:

  • Controls/TranscriptControl.xaml
  • Controls/TranscriptControl.xaml.cs

UI Layout

┌─────────────────────────────────────────────────────────┐
│                              [Language ▼] [📋 Copy]     │  <- Top right toolbar
├─────────────────────────────────────────────────────────┤
│                                                         │
│  [Drop audio file here or click to browse]              │  <- File input area
│                                                         │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Transcript text appears here...                        │  <- Scrollable text area
│  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ (shimmer when loading)  │
│                                                         │
└─────────────────────────────────────────────────────────┘

Features

  1. File Input

    • Accept audio files via drag-drop or file picker
    • Support common formats: .wav, .mp3, .m4a, .flac, .ogg
    • Show file name after selection
  2. Shimmer Loading Effect

    • Display animated shimmer placeholder lines while transcription is in progress
    • Use WinUI 3 shimmer/skeleton pattern with animated gradient
    • Show 3-5 shimmer lines of varying widths
  3. Language Picker (Top Right)

    • ComboBox/Dropdown with supported languages
    • Default to "Auto-detect"
    • Populate from GetSupportedLanguagesAsync()
    • Common languages: English, Spanish, French, German, Chinese, Japanese, etc.
  4. Copy Button (Top Right)

    • Button with copy icon (📋 or Segoe Fluent icon)
    • Copies transcript text to clipboard
    • Show brief "Copied!" tooltip feedbac...

This pull request was created from Copilot chat.


Copilot AI and others added 4 commits December 21, 2025 21:58
- Created Bookmarkly.Transcription.Abstractions with WinRT interface definitions
- Created Bookmarkly.Transcription with ONNX/Whisper implementation
- Created Bookmarkly.Transcription.Server for out-of-process COM server
- Created TranscriptControl UserControl in Bookmarkly.Views
- Updated Package.appxmanifest to register OOP server
- Updated Directory.Packages.props with required NuGet packages
- Updated solution file to include new projects
- Added EnableWindowsTargeting property for Linux builds

Co-authored-by: Kumara-Krishnan <[email protected]>

- Optimize AudioProcessor to pre-allocate list capacity
- Add comments for tensor copy efficiency
- Reuse TranscriptionService instance in TranscriptControl
- Fix GetMany implementation with correct parameter handling
- Improve COM server lifetime management with ref counting

Co-authored-by: Kumara-Krishnan <[email protected]>
Copilot AI changed the title from "[WIP] Add TranscriptControl UserControl for audio transcription" to "Add TranscriptControl with out-of-process Whisper transcription service" on Dec 21, 2025