
feat: add external providers for audio transcription and LLM-based post-processing [post-processing is similar to PR #355] #466

Closed
avijitbhuin21 wants to merge 13 commits into cjpais:main from avijitbhuin21:main

Conversation

@avijitbhuin21

@avijitbhuin21 avijitbhuin21 commented Dec 16, 2025

Summary

This PR adds support for external AI providers for both audio transcription and post-processing, enabling users to leverage cloud-based services for faster and more accurate speech-to-text conversion. This significantly speeds up both transcription and post-processing when using a provider like Groq or Cerebras, even on my potato PC.

Features

🎤 Online Audio Transcription Providers

  • Added support for multiple external transcription providers:
    • OpenAI (Whisper)
    • Groq (Whisper Large V3 / V3 Turbo)
    • Gemini (2.0/2.5 Flash, Flash Lite, Pro models)
    • SambaNova (Whisper Large V3)
  • Per-provider API key storage with secure masking
  • Per-provider model selection with persistence
  • Toggle to switch between local and online providers
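The per-provider key storage with masked display could be sketched roughly like this (the types and method names here are hypothetical illustrations, not Handy's actual settings code):

```rust
use std::collections::HashMap;

// Sketch (hypothetical types): per-provider API key storage with masked
// display, as described above. Handy's real settings code may differ.
struct ProviderKeys {
    keys: HashMap<String, String>,
}

impl ProviderKeys {
    fn new() -> Self {
        ProviderKeys { keys: HashMap::new() }
    }

    fn set(&mut self, provider: &str, key: &str) {
        self.keys.insert(provider.to_string(), key.to_string());
    }

    // Show only the last four characters in the settings UI.
    fn masked(&self, provider: &str) -> Option<String> {
        self.keys.get(provider).map(|k| {
            let tail = &k[k.len().saturating_sub(4)..];
            format!("****{}", tail)
        })
    }
}

fn main() {
    let mut store = ProviderKeys::new();
    store.set("groq", "gsk_example123456");
    println!("{}", store.masked("groq").unwrap()); // prints "****3456"
}
```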

📝 Post-Processing with LLM Providers

  • Extended post-processing to support multiple LLM providers:
    • OpenAI, OpenRouter, Gemini, Groq, Cerebras, SambaNova, and Custom endpoints
  • Configurable base URL for self-hosted/custom endpoints
  • Model fetching and selection with refresh capability
  • Custom prompt management (create, edit, delete prompts)
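The configurable base URL works because all of these providers expose OpenAI-compatible paths; a minimal sketch of the idea, with a hypothetical helper name (not Handy's actual code):

```rust
// Sketch (hypothetical helper): joining a user-configured base URL with
// the OpenAI-compatible transcription path. Only an illustration of the
// configurable-endpoint idea, not Handy's actual implementation.
fn transcription_url(base_url: &str) -> String {
    // Trim any trailing slash so "/v1" and "/v1/" behave the same.
    format!("{}/audio/transcriptions", base_url.trim_end_matches('/'))
}

fn main() {
    // A hosted provider and a self-hosted endpoint share the same path.
    println!("{}", transcription_url("https://api.groq.com/openai/v1"));
    println!("{}", transcription_url("http://localhost:8000/v1/"));
}
```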

🔧 Backend Changes

  • New Tauri commands for online provider settings management
  • API key and model storage per provider
  • Settings persistence for both transcription and post-processing providers

UI Changes

  • New "Online Providers" settings panel with provider/model/API key configuration
  • Enhanced "Post Processing" settings with multi-provider support
  • Consistent UI patterns with the existing settings design

Related

Similar to PR #355 but focuses on settings-based configuration without hotkey integration.

Screenshots

(screenshots attached)

avijitbhuin21 and others added 6 commits December 16, 2025 05:58
shortcut.rs (1 warning)
  • Unused import: APPLE_INTELLIGENCE_DEFAULT_MODEL_ID — removed from imports

signal_handle.rs (6 warnings)
  • Unused import: crate::actions::ACTION_MAP — added #[cfg(unix)]
  • Unused import: crate::ManagedToggleState — added #[cfg(unix)]
  • Unused imports: debug, info, warn from log — added #[cfg(unix)]
  • Unused import: std::thread — added #[cfg(unix)]
  • Unused imports: AppHandle, Manager from tauri — added #[cfg(unix)]

(These were all only used in unix-specific code, so adding #[cfg(unix)] prevents the warnings on Windows.)

settings.rs (1 warning)
  • Unused mut: providers variable — added #[allow(unused_mut)] (it's only mutated on macOS aarch64)

lib.rs (1 warning)
  • Unused mut: builder variable — added #[allow(unused_mut)] (it's only mutated on macOS with the nspanel plugin)
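Both fixes follow standard Rust patterns for target-specific code; a self-contained sketch (the surrounding functions and provider names are invented for illustration):

```rust
// Sketch of the two warning fixes described in the commit notes above.

// 1. Gate an import behind #[cfg(unix)] when only unix-specific code
//    uses it, so Windows builds don't emit unused-import warnings.
#[cfg(unix)]
use std::thread;

#[cfg(unix)]
#[allow(dead_code)]
fn spawn_signal_listener() {
    // Unix-only: a real handler would watch for signals here.
    thread::spawn(|| {});
}

// 2. Allow an unused `mut` when the variable is only mutated on one
//    target, as with the `providers` list on macOS aarch64.
fn provider_list() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut providers = vec!["openai", "groq"];
    #[cfg(all(target_os = "macos", target_arch = "aarch64"))]
    providers.push("apple-intelligence");
    providers
}

fn main() {
    println!("{:?}", provider_list());
}
```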
@cjpais
Owner

cjpais commented Dec 17, 2025

To be honest with you, I'm not exactly sure what to say. There's been many discussions around this topic and why it hasn't been approved before. Have you read them? What makes this PR different? I actually think you've done the best job of implementing the UI so far, but that doesn't really change my overall stance on API models. I am not particularly interested in having them in Handy for a variety of reasons at the moment and you can go ahead and read the existing lengthy conversations on this. You're welcome to continue to challenge my opinion and position as well as collect feedback from the community. Clear and obvious feedback, support, and discussion from the community on the best way forward would probably sway my opinion.

I may consider this PR if you remove the part with API models for transcription and let's just consider the post-processing part for now.

Also, if you are implementing API models you should use the support transcribe-rs already has for it, and if that is not sufficient we should add more support there. I know this is almost certainly a vibe-coded PR, so this probably wasn't thought of. If it was vibe coded it would probably be helpful to know the prompts, because at least that shows a clear intent and purpose; a bunch of LLM-generated text as the PR description is something I will basically always skip reading, because I want to know a human's intention, not a machine's intention.

Unfortunately I have to be a bit defensive of what makes it into the codebase and I really need to have someone who has a strong enough opinion or community support to have things make it into the repo. Not every PR can be accepted otherwise the entire app turns into an unmaintainable mess and it's already close to that in my opinion.

@cjpais
Owner

cjpais commented Dec 17, 2025

You gotta clean this PR up. It's not in any state to be merged. There's a bunch of breaking changes, things like changing tauri.conf.json and other stuff is not gonna fly.

Review every file for the changes, and only submit what is absolutely necessary. I skimmed the PR and don't have time to review a bunch of random changes to files that don't seem meaningful.

….tsx formatting changes - Reverted shortcut.rs extra blank line - Reverted tauri.conf.json to restore Windows signing command and original formatting
@avijitbhuin21
Author

Hi @cjpais,

Thank you for the detailed and thoughtful feedback! I apologize for not reading through the previous discussions (#77, #279, #222, Discussion #168) before submitting. I've now gone through them and understand your position on keeping Handy primarily a local transcription app.

What I've done with this PR:

  • ✅ Removed the API transcription feature as you requested
  • ✅ Kept only the LLM post-processing feature, which seems to have more community interest

Regarding transcribe-rs and future API support:

I checked transcribe-rs and noticed its openai feature currently uses a hardcoded enum that only supports 3 OpenAI models (whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe). This means providers like Groq and SambaNova won't work out of the box since they use different model names (whisper-large-v3, whisper-large-v3-turbo).

I'll raise a PR to transcribe-rs to:

  • Allow passing model names as strings instead of the fixed enum
  • Add helper configurations for providers like Groq, SambaNova, and others with OpenAI-compatible endpoints

Once that's merged, I can create a proper PR for Handy that uses transcribe-rs for API transcription, following your preferred approach.
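The mismatch can be sketched in a few lines. The types below are hypothetical stand-ins, not transcribe-rs's actual API; they only illustrate why a closed OpenAI-only enum can't express Groq or SambaNova model names:

```rust
// Sketch (hypothetical types): a fixed model enum vs. a string-based
// model field. A closed enum cannot represent other providers' names
// like "whisper-large-v3", while a plain string can.
#[allow(dead_code)]
enum OpenAiModel {
    Whisper1,
    Gpt4oMiniTranscribe,
    Gpt4oTranscribe,
}

// A string accepts any OpenAI-compatible provider's model name.
struct TranscribeParams {
    model: String,
}

fn groq_params() -> TranscribeParams {
    TranscribeParams { model: "whisper-large-v3".to_string() }
}

fn main() {
    println!("{}", groq_params().model);
}
```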

Why this matters to me (and others):

I have a low-end PC where local transcription is quite slow. Providers like Groq and SambaNova offer extremely fast transcription (often under 1 second) via their OpenAI-compatible APIs, which makes Handy usable for people like me who don't have powerful hardware. I believe there are others in the community with similar needs (like the user in #77).

About AI assistance:

Yes, I used AI tools (Claude opus 4.5) to help implement this. My intent was to:

  • Enable fast transcription for users with slow hardware
  • Add flexible LLM post-processing with multiple providers

I should have been more thoughtful about how this fits with the project's architecture and your vision.

Here is the updated UI for post-processing.

Post-processing disabled:

image

Post-processing enabled:

image

Post-processing tab:

image

The history remains the same as above.

@cjpais
Owner

cjpais commented Dec 18, 2025

I have a low-end PC where local transcription is quite slow. Providers like Groq and SambaNova offer extremely fast transcription (often under 1 second) via their OpenAI-compatible APIs, which makes Handy usable for people like me who don't have powerful hardware. I believe there are others in the community with similar needs (like the user in #77).

This is fundamentally the same thing as the earlier discussion. My stance has not significantly changed. Please do not submit a PR for it without gathering significant support in Discussions, and without coming up with a way forward that does not require someone to use an API key, or that hides this functionality in a nice way which still enables power users to use it.

As for this PR. Please describe what you have done exactly in your own language. No AI descriptions. How does this improve Handy? I can see some details in the screenshot but I want you to be explicit about it, so I can validate what the code is doing based on what you wanted to do. Post processing is not a 'general' feature right now. There's a reason it's in debug and I will be moving it to a new menu when I think the feature is good enough for prime time. I'm happy to accept other UI/UX improvements to the feature itself in the meantime

@avijitbhuin21
Author

It's okay. Since I need this feature desperately, I'll modify and build it for my use only. I will close this PR as there is not much left without the online API support; post-processing is already in a debug state, so there is not much point in implementing it here.
Thanks for the feedback man, appreciate your time.

@cjpais
Owner

cjpais commented Dec 18, 2025

Sounds good, and if you do want to get the API support in the mainline build please go collect support so we can have an open discussion there!

@User-3090

User-3090 commented Jan 20, 2026

A good use case would be to use https://github.com/speaches-ai/speaches locally. Parakeet models are really not that good, and support for other Whisper models is currently broken. The ability to specify an API endpoint would give the user the possibility to run any desired model locally and frees you up from supporting whatever new models yourself.

@User-3090

Here are some other use cases:

A small company with thin client hardware could purchase one beefy box equipped with a GPU, enabling all clients to utilize it for transcriptions.

A family with one GPU server running, everyone accessing it via Tailscale wherever they are.

@User-3090

You mentioned in another thread that you envisioned users sharing their GPU capabilities to others in order to improve transcriptions.

To my knowledge, sharing that node in a tailnet is by far the easiest and most secure way to accomplish that.

All we would need is the ability to specify an API endpoint in Handy.
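One possible shape for that (a hypothetical helper, not Handy's code): a user-supplied endpoint override that falls back to a provider default when unset.

```rust
// Sketch (hypothetical helper): resolve the effective API endpoint,
// preferring a user-configured override over the provider default.
fn effective_endpoint(custom: Option<&str>, provider_default: &str) -> String {
    custom.unwrap_or(provider_default).trim_end_matches('/').to_string()
}

fn main() {
    // A GPU box shared over a tailnet vs. the hosted default.
    println!("{}", effective_endpoint(Some("http://my-gpu-box:8000/v1/"), "https://api.openai.com/v1"));
    println!("{}", effective_endpoint(None, "https://api.openai.com/v1"));
}
```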

@cjpais
Owner

cjpais commented Jan 20, 2026

Handy is not going to support STT API providers. People are welcome to fork if they want that support. It will eventually be a local provider itself.

@avijitbhuin21
Author

Hi @User-3090 ,

I’ve created Babbl:
https://github.com/avijitbhuin21/Babbl/releases/tag/v0.1.3

Babbl is a fork of Handy, extended with API integration and post-processing features.
So far, my friends and I have been using it with Groq for super-fast processing, and it has been working very well.

Please check it out and let me know your thoughts.

