added more 3 vars for LLM post-processing transcript #704
khanhicetea wants to merge 2 commits into cjpais:main from
btw, I plan to add one more var named '${language}', which is the language the user picks in settings for multi-language models. Should I push it?
Sure. If you can, could you also add documentation for this feature in the README or somewhere?
I tested on my Mac and it works (not sure about Windows and Linux). This template opens up more ways to use the transcription (fixing typos, grammar, even translation or dual-translation output) 😄
/// that has keyboard focus when the user starts transcribing.
#[cfg(target_os = "macos")]
pub fn get_frontmost_app_name() -> Option<String> {
hmm so macOS impl returns app name via localizedName (ie "Safari"), but Windows uses GetWindowTextW which returns the window title (ie "Stack Overflow - How to do X - Google Chrome"). this means ${current_app} has different semantics per platform, and ${short_prev_transcript} context keying by app name would create separate entries per browser tab on Windows. maybe consider using GetModuleFileNameExW or querying the process name instead?
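one way to make the Windows side return a stable per-app value is to take the executable path (what a call like GetModuleFileNameExW would give you) and reduce it to the file stem. rough sketch below; `app_name_from_exe_path` is a hypothetical helper, not something in this PR, and it only covers the string handling, not the Win32 call itself. note the result would be a process name like "chrome" rather than the localized "Google Chrome", so it still won't match macOS exactly, but it would stop per-tab keying.

```rust
/// Hypothetical helper (not part of the PR): reduce an executable path to a
/// stable application name, e.g. "C:\\...\\chrome.exe" -> "chrome".
pub fn app_name_from_exe_path(exe_path: &str) -> Option<String> {
    // Split on both separators so a Windows path parses on any host OS.
    let file_name = exe_path
        .rsplit(|c: char| c == '\\' || c == '/')
        .next()?;

    // Drop a trailing ".exe" (case-insensitive); keep any other name as-is.
    let stem = match file_name.len().checked_sub(4) {
        Some(cut)
            if file_name.is_char_boundary(cut)
                && file_name[cut..].eq_ignore_ascii_case(".exe") =>
        {
            &file_name[..cut]
        }
        _ => file_name,
    };

    // An empty stem (empty input, trailing separator) means "unknown app".
    if stem.is_empty() {
        None
    } else {
        Some(stem.to_string())
    }
}
```

with this, every Chrome window would map to the same key regardless of the tab title.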
);
// Replace ${output} variable in the prompt with the actual text
let processed_prompt = prompt.replace("${output}", transcription);
the change logs the entire prompt including transcription text AND app name. this is a privacy concern since the log file could contain sensitive speech content
recommend reverting
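if some logging is still wanted for debugging, one option is to log sizes only and never the text. a minimal sketch, assuming a hypothetical `prompt_log_summary` helper (the name and the fields are made up for illustration, nothing like this exists in the PR):

```rust
/// Hypothetical helper (not part of the PR): build a log line that records
/// only sizes, never the prompt text, transcription, or app name.
pub fn prompt_log_summary(template: &str, transcription: &str, has_app_name: bool) -> String {
    format!(
        "post-process prompt prepared: template_chars={}, transcription_chars={}, transcription_words={}, has_app_name={}",
        template.chars().count(),
        transcription.chars().count(),
        transcription.split_whitespace().count(),
        has_app_name
    )
}
```

the returned string can go to whatever logger the project already uses; the spoken content never reaches the log file.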
[target.'cfg(target_os = "macos")'.dependencies]
tauri-nspanel = { git = "https://github.com/ahkohd/tauri-nspanel", branch = "v2.1" }
objc = "0.2"
crate is deprecated in favor of objc2 which is already a transitive dep in the project. maybe we look into objc2-app-kit or a simpler approach for the macOS active app detection
use tauri::Manager;
/// Tracks the frontmost application captured at recording start, keyed by binding_id
static RECORDING_APP_CONTEXT: Lazy<Mutex<HashMap<String, String>>> =
nit: entries are inserted in start() and removed in stop(), but CancelAction doesn't clean up its entry. over many cancelled recordings this could accumulate stale entries. minor since the values are small strings, but worth adding cleanup in the cancel path too
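roughly what I mean, as a standalone sketch. it uses `std::sync::OnceLock` instead of the `Lazy` in the PR so it compiles without extra crates, and `on_start` / `on_stop` / `on_cancel` / `tracked_entries` are hypothetical names standing in for the real action handlers:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Stand-in for RECORDING_APP_CONTEXT: app name captured at recording start,
/// keyed by binding_id.
fn recording_app_context() -> &'static Mutex<HashMap<String, String>> {
    static CONTEXT: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();
    CONTEXT.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Called when recording starts: remember which app had focus.
pub fn on_start(binding_id: &str, app_name: &str) {
    recording_app_context()
        .lock()
        .unwrap()
        .insert(binding_id.to_string(), app_name.to_string());
}

/// Called when recording stops normally: take the entry for post-processing.
pub fn on_stop(binding_id: &str) -> Option<String> {
    recording_app_context().lock().unwrap().remove(binding_id)
}

/// Called when recording is cancelled: drop the entry so it cannot go stale.
pub fn on_cancel(binding_id: &str) {
    recording_app_context().lock().unwrap().remove(binding_id);
}

/// Number of entries currently tracked (handy for asserting on cleanup).
pub fn tracked_entries() -> usize {
    recording_app_context().lock().unwrap().len()
}
```

the cancel path is the same `remove` as stop, it just discards the value.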
solid feature idea, the context variables will make post-processing much more useful. some notes below~

merge conflicts + coordination with #706

this PR is currently conflicting against main and also has a semantic conflict with #706 (structured outputs). that PR splits prompts into system prompt + user content, so the variable substitutions here (
Human Written Description
This PR adds more variables to the post-processing prompt:
${time_local}: the current local time as a string
${current_app}: the app in use, which identifies the purpose of the work so the prompt template can teach the LLM about it
${short_prev_transcript}: the last transcript in the same app, trimmed to about 200 words (it expires after 5 minutes), so the LLM has more context about what comes next
${language}: the selected transcription language for Whisper models (e.g., "en", "zh-Hans"), or "auto" for other models (Parakeet, Moonshine)
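The substitution described above could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: `PromptContext`, `render_prompt`, `last_words`, and `short_prev_transcript` are hypothetical names, and it swaps the chained `prompt.replace(...)` calls for a single pass over the template so that a transcript containing text like `${language}` is not expanded a second time. The 200-word and 5-minute limits come from the description above.

```rust
use std::time::Duration;

/// Hypothetical container for the values substituted into the template.
pub struct PromptContext<'a> {
    pub output: &'a str,
    pub time_local: &'a str,
    pub current_app: &'a str,
    pub short_prev_transcript: &'a str,
    pub language: &'a str,
}

impl PromptContext<'_> {
    /// Map a variable name (without the `${}` wrapper) to its value.
    fn lookup(&self, name: &str) -> Option<&str> {
        match name {
            "output" => Some(self.output),
            "time_local" => Some(self.time_local),
            "current_app" => Some(self.current_app),
            "short_prev_transcript" => Some(self.short_prev_transcript),
            "language" => Some(self.language),
            _ => None,
        }
    }
}

/// Keep only the last `max_words` whitespace-separated words of `text`.
pub fn last_words(text: &str, max_words: usize) -> String {
    let words: Vec<&str> = text.split_whitespace().collect();
    let start = words.len().saturating_sub(max_words);
    words[start..].join(" ")
}

/// Previous transcript for the prompt: empty once older than 5 minutes,
/// otherwise trimmed to the last 200 words.
pub fn short_prev_transcript(prev: &str, age: Duration) -> String {
    if age > Duration::from_secs(5 * 60) {
        return String::new();
    }
    last_words(prev, 200)
}

/// Replace `${name}` placeholders in one pass; unknown names are left as-is
/// and substituted values are never rescanned.
pub fn render_prompt(template: &str, ctx: &PromptContext) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                match ctx.lookup(name) {
                    Some(value) => out.push_str(value),
                    // Unknown variable: emit the placeholder unchanged.
                    None => out.push_str(&rest[start..start + 2 + end + 1]),
                }
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated "${": copy the remainder verbatim and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

A template such as `Fix typos in ${output}, written in ${current_app} (${language})` would then be rendered with one call to `render_prompt`.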
Related Issues/Discussions
Fixes #168
Testing
Using this prompt: