macOS menu bar companion app. Lives entirely in the macOS status bar (no dock icon, no main window). Clicking the menu bar icon opens a custom floating panel with companion voice and text controls. Uses push-to-talk (ctrl+option) to capture voice input, transcribes it via OpenAI audio transcription, and sends the transcript + a screenshot of the user's screen to OpenAI. Users can also open a typed prompt with Command+Shift+Return; typed messages use the same screenshot → OpenAI → TTS → pointing pipeline as voice. OpenAI responds with text (streamed via SSE) and voice (OpenAI TTS). A blue cursor overlay can fly to and point at UI elements the AI references on any connected monitor.
All API keys live on a Cloudflare Worker proxy — nothing sensitive ships in the app.
- App Type: Menu bar-only (
LSUIElement=true), no dock icon or main window - Framework: SwiftUI (macOS native) with AppKit bridging for menu bar panel and cursor overlay
- Pattern: MVVM with
@StateObject/@Publishedstate management - AI Chat: OpenAI (GPT-4o default, GPT-4o mini optional) via Cloudflare Worker proxy with SSE streaming
- Speech-to-Text: OpenAI audio transcription via Cloudflare Worker proxy, with Apple Speech as a local fallback
- Text-to-Speech: OpenAI speech API via Cloudflare Worker proxy
- Screen Capture: ScreenCaptureKit (macOS 14.2+), multi-monitor support
- Voice Input: Push-to-talk via
AVAudioEngine+ pluggable transcription-provider layer. System-wide keyboard shortcut via listen-only CGEvent tap. - Text Input: Command+Shift+Return opens a floating typed prompt. Submitted text bypasses transcription and enters the same screenshot + OpenAI + TTS + pointing response pipeline as voice transcripts.
- Element Pointing: The AI embeds
[POINT:x,y:label:screenN]tags in responses. The overlay parses these, maps coordinates to the correct monitor, and animates the blue cursor along a bezier arc to the target. - Concurrency:
@MainActorisolation, async/await throughout - Analytics: PostHog via
ClickyAnalytics.swift
The app never calls external APIs directly. All requests go through a Cloudflare Worker (worker/src/index.ts) that holds the real API keys as secrets.
| Route | Upstream | Purpose |
|---|---|---|
POST /chat |
api.openai.com/v1/chat/completions |
OpenAI vision + streaming chat |
POST /tts |
api.openai.com/v1/audio/speech |
OpenAI TTS audio |
POST /transcribe |
api.openai.com/v1/audio/transcriptions |
OpenAI audio transcription |
Worker secrets: OPENAI_API_KEY
Menu Bar Panel Pattern: The companion panel uses NSStatusItem for the menu bar icon and a custom borderless NSPanel for the floating control panel. This gives full control over appearance (dark, rounded corners, custom shadow) and avoids the standard macOS menu/popover chrome. The panel is non-activating so it doesn't steal focus. A global event monitor auto-dismisses it on outside clicks.
Cursor Overlay: A full-screen transparent NSPanel hosts the blue cursor companion. It's non-activating, joins all Spaces, and never steals focus. The cursor position, response text, waveform, and pointing animations all render in this overlay via SwiftUI through NSHostingView.
Global Push-To-Talk Shortcut: Background push-to-talk uses a listen-only CGEvent tap instead of an AppKit global monitor so modifier-based shortcuts like ctrl + option are detected more reliably while the app is running in the background.
Global Text Prompt Shortcut: Background typed input uses a separate listen-only CGEvent tap for Command+Shift+Return. This avoids sharing ctrl + option, which starts voice recording as soon as those modifiers are pressed.
Transient Cursor Mode: When "Show Clicky" is off, pressing the hotkey fades in the cursor overlay for the duration of the interaction (recording → response → TTS → optional pointing), then fades it out automatically after 1 second of inactivity.
| File | Lines | Purpose |
|---|---|---|
leanring_buddyApp.swift |
~89 | Menu bar app entry point. Uses @NSApplicationDelegateAdaptor with CompanionAppDelegate which creates MenuBarPanelManager and starts CompanionManager. No main window — the app lives entirely in the status bar. |
CompanionManager.swift |
~1076 | Central state machine. Owns dictation, shortcut monitoring, text prompt routing, screen capture, OpenAI API, OpenAI TTS, and overlay management. Tracks voice state (idle/listening/processing/responding), conversation history, model selection, and cursor visibility. Coordinates the full voice/text → screenshot → OpenAI → TTS → pointing pipeline. |
MenuBarPanelManager.swift |
~243 | NSStatusItem + custom NSPanel lifecycle. Creates the menu bar icon, manages the floating companion panel (show/hide/position), installs click-outside-to-dismiss monitor. |
CompanionPanelView.swift |
~811 | SwiftUI panel content for the menu bar dropdown. Shows companion status, push-to-talk and typed-input instructions, model picker, permissions UI, DM feedback button, and quit button. Dark aesthetic using DS design system. |
OverlayWindow.swift |
~881 | Full-screen transparent overlay hosting the blue cursor, response text, waveform, and spinner. Handles cursor animation, element pointing with bezier arcs, multi-monitor coordinate mapping, and fade-out transitions. |
TextPromptWindowManager.swift |
~233 | Floating text prompt NSPanel and SwiftUI input view. Lets users type messages and submit them into the shared companion response pipeline. |
CompanionResponseOverlay.swift |
~217 | SwiftUI view for the response text bubble and waveform displayed next to the cursor in the overlay. |
CompanionScreenCaptureUtility.swift |
~132 | Multi-monitor screenshot capture using ScreenCaptureKit. Returns labeled image data for each connected display. |
BuddyDictationManager.swift |
~866 | Push-to-talk voice pipeline. Handles microphone capture via AVAudioEngine, provider-aware permission checks, keyboard/button dictation sessions, transcript finalization, shortcut parsing, contextual keyterms, and live audio-level reporting for waveform feedback. |
BuddyTranscriptionProvider.swift |
~73 | Protocol surface and provider factory for voice transcription backends. Resolves provider based on VoiceTranscriptionProvider in Info.plist — OpenAI or Apple Speech. |
OpenAIAudioTranscriptionProvider.swift |
~317 | Upload-based transcription provider. Buffers push-to-talk audio locally, uploads as WAV on release, returns finalized transcript. |
AppleSpeechTranscriptionProvider.swift |
~147 | Local fallback transcription provider backed by Apple's Speech framework. |
BuddyAudioConversionSupport.swift |
~108 | Audio conversion helpers. Converts live mic buffers to PCM16 mono audio and builds WAV payloads for upload-based providers. |
GlobalPushToTalkShortcutMonitor.swift |
~132 | System-wide push-to-talk monitor. Owns the listen-only CGEvent tap and publishes press/release transitions. |
GlobalTextPromptShortcutMonitor.swift |
~132 | System-wide typed prompt monitor. Owns the listen-only CGEvent tap for Command+Shift+Return and publishes prompt-open events. |
OpenAIAPI.swift |
~253 | OpenAI GPT vision API client with streaming (SSE) and non-streaming modes. |
OpenAITTSClient.swift |
~66 | OpenAI TTS client. Sends text to the Worker proxy, plays back audio via AVAudioPlayer. Exposes isPlaying for transient cursor scheduling. |
DesignSystem.swift |
~880 | Design system tokens — colors, corner radii, shared styles. All UI references DS.Colors, DS.CornerRadius, etc. |
ClickyAnalytics.swift |
~121 | PostHog analytics integration for usage tracking. |
WindowPositionManager.swift |
~262 | Window placement logic, Screen Recording permission flow, and accessibility permission helpers. |
AppBundleConfiguration.swift |
~28 | Runtime configuration reader for keys stored in the app bundle Info.plist. |
worker/src/index.ts |
~146 | Cloudflare Worker proxy. Routes: /chat, /tts, /transcribe, and /health, all using OpenAI except /health. |
# Open in Xcode
open leanring-buddy.xcodeproj
# Select the leanring-buddy scheme, set signing team, Cmd+R to build and run
# Known non-blocking warnings: Swift 6 concurrency warnings,
# deprecated onChange warning in OverlayWindow.swift. Do NOT attempt to fix these.Do NOT run xcodebuild from the terminal — it invalidates TCC (Transparency, Consent, and Control) permissions and the app will need to re-request screen recording, accessibility, etc.
cd worker
npm install
# Add secrets
npx wrangler secret put OPENAI_API_KEY
# Deploy
npx wrangler deploy
# Local dev (create worker/.dev.vars with your keys)
npx wrangler devIMPORTANT: Follow these naming rules strictly. Clarity is the top priority.
- Be as clear and specific with variable and method names as possible
- Optimize for clarity over concision. A developer with zero context on the codebase should immediately understand what a variable or method does just from reading its name
- Use longer names when it improves clarity. Do NOT use single-character variable names
- Example: use
originalQuestionLastAnsweredDateinstead oforiginalAnswered - When passing props or arguments to functions, keep the same names as the original variable. Do not shorten or abbreviate parameter names. If you have
currentCardData, pass it ascurrentCardData, notcardorcardData
- Clear is better than clever. Do not write functionality in fewer lines if it makes the code harder to understand
- Write more lines of code if additional lines improve readability and comprehension
- Make things so clear that someone with zero context would completely understand the variable names, method names, what things do, and why they exist
- When a variable or method name alone cannot fully explain something, add a comment explaining what is happening and why
- Use SwiftUI for all UI unless a feature is only supported in AppKit (e.g.,
NSPanelfor floating windows) - All UI state updates must be on
@MainActor - Use async/await for all asynchronous operations
- Comments should explain "why" not just "what", especially for non-obvious AppKit bridging
- AppKit
NSPanel/NSWindowbridged into SwiftUI viaNSHostingView - All buttons must show a pointer cursor on hover
- For any interactive element, explicitly think through its hover behavior (cursor, visual feedback, and whether hover should communicate clickability)
- Do not add features, refactor code, or make "improvements" beyond what was asked
- Do not add docstrings, comments, or type annotations to code you did not change
- Do not try to fix the known non-blocking warnings (Swift 6 concurrency, deprecated onChange)
- Do not rename the project directory or scheme (the "leanring" typo is intentional/legacy)
- Do not run
xcodebuildfrom the terminal — it invalidates TCC permissions
- Branch naming:
feature/descriptionorfix/description - Commit messages: imperative mood, concise, explain the "why" not the "what"
- Do not force-push to main
When you make changes to this project that affect the information in this file, update this file to reflect those changes. Specifically:
- New files: Add new source files to the "Key Files" table with their purpose and approximate line count
- Deleted files: Remove entries for files that no longer exist
- Architecture changes: Update the architecture section if you introduce new patterns, frameworks, or significant structural changes
- Build changes: Update build commands if the build process changes
- New conventions: If the user establishes a new coding convention during a session, add it to the appropriate conventions section
- Line count drift: If a file's line count changes significantly (>50 lines), update the approximate count in the Key Files table
Do NOT update this file for minor edits, bug fixes, or changes that don't affect the documented architecture or conventions.