diff --git a/.github/workflows/porter-debug.yml b/.github/workflows/porter-debug.yml index 088ac69e2d..628029dbb0 100644 --- a/.github/workflows/porter-debug.yml +++ b/.github/workflows/porter-debug.yml @@ -1,7 +1,7 @@ "on": push: branches: - - cloud/livekit-grpc + - cloud/livekit-ios-bug paths: - "cloud/**" name: 🚀 [debug] Porter.run Deploy diff --git a/cloud/docker-compose.dev.yml b/cloud/docker-compose.dev.yml index 8b999498ce..3624cb66f0 100644 --- a/cloud/docker-compose.dev.yml +++ b/cloud/docker-compose.dev.yml @@ -22,6 +22,7 @@ services: - CONTAINER_ENVIRONMENT=true - CLOUD_HOST_NAME=cloud - LIVEKIT_GRPC_SOCKET=/var/run/livekit/bridge.sock + - LIVEKIT_PCM_ENDIAN=off env_file: - .env volumes: @@ -47,6 +48,8 @@ services: - LIVEKIT_API_KEY=${LIVEKIT_API_KEY} - LIVEKIT_API_SECRET=${LIVEKIT_API_SECRET} - LIVEKIT_GRPC_SOCKET=/var/run/livekit/bridge.sock + - BETTERSTACK_SOURCE_TOKEN=${BETTERSTACK_SOURCE_TOKEN} + - BETTERSTACK_INGESTING_HOST=s1311181.eu-nbg-2.betterstackdata.com volumes: - livekit_socket:/var/run/livekit restart: "no" diff --git a/cloud/docs/development/local-setup.mdx b/cloud/docs/development/local-setup.mdx index 876b5a9dfc..27ec679288 100644 --- a/cloud/docs/development/local-setup.mdx +++ b/cloud/docs/development/local-setup.mdx @@ -7,6 +7,10 @@ description: "Run MentraOS Cloud locally for development and testing" This guide helps you set up MentraOS Cloud on your local machine for development. You'll learn how to configure the environment, expose your local server using ngrok, and connect the mobile app to your development cloud instance. + +**New to MentraOS?** If you're setting up your entire development environment from scratch (including Node.js, Git, Android Studio, etc.), start with the [Complete Beginner's Setup Guide](https://docs.mentra.glass/beginner-setup-guide) first. This guide focuses specifically on cloud backend development.
+ + ## Prerequisites Before starting, ensure you have: diff --git a/cloud/issues/livekit-ios-bug/CURRENT-STATUS.md b/cloud/issues/livekit-ios-bug/CURRENT-STATUS.md new file mode 100644 index 0000000000..0ed32fded0 --- /dev/null +++ b/cloud/issues/livekit-ios-bug/CURRENT-STATUS.md @@ -0,0 +1,336 @@ +# LiveKit iOS Bug - Current Status + +**Date:** 2025-10-17 +**Status:** 🔍 Root Cause Identified - Ready for Implementation +**Priority:** Critical + +--- + +## What We Know (Confirmed) + +### 1. Grace Period Cleanup Was Disabled ✅ Fixed + +```typescript +// Before: +const GRACE_PERIOD_CLEANUP_ENABLED = false; // Sessions never expired + +// After: +const GRACE_PERIOD_CLEANUP_ENABLED = true; // Sessions expire after 60s +``` + +**Impact:** Sessions now properly dispose after 60 seconds when glasses disconnect. + +### 2. LiveKit Identity Conflict (Theory - Needs Testing) + +**Problem:** All servers use the same LiveKit identity for the same user: + +```typescript +identity: `cloud-agent:user@example.com`; +``` + +**What happens:** + +1. User on Server A with LiveKit bridge connected +2. User switches to Server B +3. Server B joins LiveKit room with SAME identity +4. LiveKit kicks Server A's bridge out (duplicate identity rule) +5. Server A's session stays alive (up to 60s grace period) +6. If user switches back to Server A before 60s, rejoins broken session +7. LiveKit bridge is disconnected and has no reconnection logic + +### 3. No Bridge Reconnection Logic + +**File:** `service.go:125-133` + +```go +OnDisconnected: func() { + log.Printf("Disconnected from LiveKit room: %s", req.RoomName) + // ← No reconnection attempt! +} +``` + +**Impact:** Once kicked out, bridge never rejoins automatically. + +### 4.
Reconnection Doesn't Reinitialize LiveKit + +**File:** `websocket-glasses.service.ts:577-590` + +```typescript +if (!reconnection) { + // ← Only runs on NEW connections + await userSession.appManager.startPreviouslyRunningApps(); +} + +if (livekitRequested) { + // ← Only if explicitly requested in THIS connection + const livekitInfo = await userSession.liveKitManager.handleLiveKitInit(); +} +``` + +**Impact:** Reconnecting to existing session doesn't reinitialize broken LiveKit bridge. + +--- + +## What We Fixed + +### ✅ Grace Period Cleanup Enabled + +- **File:** `websocket-glasses.service.ts:43` +- **Change:** `GRACE_PERIOD_CLEANUP_ENABLED = true` +- **Impact:** Sessions now expire 60 seconds after glasses disconnect + +### ✅ Better Stack Logging for Go Bridge + +- **Files:** + - `logger/betterstack.go` - HTTP logger for Go + - `main.go` - Integrated logger + - `service.go` - Added logging to JoinRoom/LeaveRoom + - `docker-compose.dev.yml` - Added env vars +- **Impact:** Can now see what's happening in Go bridge via Better Stack + +### ❌ Reverted: Glasses WebSocket Checks + +- **Why:** Would break grace period functionality +- **Correct behavior:** Sessions SHOULD stay alive during grace period even if glasses WS is closed + +--- + +## What We Need to Fix + +### Priority 1: Bridge Health Monitoring + Auto-Reinitialize (CRITICAL) + +**Problem:** Token expires after 10 minutes. If laptop sleeps > 10 minutes, bridge resume fails.
+ +**Log Evidence:** + +``` +2025/10/18 04:12:27 "msg"="resume connection failed" +"error"="unauthorized: invalid token: ..., token is expired (exp)" +``` + +**Solution:** Detect broken bridge and reinitialize automatically + +```typescript +// In UserSession or LiveKitManager +setInterval(async () => { + if (!this.livekitRequested) return; // Skip if LiveKit not enabled + + const isHealthy = await this.liveKitManager.checkBridgeHealth(); + if (!isHealthy) { + this.logger.warn("Bridge unhealthy, reinitializing with fresh token..."); + await this.liveKitManager.handleLiveKitInit(); // Creates new bridge + } +}, 30000); // Every 30 seconds +``` + +**Why Critical:** + +- Handles laptop sleep/wake (token expiration) +- Handles server switches (bridge kicked out) +- Handles network disconnects > 10 minutes +- Single fix solves multiple scenarios + +### Priority 2: Always Reinitialize LiveKit on Reconnection (If Previously Enabled) + +**File:** `websocket-glasses.service.ts:handleConnectionInit()` + +**Current logic:** + +```typescript +if (livekitRequested) { + // Only if requested in current connection + await userSession.liveKitManager.handleLiveKitInit(); +} +``` + +**Should be:** + +```typescript +// Check if LiveKit was previously enabled OR is being requested now +const shouldInitLiveKit = livekitRequested || userSession.livekitRequested; + +if (shouldInitLiveKit) { + // Always reinitialize (creates new bridge if old one was kicked out) + const livekitInfo = await userSession.liveKitManager.handleLiveKitInit(); + // ... +} +``` + +**Why:** When user switches back to a server within grace period, the session exists but the LiveKit bridge was kicked out. We need to rejoin the room. 
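One detail the periodic health check under Priority 1 leaves open is what happens when a check or a reinitialization takes longer than the 30-second interval: a plain `setInterval` with an async callback can start a second run while the first is still in flight. The sketch below shows one way to guard against that. It assumes a hypothetical `BridgeManager` interface and `createHealthTick` helper, neither of which exists in the codebase; only the method names `checkBridgeHealth` and `handleLiveKitInit` are taken from the snippets above.

```typescript
// Hypothetical interface: the method names mirror the snippets above,
// but this type itself is an assumption for illustration.
interface BridgeManager {
  checkBridgeHealth(): Promise<boolean>;
  handleLiveKitInit(): Promise<unknown>;
}

type TickResult = "skipped" | "healthy" | "reinitialized";

// Returns a tick function that refuses to run while a previous tick is in flight.
function createHealthTick(manager: BridgeManager): () => Promise<TickResult> {
  let running = false;
  return async () => {
    if (running) return "skipped"; // previous check or reinit has not finished yet
    running = true;
    try {
      if (await manager.checkBridgeHealth()) return "healthy";
      await manager.handleLiveKitInit(); // rebuild the bridge with a fresh token
      return "reinitialized";
    } finally {
      running = false; // always release the guard, even if a call throws
    }
  };
}
```

A caller could then schedule it with `setInterval(() => void tick(), 30000)`, where `tick` is the function returned by `createHealthTick`.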
+ +### Priority 3: Bridge Auto-Reconnection in Go (Future Enhancement) + +Add reconnection logic to Go bridge: + +```go +OnDisconnected: func() { + log.Printf("Disconnected, attempting to rejoin...") + + // TODO: Request new token from TypeScript via gRPC + // TODO: Reconnect to room with new token +} +``` + +**Requires:** + +- New gRPC method for token refresh +- Token refresh logic in TypeScript + +--- + +## Testing Plan + +### Test 1: Switch A → B → Wait 70s → Switch Back to A + +**Expected:** + +- Server A session expires after 60s +- Switching back creates NEW session +- Fresh LiveKit bridge +- Everything works ✅ + +### Test 2: Switch A → B → Switch Back (within 60s) + +**Current behavior:** + +- Rejoin existing session +- LiveKit bridge is dead (was kicked out) +- Apps disconnect repeatedly ❌ + +**After Priority 1 & 2 fixes:** + +- Health check detects broken bridge +- Reinitializes with fresh token +- Everything works ✅ + +### Test 3: Stay on One Server, Brief Network Hiccup + +**Expected:** + +- Grace period (glasses WS closed temporarily) +- Reconnect within 60s +- Resume existing session +- Everything works ✅ + +**Should still work after fixes** ✅ + +### Test 4: Laptop Sleep > 10 Minutes → Wake Up + +**Current behavior:** + +- Token expired during sleep (TTL: 10 min) +- Bridge resume fails: "token is expired" +- Bridge permanently broken ❌ + +**After Priority 1 fix:** + +- Health check detects failed bridge +- Reinitializes with fresh token +- Everything works ✅ + +### Test 5: Monitor Bridge Disconnections via Better Stack + +```sql +-- Check for LiveKit disconnections +service:livekit-bridge AND message:"Disconnected from LiveKit room" + +-- Check for token expiration on resume +service:livekit-bridge AND error:*token is expired* + +-- Check for identity conflicts (if they exist) +service:livekit-bridge AND error:*duplicate* +``` + +### Test 6: Long Network Disconnect (> 10 minutes) + +**Expected:** Same as laptop sleep scenario
- health check should recover ✅ + +--- + +## Open Questions + +1. **Does LiveKit SDK have built-in reconnection?** + - Check livekit-server-sdk-go documentation + - May already handle some scenarios + +2. **Should we use unique identity per server?** + - `cloud-agent:server-name:user@example.com` + - Prevents kicks, but multiple bridges in same room is wasteful + - Probably not needed if Priority 1 fix works + +3. **What happens to audio during bridge disconnect?** + - Is it buffered? + - How long until client notices? + - Need to test actual user experience + +4. **Is there a way to detect "kicked out" vs "network disconnect"?** + - Different error codes or events? + - Would help decide reconnection strategy + +--- + +## Known Breaking Scenarios + +### 1. Laptop Sleep > 10 Minutes 🔴 CRITICAL + +- Token expires (TTL: 10 min) +- Resume fails with expired token +- Bridge permanently broken +- **Fix:** Priority 1 (health monitoring) + +### 2. Server Switch Within Grace Period 🔴 CRITICAL + +- Bridge kicked out by other server +- Not reinitialized on reconnection +- Bridge permanently broken +- **Fix:** Priority 1 + Priority 2 + +### 3. Network Disconnect > 10 Minutes 🔴 CRITICAL + +- Same as laptop sleep +- Token expires during disconnect +- **Fix:** Priority 1 (health monitoring) + +### 4. Server Switch After Grace Period 🟢 WORKS + +- Session expires, creates new session +- Fresh bridge created +- **Already working** with grace period cleanup enabled + +--- + +## Next Steps + +1. **Implement Priority 1** (Bridge health monitoring + auto-reinitialize) - CRITICAL + - Solves laptop sleep, network disconnect, token expiration +2. **Implement Priority 2** (Always reinitialize on reconnection if previously enabled) + - Solves server switch within grace period +3. **Test all scenarios** (sleep, switch, network disconnect) +4. **Monitor Better Stack logs** for "resume connection failed" and "token is expired" +5.
**Consider Priority 3** (Go bridge auto-reconnection) as future enhancement + +--- + +## Related Documentation + +- [LIVEKIT-IDENTITY-CONFLICT-THEORY.md](./LIVEKIT-IDENTITY-CONFLICT-THEORY.md) - Detailed theory +- [TOKEN-EXPIRATION-ANALYSIS.md](./TOKEN-EXPIRATION-ANALYSIS.md) - Token lifetime issues +- [README.md](./README.md) - Original bug report +- [QUICK-START.md](../../packages/cloud-livekit-bridge/QUICK-START.md) - Better Stack setup + +--- + +## Summary + +**The Core Issues:** + +1. **Token expiration** (laptop sleep > 10 min) - bridge resume fails, no recovery +2. **Identity conflict** (server switches) - bridge kicked out, not reinitialized +3. **No health monitoring** - broken bridges go undetected + +**The Primary Fixes:** + +1. **Bridge health monitoring** - detect failures, auto-reinitialize with fresh token +2. **Always reinitialize on reconnection** - handle server switch scenarios + +**Status:** Ready to implement and test. diff --git a/cloud/issues/livekit-ios-bug/DEBUG-COMMANDS.md b/cloud/issues/livekit-ios-bug/DEBUG-COMMANDS.md new file mode 100644 index 0000000000..e410f3ca0c --- /dev/null +++ b/cloud/issues/livekit-ios-bug/DEBUG-COMMANDS.md @@ -0,0 +1,264 @@ +# Debug Commands for LiveKit iOS Bug + +## Better Stack Logging Setup (RECOMMENDED) + +### Quick Setup for Go Bridge Logs + +The Go LiveKit bridge logs are not currently being captured. Follow these steps to send them to Better Stack: + +1. **Create Better Stack HTTP Source:** + + ```bash + # Go to https://telemetry.betterstack.com/ + # Create new source with platform: "HTTP" + # Name: "LiveKit gRPC Bridge" + # Save the token and ingesting host + ``` + +2. **Add to .env:** + + ```bash + BETTERSTACK_SOURCE_TOKEN=your_token_here + BETTERSTACK_INGESTING_HOST=sXXX.region.betterstackdata.com + ``` + +3. 
**Update docker-compose.dev.yml:** + + ```yaml + livekit-bridge: + environment: + - BETTERSTACK_SOURCE_TOKEN=${BETTERSTACK_SOURCE_TOKEN} + - BETTERSTACK_INGESTING_HOST=${BETTERSTACK_INGESTING_HOST} + ``` + +4. **See full setup guide:** + ```bash + cat cloud/packages/cloud-livekit-bridge/BETTERSTACK_SETUP.md + ``` + +### Search Better Stack After Setup + +Once configured, you can search logs with: + +- `service:livekit-bridge AND error:*token is expired*` +- `service:livekit-bridge AND user_id:"isaiah@mentra.glass"` +- `service:livekit-bridge AND message:*JoinRoom*` +- `service:livekit-bridge AND level:error` + +## Porter Setup + +```bash +# Login to Porter +porter auth login + +# List clusters +porter cluster list + +# Set cluster (for centralus) +porter cluster set 4689 +``` + +## Check Environment Variables + +```bash +# Get kubeconfig +porter kubectl --print-kubeconfig > /tmp/kubeconfig.yaml +export KUBECONFIG=/tmp/kubeconfig.yaml + +# Find cloud pods +kubectl get pods -n default | grep cloud + +# Check environment variables for cloud-debug +kubectl exec cloud-debug-cloud-XXXXX -n default -- env | grep LIVEKIT + +# Check environment variables for cloud-livekit +kubectl exec cloud-livekit-cloud-XXXXX -n default -- env | grep LIVEKIT +``` + +## Get Logs + +```bash +# Cloud debug logs (last 100 lines with livekit mentions) +kubectl logs cloud-debug-cloud-XXXXX -n default --tail=100 | grep -i livekit + +# Cloud livekit logs +kubectl logs cloud-livekit-cloud-XXXXX -n default --tail=100 | grep -i livekit + +# All recent logs without filter +kubectl logs cloud-debug-cloud-XXXXX -n default --tail=200 + +# Follow logs in real-time +kubectl logs cloud-debug-cloud-XXXXX -n default -f +``` + +## Check for Bridge Container/Sidecar + +```bash +# List containers in a pod +kubectl get pod cloud-debug-cloud-XXXXX -n default -o jsonpath='{.spec.containers[*].name}' + +# If bridge is a sidecar, get its logs +kubectl logs cloud-debug-cloud-XXXXX -n default -c livekit-bridge 
--tail=200 + +# Check all pods with "bridge" in name +kubectl get pods -n default | grep bridge +``` + +## Get gRPC Bridge Logs + +```bash +# Find bridge pods +kubectl get pods -n default | grep livekit + +# Get bridge logs +kubectl logs cloud-livekit-cloud-XXXXX -n default --tail=500 + +# Search for specific user +kubectl logs cloud-livekit-cloud-XXXXX -n default --tail=500 | grep "isaiah@mentra.glass" + +# Search for errors +kubectl logs cloud-livekit-cloud-XXXXX -n default --tail=500 | grep -i "error\|fail" +``` + +## Test Specific User Flow + +```bash +# Watch logs for specific user in real-time +kubectl logs cloud-debug-cloud-XXXXX -n default -f | grep "isaiah@mentra.glass" + +# Get all logs for user in last hour +kubectl logs cloud-debug-cloud-XXXXX -n default --since=1h | grep "isaiah@mentra.glass" +``` + +## Check Services and Endpoints + +```bash +# List services +kubectl get svc -n default | grep -E "livekit|cloud" + +# Get service details +kubectl describe svc cloud-debug-cloud -n default +kubectl describe svc cloud-livekit-cloud -n default + +# Check endpoints +kubectl get endpoints -n default | grep cloud +``` + +## Compare Regions + +```bash +# Centralus (cloud-debug) +kubectl exec cloud-debug-cloud-XXXXX -n default -- env | grep -E "LIVEKIT|REGION" | sort + +# Cloud-livekit +kubectl exec cloud-livekit-cloud-XXXXX -n default -- env | grep -E "LIVEKIT|REGION" | sort + +# Switch to france cluster +porter cluster set 4696 +export KUBECONFIG=/tmp/kubeconfig-france.yaml +porter kubectl --print-kubeconfig > /tmp/kubeconfig-france.yaml + +# Check france pods +kubectl get pods -n default | grep cloud + +# Compare france env vars +kubectl exec cloud-france-XXXXX -n default -- env | grep LIVEKIT +``` + +## BetterStack Queries + +```bash +# Already have connection established in the session +# Search for user logs with livekit +SELECT dt, raw +FROM remote(t373499_augmentos_logs) +WHERE raw LIKE '%isaiah@mentra.glass%' + AND raw LIKE '%livekit%' +ORDER BY dt DESC 
+LIMIT 100 + +# Search for errors +SELECT dt, raw +FROM remote(t373499_augmentos_logs) +WHERE (raw LIKE '%livekit%' OR raw LIKE '%LiveKit%') + AND (raw LIKE '%error%' OR raw LIKE '%Error%' OR raw LIKE '%fail%') +ORDER BY dt DESC +LIMIT 100 + +# Filter by region +SELECT dt, raw +FROM remote(t373499_augmentos_logs) +WHERE raw LIKE '%livekit%' + AND raw LIKE '%region%' + AND (raw LIKE '%centralus%' OR raw LIKE '%france%') +ORDER BY dt DESC +LIMIT 100 +``` + +## Test Region Switch Manually + +1. **Connect to cloud-debug (centralus):** + + ```bash + # In mobile app settings, select cloud-debug + # Watch logs: + kubectl logs cloud-debug-cloud-XXXXX -n default -f | grep "isaiah@mentra.glass" + ``` + +2. **Verify LiveKit is working:** + - Enable microphone + - Check for "Bridge health" logs + - Confirm audio is flowing + +3. **Switch to cloud-livekit:** + + ```bash + # In mobile app settings, select cloud-livekit + # Watch new logs: + kubectl logs cloud-livekit-cloud-XXXXX -n default -f | grep "isaiah@mentra.glass" + ``` + +4. 
**Capture logs from OLD region:** + ```bash + # Check if old session disposed + kubectl logs cloud-debug-cloud-XXXXX -n default --tail=200 | grep -A 20 "Disposing.*isaiah@mentra.glass" + ``` + +## Mobile Debug (if needed) + +```bash +# iOS logs (using Xcode or react-native) +npx react-native log-ios | grep -i livekit + +# Android logs +adb logcat | grep -i livekit +``` + +## Quick Health Check + +```bash +# Check all cloud services are running +kubectl get pods -n default | grep -E "cloud-debug|cloud-livekit|cloud-prod" + +# Check if pods are healthy +kubectl get pods -n default -o wide | grep cloud + +# Restart a pod if needed (rolling restart) +kubectl rollout restart deployment/cloud-debug-cloud -n default +``` + +## Useful Porter Commands + +```bash +# List all apps +porter app list + +# Get app details +porter app get cloud-debug + +# View app logs through porter (if supported) +porter logs cloud-debug + +# Check app status +porter app status cloud-debug +``` diff --git a/cloud/issues/livekit-ios-bug/LIVEKIT-IDENTITY-CONFLICT-THEORY.md b/cloud/issues/livekit-ios-bug/LIVEKIT-IDENTITY-CONFLICT-THEORY.md new file mode 100644 index 0000000000..59c62f22ed --- /dev/null +++ b/cloud/issues/livekit-ios-bug/LIVEKIT-IDENTITY-CONFLICT-THEORY.md @@ -0,0 +1,690 @@ +# LiveKit Identity Conflict Theory + +**Date:** 2025-10-17 +**Status:** 🔍 Theory - Needs Testing +**Priority:** High + +--- + +## Executive Summary + +**Theory:** When a user switches between cloud servers, both servers try to join the same LiveKit room with the same participant identity, causing the old server's bridge to be kicked out. If the user switches back before the session expires (60 second grace period), they rejoin a session where the LiveKit bridge was already kicked out and has no mechanism to rejoin.
+ +--- + +## The LiveKit Identity Setup + +### How Identities Are Generated + +**File:** `LiveKitManager.ts:182-185` + +```typescript +// mintAgentBridgeToken() +const at = new AccessToken(this.apiKey, this.apiSecret, { + identity: `cloud-agent:${this.session.userId}`, // ← Same across all servers! + ttl: "600000m", // 10 minutes +}); +``` + +**Key Facts:** + +- **Identity:** `cloud-agent:user@example.com` +- **Room name:** `user@example.com` (same as userId) +- **Identity is the SAME across all servers** for the same user + +### LiveKit Room Participant Rules + +From LiveKit documentation: + +- Each participant in a room must have a **unique identity** +- If a participant joins with an identity that's already in the room, **the previous participant is kicked out** +- The kicked participant receives an `OnDisconnected` event +- **There is no automatic reconnection** - the application must handle reconnection + +--- + +## The Problem Scenario + +### Scenario 1: Switch Server A → Server B (Normal Case) + +``` +Time: T+0s +[User on Server A - cloud-local] +✅ LiveKit bridge connected + - Identity: cloud-agent:user@example.com + - Room: user@example.com + - Status: Connected, receiving audio + +Time: T+10s +[User switches to Server B - cloud-prod] +✅ Mobile connects to Server B +✅ Server B creates new session +✅ Server B starts LiveKit bridge + +[LiveKit Room] +❌ Server B joins with identity: cloud-agent:user@example.com +❌ LiveKit kicks Server A's bridge out (duplicate identity!) +❌ Server A receives OnDisconnected event + +[Server A] +❌ Glasses WebSocket closed (code 1000) +❌ LiveKit bridge kicked out by Server B +❌ Session stays alive (grace period cleanup disabled) +❌ No reconnection logic in Go bridge +``` + +**Result:** Server A's session is now broken (no LiveKit bridge), but stays alive for 60 seconds.
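The duplicate-identity rule behind Scenario 1 can be illustrated with a toy in-memory model. This is a sketch only and does not use the LiveKit SDK: `ToyRoom`, `join`, `holder`, and the `onDisconnected` callback are hypothetical names chosen for illustration, and the eviction behaviour is the one described in the participant rules above.

```typescript
// Toy model of a room that allows one participant per identity.
// NOT the LiveKit SDK: every name here is hypothetical.
type ToyParticipant = {
  identity: string;
  server: string;
  onDisconnected: () => void;
};

class ToyRoom {
  private participants = new Map<string, ToyParticipant>();

  // Joining with an identity that is already present evicts the previous holder.
  join(p: ToyParticipant): void {
    const previous = this.participants.get(p.identity);
    if (previous) previous.onDisconnected();
    this.participants.set(p.identity, p);
  }

  // Which server currently holds this identity, if any.
  holder(identity: string): string | undefined {
    return this.participants.get(identity)?.server;
  }
}

const events: string[] = [];
const room = new ToyRoom();
const identity = "cloud-agent:user@example.com";

// Server A joins first, then Server B joins with the same identity.
room.join({ identity, server: "A", onDisconnected: () => events.push("A kicked") });
room.join({ identity, server: "B", onDisconnected: () => events.push("B kicked") });
```

After the second `join`, Server B holds the identity and Server A has received its disconnect callback. Nothing in the model brings A back, which mirrors the missing reconnection logic on the bridge side.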
+ +### Scenario 2: Laptop Sleep/Wake with Token Expiration + +``` +Time: T+0s +[User on Server A - laptop awake] +✅ LiveKit bridge connected + - Identity: cloud-agent:user@example.com + - Token TTL: 10 minutes (600 seconds) + - Token issued at: T+0s + - Token expires at: T+600s + +Time: T+60s +[Laptop goes to sleep] +❌ Network connections suspended +❌ LiveKit connection frozen (not disconnected, just suspended) +❌ Session stays alive on server + +Time: T+660s (11 minutes later - laptop wakes up) +[Laptop wakes, network restores] +✅ LiveKit SDK attempts to resume connection +❌ Token expired (issued at T+0s, expired at T+600s) +❌ Resume fails: "invalid token: token is expired (exp)" +❌ Bridge permanently broken + +[Server State] +✅ Session still alive (glasses never explicitly disconnected) +✅ Glasses WebSocket may reconnect successfully +❌ LiveKit bridge: DISCONNECTED (token expired on resume) +❌ No mechanism to detect bridge is down +❌ No mechanism to request new token +❌ No mechanism to rejoin room +``` + +**Result:** After laptop wakes from sleep, LiveKit bridge is permanently broken due to expired token on resume attempt.
+ +**Log Evidence:** + +``` +2025/10/18 04:12:27 "msg"="resume connection failed" +"error"="unauthorized: invalid token: ..., error: go-jose/go-jose/jwt: +validation failed, token is expired (exp)" +``` + +### Scenario 3: Switch Back Before Grace Period Expires + +``` +Time: T+0s +[User on Server A] +✅ LiveKit bridge connected + +Time: T+10s +[User switches to Server B] +✅ Server B LiveKit bridge connects +❌ Server A LiveKit bridge KICKED OUT + +Time: T+30s (before 60 second grace period) +[User switches BACK to Server A] +✅ Glasses reconnect to Server A +✅ UserSession.createOrReconnect() finds existing session +✅ Returns { reconnection: true } + +[handleConnectionInit with reconnection: true] +❌ Apps NOT restarted (reconnection flag) +❌ LiveKit NOT reinitialized (depends on livekitRequested flag) + +[Server A Session State] +✅ Glasses WebSocket: OPEN +❌ LiveKit bridge: DISCONNECTED (was kicked out) +❌ No mechanism to detect bridge is disconnected +❌ No mechanism to rejoin LiveKit room +``` + +**Result:** User is back on Server A, but LiveKit bridge is broken and never recovers! + +### Scenario 4: Token Expiration on Kicked Bridge + +``` +Time: T+0s +[Server A LiveKit bridge] +✅ Connected with token (TTL: 10 minutes) + +Time: T+10s +[Kicked out by Server B] +❌ Disconnected from room + +Time: T+11m (11 minutes later - token expired) +[If bridge tried to reconnect] +❌ Token expired +❌ Cannot rejoin room +❌ No mechanism to request new token +``` + +**Result:** Even if bridge tried to reconnect, token would be expired.
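The timeline arithmetic in Scenarios 2 to 4 comes down to one comparison: a token issued at time `t` with a TTL of 600 seconds is no longer usable once the clock reaches `t + 600`. The sketch below works through the numbers from the scenarios. The helper name `isTokenExpired` is hypothetical, and treating the expiry instant itself as expired is an assumption (a real JWT validator may also allow some clock-skew leeway).

```typescript
// Hypothetical helper for the scenario timelines; all values are in seconds.
function isTokenExpired(issuedAtSec: number, ttlSec: number, nowSec: number): boolean {
  return nowSec >= issuedAtSec + ttlSec;
}

const TTL_SEC = 10 * 60; // the 10-minute TTL the scenarios assume

// Scenario 2: token issued at T+0s, laptop wakes and resumes at T+660s.
const expiredOnWake = isTokenExpired(0, TTL_SEC, 660);

// Scenario 3: user switches back at T+30s, well inside the TTL.
const expiredOnSwitchBack = isTokenExpired(0, TTL_SEC, 30);

console.log({ expiredOnWake, expiredOnSwitchBack });
```

The comparison shows why the two failures differ: in Scenario 3 the token is still valid and the bridge is broken only because it was kicked out, while in Scenarios 2 and 4 a fresh token is required before any rejoin can succeed.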
+ +--- + +## Evidence from Code + +### No Reconnection Logic in Go Bridge + +**File:** `service.go:125-133` + +```go +OnDisconnected: func() { + log.Printf("Disconnected from LiveKit room: %s", req.RoomName) + s.bsLogger.LogWarn("Disconnected from LiveKit room", map[string]interface{}{ + "user_id": req.UserId, + "room_name": req.RoomName, + }) +} +``` + +**What's missing:** + +- No reconnection attempt when disconnected +- No token refresh mechanism +- No notification to TypeScript cloud that bridge is down +- Session just sits there with broken LiveKit + +### No Bridge Health Check + +**File:** `LiveKitGrpcClient.ts` + +There's no mechanism to: + +- Check if Go bridge is still connected to LiveKit +- Detect if bridge was kicked out +- Automatically rejoin if disconnected +- Request new token if current token expired + +### Reconnection Skips LiveKit Initialization + +**File:** `websocket-glasses.service.ts:564-650` + +```typescript +if (!reconnection) { + // ← Only runs on NEW connections + await userSession.appManager.startApp(SYSTEM_DASHBOARD_PACKAGE_NAME); + await userSession.appManager.startPreviouslyRunningApps(); +} + +// LiveKit initialization +if (livekitRequested) { + // ← Only if explicitly requested + const livekitInfo = await userSession.liveKitManager.handleLiveKitInit(); +} +``` + +**Issue:** On reconnection, LiveKit might not be reinitialized even if it was kicked out.
+ +--- + +## Why This Explains the Observed Behavior + +### Observation 1: Apps Keep Disconnecting (Code 1006) + +Apps disconnect because: + +- LiveKit bridge is disconnected +- No audio flowing through the bridge +- Apps detect the broken state and disconnect +- Apps try to reconnect, but state is still broken +- Cycle repeats + +### Observation 2: Local Server Still Getting Audio + +After switching to prod: + +- Local server's session still alive +- Apps still connected to local server +- But LiveKit bridge was kicked out by prod server +- Audio from mobile goes to prod (working) +- Audio from local bridge is dead (kicked out) + +### Observation 3: Repeated Reconnect Cycles + +Apps reconnect every ~15-20 seconds because: + +- Apps cache the server URL +- They keep trying to reconnect +- Reconnection succeeds initially +- But LiveKit bridge is broken +- Connection fails after ~15 seconds +- Retry cycle continues + +--- + +## What Should Happen vs What Does Happen + +### Ideal Flow: Server Switch with Grace Period + +``` +1. User switches Server A → Server B +2. Server A's glasses WebSocket closes +3. Server A's LiveKit bridge kicked out by Server B +4. Server A's session enters grace period (60s) +5. After 60s, Server A's session disposed ✅ +6. If user switches back after 60s: + - Creates NEW session + - NEW LiveKit bridge + - Everything works ✅ +``` + +### Current Broken Flow: Switch Back Before Grace Period + +``` +1. User switches Server A → Server B +2. Server A's LiveKit bridge kicked out +3. User switches back to Server A (before 60s) +4. Reconnects to EXISTING session +5. LiveKit NOT reinitialized (reconnection flag) +6. Session has broken LiveKit bridge +7. No mechanism to detect or fix +8.
Everything broken ❌ +``` + +--- + +## The Multiple Issues at Play + +### Issue 1: Grace Period Cleanup Disabled + +```typescript +const GRACE_PERIOD_CLEANUP_ENABLED = false; // ← Sessions never expire +``` + +**Impact:** Zombie sessions stay alive forever, not just 60 seconds. + +### Issue 2: Same Identity Across Servers + +```typescript +identity: `cloud-agent:${this.session.userId}`; // ← Duplicate identity! +``` + +**Impact:** New server kicks old server out of LiveKit room. + +### Issue 3: No Reconnection Logic in Go Bridge + +```go +OnDisconnected: func() { + log.Printf("Disconnected...") + // ← No reconnection attempt! +} +``` + +**Impact:** Once kicked out, bridge never rejoins. + +### Issue 4: No Bridge Health Monitoring + +**Impact:** TypeScript doesn't know bridge is disconnected. + +### Issue 5: Reconnection Doesn't Reinitialize LiveKit + +```typescript +if (!reconnection) { + // ← Skipped on reconnection + // Start LiveKit +} +``` + +**Impact:** Reconnecting to session with dead bridge doesn't fix it.
+ +### Issue 6: Token Expiration (10 minutes) + +```typescript +// Token TTL in mintAgentBridgeToken() +ttl: "600000m"; // 10 minutes +``` + +**Impact:** + +- Token expires after 10 minutes +- Laptop sleep > 10 minutes = expired token on wake +- Bridge resume fails with "token is expired" +- No mechanism to request new token +- No automatic reconnection + +### Issue 7: No Token Refresh on Resume + +**Impact:** + +- LiveKit SDK tries to resume with old token +- If token expired (sleep > 10 min), resume fails +- Bridge left in broken state permanently + +--- + +## Potential Solutions + +### Solution 1: Unique Identity Per Server ✅ + +```typescript +// Include server identifier in identity +identity: `cloud-agent:${serverName}:${this.session.userId}`; +// Example: cloud-agent:cloud-prod:user@example.com +``` + +**Pros:** + +- Each server has unique identity +- No more kicking out other servers +- Multiple servers can coexist in same room (though pointless) + +**Cons:** + +- Still have zombie sessions +- Multiple bridges in same room is wasteful + +### Solution 2: Bridge Reconnection Logic ✅ + +```go +// In Go bridge +OnDisconnected: func() { + log.Printf("Disconnected, attempting to rejoin...") + + // Request new token from TypeScript + newToken := requestNewToken(req.UserId) + + // Reconnect to room + err := room.Reconnect(newToken) + if err != nil { + log.Printf("Failed to reconnect: %v", err) + } +} +``` + +**Pros:** + +- Bridge automatically rejoins if kicked out +- Handles network disconnects too + +**Cons:** + +- Need token refresh mechanism +- Need gRPC method for token refresh + +### Solution 3: Token Refresh on Resume Failure ✅ + +```go +// In Go bridge - detect resume failure +OnDisconnected: func() { + log.Printf("Disconnected from LiveKit room") +} + +// Add new callback for connection errors +OnConnectionQualityChanged: func(quality lksdk.ConnectionQuality) { + if quality == lksdk.ConnectionQualityLost { + // Request new token from TypeScript +
newToken := requestNewTokenViaGRPC(req.UserId) + + // Rejoin room with fresh token + err := rejoinRoom(newToken) + } +} +``` + +**Or simpler approach in TypeScript:** + +```typescript +// Detect bridge health issues and refresh +setInterval(async () => { + const bridgeHealthy = await grpcBridge.healthCheck(); + + if (!bridgeHealthy) { + logger.warn("Bridge unhealthy, reinitializing..."); + // This creates new bridge with fresh token + await liveKitManager.handleLiveKitInit(); + } +}, 30000); // Every 30 seconds +``` + +**Pros:** + +- Handles token expiration automatically +- Works for both sleep/wake and kicked scenarios +- Simple to implement on TypeScript side + +**Cons:** + +- Slight delay before detection/recovery +- Need to dispose old bridge first + +### Solution 4: Always Reinitialize LiveKit on Reconnection ✅ + +```typescript +// Always check/reinitialize LiveKit on reconnection +const shouldInitLiveKit = livekitRequested || userSession.livekitRequested; + +if (shouldInitLiveKit) { + // Check if bridge is still connected + const bridgeConnected = await userSession.liveKitManager.checkBridgeHealth(); + + if (!bridgeConnected) { + // Reinitialize bridge + await userSession.liveKitManager.handleLiveKitInit(); + } +} +``` + +**Pros:** + +- Fixes broken bridges on reconnection +- User can switch back and forth without issues + +**Cons:** + +- Might restart bridge unnecessarily + +### Solution 5: Enable Grace Period Cleanup ✅ (Already Done) + +```typescript +const GRACE_PERIOD_CLEANUP_ENABLED = true; +``` + +**Pros:** + +- Zombie sessions die after 60 seconds +- Prevents most multi-server conflicts +- Forces new session on return (with fresh bridge) + +**Cons:** + +- None (this is the correct behavior) + +### Solution 6: Bridge Health Monitoring ✅ + +```typescript +// Periodically check bridge health +setInterval(async () => { + const isConnected = await grpcBridge.healthCheck(); + if (!isConnected) { + logger.warn("Bridge disconnected, reconnecting..."); +
await liveKitManager.handleLiveKitInit(); + } +}, 30000); // Check every 30 seconds +``` + +**Pros:** + +- Detects broken bridges automatically +- Can recover without user intervention + +**Cons:** + +- Additional overhead +- Might be unnecessary with other fixes + +--- + +## Recommended Fix Strategy + +### Immediate (Critical): + +1. ✅ **Enable grace period cleanup** (already done) + - Sessions expire after 60 seconds + - Prevents most multi-server conflicts + +### Short-term (Important): + +2. **Add token refresh / bridge health monitoring** + - Detect when bridge is disconnected + - Handle resume failures (expired tokens) + - Reinitialize bridge automatically + +3. **Always reinitialize LiveKit on reconnection** + - Check if `livekitRequested` was previously true + - Reinitialize bridge even on reconnections + - Fixes broken bridges when switching back + +4. **Add bridge health check** + - Detect when bridge is disconnected + - Log warnings for visibility + - Eventually trigger reconnection + +### Long-term (Nice to have): + +5. **Bridge auto-reconnection logic** + - Add reconnection logic to Go bridge + - Add token refresh gRPC method + - Handle network disconnects gracefully + +6. 
**Unique identity per server** (optional) + - Prevents kicking out other servers + - Allows multiple servers in grace period + - May not be necessary with other fixes + +--- + +## Testing Plan + +### Test 1: Switch Server A → Server B → Wait 70s → Switch Back to A + +**Expected:** New session, new bridge, everything works ✅ + +### Test 2: Switch Server A → Server B → Switch Back (within 60s) + +**Current:** Broken bridge, apps disconnect repeatedly ❌ +**After Fix:** Bridge reinitialized, everything works ✅ + +### Test 3: Laptop Sleep > 10 Minutes → Wake Up + +**Current:** + +- Token expired during sleep +- Resume fails: "token is expired" +- Bridge permanently broken ❌ + +**After Fix:** + +- Bridge health check detects failure +- Reinitializes with fresh token +- Everything works ✅ + +### Test 4: Stay on One Server, Network Hiccup + +**Expected:** Grace period, reconnect, resume ✅ +**Should still work with fixes** ✅ + +### Test 5: Check Better Stack Logs + +**Search for:** + +```sql +service:livekit-bridge AND message:"Disconnected from LiveKit room" +``` + +**Expected:** Should see kicks when servers conflict + +--- + +## Known Scenarios That Break LiveKit + +### 1. Server Switch Within Grace Period + +- User switches Server A → B → A (within 60s) +- Bridge kicked out, not reinitialized +- **Fix:** Always reinitialize on reconnection + +### 2. Laptop Sleep > 10 Minutes + +- Token expires during sleep (TTL: 10 min) +- Resume fails with expired token +- **Fix:** Bridge health monitoring + auto-reinitialize + +### 3. Network Disconnect > 10 Minutes + +- Same as laptop sleep +- Token expires during long disconnect +- **Fix:** Bridge health monitoring + auto-reinitialize + +### 4. Multiple Servers Same User + +- Identity conflict causes kicks +- **Fix:** Grace period cleanup (done) + reinitialize on reconnection + +--- + +## Open Questions + +1. 
**Does LiveKit SDK have built-in reconnection?** + - Need to check livekit-server-sdk-go documentation + - May already handle some reconnections + +2. **What happens to audio packets during disconnect?** + - Are they buffered? + - Are they dropped? + - How long until client notices? + +3. **Can we detect the "kicked out" vs "network disconnect" scenario?** + - Different error codes? + - Different events? + - Important for deciding whether to reconnect + +4. **Should we prevent multiple servers from same user entirely?** + - Block new connection if session exists elsewhere? + - Or just handle gracefully? + +--- + +## Related Files + +- `LiveKitManager.ts:182` - Identity generation +- `service.go:125` - OnDisconnected callback (no reconnection) +- `websocket-glasses.service.ts:564` - handleConnectionInit (reconnection logic) +- `LiveKitGrpcClient.ts` - gRPC bridge client (no health checks) + +--- + +## Conclusion + +The theory explains all observed symptoms: + +- ✅ Memory leaks (sessions never expire) +- ✅ Apps disconnecting repeatedly (broken bridge) +- ✅ Local server still getting audio (bridge kicked out by prod) +- ✅ Pattern happens after server switches +- ✅ "resume connection failed" with expired token after laptop sleep + +**All scenarios result in:** Broken LiveKit bridge with no automatic recovery mechanism. + +The fix is multi-layered: + +1. ✅ Enable grace period cleanup (done) +2. 🔴 Add token refresh / bridge health monitoring (critical for laptop sleep) +3. 🔴 Reinitialize LiveKit on reconnection (critical for server switches) +4. 🟡 Add bridge reconnection logic (future enhancement) + +**Primary fixes needed:** + +1. Bridge health monitoring that detects disconnections and reinitializes +2. 
Always reinitialize LiveKit on reconnection when it was previously enabled diff --git a/cloud/issues/livekit-ios-bug/README.md b/cloud/issues/livekit-ios-bug/README.md new file mode 100644 index 0000000000..814f7898f3 --- /dev/null +++ b/cloud/issues/livekit-ios-bug/README.md @@ -0,0 +1,589 @@ +# LiveKit iOS Region Switching Bug + +**Status:** 🔍 Investigation +**Priority:** High +**Affects:** iOS mobile clients using LiveKit audio transport +**Date Created:** 2025-10-17 + +--- + +## 🚀 Quick Start: Better Stack Logging Setup + +**NEW:** We've added Better Stack HTTP logging to capture Go bridge logs! This is critical for debugging the token expiration issue. + +### 5-Minute Setup + +1. **Create Better Stack HTTP Source** at https://telemetry.betterstack.com/ + - Platform: HTTP + - Name: "LiveKit gRPC Bridge" + - Save the token and ingesting host + +2. **Test the connection:** + + ```bash + cd cloud/packages/cloud-livekit-bridge + export BETTERSTACK_SOURCE_TOKEN="your_token" + export BETTERSTACK_INGESTING_HOST="your_host" + ./test-betterstack.sh + ``` + +3. **Add to `.env` and update `docker-compose.dev.yml`** + +4. 
**Update Go code** - See [QUICK-START.md](../../packages/cloud-livekit-bridge/QUICK-START.md) + +### 📚 Documentation + +- **Quick Start**: [cloud/packages/cloud-livekit-bridge/QUICK-START.md](../../packages/cloud-livekit-bridge/QUICK-START.md) +- **Full Setup**: [cloud/packages/cloud-livekit-bridge/BETTERSTACK_SETUP.md](../../packages/cloud-livekit-bridge/BETTERSTACK_SETUP.md) +- **Token Analysis**: [TOKEN-EXPIRATION-ANALYSIS.md](./TOKEN-EXPIRATION-ANALYSIS.md) + +### Why This Matters + +The token expiration errors we're seeing in the logs: + +``` +"error"="invalid token: ..., error: token is expired (exp)" +``` + +Are now searchable in Better Stack: + +``` +service:livekit-bridge AND error:*token is expired* +service:livekit-bridge AND user_id:"isaiah@mentra.glass" +``` + +--- + +## Problem Description + +When an iOS mobile client switches between cloud regions (e.g., `cloud-debug` in `centralus` → `cloud-livekit` or `france`), LiveKit audio completely breaks. The initial connection works fine, but switching regions causes the LiveKit connection to fail. + +**Key observation:** Each region is a separate cloud instance with its own UserSession. Regions don't communicate with each other. + +--- + +## Flow Analysis + +### 1. How Mobile Gets LiveKit URL + +**File:** `mobile/src/managers/SocketComms.ts` + +```typescript +private handle_connection_ack(msg: any) { + console.log("SocketCommsTS: connection ack, connecting to livekit") + livekitManager.connect() + GlobalEventEmitter.emit("APP_STATE_CHANGE", msg) +} +``` + +**File:** `mobile/src/managers/LivekitManager.ts` + +```typescript +public async connect() { + try { + const {url, token} = await restComms.getLivekitUrlAndToken() + console.log(`LivekitManager: Connecting to room: ${url}, ${token}`) + this.room = new Room() + await this.room.connect(url, token) + // ... 
+ } +} +``` + +**File:** `mobile/src/managers/RestComms.tsx` + +```typescript +public async getLivekitUrlAndToken(): Promise<{url: string; token: string}> { + const response = await this.authenticatedRequest("GET", "/api/client/livekit/token") + const {url, token} = response.data + return {url, token} +} +``` + +**Steps:** + +1. Mobile connects to cloud WebSocket (glasses-ws) +2. Cloud sends `CONNECTION_ACK` message +3. Mobile's `SocketComms` receives `CONNECTION_ACK`, calls `livekitManager.connect()` +4. `LivekitManager.connect()` calls `restComms.getLivekitUrlAndToken()` +5. REST API call to `/api/client/livekit/token` on the cloud server +6. Cloud returns `{url, token}` from its environment variables +7. Mobile connects to LiveKit using that URL and token + +### 2. How Cloud Sends LiveKit Info + +**File:** `cloud/packages/cloud/src/services/websocket/websocket-glasses.service.ts` + +```typescript +private async handleConnectionInit( + userSession: UserSession, + reconnection: boolean, + livekitRequested = false, +): Promise<void> { + // ... start apps ... + + const ackMessage: ConnectionAck = { + type: CloudToGlassesMessageType.CONNECTION_ACK, + sessionId: userSession.sessionId, + timestamp: new Date(), + }; + + if (livekitRequested) { + try { + const livekitInfo = await userSession.liveKitManager.handleLiveKitInit(); + if (livekitInfo) { + (ackMessage as any).livekit = { + url: livekitInfo.url, + roomName: livekitInfo.roomName, + token: livekitInfo.token, + }; + } + } catch (error) { + // ... + } + } + + userSession.websocket.send(JSON.stringify(ackMessage)); +} +``` + +**File:** `cloud/packages/cloud/src/services/session/livekit/LiveKitManager.ts` + +```typescript +constructor(session: UserSession) { + this.apiKey = process.env.LIVEKIT_API_KEY || ""; + this.apiSecret = process.env.LIVEKIT_API_SECRET || ""; + this.livekitUrl = process.env.LIVEKIT_URL || ""; + // ... 
+} + +async handleLiveKitInit(): Promise<{ + url: string; + roomName: string; + token: string; +} | null> { + const url = this.getUrl(); // Returns process.env.LIVEKIT_URL + const roomName = this.getRoomName(); // Returns userId + const token = await this.mintClientPublishToken(); + + if (!url || !roomName || !token) { + return null; + } + + await this.startBridgeSubscriber({ url, roomName }); + return { url, roomName, token }; +} +``` + +**Environment Variables (per region):** + +- `LIVEKIT_URL` - LiveKit WebSocket URL +- `LIVEKIT_API_KEY` - LiveKit API key +- `LIVEKIT_API_SECRET` - LiveKit API secret + +### 3. How gRPC Bridge Connects + +**File:** `cloud/packages/cloud/src/services/session/livekit/LiveKitGrpcClient.ts` + +```typescript +constructor(userSession: UserSession, bridgeUrl?: string) { + const socketPath = process.env.LIVEKIT_GRPC_SOCKET; + if (socketPath) { + this.bridgeUrl = `unix:${socketPath}`; + } else { + this.bridgeUrl = + bridgeUrl || + process.env.LIVEKIT_GRPC_BRIDGE_URL || + "livekit-bridge:9090"; + } +} +``` + +**Environment Variables:** + +- `LIVEKIT_GRPC_BRIDGE_URL` - URL to Go gRPC bridge (e.g., `livekit-bridge:9090`) +- `LIVEKIT_GRPC_SOCKET` - Unix socket path (alternative to TCP) + +--- + +## Hypotheses + +### Hypothesis 1: Mobile Doesn't Disconnect from Old LiveKit Session + +**Problem:** When switching regions, mobile might: + +- Not disconnect from old LiveKit room +- Keep sending audio to old region's LiveKit +- Try to connect to new region's LiveKit while still connected to old + +**Evidence Needed:** + +- Mobile logs showing LiveKit disconnect/reconnect +- Check if `livekitManager.disconnect()` is called before `connect()` + +**Files to Check:** + +- `mobile/src/managers/LivekitManager.ts` - Does it have disconnect logic? +- `mobile/src/managers/WebSocketManager.ts` - Does it clean up LiveKit on region switch? 
+ +### Hypothesis 2: Different Regions Point to Different LiveKit Instances + +**Problem:** Each region might have: + +- Different `LIVEKIT_URL` environment variable +- Pointing to different LiveKit cloud instances +- But using the same `roomName` (userId) + +**Result:** User tries to join room on different LiveKit instance, but mobile is still connected to old instance. + +**Evidence Needed:** + +- Check `LIVEKIT_URL` for each region: + - `cloud-debug` (centralus) + - `cloud-livekit` (centralus) + - `france` region +- Are they the same LiveKit instance or different? + +**How to Check:** + +```bash +# Get env vars for each deployment +kubectl exec -n default cloud-debug-cloud-XXX -- env | grep LIVEKIT +kubectl exec -n default cloud-livekit-cloud-XXX -- env | grep LIVEKIT +``` + +### Hypothesis 3: Token Mismatch or Expired Token + +**Problem:** Mobile gets token from old region, tries to use it with new region's LiveKit info. + +**Evidence Needed:** + +- Check if token is region-specific +- Check token TTL (currently 300 seconds / 5 minutes) +- Mobile logs showing token errors + +### Hypothesis 4: gRPC Bridge Conflict + +**Problem:** Both regions' gRPC bridges might be trying to: + +- Join the same LiveKit room (roomName = userId) +- Publish/subscribe to same audio streams +- Causing conflicts + +**Evidence Needed:** + +- Check if old region's gRPC bridge properly disconnects +- Check Go bridge logs for conflicts +- Check if `UserSession.dispose()` properly cleans up LiveKit + +**Files:** + +- `cloud/packages/cloud/src/services/session/UserSession.ts` (line 596-687: dispose method) +- `cloud/packages/cloud/src/services/session/livekit/LiveKitManager.ts` (line 312-323: dispose method) +- `cloud/packages/cloud/src/services/session/livekit/LiveKitGrpcClient.ts` (line 765-775: dispose/disconnect) + +### Hypothesis 5: REST API Caching or Wrong URL + +**Problem:** Mobile might be: + +- Caching old region's REST API URL +- Calling `/api/client/livekit/token` on old region 
instead of new region +- Getting stale LiveKit URL + +**Evidence Needed:** + +- Check if `RestComms` base URL updates when switching regions +- Check if mobile properly updates API endpoint on region switch + +**Files:** + +- `mobile/src/managers/RestComms.tsx` - Check if `baseUrl` updates on region change + +--- + +## Required Information to Debug + +### From Mobile (iOS) + +1. **Logs during region switch:** + + ``` + - "LivekitManager: Connecting to room: ${url}, ${token}" + - "LivekitManager: Connected to room" + - "LivekitManager: Disconnected from room" + ``` + +2. **Check if disconnect is called:** + - Does `LivekitManager` have a `disconnect()` method? + - Is it called before switching regions? + +3. **API endpoint used:** + - What URL is `restComms.getLivekitUrlAndToken()` calling? + - Is it the old region or new region? + +### From Cloud (Both Regions) + +1. **Environment variables:** + + ```bash + # For each region + echo $LIVEKIT_URL + echo $LIVEKIT_API_KEY + echo $LIVEKIT_GRPC_BRIDGE_URL + ``` + +2. **Logs from old region (when user switches away):** + - Does UserSession dispose? + - Does LiveKitManager dispose? + - Does gRPC bridge disconnect from LiveKit room? + +3. **Logs from new region (when user switches to it):** + - Does CONNECTION_ACK include LiveKit info? + - Does gRPC bridge successfully join LiveKit room? + +### From Go Bridge + +**🚨 CRITICAL FINDING: Go bridge logs are NOT being captured!** + +1. **Problem discovered:** + - The `start.sh` script runs Go bridge in background: `./livekit-bridge &` + - stdout/stderr from Go process is NOT being redirected + - `kubectl logs` only shows TypeScript/Bun logs + - Go bridge errors are invisible! + +2. **Evidence:** + - File: `/app/start.sh` in pod + - Line 22: `./livekit-bridge &` (no output redirection) + - Line 51: `cd packages/cloud && PORT=80 bun run start &` + - Only Bun process logs are captured + +3. 
**How to see bridge startup (container level):** + + ```bash + # The start.sh shows these echo statements at startup: + # "🚀 Starting Go LiveKit gRPC bridge on Unix socket: /tmp/livekit-bridge.sock" + # "✅ Unix socket created successfully" + # "☁️ Starting Bun cloud service on :80..." + # BUT the actual Go bridge logs after startup are lost! + ``` + +4. **This means:** + - We can't see gRPC errors + - We can't see LiveKit room join/leave events + - We can't see connection conflicts + - We can't debug the actual bug without fixing logging first! + +5. **TypeScript side shows bridge is "connected":** + ``` + "Bridge health" logs every 10 seconds showing: + - isConnected: true + - userId: "isaiah@mentra.glass" + ``` + But we don't know what the Go bridge is actually doing! + +--- + +## Reproduction Steps + +1. Start with iOS client connected to `cloud-debug` (centralus) with LiveKit enabled +2. Verify audio is working (microphone on, transcription working) +3. Switch regions in mobile app settings +4. Connect to different region (`cloud-livekit` or `france`) +5. **Expected:** LiveKit audio continues working +6. **Actual:** LiveKit audio breaks + +--- + +## Files Involved + +### Cloud (TypeScript) + +1. **WebSocket Connection:** + - `cloud/packages/cloud/src/services/websocket/websocket-glasses.service.ts` + - Line 571-650: `handleConnectionInit()` - sends CONNECTION_ACK with LiveKit info + +2. **LiveKit Manager:** + - `cloud/packages/cloud/src/services/session/livekit/LiveKitManager.ts` + - Line 20-45: constructor - loads env vars + - Line 90-120: `handleLiveKitInit()` - returns LiveKit connection info + - Line 312-323: `dispose()` - cleanup + +3. **gRPC Bridge Client:** + - `cloud/packages/cloud/src/services/session/livekit/LiveKitGrpcClient.ts` + - Line 55-80: constructor - loads bridge URL + - Line 200-350: `connect()` and `JoinRoom()` + - Line 712-763: `disconnect()` - leaves LiveKit room + - Line 765-775: `dispose()` - cleanup + +4. 
**UserSession:** + - `cloud/packages/cloud/src/services/session/UserSession.ts` + - Line 596-687: `dispose()` - calls `liveKitManager.dispose()` + +5. **REST API:** + - `cloud/packages/cloud/src/api/client/livekit.api.ts` (likely location) + - Endpoint: `GET /api/client/livekit/token` + +### Mobile (TypeScript/React Native) + +1. **LiveKit Manager:** + - `mobile/src/managers/LivekitManager.ts` + - Line 35-48: `connect()` - connects to LiveKit + - Need to check: `disconnect()` method + +2. **Socket Communications:** + - `mobile/src/managers/SocketComms.ts` + - Line 305-311: `handle_connection_ack()` - triggers LiveKit connect + +3. **REST Communications:** + - `mobile/src/managers/RestComms.tsx` + - Line 291-295: `getLivekitUrlAndToken()` - fetches from cloud + +4. **WebSocket Manager:** + - `mobile/src/managers/WebSocketManager.ts` + - Need to check: cleanup on region switch + +### Go Bridge + +1. **gRPC Bridge Service:** + - `cloud/packages/cloud-livekit-bridge/service.go` + - `cloud/packages/cloud-livekit-bridge/session.go` + - Need to check: room join/leave logic + +--- + +## Next Steps + +1. **Add mobile logging:** + - Log when `livekitManager.connect()` is called + - Log when `livekitManager.disconnect()` is called (if exists) + - Log the URL and token being used + - Log the REST API URL being called + +2. **Add cloud logging:** + - Log `LIVEKIT_URL` on each region when `handleLiveKitInit()` is called + - Log when `LiveKitManager.dispose()` is called + - Log when `LiveKitGrpcClient.disconnect()` is called + +3. **Check environment variables:** + - Verify `LIVEKIT_URL` for each region + - Verify they point to the same or different LiveKit instances + +4. **Check Go bridge logs:** + - Find bridge container/sidecar + - Check for room join/leave events + - Check for conflicts or errors + +5. 
**Test with explicit disconnect:** + - Add `livekitManager.disconnect()` in mobile before connecting to new region + - See if that fixes the issue + +--- + +## Potential Fixes + +### Fix 1: Ensure Mobile Disconnects Before Reconnecting + +```typescript +// mobile/src/managers/LivekitManager.ts +public async connect() { + // Disconnect from old session first + if (this.room && this.room.state === ConnectionState.Connected) { + console.log("LivekitManager: Disconnecting from old room before reconnecting") + await this.room.disconnect() + } + + // Then connect to new room + const {url, token} = await restComms.getLivekitUrlAndToken() + // ... +} +``` + +### Fix 2: Clean Up Old Region's LiveKit Connection Immediately + +```typescript +// cloud/packages/cloud/src/services/websocket/websocket-glasses.service.ts +private handleGlassesConnectionClose(...) { + // BEFORE entering grace period, immediately clean up LiveKit + if (userSession.liveKitManager) { + logger.info("Immediately disposing LiveKit due to glasses disconnect") + userSession.liveKitManager.dispose() + } + + // Then handle grace period... +} +``` + +### Fix 3: Use Region-Specific Room Names + +```typescript +// cloud/packages/cloud/src/services/session/livekit/LiveKitManager.ts +getRoomName(): string { + const region = process.env.REGION || "unknown" + return `${this.session.userId}-${region}` // e.g., "user@email.com-centralus" +} +``` + +This would prevent room conflicts between regions, but might be overkill. + +--- + +## Testing Plan + +1. Connect to region A with LiveKit +2. Verify audio works +3. Add logging to mobile and cloud +4. Switch to region B +5. Capture logs from: + - Mobile console + - Region A cloud logs + - Region B cloud logs + - Go bridge logs (both regions) +6. Analyze logs to identify exact failure point +7. Implement fix +8. Repeat test to verify fix + +--- + +## ✅ RESOLVED: Go Bridge Logging + +**UPDATE:** We've implemented Better Stack HTTP logging for the Go bridge! 
+ +Instead of fixing `start.sh` to redirect stdout/stderr, we've implemented a proper logging solution: + +### What We Added + +1. **Better Stack HTTP Logger** (`cloud/packages/cloud-livekit-bridge/logger/betterstack.go`) + - Sends logs directly to Better Stack via HTTP + - Batch processing for efficiency + - Structured logging with user_id, session_id, room_name + - Auto-flush every 5 seconds + +2. **Integration in Go Code** + - Logger initialized in `main.go` + - Used throughout `service.go` for all events + - Captures errors, warnings, and info logs + +3. **Test Script** (`test-betterstack.sh`) + - Verify logging works before deployment + - Tests single logs, batches, and complex log entries + +### Benefits Over stdout/stderr Redirection + +- ✅ **Structured logs** - JSON with fields for filtering +- ✅ **Searchable** - Query by user_id, session_id, error type +- ✅ **Centralized** - Logs from all regions in one place +- ✅ **Reliable** - No dependency on kubectl/container logs +- ✅ **Real-time** - Immediate visibility in Better Stack + +### Setup Instructions + +See [QUICK-START.md](../../packages/cloud-livekit-bridge/QUICK-START.md) for the 5-minute setup guide. 
+ +--- + +## Related Files + +- Design docs: `cloud/issues/livekit-grpc/` +- LiveKit gRPC implementation: `cloud/packages/cloud-livekit-bridge/` +- Mobile LiveKit integration: `mobile/src/managers/LivekitManager.ts` +- **Startup script with logging issue:** `cloud/start.sh` (line 22) diff --git a/cloud/issues/livekit-ios-bug/TOKEN-EXPIRATION-ANALYSIS.md b/cloud/issues/livekit-ios-bug/TOKEN-EXPIRATION-ANALYSIS.md new file mode 100644 index 0000000000..48e2410fa1 --- /dev/null +++ b/cloud/issues/livekit-ios-bug/TOKEN-EXPIRATION-ANALYSIS.md @@ -0,0 +1,273 @@ +# LiveKit Token Expiration Analysis + +## Problem Summary + +The LiveKit bridge is showing token expiration errors when attempting to reconnect: + +``` +livekit-bridge-1 | 2025/10/17 22:32:27 "msg"="error establishing signal connection" +"error"="websocket: bad handshake" "duration"="145.39925ms" "status"=401 +"response"="invalid token: eyJhbGci..., error: go-jose/go-jose/jwt: validation failed, +token is expired (exp)" +``` + +## Token Analysis + +### Decoded Token Claims + +```json +{ + "exp": 1760739548, + "identity": "cloud-agent:isaiah@mentra.glass", + "iss": "APIHxBuhqxPzR66", + "kind": "standard", + "nbf": 1760738948, + "sub": "cloud-agent:isaiah@mentra.glass", + "video": { + "canPublish": true, + "canPublishData": true, + "canSubscribe": true, + "room": "isaiah@mentra.glass", + "roomJoin": true + } +} +``` + +### Token Lifetime + +- **NBF (Not Before)**: 1760738948 (October 17, 2025, 22:09:08 UTC) +- **EXP (Expiration)**: 1760739548 (October 17, 2025, 22:19:08 UTC) +- **Lifetime**: 600 seconds (10 minutes) + +### Connection Attempt + +- **Attempt Time**: 2025/10/17 22:32:27 (log timestamp; time zone not stated in the log) +- **Token Expiration**: October 17, 2025, 22:19:08 UTC +- **Issue**: If the log timestamp is UTC, the token had already been expired for about 13 minutes, which matches the `token is expired (exp)` error + +## Root Causes + +### 1. Token Reuse After Expiration + +The bridge is trying to reconnect with an expired token. 
This suggests: + +- Token was generated more than 10 minutes before the reconnection attempt +- Connection was lost or interrupted +- Bridge is attempting automatic reconnection with the old token +- No token refresh mechanism exists + +### 2. Reconnection Logic Issue + +The log shows: + +``` +"level"=0 "msg"="resuming connection..." "reconnectCount"=4 +``` + +This indicates: + +- 4 reconnection attempts have been made +- Each attempt is using the same expired token +- No token refresh happens between reconnection attempts + +### 3. Region Switch Scenario + +When switching regions (e.g., centralus → france): + +1. **Old region connection**: May still be active with old token +2. **New region connection**: Attempts to establish with new credentials +3. **Old connection cleanup**: May not be disposing properly +4. **Token conflict**: Old token may be reused for new connection + +## Why This Happens + +### Token Generation Flow + +1. **Client requests LiveKit access** + - Cloud generates token with 10-minute expiration + - Token includes room name, user identity, permissions + +2. **Token sent to bridge** + - Bridge receives token via gRPC + - Bridge connects to LiveKit room using token + +3. **Connection interruption** + - Network hiccup, region switch, or session change + - Bridge attempts automatic reconnection + +4. 
**Reconnection with stale token** + - Bridge reuses the original token + - Token has expired (10 minutes passed) + - LiveKit rejects with 401 Unauthorized + +### LiveKit SDK Behavior + +The LiveKit Go SDK has built-in reconnection logic: + +- Automatically attempts to reconnect on connection loss +- Default reconnection attempts: multiple retries with backoff +- **Does NOT automatically refresh tokens** +- Expects application to handle token refresh + +## Solutions + +### Solution 1: Token Refresh Callback + +Implement token refresh in the bridge service: + +```go +// In service.go +func (s *LiveKitBridgeService) JoinRoom(req *pb.JoinRoomRequest, stream pb.LiveKitBridge_JoinRoomServer) error { + // Room lifecycle callbacks + roomCallback := &lksdk.RoomCallback{ + OnReconnecting: func() { + log.Println("Reconnecting to room...") + }, + OnReconnected: func() { + log.Println("Successfully reconnected to room") + }, + OnDisconnected: func() { + log.Println("Disconnected from room") + }, + } + + // Connect first so that `room` exists before any hook is attached + room, err := lksdk.ConnectToRoom(req.LivekitUrl, lksdk.ConnectInfo{ + APIKey: req.ApiKey, + APISecret: req.ApiSecret, + RoomName: req.RoomName, + ParticipantIdentity: participantIdentity, + }, roomCallback) + if err != nil { + return fmt.Errorf("failed to connect to room: %w", err) + } + + // Proposed hook (pseudocode): OnTokenRefresh is not a confirmed field of lksdk.Room. + // Check what the SDK actually exposes before implementing this. + room.OnTokenRefresh = func() (string, error) { + // Request new token from cloud service + newToken, err := s.requestNewToken(req.UserId, req.RoomName, req.SessionId) + if err != nil { + return "", fmt.Errorf("failed to refresh token: %w", err) + } + log.Printf("Token refreshed for user %s", req.UserId) + return newToken, nil + } + + // ... rest of JoinRoom ... + return nil +} +``` + +### Solution 2: Proactive Token Refresh + +Refresh tokens before they expire: + +```go +// Start token refresh goroutine +go func() { + ticker := time.NewTicker(8 * time.Minute) // Refresh before 10-min expiry + defer ticker.Stop() + + for { + select { + case <-ticker.C: + newToken, err := s.requestNewToken(req.UserId, req.RoomName, req.SessionId) + if err != nil { + log.Printf("Failed to 
proactively refresh token: %v", err) + continue + } + // Update room connection with new token + // Placeholder call: confirm that the LiveKit Go SDK offers a way to swap tokens before relying on UpdateToken + room.UpdateToken(newToken) + log.Printf("Proactively refreshed token for user %s", req.UserId) + case <-ctx.Done(): + return + } + } +}() +``` + +### Solution 3: Increase Token Lifetime + +Temporary workaround - increase token expiration time: + +```typescript +// In cloud service (TypeScript) +const token = new AccessToken(apiKey, apiSecret, { + identity: participantIdentity, + ttl: "1h", // Increase from 10 minutes to 1 hour +}); +``` + +**Trade-off**: Less secure, but reduces frequency of expiration issues. + +### Solution 4: Graceful Disconnect on Region Switch + +Ensure old connections are properly closed: + +```go +// When region switch detected +func (s *LiveKitBridgeService) onRegionSwitch(userId, sessionId string) { + s.roomsMu.Lock() + defer s.roomsMu.Unlock() + + // Find and close old connection + for key, conn := range s.rooms { + if conn.userId == userId && conn.sessionId != sessionId { + log.Printf("Closing old connection for user %s (old session: %s)", userId, conn.sessionId) + conn.room.Disconnect() + delete(s.rooms, key) + } + } +} +``` + +## Recommended Implementation Order + +1. **Immediate (Quick Fix)** + - ✅ Set up Better Stack logging (see BETTERSTACK_SETUP.md) + - ✅ Increase token TTL to 1 hour temporarily + - ✅ Add explicit disconnect on region switch + +2. **Short-term (Proper Fix)** + - Implement token refresh callback in bridge + - Add token refresh gRPC method to bridge proto + - Cloud service provides token refresh endpoint + - Bridge requests new token when needed + +3. **Long-term (Robust Solution)** + - Implement proactive token refresh + - Add token expiration monitoring + - Set up alerts for token refresh failures + - Add metrics for connection lifetime vs token lifetime + +## Testing Plan + +### Test 1: Token Expiration + +1. Set token TTL to 30 seconds +2. Connect to LiveKit +3. Wait 35 seconds +4. 
Verify connection stays alive (token was refreshed) + +### Test 2: Region Switch + +1. Connect to cloud-debug (centralus) +2. Enable microphone/audio +3. Switch to cloud-livekit (france) +4. Verify: + - Old connection closes cleanly + - New connection establishes successfully + - No token expiration errors + +### Test 3: Network Interruption + +1. Connect to LiveKit +2. Simulate network drop (disable WiFi briefly) +3. Restore network +4. Verify automatic reconnection with token refresh + +## Related Files + +- `cloud/packages/cloud-livekit-bridge/service.go` - Bridge service implementation +- `cloud/packages/cloud-livekit-bridge/proto/livekit_bridge.proto` - gRPC definitions +- `cloud/packages/cloud/src/services/livekit.service.ts` - Token generation +- `cloud/packages/cloud/src/services/websocket.service.ts` - Session management + +## References + +- [LiveKit Token Authentication](https://docs.livekit.io/realtime/concepts/authentication/) +- [LiveKit Go SDK - Room](https://pkg.go.dev/github.com/livekit/server-sdk-go/v2@v2.10.0#Room) +- [JWT Token Expiration Best Practices](https://auth0.com/docs/secure/tokens/json-web-tokens/json-web-token-claims) diff --git a/cloud/packages/cloud-livekit-bridge/BETTERSTACK_SETUP.md b/cloud/packages/cloud-livekit-bridge/BETTERSTACK_SETUP.md new file mode 100644 index 0000000000..81f76cd27c --- /dev/null +++ b/cloud/packages/cloud-livekit-bridge/BETTERSTACK_SETUP.md @@ -0,0 +1,440 @@ +# Better Stack Logging Setup for LiveKit Bridge + +This guide explains how to set up Better Stack logging for the LiveKit gRPC bridge to capture and search logs from the Go service. + +## Why Better Stack for Go Logs? 
+ +The LiveKit bridge logs are critical for debugging issues like: + +- Token expiration errors +- Region switching problems +- Connection failures +- Room join/leave events + +By sending these logs to Better Stack, you can: + +- Search and filter logs in real-time +- Correlate Go bridge logs with TypeScript cloud logs +- Set up alerts for specific error patterns +- Debug production issues without SSH access + +## Quick Setup + +### Step 1: Create a Better Stack HTTP Source + +1. Go to [Better Stack Telemetry](https://telemetry.betterstack.com/) +2. Navigate to **Sources** → **Create Source** +3. Choose platform: **HTTP** +4. Set name: **"LiveKit gRPC Bridge"** +5. Choose your data region (e.g., `us_east`, `germany`, etc.) +6. Click **Create Source** + +You'll receive: + +- **Source Token**: `YOUR_SOURCE_TOKEN` +- **Ingesting Host**: `sXXX.region.betterstackdata.com` + +### Step 2: Add Environment Variables + +Update your `.env` file or environment configuration: + +```bash +# Better Stack Configuration +BETTERSTACK_SOURCE_TOKEN=YOUR_SOURCE_TOKEN +BETTERSTACK_INGESTING_HOST=sXXX.region.betterstackdata.com +``` + +### Step 3: Update Docker Compose + +Add the environment variables to your `docker-compose.dev.yml`: + +```yaml +livekit-bridge: + build: + context: ./packages/cloud-livekit-bridge + dockerfile: Dockerfile + environment: + - PORT=9090 + - LOG_LEVEL=debug + - LIVEKIT_URL=${LIVEKIT_URL} + - LIVEKIT_API_KEY=${LIVEKIT_API_KEY} + - LIVEKIT_API_SECRET=${LIVEKIT_API_SECRET} + - LIVEKIT_GRPC_SOCKET=/var/run/livekit/bridge.sock + - BETTERSTACK_SOURCE_TOKEN=${BETTERSTACK_SOURCE_TOKEN} + - BETTERSTACK_INGESTING_HOST=${BETTERSTACK_INGESTING_HOST} + volumes: + - livekit_socket:/var/run/livekit + restart: "no" +``` + +### Step 4: Integrate Logger in main.go + +Update `main.go` to use the Better Stack logger: + +```go +package main + +import ( + "fmt" + "log" + "net" + "os" + "os/signal" + "path/filepath" + "syscall" + + 
"github.com/Mentra-Community/MentraOS/cloud/packages/cloud-livekit-bridge/logger" + pb "github.com/Mentra-Community/MentraOS/cloud/packages/cloud-livekit-bridge/proto" + "google.golang.org/grpc" + "google.golang.org/grpc/health" + "google.golang.org/grpc/health/grpc_health_v1" + "google.golang.org/grpc/reflection" +) + +func main() { + // Initialize Better Stack logger + bsLogger := logger.NewFromEnv() + defer bsLogger.Close() + + bsLogger.LogInfo("LiveKit gRPC Bridge starting", map[string]interface{}{ + "version": "1.0.0", + "socket": os.Getenv("LIVEKIT_GRPC_SOCKET"), + "port": os.Getenv("PORT"), + }) + + // Rest of your existing main.go code... + socketPath := os.Getenv("LIVEKIT_GRPC_SOCKET") + port := os.Getenv("PORT") + + var lis net.Listener + var err error + + if socketPath != "" { + // Unix socket mode + socketDir := filepath.Dir(socketPath) + if err := os.MkdirAll(socketDir, 0755); err != nil { + bsLogger.LogError("Failed to create socket directory", err, nil) + log.Fatalf("Failed to create socket directory: %v", err) + } + + os.Remove(socketPath) + lis, err = net.Listen("unix", socketPath) + if err != nil { + bsLogger.LogError("Failed to listen on Unix socket", err, map[string]interface{}{ + "socket_path": socketPath, + }) + log.Fatalf("Failed to listen on Unix socket: %v", err) + } + + if err := os.Chmod(socketPath, 0777); err != nil { + bsLogger.LogError("Failed to set socket permissions", err, nil) + log.Fatalf("Failed to set socket permissions: %v", err) + } + + bsLogger.LogInfo("Server listening on Unix socket", map[string]interface{}{ + "socket_path": socketPath, + }) + log.Printf("gRPC server listening on Unix socket: %s", socketPath) + } else { + // TCP mode + if port == "" { + port = "9090" + } + addr := fmt.Sprintf("0.0.0.0:%s", port) + lis, err = net.Listen("tcp", addr) + if err != nil { + bsLogger.LogError("Failed to listen on TCP", err, map[string]interface{}{ + "address": addr, + }) + log.Fatalf("Failed to listen: %v", err) + } + + 
bsLogger.LogInfo("Server listening on TCP", map[string]interface{}{ + "address": addr, + }) + log.Printf("gRPC server listening on: %s", addr) + } + + // Create gRPC server with logger + grpcServer := grpc.NewServer() + lkService := NewLiveKitBridgeService(bsLogger) + pb.RegisterLiveKitBridgeServer(grpcServer, lkService) + + // Health check + healthServer := health.NewServer() + healthServer.SetServingStatus("", grpc_health_v1.HealthCheckResponse_SERVING) + grpc_health_v1.RegisterHealthServer(grpcServer, healthServer) + + // Reflection + reflection.Register(grpcServer) + + // Handle graceful shutdown + sigCh := make(chan os.Signal, 1) + signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM) + + go func() { + <-sigCh + bsLogger.LogInfo("Received shutdown signal, gracefully stopping", nil) + log.Println("Received shutdown signal, gracefully stopping...") + grpcServer.GracefulStop() + }() + + bsLogger.LogInfo("gRPC server started successfully", nil) + if err := grpcServer.Serve(lis); err != nil { + bsLogger.LogError("Server failed", err, nil) + log.Fatalf("Failed to serve: %v", err) + } +} +``` + +### Step 5: Update Service to Use Logger + +Update `service.go` to accept and use the logger: + +```go +type LiveKitBridgeService struct { + pb.UnimplementedLiveKitBridgeServer + rooms map[string]*RoomConnection + roomsMu sync.RWMutex + bsLogger *logger.BetterStackLogger +} + +func NewLiveKitBridgeService(bsLogger *logger.BetterStackLogger) *LiveKitBridgeService { + return &LiveKitBridgeService{ + rooms: make(map[string]*RoomConnection), + bsLogger: bsLogger, + } +} + +func (s *LiveKitBridgeService) JoinRoom(req *pb.JoinRoomRequest, stream pb.LiveKitBridge_JoinRoomServer) error { + s.bsLogger.LogInfo("JoinRoom request received", map[string]interface{}{ + "user_id": req.UserId, + "session_id": req.SessionId, + "room_name": req.RoomName, + "livekit_url": req.LivekitUrl, + }) + + // Your existing JoinRoom logic... 
+ + // Log errors with context + if err != nil { + s.bsLogger.LogError("Failed to join room", err, map[string]interface{}{ + "user_id": req.UserId, + "room_name": req.RoomName, + "session_id": req.SessionId, + }) + return err + } + + s.bsLogger.LogInfo("Successfully joined room", map[string]interface{}{ + "user_id": req.UserId, + "room_name": req.RoomName, + "session_id": req.SessionId, + }) + + return nil +} +``` + +### Step 6: Restart Services + +```bash +# Stop services +docker-compose -f docker-compose.dev.yml down + +# Rebuild and start +docker-compose -f docker-compose.dev.yml up --build +``` + +## Searching Logs in Better Stack + +### Example Queries + +1. **Find token expiration errors:** + + ``` + service:livekit-bridge AND error:*token is expired* + ``` + +2. **Search by user ID:** + + ``` + service:livekit-bridge AND user_id:"isaiah@mentra.glass" + ``` + +3. **Find room join events:** + + ``` + service:livekit-bridge AND message:*JoinRoom* + ``` + +4. **Filter by log level:** + + ``` + service:livekit-bridge AND level:error + ``` + +5. **Region-specific logs:** + ``` + service:livekit-bridge AND extra.livekit_url:*france* + ``` + +## Production Deployment + +### For Porter/Kubernetes + +Add to your deployment environment variables: + +```yaml +env: + - name: BETTERSTACK_SOURCE_TOKEN + valueFrom: + secretKeyRef: + name: betterstack-secrets + key: source-token + - name: BETTERSTACK_INGESTING_HOST + value: "sXXX.region.betterstackdata.com" +``` + +### For Docker Compose (Production) + +```yaml +livekit-bridge: + image: your-registry/livekit-bridge:latest + environment: + - BETTERSTACK_SOURCE_TOKEN=${BETTERSTACK_SOURCE_TOKEN} + - BETTERSTACK_INGESTING_HOST=${BETTERSTACK_INGESTING_HOST} + env_file: + - .env.production +``` + +## Troubleshooting + +### Logs Not Appearing in Better Stack + +1. **Check environment variables:** + + ```bash + docker-compose exec livekit-bridge env | grep BETTERSTACK + ``` + +2. 
**Check container logs for Better Stack messages:** + + ```bash + docker-compose logs livekit-bridge | grep BetterStack + ``` + + You should see: + + ``` + [BetterStack] Logger enabled, sending to sXXX.region.betterstackdata.com + ``` + +3. **Test HTTP endpoint manually:** + ```bash + curl -X POST https://YOUR_INGESTING_HOST \ + -H "Authorization: Bearer YOUR_SOURCE_TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"message":"Test from LiveKit bridge","level":"info","dt":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"}' + ``` + +### High Log Volume + +Adjust batching settings in `logger/betterstack.go`: + +```go +return NewBetterStackLogger(Config{ + Token: token, + IngestingHost: host, + BatchSize: 50, // Increase batch size + FlushInterval: 10 * time.Second, // Less frequent flushing + Enabled: enabled, +}) +``` + +### Filtering Noisy Logs + +Add log level filtering: + +```go +func (l *BetterStackLogger) shouldLog(level string) bool { + minLevel := os.Getenv("LOG_LEVEL") + if minLevel == "error" && level == "debug" { + return false + } + return true +} +``` + +## Debugging the Region Switch Issue + +Now that you have Better Stack logging, here's how to debug the region switch: + +### 1. Enable Debug Logging + +```bash +LOG_LEVEL=debug +``` + +### 2. 
Add Region-Specific Logs + +In your service.go: + +```go +func (s *LiveKitBridgeService) JoinRoom(req *pb.JoinRoomRequest, stream pb.LiveKitBridge_JoinRoomServer) error { + // Extract region from LiveKit URL + region := extractRegion(req.LivekitUrl) + + s.bsLogger.LogInfo("JoinRoom request", map[string]interface{}{ + "user_id": req.UserId, + "session_id": req.SessionId, + "room_name": req.RoomName, + "livekit_url": req.LivekitUrl, + "region": region, + "token_preview": req.AccessToken[:20] + "...", + }) + + // When connection fails + if err != nil { + s.bsLogger.LogError("Room connection failed", err, map[string]interface{}{ + "user_id": req.UserId, + "room_name": req.RoomName, + "region": region, + "error_type": getErrorType(err), + }) + } +} +``` + +### 3. Track Region Switches + +```go +func (s *LiveKitBridgeService) onRegionSwitch(userId, oldRegion, newRegion string) { + s.bsLogger.LogWarn("Region switch detected", map[string]interface{}{ + "user_id": userId, + "old_region": oldRegion, + "new_region": newRegion, + "action": "closing_old_connections", + }) +} +``` + +### 4. Search for the Issue + +In Better Stack, search: + +``` +service:livekit-bridge AND user_id:"isaiah@mentra.glass" AND (error:*token* OR message:*region*) +``` + +## Next Steps + +1. **Set up alerts** for token expiration errors +2. **Create dashboards** to visualize region distribution +3. **Monitor connection success rates** per region +4. 
**Track session lifetimes** across region switches + +## Additional Resources + +- [Better Stack Logs Documentation](https://betterstack.com/docs/logs/) +- [Go JSON Logging Best Practices](https://betterstack.com/community/guides/logging/go/) +- [LiveKit Server SDK Docs](https://docs.livekit.io/server-sdk-go/) diff --git a/cloud/packages/cloud-livekit-bridge/QUICK-START.md b/cloud/packages/cloud-livekit-bridge/QUICK-START.md new file mode 100644 index 0000000000..c6fadda58c --- /dev/null +++ b/cloud/packages/cloud-livekit-bridge/QUICK-START.md @@ -0,0 +1,287 @@ +# Quick Start: Better Stack Logging for LiveKit Bridge + +This is a **5-minute setup** to get Go bridge logs into Better Stack so you can debug the token expiration issue. + +## 🚀 Quick Setup (5 minutes) + +### Step 1: Create Better Stack Source (2 min) + +1. Go to https://telemetry.betterstack.com/ +2. Click **Sources** → **New Source** +3. Select platform: **HTTP** +4. Name: `LiveKit gRPC Bridge` +5. Region: Choose your region (e.g., `us_east`) +6. Click **Create** + +You'll get: + +- **Source Token**: Copy this (looks like: `FczKcxEhjEDE58dBX7XaeX1q`) +- **Ingesting Host**: Copy this (looks like: `s123.us-east-1.betterstackdata.com`) + +### Step 2: Test the Connection (1 min) + +```bash +cd cloud/packages/cloud-livekit-bridge + +# Export your credentials +export BETTERSTACK_SOURCE_TOKEN="YOUR_TOKEN_HERE" +export BETTERSTACK_INGESTING_HOST="YOUR_HOST_HERE" + +# Run test script +./test-betterstack.sh +``` + +You should see: + +``` +✅ Single log sent successfully (HTTP 202) +✅ Batch logs sent successfully (HTTP 202) +✅ Complex log sent successfully (HTTP 202) +🎉 All tests passed! 
+``` + +### Step 3: Add to Your Environment (1 min) + +Add to `cloud/.env`: + +```bash +# Better Stack Configuration +BETTERSTACK_SOURCE_TOKEN=FczKcxEhjEDE58dBX7XaeX1q +BETTERSTACK_INGESTING_HOST=s123.us-east-1.betterstackdata.com +``` + +Update `cloud/docker-compose.dev.yml`: + +```yaml +livekit-bridge: + build: + context: ./packages/cloud-livekit-bridge + dockerfile: Dockerfile + environment: + - PORT=9090 + - LOG_LEVEL=debug + - LIVEKIT_URL=${LIVEKIT_URL} + - LIVEKIT_API_KEY=${LIVEKIT_API_KEY} + - LIVEKIT_API_SECRET=${LIVEKIT_API_SECRET} + - LIVEKIT_GRPC_SOCKET=/var/run/livekit/bridge.sock + - BETTERSTACK_SOURCE_TOKEN=${BETTERSTACK_SOURCE_TOKEN} # ← Add this + - BETTERSTACK_INGESTING_HOST=${BETTERSTACK_INGESTING_HOST} # ← Add this + volumes: + - livekit_socket:/var/run/livekit + restart: "no" +``` + +### Step 4: Update Go Code (1 min) + +**Update `main.go`:** + +```go +package main + +import ( + "log" + "os" + "os/signal" + "syscall" + + "github.com/Mentra-Community/MentraOS/cloud/packages/cloud-livekit-bridge/logger" + pb "github.com/Mentra-Community/MentraOS/cloud/packages/cloud-livekit-bridge/proto" + // ... other imports +) + +func main() { + // Initialize Better Stack logger + bsLogger := logger.NewFromEnv() + defer bsLogger.Close() + + bsLogger.LogInfo("LiveKit gRPC Bridge starting", map[string]interface{}{ + "version": "1.0.0", + }) + + // ... rest of your main.go code + + // Pass logger to service + lkService := NewLiveKitBridgeService(bsLogger) + pb.RegisterLiveKitBridgeServer(grpcServer, lkService) + + // ... 
rest of setup +} +``` + +**Update `service.go`:** + +```go +type LiveKitBridgeService struct { + pb.UnimplementedLiveKitBridgeServer + rooms map[string]*RoomConnection + roomsMu sync.RWMutex + bsLogger *logger.BetterStackLogger // ← Add this +} + +func NewLiveKitBridgeService(bsLogger *logger.BetterStackLogger) *LiveKitBridgeService { + return &LiveKitBridgeService{ + rooms: make(map[string]*RoomConnection), + bsLogger: bsLogger, // ← Add this + } +} + +func (s *LiveKitBridgeService) JoinRoom(req *pb.JoinRoomRequest, stream pb.LiveKitBridge_JoinRoomServer) error { + s.bsLogger.LogInfo("JoinRoom request received", map[string]interface{}{ + "user_id": req.UserId, + "session_id": req.SessionId, + "room_name": req.RoomName, + "livekit_url": req.LivekitUrl, + }) + + // ... your existing code + + if err != nil { + s.bsLogger.LogError("Failed to join room", err, map[string]interface{}{ + "user_id": req.UserId, + "room_name": req.RoomName, + }) + return err + } + + s.bsLogger.LogInfo("Successfully joined room", map[string]interface{}{ + "user_id": req.UserId, + "room_name": req.RoomName, + }) + + return nil +} +``` + +### Step 5: Restart and Test (1 min) + +```bash +cd cloud + +# Rebuild and restart +docker-compose -f docker-compose.dev.yml down +docker-compose -f docker-compose.dev.yml up --build livekit-bridge + +# You should see: +# [BetterStack] Logger enabled, sending to s123.us-east-1.betterstackdata.com +``` + +## 🔍 Search Your Logs + +Go to Better Stack → Your Source → Live Tail + +Try these queries: + +``` +# All bridge logs +service:livekit-bridge + +# Errors only +service:livekit-bridge AND level:error + +# Token errors +service:livekit-bridge AND error:*token* + +# Specific user +service:livekit-bridge AND user_id:"isaiah@mentra.glass" + +# Region switch events +service:livekit-bridge AND message:*region* +``` + +## 🐛 Debugging Token Expiration + +Now that logs are in Better Stack, you can trace the token expiration issue: + +1. 
**Reproduce the issue:** + + ```bash + # Switch from cloud-debug to cloud-livekit in mobile app + ``` + +2. **Search for token errors:** + + ``` + service:livekit-bridge AND error:*expired* + ``` + +3. **Find the connection sequence:** + + ``` + service:livekit-bridge AND user_id:"isaiah@mentra.glass" + AND (message:*JoinRoom* OR error:*token*) + ``` + +4. **Check timestamps:** + - Look at when token was issued + - Look at when connection failed + - Calculate time difference (should be ~10 minutes) + +## 📚 Full Documentation + +For complete setup and advanced features: + +- **Full Setup Guide**: [BETTERSTACK_SETUP.md](./BETTERSTACK_SETUP.md) +- **Token Analysis**: [../../../issues/livekit-ios-bug/TOKEN-EXPIRATION-ANALYSIS.md](../../../issues/livekit-ios-bug/TOKEN-EXPIRATION-ANALYSIS.md) +- **Debug Commands**: [../../../issues/livekit-ios-bug/DEBUG-COMMANDS.md](../../../issues/livekit-ios-bug/DEBUG-COMMANDS.md) + +## ✅ Verification Checklist + +- [ ] Better Stack source created +- [ ] Test script passes (all 3 tests) +- [ ] Environment variables added to `.env` +- [ ] Docker compose updated +- [ ] `main.go` updated with logger +- [ ] `service.go` updated with logger +- [ ] Container restarted +- [ ] See logs in Better Stack Live Tail +- [ ] Can search and filter logs + +## 🆘 Troubleshooting + +**Logs not appearing?** + +```bash +# Check environment variables in container +docker-compose exec livekit-bridge env | grep BETTERSTACK + +# Check container logs for Better Stack messages +docker-compose logs livekit-bridge | grep BetterStack +``` + +You should see: + +``` +[BetterStack] Logger enabled, sending to s123... +``` + +If you see: + +``` +[BetterStack] Logger disabled (missing BETTERSTACK_SOURCE_TOKEN or BETTERSTACK_INGESTING_HOST) +``` + +Then environment variables are not set correctly. + +**HTTP 403 Forbidden?** + +Your source token is invalid. Double-check the token from Better Stack. + +**HTTP 413 Payload Too Large?** + +Logs are too big. 
Reduce batch size in `logger/betterstack.go`: + +```go +BatchSize: 5, // Reduce from 10 +``` + +## 🎯 Next Steps + +1. ✅ Get logs flowing to Better Stack +2. 🔍 Reproduce token expiration issue +3. 📊 Analyze logs to confirm 10-minute token lifetime +4. 🛠️ Implement token refresh (see TOKEN-EXPIRATION-ANALYSIS.md) +5. ✅ Test region switching works without errors + +--- + +**Questions?** Check the [full documentation](./BETTERSTACK_SETUP.md) or the [GitHub issues](https://github.com/Mentra-Community/MentraOS/issues). diff --git a/cloud/packages/cloud-livekit-bridge/livekit-bridge b/cloud/packages/cloud-livekit-bridge/livekit-bridge index b376ff1097..69033d90d4 100755 Binary files a/cloud/packages/cloud-livekit-bridge/livekit-bridge and b/cloud/packages/cloud-livekit-bridge/livekit-bridge differ diff --git a/cloud/packages/cloud-livekit-bridge/logger/betterstack.go b/cloud/packages/cloud-livekit-bridge/logger/betterstack.go new file mode 100644 index 0000000000..6c1e77cc40 --- /dev/null +++ b/cloud/packages/cloud-livekit-bridge/logger/betterstack.go @@ -0,0 +1,256 @@ +package logger + +import ( + "bytes" + "encoding/json" + "fmt" + "io" + "log" + "net/http" + "os" + "sync" + "time" +) + +// BetterStackLogger sends logs to Better Stack HTTP endpoint +type BetterStackLogger struct { + token string + ingestingHost string + client *http.Client + batchSize int + flushInterval time.Duration + buffer []LogEntry + bufferMu sync.Mutex + stopCh chan struct{} + wg sync.WaitGroup + enabled bool +} + +// LogEntry represents a single log entry +type LogEntry struct { + Message string `json:"message"` + Level string `json:"level,omitempty"` + Timestamp string `json:"dt"` + Service string `json:"service,omitempty"` + UserID string `json:"user_id,omitempty"` + SessionID string `json:"session_id,omitempty"` + RoomName string `json:"room_name,omitempty"` + Error string `json:"error,omitempty"` + Extra map[string]interface{} `json:"extra,omitempty"` +} + +// Config for 
BetterStackLogger +type Config struct { + Token string + IngestingHost string + BatchSize int + FlushInterval time.Duration + Enabled bool +} + +// NewBetterStackLogger creates a new Better Stack logger +func NewBetterStackLogger(cfg Config) *BetterStackLogger { + if cfg.BatchSize == 0 { + cfg.BatchSize = 10 + } + if cfg.FlushInterval == 0 { + cfg.FlushInterval = 5 * time.Second + } + + logger := &BetterStackLogger{ + token: cfg.Token, + ingestingHost: cfg.IngestingHost, + client: &http.Client{ + Timeout: 10 * time.Second, + }, + batchSize: cfg.BatchSize, + flushInterval: cfg.FlushInterval, + buffer: make([]LogEntry, 0, cfg.BatchSize), + stopCh: make(chan struct{}), + enabled: cfg.Enabled, + } + + if logger.enabled { + logger.wg.Add(1) + go logger.flushWorker() + } + + return logger +} + +// Log sends a log entry to Better Stack +func (l *BetterStackLogger) Log(entry LogEntry) { + if !l.enabled { + return + } + + // Set timestamp if not provided + if entry.Timestamp == "" { + entry.Timestamp = time.Now().UTC().Format(time.RFC3339Nano) + } + + l.bufferMu.Lock() + l.buffer = append(l.buffer, entry) + shouldFlush := len(l.buffer) >= l.batchSize + l.bufferMu.Unlock() + + if shouldFlush { + l.Flush() + } +} + +// LogInfo logs an info message +func (l *BetterStackLogger) LogInfo(message string, fields map[string]interface{}) { + l.Log(LogEntry{ + Message: message, + Level: "info", + Service: "livekit-bridge", + Extra: fields, + }) +} + +// LogError logs an error message +func (l *BetterStackLogger) LogError(message string, err error, fields map[string]interface{}) { + if fields == nil { + fields = make(map[string]interface{}) + } + + entry := LogEntry{ + Message: message, + Level: "error", + Service: "livekit-bridge", + Extra: fields, + } + + if err != nil { + entry.Error = err.Error() + } + + l.Log(entry) +} + +// LogDebug logs a debug message +func (l *BetterStackLogger) LogDebug(message string, fields map[string]interface{}) { + l.Log(LogEntry{ + Message: message, + 
Level: "debug", + Service: "livekit-bridge", + Extra: fields, + }) +} + +// LogWarn logs a warning message +func (l *BetterStackLogger) LogWarn(message string, fields map[string]interface{}) { + l.Log(LogEntry{ + Message: message, + Level: "warn", + Service: "livekit-bridge", + Extra: fields, + }) +} + +// Flush sends all buffered logs immediately +func (l *BetterStackLogger) Flush() { + if !l.enabled { + return + } + + l.bufferMu.Lock() + if len(l.buffer) == 0 { + l.bufferMu.Unlock() + return + } + + // Copy buffer and clear it + entries := make([]LogEntry, len(l.buffer)) + copy(entries, l.buffer) + l.buffer = l.buffer[:0] + l.bufferMu.Unlock() + + // Send in background to avoid blocking + go l.sendBatch(entries) +} + +// sendBatch sends a batch of log entries to Better Stack +func (l *BetterStackLogger) sendBatch(entries []LogEntry) { + if len(entries) == 0 { + return + } + + jsonData, err := json.Marshal(entries) + if err != nil { + log.Printf("[BetterStack] Failed to marshal log entries: %v", err) + return + } + + url := fmt.Sprintf("https://%s", l.ingestingHost) + req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData)) + if err != nil { + log.Printf("[BetterStack] Failed to create request: %v", err) + return + } + + req.Header.Set("Content-Type", "application/json") + req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", l.token)) + + resp, err := l.client.Do(req) + if err != nil { + log.Printf("[BetterStack] Failed to send logs: %v", err) + return + } + defer resp.Body.Close() + + if resp.StatusCode != http.StatusAccepted { + body, _ := io.ReadAll(resp.Body) + log.Printf("[BetterStack] Failed to send logs (status %d): %s", resp.StatusCode, string(body)) + } +} + +// flushWorker periodically flushes the buffer +func (l *BetterStackLogger) flushWorker() { + defer l.wg.Done() + + ticker := time.NewTicker(l.flushInterval) + defer ticker.Stop() + + for { + select { + case <-ticker.C: + l.Flush() + case <-l.stopCh: + l.Flush() // Final flush on 
shutdown + return + } + } +} + +// Close stops the logger and flushes remaining logs +func (l *BetterStackLogger) Close() { + if !l.enabled { + return + } + + close(l.stopCh) + l.wg.Wait() +} + +// NewFromEnv creates a BetterStackLogger from environment variables +func NewFromEnv() *BetterStackLogger { + token := os.Getenv("BETTERSTACK_SOURCE_TOKEN") + host := os.Getenv("BETTERSTACK_INGESTING_HOST") + enabled := token != "" && host != "" + + if !enabled { + log.Println("[BetterStack] Logger disabled (missing BETTERSTACK_SOURCE_TOKEN or BETTERSTACK_INGESTING_HOST)") + } else { + log.Printf("[BetterStack] Logger enabled, sending to %s", host) + } + + return NewBetterStackLogger(Config{ + Token: token, + IngestingHost: host, + BatchSize: 10, + FlushInterval: 5 * time.Second, + Enabled: enabled, + }) +} diff --git a/cloud/packages/cloud-livekit-bridge/main.go b/cloud/packages/cloud-livekit-bridge/main.go index 363fee99a6..f7de54fd1a 100644 --- a/cloud/packages/cloud-livekit-bridge/main.go +++ b/cloud/packages/cloud-livekit-bridge/main.go @@ -4,8 +4,11 @@ import ( "log" "net" "os" + "os/signal" "path/filepath" + "syscall" + "github.com/Mentra-Community/MentraOS/cloud/packages/cloud-livekit-bridge/logger" pb "github.com/Mentra-Community/MentraOS/cloud/packages/cloud-livekit-bridge/proto" "google.golang.org/grpc" "google.golang.org/grpc/health" @@ -14,11 +17,22 @@ import ( ) func main() { + // Initialize Better Stack logger + bsLogger := logger.NewFromEnv() + defer bsLogger.Close() + log.Println("Starting LiveKit gRPC Bridge...") + bsLogger.LogInfo("LiveKit gRPC Bridge starting", map[string]interface{}{ + "version": "1.0.0", + }) // Load configuration config := loadConfig() log.Printf("Configuration loaded: Port=%s, LiveKitURL=%s", config.Port, config.LiveKitURL) + bsLogger.LogInfo("Configuration loaded", map[string]interface{}{ + "port": config.Port, + "livekit_url": config.LiveKitURL, + }) // Create gRPC server grpcServer := grpc.NewServer( @@ -27,7 +41,7 @@ func main() 
{ ) // Register LiveKit bridge service - bridgeService := NewLiveKitBridgeService(config) + bridgeService := NewLiveKitBridgeService(config, bsLogger) pb.RegisterLiveKitBridgeServer(grpcServer, bridgeService) // Register health check service @@ -47,39 +61,69 @@ func main() { // Use Unix domain socket // Remove existing socket file if it exists if err := os.RemoveAll(socketPath); err != nil { + bsLogger.LogError("Failed to remove existing socket", err, nil) log.Fatalf("Failed to remove existing socket: %v", err) } // Ensure directory exists socketDir := filepath.Dir(socketPath) if err := os.MkdirAll(socketDir, 0755); err != nil { + bsLogger.LogError("Failed to create socket directory", err, map[string]interface{}{ + "socket_dir": socketDir, + }) log.Fatalf("Failed to create socket directory: %v", err) } lis, err = net.Listen("unix", socketPath) if err != nil { + bsLogger.LogError("Failed to listen on Unix socket", err, map[string]interface{}{ + "socket_path": socketPath, + }) log.Fatalf("Failed to listen on Unix socket %s: %v", socketPath, err) } // Set socket permissions to allow access if err := os.Chmod(socketPath, 0666); err != nil { + bsLogger.LogError("Failed to set socket permissions", err, nil) log.Fatalf("Failed to set socket permissions: %v", err) } log.Printf("✅ LiveKit gRPC Bridge listening on Unix socket: %s", socketPath) + bsLogger.LogInfo("Server listening on Unix socket", map[string]interface{}{ + "socket_path": socketPath, + }) } else { // Use TCP port (backward compatibility) lis, err = net.Listen("tcp", ":"+config.Port) if err != nil { + bsLogger.LogError("Failed to listen on TCP", err, map[string]interface{}{ + "port": config.Port, + }) log.Fatalf("Failed to listen on port %s: %v", config.Port, err) } log.Printf("✅ LiveKit gRPC Bridge listening on TCP port: %s", config.Port) + bsLogger.LogInfo("Server listening on TCP", map[string]interface{}{ + "port": config.Port, + }) } log.Println("Ready to accept connections...") + bsLogger.LogInfo("gRPC 
server ready to accept connections", nil) + + // Handle graceful shutdown + sigCh := make(chan os.Signal, 1) + signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM) + + go func() { + <-sigCh + bsLogger.LogInfo("Received shutdown signal, gracefully stopping", nil) + log.Println("Received shutdown signal, gracefully stopping...") + grpcServer.GracefulStop() + }() // Start serving if err := grpcServer.Serve(lis); err != nil { + bsLogger.LogError("Server failed", err, nil) log.Fatalf("Failed to serve: %v", err) } } diff --git a/cloud/packages/cloud-livekit-bridge/service.go b/cloud/packages/cloud-livekit-bridge/service.go index ae5c71c0c1..dcda1c351e 100644 --- a/cloud/packages/cloud-livekit-bridge/service.go +++ b/cloud/packages/cloud-livekit-bridge/service.go @@ -8,6 +8,7 @@ import ( "sync" "time" + "github.com/Mentra-Community/MentraOS/cloud/packages/cloud-livekit-bridge/logger" pb "github.com/Mentra-Community/MentraOS/cloud/packages/cloud-livekit-bridge/proto" lksdk "github.com/livekit/server-sdk-go/v2" "google.golang.org/grpc/codes" @@ -34,13 +35,15 @@ type LiveKitBridgeService struct { sessions sync.Map // userId -> *RoomSession config *Config + bsLogger *logger.BetterStackLogger mu sync.RWMutex } // NewLiveKitBridgeService creates a new service instance -func NewLiveKitBridgeService(config *Config) *LiveKitBridgeService { +func NewLiveKitBridgeService(config *Config, bsLogger *logger.BetterStackLogger) *LiveKitBridgeService { return &LiveKitBridgeService{ - config: config, + config: config, + bsLogger: bsLogger, } } @@ -50,9 +53,17 @@ func (s *LiveKitBridgeService) JoinRoom( req *pb.JoinRoomRequest, ) (*pb.JoinRoomResponse, error) { log.Printf("JoinRoom request: userId=%s, room=%s", req.UserId, req.RoomName) + s.bsLogger.LogInfo("JoinRoom request received", map[string]interface{}{ + "user_id": req.UserId, + "room_name": req.RoomName, + "livekit_url": req.LivekitUrl, + }) // Check if session already exists if _, exists := s.sessions.Load(req.UserId); exists { + 
s.bsLogger.LogWarn("Session already exists for user", map[string]interface{}{ + "user_id": req.UserId, + }) return &pb.JoinRoomResponse{ Success: false, Error: "session already exists for this user", @@ -99,6 +110,13 @@ func (s *LiveKitBridgeService) JoinRoom( case session.audioFromLiveKit <- pcmData: // Log periodically to show audio is flowing if receivedPackets%100 == 0 { + s.bsLogger.LogDebug("Audio flowing from LiveKit", map[string]interface{}{ + "user_id": req.UserId, + "received": receivedPackets, + "dropped": droppedPackets, + "channel_len": len(session.audioFromLiveKit), + "room_name": req.RoomName, + }) log.Printf("Audio flowing for %s: received=%d, dropped=%d, channelLen=%d", req.UserId, receivedPackets, droppedPackets, len(session.audioFromLiveKit)) } @@ -106,6 +124,12 @@ func (s *LiveKitBridgeService) JoinRoom( // Drop frame if channel full (backpressure) droppedPackets++ if droppedPackets%50 == 0 { + s.bsLogger.LogWarn("Dropping audio frames", map[string]interface{}{ + "user_id": req.UserId, + "total_dropped": droppedPackets, + "channel_full": len(session.audioFromLiveKit), + "room_name": req.RoomName, + }) log.Printf("Dropping audio frames for %s: total_dropped=%d, channel_full=%d", req.UserId, droppedPackets, len(session.audioFromLiveKit)) } @@ -113,6 +137,10 @@ func (s *LiveKitBridgeService) JoinRoom( }, }, OnDisconnected: func() { + s.bsLogger.LogWarn("Disconnected from LiveKit room", map[string]interface{}{ + "user_id": req.UserId, + "room_name": req.RoomName, + }) log.Printf("Disconnected from LiveKit room: %s", req.RoomName) }, } @@ -125,6 +153,11 @@ func (s *LiveKitBridgeService) JoinRoom( lksdk.WithAutoSubscribe(false), ) if err != nil { + s.bsLogger.LogError("Failed to connect to LiveKit room", err, map[string]interface{}{ + "user_id": req.UserId, + "room_name": req.RoomName, + "livekit_url": req.LivekitUrl, + }) return &pb.JoinRoomResponse{ Success: false, Error: fmt.Sprintf("failed to connect to room: %v", err), @@ -142,6 +175,13 @@ func 
(s *LiveKitBridgeService) JoinRoom( log.Printf("Successfully joined room: userId=%s, participantId=%s", req.UserId, room.LocalParticipant.Identity()) + s.bsLogger.LogInfo("Successfully joined LiveKit room", map[string]interface{}{ + "user_id": req.UserId, + "room_name": req.RoomName, + "participant_id": string(room.LocalParticipant.Identity()), + "participant_count": len(room.GetRemoteParticipants()) + 1, + }) + return &pb.JoinRoomResponse{ Success: true, ParticipantId: string(room.LocalParticipant.Identity()), @@ -155,6 +195,9 @@ func (s *LiveKitBridgeService) LeaveRoom( req *pb.LeaveRoomRequest, ) (*pb.LeaveRoomResponse, error) { log.Printf("LeaveRoom request: userId=%s", req.UserId) + s.bsLogger.LogInfo("LeaveRoom request received", map[string]interface{}{ + "user_id": req.UserId, + }) sessionVal, ok := s.sessions.Load(req.UserId) if !ok { @@ -268,10 +311,18 @@ func (s *LiveKitBridgeService) StreamAudio( } sentPackets++ if sentPackets%100 == 0 { + s.bsLogger.LogDebug("Sent audio chunks to TypeScript", map[string]interface{}{ + "user_id": userId, + "sent": sentPackets, + "channel_len": len(session.audioFromLiveKit), + }) log.Printf("Sent %d audio chunks to TypeScript for user %s (channelLen=%d)", sentPackets, userId, len(session.audioFromLiveKit)) } case <-time.After(2 * time.Second): + s.bsLogger.LogError("StreamAudio send timeout", fmt.Errorf("timeout after 2s"), map[string]interface{}{ + "user_id": userId, + }) log.Printf("StreamAudio send timeout for %s after 2s, client may be stuck", userId) errChan <- fmt.Errorf("send timeout after 2s") return @@ -288,10 +339,16 @@ func (s *LiveKitBridgeService) StreamAudio( // Wait for error or cancellation select { case err := <-errChan: + s.bsLogger.LogError("StreamAudio error", err, map[string]interface{}{ + "user_id": userId, + }) log.Printf("StreamAudio error for userId=%s: %v", userId, err) // CRITICAL: Clean up session on stream error // This prevents zombie sessions and "channel full" errors after reconnection 
issues + s.bsLogger.LogWarn("Cleaning up session due to stream error", map[string]interface{}{ + "user_id": userId, + }) log.Printf("Cleaning up session for %s due to stream error", userId) session.Close() s.sessions.Delete(userId) diff --git a/cloud/packages/cloud-livekit-bridge/test-betterstack.sh b/cloud/packages/cloud-livekit-bridge/test-betterstack.sh new file mode 100755 index 0000000000..26f0ec8bdb --- /dev/null +++ b/cloud/packages/cloud-livekit-bridge/test-betterstack.sh @@ -0,0 +1,156 @@ +#!/bin/bash +set -e + +# Test script for Better Stack HTTP logging +# This script tests the Better Stack HTTP endpoint without needing the full Go app + +echo "🧪 Testing Better Stack HTTP Logging" +echo "======================================" + +# Check for required environment variables +if [ -z "$BETTERSTACK_SOURCE_TOKEN" ]; then + echo "❌ Error: BETTERSTACK_SOURCE_TOKEN is not set" + echo " Please set it in your .env file or export it:" + echo " export BETTERSTACK_SOURCE_TOKEN=your_token_here" + exit 1 +fi + +if [ -z "$BETTERSTACK_INGESTING_HOST" ]; then + echo "❌ Error: BETTERSTACK_INGESTING_HOST is not set" + echo " Please set it in your .env file or export it:" + echo " export BETTERSTACK_INGESTING_HOST=sXXX.region.betterstackdata.com" + exit 1 +fi + +echo "✅ Environment variables found:" +echo " Token: ${BETTERSTACK_SOURCE_TOKEN:0:20}..." +echo " Host: $BETTERSTACK_INGESTING_HOST" +echo "" + +# Test 1: Single log entry +echo "📤 Test 1: Sending single log entry..." 
+RESPONSE=$(curl -s -w "\nHTTP_CODE:%{http_code}" -X POST \ + "https://$BETTERSTACK_INGESTING_HOST" \ + -H "Authorization: Bearer $BETTERSTACK_SOURCE_TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "message": "Test log from LiveKit Bridge", + "level": "info", + "service": "livekit-bridge", + "dt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'" + }') + +HTTP_CODE=$(echo "$RESPONSE" | grep "HTTP_CODE" | cut -d: -f2) +BODY=$(echo "$RESPONSE" | grep -v "HTTP_CODE") + +if [ "$HTTP_CODE" = "202" ]; then + echo "โœ… Single log sent successfully (HTTP 202)" +else + echo "โŒ Failed to send single log (HTTP $HTTP_CODE)" + echo " Response: $BODY" + exit 1 +fi + +sleep 1 + +# Test 2: Batch of logs +echo "" +echo "๐Ÿ“ค Test 2: Sending batch of logs..." +RESPONSE=$(curl -s -w "\nHTTP_CODE:%{http_code}" -X POST \ + "https://$BETTERSTACK_INGESTING_HOST" \ + -H "Authorization: Bearer $BETTERSTACK_SOURCE_TOKEN" \ + -H "Content-Type: application/json" \ + -d '[ + { + "message": "LiveKit bridge started", + "level": "info", + "service": "livekit-bridge", + "dt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'", + "extra": { + "version": "1.0.0", + "socket": "/var/run/livekit/bridge.sock" + } + }, + { + "message": "JoinRoom request received", + "level": "info", + "service": "livekit-bridge", + "user_id": "test@example.com", + "session_id": "test-session-123", + "room_name": "test-room", + "dt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'" + }, + { + "message": "Token validation failed", + "level": "error", + "service": "livekit-bridge", + "user_id": "test@example.com", + "error": "token is expired", + "dt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'" + } + ]') + +HTTP_CODE=$(echo "$RESPONSE" | grep "HTTP_CODE" | cut -d: -f2) +BODY=$(echo "$RESPONSE" | grep -v "HTTP_CODE") + +if [ "$HTTP_CODE" = "202" ]; then + echo "โœ… Batch logs sent successfully (HTTP 202)" +else + echo "โŒ Failed to send batch logs (HTTP $HTTP_CODE)" + echo " Response: $BODY" + exit 1 +fi + +sleep 1 + +# Test 3: Log with all fields +echo "" +echo 
"๐Ÿ“ค Test 3: Sending log with all fields..." +RESPONSE=$(curl -s -w "\nHTTP_CODE:%{http_code}" -X POST \ + "https://$BETTERSTACK_INGESTING_HOST" \ + -H "Authorization: Bearer $BETTERSTACK_SOURCE_TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "message": "Region switch detected", + "level": "warn", + "service": "livekit-bridge", + "user_id": "isaiah@mentra.glass", + "session_id": "session-xyz-789", + "room_name": "isaiah@mentra.glass", + "dt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'", + "extra": { + "old_region": "centralus", + "new_region": "france", + "old_url": "wss://mentraos.livekit.cloud", + "new_url": "wss://mentraos-france.livekit.cloud", + "action": "closing_old_connections" + } + }') + +HTTP_CODE=$(echo "$RESPONSE" | grep "HTTP_CODE" | cut -d: -f2) +BODY=$(echo "$RESPONSE" | grep -v "HTTP_CODE") + +if [ "$HTTP_CODE" = "202" ]; then + echo "โœ… Complex log sent successfully (HTTP 202)" +else + echo "โŒ Failed to send complex log (HTTP $HTTP_CODE)" + echo " Response: $BODY" + exit 1 +fi + +# Summary +echo "" +echo "๐ŸŽ‰ All tests passed!" +echo "" +echo "๐Ÿ“Š Next steps:" +echo " 1. Go to https://telemetry.betterstack.com/" +echo " 2. Navigate to your 'LiveKit gRPC Bridge' source" +echo " 3. You should see the test logs in Live Tail" +echo "" +echo "๐Ÿ” Try these queries:" +echo " - service:livekit-bridge" +echo " - service:livekit-bridge AND level:error" +echo " - service:livekit-bridge AND user_id:test@example.com" +echo " - service:livekit-bridge AND message:*Region switch*" +echo "" +echo "โœจ Integration is working! Now update your Go code to use the logger." 
diff --git a/cloud/packages/cloud/src/services/session/livekit/LiveKitGrpcClient.ts b/cloud/packages/cloud/src/services/session/livekit/LiveKitGrpcClient.ts index c0e0ca27b1..6c84dcdaf4 100644 --- a/cloud/packages/cloud/src/services/session/livekit/LiveKitGrpcClient.ts +++ b/cloud/packages/cloud/src/services/session/livekit/LiveKitGrpcClient.ts @@ -50,10 +50,8 @@ export class LiveKitGrpcClient { private currentParams: JoinRoomParams | null = null; private eventHandlers: Map void> = new Map(); - // Endianness handling - private readonly endianMode: "auto" | "swap" | "off"; - private endianSwapDetermined = false; - private shouldSwapBytes = false; + // Endianness handling: "swap" to force byte swapping, "off" for no swapping (default) + private readonly shouldSwapBytes: boolean; constructor(userSession: UserSession, bridgeUrl?: string) { this.userSession = userSession; @@ -74,9 +72,21 @@ export class LiveKitGrpcClient { "livekit-bridge:9090"; } - // Initialize endianness mode from environment - const mode = (process.env.LIVEKIT_PCM_ENDIAN || "auto").toLowerCase(); - this.endianMode = mode as "auto" | "swap" | "off"; + // Initialize endianness mode from environment: "swap" or "off" (default) + const mode = (process.env.LIVEKIT_PCM_ENDIAN || "off").toLowerCase(); + this.shouldSwapBytes = mode === "swap"; + + if (this.shouldSwapBytes) { + this.logger.info( + { feature: "livekit-grpc" }, + "Endianness: SWAP mode enabled - will convert big-endian to little-endian", + ); + } else { + this.logger.info( + { feature: "livekit-grpc" }, + "Endianness: OFF mode - no byte swapping", + ); + } // Load proto and create gRPC client this.initializeGrpcClient(); @@ -261,9 +271,9 @@ export class LiveKitGrpcClient { try { let pcmData = Buffer.from(chunk.pcm_data); - // Handle endianness if needed - if (this.endianMode !== "off" && pcmData.length >= 2) { - pcmData = this.handleEndianness(pcmData, receivedChunks); + // Swap bytes if mode is "swap" + if (this.shouldSwapBytes && 
pcmData.length >= 2) { + pcmData = this.swapBytes(pcmData); } receivedChunks++; @@ -398,17 +408,11 @@ export class LiveKitGrpcClient { } /** - * Handle endianness detection and byte swapping + * Swap bytes for endianness conversion (big-endian to little-endian) */ - private handleEndianness(buf: Buffer, frameCount: number): Buffer { - // Guard: ensure even-length PCM data - if ((buf.length & 1) === 1) { - if (frameCount % 200 === 0) { - this.logger.warn( - { feature: "livekit-grpc", rawLen: buf.length }, - "Odd-length PCM payload detected; dropping last byte", - ); - } + private swapBytes(buf: Buffer): Buffer { + // Ensure even-length buffer + if (buf.length % 2 === 1) { buf = buf.slice(0, buf.length - 1); } @@ -416,72 +420,11 @@ export class LiveKitGrpcClient { return buf; } - // Force swap if mode is "swap" (check FIRST before auto-detection) - if (this.endianMode === "swap") { - this.shouldSwapBytes = true; - this.endianSwapDetermined = true; - } - - // Detect endianness once in 'auto' mode (only if not already forced) - if ( - !this.endianSwapDetermined && - this.endianMode === "auto" && - buf.length >= 16 - ) { - let oddAreMostlyFFor00 = 0; // count of MSB being 0xFF or 0x00 (sign-extension in BE) - let evenAreMostlyFFor00 = 0; // count of LSB being 0xFF or 0x00 (sign-extension in LE) - const pairs = Math.min(16, Math.floor(buf.length / 2)); - - for (let i = 0; i < pairs; i++) { - const b0 = buf[2 * i]; // LSB if LE, MSB if BE - const b1 = buf[2 * i + 1]; // MSB if LE, LSB if BE - if (b0 === 0x00 || b0 === 0xff) evenAreMostlyFFor00++; - if (b1 === 0x00 || b1 === 0xff) oddAreMostlyFFor00++; - } - - // If upper byte (b1) has more sign-extension pattern than lower byte, - // it's likely big-endian and needs swapping to little-endian - if (oddAreMostlyFFor00 >= evenAreMostlyFFor00 + 6) { - this.shouldSwapBytes = true; - } else { - this.shouldSwapBytes = false; - } - - this.endianSwapDetermined = true; - this.logger.info( - { - feature: "livekit-grpc", - oddFF00: 
oddAreMostlyFFor00, - evenFF00: evenAreMostlyFFor00, - willSwap: this.shouldSwapBytes, - }, - "PCM endianness detection result", - ); - } - - // Perform byte swapping if needed - if (this.shouldSwapBytes) { - // Log once to confirm swapping is happening - if (frameCount === 0) { - this.logger.info( - { feature: "livekit-grpc", mode: this.endianMode }, - "SWAPPING BYTES - converting big-endian to little-endian", - ); - } - // Create new buffer for swapped data - const swapped = Buffer.allocUnsafe(buf.length); - for (let i = 0; i + 1 < buf.length; i += 2) { - swapped[i] = buf[i + 1]; - swapped[i + 1] = buf[i]; - } - return swapped; - } - - if (frameCount === 0) { - this.logger.info( - { feature: "livekit-grpc", mode: this.endianMode }, - "NOT SWAPPING BYTES - data is already little-endian", - ); + // Swap bytes in-place + for (let i = 0; i < buf.length; i += 2) { + const tmp = buf[i]; + buf[i] = buf[i + 1]; + buf[i + 1] = tmp; } return buf; diff --git a/cloud/packages/cloud/src/services/websocket/websocket-glasses.service.ts b/cloud/packages/cloud/src/services/websocket/websocket-glasses.service.ts index 3a3e85e10a..32ffdc19dd 100644 --- a/cloud/packages/cloud/src/services/websocket/websocket-glasses.service.ts +++ b/cloud/packages/cloud/src/services/websocket/websocket-glasses.service.ts @@ -40,7 +40,7 @@ const logger = rootLogger.child({ service: SERVICE_NAME }); const RECONNECT_GRACE_PERIOD_MS = 1000 * 60 * 1; // 1 minute // SAFETY FLAG: Set to false to disable grace period cleanup entirely -const GRACE_PERIOD_CLEANUP_ENABLED = false; // TODO: Set to true when ready to enable auto-cleanup +const GRACE_PERIOD_CLEANUP_ENABLED = true; // Enable auto-cleanup when WebSocket disconnects const DEFAULT_AUGMENTOS_SETTINGS = { useOnboardMic: false, diff --git a/cloud/porter-dev.yaml b/cloud/porter-dev.yaml index 238d84730f..ca16394471 100644 --- a/cloud/porter-dev.yaml +++ b/cloud/porter-dev.yaml @@ -11,12 +11,14 @@ build: dockerfile: ./cloud/docker/Dockerfile.porter 
services: -- name: cloud - type: web - run: node packages/cloud/dist/index.js - port: 80 - cpuCores: 2.9 - ramMegabytes: 4096 - env: - HOST: "0.0.0.0" - SERVICE_NAME: "cloud" + - name: cloud + type: web + run: node packages/cloud/dist/index.js + port: 80 + cpuCores: 2.9 + ramMegabytes: 4096 + env: + HOST: "0.0.0.0" + SERVICE_NAME: "cloud" + # Better Stack logging for Go bridge (when enabled) + BETTERSTACK_INGESTING_HOST: "s1311181.eu-nbg-2.betterstackdata.com" diff --git a/cloud/porter-livekit.yaml b/cloud/porter-livekit.yaml index e28daef9f8..6b336cfd88 100644 --- a/cloud/porter-livekit.yaml +++ b/cloud/porter-livekit.yaml @@ -21,3 +21,6 @@ services: LIVEKIT_PCM_ENDIAN: "off" # Add any Go service environment variables here LOG_LEVEL: "info" + # Better Stack logging for Go bridge + BETTERSTACK_SOURCE_TOKEN: "${BETTERSTACK_SOURCE_TOKEN}" + BETTERSTACK_INGESTING_HOST: "s1311181.eu-nbg-2.betterstackdata.com" diff --git a/cloud/porter.yaml b/cloud/porter.yaml index 411ad20255..4d63f405d7 100644 --- a/cloud/porter.yaml +++ b/cloud/porter.yaml @@ -45,6 +45,9 @@ services: LIVEKIT_PCM_ENDIAN: "off" # Add any Go service environment variables here LOG_LEVEL: "info" + # Better Stack logging for Go bridge + BETTERSTACK_SOURCE_TOKEN: "${BETTERSTACK_SOURCE_TOKEN}" + BETTERSTACK_INGESTING_HOST: "s1311181.eu-nbg-2.betterstackdata.com" # Optional: Configure health checks # healthCheck: # enabled: true diff --git a/docs/beginner-setup-guide.mdx b/docs/beginner-setup-guide.mdx new file mode 100644 index 0000000000..5d57df750f --- /dev/null +++ b/docs/beginner-setup-guide.mdx @@ -0,0 +1,524 @@ +--- +title: "Complete Beginner's Setup Guide" +description: "Everything you need to know to get started with MentraOS development from scratch. Perfect for complete beginners with no prior experience." +--- + +# Complete Beginner's Setup Guide + +Welcome to MentraOS! This comprehensive guide will take you from zero to having a fully functional smart glasses development environment. 
Whether you're a complete beginner or have some programming experience, this guide will walk you through every step. + +## Table of Contents + +1. [What is MentraOS?](#what-is-mentraos) +2. [Understanding the Architecture](#understanding-the-architecture) +3. [Prerequisites & System Requirements](#prerequisites--system-requirements) +4. [Step 1: Install Development Tools](#step-1-install-development-tools) +5. [Step 2: Set Up Your Development Environment](#step-2-set-up-your-development-environment) +6. [Step 3: Get Smart Glasses (Optional)](#step-3-get-smart-glasses-optional) +7. [Step 4: Create Your First App](#step-4-create-your-first-app) +8. [Step 5: Test Your Setup](#step-5-test-your-setup) +9. [Troubleshooting Common Issues](#troubleshooting-common-issues) +10. [Next Steps](#next-steps) + +## What is MentraOS? + +MentraOS is an open-source operating system and development platform for smart glasses. Think of it as "Android for smart glasses" - it provides: + +- **Cross-compatibility**: Your apps work on any supported smart glasses +- **Easy development**: Use familiar web technologies (TypeScript, React) +- **Real-time communication**: Apps can respond to voice, gestures, and sensors +- **Cloud integration**: Powerful backend services for AI, storage, and more + +### Why Smart Glasses? 
+ +Smart glasses are the next evolution of personal computing: +- **Hands-free interaction**: Perfect for when your hands are busy +- **Always-available information**: Get help without looking at your phone +- **Augmented reality**: Overlay digital information on the real world +- **Accessibility**: Help people with disabilities access technology + +## Understanding the Architecture + +Before diving in, let's understand how MentraOS works: + +```mermaid +graph TD + A[Smart Glasses] -->|Bluetooth| B[Mobile App] + B -->|Internet| C[MentraOS Cloud] + C -->|WebSocket| D[Your App Server] + D -->|Display Commands| C + C -->|Commands| B + B -->|Display| A + + E[Developer Console] -->|Manage Apps| C + F[App Store] -->|Discover Apps| C +``` + +### Key Components: + +1. **Smart Glasses**: The physical device (camera, display, microphone, speakers) +2. **Mobile App**: Runs on your phone, connects glasses to the cloud +3. **MentraOS Cloud**: Backend services that handle communication and AI +4. **Your App Server**: Your custom application that users interact with +5. **Developer Console**: Web interface to manage your apps +6. 
**App Store**: Where users discover and install apps + +## Prerequisites & System Requirements + +### Required Knowledge +- **Basic programming**: Familiarity with any programming language +- **Command line**: Comfortable with terminal/command prompt +- **Web concepts**: Understanding of HTTP, APIs, and web servers + +### System Requirements + +#### For App Development (Minimum) +- **Operating System**: Windows 10+, macOS 10.15+, or Ubuntu 18.04+ +- **RAM**: 8GB minimum, 16GB recommended +- **Storage**: 10GB free space +- **Internet**: Stable broadband connection + +#### For Mobile App Development +- **Android Studio**: For Android development +- **Xcode**: For iOS development (macOS only) +- **Physical device**: Android phone or iPhone for testing + +#### For Smart Glasses Development +- **Compatible glasses**: See [supported devices](#supported-smart-glasses) +- **Android phone**: Required for glasses pairing + +## Step 1: Install Development Tools + +### 1.1 Install Node.js + +**Why**: MentraOS apps are built with TypeScript/JavaScript + +**Windows:** +1. Go to [nodejs.org](https://nodejs.org) +2. Download the LTS version (18.x or later) +3. Run the installer and follow the prompts +4. 
Verify installation: + ```bash + node --version + npm --version + ``` + +**macOS:** +```bash +# Using Homebrew (recommended) +brew install node + +# Or download from nodejs.org +``` + +**Linux (Ubuntu/Debian):** +```bash +# Using NodeSource repository +curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash - +sudo apt-get install -y nodejs + +# Verify installation +node --version +npm --version +``` + +### 1.2 Install Bun (Recommended) + +**Why**: Bun is faster than npm and is the preferred package manager for MentraOS + +```bash +# Install Bun +curl -fsSL https://bun.sh/install | bash + +# Restart your terminal, then verify +bun --version +``` + +### 1.3 Install Git + +**Why**: Required for cloning repositories and version control + +**Windows:** +- Download from [git-scm.com](https://git-scm.com) +- Run installer with default settings + +**macOS:** +```bash +# Using Homebrew +brew install git + +# Or install Xcode Command Line Tools +xcode-select --install +``` + +**Linux:** +```bash +sudo apt update +sudo apt install git +``` + +### 1.4 Install a Code Editor + +**Recommended: Visual Studio Code** + +1. Download from [code.visualstudio.com](https://code.visualstudio.com) +2. Install these extensions: + - TypeScript and JavaScript Language Features + - Prettier - Code formatter + - ESLint + - GitLens + +## Step 2: Set Up Your Development Environment + +### 2.1 Create a Development Directory + +```bash +# Create a dedicated folder for MentraOS development +mkdir mentraos-development +cd mentraos-development +``` + +### 2.2 Clone the MentraOS Repository + +```bash +# Clone the main repository +git clone https://github.com/Mentra-Community/MentraOS.git +cd MentraOS + +# Verify the structure +ls -la +``` + +You should see folders like: +- `mobile/` - Mobile app source code +- `cloud/` - Backend services +- `asg_client/` - Smart glasses client +- `docs/` - Documentation + +### 2.3 Set Up Mobile App Development + +**For Android Development:** + +1. 
**Install Android Studio:** + - Download from [developer.android.com](https://developer.android.com/studio) + - Install with default settings + - Open Android Studio and complete the setup wizard + +2. **Install Android SDK:** + - Open Android Studio + - Go to Tools → SDK Manager + - Install Android 13 (API 33) or later + - Install Android SDK Build-Tools + +3. **Set up environment variables:** + + **Windows:** + ```cmd + # Add to System Environment Variables + ANDROID_HOME=C:\Users\%USERNAME%\AppData\Local\Android\Sdk + # Add to PATH + %ANDROID_HOME%\platform-tools + %ANDROID_HOME%\tools + ``` + + **macOS/Linux:** + ```bash + # Add to ~/.bashrc or ~/.zshrc + # macOS default path shown; on Linux the SDK usually lives at $HOME/Android/Sdk + export ANDROID_HOME=$HOME/Library/Android/sdk + export PATH=$PATH:$ANDROID_HOME/platform-tools + export PATH=$PATH:$ANDROID_HOME/tools + ``` + +**For iOS Development (macOS only):** + +1. **Install Xcode:** + - Download from Mac App Store + - Install Xcode Command Line Tools: + ```bash + xcode-select --install + ``` + +2. **Install CocoaPods:** + ```bash + sudo gem install cocoapods + ``` + +### 2.4 Set Up Cloud Development (Optional) + + +**For Cloud Development**: If you plan to work on the MentraOS Cloud backend or need to run the full cloud stack locally, see the comprehensive [Local Development Setup](https://docs.mentraos.com/development/local-setup) guide in the Cloud documentation. + +The cloud setup includes: +- Detailed environment configuration +- Docker setup for services +- Database configuration (MongoDB) +- Authentication setup (Supabase) +- Third-party service integrations (Azure, OpenAI, etc.)
+- Web portals (Store & Developer Console) + +**For App Development Only**: You can skip this section and use the production cloud at `https://api.mentra.glass` + + +For a quick cloud setup overview: + +```bash +# Navigate to cloud directory +cd cloud + +# Install dependencies +bun install + +# Create environment file +cp .env.example .env +``` + +## Step 3: Get Smart Glasses (Optional) + +### Supported Smart Glasses + +| Device | Type | Display | Camera | Price Range | Best For | +|--------|------|---------|--------|-------------|----------| +| **Mentra Live** | Android-based | No | Yes | $200-400 | Camera apps, streaming | +| **Even Realities G1** | HUD | Yes | No | $800-1200 | Display apps, AR | +| **Mentra Mach 1** | HUD | Yes | No | $600-900 | Display apps, productivity | +| **Vuzix Z100** | HUD | Yes | No | $1000+ | Enterprise apps | + +### Choosing Your First Glasses + +**For beginners, we recommend:** + +1. **Mentra Live** - Best for learning camera-based development +2. **Even Realities G1** - Best for learning display-based development + +**If you don't have glasses yet:** +- You can still develop and test apps using the mobile app +- Use the simulator mode for basic testing +- Consider borrowing or renting glasses for initial development + +### Setting Up Your Glasses + +**For Mentra Live:** +1. Charge the glasses fully +2. Download MentraOS mobile app +3. Follow the in-app pairing instructions +4. Connect to WiFi through the app + +**For HUD glasses:** +1. Charge the glasses +2. Download MentraOS mobile app +3. Pair via Bluetooth +4. 
Calibrate the display if prompted + +## Step 4: Create Your First App + +### 4.1 Choose a Starting Template + +We'll use the Live Captions example - it's perfect for beginners: + +```bash +# Create a new app from template +gh repo create my-first-mentraos-app --template Mentra-Community/MentraOS-Cloud-Example-App + +# Clone your new repository +git clone https://github.com/YOUR_USERNAME/my-first-mentraos-app.git +cd my-first-mentraos-app + +# Install dependencies +bun install +``` + +### 4.2 Set Up Your App + +1. **Create environment file:** + ```bash + cp .env.example .env + ``` + +2. **Edit the .env file:** + ```env + PORT=3000 + PACKAGE_NAME=com.yourname.myfirstapp + MENTRAOS_API_KEY=your_api_key_here + ``` + +3. **Register your app:** + - Go to [console.mentra.glass](https://console.mentra.glass) + - Sign in with your account + - Click "Create App" + - Use the package name from your .env file + - Copy the API key to your .env file + +### 4.3 Run Your App + +```bash +# Start the development server +bun run dev + +# In another terminal, expose your app to the internet +ngrok http 3000 +``` + +### 4.4 Test Your App + +1. **Install MentraOS mobile app** on your phone +2. **Pair your glasses** (if you have them) +3. **Open the MentraOS app** and look for your app +4. **Launch your app** and test the functionality + +## Step 5: Test Your Setup + +### 5.1 Verify Mobile App Development + +```bash +# Navigate to mobile directory +cd ../mobile + +# Install dependencies +npm install + +# For iOS (macOS only) +cd ios && pod install && cd .. + +# Start the development server +npm start + +# In another terminal, run on device +npm run android # or npm run ios +``` + +### 5.2 Verify Cloud Development + +```bash +# Navigate to cloud directory +cd ../cloud + +# Start development environment +bun run dev + +# Check if services are running +curl http://localhost:8002/health +``` + +### 5.3 Test App Integration + +1. **Create a simple test app** that displays "Hello World" +2. 
**Deploy it locally** using ngrok +3. **Register it** in the developer console +4. **Test it** on your mobile app and glasses + +## Troubleshooting Common Issues + +### Issue: "Command not found" errors + +**Solution:** +- Restart your terminal after installing tools +- Check your PATH environment variable +- Verify installations with `--version` flags + +### Issue: Android Studio setup problems + +**Solution:** +- Ensure Java SDK 17 is installed +- Check ANDROID_HOME environment variable +- Run Android Studio as administrator (Windows) + +### Issue: Bun installation fails + +**Solution:** +- Try the alternative installation method: + ```bash + npm install -g bun + ``` +- Check your internet connection +- Ensure you have Node.js installed first + +### Issue: Mobile app won't build + +**Solution:** +- Clean and rebuild: + ```bash + cd mobile + npm run clean + npm install + npm run android + ``` +- Check Android SDK installation +- Verify device is connected and authorized + +### Issue: Cloud services won't start + +**Solution:** +- Check if ports 3000 and 8002 are available +- Verify Docker is running (if using Docker) +- Check the .env file configuration +- Look at the logs for specific error messages + +### Issue: Glasses won't pair + +**Solution:** +- Ensure Bluetooth is enabled +- Restart both the glasses and mobile app +- Check if glasses are in pairing mode +- Try forgetting and re-pairing the device + +### Issue: App not appearing in MentraOS + +**Solution:** +- Verify ngrok is running and accessible +- Check the app registration in developer console +- Ensure the package name matches exactly +- Restart the MentraOS mobile app + +## Next Steps + +Congratulations! You now have a complete MentraOS development environment. Here's what to do next: + +### 1. 
Explore the Documentation +- [Core Concepts](core-concepts) - Understand sessions, events, and app lifecycle +- [Events](events) - Learn about user interactions and sensor data +- [Layouts](layouts) - Create visual experiences on smart glasses +- [Permissions](permissions) - Access device capabilities securely + +### 2. Try More Examples +- [Example Apps](example-apps) - Explore different app types +- [Build From Scratch](getting-started) - Create apps from the ground up +- [Advanced Features](tools) - Implement AI tools and webviews + +### 3. Join the Community +- [Discord Server](https://mentra.glass/discord) - Get help and share ideas +- [GitHub Discussions](https://github.com/Mentra-Community/MentraOS/discussions) - Ask questions +- [Contributing Guide](contributing) - Contribute to the project + +### 4. Build Your First Real App +- Start with a simple idea (e.g., a note-taking app) +- Use the examples as a starting point +- Test on real devices +- Share your creation with the community + +### 5. Learn Advanced Topics +- [Deployment](railway-deployment) - Deploy your apps to production +- [Hardware Integration](hardware-requirements) - Work with different device capabilities +- [Performance Optimization](core-concepts) - Make your apps faster and more efficient + +## Getting Help + +If you run into issues: + +1. **Check the troubleshooting section** above +2. **Search existing issues** on GitHub +3. **Ask on Discord** - the community is very helpful +4. **Create a new issue** if you can't find a solution +5. 
**Read the logs** - they often contain helpful error messages + +## Additional Resources + +- [MentraOS Website](https://mentra.glass) - Official website +- [Developer Console](https://console.mentra.glass) - Manage your apps +- [App Store](https://apps.mentra.glass) - Discover existing apps +- [GitHub Organization](https://github.com/Mentra-Community) - Source code and examples + +--- + +**Welcome to the future of computing!** 🚀 + +You're now ready to build amazing smart glasses applications. Start small, experiment often, and don't hesitate to ask for help. The MentraOS community is here to support you on your journey. diff --git a/docs/docs.json b/docs/docs.json index 47f9532b28..0b5c9deb26 100644 --- a/docs/docs.json +++ b/docs/docs.json @@ -14,6 +14,7 @@ { "group": "Getting Started", "pages": [ + "beginner-setup-guide", "quickstart", "example-apps", "getting-started",