diff --git a/README.md b/README.md index af0fa9f..a895a3b 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Data Connectors -Playwright-based data connectors for [DataConnect](https://github.com/vana-com/data-connect). Each connector exports a user's data from a web platform using browser automation — credentials never leave the device. +Playwright-based data connectors for [DataConnect](https://github.com/vana-com/data-connect). Each connector exports a user's data from a web platform using browser automation. Credentials never leave the device. ## Connectors @@ -13,12 +13,22 @@ Playwright-based data connectors for [DataConnect](https://github.com/vana-com/d | Spotify | Spotify | playwright | spotify.profile, spotify.savedTracks, spotify.playlists | | YouTube | Google | playwright | youtube.profile, youtube.subscriptions, youtube.playlists, youtube.playlistItems, youtube.likes, youtube.watchLater, youtube.history (top 50 recent items) | +## Running a connector + +```bash +node run-connector.cjs ./github/github-playwright.js # JSON output (for agents) +node run-connector.cjs ./github/github-playwright.js --pretty # colored output (for humans) +node run-connector.cjs ./github/github-playwright.js --inputs '{"username":"x","password":"y"}' +``` + +See [`skills/vana-connect/`](skills/vana-connect/) for the agent skill: setup, running, creating new connectors, and data recipes. 
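For agents consuming the JSON output, the result follows the scoped-key convention documented below: data keys contain a `.`, while `exportSummary`, `timestamp`, `version`, and `platform` are metadata. A minimal sketch of detecting scopes in a saved result (the `scopedKeys` helper and the example object are illustrative, not part of the runner):

```javascript
// Sketch: detect the scoped data keys in a connector result object.
// Rule from this README: any key containing a "." that is not a
// metadata field is a data scope (POSTed to /v1/data/{scope}).
const METADATA_KEYS = new Set(['exportSummary', 'timestamp', 'version', 'platform']);

function scopedKeys(result) {
  return Object.keys(result).filter(
    (k) => k.includes('.') && !METADATA_KEYS.has(k)
  );
}

// Example object shaped like the results connectors in this repo produce:
const example = {
  'max.profile': { name: 'Jane' },
  'max.viewingHistory': { items: [], total: 0 },
  exportSummary: { count: 0, label: 'items' },
  timestamp: '2024-01-01T00:00:00.000Z',
  version: '1.0.0-playwright',
  platform: 'max',
};

console.log(scopedKeys(example)); // ['max.profile', 'max.viewingHistory']
```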
+ ## Repository structure ``` -connectors/ +├── run-connector.cjs # Connector runner (symlink) ├── registry.json # Central registry (checksums, versions) -├── test-connector.cjs # Standalone test runner (see Testing locally) +├── skills/vana-connect/ # Agent skill (setup, create, run, recipes) ├── types/ │ └── connector.d.ts # TypeScript type definitions ├── schemas/ # JSON schemas for exported data @@ -46,24 +56,24 @@ connectors/ Each connector consists of two files inside a `<platform>/` directory: -- **`<platform>-playwright.js`** — the connector script (plain JS, runs inside the Playwright runner sidecar) -- **`<platform>-playwright.json`** — metadata (display name, login URL, selectors, scopes) +- **`<platform>-playwright.js`** -- the connector script (plain JS, runs inside the Playwright runner sidecar) +- **`<platform>-playwright.json`** -- metadata (display name, login URL, selectors, scopes) --- ## How connectors work -Connectors run in a sandboxed Playwright browser managed by the DataConnect app. The runner provides a `page` API object (not raw Playwright). The browser starts **headless**; connectors call `page.showBrowser()` when login is needed and `page.goHeadless()` after. +Connectors run in a sandboxed Playwright browser managed by the DataConnect app. The runner provides a `page` API object (not raw Playwright). The browser starts headless; connectors call `page.showBrowser()` when login is needed and `page.goHeadless()` after. ### Two-phase architecture -**Phase 1 — Login (visible browser)** +**Phase 1 -- Login (visible browser)** 1. Navigate to the platform's login page (headless) 2. Check if the user is already logged in via persistent session 3. If not, show the browser so the user can log in manually 4. Extract auth tokens/cookies once logged in -**Phase 2 — Data collection (headless)** +**Phase 2 -- Data collection (headless)** 1. Switch to headless mode (browser disappears) 2. Fetch data via API calls, network capture, or DOM scraping 3.
Report structured progress to the UI @@ -71,7 +81,7 @@ Connectors run in a sandboxed Playwright browser managed by the DataConnect app. ### Scoped result format -Connectors return a **scoped result object** where data keys use the format `source.category` (e.g., `linkedin.profile`, `chatgpt.conversations`). The frontend auto-detects these scoped keys (any key containing a `.` that isn't a metadata field) and POSTs each scope separately to the Personal Server at `POST /v1/data/{scope}`. +Connectors return a scoped result object where data keys use the format `source.category` (e.g., `linkedin.profile`, `chatgpt.conversations`). The frontend auto-detects scoped keys (any key containing a `.` that isn't a metadata field) and POSTs each scope separately to the Personal Server at `POST /v1/data/{scope}`. ```javascript const result = { @@ -98,155 +108,14 @@ Metadata keys (`exportSummary`, `timestamp`, `version`, `platform`) are not trea ## Building a new connector -### 1. Create the metadata file - -Create `connectors/<platform>/<platform>-playwright.json`: - -```json -{ - "id": "<platform>-playwright", - "version": "1.0.0", - "name": "Platform Name", - "company": "Company", - "description": "Exports your ... using Playwright browser automation.", - "connectURL": "https://platform.com/login", - "connectSelector": "css-selector-for-logged-in-state", - "exportFrequency": "daily", - "runtime": "playwright", - "vectorize_config": { "documents": "field_name" } -} -``` - -- `runtime` must be `"playwright"` -- `connectURL` is where the browser navigates initially -- `connectSelector` detects whether the user is logged in (e.g. an element only visible post-login) - -### 2.
Create the connector script - -Create `connectors/<platform>/<platform>-playwright.js`: - -```javascript -// State management -const state = { isComplete: false }; - -// ─── Login check ────────────────────────────────────── -const checkLoginStatus = async () => { - try { - return await page.evaluate(` - (() => { - const hasLoggedInEl = !!document.querySelector('LOGGED_IN_SELECTOR'); - const hasLoginForm = !!document.querySelector('LOGIN_FORM_SELECTOR'); - return hasLoggedInEl && !hasLoginForm; - })() - `); - } catch { return false; } -}; - -// ─── Main flow ──────────────────────────────────────── -(async () => { - // Phase 1: Login - await page.setData('status', 'Checking login status...'); - await page.sleep(2000); - - if (!(await checkLoginStatus())) { - await page.showBrowser('https://platform.com/login'); - await page.setData('status', 'Please log in...'); - await page.promptUser( - 'Please log in. Click "Done" when ready.', - async () => await checkLoginStatus(), - 2000 - ); - } - - // Phase 2: Headless data collection - await page.goHeadless(); - - await page.setProgress({ - phase: { step: 1, total: 2, label: 'Fetching profile' }, - message: 'Loading profile data...', - }); - - // ... fetch your data here ... - const items = []; - - // Build result using scoped keys (exportSummary is required) - const result = { - 'platform.items': { - items: items, - total: items.length, - }, - exportSummary: { - count: items.length, - label: items.length === 1 ? 'item' : 'items', - }, - timestamp: new Date().toISOString(), - version: '1.0.0-playwright', - platform: 'platform-name', - }; - - state.isComplete = true; - await page.setData('result', result); -})(); -``` - -### 3.
Add a data schema (optional) - -Create `connectors/schemas/<platform>.<category>.json` to describe the exported data format: - -```json -{ - "name": "Platform Items", - "version": "1.0.0", - "scope": "platform.items", - "dialect": "json", - "description": "Description of the exported data", - "schema": { - "type": "object", - "properties": { - "items": { - "type": "array", - "items": { - "properties": { - "id": { "type": "string" }, - "title": { "type": "string" } - }, - "required": ["id", "title"] - } - } - }, - "required": ["items"] - } -} -``` - -### 4. Update the registry +See [`skills/vana-connect/CREATE.md`](skills/vana-connect/CREATE.md) for the full walkthrough. Summary: -Add your connector to `registry.json`. Generate checksums with: - -```bash -shasum -a 256 <platform>/<platform>-playwright.js | awk '{print "sha256:" $1}' -shasum -a 256 <platform>/<platform>-playwright.json | awk '{print "sha256:" $1}' -``` - -Then add an entry to the `connectors` array: - -```json -{ - "id": "<platform>-playwright", - "company": "<company>", - "version": "1.0.0", - "name": "Platform Name", - "description": "...", - "files": { - "script": "<platform>/<platform>-playwright.js", - "metadata": "<platform>/<platform>-playwright.json" - }, - "checksums": { - "script": "sha256:<checksum>", - "metadata": "sha256:<checksum>" - } -} -``` +1. **Scaffold:** `node scripts/scaffold.cjs <platform> [company]` -- generates script, metadata, and stub schema +2. **Implement:** Write login + data collection logic (see CREATE.md for auth patterns, extraction strategies, and reference connectors) +3. **Validate structure:** `node scripts/validate-connector.cjs <platform>/<platform>-playwright.js` +4. **Test:** `node run-connector.cjs <platform>/<platform>-playwright.js --inputs '{"username":"x","password":"y"}'` +5. **Validate output:** `node scripts/validate-connector.cjs <platform>/<platform>-playwright.js --check-result ~/.dataconnect/last-result.json` +6.
**Register:** `node scripts/register.cjs <platform>/<platform>-playwright.js` -- adds entry + checksums to `registry.json` --- @@ -279,10 +148,10 @@ await page.setProgress({ }); ``` -- `phase.step` / `phase.total` — drives the step indicator ("Step 1 of 3") -- `phase.label` — short label for the current phase -- `message` — human-readable progress text -- `count` — numeric count for progress tracking +- `phase.step` / `phase.total` -- drives the step indicator ("Step 1 of 3") +- `phase.label` -- short label for the current phase +- `message` -- human-readable progress text +- `count` -- numeric count for progress tracking --- @@ -327,25 +196,31 @@ This copies your connector files to `~/.dataconnect/connectors/` where the runni ### Standalone test runner -You can test connectors directly without starting the full DataConnect app using the included test runner. It spawns the playwright-runner as a child process and pretty-prints the JSON protocol messages. +Test connectors without the full DataConnect app. The runner spawns playwright-runner as a child process and outputs JSON protocol messages. **Prerequisites:** The [DataConnect](https://github.com/vana-com/data-connect) repo cloned alongside this one (the runner auto-detects `../data-dt-app/playwright-runner`), or set `PLAYWRIGHT_RUNNER_DIR` to point to the playwright-runner directory.
```bash -# Run a connector in headed mode (browser visible — default) -node test-connector.cjs ./linkedin/linkedin-playwright.js +# Run a connector (headed by default, browser visible) +node run-connector.cjs ./linkedin/linkedin-playwright.js + +# Colored, human-readable output +node run-connector.cjs ./linkedin/linkedin-playwright.js --pretty + +# Pre-supply credentials +node run-connector.cjs ./linkedin/linkedin-playwright.js --inputs '{"username":"x","password":"y"}' # Run headless (no visible browser) -node test-connector.cjs ./linkedin/linkedin-playwright.js --headless +node run-connector.cjs ./linkedin/linkedin-playwright.js --headless # Override the initial URL -node test-connector.cjs ./linkedin/linkedin-playwright.js --url https://linkedin.com/feed +node run-connector.cjs ./linkedin/linkedin-playwright.js --url https://linkedin.com/feed # Save result to a custom path (default: ./connector-result.json) -node test-connector.cjs ./linkedin/linkedin-playwright.js --output ./my-result.json +node run-connector.cjs ./linkedin/linkedin-playwright.js --output ./my-result.json ``` -The runner reads the connector's sibling `.json` metadata to automatically resolve the `connectURL`. In headed mode, the browser stays visible throughout the run (the `goHeadless()` call becomes a no-op), making it easy to observe what the connector is doing. +The runner reads the connector's sibling `.json` metadata to resolve the `connectURL`. In headed mode, `goHeadless()` becomes a no-op so the browser stays visible throughout. --- @@ -356,9 +231,9 @@ The runner reads the connector's sibling `.json` metadata to automatically resol 1. Fork this repo 2. Create a branch: `git checkout -b feat/<platform>-connector` 3.
Add your files in `connectors/<platform>/`: - - `<platform>-playwright.js` — connector script - - `<platform>-playwright.json` — metadata - - `schemas/<platform>.<category>.json` — data schema (optional but encouraged) + - `<platform>-playwright.js` -- connector script + - `<platform>-playwright.json` -- metadata + - `schemas/<platform>.<category>.json` -- data schema (optional but encouraged) 4. Test locally using the instructions above 5. Update `registry.json` with your connector entry and checksums 6. Open a pull request @@ -374,14 +249,14 @@ The runner reads the connector's sibling `.json` metadata to automatically resol ### Guidelines -- **Credentials stay on-device.** Connectors run in a local browser. Never send tokens or passwords to external servers. -- **Use `page.setProgress()`** to report progress. Users should see what's happening during long exports. +- **Credentials stay on-device.** Never send tokens or passwords to external servers. +- **Use `page.setProgress()`** to report progress during long exports. - **Include `exportSummary`** in the result. The UI uses it to display what was collected. -- **Handle errors gracefully.** Use `page.setData('error', message)` and provide clear error messages. -- **Prefer API fetch over DOM scraping** when the platform has usable APIs. APIs are more stable than DOM structure. -- **Avoid relying on CSS class names** — many platforms obfuscate them. Use structural selectors, heading text, and content heuristics instead. -- **Rate-limit API calls.** Add `page.sleep()` between requests to avoid triggering rate limits. -- **Test pagination edge cases** — empty results, single page, large datasets. +- **Handle errors.** Use `page.setData('error', message)` with clear error messages. +- **Prefer API fetch over DOM scraping.** APIs are more stable than DOM structure. +- **Avoid obfuscated CSS class names.** Use structural selectors, heading text, and content heuristics. +- **Rate-limit API calls.** Add `page.sleep()` between requests.
+- **Test pagination edge cases** -- empty results, single page, large datasets. ### Registry checksums @@ -403,4 +278,4 @@ DataConnect fetches `registry.json` from this repo on app startup and during `np 3. Verify SHA-256 checksums match 4. Write to local `connectors/` directory -This enables OTA connector updates without requiring a full app release. +This enables OTA connector updates without a full app release. diff --git a/max/max-playwright.js b/max/max-playwright.js new file mode 100644 index 0000000..9ceeb4a --- /dev/null +++ b/max/max-playwright.js @@ -0,0 +1,508 @@ +/** + * Max Connector (Playwright) — Hybrid API + DOM Extraction + * + * Phase 1 (Browser, visible if login needed): + * - Detects login via persistent browser session + * - If not logged in, shows browser for user to log in + * - Captures JWT token and profile via network interception + * + * Phase 2 (Background — browser closed): + * - Uses captured token with httpFetch to call Max CMS API + * - Extracts profile, viewing history, continue watching, My List + * - Falls back to DOM extraction if API capture fails + */ + +const state = { + token: null, + profileId: null, + profile: null, + isComplete: false, +}; + +// ─── Helpers ──────────────────────────────────────────── + +const stripBidi = (s) => (s || '').replace(/[\u2066\u2067\u2068\u2069\u202A-\u202E\u200E\u200F]/g, '').trim(); + +const cleanTitle = (raw) => { + let t = stripBidi(raw); + t = t.replace(/\.\s*\d+\s+z\s+\d+\.?.*$/, ''); + t = t.replace(/\.\s*\d+\s+of\s+\d+\.?.*$/, ''); + t = t.replace(/^Oglądaj\s+/, ''); + const seasonMatch = t.match(/^(.+?)\.\s*Sezon\s+\d+,\s*odcinek\s+\d+:\s*(.+)$/); + if (seasonMatch) t = seasonMatch[1] + ' — ' + seasonMatch[2]; + t = t.replace(/\.\s*Pozostało\s+.+$/, ''); + t = t.replace(/\.\s*Klasyfikacja:.+$/, ''); + t = t.replace(/\.\s*Premiera w\s+\d{4}$/, ''); + return t.trim(); +}; + +// ─── Login Detection ──────────────────────────────────── + +const checkLoginStatus = async () => { + try { + 
return await page.evaluate(` + (() => { + const path = window.location.pathname; + const host = window.location.hostname; + if (/\\/(signin|sign-in|login|auth)/.test(path)) return false; + if (!host.includes('max.com')) return false; + if (!!document.querySelector('input[type="password"]')) return false; + if (!!document.querySelector('input[type="email"]')) return false; + if (document.querySelector('[data-testid="default-avatar"]')) return true; + if (document.querySelector('img[alt*="profile" i]')) return true; + if (document.querySelector('nav') && path === '/') return true; + return false; + })() + `); + } catch (e) { + return false; + } +}; + +// ─── API Helpers ──────────────────────────────────────── + +const apiHeaders = () => ({ + 'Authorization': 'Bearer ' + state.token, + 'Accept': 'application/vnd.api+json', + 'x-hbo-profile-id': state.profileId || '', +}); + +const apiFetch = async (url) => { + const resp = await page.httpFetch(url, { headers: apiHeaders() }); + if (!resp.ok) return null; + return resp.json; +}; + +// ─── Scroll for DOM fallback ──────────────────────────── + +const scrollToLoadContent = async () => { + for (let i = 0; i < 10; i++) { + await page.evaluate(`window.scrollBy(0, 800)`); + await page.sleep(1500); + } + await page.evaluate(`window.scrollTo(0, 0)`); + await page.sleep(1000); +}; + +// ─── DOM Extraction (fallback) ────────────────────────── + +const extractFromDOM = async () => { + await page.setData('status', 'API capture failed, falling back to DOM extraction...'); + + await page.goto('https://play.max.com/'); + await page.sleep(5000); + await scrollToLoadContent(); + + const contentData = await page.evaluate(` + (() => { + const rails = []; + const allSections = document.querySelectorAll('section, [role="region"], [data-testid*="collection"]'); + for (const section of allSections) { + const heading = section.querySelector('h2, h3, [role="heading"], [data-testid*="title"]'); + const railTitle = heading ? 
(heading.textContent || '').trim() : ''; + if (!railTitle) continue; + const items = []; + const allLinks = section.querySelectorAll('a[href]'); + for (const link of allLinks) { + const href = link.getAttribute('href') || ''; + if (!href.startsWith('/') || href.includes('/settings') || href === '/') continue; + let title = ''; + const img = link.querySelector('img'); + if (img) title = (img.getAttribute('alt') || '').trim(); + if (!title) title = (link.getAttribute('aria-label') || '').trim(); + if (!title) continue; + let progress = 0; + const bars = link.querySelectorAll('[class*="progress" i] div, [class*="Progress" i] div'); + for (const bar of bars) { + const width = bar.style?.width; + if (width && width.includes('%')) { progress = parseInt(width) || 0; break; } + } + items.push({ title, href, progress }); + } + if (items.length > 0) rails.push({ title: railTitle, items }); + } + return rails; + })() + `); + + const CW_KW = ['oglądaj dalej', 'continue watching', 'kontynuuj']; + const FY_KW = ['dla ciebie', 'for you', 'because you watched']; + const LIVE_KW = ['na żywo', 'live', 'sport']; + + const continueWatching = []; + const recommendations = []; + + for (const rail of contentData) { + const rl = rail.title.toLowerCase(); + if (LIVE_KW.some(kw => rl.includes(kw))) continue; + const isCW = CW_KW.some(kw => rl.includes(kw)); + const isFY = FY_KW.some(kw => rl.includes(kw)); + for (const item of rail.items) { + const title = cleanTitle(item.title); + const href = item.href; + const idMatch = href.match(/\/(?:video\/watch\/)?([0-9a-f-]{36})/); + const entry = { title, url: 'https://play.max.com' + href, id: idMatch?.[1] || '' }; + if (isCW || item.progress > 0) continueWatching.push({ ...entry, progressPercent: item.progress }); + else if (isFY) recommendations.push(entry); + } + } + + return { continueWatching, recommendations, myList: [], source: 'dom' }; +}; + +// ─── Main Export Flow ─────────────────────────────────── + +(async () => { + const TOTAL_STEPS 
= 5; + + // ═══ STEP 1: Login + Token Capture ═══ + await page.setProgress({ + phase: { step: 1, total: TOTAL_STEPS, label: 'Login' }, + message: 'Navigating to Max...', + }); + + await page.captureNetwork({ urlPattern: 'token', key: 'token' }); + await page.captureNetwork({ urlPattern: 'profiles', key: 'profiles' }); + + await page.goto('https://play.max.com/'); + await page.sleep(4000); + + let loggedIn = await checkLoginStatus(); + + if (!loggedIn) { + await page.showBrowser('https://play.max.com/'); + await page.sleep(2000); + + await page.promptUser( + 'Please log in to Max. Click "Done" when you see the home page.', + async () => await checkLoginStatus(), + 3000 + ); + await page.sleep(3000); + } else { + await page.setData('status', 'Session restored from previous login'); + } + + // Wait a bit more for API calls to complete + await page.sleep(4000); + + // Capture token and profiles + const tokenResp = await page.getCapturedResponse('token'); + state.token = tokenResp?.data?.data?.attributes?.token; + + const profilesResp = await page.getCapturedResponse('profiles'); + const profiles = profilesResp?.data?.data || []; + state.profileId = profiles[0]?.id; + + // If no token, fall back to DOM + if (!state.token) { + await page.goHeadless(); + const domData = await extractFromDOM(); + const result = { + 'max.profile': {}, + 'max.continueWatching': { items: domData.continueWatching, total: domData.continueWatching.length }, + 'max.recommendations': { items: domData.recommendations, total: domData.recommendations.length }, + 'max.myList': { items: [], total: 0 }, + exportSummary: { + count: domData.continueWatching.length + domData.recommendations.length, + label: 'items', + details: domData.continueWatching.length + ' in progress, ' + domData.recommendations.length + ' recommended (DOM fallback)', + }, + timestamp: new Date().toISOString(), + version: '1.0.0-playwright', + platform: 'max', + }; + state.isComplete = true; + await page.setData('result', result); + 
await page.setData('status', 'Complete (DOM fallback)'); + return { success: true, data: result }; + } + + // ═══ Close browser for API access ═══ + await page.setData('status', 'Token captured, switching to API mode...'); + await page.closeBrowser(); + + // ═══ STEP 2: Fetch Profile ═══ + await page.setProgress({ + phase: { step: 2, total: TOTAL_STEPS, label: 'Profile' }, + message: 'Fetching profile...', + }); + + const userResp = await apiFetch('https://default.beam-emea.prd.api.hbomax.com/users/me'); + const userAttrs = userResp?.data?.attributes || {}; + + state.profile = { + name: [userAttrs.firstName, userAttrs.lastName].filter(Boolean).join(' ') || undefined, + email: userAttrs.email || undefined, + }; + + // Get profile names + for (const p of profiles) { + if (p.id === state.profileId && p.attributes?.name) { + state.profile.profileName = p.attributes.name; + } + } + + await page.setData('status', 'Profile: ' + (state.profile.name || state.profile.email || 'unknown')); + + // ═══ STEP 3: Fetch Home Data (viewing history + continue watching) ═══ + await page.setProgress({ + phase: { step: 3, total: TOTAL_STEPS, label: 'Watch history' }, + message: 'Fetching viewing data...', + }); + + const homeData = await apiFetch( + 'https://default.any-emea.prd.api.hbomax.com/cms/routes/home?include=default&decorators=viewingHistory,isFavorite,contentAction,badges&page[items.size]=50' + ); + + // Build a lookup of shows by ID + const showMap = new Map(); + const viewedItems = []; + const continueWatchingCollectionId = []; + + if (homeData?.included) { + for (const item of homeData.included) { + if (item.type === 'show') { + showMap.set(item.id, { + title: item.attributes?.name, + isFavorite: item.attributes?.isFavorite, + }); + } + if (item.type === 'collection') { + const name = item.attributes?.name || ''; + if (name.includes('continue-watching')) { + continueWatchingCollectionId.push(item.id); + } + } + } + + for (const item of homeData.included) { + if (item.type 
!== 'video') continue; + const vh = item.attributes?.viewingHistory; + if (!vh?.viewed && !vh?.completed && !vh?.position) continue; + + const showId = item.relationships?.show?.data?.id; + const show = showMap.get(showId); + + viewedItems.push({ + title: item.attributes?.name || item.attributes?.title, + showTitle: show?.title, + type: showId ? 'episode' : 'movie', + id: item.id, + showId, + seasonNumber: item.attributes?.seasonNumber, + episodeNumber: item.attributes?.numberInSeason || item.attributes?.numberInShow, + duration: item.attributes?.duration, + position: vh.position, + completed: vh.completed, + lastWatched: vh.lastReportedTimestamp, + url: 'https://play.max.com/video/watch/' + item.id, + }); + } + } + + await page.setData('status', viewedItems.length + ' items with viewing history from home page'); + + // ═══ STEP 4: Fetch additional collection pages for more history ═══ + await page.setProgress({ + phase: { step: 4, total: TOTAL_STEPS, label: 'Collections' }, + message: 'Fetching personalized collections...', + }); + + // Fetch "because you watched" and "for you" collections + const interestingCollections = []; + if (homeData?.included) { + for (const item of homeData.included) { + if (item.type !== 'collection') continue; + const name = item.attributes?.name || ''; + if (name.includes('for-you') || name.includes('because-you-watched') || name.includes('continue-watching')) { + interestingCollections.push({ id: item.id, name }); + } + } + } + + const recommendations = []; + for (const coll of interestingCollections) { + if (coll.name.includes('continue-watching')) continue; // Already have these from viewedItems + + const collData = await apiFetch( + 'https://default.any-emea.prd.api.hbomax.com/cms/collections/' + coll.id + + '?include=default&decorators=viewingHistory,isFavorite,contentAction,badges&page[items.size]=50' + ); + + if (collData?.included) { + for (const item of collData.included) { + if (item.type === 'show') { + showMap.set(item.id, 
{ + title: item.attributes?.name, + isFavorite: item.attributes?.isFavorite, + }); + } + } + for (const item of collData.included) { + if (item.type !== 'show' && item.type !== 'video') continue; + const title = item.attributes?.name; + if (!title) continue; + if (recommendations.find(r => r.id === item.id)) continue; + + recommendations.push({ + title, + type: item.type === 'show' ? 'series' : 'episode', + id: item.id, + url: 'https://play.max.com/' + (item.type === 'show' ? 'show' : 'video/watch') + '/' + item.id, + source: coll.name.includes('because') ? 'because-you-watched' : 'for-you', + }); + + // Also check for viewing history on these items + const vh = item.attributes?.viewingHistory; + if (vh?.viewed || vh?.completed || vh?.position) { + if (!viewedItems.find(v => v.id === item.id)) { + const showId = item.relationships?.show?.data?.id; + const show = showMap.get(showId); + viewedItems.push({ + title: item.attributes?.name, + showTitle: show?.title, + type: showId ? 'episode' : (item.type === 'show' ? 
'series' : 'movie'), + id: item.id, + showId, + duration: item.attributes?.duration, + position: vh.position, + completed: vh.completed, + lastWatched: vh.lastReportedTimestamp, + url: 'https://play.max.com/video/watch/' + item.id, + }); + } + } + } + } + + await page.sleep(300); + } + + await page.setData('status', viewedItems.length + ' viewed, ' + recommendations.length + ' recommended'); + + // ═══ STEP 5: Fetch My List ═══ + await page.setProgress({ + phase: { step: 5, total: TOTAL_STEPS, label: 'My List' }, + message: 'Fetching My List...', + }); + + const myList = []; + const myStuffData = await apiFetch( + 'https://default.any-emea.prd.api.hbomax.com/cms/routes/my-stuff?include=default&decorators=viewingHistory,isFavorite,contentAction,badges&page[items.size]=50' + ); + + if (myStuffData?.included) { + // Build show map from my-stuff data too + for (const item of myStuffData.included) { + if (item.type === 'show') { + showMap.set(item.id, { + title: item.attributes?.name, + isFavorite: item.attributes?.isFavorite, + }); + } + } + + // Find the "my-list" collection and its items + const myListCollItems = new Set(); + for (const item of myStuffData.included) { + if (item.type === 'collection' && (item.attributes?.name || '').includes('my-list')) { + const itemRefs = item.relationships?.items?.data || []; + for (const ref of itemRefs) myListCollItems.add(ref.id); + } + } + + // Extract shows that are in my-list collection or marked as favorite + for (const item of myStuffData.included) { + if (item.type !== 'show') continue; + const title = item.attributes?.name; + if (!title) continue; + + // Check if this show is in the my-list collection items + // The collectionItem references the show, so check if any collectionItem for this show is in myListCollItems + let inMyList = item.attributes?.isFavorite; + if (!inMyList) { + for (const ci of myStuffData.included) { + if (ci.type === 'collectionItem' && myListCollItems.has(ci.id)) { + const targetId = 
ci.relationships?.content?.data?.id; + if (targetId === item.id) { inMyList = true; break; } + } + } + } + + if (inMyList || myListCollItems.size === 0) { + myList.push({ + title, + type: 'series', + id: item.id, + url: 'https://play.max.com/show/' + item.id, + }); + } + } + + // Also check for viewed items in my-stuff + for (const item of myStuffData.included) { + if (item.type !== 'video') continue; + const vh = item.attributes?.viewingHistory; + if (!vh?.viewed && !vh?.completed && !vh?.position) continue; + if (viewedItems.find(v => v.id === item.id)) continue; + + const showId = item.relationships?.show?.data?.id; + const show = showMap.get(showId); + viewedItems.push({ + title: item.attributes?.name, + showTitle: show?.title, + type: showId ? 'episode' : 'movie', + id: item.id, + showId, + duration: item.attributes?.duration, + position: vh.position, + completed: vh.completed, + lastWatched: vh.lastReportedTimestamp, + url: 'https://play.max.com/video/watch/' + item.id, + }); + } + } + + // Sort viewed items by last watched date + viewedItems.sort((a, b) => { + if (!a.lastWatched) return 1; + if (!b.lastWatched) return -1; + return new Date(b.lastWatched) - new Date(a.lastWatched); + }); + + // ═══ Build Result ═══ + const result = { + 'max.profile': state.profile, + 'max.viewingHistory': { + items: viewedItems, + total: viewedItems.length, + }, + 'max.recommendations': { + items: recommendations, + total: recommendations.length, + }, + 'max.myList': { + items: myList, + total: myList.length, + }, + exportSummary: { + count: viewedItems.length + myList.length, + label: 'items', + details: [ + viewedItems.length + ' watched', + recommendations.length + ' recommended', + myList.length + ' in My List', + ].join(', '), + }, + timestamp: new Date().toISOString(), + version: '1.0.0-playwright', + platform: 'max', + }; + + state.isComplete = true; + await page.setData('result', result); + await page.setData('status', 'Complete! 
' + result.exportSummary.details + ' for ' + (state.profile.name || 'Max user')); + + return { success: true, data: result }; +})(); diff --git a/max/max-playwright.json b/max/max-playwright.json new file mode 100644 index 0000000..9418631 --- /dev/null +++ b/max/max-playwright.json @@ -0,0 +1,36 @@ +{ + "id": "max-playwright", + "version": "1.1.0", + "name": "Max", + "company": "Warner Bros. Discovery", + "description": "Exports your Max (HBO Max) profile, viewing history with timestamps, recommendations, and saved list using Playwright browser automation.", + "connectURL": "https://play.max.com/", + "connectSelector": "[data-testid='default-avatar'], img[alt*='profile' i], nav", + "exportFrequency": "weekly", + "runtime": "playwright", + "scopes": [ + { + "scope": "max.profile", + "label": "Your Max profile", + "description": "Profile information including name and email" + }, + { + "scope": "max.viewingHistory", + "label": "Your viewing history", + "description": "Movies and shows you've watched with timestamps and progress" + }, + { + "scope": "max.recommendations", + "label": "Recommendations", + "description": "Personalized content recommendations from Max" + }, + { + "scope": "max.myList", + "label": "Your My List", + "description": "Titles saved to your My List / watchlist" + } + ], + "vectorize_config": { + "documents": "viewingHistory" + } +} diff --git a/registry.json b/registry.json index adb14da..ebdd055 100644 --- a/registry.json +++ b/registry.json @@ -122,6 +122,21 @@ "script": "sha256:d58629b152ad3471e27dfb4cbe33b84d74f426a7ff330754aacdc1ec89f72f6c", "metadata": "sha256:d3ff5230a6ebe6ff3f655546d834e7ed59f741febdc540dd2f89894b6e5baa7a" } + }, + { + "id": "max-playwright", + "company": "max", + "version": "1.1.0", + "name": "Max", + "description": "Exports your Max (HBO Max) profile, viewing history with timestamps, recommendations, and saved list using Playwright browser automation.", + "files": { + "script": "max/max-playwright.js", + "metadata": 
"max/max-playwright.json" + }, + "checksums": { + "script": "sha256:e98a660f8fb7dca74d4352748b83749161c723243dd54bae37c27e5c8c7cfcb9", + "metadata": "sha256:4ff3add88088e1b300d99ae7103c6eb4d8a80f3b43cd802783bda49c2dff7f37" + } } ] } diff --git a/run-connector.cjs b/run-connector.cjs new file mode 120000 index 0000000..f44adcf --- /dev/null +++ b/run-connector.cjs @@ -0,0 +1 @@ +skills/vana-connect/scripts/run-connector.cjs \ No newline at end of file diff --git a/schemas/max.myList.json b/schemas/max.myList.json new file mode 100644 index 0000000..dd5b2ed --- /dev/null +++ b/schemas/max.myList.json @@ -0,0 +1,29 @@ +{ + "name": "Max My List", + "version": "1.0.0", + "scope": "max.myList", + "dialect": "json", + "description": "Titles saved to your Max watchlist / My List", + "schema": { + "type": "object", + "properties": { + "items": { + "type": "array", + "items": { + "type": "object", + "properties": { + "title": { "type": "string" }, + "type": { "type": "string", "enum": ["movie", "series", "episode", "sport", "unknown"] }, + "id": { "type": "string" }, + "url": { "type": "string" } + }, + "required": ["title", "url"], + "additionalProperties": false + } + }, + "total": { "type": "number" } + }, + "required": ["items", "total"], + "additionalProperties": false + } +} diff --git a/schemas/max.profile.json b/schemas/max.profile.json new file mode 100644 index 0000000..6f82aac --- /dev/null +++ b/schemas/max.profile.json @@ -0,0 +1,16 @@ +{ + "name": "Max Profile", + "version": "1.0.0", + "scope": "max.profile", + "dialect": "json", + "description": "Basic Max profile information including name, email, and subscription plan", + "schema": { + "type": "object", + "properties": { + "name": { "type": "string" }, + "email": { "type": "string" }, + "plan": { "type": "string" } + }, + "additionalProperties": false + } +} diff --git a/schemas/max.recommendations.json b/schemas/max.recommendations.json new file mode 100644 index 0000000..e5292b0 --- /dev/null +++ 
b/schemas/max.recommendations.json @@ -0,0 +1,30 @@ +{ + "name": "Max Recommendations", + "version": "1.0.0", + "scope": "max.recommendations", + "dialect": "json", + "description": "Personalized content recommendations from Max based on viewing history", + "schema": { + "type": "object", + "properties": { + "items": { + "type": "array", + "items": { + "type": "object", + "properties": { + "title": { "type": "string" }, + "type": { "type": "string", "enum": ["movie", "series", "episode", "sport", "unknown"] }, + "id": { "type": "string" }, + "url": { "type": "string" }, + "source": { "type": "string" } + }, + "required": ["title", "url"], + "additionalProperties": false + } + }, + "total": { "type": "number" } + }, + "required": ["items", "total"], + "additionalProperties": false + } +} diff --git a/schemas/max.viewingHistory.json b/schemas/max.viewingHistory.json new file mode 100644 index 0000000..4640891 --- /dev/null +++ b/schemas/max.viewingHistory.json @@ -0,0 +1,37 @@ +{ + "name": "Max Viewing History", + "version": "1.0.0", + "scope": "max.viewingHistory", + "dialect": "json", + "description": "Movies and shows watched on Max with timestamps, progress position, and completion status", + "schema": { + "type": "object", + "properties": { + "items": { + "type": "array", + "items": { + "type": "object", + "properties": { + "title": { "type": "string" }, + "showTitle": { "type": "string" }, + "type": { "type": "string", "enum": ["movie", "series", "episode"] }, + "id": { "type": "string" }, + "showId": { "type": "string" }, + "seasonNumber": { "type": "number" }, + "episodeNumber": { "type": "number" }, + "duration": { "type": "number" }, + "position": { "type": "number" }, + "completed": { "type": "boolean" }, + "lastWatched": { "type": "string", "format": "date-time" }, + "url": { "type": "string" } + }, + "required": ["title", "id"], + "additionalProperties": false + } + }, + "total": { "type": "number" } + }, + "required": ["items", "total"], + 
"additionalProperties": false + } +} diff --git a/skills/vana-connect/CREATE.md b/skills/vana-connect/CREATE.md new file mode 100644 index 0000000..e7720cb --- /dev/null +++ b/skills/vana-connect/CREATE.md @@ -0,0 +1,304 @@ +# Creating a Connector + +Build a data connector for a platform that isn't in the registry yet. + +## Prerequisites + +Before starting, read these reference docs: + +- `reference/PAGE-API.md` -- full `page` object API +- `reference/PATTERNS.md` -- extraction pattern examples (REST, network capture, DOM scraping) + +## Connector Format + +Scripts are plain JavaScript (CJS), no imports, no require. The runner injects a `page` object. The script body must be an async IIFE preceded by a blank line (the runner matches `\n(async`). + +```javascript +(async () => { + // connector logic here + await page.setData('result', { 'platform.scope': data }); + return { success: true }; +})() +``` + +## Reference Connectors + +Use existing connectors as models. Match the pattern closest to your target platform: + +| Platform | Strategy | Notes | +|------------|------------------|------------------------------------------| +| Reddit | REST API fetch | OAuth-like endpoints, JSON responses | +| Twitter/X | Network capture | GraphQL via captureNetwork | +| Instagram | REST + GraphQL | Cookie auth, pagination | +| LinkedIn | REST API fetch | Voyager API, CSRF token required | +| Notion | REST API fetch | Internal API, bearer token from cookies | +| Spotify | REST API fetch | Well-documented public API | + +Look at existing connectors in `~/.dataconnect/connectors/` for working examples. + +--- + +## Step 1 -- Research the Platform + +Map the platform's login flow, data APIs, and auth mechanism before writing code. 
+ +### Web search queries + +Run these (or similar) searches: + +- `"<platform> API endpoints"`, `"<platform> graphql endpoint"` +- `"<platform> internal API"`, `"<platform> developer API"` +- `"<platform> data export"`, `"<platform> GDPR data download"` +- `"<platform> scraper github"` -- open-source scrapers reveal known API patterns + +### What to identify + +- **Login URL** (e.g. `https://platform.com/login`) +- **Login form selectors** -- stable selectors for username, password, submit. Use `input[name="..."]`, `input[type="password"]`, `button[type="submit"]`. Note if login is multi-step. +- **Logged-in indicator** -- a CSS selector or API response confirming auth. Becomes `connectSelector` in metadata. +- **Data endpoints** -- REST, GraphQL, or (last resort) DOM scraping +- **Auth mechanism** -- cookies, CSRF tokens, bearer tokens, session storage +- **Rate limits** -- throttling rules, if known +- **Data categories** -- each becomes a `platform.scope` key (e.g. `reddit.profile`, `reddit.posts`) + +### Choose an extraction strategy (in preference order) + +1. **REST API fetch** -- platform has discoverable endpoints (most reliable) +2. **Network capture** -- platform uses GraphQL/XHR during page navigation (use `page.captureNetwork`) +3. **DOM scraping** -- no API, data only in rendered HTML (fragile, last resort) + +--- + +## Step 2 -- Scaffold the Connector + +Generate boilerplate: + +```bash +node scripts/scaffold.cjs <platform> [company] +``` + +This creates `{company}/{platform}-playwright.js`, `{company}/{platform}-playwright.json`, and a stub schema in `schemas/`. Edit these files to implement your connector. + +### Auth pattern + +Connectors must support two credential sources: + +1. **`process.env`** -- for automated/CI runs. Convention: `USER_LOGIN_<PLATFORM>` and `USER_PASSWORD_<PLATFORM>`. +2. **`page.requestInput()`** -- for interactive runs where env vars aren't set. This prompts the user through the agent.
+ +Try env first, fall back to requestInput: + +```javascript +let username = process.env.USER_LOGIN_PLATFORMNAME || ''; +let password = process.env.USER_PASSWORD_PLATFORMNAME || ''; + +if (!username || !password) { + const creds = await page.requestInput({ + message: 'Enter your Platform credentials', + schema: { + type: 'object', + properties: { + username: { type: 'string', title: 'Email or username' }, + password: { type: 'string', title: 'Password' } + }, + required: ['username', 'password'] + } + }); + username = creds.username; + password = creds.password; +} +``` + +### Login implementation + +```javascript +const loginStr = JSON.stringify(username); +const passStr = JSON.stringify(password); + +await page.goto('https://platform.com/login'); +await page.sleep(2000); + +await page.evaluate(` + (() => { + const u = document.querySelector('input[name="username"], input[type="email"]'); + const p = document.querySelector('input[type="password"]'); + if (u) { u.focus(); u.value = ${loginStr}; u.dispatchEvent(new Event('input', {bubbles:true})); } + if (p) { p.focus(); p.value = ${passStr}; p.dispatchEvent(new Event('input', {bubbles:true})); } + })() +`); +await page.sleep(500); +await page.evaluate(`document.querySelector('button[type="submit"]')?.click()`); +await page.sleep(3000); +``` + +**Platform-specific adaptations:** + +- **Multi-step login** (email page, then password page): split into two evaluate+sleep sequences with a navigation between them. +- **React/Vue apps** that ignore `.value =`: use the `nativeInputValueSetter` pattern: + ```javascript + const nativeInputValueSetter = Object.getOwnPropertyDescriptor( + window.HTMLInputElement.prototype, 'value' + ).set; + nativeInputValueSetter.call(input, ${loginStr}); + input.dispatchEvent(new Event('input', { bubbles: true })); + ``` +- **2FA**: use `page.requestInput()` to ask for the code. CAPTCHA cannot be automated -- exit with a clear error. 
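A minimal sketch of the 2FA flow, reusing the same `page.requestInput()` mechanism as credentials. The `code` field name and both OTP selectors are assumptions for illustration, not part of the runner API -- adapt them to the platform's actual form.

```javascript
// Hedged sketch: prompt for a one-time code via page.requestInput() and
// submit it. The 'code' field name and the OTP selectors are assumptions.
async function submitTwoFactorCode(page) {
  const { code } = await page.requestInput({
    message: 'Enter the 2FA code sent to your device',
    schema: {
      type: 'object',
      properties: {
        code: { type: 'string', title: '2FA code' }
      },
      required: ['code']
    }
  });

  // Serialize into the evaluate string -- closures do not cross contexts.
  const codeStr = JSON.stringify(code);
  await page.evaluate(`
    (() => {
      const input = document.querySelector('input[autocomplete="one-time-code"], input[name="otp"]');
      if (input) {
        input.focus();
        input.value = ${codeStr};
        input.dispatchEvent(new Event('input', { bubbles: true }));
      }
    })()
  `);
  await page.evaluate(`document.querySelector('button[type="submit"]')?.click()`);
  await page.sleep(3000);
  return code;
}
```

Call it after the password step, once the connector detects an OTP form on the page.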
+ +### Key rules + +- **`page.evaluate()` takes a string**, not a function. Pass variables in via `JSON.stringify()`. +- **No obfuscated CSS classes** (`.x1lliihq`, `.css-1dbjc4n`). Use ARIA roles, data attributes, semantic HTML. +- **Rate-limit API calls** -- wait 300-1000 ms between requests (`page.sleep`). +- **Handle errors** -- check `resp.ok`, wrap fetches in try-catch, use `page.setData('error', ...)` for failures. +- **Scoped result keys** -- `platform.scope` format (e.g. `spotify.playlists`). +- **Report progress** -- `page.setProgress({ phase, message })` for long operations. +- **Include exportSummary** -- `{ count, label, details }` in the result object. + +### Page API quick reference + +``` +page.goto(url) Navigate +page.evaluate(jsString) Run JS in browser, return result +page.sleep(ms) Wait +page.requestInput({ message, schema }) Ask user for data (credentials, 2FA) +page.setData(key, value) 'result' for data, 'error' for failures +page.setProgress({ phase, message }) Progress reporting +page.captureNetwork({ key, urlPattern }) Intercept network requests +page.getCapturedResponse(key) Retrieve captured response +page.screenshot() Base64 JPEG screenshot +``` + +Full API: `reference/PAGE-API.md` + +--- + +## Step 3 -- Validate Structure + +Run the structural validator: + +```bash +node scripts/validate-connector.cjs <company>/<platform>-playwright.js +``` + +This checks metadata fields, script patterns (IIFE, login detection, evaluate syntax, scoped keys), and schema files. + +Fix all errors before testing. Re-run after each fix until `"valid": true`. + +--- + +## Step 4 -- Test + +Run the connector headless via `run-connector.cjs`: + +```bash +node scripts/run-connector.cjs <company>/<platform>-playwright.js [start-url] +``` + +To pre-supply credentials without env vars: + +```bash +node scripts/run-connector.cjs <company>/<platform>-playwright.js --inputs '{"username":"x","password":"y"}' +``` + +**Exit codes:** 0 = success, 1 = error, 2 = needs input (missing credentials), 3 = legacy auth (not batch-compatible).
+ +On success, the result is written to `~/.dataconnect/last-result.json`. + +--- + +## Step 5 -- Validate Output + +Check that the collected data is correct: + +```bash +node scripts/validate-connector.cjs <company>/<platform>-playwright.js --check-result ~/.dataconnect/last-result.json +``` + +This verifies: + +- All declared scopes are present and non-empty +- Array fields have items +- exportSummary has count, label, details +- timestamp, version, platform metadata present +- Data conforms to JSON schemas (type checking, required fields) + +All checks must pass before the connector is considered done. + +--- + +## Step 6 -- Iterate + +If testing or validation fails, fix and retry. **Maximum 3 attempts** before stopping to ask for help. + +### Diagnosis guide + +| Symptom | Likely cause | Fix | +|---------|-------------|-----| +| Login failed | Wrong selectors, multi-step login not handled | Inspect form with `page.screenshot()`, try `nativeInputValueSetter` | +| API returns 401/403 | Missing CSRF token, wrong auth header | Check cookies/headers in network capture | +| Empty data | API response shape differs from expected | Log raw response with `page.setData('status', '[DEBUG] ' + JSON.stringify(raw))` | +| Schema violations | Data shape mismatch | Fix schema or add a transform step | +| Script crash | Missing await, null ref, bad evaluate string | Check for function refs in evaluate, null checks | +| Timeout (5 min) | Infinite loop or missing await | Add progress logging to find where it stalls | + +### Debugging tips + +- Use `page.screenshot()` to see what the browser shows at any point. +- Add `page.setData('status', '[DEBUG] ...')` to log intermediate values. +- Test a single API call in isolation with `page.evaluate` + `fetch` before building the full flow. +- Check that the platform's API doesn't require specific headers (CSRF, content-type, custom auth).
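The "single API call in isolation" tip can be wrapped as a small probe helper. A sketch, assuming the `_error` convention used elsewhere in this skill; `/api/v1/me` in the test is a placeholder endpoint, not a real platform path:

```javascript
// Hedged sketch: probe one endpoint before building the full flow.
// The endpoint argument is a placeholder; pass the platform's real path.
async function probeEndpoint(page, endpoint) {
  const endpointStr = JSON.stringify(endpoint);
  const raw = await page.evaluate(`
    (async () => {
      try {
        const resp = await fetch(${endpointStr}, { credentials: 'include' });
        if (!resp.ok) return { _error: resp.status };
        return await resp.json();
      } catch (e) { return { _error: e.message }; }
    })()
  `);
  // Surface the raw shape in the status channel for debugging.
  await page.setData('status', '[DEBUG] ' + JSON.stringify(raw).slice(0, 500));
  return raw;
}
```

Once the probe returns the expected shape, fold the fetch into the connector's main extraction loop.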
+ +--- + +## Step 7 -- Generate Schemas (optional) + +If your schemas are rough drafts, refine them from actual test output: + +```bash +node scripts/generate-schemas.cjs ~/.dataconnect/last-result.json [output-dir] +``` + +This infers types and structure from actual data and writes draft schema files. Review and adjust before publishing. + +--- + +## Step 8 -- Register + +Add the connector to the registry with checksums: + +```bash +node scripts/register.cjs <company>/<platform>-playwright.js +``` + +This computes `sha256` checksums for the script and metadata, then adds an entry to `registry.json`. + +--- + +## Success Criteria + +A connector is complete when all of these hold: + +- [ ] Metadata JSON has all required fields (id, version, name, company, description, connectURL, connectSelector, runtime, scopes) +- [ ] Script tries `process.env` credentials first, falls back to `page.requestInput()` +- [ ] Script handles login failure with a clear error message +- [ ] Script handles 2FA via `page.requestInput()` (if the platform uses it) +- [ ] `node scripts/validate-connector.cjs` exits 0 (structure valid) +- [ ] `node scripts/run-connector.cjs` completes without errors +- [ ] `node scripts/validate-connector.cjs --check-result` exits 0 (output valid) +- [ ] All declared scopes produce non-empty, schema-compliant data +- [ ] exportSummary has accurate count and details + +--- + +## Contributing Back + +To submit the connector upstream: + +1. All validation passes (Steps 3 and 5). +2. Run `node scripts/register.cjs <company>/<platform>-playwright.js` to add the registry entry with checksums. +3. Required files: + - `<company>/<platform>-playwright.js` -- connector script + - `<company>/<platform>-playwright.json` -- metadata + - `schemas/<platform>.<scope>.json` -- one per scope + - Updated `registry.json` +4. Open a PR against the connectors repo. Include the validation report output and a summary of what data the connector collects.
diff --git a/skills/vana-connect/RECIPES.md b/skills/vana-connect/RECIPES.md new file mode 100644 index 0000000..60a0b4d --- /dev/null +++ b/skills/vana-connect/RECIPES.md @@ -0,0 +1,201 @@ +# Recipes + +What to do with collected data. Each recipe starts from `~/.dataconnect/last-result.json`. + +--- + +## User Profile Generation + +Build a profile from connected data that changes how your agent behaves. + +### What to read + +- `*.profile` -- identity, bio, settings. High signal, read first. +- `*.memories` -- saved context, preferences. High signal. +- `*.conversations` -- large. Sample 20-50 recent entries, don't read all of them. +- `*.repositories`, `*.playlists`, `*.posts` -- interests and activity patterns. + +### Profile structure + +Target 2,000-4,000 characters. Every line should change agent behavior. Cut generic filler. + +```markdown +# User Profile + +## Identity +- Name, location, timezone, languages + +## Professional +- Role, industry, skills, current projects + +## Knowledge & Expertise +- Expert domains, learning interests, tools/technologies + +## Communication Style +- Response preferences, technical depth, tone, pet peeves + +## Interests +- Core interests, values, media preferences + +## Data Sources +- Connected: [platforms] +- Last updated: [date] +- Confidence: [notes on data quality] +``` + +### Presenting and saving + +Show the profile before saving. Ask what to change. + +> "Based on your [platform] data: +> +> [profile] +> +> Anything to change before I save it?" + +Where to save: +- **Claude Code:** User memory or CLAUDE.md +- **OpenClaw/Kimi:** `USER.md` in the agent's workspace +- **Generic:** `~/.dataconnect/user-profile.md` + +When adding a new platform to an existing profile, merge. Don't overwrite. + +--- + +## Personal Knowledge Base + +Extract facts from conversations and memories into a searchable index. + +### Approach + +1. Read `*.conversations` and `*.memories` from the result. +2. 
Extract discrete facts, preferences, and decisions. One per line. +3. Group by topic (work, health, finance, hobbies, etc.). +4. Store in a format your agent can search: embeddings DB, markdown files, or structured JSON. + +### Example output + +```markdown +# Knowledge Base (from ChatGPT, 2026-03-10) + +## Work +- Building a Rust CLI tool for log analysis (project started Feb 2026) +- Prefers async/await over callbacks +- Uses PostgreSQL for most projects, SQLite for prototypes + +## Health +- Tracks sleep with Oura Ring +- Runs 3x/week, targeting sub-20 5K + +## Finance +- Budgets with YNAB +- Investing in index funds, no individual stocks +``` + +### Tips + +- Deduplicate across platforms. The same fact may appear in ChatGPT memories and LinkedIn profile. +- Date-stamp entries so stale facts can be pruned. +- Keep facts atomic. One claim per line, easy to update or delete. + +--- + +## Data Backup & Export + +Export personal data to portable formats. + +### Flat JSON + +The result file is already JSON. Copy it: + +```bash +cp ~/.dataconnect/last-result.json ~/backups/github-export-2026-03-10.json +``` + +### CSV (for tabular data) + +For array-shaped scopes like repositories, posts, or connections: + +```javascript +// Example: extract repos to CSV +const data = require('./last-result.json'); +const repos = data['github.repositories'] || []; +const header = 'name,language,stars,url'; +const rows = repos.map(r => `${r.name},${r.language || ''},${r.stars || 0},${r.url}`); +console.log([header, ...rows].join('\n')); +``` + +### Periodic backups + +Run the connector on a schedule (cron, agent heartbeat, etc.) and timestamp each export: + +```bash +node run-connector.cjs <connector-script> +cp ~/.dataconnect/last-result.json ~/backups/<platform>-$(date +%Y-%m-%d).json +``` + +--- + +## Cross-Platform Synthesis + +Combine data from multiple platforms. + +### Approach + +1. Connect platforms one at a time. Each run produces a separate result file. +2.
Before each run, copy the previous result: `cp ~/.dataconnect/last-result.json ~/.dataconnect/<platform>-result.json` +3. After all platforms are connected, read all result files and synthesize. + +### What cross-referencing reveals + +- **ChatGPT + GitHub:** What you ask about vs. what you actually build. +- **Spotify + YouTube:** Full media consumption profile. +- **LinkedIn + GitHub:** Professional identity vs. side projects. +- **Instagram + Spotify:** Lifestyle and taste patterns. +- **Shop + YNAB:** Spending tracked from both sides. + +### Tips + +- Look for contradictions. LinkedIn says "Python expert" but GitHub repos are all TypeScript. The profile should reflect reality. +- Weight recent activity higher than old data. +- Note which platforms contributed which facts (provenance). + +--- + +## Activity Analytics + +Analyze patterns in collected data. + +### Examples + +**Conversation topics (ChatGPT):** +- Count conversations by topic/category +- Track what subjects come up most frequently +- Identify knowledge gaps (repeated questions on the same topic) + +**Listening habits (Spotify):** +- Top genres, artists, decades +- Listening time distribution +- Playlist evolution over time + +**Coding patterns (GitHub):** +- Language distribution across repos +- Commit frequency and active hours +- Most-starred vs.
most-committed projects + +**Purchase patterns (Shop):** +- Spending by category +- Purchase frequency +- Brand preferences + +### Output format + +Structure analytics as a summary the agent can reference: + +```markdown +## Activity Summary (GitHub, 2026-03-10) +- 47 repositories: 60% TypeScript, 25% Rust, 15% Python +- Most active: chain-reaction (142 commits in 30 days) +- Stars received: 23 total, 18 on chain-reaction +- Typical commit hours: 9-11am, 9pm-midnight +``` diff --git a/skills/vana-connect/SETUP.md b/skills/vana-connect/SETUP.md new file mode 100644 index 0000000..55e1d46 --- /dev/null +++ b/skills/vana-connect/SETUP.md @@ -0,0 +1,48 @@ +# Connect -- Setup + +Skip if `~/.dataconnect/playwright-runner/index.cjs` and `~/.dataconnect/run-connector.cjs` both exist. + +## Prerequisites + +- Node.js v18+ +- Git + +## Install playwright-runner + Chromium + +```bash +mkdir -p ~/.dataconnect/connectors +cd ~/.dataconnect + +git clone --depth 1 --filter=blob:none --sparse \ + https://github.com/vana-com/data-connect.git _data-connect +cd _data-connect && git sparse-checkout set playwright-runner +cp -r playwright-runner ../playwright-runner +cd .. && rm -rf _data-connect +cd ~/.dataconnect/playwright-runner && npm install +npx playwright install --with-deps chromium +``` + +## Install run-connector.cjs + +```bash +curl -sL https://raw.githubusercontent.com/vana-com/data-connectors/main/skills/vana-connect/scripts/run-connector.cjs \ + > ~/.dataconnect/run-connector.cjs +``` + +## Verify + +```bash +ls ~/.dataconnect/playwright-runner/index.cjs ~/.dataconnect/run-connector.cjs +``` + +Both files should exist. 
+ +## File Locations + +| Path | Purpose | +|------|---------| +| `~/.dataconnect/playwright-runner/` | Runner process | +| `~/.dataconnect/run-connector.cjs` | Batch-mode runner wrapper | +| `~/.dataconnect/connectors/` | Connector scripts | +| `~/.dataconnect/browser-profiles/` | Persistent sessions (cookies) | +| `~/.dataconnect/last-result.json` | Most recent result | diff --git a/skills/vana-connect/SKILL.md b/skills/vana-connect/SKILL.md new file mode 100644 index 0000000..1d63099 --- /dev/null +++ b/skills/vana-connect/SKILL.md @@ -0,0 +1,92 @@ +--- +name: vana-connect +description: > + Connect personal data from any web platform using browser automation. + Use when: (1) user wants to connect a data source like ChatGPT, Instagram, + Spotify, or any platform, (2) user says "connect my [platform]", + (3) user wants to generate or update their profile from connected data. + Also triggers on: "create a connector for [platform]". +--- + +# Connect + +Connect personal data from web platforms using local browser automation. + +## Setup + +If `~/.dataconnect/playwright-runner/index.cjs` or `~/.dataconnect/run-connector.cjs` does not exist, read and follow `SETUP.md` (co-located with this file) first. + +## Flow + +### 1. Find a connector + +```bash +curl -s https://raw.githubusercontent.com/vana-com/data-connectors/main/registry.json +``` + +Search the `connectors` array for the requested platform. If found, download the script: + +```bash +BASE_URL="https://raw.githubusercontent.com/vana-com/data-connectors/main" +mkdir -p ~/.dataconnect/connectors/{company} +curl -s "$BASE_URL/{script_path}" > ~/.dataconnect/connectors/{script_path} +``` + +**If no connector exists for the platform,** read `CREATE.md` and follow it to build one. Then continue from step 2 with the newly created connector. + +### 2. 
Read the connector + +Before running, read the connector script to understand: +- What URL it starts from (`page.goto()` or `connectURL` in metadata) +- Whether it uses `requestInput` (batch-compatible) or `showBrowser`/`promptUser` (legacy) +- What data it collects + +### 3. Run it + +```bash +node ~/.dataconnect/run-connector.cjs <connector-script> [start-url] +node ~/.dataconnect/run-connector.cjs <connector-script> [start-url] --inputs '{"username":"x","password":"y"}' +``` + +**Stdout** is line-delimited JSON: + +| type | meaning | action | +|------|---------|--------| +| `need-input` | Connector needs credentials or 2FA | Ask user, re-run with `--inputs` | +| `legacy-auth` | Legacy auth, can't run headless | See legacy section | +| `result` | Data saved to `resultPath` | Read the file | +| `error` | Failure | Report to user | + +**Exit codes:** 0 = success, 2 = needs input, 3 = legacy auth, 1 = error. + +### 4. Handle auth + +1. Check if `~/.dataconnect/browser-profiles/{script-filename}/` exists -- try without `--inputs` first (session may still be valid) +2. If exit 2 (`need-input`): ask user for the requested fields, re-run with `--inputs` +3. If exit 2 again (2FA): re-run with **all** previously-supplied inputs **plus** the new one: `--inputs '{"username":"...","password":"...","code":"..."}'`. Each run starts a fresh browser -- prior inputs are not remembered. + +**TOTP codes expire in ~30 seconds.** Re-run immediately after receiving a code. + +**Sessions persist.** Cookies saved in browser profiles last days to weeks. + +#### Legacy connectors + +Exit code 3 means the connector uses `showBrowser`/`promptUser` instead of `requestInput`: + +1. Try without `--inputs` -- if a browser profile exists, login may be skipped. +2. Check for a migrated version on the `main` branch. +3. Write a login script to establish a session, then run the stock connector. + +### 5. Use the data + +On success, collected data is at `~/.dataconnect/last-result.json`. Keys vary by connector (e.g.
`github.profile`, `chatgpt.conversations`). + +See `RECIPES.md` for use cases: user profile generation, personal knowledge base, data backup, cross-platform synthesis, activity analytics. + +## Rules + +1. **Ask before saving** -- no writes to user profile without approval +2. **Never log credentials** -- no echo, print, or output of secrets +3. **One platform at a time** +4. **Check session first** -- try without credentials if a browser profile exists +5. **Read connectors before running them** diff --git a/skills/vana-connect/reference/PAGE-API.md b/skills/vana-connect/reference/PAGE-API.md new file mode 100644 index 0000000..5502bb0 --- /dev/null +++ b/skills/vana-connect/reference/PAGE-API.md @@ -0,0 +1,177 @@ +# Page API Reference + +The `page` object is injected as a global in connector scripts. It is NOT raw Playwright — it's a custom API provided by the DataConnect Playwright runner. + +## Methods + +### Navigation & Browser Control + +#### `page.goto(url)` +Navigate to a URL. +```javascript +await page.goto('https://www.linkedin.com/feed/'); +``` + +#### `page.showBrowser(url?)` +Switch to headed mode (visible browser window). Optionally navigate to a URL. +```javascript +await page.showBrowser('https://platform.com/login'); +``` + +#### `page.goHeadless()` +Switch to headless mode (browser disappears). Call this after login is confirmed, before data extraction. +```javascript +await page.goHeadless(); +``` + +#### `page.closeBrowser()` +Close the browser entirely. Use when you're done with browser interactions but still need the process alive for HTTP work. + +#### `page.sleep(ms)` +Wait for a specified number of milliseconds. +```javascript +await page.sleep(2000); // wait 2 seconds +``` + +### JavaScript Execution + +#### `page.evaluate(jsString)` +Execute JavaScript in the browser context and return the result. 
**Takes a string, not a function.** + +To pass variables from the connector scope into the browser context, use `JSON.stringify()`: + +```javascript +// Simple evaluation +const title = await page.evaluate(`document.title`); + +// With interpolated variables +const endpoint = '/api/me'; +const endpointStr = JSON.stringify(endpoint); +const data = await page.evaluate(` + (async () => { + const resp = await fetch(${endpointStr}, { credentials: 'include' }); + return await resp.json(); + })() +`); + +// DOM inspection +const isLoggedIn = await page.evaluate(` + (() => { + return !!document.querySelector('.logged-in-indicator'); + })() +`); +``` + +### Data Communication + +#### `page.setData(key, value)` +Send data to the host app. Three key types: + +| Key | Purpose | +|-----|---------| +| `'status'` | Display a status message in the UI | +| `'error'` | Report an error (stops execution) | +| `'result'` | Send the final export result | + +```javascript +await page.setData('status', 'Fetching profile...'); +await page.setData('error', 'Failed to fetch data: ' + errorMessage); +await page.setData('result', resultObject); +``` + +#### `page.setProgress({phase, message, count})` +Structured progress reporting for the UI. + +```javascript +await page.setProgress({ + phase: { step: 1, total: 3, label: 'Fetching profile' }, + message: 'Downloaded 50 of 200 items...', + count: 50, +}); +``` + +- `phase.step` / `phase.total` — drives the step indicator ("Step 1 of 3") +- `phase.label` — short label for the current phase +- `message` — human-readable progress text +- `count` — numeric count for progress tracking + +### User Interaction + +#### `page.promptUser(message, checkFn, pollInterval)` +Show a prompt to the user and poll a check function until it returns truthy. + +```javascript +await page.promptUser( + 'Please log in to LinkedIn. 
Click "Done" when you see your feed.',
+  async () => {
+    return await checkLoginStatus();
+  },
+  2000 // poll every 2 seconds
+);
+```
+
+The prompt displays in the DataConnect UI with a "Done" button. The `checkFn` is called every `pollInterval` ms. When it returns truthy, the prompt is dismissed and execution continues.
+
+### Network Capture
+
+#### `page.captureNetwork({urlPattern, bodyPattern, key})`
+Register a network request interceptor. Captures responses matching the criteria.
+
+```javascript
+await page.captureNetwork({
+  urlPattern: 'instagram.com/graphql/query', // URL substring match
+  bodyPattern: 'User', // Response body substring match
+  key: 'user_data' // Retrieval key
+});
+```
+
+#### `page.getCapturedResponse(key)`
+Retrieve a captured network response. Returns the parsed JSON body or `null`.
+
+```javascript
+const response = await page.getCapturedResponse('user_data');
+if (response) {
+  const userData = response.data.user;
+}
+```
+
+#### `page.clearNetworkCaptures()`
+Clear all registered network captures.
+
+## Important Notes
+
+1. **`page.evaluate()` takes a STRING, not a function.** This is the most common mistake. The string is evaluated in the browser context.
+
+2. **Variable passing:** You cannot use closures. Variables from the connector scope must be serialized:
+   ```javascript
+   // WRONG — variable not available in browser context
+   const url = '/api/data';
+   await page.evaluate(`fetch(url)`);
+
+   // CORRECT — interpolate the value
+   const url = '/api/data';
+   await page.evaluate(`fetch(${JSON.stringify(url)})`);
+   ```
+
+3. **Async evaluate:** Wrap async code in an IIFE:
+   ```javascript
+   const data = await page.evaluate(`
+     (async () => {
+       const resp = await fetch('/api/data');
+       return await resp.json();
+     })()
+   `);
+   ```
+
+4. **Error handling in evaluate:** Always try-catch inside the evaluated string:
+   ```javascript
+   const result = await page.evaluate(`
+     (async () => {
+       try {
+         const resp = await fetch('/api/data', { credentials: 'include' });
+         if (!resp.ok) return { _error: resp.status };
+         return await resp.json();
+       } catch(e) { return { _error: e.message }; }
+     })()
+   `);
+   ```
diff --git a/skills/vana-connect/reference/PATTERNS.md b/skills/vana-connect/reference/PATTERNS.md
new file mode 100644
index 0000000..8896db3
--- /dev/null
+++ b/skills/vana-connect/reference/PATTERNS.md
@@ -0,0 +1,297 @@
+# Data Extraction Patterns
+
+Three primary patterns for extracting data, in order of preference.
+
+## Pattern A: REST API Fetch (Preferred)
+
+**Use when:** The platform has REST/JSON APIs accessible from a logged-in browser session.
+**Example:** LinkedIn, ChatGPT
+
+### How to discover APIs:
+1. Open the platform in Chrome
+2. Open DevTools > Network tab
+3. Filter by XHR/Fetch
+4. Browse the platform — watch for JSON responses
+5. Note the endpoint URLs, required headers, auth mechanisms
+
+### Implementation — API fetch helper:
+
+```javascript
+const fetchApi = async (endpoint) => {
+  const endpointStr = JSON.stringify(endpoint);
+  try {
+    return await page.evaluate(`
+      (async () => {
+        try {
+          // Get CSRF token from cookies (platform-specific)
+          const csrfToken = (document.cookie.match(/JSESSIONID="?([^";]+)/) || [])[1] || '';
+          const resp = await fetch(${endpointStr}, {
+            headers: { 'csrf-token': csrfToken },
+            credentials: 'include'
+          });
+          if (!resp.ok) return { _error: resp.status };
+          return await resp.json();
+        } catch(e) { return { _error: e.message }; }
+      })()
+    `);
+  } catch (e) {
+    return { _error: e.message || String(e) };
+  }
+};
+
+// Usage
+const data = await fetchApi('/api/v1/me');
+if (data._error) {
+  await page.setData('error', 'API failed: ' + data._error);
+  return;
+}
+```
+
+### Auth token extraction (ChatGPT pattern):
+
+Some platforms embed auth tokens in the page source:
+
+```javascript
+const token = await page.evaluate(`
+  (() => {
+    try {
+      // Look for auth tokens in script tags
+      const bootstrapEl = document.getElementById('client-bootstrap');
+      if (bootstrapEl) {
+        const data = JSON.parse(bootstrapEl.textContent);
+        return data.accessToken || null;
+      }
+      return null;
+    } catch { return null; }
+  })()
+`);
+
+// Use token in API calls
+const tokenStr = JSON.stringify(token);
+const data = await page.evaluate(`
+  (async () => {
+    const resp = await fetch('/backend-api/conversations', {
+      headers: { 'Authorization': 'Bearer ' + ${tokenStr} }
+    });
+    return await resp.json();
+  })()
+`);
+```
+
+### Parallel API calls:
+
+```javascript
+const [profileData, positionsData] = await Promise.all([
+  fetchApi('/api/profile'),
+  fetchApi('/api/positions'),
+]);
+```
+
+### Paginated API calls:
+
+```javascript
+const allItems = [];
+let offset = 0;
+const limit = 50;
+
+while (true) {
+  await page.setProgress({
+    phase: { step: 2, total: 3, label: 'Fetching items' },
+    message: `Fetched ${allItems.length} items so far...`,
+    count: allItems.length,
+  });
+
+  const data = await fetchApi(`/api/items?offset=${offset}&limit=${limit}`);
+  if (data._error) break;
+
+  const items = data.elements || [];
+  allItems.push(...items);
+
+  if (items.length < limit) break; // last page
+  offset += limit;
+  await page.sleep(500); // rate limiting
+}
+```
+
+---
+
+## Pattern B: Network Capture
+
+**Use when:** Platform uses GraphQL/XHR that fires during page navigation. You want to capture the raw response.
+**Example:** Instagram, Spotify
+
+### Implementation:
+
+```javascript
+// 1. Register capture BEFORE navigating
+await page.captureNetwork({
+  urlPattern: 'instagram.com/graphql/query', // URL substring to match
+  bodyPattern: 'PolarisProfilePage', // Response body substring
+  key: 'profile_data' // Key for retrieval
+});
+
+// 2. Navigate to trigger the request
+await page.goto('https://www.instagram.com/username/');
+await page.sleep(3000); // wait for requests to fire
+
+// 3. Retrieve captured response
+const response = await page.getCapturedResponse('profile_data');
+if (response) {
+  const user = response.data?.user;
+  // Process user data...
+}
+```
+
+### Multiple captures:
+
+```javascript
+// Register multiple captures
+await page.captureNetwork({
+  urlPattern: '/graphql',
+  bodyPattern: 'UserProfile',
+  key: 'user'
+});
+await page.captureNetwork({
+  urlPattern: '/graphql',
+  bodyPattern: 'UserMedia',
+  key: 'media'
+});
+
+await page.goto('https://platform.com/profile');
+await page.sleep(3000);
+
+const userResp = await page.getCapturedResponse('user');
+const mediaResp = await page.getCapturedResponse('media');
+```
+
+---
+
+## Pattern C: DOM Scraping (Fallback)
+
+**Use when:** No API available, data only exists in rendered HTML.
+
+**Example:** GitHub
+
+### Selector strategy (critical):
+
+**DO use:**
+- Tag structure: `main > section`, `h2`, `p`
+- ARIA roles: `[role="main"]`, `[aria-label*="repositories"]`
+- Data attributes: `[data-testid="profile-name"]`, `[itemprop="name"]`
+- Semantic HTML: `nav`, `article`, `header`, `aside`
+- Text content matching via JS
+
+**DO NOT use:**
+- Obfuscated class names: `.x1lliihq`, `.css-1dbjc4n`
+- Frequently-changing class names: `.feed-shared-update-v2__description`
+- Framework-generated IDs: `#react-root-0-3-1`
+
+### Implementation:
+
+```javascript
+const profileData = await page.evaluate(`
+  (() => {
+    // Use stable selectors
+    const name = (document.querySelector('span[itemprop="name"]')?.textContent || '').trim();
+    const bio = (document.querySelector('div[data-bio-text]')?.textContent || '').trim();
+
+    // Use structural selectors as fallback
+    const stats = document.querySelectorAll('nav a span');
+    const followers = stats.length > 0 ? stats[0]?.textContent?.trim() : '';
+
+    return { name, bio, followers };
+  })()
+`);
+```
+
+### Pagination via DOM:
+
+```javascript
+const allItems = [];
+let pageNum = 1;
+const maxPages = 20;
+
+while (pageNum <= maxPages) {
+  await page.goto(`https://platform.com/items?page=${pageNum}`);
+  await page.sleep(1500);
+
+  const items = await page.evaluate(`
+    (() => {
+      const rows = document.querySelectorAll('[data-testid="item-row"]');
+      return Array.from(rows).map(row => ({
+        title: (row.querySelector('h3')?.textContent || '').trim(),
+        url: row.querySelector('a')?.href || '',
+      }));
+    })()
+  `);
+
+  if (!items || items.length === 0) break;
+  allItems.push(...items);
+
+  // Check for next page
+  const hasNext = await page.evaluate(`!!document.querySelector('a[rel="next"]')`);
+  if (!hasNext) break;
+
+  pageNum++;
+  await page.sleep(500);
+}
+```
+
+---
+
+## Common Patterns
+
+### Login detection:
+
+```javascript
+const checkLoginStatus = async () => {
+  try {
+    return await page.evaluate(`
+      (() => {
+        // Check for login form (NOT logged in)
+        const hasLoginForm = !!document.querySelector('input[type="password"]') ||
+                             !!document.querySelector('form[action*="login"]');
+        if (hasLoginForm) return false;
+
+        // Check for challenge/2FA pages
+        const url = window.location.href;
+        if (url.includes('/challenge') || url.includes('/checkpoint')) return false;
+
+        // Check for logged-in indicators
+        const isLoggedIn = !!document.querySelector('LOGGED_IN_SELECTOR');
+        return isLoggedIn;
+      })()
+    `);
+  } catch (e) {
+    return false;
+  }
+};
+```
+
+### Dismissing popups/modals:
+
+```javascript
+// Dismiss cookie banners, upgrade prompts, etc.
+await page.evaluate(`
+  (() => {
+    const dismissSelectors = [
+      'button[aria-label="Close"]',
+      'button[aria-label="Dismiss"]',
+      '[data-testid="close-button"]',
+    ];
+    for (const sel of dismissSelectors) {
+      const btn = document.querySelector(sel);
+      if (btn) { btn.click(); break; }
+    }
+  })()
+`);
+await page.sleep(500);
+```
+
+### Safe text extraction:
+
+```javascript
+// Always guard against null/undefined
+const getText = (selector) => `(document.querySelector('${selector}')?.textContent || '').trim()`;
+
+const name = await page.evaluate(getText('h1.profile-name'));
+```
diff --git a/skills/vana-connect/scripts/generate-schemas.cjs b/skills/vana-connect/scripts/generate-schemas.cjs
new file mode 100644
index 0000000..ecb7ca4
--- /dev/null
+++ b/skills/vana-connect/scripts/generate-schemas.cjs
@@ -0,0 +1,78 @@
+#!/usr/bin/env node
+/**
+ * generate-schemas.cjs — Draft JSON schemas from connector output.
+ *
+ * Usage: node generate-schemas.cjs <result-file> <platform> [output-dir]
+ *
+ * Reads a connector result file, infers schemas from each scoped key,
+ * writes draft schema files. The agent should review and adjust.
+ *
+ * Defaults: output-dir = ./schemas
+ */
+
+const fs = require('fs');
+const path = require('path');
+
+const resultPath = process.argv[2];
+const platform = process.argv[3];
+const outputDir = process.argv[4] || './schemas';
+
+if (!resultPath || !platform) {
+  console.error('Usage: node generate-schemas.cjs <result-file> <platform> [output-dir]');
+  process.exit(1);
+}
+
+const result = JSON.parse(fs.readFileSync(resultPath, 'utf8'));
+const metadataKeys = new Set(['exportSummary', 'timestamp', 'version', 'platform']);
+
+function inferType(value) {
+  if (value === null || value === undefined) return { type: 'string' };
+  if (Array.isArray(value)) {
+    if (value.length === 0) return { type: 'array', items: {} };
+    return { type: 'array', items: inferType(value[0]) };
+  }
+  if (typeof value === 'object') {
+    const props = {};
+    const required = [];
+    for (const [k, v] of Object.entries(value)) {
+      props[k] = inferType(v);
+      required.push(k);
+    }
+    return { type: 'object', properties: props, required };
+  }
+  if (typeof value === 'number') {
+    return Number.isInteger(value) ? { type: 'integer' } : { type: 'number' };
+  }
+  if (typeof value === 'boolean') return { type: 'boolean' };
+  return { type: 'string' };
+}
+
+fs.mkdirSync(outputDir, { recursive: true });
+
+let count = 0;
+for (const [key, value] of Object.entries(result)) {
+  if (metadataKeys.has(key)) continue;
+  if (!key.includes('.')) continue;
+
+  const scope = key;
+  const schema = {
+    name: `${platform} ${scope.split('.')[1]}`,
+    version: '1.0.0',
+    scope,
+    dialect: 'json',
+    description: `Draft schema for ${scope} — review and adjust.`,
+    schema: inferType(value),
+  };
+
+  const outPath = path.join(outputDir, `${scope}.json`);
+  fs.writeFileSync(outPath, JSON.stringify(schema, null, 2) + '\n');
+  console.log(` ${outPath}`);
+  count++;
+}
+
+if (count === 0) {
+  console.error('No scoped keys found in result. Expected keys like "platform.scope".');
+  process.exit(1);
+}
+
+console.log(`\n${count} draft schema(s) generated. Review before publishing.`);
diff --git a/skills/vana-connect/scripts/register.cjs b/skills/vana-connect/scripts/register.cjs
new file mode 100644
index 0000000..42f091b
--- /dev/null
+++ b/skills/vana-connect/scripts/register.cjs
@@ -0,0 +1,85 @@
+#!/usr/bin/env node
+/**
+ * register.cjs — Add a connector to registry.json with checksums.
+ *
+ * Usage: node register.cjs <connector-script> [registry-path]
+ *
+ * Reads the connector script and metadata JSON (same name, .json),
+ * computes SHA-256 checksums, and adds/updates the entry in registry.json.
+ *
+ * Defaults: registry-path = ./registry.json (repo root)
+ */
+
+const fs = require('fs');
+const path = require('path');
+const crypto = require('crypto');
+
+const connectorPath = process.argv[2];
+const registryPath = process.argv[3] || './registry.json';
+
+if (!connectorPath) {
+  console.error('Usage: node register.cjs <connector-script> [registry-path]');
+  process.exit(1);
+}
+
+const scriptPath = path.resolve(connectorPath);
+const metadataPath = scriptPath.replace(/\.js$/, '.json');
+
+if (!fs.existsSync(scriptPath)) {
+  console.error(`Script not found: ${scriptPath}`);
+  process.exit(1);
+}
+if (!fs.existsSync(metadataPath)) {
+  console.error(`Metadata not found: ${metadataPath}`);
+  process.exit(1);
+}
+
+const metadata = JSON.parse(fs.readFileSync(metadataPath, 'utf8'));
+const scriptHash = crypto.createHash('sha256').update(fs.readFileSync(scriptPath)).digest('hex');
+const metadataHash = crypto.createHash('sha256').update(fs.readFileSync(metadataPath)).digest('hex');
+
+// Compute paths relative to registry location
+const registryDir = path.dirname(path.resolve(registryPath));
+const relScript = path.relative(registryDir, scriptPath);
+const relMetadata = path.relative(registryDir, metadataPath);
+
+const entry = {
+  id: metadata.id || path.basename(scriptPath, '.js'),
+  company: metadata.company || path.basename(path.dirname(scriptPath)),
+  version: metadata.version || '1.0.0',
+  name: metadata.name || '',
+  description: metadata.description || '',
+  files: {
+    script: relScript,
+    metadata: relMetadata,
+  },
+  checksums: {
+    script: `sha256:${scriptHash}`,
+    metadata: `sha256:${metadataHash}`,
+  },
+};
+
+// Load or create registry
+let registry;
+if (fs.existsSync(registryPath)) {
+  registry = JSON.parse(fs.readFileSync(registryPath, 'utf8'));
+} else {
+  registry = { version: '2.0.0', lastUpdated: '', baseUrl: '', connectors: [] };
+}
+
+// Update or add entry
+const idx = registry.connectors.findIndex(c => c.id === entry.id);
+if (idx >= 0) {
+  registry.connectors[idx] = entry;
+  console.log(`Updated: ${entry.id}`);
+} else {
+  registry.connectors.push(entry);
+  console.log(`Added: ${entry.id}`);
+}
+
+registry.lastUpdated = new Date().toISOString().split('T')[0] + 'T00:00:00Z';
+
+fs.writeFileSync(registryPath, JSON.stringify(registry, null, 2) + '\n');
+console.log(`Registry: ${registryPath}`);
+console.log(` script: sha256:${scriptHash}`);
+console.log(` metadata: sha256:${metadataHash}`);
diff --git a/skills/vana-connect/scripts/run-connector.cjs b/skills/vana-connect/scripts/run-connector.cjs
new file mode 100644
index 0000000..53bbd6f
--- /dev/null
+++ b/skills/vana-connect/scripts/run-connector.cjs
@@ -0,0 +1,295 @@
+#!/usr/bin/env node
+/**
+ * run-connector.cjs — Run a data connector headlessly.
+ *
+ * Usage: node run-connector.cjs <connector-path> [start-url] [options]
+ *
+ * Options:
+ *   --inputs '{"key":"val"}'  Pre-supply credentials/2FA
+ *   --output <path>           Result file path (default: ~/.dataconnect/last-result.json)
+ *   --pretty                  Human-readable colored output instead of JSON
+ *   --runner-dir <dir>        Path to playwright-runner (auto-detected if not set)
+ *
+ * Exit codes: 0 success, 1 error, 2 needs input, 3 legacy auth unsupported.
+ *
+ * Default output is line-delimited JSON on stdout:
+ *   need-input, legacy-auth, result, log, error
+ *
+ * With --pretty, output is colored human-readable text.
+ */
+
+const { spawn } = require('child_process');
+const fs = require('fs');
+const path = require('path');
+const os = require('os');
+
+const homedir = os.homedir();
+
+// ─── Arg parsing ─────────────────────────────────────────────
+
+const rawArgs = process.argv.slice(2);
+const positional = [];
+let preSuppliedInputs = {};
+let pretty = false;
+let outputPath = path.join(homedir, '.dataconnect', 'last-result.json');
+let runnerDir = null;
+
+for (let i = 0; i < rawArgs.length; i++) {
+  if (rawArgs[i] === '--inputs' && rawArgs[i + 1]) {
+    try { preSuppliedInputs = JSON.parse(rawArgs[++i]); }
+    catch (e) { console.error('Invalid --inputs JSON:', e.message); process.exit(1); }
+  } else if (rawArgs[i] === '--output' && rawArgs[i + 1]) {
+    outputPath = rawArgs[++i];
+  } else if (rawArgs[i] === '--runner-dir' && rawArgs[i + 1]) {
+    runnerDir = rawArgs[++i];
+  } else if (rawArgs[i] === '--pretty') {
+    pretty = true;
+  } else if (!rawArgs[i].startsWith('--')) {
+    positional.push(rawArgs[i]);
+  }
+}
+
+const connectorPath = positional[0];
+const startUrl = positional[1] || 'about:blank';
+
+if (!connectorPath) {
+  console.error('Usage: node run-connector.cjs <connector-path> [start-url] [--inputs \'{"key":"val"}\'] [--pretty]');
+  process.exit(1);
+}
+
+// ─── Pretty output helpers ───────────────────────────────────
+
+const c = {
+  reset: '\x1b[0m', bold: '\x1b[1m', dim: '\x1b[2m',
+  red: '\x1b[31m', green: '\x1b[32m', yellow: '\x1b[33m',
+  blue: '\x1b[34m', magenta: '\x1b[35m', cyan: '\x1b[36m', gray: '\x1b[90m',
+};
+
+function prettyPrint(color, prefix, msg) {
+  const ts = new Date().toLocaleTimeString('en-US', { hour12: false });
+  console.log(`${c.gray}${ts}${c.reset} ${color}${prefix}${c.reset} ${msg}`);
+}
+
+function emit(obj) {
+  if (pretty) {
+    switch (obj.type) {
+      case 'need-input':
+        prettyPrint(c.magenta, '[input]', obj.message);
+        if (obj.schema?.properties) {
+          prettyPrint(c.dim, ' ', `Fields: ${Object.keys(obj.schema.properties).join(', ')}`);
+        }
+        break;
+      case 'legacy-auth':
+        prettyPrint(c.yellow, '[auth] ', obj.message);
+        break;
+      case 'result':
+        const size = fs.existsSync(obj.resultPath)
+          ? (fs.statSync(obj.resultPath).size / 1024).toFixed(1) + ' KB'
+          : '';
+        prettyPrint(c.green, '[result]', `Saved to ${obj.resultPath} ${size ? `(${size})` : ''}`);
+        break;
+      case 'log':
+        prettyPrint(c.gray, '[log] ', obj.message || '');
+        break;
+      case 'error':
+        prettyPrint(c.red, '[error]', obj.message || '');
+        break;
+    }
+  } else {
+    console.log(JSON.stringify(obj));
+  }
+}
+
+// ─── Resolve runner ──────────────────────────────────────────
+
+function resolveRunnerDir() {
+  if (runnerDir) {
+    if (fs.existsSync(path.join(runnerDir, 'index.cjs'))) return runnerDir;
+    console.error(`Runner not found at: ${runnerDir}`);
+    process.exit(1);
+  }
+
+  const candidates = [
+    path.join(homedir, '.dataconnect', 'playwright-runner'),
+    process.env.PLAYWRIGHT_RUNNER_DIR,
+    path.resolve(__dirname, '..', '..', '..', 'data-dt-app', 'playwright-runner'),
+  ].filter(Boolean);
+
+  for (const dir of candidates) {
+    if (fs.existsSync(path.join(dir, 'index.cjs'))) return dir;
+  }
+
+  console.error('Could not find playwright-runner. Set --runner-dir or install via SETUP.md.');
+  process.exit(1);
+}
+
+// ─── Main ────────────────────────────────────────────────────
+
+const resolvedRunnerDir = resolveRunnerDir();
+const runId = 'run-' + Date.now();
+
+if (pretty) {
+  console.log(`${c.bold}run-connector${c.reset}`);
+  console.log(` ${c.cyan}Connector:${c.reset} ${connectorPath}`);
+  console.log(` ${c.cyan}URL:${c.reset} ${startUrl}`);
+  console.log(` ${c.cyan}Output:${c.reset} ${outputPath}`);
+  console.log(` ${c.cyan}Runner:${c.reset} ${resolvedRunnerDir}`);
+  console.log('');
+}
+
+const runner = spawn(process.execPath, ['index.cjs'], {
+  cwd: resolvedRunnerDir,
+  stdio: ['pipe', 'pipe', 'pipe'],
+});
+
+runner.stderr.on('data', (chunk) => {
+  if (pretty) {
+    for (const line of chunk.toString().split('\n').filter(l => l.trim())) {
+      prettyPrint(c.dim, '[runner]', line.replace('[PlaywrightRunner] ', ''));
+    }
+  } else {
+    process.stderr.write(chunk);
+  }
+});
+
+let buffer = '';
+const consumedFields = new Set();
+let waitingForUserDetected = false;
+
+runner.stdout.on('data', (chunk) => {
+  buffer += chunk.toString();
+  const lines = buffer.split('\n');
+  buffer = lines.pop();
+  for (const line of lines) {
+    if (!line.trim()) continue;
+    try { handleMessage(JSON.parse(line)); }
+    catch (e) { process.stderr.write('[non-json] ' + line + '\n'); }
+  }
+});
+
+function handleMessage(msg) {
+  switch (msg.type) {
+    case 'ready':
+      runner.stdin.write(JSON.stringify({
+        type: 'run', runId,
+        connectorPath: path.resolve(connectorPath),
+        url: startUrl, headless: true, allowHeaded: false,
+      }) + '\n');
+      if (pretty) prettyPrint(c.green, '[ready]', 'Connected, starting connector...');
+      break;
+
+    case 'request-input': {
+      const { requestId, payload } = msg;
+      const schema = payload?.schema;
+      const fields = schema?.properties ? Object.keys(schema.properties) : [];
+      const available = fields.filter(f => f in preSuppliedInputs && !consumedFields.has(f));
+
+      if (available.length > 0) {
+        const data = {};
+        for (const f of fields) {
+          if (f in preSuppliedInputs && !consumedFields.has(f)) {
+            data[f] = preSuppliedInputs[f];
+            consumedFields.add(f);
+          }
+        }
+        runner.stdin.write(JSON.stringify({
+          type: 'input-response', runId, requestId, data,
+        }) + '\n');
+      } else {
+        const previouslyConsumed = [...consumedFields];
+        emit({
+          type: 'need-input',
+          message: payload?.message || 'Input required',
+          schema: schema || {},
+          ...(previouslyConsumed.length > 0 && { previousInputs: previouslyConsumed }),
+        });
+        runner.stdin.write(JSON.stringify({
+          type: 'input-response', runId, requestId,
+          error: 'No pre-supplied input available',
+        }) + '\n');
+        quitAndExit(2);
+      }
+      break;
+    }
+
+    case 'log':
+      emit({ type: 'log', message: msg.message });
+      break;
+
+    case 'status':
+      if (msg.status === 'WAITING_FOR_USER' && !waitingForUserDetected) {
+        waitingForUserDetected = true;
+        emit({
+          type: 'legacy-auth',
+          message: 'This connector uses legacy authentication (showBrowser/promptUser) ' +
+            'which is not supported in batch mode. Either use a migrated connector ' +
+            'that supports requestInput, or establish a session manually first.',
+        });
+        quitAndExit(3);
+      } else if (msg.status === 'COMPLETE') {
+        quitGracefully();
+      } else if (msg.status === 'ERROR') {
+        quitGracefully();
+      } else if (pretty && typeof msg.status === 'object') {
+        const s = msg.status;
+        let detail = s.message || '';
+        if (s.phase) detail += ` ${c.dim}(${s.phase.step}/${s.phase.total} ${s.phase.label || ''})${c.reset}`;
+        if (s.count !== undefined) detail += ` ${c.cyan}[${s.count} items]${c.reset}`;
+        prettyPrint(c.blue, '[status]', detail);
+      }
+      break;
+
+    case 'data':
+      if (msg.key === 'result') {
+        fs.writeFileSync(outputPath, JSON.stringify(msg.value, null, 2));
+        emit({ type: 'result', resultPath: outputPath });
+      } else if (msg.key === 'error') {
+        emit({ type: 'error', message: String(msg.value) });
+      } else if (pretty) {
+        const val = typeof msg.value === 'string' ? msg.value : JSON.stringify(msg.value);
+        if (val.startsWith('[DEBUG]')) {
+          prettyPrint(c.yellow, '[debug]', val.substring(8));
+        } else {
+          prettyPrint(c.cyan, '[data] ', `${msg.key} = ${val}`);
+        }
+      }
+      break;
+
+    case 'result':
+      fs.writeFileSync(outputPath, JSON.stringify(msg.data, null, 2));
+      emit({ type: 'result', resultPath: outputPath });
+      break;
+
+    case 'error':
+      emit({ type: 'error', message: msg.message });
+      break;
+  }
+}
+
+function quitGracefully() {
+  runner.stdin.write(JSON.stringify({ type: 'quit' }) + '\n');
+}
+
+function quitAndExit(code) {
+  setTimeout(() => {
+    runner.stdin.write(JSON.stringify({ type: 'quit' }) + '\n');
+    setTimeout(() => process.exit(code), 2000);
+  }, 500);
+}
+
+runner.on('close', (code) => {
+  if (pretty) {
+    const elapsed = ((Date.now() - startTime) / 1000).toFixed(1);
+    if (code === 0) console.log(`\n${c.bold}${c.green}Done${c.reset} in ${elapsed}s`);
+    else console.log(`\n${c.bold}${c.red}Failed${c.reset} (exit code ${code}) in ${elapsed}s`);
+  }
+  process.exit(code || 0);
+});
+
+const startTime = Date.now();
+
+// Timeout — kill after 5 minutes
+setTimeout(() => {
+  emit({ type: 'error', message: 'Timeout after 5 minutes' });
+  quitAndExit(1);
+}, 300000);
diff --git a/skills/vana-connect/scripts/scaffold.cjs b/skills/vana-connect/scripts/scaffold.cjs
new file mode 100644
index 0000000..386dead
--- /dev/null
+++ b/skills/vana-connect/scripts/scaffold.cjs
@@ -0,0 +1,72 @@
+#!/usr/bin/env node
+/**
+ * scaffold.cjs — Hydrate templates to create connector boilerplate.
+ *
+ * Usage: node scaffold.cjs <platform> [company] [output-dir]
+ *
+ * Defaults: company = platform, output-dir = ~/.dataconnect/connectors
+ */
+
+const fs = require('fs');
+const path = require('path');
+
+const platform = process.argv[2];
+const company = process.argv[3] || platform;
+const outputDir = process.argv[4] || path.join(require('os').homedir(), '.dataconnect', 'connectors');
+
+if (!platform) {
+  console.error('Usage: node scaffold.cjs <platform> [company] [output-dir]');
+  process.exit(1);
+}
+
+const templateDir = path.join(__dirname, '..', 'templates');
+const dir = path.join(outputDir, company);
+
+if (fs.existsSync(path.join(dir, `${platform}-playwright.js`))) {
+  console.error(`${dir}/${platform}-playwright.js already exists.`);
+  process.exit(1);
+}
+
+fs.mkdirSync(dir, { recursive: true });
+
+const replacements = {
+  '{{platform}}': platform,
+  '{{PLATFORM_UPPER}}': platform.toUpperCase(),
+  '{{PLATFORM_NAME}}': platform.charAt(0).toUpperCase() + platform.slice(1),
+  '{{PLATFORM_URL}}': `https://${platform}.com`,
+  '{{LOGIN_URL}}': `https://${platform}.com/login`,
+  '{{LOGIN_FORM_SELECTOR}}': 'input[type="password"]',
+  '{{LOGGED_IN_SELECTOR}}': 'TODO_LOGGED_IN_SELECTOR',
+  '{{Company}}': company.charAt(0).toUpperCase() + company.slice(1),
+  '{{scope1}}': 'profile',
+  '{{scope1 description}}': 'User profile data',
+  '{{scope2}}': 'data',
+  '{{scope2 description}}': 'Platform data',
+  '{{API fetch / Network capture / DOM scraping}}': 'TODO',
+  '{{data description}}': `${platform} data`,
+  '{{CSS selector only visible when logged in}}': '',
+  '{{field name to vectorize for semantic search}}': '',
+  '{{What this scope contains}}': '',
+  '{{scope1 label}}': 'Profile',
+};
+
+function hydrate(content) {
+  for (const [key, val] of Object.entries(replacements)) {
+    content = content.split(key).join(val);
+  }
+  return content;
+}
+
+const files = [
+  { template: 'connector-script.js', output: `${platform}-playwright.js` },
+  { template: 'connector-metadata.json', output: `${platform}-playwright.json` },
+];
+
+for (const { template, output } of files) {
+  const src = fs.readFileSync(path.join(templateDir, template), 'utf8');
+  const dest = path.join(dir, output);
+  fs.writeFileSync(dest, hydrate(src));
+  console.log(` ${dest}`);
+}
+
+console.log(`\nFill in: connectSelector, scopes, login selectors, data fetching logic.`);
diff --git a/skills/vana-connect/scripts/validate-connector.cjs b/skills/vana-connect/scripts/validate-connector.cjs
new file mode 100644
index 0000000..6edb6ed
--- /dev/null
+++ b/skills/vana-connect/scripts/validate-connector.cjs
@@ -0,0 +1,479 @@
+#!/usr/bin/env node
+
+/**
+ * Connector Validator
+ *
+ * Validates connector files (structure) and optionally connector output (data quality).
+ * Returns machine-readable JSON for use by automated agents in the create-test-validate loop.
+ *
+ * Usage:
+ *   node scripts/validate-connector.cjs <connector-dir>
+ *   node scripts/validate-connector.cjs <connector-dir> --check-result ./connector-result.json
+ *
+ * Exit codes:
+ *   0 = all checks passed
+ *   1 = one or more checks failed
+ */
+
+const fs = require('fs');
+const path = require('path');
+
+// ─── Report Builder ─────────────────────────────────────────
+
+function createReport() {
+  const report = {
+    valid: true,
+    checks: [],
+    summary: { passed: 0, failed: 0, warnings: 0 },
+  };
+
+  function check(name, passed, message, severity = 'error') {
+    report.checks.push({ name, passed, message, severity });
+    if (passed) {
+      report.summary.passed++;
+    } else if (severity === 'error') {
+      report.summary.failed++;
+      report.valid = false;
+    } else {
+      report.summary.warnings++;
+    }
+  }
+
+  return { report, check };
+}
+
+// ─── Simple JSON Schema Validator ───────────────────────────
+
+function validateAgainstSchema(data, schema, prefix = '') {
+  const errors = [];
+
+  if (data === null || data === undefined) {
+    errors.push(`${prefix || 'root'}: value is null/undefined`);
+    return errors;
+  }
+
+  if (schema.type === 'object') {
+    if (typeof data !== 'object' || Array.isArray(data)) {
+      errors.push(`${prefix || 'root'}: expected object, got ${Array.isArray(data) ? 'array' : typeof data}`);
+      return errors;
+    }
+
+    // Check required fields
+    if (schema.required) {
+      for (const field of schema.required) {
+        if (data[field] === undefined || data[field] === null) {
+          errors.push(`${prefix ? prefix + '.' : ''}${field}: missing required field`);
+        }
+      }
+    }
+
+    // Check additionalProperties
+    if (schema.additionalProperties === false && schema.properties) {
+      const allowed = new Set(Object.keys(schema.properties));
+      for (const key of Object.keys(data)) {
+        if (!allowed.has(key)) {
+          errors.push(`${prefix ? prefix + '.' : ''}${key}: unexpected field`);
+        }
+      }
+    }
+
+    // Recursively check properties
+    if (schema.properties) {
+      for (const [key, propSchema] of Object.entries(schema.properties)) {
+        if (data[key] !== undefined && data[key] !== null) {
+          const propPrefix = prefix ? `${prefix}.${key}` : key;
+          errors.push(...validateAgainstSchema(data[key], propSchema, propPrefix));
+        }
+      }
+    }
+  } else if (schema.type === 'array') {
+    if (!Array.isArray(data)) {
+      errors.push(`${prefix || 'root'}: expected array, got ${typeof data}`);
+      return errors;
+    }
+    // Validate first few items against items schema
+    if (schema.items && data.length > 0) {
+      const sampleSize = Math.min(data.length, 3);
+      for (let i = 0; i < sampleSize; i++) {
+        errors.push(...validateAgainstSchema(data[i], schema.items, `${prefix}[${i}]`));
+      }
+    }
+  } else if (schema.type === 'string') {
+    if (typeof data !== 'string') {
+      errors.push(`${prefix}: expected string, got ${typeof data}`);
+    }
+  } else if (schema.type === 'number' || schema.type === 'integer') {
+    if (typeof data !== 'number') {
+      errors.push(`${prefix}: expected number, got ${typeof data}`);
+    }
+  } else if (schema.type === 'boolean') {
+    if (typeof data !== 'boolean') {
+      errors.push(`${prefix}: expected boolean, got ${typeof data}`);
+    }
+  }
+
+  return errors;
+}
+
+// ─── Metadata Validation ────────────────────────────────────
+
+function validateMetadata(metadataPath, check) {
+  check('metadata_exists', fs.existsSync(metadataPath),
+    fs.existsSync(metadataPath) ? `Found: ${path.basename(metadataPath)}` : `Missing: ${metadataPath}`);
+
+  if (!fs.existsSync(metadataPath)) return null;
+
+  let metadata;
+  try {
+    metadata = JSON.parse(fs.readFileSync(metadataPath, 'utf-8'));
+    check('metadata_valid_json', true, 'Valid JSON');
+  } catch (e) {
+    check('metadata_valid_json', false, 'Invalid JSON: ' + e.message);
+    return null;
+  }
+
+  const required = ['id', 'version', 'name', 'company', 'description', 'connectURL', 'connectSelector', 'runtime'];
+  for (const field of required) {
+    const val = metadata[field];
+    check(`metadata_${field}`, !!val,
+      val ? `${field} = "${String(val).substring(0, 60)}"` : `Missing required field: ${field}`);
+  }
+
+  check('metadata_runtime_playwright', metadata.runtime === 'playwright',
+    metadata.runtime === 'playwright' ? 'runtime is "playwright"' : `Unexpected runtime: "${metadata.runtime}"`);
+
+  try {
+    new URL(metadata.connectURL);
+    check('metadata_url_valid', true, `connectURL is valid: ${metadata.connectURL}`);
+  } catch {
+    check('metadata_url_valid', false, `connectURL is not a valid URL: "${metadata.connectURL}"`);
+  }
+
+  check('metadata_connect_selector', metadata.connectSelector && metadata.connectSelector.length > 3,
+    metadata.connectSelector
+      ? `connectSelector = "${metadata.connectSelector.substring(0, 80)}"`
+      : 'connectSelector is empty or too short');
+
+  return metadata;
+}
+
+// ─── Script Validation ──────────────────────────────────────
+
+function validateScript(scriptPath, check) {
+  check('script_exists', fs.existsSync(scriptPath),
+    fs.existsSync(scriptPath) ? `Found: ${path.basename(scriptPath)}` : `Missing: ${scriptPath}`);
+
+  if (!fs.existsSync(scriptPath)) return '';
+
+  const script = fs.readFileSync(scriptPath, 'utf-8');
+
+  // IIFE pattern
+  check('script_iife',
+    /\(async\s*\(\)\s*=>\s*\{/.test(script),
+    'Uses async IIFE wrapper: (async () => { ... })()');
+
+  // Login detection
+  check('script_login_detection',
+    /checkLogin|isLoggedIn|loginStatus|login.*detect/i.test(script),
+    'Has login detection logic');
+
+  // Automated login: reads credentials from process.env
+  const hasEnvCredentials = /process\.env\.USER_LOGIN|process\.env\.USER_PASSWORD/i.test(script);
+  check('script_env_credentials',
+    hasEnvCredentials,
+    hasEnvCredentials
+      ? 'Reads credentials from process.env (automated login)'
+      : 'Does not read credentials from process.env — automated login requires USER_LOGIN_<PLATFORM> and USER_PASSWORD_<PLATFORM>',
+    'warning');
+
+  // Automated login: fills form programmatically
+  const hasFormFill = /\.value\s*=|nativeInputValueSetter/i.test(script);
+  check('script_automated_form_fill',
+    hasFormFill,
+    hasFormFill
+      ? 'Has automated form fill logic (sets input values)'
+      : 'No automated form fill detected — connector may require manual login',
+    hasEnvCredentials ? 'error' : 'warning');
+
+  // Manual login (legacy pattern — optional for auto-login connectors)
+  const hasShowBrowser = /page\.showBrowser/.test(script);
+  const hasPromptUser = /page\.promptUser/.test(script);
+  check('script_show_browser', hasShowBrowser,
+    hasShowBrowser
+      ? 'Has page.showBrowser() (manual login fallback)'
+      : 'No page.showBrowser() — OK if using automated login',
+    hasEnvCredentials ? 'warning' : 'error');
+  check('script_prompt_user', hasPromptUser,
+    hasPromptUser
+      ? 'Has page.promptUser() (manual login fallback)'
+      : 'No page.promptUser() — OK if using automated login',
+    hasEnvCredentials ? 'warning' : 'error');
+
+  // Phase 2: Go headless
+  check('script_go_headless',
+    /page\.goHeadless/.test(script),
+    'Calls page.goHeadless() before data collection',
+    'warning');
+
+  // Result building
+  check('script_set_result',
+    /page\.setData\s*\(\s*['"]result['"]/.test(script),
+    'Calls page.setData("result", ...) to return data');
+
+  // Error handling
+  check('script_error_handling',
+    /page\.setData\s*\(\s*['"]error['"]/.test(script),
+    'Has error handling via page.setData("error", ...)', 'warning');
+
+  // Progress reporting
+  check('script_progress',
+    /page\.setProgress/.test(script),
+    'Reports progress via page.setProgress()', 'warning');
+
+  // exportSummary
+  check('script_export_summary',
+    /exportSummary/.test(script),
+    'Includes exportSummary in result');
+
+  // Common mistake: function reference in page.evaluate instead of string
+  // Look for page.evaluate( followed by ( or async without a backtick or quote
+  const evalLines = script.split('\n').filter(l => l.includes('page.evaluate'));
+  const badEvals = evalLines.filter(l => {
+    // Match page.evaluate( NOT followed by ` or ' or "
+    return /page\.evaluate\s*\(\s*(?:async\s+)?\(/.test(l) &&
+      !/page\.evaluate\s*\(\s*['"`]/.test(l);
+  });
+  check('script_evaluate_uses_strings',
+    badEvals.length === 0,
+    badEvals.length === 0
+      ? 'page.evaluate() uses string arguments (correct)'
+      : `CRITICAL: ${badEvals.length} page.evaluate() call(s) may use function references instead of strings. page.evaluate() takes a JS string.`);
+
+  // Obfuscated CSS selectors (e.g., .x1lliihq, .css-1dbjc4n)
+  const obfuscatedMatches = script.match(/['"]\.(?:[a-z]{1,3}[0-9][a-z0-9_-]{4,}|css-[a-z0-9]+)['"]/g) || [];
+  check('script_no_obfuscated_selectors',
+    obfuscatedMatches.length === 0,
+    obfuscatedMatches.length === 0
+      ? 'No obfuscated CSS selectors detected'
+      : `Found ${obfuscatedMatches.length} potentially obfuscated selector(s): ${obfuscatedMatches.slice(0, 3).join(', ')}`,
+    'warning');
+
+  // Extract scoped result keys from the script
+  const scopeKeyPattern = /['"]([\w-]+\.[\w-]+)['"]\s*:/g;
+  const metaKeys = new Set(['exports.default', 'module.exports', 'console.log', 'window.location',
+    'process.env', 'e.message', 'resp.status', 'data.user', 'error.message']);
+  const scopeKeys = new Set();
+  let m;
+  while ((m = scopeKeyPattern.exec(script)) !== null) {
+    if (!metaKeys.has(m[1]) && !m[1].startsWith('com.')) {
+      scopeKeys.add(m[1]);
+    }
+  }
+
+  check('script_scoped_keys',
+    scopeKeys.size > 0,
+    scopeKeys.size > 0
+      ? `Found scoped keys: ${[...scopeKeys].join(', ')}`
+      : 'No scoped result keys (platform.scope) found — new connectors should use scoped keys like "platform.scope"',
+    scopeKeys.size > 0 ? 'error' : 'warning');
+
+  return script;
+}
+
+// ─── Schema Validation ──────────────────────────────────────
+
+function validateSchemas(metadata, connectorDir, check) {
+  const schemasDir = path.resolve(connectorDir, '..', 'schemas');
+
+  if (!metadata?.scopes || !Array.isArray(metadata.scopes)) {
+    check('schemas_declared', false,
+      'No scopes array in metadata — cannot validate schemas. Add scopes to metadata or create schemas manually.',
+      'warning');
+    return;
+  }
+
+  for (const scope of metadata.scopes) {
+    const scopeName = scope.scope || scope.name;
+    if (!scopeName) continue;
+
+    const schemaPath = path.join(schemasDir, `${scopeName}.json`);
+    check(`schema_exists_${scopeName}`,
+      fs.existsSync(schemaPath),
+      fs.existsSync(schemaPath)
+        ?
`Schema found: schemas/${scopeName}.json` + : `Schema missing: schemas/${scopeName}.json`); + + if (fs.existsSync(schemaPath)) { + try { + const schema = JSON.parse(fs.readFileSync(schemaPath, 'utf-8')); + const hasStructure = schema.name && schema.scope && schema.schema; + check(`schema_structure_${scopeName}`, + hasStructure, + hasStructure + ? `Schema has required fields (name, scope, schema)` + : 'Schema missing required top-level fields (name, scope, schema)'); + } catch (e) { + check(`schema_json_${scopeName}`, false, `Schema is not valid JSON: ${e.message}`); + } + } + } +} + +// ─── Output Validation ────────────────────────────────────── + +function validateOutput(resultPath, metadata, connectorDir, check) { + check('result_exists', fs.existsSync(resultPath), + fs.existsSync(resultPath) ? `Found: ${path.basename(resultPath)}` : `Missing: ${resultPath}`); + + if (!fs.existsSync(resultPath)) return; + + let result; + try { + result = JSON.parse(fs.readFileSync(resultPath, 'utf-8')); + check('result_valid_json', true, 'Valid JSON'); + } catch (e) { + check('result_valid_json', false, 'Invalid JSON: ' + e.message); + return; + } + + // Identify scoped keys (keys containing '.' that aren't metadata) + const metaKeyNames = new Set(['exportSummary', 'timestamp', 'version', 'platform']); + const scopedKeys = Object.keys(result).filter(k => k.includes('.') && !metaKeyNames.has(k)); + + check('result_has_scopes', scopedKeys.length > 0, + scopedKeys.length > 0 + ? `${scopedKeys.length} scope(s): ${scopedKeys.join(', ')}` + : 'No scoped keys found in result'); + + // Validate each scope has data + let totalItems = 0; + for (const key of scopedKeys) { + const data = result[key]; + const isEmpty = data === null || data === undefined || + (typeof data === 'object' && Object.keys(data).length === 0); + + check(`result_${key}_not_empty`, !isEmpty, + isEmpty ? 
`Scope "${key}" is empty` : `Scope "${key}" has data`); + + // Count items in arrays + if (data && typeof data === 'object') { + for (const [field, value] of Object.entries(data)) { + if (Array.isArray(value)) { + totalItems += value.length; + check(`result_${key}_${field}_count`, value.length > 0, + `${key}.${field}: ${value.length} items`, + value.length === 0 ? 'warning' : 'error'); + } + } + } + } + + // Metadata fields + check('result_export_summary', !!result.exportSummary, + result.exportSummary + ? `exportSummary: ${JSON.stringify(result.exportSummary)}` + : 'Missing exportSummary'); + + if (result.exportSummary) { + check('result_summary_count', typeof result.exportSummary.count === 'number' && result.exportSummary.count >= 0, + `count: ${result.exportSummary.count}`); + check('result_summary_label', !!result.exportSummary.label, + `label: "${result.exportSummary.label || ''}"`); + check('result_summary_details', !!result.exportSummary.details, + `details: "${result.exportSummary.details || ''}"`, 'warning'); + } + + check('result_timestamp', !!result.timestamp, + result.timestamp ? `timestamp: ${result.timestamp}` : 'Missing timestamp'); + check('result_version', !!result.version, + result.version ? `version: ${result.version}` : 'Missing version'); + check('result_platform', !!result.platform, + result.platform ? `platform: ${result.platform}` : 'Missing platform'); + + // Schema compliance for each scope + const schemasDir = path.resolve(connectorDir, '..', 'schemas'); + for (const key of scopedKeys) { + const schemaPath = path.join(schemasDir, `${key}.json`); + if (!fs.existsSync(schemaPath)) continue; + + try { + const schemaFile = JSON.parse(fs.readFileSync(schemaPath, 'utf-8')); + if (schemaFile.schema) { + const errors = validateAgainstSchema(result[key], schemaFile.schema); + check(`result_schema_${key}`, errors.length === 0, + errors.length === 0 + ? 
`${key} conforms to schema` + : `${key} schema violations: ${errors.slice(0, 5).join('; ')}${errors.length > 5 ? ` (+${errors.length - 5} more)` : ''}`); + } + } catch (e) { + check(`result_schema_${key}`, false, `Schema validation error: ${e.message}`, 'warning'); + } + } + + // Sanity check: if we expected scopes from metadata, check they all appeared + if (metadata?.scopes && Array.isArray(metadata.scopes)) { + for (const scope of metadata.scopes) { + const scopeName = scope.scope || scope.name; + if (scopeName) { + check(`result_expected_scope_${scopeName}`, + scopedKeys.includes(scopeName), + scopedKeys.includes(scopeName) + ? `Expected scope "${scopeName}" present in output` + : `Expected scope "${scopeName}" missing from output`); + } + } + } +} + +// ─── Main ─────────────────────────────────────────────────── + +function main() { + const args = process.argv.slice(2); + + if (args.length === 0 || args[0] === '--help' || args[0] === '-h') { + console.log(` +Connector Validator — validates structure and output of data connectors. + +Usage: + node scripts/validate-connector.cjs <connector.js> + node scripts/validate-connector.cjs <connector.js> --check-result <result.json> + +Flags: + --check-result <result.json>  Also validate connector output data + --help, -h Show this help + +Output: JSON report to stdout. Exit code 0 = valid, 1 = invalid. 
+`); + process.exit(0); + } + + const connectorPath = path.resolve(args[0]); + let resultPath = null; + + for (let i = 1; i < args.length; i++) { + if (args[i] === '--check-result' && args[i + 1]) { + resultPath = path.resolve(args[++i]); + } + } + + const { report, check } = createReport(); + + // Derive paths + const connectorDir = path.dirname(connectorPath); + const metadataPath = connectorPath.replace(/\.js$/, '.json'); + + // Run validations + const metadata = validateMetadata(metadataPath, check); + validateScript(connectorPath, check); + validateSchemas(metadata, connectorDir, check); + + if (resultPath) { + validateOutput(resultPath, metadata, connectorDir, check); + } + + // Output + console.log(JSON.stringify(report, null, 2)); + process.exit(report.valid ? 0 : 1); +} + +main(); diff --git a/skills/vana-connect/templates/connector-metadata.json b/skills/vana-connect/templates/connector-metadata.json new file mode 100644 index 0000000..463ee6d --- /dev/null +++ b/skills/vana-connect/templates/connector-metadata.json @@ -0,0 +1,22 @@ +{ + "id": "{{name}}-playwright", + "version": "1.0.0", + "name": "{{Platform Name}}", + "company": "{{Company}}", + "iconURL": "icons/{{platform}}.svg", + "description": "Exports your {{data description}} using Playwright browser automation.", + "connectURL": "https://{{platform}}.com/login", + "connectSelector": "{{CSS selector only visible when logged in}}", + "exportFrequency": "weekly", + "runtime": "playwright", + "scopes": [ + { + "scope": "{{platform}}.{{scope1}}", + "label": "Your {{scope1 label}}", + "description": "{{What this scope contains}}" + } + ], + "vectorize_config": { + "documents": "{{field name to vectorize for semantic search}}" + } +} diff --git a/skills/vana-connect/templates/connector-script.js b/skills/vana-connect/templates/connector-script.js new file mode 100644 index 0000000..0d5d068 --- /dev/null +++ b/skills/vana-connect/templates/connector-script.js @@ -0,0 +1,198 @@ +/** + * {{PLATFORM_NAME}} 
Connector (Playwright) + * + * Exports: + * - {{platform}}.{{scope1}} — {{scope1 description}} + * - {{platform}}.{{scope2}} — {{scope2 description}} + * + * Extraction method: {{API fetch / Network capture / DOM scraping}} + */ + +// ─── Credentials ───────────────────────────────────────────── + +let PLATFORM_LOGIN = process.env.USER_LOGIN_{{PLATFORM_UPPER}} || ''; +let PLATFORM_PASSWORD = process.env.USER_PASSWORD_{{PLATFORM_UPPER}} || ''; + +// ─── Login Detection ───────────────────────────────────────── + +const checkLoginStatus = async () => { + try { + return await page.evaluate(` + (() => { + // Check for login form (means NOT logged in) + const hasLoginForm = !!document.querySelector('{{LOGIN_FORM_SELECTOR}}'); + if (hasLoginForm) return false; + + // Check for challenge/2FA pages + const url = window.location.href; + if (url.includes('/challenge') || url.includes('/checkpoint')) return false; + + // Check for logged-in indicators + const isLoggedIn = !!document.querySelector('{{LOGGED_IN_SELECTOR}}'); + return isLoggedIn; + })() + `); + } catch (e) { + return false; + } +}; + +// ─── Automated Login ───────────────────────────────────────── + +const performLogin = async () => { + const loginStr = JSON.stringify(PLATFORM_LOGIN); + const passwordStr = JSON.stringify(PLATFORM_PASSWORD); + + await page.goto('{{LOGIN_URL}}'); + await page.sleep(2000); + + // Fill and submit login form + await page.evaluate(` + (() => { + const emailInput = document.querySelector('input[name="username"], input[name="email"], input[type="email"]'); + const passwordInput = document.querySelector('input[name="password"], input[type="password"]'); + + if (emailInput) { + emailInput.focus(); + emailInput.value = ${loginStr}; + emailInput.dispatchEvent(new Event('input', { bubbles: true })); + emailInput.dispatchEvent(new Event('change', { bubbles: true })); + } + if (passwordInput) { + passwordInput.focus(); + passwordInput.value = ${passwordStr}; + passwordInput.dispatchEvent(new 
Event('input', { bubbles: true })); + passwordInput.dispatchEvent(new Event('change', { bubbles: true })); + } + })() + `); + await page.sleep(500); + + await page.evaluate(` + (() => { + const submitBtn = document.querySelector('button[type="submit"], input[type="submit"]'); + if (submitBtn) submitBtn.click(); + })() + `); + await page.sleep(3000); +}; + +// ─── Data Fetching Helpers ─────────────────────────────────── + +// For API-based connectors: +const fetchApi = async (endpoint) => { + const endpointStr = JSON.stringify(endpoint); + try { + return await page.evaluate(` + (async () => { + try { + const resp = await fetch(${endpointStr}, { + credentials: 'include' + }); + if (!resp.ok) return { _error: resp.status }; + return await resp.json(); + } catch(e) { return { _error: e.message }; } + })() + `); + } catch (e) { + return { _error: e.message || String(e) }; + } +}; + +// ─── Main Export Flow ──────────────────────────────────────── + +(async () => { + const TOTAL_STEPS = 3; + + // ═══ PHASE 1: Automated Login ═══ + await page.setData('status', 'Checking login status...'); + await page.goto('{{PLATFORM_URL}}'); + await page.sleep(2000); + + let isLoggedIn = await checkLoginStatus(); + + if (!isLoggedIn) { + // Try .env credentials first, fall back to requestInput + if (!PLATFORM_LOGIN || !PLATFORM_PASSWORD) { + const creds = await page.requestInput({ + message: 'Enter your {{PLATFORM_NAME}} credentials', + schema: { + type: 'object', + properties: { + username: { type: 'string', title: 'Email or username' }, + password: { type: 'string', title: 'Password' } + }, + required: ['username', 'password'] + } + }); + PLATFORM_LOGIN = creds.username; + PLATFORM_PASSWORD = creds.password; + } + await page.setData('status', 'Logging in...'); + await performLogin(); + await page.sleep(2000); + + isLoggedIn = await checkLoginStatus(); + if (!isLoggedIn) { + await page.sleep(3000); + isLoggedIn = await checkLoginStatus(); + } + if (!isLoggedIn) { + await 
page.setData('error', 'Automated login failed. Check credentials or login flow.'); + return; + } + await page.setData('status', 'Login successful'); + } else { + await page.setData('status', 'Session restored from previous login'); + } + + // ═══ PHASE 2: Data Collection ═══ + await page.goHeadless(); + + // ═══ STEP 1: Fetch primary data ═══ + await page.setProgress({ + phase: { step: 1, total: TOTAL_STEPS, label: 'Fetching data' }, + message: 'Loading {{scope1}} data...', + }); + + // TODO: Fetch data here + // const data = await fetchApi('/api/endpoint'); + // if (data._error) { + // await page.setData('error', 'Failed to fetch data: ' + data._error); + // return; + // } + + // ═══ STEP 2: Process data ═══ + await page.setProgress({ + phase: { step: 2, total: TOTAL_STEPS, label: 'Processing' }, + message: 'Processing data...', + }); + + // TODO: Transform raw data into scope-specific shapes + + // ═══ STEP 3: Build result ═══ + await page.setProgress({ + phase: { step: 3, total: TOTAL_STEPS, label: 'Finalizing' }, + message: 'Building export...', + }); + + const result = { + '{{platform}}.{{scope1}}': { + // TODO: scope1 data + }, + '{{platform}}.{{scope2}}': { + // TODO: scope2 data + }, + exportSummary: { + count: 0, // TODO: total item count + label: 'items', + details: 'X scope1 items, Y scope2 items', // TODO: breakdown + }, + timestamp: new Date().toISOString(), + version: '1.0.0-playwright', + platform: '{{platform}}', + }; + + await page.setData('result', result); + await page.setData('status', 'Complete! 
Exported ' + result.exportSummary.details); +})(); diff --git a/skills/vana-connect/templates/schema.json b/skills/vana-connect/templates/schema.json new file mode 100644 index 0000000..5f79127 --- /dev/null +++ b/skills/vana-connect/templates/schema.json @@ -0,0 +1,26 @@ +{ + "name": "{{Platform}} {{Scope Name}}", + "version": "1.0.0", + "scope": "{{platform}}.{{scope}}", + "dialect": "json", + "description": "{{Description of the exported data}}", + "schema": { + "type": "object", + "properties": { + "items": { + "type": "array", + "items": { + "type": "object", + "properties": { + "id": { "type": "string" }, + "title": { "type": "string" } + }, + "required": ["id", "title"], + "additionalProperties": false + } + } + }, + "required": ["items"], + "additionalProperties": false + } +} diff --git a/test-connector.cjs b/test-connector.cjs deleted file mode 100755 index 46730af..0000000 --- a/test-connector.cjs +++ /dev/null @@ -1,338 +0,0 @@ -#!/usr/bin/env node - -/** - * Standalone Connector Test Runner - * - * Runs a connector JS file directly using the playwright-runner without the Tauri app. - * Spawns the playwright-runner's `index.cjs` as a child process. - * - * Requires PLAYWRIGHT_RUNNER_DIR env var pointing to the playwright-runner directory, - * or auto-detects from common locations (../data-dt-app/playwright-runner). 
- * - * Usage: - * node test-connector.cjs ./linkedin/linkedin-playwright.js [options] - * - * Options: - * --headless Run without visible browser (default: headed) - * --url URL Override the initial URL (default: from metadata JSON) - * --output FILE Where to save result JSON (default: ./connector-result.json) - */ - -const { spawn } = require('child_process'); -const fs = require('fs'); -const path = require('path'); -const readline = require('readline'); -const os = require('os'); - -// ─── ANSI Colors ──────────────────────────────────────────── -const c = { - reset: '\x1b[0m', - bold: '\x1b[1m', - dim: '\x1b[2m', - red: '\x1b[31m', - green: '\x1b[32m', - yellow: '\x1b[33m', - blue: '\x1b[34m', - magenta: '\x1b[35m', - cyan: '\x1b[36m', - gray: '\x1b[90m', -}; - -function print(color, prefix, msg) { - const ts = new Date().toLocaleTimeString('en-US', { hour12: false }); - console.log(`${c.gray}${ts}${c.reset} ${color}${prefix}${c.reset} ${msg}`); -} - -// ─── Resolve Playwright Runner ────────────────────────────── -function resolveRunnerDir() { - // 1. Explicit env var - if (process.env.PLAYWRIGHT_RUNNER_DIR) { - const dir = process.env.PLAYWRIGHT_RUNNER_DIR; - if (fs.existsSync(path.join(dir, 'index.cjs'))) return dir; - console.error(`${c.red}PLAYWRIGHT_RUNNER_DIR set but index.cjs not found in: ${dir}${c.reset}`); - process.exit(1); - } - - // 2. 
Common locations relative to this script - const candidates = [ - // Sibling repo (same parent directory) - path.resolve(__dirname, '..', 'data-dt-app', 'playwright-runner'), - // Home directory common paths - path.join(os.homedir(), 'Documents', 'GitHub', 'data-dt-app', 'playwright-runner'), - path.join(os.homedir(), 'Documents', 'Github', 'data-dt-app', 'playwright-runner'), - path.join(os.homedir(), 'code', 'data-dt-app', 'playwright-runner'), - path.join(os.homedir(), 'src', 'data-dt-app', 'playwright-runner'), - ]; - - for (const dir of candidates) { - if (fs.existsSync(path.join(dir, 'index.cjs'))) return dir; - } - - console.error(`${c.red}Could not find playwright-runner. Set PLAYWRIGHT_RUNNER_DIR env var.${c.reset}`); - console.error(`${c.dim}Looked in:${c.reset}`); - for (const dir of candidates) { - console.error(` ${c.dim}${dir}${c.reset}`); - } - process.exit(1); -} - -// ─── CLI Argument Parsing ─────────────────────────────────── -function parseArgs() { - const args = process.argv.slice(2); - const parsed = { - connectorPath: null, - headless: false, - url: null, - output: './connector-result.json', - }; - - for (let i = 0; i < args.length; i++) { - if (args[i] === '--headless') { - parsed.headless = true; - } else if (args[i] === '--url' && args[i + 1]) { - parsed.url = args[++i]; - } else if (args[i] === '--output' && args[i + 1]) { - parsed.output = args[++i]; - } else if (args[i] === '--help' || args[i] === '-h') { - printUsage(); - process.exit(0); - } else if (!args[i].startsWith('--')) { - parsed.connectorPath = args[i]; - } - } - - if (!parsed.connectorPath) { - printUsage(); - process.exit(1); - } - - return parsed; -} - -function printUsage() { - console.log(` -${c.bold}Connector Test Runner${c.reset} - -${c.cyan}Usage:${c.reset} - node test-connector.cjs [options] - -${c.cyan}Options:${c.reset} - --headless Run without visible browser (default: headed) - --url URL Override the initial URL - --output FILE Result JSON path (default: 
./connector-result.json) - --help, -h Show this help - -${c.cyan}Environment:${c.reset} - PLAYWRIGHT_RUNNER_DIR Path to the playwright-runner directory (auto-detected if not set) - -${c.cyan}Examples:${c.reset} - node test-connector.cjs ./linkedin/linkedin-playwright.js --headed - node test-connector.cjs ./linkedin/linkedin-playwright.js --url https://linkedin.com/feed -`); -} - -// ─── Metadata Resolution ──────────────────────────────────── -function loadMetadata(connectorPath) { - const metadataPath = connectorPath.replace(/\.js$/, '.json'); - if (!fs.existsSync(metadataPath)) { - return null; - } - try { - return JSON.parse(fs.readFileSync(metadataPath, 'utf-8')); - } catch (e) { - return null; - } -} - -// ─── Message Formatting ───────────────────────────────────── -function formatStatus(status) { - if (typeof status === 'string') { - switch (status) { - case 'COMPLETE': return `${c.bold}${c.green}COMPLETE${c.reset}`; - case 'ERROR': return `${c.bold}${c.red}ERROR${c.reset}`; - case 'STOPPED': return `${c.yellow}STOPPED${c.reset}`; - default: return `${c.blue}${status}${c.reset}`; - } - } - - if (typeof status === 'object') { - const type = status.type || ''; - const msg = status.message || ''; - const phase = status.phase; - const count = status.count; - - let prefix = ''; - switch (type) { - case 'STARTED': prefix = `${c.blue}STARTED${c.reset}`; break; - case 'COLLECTING': prefix = `${c.yellow}COLLECTING${c.reset}`; break; - case 'WAITING_FOR_USER': prefix = `${c.magenta}WAITING${c.reset}`; break; - default: prefix = `${c.blue}${type}${c.reset}`; - } - - let detail = msg; - if (phase) { - detail += ` ${c.dim}(${phase.step}/${phase.total} ${phase.label || ''})${c.reset}`; - } - if (count !== undefined) { - detail += ` ${c.cyan}[${count} items]${c.reset}`; - } - - return `${prefix} ${detail}`; - } - - return JSON.stringify(status); -} - -function handleMessage(msg, resultRef) { - switch (msg.type) { - case 'ready': - // Handled by the caller to send run 
command - break; - - case 'status': - print(c.blue, '[status]', formatStatus(msg.status)); - break; - - case 'log': - print(c.gray, '[log] ', msg.message || ''); - break; - - case 'data': { - const val = typeof msg.value === 'string' ? msg.value : JSON.stringify(msg.value); - // Highlight [DEBUG] messages differently - if (val.startsWith('[DEBUG]')) { - print(c.yellow, '[debug] ', val.substring(8)); - } else if (msg.key === 'error') { - print(c.red, '[error] ', val); - } else { - print(c.cyan, '[data] ', `${msg.key} = ${val}`); - } - break; - } - - case 'result': - resultRef.data = msg.data; - break; - - case 'error': - print(c.red, '[ERROR] ', msg.message || JSON.stringify(msg)); - break; - - case 'network-captured': - print(c.dim, '[net] ', `Captured: ${msg.key} (${msg.url || ''})`); - break; - - default: - print(c.gray, '[???] ', JSON.stringify(msg)); - } -} - -// ─── Main ─────────────────────────────────────────────────── -async function main() { - const args = parseArgs(); - const connectorPath = path.resolve(args.connectorPath); - - if (!fs.existsSync(connectorPath)) { - console.error(`${c.red}Error: Connector file not found: ${connectorPath}${c.reset}`); - process.exit(1); - } - - const runnerDir = resolveRunnerDir(); - const metadata = loadMetadata(connectorPath); - const connectUrl = args.url || (metadata && metadata.connectURL) || 'about:blank'; - const connectorName = metadata?.name || path.basename(connectorPath, '.js'); - const connectorVersion = metadata?.version || 'unknown'; - - console.log(''); - console.log(`${c.bold}Connector Test Runner${c.reset}`); - console.log(`${'─'.repeat(50)}`); - console.log(` ${c.cyan}Connector:${c.reset} ${connectorPath}`); - console.log(` ${c.cyan}Name:${c.reset} ${connectorName} v${connectorVersion}`); - console.log(` ${c.cyan}URL:${c.reset} ${connectUrl}`); - console.log(` ${c.cyan}Mode:${c.reset} ${args.headless ? 
'headless' : 'headed (visible browser)'}`); - console.log(` ${c.cyan}Output:${c.reset} ${args.output}`); - console.log(` ${c.cyan}Runner:${c.reset} ${runnerDir}`); - console.log(`${'─'.repeat(50)}`); - console.log(''); - - const startTime = Date.now(); - const resultRef = { data: null }; - - // Spawn the playwright runner - const child = spawn('node', ['index.cjs'], { - cwd: runnerDir, - stdio: ['pipe', 'pipe', 'pipe'], - }); - - // Handle stdout (JSON protocol) - const stdoutRL = readline.createInterface({ input: child.stdout }); - stdoutRL.on('line', (line) => { - try { - const msg = JSON.parse(line); - - if (msg.type === 'ready') { - // Send run command - const runCmd = JSON.stringify({ - type: 'run', - runId: `test-${Date.now()}`, - connectorPath, - url: connectUrl, - headless: args.headless, - forceHeaded: !args.headless, - }); - child.stdin.write(runCmd + '\n'); - print(c.green, '[runner]', 'Connected, starting connector...'); - return; - } - - handleMessage(msg, resultRef); - } catch (e) { - // Non-JSON output, print as-is - if (line.trim()) { - console.log(`${c.dim}${line}${c.reset}`); - } - } - }); - - // Handle stderr (PlaywrightRunner debug output) - const stderrRL = readline.createInterface({ input: child.stderr }); - stderrRL.on('line', (line) => { - if (line.trim()) { - print(c.dim, '[runner]', line.replace('[PlaywrightRunner] ', '')); - } - }); - - // Wait for process to exit - const exitCode = await new Promise((resolve) => { - child.on('exit', (code) => resolve(code || 0)); - }); - - const elapsed = ((Date.now() - startTime) / 1000).toFixed(1); - - console.log(''); - console.log(`${'─'.repeat(50)}`); - - // Save result - if (resultRef.data) { - const outputPath = path.resolve(args.output); - fs.writeFileSync(outputPath, JSON.stringify(resultRef.data, null, 2)); - const size = (fs.statSync(outputPath).size / 1024).toFixed(1); - print(c.green, '[result]', `Saved to ${outputPath} (${size} KB)`); - } else { - print(c.red, '[result]', 'No result data 
returned'); - } - - if (exitCode === 0) { - console.log(`${c.bold}${c.green}Done${c.reset} in ${elapsed}s`); - } else { - console.log(`${c.bold}${c.red}Failed${c.reset} (exit code ${exitCode}) in ${elapsed}s`); - } - console.log(''); - - process.exit(exitCode); -} - -main().catch((err) => { - console.error(`${c.red}Fatal error: ${err.message}${c.reset}`); - process.exit(1); -});