Draft

`skills/vana-connect/CREATE.md`: 4 changes (2 additions, 2 deletions)

````diff
@@ -7,7 +7,7 @@ Build a data connector for a platform that isn't in the registry yet.
 - `reference/PAGE-API.md` -- full `page` object API
 - `reference/PATTERNS.md` -- data extraction approaches and code examples
 
-All `node scripts/...` commands refer to `skills/vana-connect/scripts/` in the data-connectors repo. `run-connector.cjs` is at `~/.dataconnect/run-connector.cjs` (installed by SETUP.md).
+All `node scripts/...` commands refer to `skills/vana-connect/scripts/` in the data-connectors repo. Use the `vana` CLI to exercise connectors; only fall back to raw scripts when debugging connector internals.
 
 ## Connector Format
 
@@ -167,7 +167,7 @@ Run the connector and validate in one step:
 
 ```bash
 node scripts/validate.cjs <company>/<name>-playwright.js && \
-node ~/.dataconnect/run-connector.cjs <company>/<name>-playwright.js [start-url] && \
+vana connect <platform> && \
 node scripts/validate.cjs <company>/<name>-playwright.js --check-result ~/.dataconnect/last-result.json
 ```
 
````

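The updated CREATE.md command validates the connector, runs it through `vana connect`, and then checks the captured result. A minimal bash sketch of the same sequence as a reusable function follows. The name `connect_and_check` and the `SCRIPTS_DIR` override are hypothetical, and the sketch assumes it runs from the directory where `scripts/validate.cjs` resolves, as in the doc. Because the steps are joined with `&&`, a failed validation or a failed `vana connect` stops the chain, so `--check-result` is never run against a stale `last-result.json`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the CREATE.md validate -> connect -> check chain.
# Usage: connect_and_check <company>/<name>-playwright.js <platform>
connect_and_check() {
  local connector="$1" platform="$2"
  # Assumed location of validate.cjs; override with SCRIPTS_DIR if needed.
  local scripts_dir="${SCRIPTS_DIR:-scripts}"
  local result="$HOME/.dataconnect/last-result.json"

  # Each step runs only if the previous one succeeded.
  node "$scripts_dir/validate.cjs" "$connector" &&
    vana connect "$platform" &&
    node "$scripts_dir/validate.cjs" "$connector" --check-result "$result"
}
```

Call it with the same placeholders the doc uses: `connect_and_check <company>/<name>-playwright.js <platform>`.
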
`skills/vana-connect/RECIPES.md`: 2 changes (1 addition, 1 deletion)

````diff
@@ -130,7 +130,7 @@ console.log([header, ...rows].join('\n'));
 Run the connector on a schedule (cron, agent heartbeat, etc.) and timestamp each export:
 
 ```bash
-node run-connector.cjs <connector> <url>
+vana connect <platform>
 cp ~/.dataconnect/last-result.json ~/backups/<platform>-$(date +%Y-%m-%d).json
 ```
 
````

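The updated recipe runs `vana connect` and `cp` as two independent commands, so if a scheduled run fails, `cp` may still archive whatever `last-result.json` an earlier run left behind under today's date. A minimal sketch that copies only after a successful run follows. `backup_export` and its optional backup-directory argument are hypothetical names; the default `~/backups` and the date-stamped filename mirror the recipe.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the RECIPES.md scheduled-export recipe.
# Usage: backup_export <platform> [backup_dir]
# Prints the path of the new backup on success.
backup_export() {
  local platform="$1"
  local backup_dir="${2:-$HOME/backups}"
  local result="$HOME/.dataconnect/last-result.json"
  local dest
  dest="$backup_dir/${platform}-$(date +%Y-%m-%d).json"

  mkdir -p "$backup_dir" || return 1
  # Copy only after a successful run, so an older last-result.json
  # is never archived under today's date.
  vana connect "$platform" || return 1
  cp "$result" "$dest" || return 1
  echo "$dest"
}
```

cron runs jobs with a minimal environment and `PATH`, so a scheduled job may need to call `vana` by its absolute path (as printed by `command -v vana`) or set `PATH` in the crontab.
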
`skills/vana-connect/SETUP.md`: 105 changes (78 additions, 27 deletions)

````diff
@@ -1,63 +1,114 @@
 # Connect -- Setup
 
-Skip if `~/.dataconnect/playwright-runner/index.cjs` and `~/.dataconnect/run-connector.cjs` both exist.
+This setup exists to let the skill use a real installed `vana` CLI when available, with the published canary CLI as the fallback.
 
-## Prerequisites
+## Preferred path
 
-- Node.js v18+
-- Git
+If `vana` is already on `PATH`, use it directly:
 
-## Install
+```bash
+command -v vana
+```
 
-Run the setup script from the data-connectors repo root:
+Then use:
 
 ```bash
-bash skills/vana-connect/scripts/setup.sh
+vana
 ```
+
+Skip runtime setup if `vana status --json` reports `"runtime":"installed"` or `"runtime":{"installed":true,...}`.
+
+## Fallback path
+
+If `vana` is not installed yet, prefer the official installer so the user gets a real installed CLI:
+
+```bash
+curl -fsSL https://raw.githubusercontent.com/vana-com/vana-connect/main/install/install.sh | sh
+```
+
+Then verify:
+
+```bash
+vana --help
+```
+
+If the installer path is unavailable or the released CLI does not yet contain the needed behavior, use the published canary package:
+
+```bash
+npx -y @opendatalabs/connect@canary
+```
+
+Skip runtime setup if `npx -y @opendatalabs/connect@canary status --json` reports `"runtime":"installed"` or `"runtime":{"installed":true,...}`.
+
+## Verify the published CLI
+
+```bash
+npx -y @opendatalabs/connect@canary --help
+```
 
-This installs the playwright-runner, Chromium, and run-connector.cjs in a single step. If the user needs to approve commands, this is one approval instead of many.
+## Verify an installed CLI
 
-**Before running**, tell the user: setup will download a browser engine and some Node.js dependencies into `~/.dataconnect/`. This is a one-time step.
+```bash
+vana --help
+```
 
-## Manual install
+## Local development fallback
 
-If the setup script doesn't work for your environment, follow these steps individually:
+From `/home/tnunamak/code/vana-connect`:
 
 ```bash
-mkdir -p ~/.dataconnect/connectors
-cd ~/.dataconnect
+pnpm install
+pnpm build
+```
+
+Verify:
 
-git clone --depth 1 --filter=blob:none --sparse --branch main \
-https://github.com/vana-com/data-connect.git _data-connect
-cd _data-connect && git sparse-checkout set playwright-runner
-cp -r playwright-runner ../playwright-runner
-cd .. && rm -rf _data-connect
-cd ~/.dataconnect/playwright-runner && npm install
-npx playwright install chromium
+```bash
+ls /home/tnunamak/code/vana-connect/dist/cli/bin.js
 ```
 
-Then copy run-connector.cjs from the skill's scripts/ directory:
+## Install the runtime
 
+Use the installed CLI when possible:
+
 ```bash
-cp skills/vana-connect/scripts/run-connector.cjs ~/.dataconnect/run-connector.cjs
+vana setup --yes
 ```
 
-> **Do not** use `curl` to fetch this file from GitHub — the repo root contains a symlink that GitHub raw serves as a text pointer, not the actual script.
+If `vana` is not installed, use the published canary fallback:
 
+```bash
+npx -y @opendatalabs/connect@canary --help
+npx -y @opendatalabs/connect@canary setup --yes
+```
+
+Before running, tell the user this downloads a browser engine and some dependencies into `~/.dataconnect/`. This is a one-time step.
+
 ## Verify
 
 ```bash
-ls ~/.dataconnect/playwright-runner/index.cjs ~/.dataconnect/run-connector.cjs
+vana status
 ```
 
-Both files should exist.
+You should see `Runtime: installed`. If `vana` is unavailable, run `npx -y @opendatalabs/connect@canary status` instead.
+If setup still fails, inspect the log path surfaced by the CLI and only fall back to the older script-level flow if the CLI setup path is blocked.
 
+## Legacy fallback
+
+Only use this if the CLI setup path is broken and you are debugging the underlying runtime:
+
+```bash
+bash skills/vana-connect/scripts/setup.sh
+```
+
 ## File Locations
 
 | Path | Purpose |
 |------|---------|
-| `~/.dataconnect/playwright-runner/` | Runner process |
-| `~/.dataconnect/run-connector.cjs` | Batch-mode runner wrapper |
+| `vana` | Preferred installed CLI entrypoint |
+| `npx -y @opendatalabs/connect@canary` | Published canary CLI entrypoint |
+| `/home/tnunamak/code/vana-connect/dist/cli/bin.js` | Local development fallback |
 | `~/.dataconnect/connectors/` | Connector scripts |
 | `~/.dataconnect/browser-profiles/` | Persistent sessions (cookies) |
 | `~/.dataconnect/last-result.json` | Most recent result |
+| `~/.dataconnect/logs/` | Setup and run logs surfaced by the CLI |
````

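The new SETUP.md prefers an installed `vana` on `PATH`, falls back to the published `@opendatalabs/connect@canary` package through `npx`, and skips runtime setup when `status --json` reports the runtime as installed in either of two JSON shapes. A minimal bash sketch of that flow follows. The function names `vana_cli`, `runtime_installed`, and `ensure_runtime` are hypothetical. The sketch leaves out the official installer step and goes straight to the `npx` fallback, and its status check is plain text matching on the two compact forms quoted in the doc, not a JSON parser, so it assumes the CLI emits exactly those forms.

```bash
#!/usr/bin/env bash
# Sketch of the SETUP.md decision flow; all function names are hypothetical.

# Prefer an installed `vana` on PATH; otherwise use the published canary
# package through npx. Arguments are passed through unchanged.
vana_cli() {
  if command -v vana >/dev/null 2>&1; then
    vana "$@"
  else
    npx -y @opendatalabs/connect@canary "$@"
  fi
}

# Succeeds if `status --json` output reports an installed runtime in either
# shape quoted in SETUP.md. Plain text matching: assumes compact JSON with
# no spaces around ':' and, for the object form, "installed" as the first key.
runtime_installed() {
  case "$1" in
    *'"runtime":"installed"'*) return 0 ;;
    *'"runtime":{"installed":true'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Run setup only when the runtime is not already reported as installed.
ensure_runtime() {
  local status_json
  status_json="$(vana_cli status --json)" || status_json=""
  if runtime_installed "$status_json"; then
    echo "runtime already installed; skipping setup"
  else
    vana_cli setup --yes
  fi
}
```

If `jq` is available, parsing the status output with it would tolerate whitespace and key-order differences that the text match above does not.
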