171 changes: 116 additions & 55 deletions .cursor/plan.md
@@ -1,55 +1,116 @@
## AI Essay → Review → Revision Pipeline

### Goal

Implement a **Bun-friendly TypeScript CLI** (`bun run index.ts`) that:

- Prompts the user for an essay topic.
- Uses **model A (OpenRouter via Vercel AI SDK)** to generate an essay.
- Uses **model B** to review that essay and produce feedback.
- Calls **model A again** with the feedback to produce a revised essay.
- Saves all three artifacts as **markdown files** on disk in a consistent location, with the `runs/` directory ignored by git.

### High-level Design

- **Runtime & entrypoint**: Keep using Bun with `index.ts` as the main CLI entrypoint.
- **AI client setup**:
  - Add `ai` and the **OpenRouter provider for the Vercel AI SDK** as dependencies (no separate `openai` package is needed since we are using OpenRouter directly).
  - Configure a small `aiClient.ts` module (or keep the logic inline in `index.ts` if it stays very small) that wires the AI SDK to OpenRouter using an `OPENROUTER_API_KEY` env var (a sketch follows this list).
  - Hard-code two model IDs (e.g. one for essay generation, one for review) with clear `const` names so they are easy to change later.
- **Pipeline orchestration**:
  - Implement a `runEssayPipeline()` function that:
    - Reads the prompt from stdin (a simple interactive question).
    - Calls the **essay model** with a system prompt + user prompt to generate the initial essay.
    - Calls the **review model** with system instructions plus the essay content to generate feedback.
    - Calls the **essay model** again with the original prompt and the feedback to produce a revised essay.
  - Keep everything **strongly typed** with small TypeScript interfaces for the pipeline results.
- **Markdown file output**:
  - Decide on a simple folder and naming scheme (e.g. `runs/<timestamp>-essay.md`, `runs/<timestamp>-review.md`, `runs/<timestamp>-revision.md`).
  - Use Bun / Node fs APIs in a small utility to write each step as a separate markdown file.
  - Include basic front-matter or headings (e.g. `# Original Essay`, `# Review Feedback`, `# Revised Essay`) for easy inspection in an editor.
  - Ensure `runs/` is added to `.gitignore` so generated artifacts don’t clutter git history.
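
A minimal sketch of the `aiClient.ts` wiring, assuming the community `@openrouter/ai-sdk-provider` package; the model IDs and system prompts here are illustrative placeholders, not final choices:

```ts
import { generateText } from "ai";
import { createOpenRouter } from "@openrouter/ai-sdk-provider";

const openrouter = createOpenRouter({
  apiKey: process.env.OPENROUTER_API_KEY,
});

// Illustrative model IDs; swap in whatever OpenRouter models you prefer.
const ESSAY_MODEL = openrouter("anthropic/claude-opus-4.5");
const REVIEW_MODEL = openrouter("openai/gpt-4o");

export async function generateEssay(prompt: string): Promise<string> {
  const { text } = await generateText({
    model: ESSAY_MODEL,
    system: "You are a skilled essayist.",
    prompt,
  });
  return text;
}

export async function reviewEssay(essay: string): Promise<string> {
  const { text } = await generateText({
    model: REVIEW_MODEL,
    system: "You are a rigorous writing reviewer. Give concrete, actionable feedback.",
    prompt: essay,
  });
  return text;
}
```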

### Implementation Steps

- **setup-deps**: Add `ai` and the OpenRouter provider for the AI SDK to `package.json`, and document the required `OPENROUTER_API_KEY` env var in `README.md`.
- **ai-client**: Create a small AI client configuration that:
  - Instantiates the AI SDK with the OpenRouter provider.
  - Exposes typed helpers like `generateEssay(prompt)`, `reviewEssay(essay)`, and `reviseEssay(prompt, essay, feedback)`.
- **pipeline-logic**: Implement `runEssayPipeline()` in `index.ts` so that it:
  - Interactively asks for a prompt via stdin.
  - Runs the three AI steps in sequence (no streaming needed) with clear logging to the console.
  - Returns a typed result object containing the three text outputs.
- **file-output**: Add a small utility function (sketched after this list) to:
  - Create a `runs/` directory if it doesn’t exist.
  - Write three markdown files with timestamped names and simple headings.
  - Confirm that `runs/` is listed in `.gitignore`.
- **polish-types**: Ensure all public functions are type-safe (typed params and return types where helpful) and that the code compiles under the existing `tsconfig`.
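
A possible shape for the file-output utility; `writeRunFile` is a hypothetical name, and the heading/naming scheme simply follows the plan above:

```ts
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Writes one pipeline artifact to runs/<timestamp>-<step>.md under a heading.
export async function writeRunFile(
  timestamp: string,
  step: "essay" | "review" | "revision",
  heading: string,
  body: string,
): Promise<void> {
  await mkdir("runs", { recursive: true }); // create runs/ if it doesn't exist
  await writeFile(join("runs", `${timestamp}-${step}.md`), `# ${heading}\n\n${body}\n`);
}
```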

### Todos

- **setup-deps**: Add and configure Vercel AI SDK (`ai`) and the OpenRouter provider, and document `OPENROUTER_API_KEY`.
- **ai-client**: Implement the AI client helper(s) for essay generation, review, and revision using hard-coded OpenRouter model IDs.
- **pipeline-logic**: Implement the CLI flow in `index.ts` to run the generation → review → revision pipeline.
- **file-output**: Implement markdown file-writing utilities (create `runs/` directory, timestamped filenames, headings) and ensure `runs/` is in `.gitignore`.
- **polish-types**: Run TypeScript checks and tighten any loose types if needed.
# Writing Quality Arena

## Models Configuration

Use the provided `modelsToRun` array in `constants.ts`:

```ts
export type RunnableModel = {
  name: string;
  llm: LanguageModelV1;
  reasoning: boolean;
};

export const modelsToRun: RunnableModel[] = [
  {
    name: "claude-4.5-opus-reasoning",
    llm: openrouter("anthropic/claude-opus-4.5"),
    reasoning: true,
  },
  // ... 11 models total
];

export const PARALLEL_LIMIT = 5; // Configurable concurrency
```

## Execution Flow (4 Phases)

### Phase 1: Essay Generation

Each model writes an essay on the topic. **N calls**.

### Phase 2: All-to-All Review

Every model reviews the essay of EVERY other model (self-reviews are excluded, matching the call counts below). **N × (N-1) calls**.

### Phase 3: Per-Reviewer Revisions

Each model creates a separate revised essay for EACH of the N-1 pieces of feedback it received. **N × (N-1) revisions**.

### Phase 4: Scoring

Every model scores EVERY essay (N originals + N×(N-1) revisions). Use `generateObject` with a Zod schema:

```ts
import { z } from "zod";

const ScoreSchema = z.object({
  score: z.number().min(1).max(10),
  justification: z.string(),
});
```
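
With that schema, `scoreEssay()` in `aiClient.ts` could look roughly like this; the prompt wording and return shape are assumptions:

```ts
import { generateObject } from "ai";

// Hypothetical helper: one judge scores one essay.
async function scoreEssay(judge: RunnableModel, essay: string) {
  const { object } = await generateObject({
    model: judge.llm,
    schema: ScoreSchema,
    prompt: `Score this essay from 1 to 10 and justify the score.\n\n${essay}`,
  });
  return object; // { score, justification }
}
```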

**N × (N + N×(N-1)) = N × N² = N³ calls**.

## API Call Summary (N=11 models)

| Phase | Formula | Calls |
|-------|---------|-------|
| Essays | N | 11 |
| Feedback | N×(N-1) | 110 |
| Revisions | N×(N-1) | 110 |
| Scores | N³ | 1331 |
| **Total** | | **1562** |

## Rankings

**Essay Ranking**: All essays (original + revised) ranked by average score across all judges.

**Reviewer Ranking**: For each reviewer, calculate avg improvement = mean(revision_score - original_score) for all revisions that used their feedback.
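
A sketch of the reviewer-ranking computation; the `ScoredRevision` shape is hypothetical (the real types live in `types.ts`):

```ts
// One revised essay together with the scores needed to measure improvement.
type ScoredRevision = {
  reviewer: string;       // model whose feedback drove the revision
  revisionScore: number;  // average judge score of the revised essay
  originalScore: number;  // average judge score of the original essay
};

// Returns [reviewer, avg improvement] pairs, best reviewer first.
function rankReviewers(revisions: ScoredRevision[]): [string, number][] {
  const deltasByReviewer = new Map<string, number[]>();
  for (const r of revisions) {
    const deltas = deltasByReviewer.get(r.reviewer) ?? [];
    deltas.push(r.revisionScore - r.originalScore);
    deltasByReviewer.set(r.reviewer, deltas);
  }
  return [...deltasByReviewer.entries()]
    .map(([name, deltas]): [string, number] => [
      name,
      deltas.reduce((sum, d) => sum + d, 0) / deltas.length,
    ])
    .sort((a, b) => b[1] - a[1]);
}
```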

## File Structure

```
results/{timestamp}/
├── essays/{model-name}.md
├── feedback/{reviewer}-on-{author}.md
├── revisions/{author}-revised-by-{reviewer}.md
├── results.json
└── summary.md
```

## File Changes

| File | Change |
|------|--------|
| `constants.ts` | Add `RunnableModel` type, `modelsToRun` array, `PARALLEL_LIMIT` |
| `types.ts` | Already has appropriate types; verify alignment |
| `aiClient.ts` | Update functions to accept `RunnableModel`, add `scoreEssay()` using `generateObject` |
| `index.ts` | Rewrite with 4-phase arena orchestration, parallel execution via `p-limit`, `confirmRun()` |
| `fileUtils.ts` | Rewrite for arena folder structure (`results/` dir, essays/, feedback/, revisions/, results.json, summary.md) |
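
The `p-limit` fan-out in `index.ts` could look like this; `reviewEssay` and the `essays` record are assumed shapes, not the final API:

```ts
import pLimit from "p-limit";
import { modelsToRun, PARALLEL_LIMIT } from "./constants";

const limit = pLimit(PARALLEL_LIMIT);

// Phase 2: every reviewer reviews every other author's essay,
// with at most PARALLEL_LIMIT requests in flight at once.
// Assumes essays: Record<string, string> keyed by model name.
const feedback = await Promise.all(
  modelsToRun.flatMap((reviewer) =>
    modelsToRun
      .filter((author) => author.name !== reviewer.name)
      .map((author) =>
        limit(() => reviewEssay(reviewer, essays[author.name]))
      )
  )
);
```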

## CLI Confirmation

Display call counts and prompt before running:

```ts
async function confirmRun(): Promise<boolean> {
  const n = modelsToRun.length;
  const essays = n;
  const feedback = n * (n - 1);
  const revisions = n * (n - 1);
  const scores = n * n * n;
  const total = essays + feedback + revisions + scores;
  // ... display and prompt Y/n
}
```
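
One way to fill in the elided display-and-prompt step, using `node:readline/promises` (also available under Bun); the exact wording is an assumption:

```ts
import { createInterface } from "node:readline/promises";

// Hypothetical completion of confirmRun()'s elided tail.
async function promptYesNo(total: number): Promise<boolean> {
  console.log(`About to make ${total} API calls across all phases.`);
  const rl = createInterface({ input: process.stdin, output: process.stdout });
  const answer = (await rl.question("Proceed? [Y/n] ")).trim().toLowerCase();
  rl.close();
  return answer === "" || answer === "y" || answer === "yes";
}
```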
1 change: 1 addition & 0 deletions .gitignore
@@ -35,3 +35,4 @@ report.[0-9]_.[0-9]_.[0-9]_.[0-9]_.json

# Generated essay runs
runs/
results/