
Conversation

t3dotgg (Contributor) commented Nov 30, 2025

No description provided.

greptile-apps bot commented Nov 30, 2025

Greptile Overview

Greptile Summary

Transforms the essay pipeline into a "Battle Arena" where multiple AI models compete by writing, reviewing, and revising essays, then evaluating each other's work through two test modes: scoring-test (1-10 ratings) and 1v1 (head-to-head comparisons).

Key Changes

  • New 4-phase architecture: Essay generation → peer review → revisions → evaluation (scoring or comparison)
  • Parallel execution: Uses p-limit for controlled concurrency (30 concurrent requests)
  • Two evaluation modes: scoring-test for numerical rankings, 1v1 for tournament-style comparisons
  • Cost tracking: Extracts and aggregates API costs from OpenRouter metadata across all phases
  • Comprehensive results: Structured output with essays, feedback, revisions, rankings, and markdown summaries
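
As a rough illustration of the controlled concurrency described above, here is a minimal sketch of a limiter. The PR itself uses p-limit; `createLimiter` and `demo` below are hypothetical stand-ins written so the example runs without dependencies, and the limit of 2 is chosen only to keep the demo small (the PR reportedly uses 30).

```typescript
// Hypothetical stand-in for p-limit: caps how many async tasks run at once.
function createLimiter(maxConcurrent: number) {
  let active = 0; // number of occupied slots
  const waiting: Array<() => void> = [];

  const release = (): void => {
    const next = waiting.shift();
    if (next) next(); // hand the slot straight to the next waiter
    else active--; // nobody waiting, so free the slot
  };

  return async function limit<T>(task: () => Promise<T>): Promise<T> {
    if (active >= maxConcurrent) {
      // Wait until release() transfers a slot to us; active stays unchanged.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}

// Demo: five fake "API calls" with at most two in flight at any time.
async function demo(): Promise<{ results: number[]; peak: number }> {
  const limit = createLimiter(2);
  let running = 0;
  let peak = 0;
  const jobs = [1, 2, 3, 4, 5].map((n) =>
    limit(async () => {
      running++;
      peak = Math.max(peak, running);
      await new Promise<void>((r) => setTimeout(r, 5));
      running--;
      return n * 10;
    }),
  );
  const results = await Promise.all(jobs);
  return { results, peak };
}
```

Handing the slot directly to the next waiter, rather than decrementing and re-incrementing the counter, avoids a window in which a newly submitted task could slip in ahead of a queued one and exceed the cap.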

Issues Found

  • Duplicate/overlapping topics in constants.ts (lines 22, 30)
  • Fragile regex parsing in aiClient.ts for extracting scores and winners from model outputs
  • Silent fallback behavior when parsing fails (defaults to score=5 or winner=tie)

Confidence Score: 4/5

  • Safe to merge with minor issues that should be addressed
  • Score reflects a solid architecture with comprehensive error handling and parallel execution. Two issues hold it back: fragile parsing logic that can cause subtle bugs through silent defaults, and a duplicate topic configuration that needs cleanup
  • Pay attention to constants.ts (duplicate topics) and aiClient.ts (parsing logic could fail silently on unexpected model output formats)

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| constants.ts | 4/5 | New file defining models, topics, parallelism config; has duplicate topic entries |
| aiClient.ts | 4/5 | Expanded AI client with essay scoring and comparison functions; score parsing uses fragile regex |
| index.ts | 4/5 | Complete rewrite implementing 4-phase arena with scoring-test and 1v1 modes; complex orchestration logic |
| fileUtils.ts | 5/5 | Rewritten file utilities for arena results structure with proper directory organization |

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as index.ts
    participant AI as aiClient.ts
    participant FS as fileUtils.ts
    participant Models as OpenRouter Models
    
    User->>CLI: Run with --test=scoring-test or --test=1v1
    CLI->>User: Display test type selection
    User->>CLI: Confirm test execution
    
    Note over CLI,FS: Phase 1: Essay Generation
    CLI->>FS: initArenaRun(testType)
    FS->>FS: Create results/{testType}/{timestamp}/
    loop For each model
        CLI->>AI: generateEssay(model, topic)
        AI->>Models: Generate essay via OpenRouter
        Models-->>AI: Essay text + usage data
        AI-->>CLI: EssayResult
        CLI->>FS: writeEssay(topicDir, model, essay)
    end
    
    Note over CLI,FS: Phase 2: Feedback Generation
    loop For each reviewer model
        loop For each other author model
            CLI->>AI: reviewEssay(reviewer, essay, topic)
            AI->>Models: Review essay
            Models-->>AI: Feedback text + usage
            AI-->>CLI: ReviewResult
            CLI->>FS: writeFeedback(topicDir, reviewer, author, feedback)
        end
    end
    
    Note over CLI,FS: Phase 3: Revisions
    loop For each author model
        loop For each reviewer's feedback
            CLI->>AI: reviseEssay(author, topic, essay, feedback)
            AI->>Models: Revise essay
            Models-->>AI: Revised text + usage
            AI-->>CLI: RevisionResult
            CLI->>FS: writeRevision(topicDir, author, reviewer, revision)
        end
    end
    
    alt Scoring Test Mode
        Note over CLI,FS: Phase 4a: Scoring
        loop For each judge model
            loop For each essay (original + revised)
                CLI->>AI: scoreEssay(judge, essay, topic)
                AI->>Models: Score essay 1-10
                Models-->>AI: Score + justification + usage
                AI-->>CLI: ScoreResult (parsed from text)
            end
        end
        CLI->>CLI: calculateScoringRankings(scores)
        CLI->>FS: writeResultsJson(baseDir, results)
        CLI->>FS: writeSummary(baseDir, results)
    else 1v1 Test Mode
        Note over CLI,FS: Phase 4b: Head-to-Head Comparisons
        loop For each judge model
            loop For each pair of essays
                CLI->>AI: compareEssays(judge, essayA, essayB, topic)
                AI->>Models: Compare essays A vs B
                Models-->>AI: Winner (A/B/tie) + reasoning + usage
                AI-->>CLI: CompareResult (parsed from text)
                CLI->>FS: writeComparison(topicDir, judge, essayA, essayB, winner)
            end
        end
        CLI->>CLI: calculateOneVsOneRankings(comparisons)
        CLI->>FS: writeOneVsOneResultsJson(baseDir, results)
        CLI->>FS: writeOneVsOneSummary(baseDir, results)
    end
    
    CLI->>User: Display final rankings and cost summary
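
To make the Phase 4a aggregation step concrete, here is a minimal sketch of how per-judge scores could be turned into a ranking. The diagram only names `calculateScoringRankings`; its real implementation is not shown in this review, so `ScoreEntry`, `Ranking`, and `rankByAverageScore` are hypothetical names, and averaging is an assumed aggregation rule.

```typescript
// Hypothetical shape of one judged score; the PR's actual ScoreResult may differ.
interface ScoreEntry {
  judge: string; // model that assigned the score
  author: string; // model that wrote the essay
  score: number; // 1-10 rating
}

interface Ranking {
  author: string;
  average: number;
  count: number;
}

// Average each author's scores across all judges, highest average first.
function rankByAverageScore(scores: ScoreEntry[]): Ranking[] {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const { author, score } of scores) {
    const entry = totals.get(author) ?? { sum: 0, count: 0 };
    entry.sum += score;
    entry.count += 1;
    totals.set(author, entry);
  }
  return Array.from(totals.entries())
    .map(([author, { sum, count }]) => ({ author, average: sum / count, count }))
    .sort((a, b) => b.average - a.average);
}
```

Keeping `count` in the output makes it visible when one author was judged fewer times than another, which matters if some scoring calls failed.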


@greptile-apps greptile-apps bot left a comment


7 files reviewed, 3 comments


Comment on lines +22 to +30
"The ethics of artificial intelligence",
"How social media reshapes human connection",
// "The value of slow living in a fast world",
// "Why we should embrace uncertainty",
// "The hidden costs of convenience",
// "What makes a good explanation",
// "The relationship between creativity and constraint",
// "Why some ideas spread and others don't",
"the negative impacts on society from artificial intelligence",

logic: near-duplicate topic. "the negative impacts on society from artificial intelligence" (line 30) overlaps heavily with "The ethics of artificial intelligence" on line 22

Suggested change
- "The ethics of artificial intelligence",
- "How social media reshapes human connection",
- // "The value of slow living in a fast world",
- // "Why we should embrace uncertainty",
- // "The hidden costs of convenience",
- // "What makes a good explanation",
- // "The relationship between creativity and constraint",
- // "Why some ideas spread and others don't",
- "the negative impacts on society from artificial intelligence",
+ "The ethics of artificial intelligence",
+ "How social media reshapes human connection",
+ ] as const;
Path: constants.ts
Line: 22:30


Comment on lines +164 to +165
const scoreMatch = result.text.match(/Score:\s*(\d+(?:\.\d+)?)\s*\/\s*10/i);
const score = scoreMatch?.[1] ? parseFloat(scoreMatch[1]) : 5; // Default to 5 if parsing fails

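logic: regex parsing is fragile. The pattern does tolerate spaces around the slash ("Score: 7.5 / 10" matches), but if the model outputs "Score 8/10" (no colon), "**Score:** 8/10" (markdown bold), or "Score: 8 out of 10", parsing fails silently and defaults to 5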

Path: aiClient.ts
Line: 164:165

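
One possible way to address the score-parsing comment is sketched below: accept a few more output formats and report a failure explicitly instead of defaulting to 5. `ParsedScore` and `parseScore` are hypothetical names, not part of the PR, and the accepted formats are an assumption about what models tend to emit.

```typescript
// Hypothetical result type: the caller must handle the failure case explicitly.
type ParsedScore =
  | { ok: true; score: number }
  | { ok: false; reason: string };

function parseScore(text: string): ParsedScore {
  // Accepts "Score: 8/10", "Score 8/10", "**Score:** 7.5 / 10", "Score: 8 out of 10".
  // \W{0,6} skips punctuation or markdown between the label and the number.
  const match = text.match(
    /score\W{0,6}(\d+(?:\.\d+)?)\s*(?:\/|out\s+of)\s*10\b/i,
  );
  const raw = match?.[1];
  if (raw === undefined) {
    return { ok: false, reason: "no score pattern found" };
  }
  const score = parseFloat(raw);
  if (score < 1 || score > 10) {
    return { ok: false, reason: `score ${score} out of range` };
  }
  return { ok: true, score };
}
```

With a result type like this, the caller can decide whether to retry the judge, drop the data point, or log it, rather than letting a synthetic 5 blend into the averages.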

Comment on lines +224 to +231
const winnerMatch = result.text.match(/Winner:\s*(A|B|Tie)/i);
let winner: "A" | "B" | "tie" = "tie";
if (winnerMatch) {
  const parsed = winnerMatch[1]!.toUpperCase();
  if (parsed === "A") winner = "A";
  else if (parsed === "B") winner = "B";
  else winner = "tie";
}

style: winner parsing defaults to "tie" on parse failure, so a failed parse is indistinguishable from a genuine tie. Parse errors should be handled explicitly, since real ties should be rare

Path: aiClient.ts
Line: 224:231

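
A minimal sketch of explicit winner parsing follows, assuming the caller wants to tell a genuine tie apart from an unparseable response. `ParsedWinner` and `parseWinner` are hypothetical names. The sketch also adds a word boundary after the verdict, because the pattern quoted above has none and would read "Winner: Both" as B.

```typescript
type Winner = "A" | "B" | "tie";

// Hypothetical result type: a parse failure is reported, not folded into "tie".
type ParsedWinner =
  | { ok: true; winner: Winner }
  | { ok: false; raw: string };

function parseWinner(text: string): ParsedWinner {
  // \W{0,6} skips punctuation or markdown after the label; \b rejects
  // partial matches such as the "B" in "Both".
  const match = text.match(/winner\W{0,6}(A|B|tie)\b/i);
  const token = match?.[1]?.toLowerCase();
  if (token === "a") return { ok: true, winner: "A" };
  if (token === "b") return { ok: true, winner: "B" };
  if (token === "tie") return { ok: true, winner: "tie" };
  // Keep a short excerpt of the response so the failure can be logged.
  return { ok: false, raw: text.slice(0, 200) };
}
```

The caller can then count failures separately or re-ask the judge, so the tie column in the rankings reflects only verdicts the judge actually gave.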
