Battle arena #2
Greptile Overview
Greptile Summary
Transforms the essay pipeline into a "Battle Arena" where multiple AI models compete by writing, reviewing, and revising essays, then evaluating each other's work through two test modes: scoring-test (1-10 ratings) and 1v1 (head-to-head comparisons).
Key Changes
Issues Found
Confidence Score: 4/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant User
participant CLI as index.ts
participant AI as aiClient.ts
participant FS as fileUtils.ts
participant Models as OpenRouter Models
User->>CLI: Run with --test=scoring-test or --test=1v1
CLI->>User: Display test type selection
User->>CLI: Confirm test execution
Note over CLI,FS: Phase 1: Essay Generation
CLI->>FS: initArenaRun(testType)
FS->>FS: Create results/{testType}/{timestamp}/
loop For each model
CLI->>AI: generateEssay(model, topic)
AI->>Models: Generate essay via OpenRouter
Models-->>AI: Essay text + usage data
AI-->>CLI: EssayResult
CLI->>FS: writeEssay(topicDir, model, essay)
end
Note over CLI,FS: Phase 2: Feedback Generation
loop For each reviewer model
loop For each other author model
CLI->>AI: reviewEssay(reviewer, essay, topic)
AI->>Models: Review essay
Models-->>AI: Feedback text + usage
AI-->>CLI: ReviewResult
CLI->>FS: writeFeedback(topicDir, reviewer, author, feedback)
end
end
Note over CLI,FS: Phase 3: Revisions
loop For each author model
loop For each reviewer's feedback
CLI->>AI: reviseEssay(author, topic, essay, feedback)
AI->>Models: Revise essay
Models-->>AI: Revised text + usage
AI-->>CLI: RevisionResult
CLI->>FS: writeRevision(topicDir, author, reviewer, revision)
end
end
alt Scoring Test Mode
Note over CLI,FS: Phase 4a: Scoring
loop For each judge model
loop For each essay (original + revised)
CLI->>AI: scoreEssay(judge, essay, topic)
AI->>Models: Score essay 1-10
Models-->>AI: Score + justification + usage
AI-->>CLI: ScoreResult (parsed from text)
end
end
CLI->>CLI: calculateScoringRankings(scores)
CLI->>FS: writeResultsJson(baseDir, results)
CLI->>FS: writeSummary(baseDir, results)
else 1v1 Test Mode
Note over CLI,FS: Phase 4b: Head-to-Head Comparisons
loop For each judge model
loop For each pair of essays
CLI->>AI: compareEssays(judge, essayA, essayB, topic)
AI->>Models: Compare essays A vs B
Models-->>AI: Winner (A/B/tie) + reasoning + usage
AI-->>CLI: CompareResult (parsed from text)
CLI->>FS: writeComparison(topicDir, judge, essayA, essayB, winner)
end
end
CLI->>CLI: calculateOneVsOneRankings(comparisons)
CLI->>FS: writeOneVsOneResultsJson(baseDir, results)
CLI->>FS: writeOneVsOneSummary(baseDir, results)
end
CLI->>User: Display final rankings and cost summary
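The diagram names `calculateScoringRankings(scores)` in Phase 4a, but its body is not shown in this conversation. As a rough sketch of what such a step could look like, the snippet below averages 1-10 scores per essay author and sorts best-first. The `ScoreEntry` and `Ranking` types and the `rankByMeanScore` name are hypothetical and are not taken from the PR.

```typescript
// Hypothetical record for one judge's score; the PR's real ScoreResult may differ.
interface ScoreEntry {
  judge: string;
  author: string;
  score: number; // expected range: 1-10
}

interface Ranking {
  author: string;
  mean: number;
  count: number;
}

// Average every score an author received and sort from highest mean to lowest.
function rankByMeanScore(entries: ScoreEntry[]): Ranking[] {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const { author, score } of entries) {
    const t = totals.get(author) ?? { sum: 0, count: 0 };
    t.sum += score;
    t.count += 1;
    totals.set(author, t);
  }
  return [...totals.entries()]
    .map(([author, { sum, count }]) => ({ author, mean: sum / count, count }))
    .sort((a, b) => b.mean - a.mean);
}
```

Keeping `count` alongside `mean` makes it visible when an author was ranked on fewer scores than the others, for example because some judge responses could not be parsed.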
7 files reviewed, 3 comments
"The ethics of artificial intelligence",
"How social media reshapes human connection",
// "The value of slow living in a fast world",
// "Why we should embrace uncertainty",
// "The hidden costs of convenience",
// "What makes a good explanation",
// "The relationship between creativity and constraint",
// "Why some ideas spread and others don't",
"the negative impacts on society from artificial intelligence",
logic: near-duplicate topic - "the negative impacts on society from artificial intelligence" overlaps heavily with "The ethics of artificial intelligence" on line 22
Prompt To Fix With AI
This is a comment left during a code review.
Path: constants.ts
Line: 22:30
Comment:
**logic:** near-duplicate topic - "the negative impacts on society from artificial intelligence" overlaps heavily with "The ethics of artificial intelligence" on line 22
```suggestion
"The ethics of artificial intelligence",
"How social media reshapes human connection",
] as const;
```
How can I resolve this? If you propose a fix, please make it concise.
const scoreMatch = result.text.match(/Score:\s*(\d+(?:\.\d+)?)\s*\/\s*10/i);
const score = scoreMatch?.[1] ? parseFloat(scoreMatch[1]) : 5; // Default to 5 if parsing fails
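logic: regex parsing is fragile - if the model outputs "Score 8/10" (no colon), "**Score:** 8/10" (markdown emphasis), or "Score: 8 out of 10" (no slash), the match fails silently and the score defaults to 5. Spaces around the slash, as in "Score: 7.5 / 10", are already handled by the `\s*` groups.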
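One possible way to address this is to widen the pattern and return `null` on failure, so the caller decides what to do (retry, or exclude the judge's vote) instead of recording a silent 5. This is only a sketch: `parseScore` is a hypothetical helper name, and the accepted formats and the 1-10 range check are assumptions based on the scoring mode described in the summary.

```typescript
// Returns the parsed score, or null when no in-range score can be found.
// Hypothetical helper; not part of the PR.
function parseScore(text: string): number | null {
  // Accepts "Score: 7.5/10", "Score 8/10", "**Score:** 8 / 10", "Score: 8 out of 10".
  const match = text.match(
    /score\W{0,6}(\d+(?:\.\d+)?)\s*(?:\/|out\s+of)\s*10\b/i
  );
  const raw = match?.[1];
  if (raw === undefined) return null;
  const value = parseFloat(raw);
  // Reject values outside the 1-10 scale rather than clamping them.
  return value >= 1 && value <= 10 ? value : null;
}
```

A call site could then log or count the `null` cases separately, which keeps unparseable responses from pulling every model's average toward the midpoint.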
Prompt To Fix With AI
This is a comment left during a code review.
Path: aiClient.ts
Line: 164:165
Comment:
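**logic:** regex parsing is fragile - if the model outputs "Score 8/10" (no colon), "**Score:** 8/10" (markdown emphasis), or "Score: 8 out of 10" (no slash), the match fails silently and the score defaults to 5. Spaces around the slash, as in "Score: 7.5 / 10", are already handled by the `\s*` groups.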
How can I resolve this? If you propose a fix, please make it concise.
const winnerMatch = result.text.match(/Winner:\s*(A|B|Tie)/i);
let winner: "A" | "B" | "tie" = "tie";
if (winnerMatch) {
  const parsed = winnerMatch[1]!.toUpperCase();
  if (parsed === "A") winner = "A";
  else if (parsed === "B") winner = "B";
  else winner = "tie";
}
style: winner parsing defaults to "tie" on parse failure - parse errors should be handled explicitly, because genuine ties should be rare and a silent default makes them indistinguishable from unparseable responses
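A sketch of one way to make the failure explicit is to return a discriminated union, so a real "tie" verdict and a parse failure are different values. The `Verdict` type and `parseWinner` name are hypothetical and not from the PR. The pattern also adds a trailing `\b`, since the original `/Winner:\s*(A|B|Tie)/i` has no boundary after the capture and would read "Winner: Both essays" as "B".

```typescript
// Either a parsed verdict or an explicit failure; hypothetical type, not from the PR.
type Verdict =
  | { ok: true; winner: "A" | "B" | "tie" }
  | { ok: false; reason: string };

function parseWinner(text: string): Verdict {
  // Allows a few non-word characters after the colon (e.g. markdown "**")
  // and requires a word boundary after the captured token.
  const match = text.match(/winner\s*:\W{0,4}(A|B|tie)\b/i);
  const raw = match?.[1]?.toLowerCase();
  if (raw === "a") return { ok: true, winner: "A" };
  if (raw === "b") return { ok: true, winner: "B" };
  if (raw === "tie") return { ok: true, winner: "tie" };
  return { ok: false, reason: "no 'Winner: A|B|Tie' line found" };
}
```

With this shape the caller has to check `ok` before reading `winner`, so a failed parse can be retried or excluded from the rankings instead of being counted as a tie.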
Prompt To Fix With AI
This is a comment left during a code review.
Path: aiClient.ts
Line: 224:231
Comment:
**style:** winner parsing defaults to "tie" on parse failure - parse errors should be handled explicitly, because genuine ties should be rare and a silent default makes them indistinguishable from unparseable responses
How can I resolve this? If you propose a fix, please make it concise.