138 changes: 133 additions & 5 deletions game.ts
@@ -2,6 +2,7 @@ import { generateText } from "ai";
 import { createOpenRouter } from "@openrouter/ai-sdk-provider";
 import { mkdirSync, appendFileSync } from "node:fs";
 import { join } from "node:path";
+import { extractJSON } from "./llm-json-fixer";
 
 // ── Models ──────────────────────────────────────────────────────────────────
 
@@ -89,6 +90,9 @@ const openrouter = createOpenRouter({
   },
 });
 
+const EXPERIMENTAL_VERBAL_SAMPLE_COT =
+  process.env.EXPERIMENTAL_VERBAL_SAMPLE_COT === "1";
+
 // ── Logger ──────────────────────────────────────────────────────────────────
 
 const LOGS_DIR = join(import.meta.dir, "logs");
@@ -199,20 +203,144 @@ ${examples.map((p) => `- ${p}`).join("\n")}
 Come up with something ORIGINAL — don't copy these examples.`;
 }
 
+function buildVerbalSampleCotSystem(): string {
+  const examples = shuffle([...ALL_PROMPTS]).slice(0, 80);
+  return `You are a comedy writer for the game Quiplash. Generate 5 funny fill-in-the-blank prompts that players will try to answer.
+
+Think in a verbal, observational stand-up style and explain your thought process.
+
+Output ONLY a single valid JSON object in this exact shape:
+{
+  "reasoning": "string",
+  "jokes": [
+    { "joke": "string", "probability": 0.0 }
+  ]
+}
+
+Rules:
+- "reasoning" must be a single string with your step-by-step verbal creative process.
+- "jokes" must contain exactly 5 items.
+- Each "joke" must be a single Quiplash-style fill-in-the-blank prompt under 15 words.
+- Each "probability" must be a number between 0 and 1.
+- Be highly varied in prompt formats. Do NOT overuse "The worst thing to..."
+- Be original and do not copy examples.
+
+Style examples:
+${examples.map((p) => `- ${p}`).join("\n")}`;
+}
+
 export async function callGeneratePrompt(model: Model): Promise<string> {
   log("INFO", `prompt:${model.name}`, "Calling API", { modelId: model.id });
-  const system = buildPromptSystem();
-  const { text, usage, reasoning } = await generateText({
+  if (!EXPERIMENTAL_VERBAL_SAMPLE_COT) {
+    const system = buildPromptSystem();
+    const { text, usage, reasoning } = await generateText({
+      model: openrouter.chat(model.id),
+      system,
+      prompt:
+        "Generate a single original Quiplash prompt. Be creative and don't repeat common patterns.",
+    });
+
+    log("INFO", `prompt:${model.name}`, "Raw response", {
+      rawText: text,
+      usage,
+    });
+    return cleanResponse(text);
+  }
+
+  const system = buildVerbalSampleCotSystem();
+  const { text, usage } = await generateText({
     model: openrouter.chat(model.id),
     system,
-    prompt:
-      "Generate a single original Quiplash prompt. Be creative and don't repeat common patterns.",
+    prompt: "Generate 5 original Quiplash prompts and return only the JSON object.",
   });
 
-  log("INFO", `prompt:${model.name}`, "Raw response", {
+  log("INFO", `prompt:${model.name}`, "Raw verbal sample CoT response", {
     rawText: text,
     usage,
   });
+
+  const parsed = extractJSON(text) as {
+    reasoning?: unknown;
+    jokes?: unknown;
+  };
+
+  if (!Array.isArray(parsed.jokes) || parsed.jokes.length !== 5) {
+    throw new Error("Invalid verbal sample CoT output: jokes must contain 5 items");
+  }
Comment on lines +267 to +269
⚠️ Potential issue | 🟠 Major

Strict === 5 check is brittle for LLM output.

LLMs don't always follow instructions exactly — they may return 4 or 6 items. The downstream map/filter at lines 271–295 already discards invalid entries and checks for at least one valid candidate. Consider relaxing this to a minimum-length check (e.g., < 1) rather than requiring exactly 5.

♻️ Suggested change
-  if (!Array.isArray(parsed.jokes) || parsed.jokes.length !== 5) {
-    throw new Error("Invalid verbal sample CoT output: jokes must contain 5 items");
+  if (!Array.isArray(parsed.jokes) || parsed.jokes.length === 0) {
+    throw new Error("Invalid verbal sample CoT output: jokes array is empty or missing");
   }

+  const candidates = parsed.jokes
+    .map((item) => {
+      if (!item || typeof item !== "object") {
+        return null;
+      }
+      const jokeValue = (item as { joke?: unknown }).joke;
+      const probValue = (item as { probability?: unknown }).probability;
+      if (typeof jokeValue !== "string") {
+        return null;
+      }
+      const joke = cleanResponse(jokeValue);
+      if (!joke) {
+        return null;
+      }
+      const probability =
+        typeof probValue === "number" && Number.isFinite(probValue)
+          ? Math.max(0, Math.min(1, probValue))
+          : 0;
+      return { joke, probability };
+    })
+    .filter((item): item is { joke: string; probability: number } => item !== null);
+
+  if (!candidates.length) {
+    throw new Error("Invalid verbal sample CoT output: no valid joke candidates");
+  }
+
+  const selected = await callSelectBestPrompt(
+    model,
+    candidates.map((c) => c.joke),
+  );
+
+  const matched = candidates.find((c) => c.joke === selected);
+  if (matched) {
+    return matched.joke;
+  }
Comment on lines +297 to +305
⚠️ Potential issue | 🟠 Major

Exact-match comparison with LLM output is fragile.

callSelectBestPrompt asks the model to echo back the chosen prompt, then line 302 does an exact string match against the candidates. LLMs frequently introduce minor deviations — a leading number, trailing period, extra whitespace, or slight rewording — causing the match to silently fail and fall through to the probability-based fallback every time.

Consider a fuzzy match (e.g., normalized/trimmed comparison, or includes/Levenshtein) or have the model return just the index number instead:

♻️ Option A: Have the model return the index
-    prompt: `Choose exactly one of these Quiplash prompts and reply with ONLY the exact prompt text, nothing else:\n\n${jokes
+    prompt: `Choose exactly one of these Quiplash prompts and reply with ONLY the number (1-${jokes.length}), nothing else:\n\n${jokes
       .map((joke, i) => `${i + 1}. ${joke}`)
       .join("\n")}`,

Then parse the returned number and index into the candidates array.
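If Option A is adopted, the index-parsing step could be sketched like this (`pickByIndex` is a hypothetical helper, not part of this PR; it assumes the model reply may carry stray punctuation such as "3." or "Option 3"):

```typescript
// Sketch of Option A's parsing step: pull the first integer out of the
// model's reply and treat it as a 1-based index into the candidate list.
function pickByIndex<T>(reply: string, candidates: T[]): T | null {
  const match = reply.match(/\d+/); // tolerate replies like "3." or "Option 3"
  if (!match) return null;
  const index = Number.parseInt(match[0], 10) - 1; // prompt shows a 1-based list
  return index >= 0 && index < candidates.length ? candidates[index] : null;
}
```

Returning `null` on an unparsable or out-of-range reply lets the caller fall through to the existing probability-based fallback.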

♻️ Option B: Normalize before matching
-  const matched = candidates.find((c) => c.joke === selected);
+  const normalize = (s: string) => s.replace(/^\d+[\.\)]\s*/, "").trim().toLowerCase();
+  const normalizedSelected = normalize(selected);
+  const matched = candidates.find((c) => normalize(c.joke) === normalizedSelected);
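For the Levenshtein route mentioned above, a minimal sketch might look like the following (both helpers are hypothetical, and the 25%-of-length similarity threshold is an assumed tuning value):

```typescript
// Classic dynamic-programming edit distance between two strings.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Pick the candidate closest to the model's reply, but only accept it when the
// edit distance stays under an assumed threshold (25% of the reply's length).
function closestCandidate(selected: string, candidates: string[]): string | null {
  let best: string | null = null;
  let bestDist = Infinity;
  for (const c of candidates) {
    const d = levenshtein(selected.toLowerCase(), c.toLowerCase());
    if (d < bestDist) {
      bestDist = d;
      best = c;
    }
  }
  const limit = Math.ceil(selected.length * 0.25);
  return best !== null && bestDist <= limit ? best : null;
}
```

A `null` result here would again fall through to the probability-based fallback, so a wildly off-topic reply never gets force-matched to a candidate.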


+  const fallback = candidates.reduce((best, current) => {
+    if (!best || current.probability > best.probability) {
+      return current;
+    }
+    return best;
+  }, null as { joke: string; probability: number } | null);
+
+  if (!fallback) {
+    throw new Error("Failed to select prompt from verbal sample CoT candidates");
+  }
+
+  return fallback.joke;
+}
+
+export async function callSelectBestPrompt(
+  model: Model,
+  jokes: string[],
+): Promise<string> {
+  log("INFO", `prompt-select:${model.name}`, "Calling API", {
+    modelId: model.id,
+    candidateCount: jokes.length,
+  });
+
+  const { text, usage } = await generateText({
+    model: openrouter.chat(model.id),
+    system:
+      "Step into the mind of a world-class stand-up comic about to headline a sold-out arena. Trust only your battle-tested instinct for what makes real humans explode with laughter. Choose and deliver the one joke you know, from years of reading crowds, will absolutely destroy the room.",
+    prompt: `Choose exactly one of these Quiplash prompts and reply with ONLY the exact prompt text, nothing else:\n\n${jokes
+      .map((joke, i) => `${i + 1}. ${joke}`)
+      .join("\n")}`,
+  });
+
+  log("INFO", `prompt-select:${model.name}`, "Raw response", {
+    rawText: text,
+    usage,
+  });
+
+  return cleanResponse(text);
+}
