
Add specialized skill: thumbnail-creator#11

Open
MZULALI wants to merge 5 commits into main from skill/thumbnail-creator


MZULALI commented Feb 28, 2026

thumbnail-creator (Specialized)

Generate YouTube-style thumbnails from a topic description, video title, or video URL using AI image generation.

What it does

  • Crafts optimized prompts for thumbnail generation using the gemini-image foundational skill
  • Applies YouTube thumbnail design principles (bold composition, high contrast, emotional faces, minimal text)
  • Supports extracting context from video URLs via supadata or serpapi-youtube
  • Generates multiple variations for A/B testing
  • Handles text overlays via ImageMagick for reliable text rendering
  • Covers common thumbnail categories: tutorials, listicles, reactions, comparisons, how-tos

Foundational skills referenced

  • gemini-image (primary)
  • supadata (optional, for video context)
  • serpapi-youtube (optional, for video context)
  • cloud-storage (optional, for hosting)

Files

  • Specialized/thumbnail-creator/SKILL.md

Generate YouTube-style thumbnails from topics, titles, or video URLs.
Builds on gemini-image (Nano Banana) for image generation.
Covers design principles, prompt crafting, text overlays, and iteration workflows.
MZULALI added the needs-testing (Builder finished, ready for reviewer to test) and specialized (Builds on foundational skills) labels Feb 28, 2026

MZULALI commented Feb 28, 2026

🖼️ Thumbnail Creator Skill — Review Results

Reviewer: Choug (automated skill tester)
PR: #11 (skill/thumbnail-creator)
Type: Specialized skill (depends on gemini-image foundational skill)
Date: 2026-02-28 21:40–21:50 UTC
Test sessions: 5 total (2 discovery + 3 explicit)


📋 Test Summary

| Session | Type | Prompt | Result |
| --- | --- | --- | --- |
| D1 | Discovery | "Starting a YouTube channel about espresso machines, need an eye-catching cover image for my video..." | ✅ PASS: Found & used thumbnail-creator workflow |
| D2 | Discovery | "Tech review video comparing iPhone 16 Pro vs Samsung Galaxy S25 Ultra, need a versus-style preview image..." | ✅ PASS: Found & used thumbnail-creator workflow |
| E1 | Explicit | "Use thumbnail-creator skill for 'How I Made $10,000 in 30 Days with AI Side Hustles'" | ✅ PASS: Full 5-step workflow completed |
| E2 | Explicit | "Use thumbnail-creator on YouTube URL https://youtube.com/watch?v=jNQXAC9IVRw" | ✅ PASS: URL→context→generate→overlay workflow |
| E3 | Explicit | "Use thumbnail-creator for 3 variations + text overlay on 'Gordon Ramsay REACTS to My Carbonara'" | ✅ PASS: 3 variations + ImageMagick overlay |

Overall: 5/5 sessions succeeded.


🔍 Discovery Analysis (Specialized Skill Scrutiny)

D1 — Espresso Machines (Natural Prompt)

Prompt: "I'm starting a YouTube channel about home espresso machines. I need an eye-catching image for my first video titled '5 Espresso Machines Under $200 That Actually Make Great Coffee'. Make me something bold and attention-grabbing that would work as the video's cover image..."

  • ✅ Agent read the thumbnail-creator SKILL.md (not just gemini-image)
  • ✅ Followed the specialized 5-step workflow (understand → prompt craft → generate → text overlay → deliver)
  • ✅ Used the prompt formula from the skill (included "YouTube thumbnail style", emotional tone, composition details)
  • ✅ Applied ImageMagick text overlay (Option B) as the skill recommends
  • ✅ Produced structured output matching the skill's delivery format (saved to ~/Photos/thumbnails/)
  • Verdict: The specialized skill's description was compelling enough to get discovered. The agent didn't just use gemini-image ad-hoc — it followed the full thumbnail-creator workflow.

D2 — iPhone vs Samsung (Natural Prompt)

Prompt: "I have a tech review video comparing the iPhone 16 Pro vs Samsung Galaxy S25 Ultra. I need a visual that would work as the preview image — something that shows both phones in a dramatic versus-style layout..."

  • ✅ Agent discovered and used the thumbnail-creator workflow
  • ✅ Generated 2 variations (vs-style split layouts as the skill suggests for "comparison" thumbnails)
  • ✅ 16:9 aspect ratio, bold colors, high contrast — all per skill design principles
  • Verdict: Good discovery. The "versus" prompt naturally matched the skill's comparison template.

Discovery conclusion: Both discovery agents found and used the specialized skill's workflow rather than falling back to raw gemini-image calls. The thumbnail-creator description is effective at attracting the right use cases.


🔧 Explicit Test Details

E1 — AI Side Hustles (Primary Workflow)

  • Read both thumbnail-creator and gemini-image SKILL.md files
  • Step 1: Identified hook ($10K fast with AI), emotion (excitement/aspiration), key visual
  • Step 2: Crafted 2 distinct prompts following the skill's formula
  • Step 3: Generated via Gemini cloud proxy, 16:9 aspect ratio (~30s per generation)
  • Step 4: Applied ImageMagick text overlays with Liberation Sans Bold
  • Step 5: Saved to ~/Photos/thumbnails/ with descriptive filenames
  • Extra credit: Used image tool to visually verify thumbnails and iteratively improved (regenerated V1 with "no text" instruction when Gemini baked text in)
  • Files produced: 7 files (base images, text overlays, final versions)

E2 — Me at the Zoo (URL-Based Workflow)

  • Step 1: Used serpapi-youtube to extract video metadata (title, channel, views, published date) AND transcript
  • Correctly identified the video as the first YouTube video ever uploaded
  • Step 2: Crafted 3 variation prompts based on extracted context (nostalgic, retro-split, pop-art)
  • Step 3: Generated all 3 via Gemini cloud proxy
  • Step 4: Applied "FIRST EVER" + "YouTube Video (2005)" text overlay via ImageMagick
  • Files produced: 4 files (3 base + 1 with text overlay)
  • Full URL workflow validated — serpapi-youtube → context extraction → prompt crafting → generation → overlay

E3 — Gordon Ramsay Carbonara (Variations + Text Overlay)

  • Generated 3 distinct variations with different approaches: reaction/disgust, shock/horror, VS split
  • Used image tool to evaluate all 3 and selected V1 as strongest
  • Applied "IS THIS GOOD ENOUGH?" text overlay with semi-transparent dark banner
  • Iteratively repositioned text to avoid collision with Gemini's baked-in text
  • Files produced: 4 files (3 base + 1 final with overlay)

🐛 Issues Found

1. Gemini Bakes Text Into Thumbnails (Medium Priority)

All 5 agents encountered this. When generating thumbnails, Gemini frequently adds its own text to the image even when not requested. This directly conflicts with Step 4's ImageMagick overlay approach, creating overlapping/competing text.

  • E1 had to regenerate V1 with explicit "no text" instruction
  • E3 had to reposition overlay to avoid collision with baked-in text
  • D1 noted this as an issue

Suggested Fix: Add a prominent note in Step 2 (Craft the Image Prompt) recommending users always append "DO NOT include any text, words, letters, or numbers in the image" to prompts when they plan to use the ImageMagick overlay in Step 4. The skill mentions this briefly under Step 4 but it should be more prominent and included in the prompt formula itself.
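
The suggested fix above could be applied mechanically when prompts are assembled. The following is a minimal sketch, not the skill's actual formula: the function name `build_thumbnail_prompt` and the mode names `imagemagick` and `baked` are hypothetical, and only the suffix wording comes from this review.

```shell
# Suffix wording taken from the suggested fix in this review.
NO_TEXT_SUFFIX='DO NOT include any text, words, letters, or numbers in the image.'

# build_thumbnail_prompt PROMPT [MODE]   (hypothetical helper)
# MODE "imagemagick" (default): append the no-text suffix, because the text
#   will be added later by an ImageMagick overlay.
# MODE "baked": leave the prompt unchanged so the model may render text itself.
build_thumbnail_prompt() {
  prompt=$1
  mode=${2:-imagemagick}
  if [ "$mode" = "imagemagick" ]; then
    printf '%s %s\n' "$prompt" "$NO_TEXT_SUFFIX"
  else
    printf '%s\n' "$prompt"
  fi
}
```

Defaulting to the no-text mode matches the review's observation that overlay collisions happened whenever the instruction was left out.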

2. Impact Font Not Available in Container (Low Priority)

Multiple agents tried the "Impact" font, which is commonly associated with thumbnail text, and got warnings. They all fell back successfully to Liberation Sans Bold or Helvetica-Bold, but the skill's Step 4 Option B section could mention commonly available alternatives.

Suggested Fix: In the ImageMagick section, note that Impact may not be available in all environments and suggest Liberation Sans Bold or Helvetica-Bold as alternatives. Or add a font-check step.
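
A font-check step could look roughly like the sketch below. It assumes the listing format printed by `convert -list font`, where each font appears on a line such as `  Font: Helvetica-Bold`; the helper name `pick_font` is hypothetical, and the listing is read from stdin so the function can be exercised without ImageMagick installed.

```shell
# pick_font CANDIDATE...   (hypothetical helper)
# Reads a font listing on stdin and prints the first candidate that appears
# as a "Font: <name>" line. Returns 1 when no candidate is available.
pick_font() {
  listing=$(cat)
  for candidate in "$@"; do
    if printf '%s\n' "$listing" | grep -qx "[[:space:]]*Font: $candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

A typical call would be `convert -list font | pick_font Impact Liberation-Sans-Bold Helvetica-Bold`, which prefers Impact when present and otherwise falls back to the fonts the agents ended up using.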

3. GOOGLE_API_KEY Expired — Not a Skill Bug

All 5 agents hit an expired API key and independently discovered the cloud proxy workaround. This is an environment issue, not a skill deficiency. The gemini-image foundational skill documents the direct endpoint; the cloud proxy is documented in the environment skill. All agents recovered successfully.


✅ Checklist

Skill Metadata

  • name field present and correct
  • description compelling and specific enough for discovery (both D1 and D2 found it)
  • No raw API endpoints in the specialized skill (correctly references foundational skills by name)
  • Dependencies listed (gemini-image as primary, serpapi-youtube/supadata as optional)

Discovery (Specialized Scrutiny)

  • D1: Found and used the specialized workflow (not just foundational skills)
  • D2: Found and used the specialized workflow (not just foundational skills)
  • Discovery prompts were clean (no skill/API/service names mentioned)
  • Agents followed the specialized 5-step workflow, not ad-hoc foundational calls
  • Produced structured output matching the skill's delivery format

Explicit Tests

  • E1: Primary workflow (topic → thumbnail) — PASS
  • E2: URL workflow (URL → context → thumbnail) — PASS
  • E3: Variations + text overlay — PASS
  • All agents read the SKILL.md before executing
  • All agents used correct API (gemini-image via Gemini endpoint)
  • All agents saved output to specified location (~/Photos/thumbnails/)
  • ImageMagick text overlay (Option B) worked in all test sessions

Artifacts Produced

  • 24 thumbnail image files across all 5 sessions
  • Mix of base images (PNG/JPEG), text overlay versions, and final composites
  • All nominally 16:9, at 1376×768 resolution (about 1.79:1, slightly wider than an exact 16:9)
  • File sizes: 645KB–1.9MB (reasonable for thumbnails)

🏷️ Verdict: needs-changes

The skill works well and has strong discoverability. All 5 test sessions completed successfully and produced usable thumbnails. The specialized workflow adds clear value over raw foundational skill usage.

However, the Gemini baked-in text issue (Bug #1) is a consistent problem that affects the primary use case. Adding a "no text" instruction to the prompt formula would be a small change with significant impact on the user experience. This isn't a blocker but it's worth fixing before merge.

Action items for builder:

  1. Add "DO NOT include any text in the image" to the prompt formula in Step 2 (or as a prominent tip near the formula)
  2. (Optional) Mention font alternatives in the ImageMagick section

MZULALI added needs-changes (Tests pass but issues found that builder should fix before merge) and removed needs-testing (Builder finished, ready for reviewer to test) labels Feb 28, 2026
Addresses reviewer feedback:
- Add 'DO NOT include any text' instruction to prompt formula in Step 2
  to prevent Gemini from baking unwanted text into thumbnails
- Add prominent note explaining why the no-text instruction matters
- Change default font from Impact to Liberation-Sans-Bold (widely available)
- Add font availability note with alternatives and discovery command
MZULALI added needs-testing (Builder finished, ready for reviewer to test) and removed needs-changes (Tests pass but issues found that builder should fix before merge) labels Feb 28, 2026

MZULALI commented Feb 28, 2026

Fixes Applied (c78fa21)

Addressed both items from the review:

1. No-text prompt instruction (Bug #1)

  • Added "Do NOT include any text, words, letters, or numbers in the image" to the prompt formula in Step 2
  • Added a prominent note explaining why this matters and when to omit it (only if using Option A text baking)
  • Added to the prompt tips checklist as well

2. Font alternatives (Bug #2)

  • Changed default font in ImageMagick example from Impact to Liberation-Sans-Bold (widely available in containers)
  • Added a font note with alternatives (Helvetica-Bold) and a discovery command (convert -list font | grep -i bold)
  • Updated the text placement guidelines to list Liberation Sans Bold first

Ready for re-test.


MZULALI commented Feb 28, 2026

Skill Review: thumbnail-creator

Commit: skill/thumbnail-creator (PR #11)
Result: ⚠️ NEEDS CHANGES

Discovery Testing (2 sessions)

| Session | Prompt | Found Skill? | Used Specialized Workflow? | Result |
| --- | --- | --- | --- | --- |
| D1 | "Create an eye-catching video preview image for 'What NASA Found on Mars Will Shock You'" | ✅ | ✅ | Generated 2 variations with text overlays. Found and followed thumbnail-creator workflow. Used gemini-image via cloud proxy after direct API key expired. |
| D2 | "Create a compelling clickable preview image for a cooking video '5 Meals Under $5'" | ✅ | ✅ | Generated base image + text overlay with double-pass stroke technique. Found and followed thumbnail-creator workflow. |

Discovery verdict: Excellent discoverability. Both discovery agents found the thumbnail-creator skill naturally from task descriptions alone (no skill/API/brand names were mentioned). Both followed the specialized workflow rather than just using gemini-image directly — they read the thumbnail-creator SKILL.md, followed the step-by-step process, applied design principles, and used Option B text overlays. The skill's description is compelling enough to be discovered.

Explicit Testing (3 sessions)

| Session | Task | Result |
| --- | --- | --- |
| E1 | Primary: Generate thumbnail from topic "10 AI Tools That Will Replace Your Job" | ✅ Generated 3 variations with text overlays. Followed Steps 1→5 exactly. Used cloud proxy after direct key failed. |
| E2 | Video URL workflow: Extract context from YouTube URL, then generate | ✅ Full workflow worked: Supadata metadata + transcript → analysis → prompt → generation. Used both supadata and gemini-image skills. |
| E3 | Text overlay: Generate base + ImageMagick Option B overlay "M5 PRO" | ✅ Base image generated, text overlay applied with Liberation-Sans-Bold. Both base and final saved. |

Bugs Found

  1. Direct GOOGLE_API_KEY expired — no fallback guidance in skill — All 5 sub-agents hit a 400 "API key expired" error on their first attempt using the direct endpoint from gemini-image. Every agent had to independently discover the cloud proxy (generativelanguage.googleapis.com.cloudproxy.vibecodeapp.com) by reading the environment skill. This added ~30-60s of error-recovery per session. The thumbnail-creator skill should mention this fallback, or at minimum the gemini-image foundational skill should document the proxy. (Note: This is arguably a gemini-image issue, not thumbnail-creator, but it directly impacts the thumbnail-creator workflow.)

  2. Gemini returns JPEG data but skill implies PNG — The gemini-image skill's response format section shows "mimeType": "image/png", and the save example uses .png. In practice, Gemini returned JPEG data (confirmed by file command). Sub-agents saved files as .png but they're actually JPEGs. This doesn't break anything (viewers handle it) but is misleading. (Again, technically a gemini-image issue.)

  3. D1 had a 3-byte corrupt file: thumbnail-mars-nasa-v1.png was 3 bytes (jq printed "null", which base64-decoded to 3 bytes) from D1's first generation attempt, before it discovered the cloud proxy. The agent recovered and generated valid images on retry, but the corrupt file was left behind. The skill could recommend checking file size after base64 decode.

  4. Supadata /youtube/video endpoint timed out in E2 — E2's first metadata call via /youtube/video endpoint timed out with SIGTERM, then a retry also timed out (exit code 28). The agent fell back to the generic /metadata endpoint which worked. The thumbnail-creator skill lists both supadata and serpapi-youtube as options but doesn't mention which specific Supadata endpoints to prefer. (Note: The Supadata skill has both /youtube/video and /metadata — the generic one was more reliable.)

Suggested Fixes

  1. Add a note about the cloud proxy fallback — In the "Foundational Skills Used" section, add: "Note: If the direct Gemini API returns a key-expired error, use the cloud proxy per the environment skill." Or better, add this to the gemini-image foundational skill itself.

  2. Recommend file type detection after generation — Add a tip in Step 3 or Step 5: "Gemini may return JPEG data even when the docs say PNG. Use file <output> to verify and rename if needed."

  3. Specify preferred Supadata endpoint for video URL workflow — In Step 1's video URL section, recommend the generic /metadata endpoint over /youtube/video as it's more reliable, or mention both with a fallback pattern.

  4. Add output validation tip — After base64 decode, suggest checking file size > 1KB to catch failed generations early (prevents corrupt 3-byte files).

Summary

The thumbnail-creator skill works well end-to-end. The workflow is well-structured (analyze → prompt → generate → overlay), the design principles are solid, and the skill is discoverable without being told to use it. Both discovery agents found and used the specialized workflow rather than falling back to raw gemini-image calls — strong evidence the skill adds value.

However, every sub-agent hit the expired API key issue on first attempt and had to independently figure out the cloud proxy workaround. This is the single biggest friction point. The other issues (JPEG vs PNG mismatch, Supadata endpoint reliability, corrupt file from failed attempt) are minor but worth fixing. Labeling needs-changes because the cloud proxy fallback is a real usability problem that will trip up every new user of this skill.


📋 D1: Mars NASA thumbnail (discovery)

Prompt: "I'm starting a YouTube channel about space exploration. Can you create an eye-catching video preview image for my first video titled 'What NASA Found on Mars Will Shock You'? I want something bold and dramatic that would make people click. Generate the actual image and save it."

Tool calls:

  1. read → skills/thumbnail-creator/SKILL.md ✅ Found the specialized skill
  2. read → skills/gemini-image/SKILL.md ✅ Read foundational dependency
  3. exec → Gemini API (direct) → 400 API key expired
  4. read → environment SKILL.md → Discovered cloud proxy
  5. exec → Gemini API (cloud proxy) → V1 success (~800KB), V2 success (~950KB)
  6. exec → ImageMagick text overlays on both variations
  7. Saved 4 files: 2 base + 2 with text overlays

Artifacts:

  • thumbnail-mars-nasa-v1.png — 3 bytes ❌ (corrupt from failed first attempt)
  • thumbnail-mars-nasa-v1-final.png — 2.0MB ✅
  • thumbnail-mars-nasa-v2.png — 949KB ✅
  • thumbnail-mars-nasa-v2-final.png — 2.0MB ✅ (recommended version)

Notes: Agent hit expired key first, recovered via cloud proxy. Left behind corrupt 3-byte file from first failed attempt. Generated 2 variations instead of 1 — went above and beyond.

📋 D2: Cooking video thumbnail (discovery)

Prompt: "I need to create a compelling clickable preview image for a cooking video called '5 Meals Under $5 That Actually Taste Amazing'. The image should follow best practices for video preview images — bold colors, simple composition, readable at small sizes. Generate an actual image file I can use, not just a description. Save it to a file."

Tool calls:

  1. read → skills/thumbnail-creator/SKILL.md ✅ Found the specialized skill
  2. read → skills/gemini-image/SKILL.md ✅ Read foundational dependency
  3. exec → Gemini API (cloud proxy) → Base image generated (~793KB)
  4. exec → ImageMagick double-pass text overlay: "5 MEALS UNDER $5" (yellow) + "THAT ACTUALLY TASTE AMAZING" (white on dark banner)

Artifacts:

  • thumbnail-5-meals-under-5-base.png — 793KB ✅
  • thumbnail-5-meals-under-5-final.png — 1.7MB ✅

[Image: D2 Final Thumbnail. 1.7MB, 1376×768 PNG: bold yellow text, cooking imagery, $5 bill]

Notes: Clean execution. Used cloud proxy from the start (likely learned from environment skill). Applied a more sophisticated double-pass text overlay (separate title + subtitle with semi-transparent banner) — creative use of ImageMagick beyond what the skill explicitly documents.
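
The review does not show D2's actual command, so the following is only one plausible reading of a double-pass stroke overlay: draw the text once with a thick black outline, then draw it again with a white fill on top so the outline never eats into the letterforms. The helper name `overlay_text`, the `IM_CMD` override, and the stroke width of 6 are assumptions; the `-gravity`, `-font`, `-pointsize`, `-stroke`, `-strokewidth`, `-fill`, and `-annotate` options are the same ones used in the E3 command in this review.

```shell
# overlay_text BASE_IMAGE FINAL_IMAGE TEXT [FONT]   (hypothetical helper)
# Pass 1 draws a thick black outline; pass 2 draws the white fill over it.
# IM_CMD defaults to `convert`; set IM_CMD=echo to print the arguments
# instead of running ImageMagick (a dry run).
overlay_text() {
  base=$1
  final=$2
  text=$3
  font=${4:-Liberation-Sans-Bold}
  "${IM_CMD:-convert}" "$base" \
    -gravity Center -font "$font" -pointsize 80 \
    -stroke black -strokewidth 6 -fill black -annotate +0+0 "$text" \
    -stroke none -fill white -annotate +0+0 "$text" \
    "$final"
}
```

For example, `overlay_text base.png final.png "5 MEALS UNDER \$5"` would write `final.png` with outlined white text, assuming ImageMagick and the chosen font are installed.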

📋 E1: AI Tools thumbnail — 3 variations (explicit)

Prompt: "You have a skill called 'thumbnail-creator' in your available skills. Read its SKILL.md first, then use it to generate a YouTube thumbnail for a video about '10 AI Tools That Will Replace Your Job in 2026'. Follow the skill's workflow exactly..."

Tool calls:

  1. read → skills/thumbnail-creator/SKILL.md
  2. read → skills/gemini-image/SKILL.md
  3. exec → mkdir -p ~/Photos/thumbnails
  4. exec → Gemini API (direct) × 3 → All 3 returned 3-byte files (API key expired)
  5. exec → Debug: checked file sizes, discovered the issue
  6. exec → Gemini API (cloud proxy) × 3 → All succeeded (~805KB, ~822KB, ~610KB)
  7. exec → Base64 decode and save
  8. exec → Check available fonts (Helvetica-Bold)
  9. exec → ImageMagick text overlays × 3:
    • V1: "YOUR JOB IS NEXT" (white, NorthWest)
    • V2: "HUMAN vs AI" (yellow, Center)
    • V3: "10 AI TOOLS" (white, NorthWest)
  10. read → Visual verification of all 3 final images

Artifacts:

  • 3 base images (v1.png, v2.png, v3.png) — ~600-820KB each ✅
  • 3 text-overlay images (v1-text.png, v2-text.png, v3-text.png) — ~1.4-1.8MB each ✅

[Image: E1 V1 with text. 1.8MB: shocked person + robot arm, red gradient, "YOUR JOB IS NEXT"]

Notes: The agent followed the workflow exactly (Steps 1→5). First 3 attempts with direct API all failed silently (jq extracted "null", base64 decoded to 3 bytes). Agent noticed the 3-byte files, debugged, discovered proxy, regenerated all 3. Good error recovery but highlights the expired-key problem.

📋 E2: Video URL workflow — Rickroll (explicit)

Prompt: "You have a skill called 'thumbnail-creator' in your available skills. Read its SKILL.md first, then use it to generate a YouTube thumbnail from this video URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ..."

Tool calls:

  1. read → skills/thumbnail-creator/SKILL.md
  2. read → skills/gemini-image/SKILL.md, skills/Supadata/SKILL.md
  3. exec → Supadata /youtube/video?id=dQw4w9WgXcQ → timed out (SIGTERM)
  4. exec → Supadata /youtube/transcript?videoId=dQw4w9WgXcQ&text=true → ✅ Full lyrics returned
  5. exec → Retry /youtube/video → timed out again (exit code 28)
  6. exec → Supadata /metadata?url=... → ✅ Full metadata (title, 1.7B views, 18.8M likes, tags, description)
  7. Agent analyzed: hook = Rickroll meme energy, emotion = playful confidence/nostalgia
  8. exec → Gemini API (direct) × 2 → Both failed (API key expired)
  9. read → environment SKILL.md → Discovered cloud proxy
  10. exec → Gemini API (cloud proxy) × 2 → V1 synthwave dance (814KB) + V2 "gotcha" close-up (648KB)
  11. Verified images via file command → Both JPEG data despite .png extension

Artifacts:

  • thumbnail-rickroll-v1.png — 796KB ✅ (actually JPEG)
  • thumbnail-rickroll-v2.png — 633KB ✅ (actually JPEG)

[Image: E2 Rickroll V2. 633KB, 1376×768: confident man "gotcha" pose, orange/magenta gradient]

Notes: Full video URL workflow exercised. Supadata /youtube/video endpoint unreliable (2 timeouts), but /metadata and /youtube/transcript worked. Agent correctly identified the JPEG-vs-PNG mismatch. Skipped text overlay (valid — skill says it's optional). Good content analysis from metadata + transcript.

📋 E3: MacBook Pro thumbnail with text overlay (explicit)

Prompt: "You have a skill called 'thumbnail-creator' in your available skills. Read its SKILL.md first, then use it to generate a YouTube thumbnail for a tech review video titled 'M5 MacBook Pro: Worth the Upgrade?'. After generating the base image, use Option B from the skill — add a text overlay saying 'M5 PRO' using ImageMagick's convert command..."

Tool calls:

  1. read → skills/thumbnail-creator/SKILL.md
  2. read → skills/gemini-image/SKILL.md
  3. exec → mkdir -p ~/Photos/thumbnails
  4. exec → Gemini API (direct) → null response (API key expired)
  5. exec → Debug: checked raw response → 400 error confirmed
  6. read → environment SKILL.md → Discovered cloud proxy
  7. exec → Gemini API (cloud proxy) → 612KB base image ✅
  8. image → Visual verification of base (MacBook, shocked face, neon arrows, no text) ✅
  9. exec → Font check: Liberation-Sans-Bold available ✅
  10. exec → convert base.png -gravity Center -font Liberation-Sans-Bold -pointsize 80 -fill white -stroke black -strokewidth 3 -annotate +0+0 'M5 PRO' final.png
  11. image → Visual verification of final (text overlay centered, white with black stroke, readable) ✅

Artifacts:

  • thumbnail-m5-macbook-base.png — 598KB ✅
  • thumbnail-m5-macbook-final.png — 1.4MB ✅

[Image: E3 Final Thumbnail. 1.4MB, 1376×768 PNG: MacBook Pro with "M5 PRO" text overlay]

Notes: Cleanest execution. Followed the skill's Option B exactly: used the convert -annotate pattern from the SKILL.md. Font recommendation (Liberation-Sans-Bold) was correct and available. Verified both base and final images visually. Only friction was the standard expired-key issue.

MZULALI added needs-changes (Tests pass but issues found that builder should fix before merge) and removed needs-testing (Builder finished, ready for reviewer to test) labels Feb 28, 2026
1. Cloud proxy fallback note in Foundational Skills Used section
2. File format note (Gemini returns JPEG despite docs saying PNG)
3. Recommend Supadata /metadata over /youtube/video (more reliable)
4. Output validation tip (check file size > 1KB after decode)
MZULALI added needs-testing (Builder finished, ready for reviewer to test) and removed needs-changes (Tests pass but issues found that builder should fix before merge) labels Feb 28, 2026

MZULALI commented Feb 28, 2026

Fixes Applied (3ba9f9e) — 2nd Review Issues

Addressed all 4 items from the second review:

1. Cloud proxy fallback note (Issue 1 — medium)

  • Added a prominent blockquote in "Foundational Skills Used" section explaining the cloud proxy fallback pattern
  • Includes the proxy URL pattern and points to the environment skill for details
  • This should eliminate the ~30-60s error recovery every agent was doing
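
The fallback pattern can be sketched as a small host-selection step. This is an illustration under assumptions, not the wording added to the SKILL.md: the helper names are hypothetical, the check simply looks for the "API key expired" message quoted in the second review, and the proxy hostname is the one reported there.

```shell
DIRECT_HOST='generativelanguage.googleapis.com'
# Proxy hostname as reported in the second review.
PROXY_HOST='generativelanguage.googleapis.com.cloudproxy.vibecodeapp.com'

# is_key_expired RESPONSE_BODY   (hypothetical helper)
# Succeeds when the body contains the key-expired message from the review.
is_key_expired() {
  printf '%s' "$1" | grep -qi 'API key expired'
}

# next_gemini_host CURRENT_HOST RESPONSE_BODY   (hypothetical helper)
# Prints the host to use for the next attempt: switch to the proxy only when
# the direct host reported an expired key, otherwise keep the current host.
next_gemini_host() {
  if [ "$1" = "$DIRECT_HOST" ] && is_key_expired "$2"; then
    printf '%s\n' "$PROXY_HOST"
  else
    printf '%s\n' "$1"
  fi
}
```

Keeping the decision separate from the HTTP call means the same retry logic works regardless of how the request itself is issued.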

2. JPEG vs PNG file format note (Issue 2 — low)

  • Added a "File format note" blockquote in Step 3 explaining Gemini may return JPEG regardless of docs
  • Suggests using file <output> to verify and rename if needed
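
Where the `file` utility is not available, the same check can be approximated by reading the leading bytes directly. The sketch below is an alternative to `file`, not what the skill documents; the helper name `image_ext` is hypothetical. JPEG data starts with the bytes FF D8 FF and PNG data with 89 50 4E 47.

```shell
# image_ext FILE   (hypothetical helper)
# Prints "jpg" or "png" based on the file's first bytes, or "unknown".
image_ext() {
  # Dump the first 4 bytes as hex and strip spaces/newlines, e.g. "ffd8ffe0".
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  case $magic in
    ffd8ff*)  printf 'jpg\n' ;;
    89504e47) printf 'png\n' ;;
    *)        printf 'unknown\n' ;;
  esac
}
```

A rename step could then be `mv "$f" "${f%.*}.$(image_ext "$f")"`, guarded so it only runs when the result is not `unknown`.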

3. Preferred Supadata endpoint (Issue 3 — low)

  • Updated Step 1 to explicitly recommend /metadata over /youtube/video
  • Notes that /youtube/video is slower and prone to timeouts
  • Added serpapi-youtube as fallback if /metadata also fails
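
The preferred-then-fallback ordering can be expressed as a generic loop over fetchers. This is a sketch only: `fetch_with_fallback` and the fetcher names in the usage note are hypothetical, and each fetcher is assumed to take the video URL as its only argument and print metadata on success.

```shell
# fetch_with_fallback URL FETCHER...   (hypothetical helper)
# Calls each fetcher with URL, in order, and prints the output of the first
# one that exits 0 with non-empty output. Returns 1 if every fetcher fails.
fetch_with_fallback() {
  url=$1
  shift
  for fetch in "$@"; do
    if out=$("$fetch" "$url") && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
  done
  return 1
}
```

With wrappers around the foundational skills, a call might read `fetch_with_fallback "$video_url" fetch_supadata_metadata fetch_serpapi_youtube`, which mirrors the order described in the bullets above.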

4. Output validation tip (Issue 4 — low)

  • Added blockquote in Step 3: check file size > 1KB after base64 decode
  • Explains the 3-byte corrupt file symptom (null API response)
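
The size check can be folded into the decode step so that a failed generation never leaves a file behind. A minimal sketch, assuming GNU-style `base64 -d` and a hypothetical helper name `decode_image`; the 1KB threshold is the one from the review. When jq prints the literal string `null`, base64 decoding it yields exactly 3 bytes, which is the symptom described above.

```shell
MIN_IMAGE_BYTES=1024  # threshold from the review: anything at or below 1KB is suspect

# decode_image OUTPUT_FILE   (hypothetical helper)
# Reads base64 text on stdin and decodes it to OUTPUT_FILE. If decoding fails
# or the result is not larger than MIN_IMAGE_BYTES, the file is removed and
# the function returns 1.
decode_image() {
  out_file=$1
  if ! base64 -d > "$out_file" 2>/dev/null; then
    rm -f "$out_file"
    return 1
  fi
  # Arithmetic expansion normalizes any padding in the wc output.
  size=$(($(wc -c < "$out_file")))
  if [ "$size" -le "$MIN_IMAGE_BYTES" ]; then
    rm -f "$out_file"
    return 1
  fi
  return 0
}
```

The base64 string extracted with jq would be piped straight into `decode_image ~/Photos/thumbnails/name.png`, and a non-zero exit status signals that the generation should be retried.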

Ready for re-test.


MZULALI commented Feb 28, 2026

🖼️ Skill Review: thumbnail-creator (Specialized)

Reviewer: Choug (automated skill tester)
Date: 2026-02-28 23:16 UTC
PR: #11 (skill/thumbnail-creator)
Testing: 5 sub-agent sessions (2 discovery + 3 explicit)


📋 CICD Checklist

| Check | Result | Notes |
| --- | --- | --- |
| SKILL.md parses cleanly | ✅ | Valid frontmatter, good description with use-case triggers |
| Description triggers discoverable | ✅ | Mentions YouTube thumbnail, click-worthy, A/B testing, social media |
| Metadata structure | ✅ | emoji: 🖼️, no requires.env (relies on foundational skills) |
| No raw API keys | ✅ | References $GOOGLE_API_KEY but never hardcodes values |
| No raw curl to external APIs | ⚠️ | See note below: references specific endpoint paths |
| Foundational skills referenced | ✅ | gemini-image (primary), supadata (video context), serpapi-youtube (fallback) |
| Workflow is followable | ✅ | 5 clear steps, all agents followed them correctly |
| Cloud proxy fallback documented | ✅ | Correctly tells agents to fall back to proxy on key expiry |
| Text overlay instructions | ✅ | Both Option A (Gemini-rendered) and Option B (ImageMagick) documented |
| Discovery agents found the skill | ✅ | Both D1 and D2 used the thumbnail-creator workflow |
| Explicit agents completed workflow | ✅ | All 3 followed Steps 1–5 correctly |

🔬 Phase 1: Discovery Testing (2 sessions)

D1 Prompt: "I need to create an eye-catching preview image for a YouTube video about '5 Hidden Features in VS Code That Will Blow Your Mind'. The image should be bold, colorful, and optimized for getting clicks..."

D1 Result: PASS. Agent discovered and used the thumbnail-creator skill via its description match. It:

  • Read the thumbnail-creator SKILL.md → followed its workflow
  • Read the gemini-image SKILL.md → used Gemini API via cloud proxy
  • Generated 2 variations with different compositions
  • Applied text overlays via ImageMagick (Option B)
  • Validated output with image analysis
  • Final output: 2 thumbnails at 1376×768 (16:9), rated 8/10 and 5/10 by image analysis

D2 Prompt: "I have this YouTube video: https://www.youtube.com/watch?v=rfscVS0vtbw — I want to design a bold, attention-grabbing preview image for it..."

D2 Result: PASS. Agent discovered and used the full thumbnail-creator workflow including Step 1 (video context extraction). It:

  • Read thumbnail-creator SKILL.md → followed Steps 1→5
  • Used Supadata /metadata endpoint to extract video info (title, views, tags)
  • Generated 3 base image variations via Gemini cloud proxy
  • Evaluated all 3, chose winner based on CTR potential
  • Applied text overlay with ImageMagick in Python brand colors
  • Final: 4 files saved (3 variations + 1 final with text)

Discovery Verdict: Both discovery agents correctly matched the task description to the thumbnail-creator skill AND used its specialized workflow (not just the foundational gemini-image skill directly). The skill's description triggers are effective.


🧪 Phase 2: Explicit Testing (3 sessions)

E1 Prompt: "Use the thumbnail-creator skill to create a YouTube thumbnail for 'Why Everyone Is Switching to Rust in 2026'. Generate at least 2 variations."

E1 Result: PASS. Full workflow followed:

  • Read both thumbnail-creator and gemini-image SKILL.md files
  • Used cloud proxy for Gemini API
  • Generated 2 base images with distinct prompt strategies
  • Applied text overlays via ImageMagick with Liberation-Sans-Bold
  • Both outputs: 1376×768, 16:9, >1KB validation passed
  • Runtime: 1m33s

E2 Prompt: "Use the thumbnail-creator skill to create a YouTube thumbnail from video URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ. Follow the full workflow (Steps 1 through 5)."

E2 Result: PASS. Complete Steps 1–5 executed:

  • Step 1: Supadata /metadata → extracted title, 1.7B views, tags, description ✅
  • Step 2: Crafted 2 distinct prompts following the formula ✅
  • Step 3: Direct API returned "key expired" → correctly fell back to cloud proxy ✅
  • Step 4: ImageMagick text overlay with Helvetica-Bold, stroke+fill ✅
  • Step 5: 4 files delivered (2 base + 2 with text) ✅
  • Image analysis confirmed readable text, correct spelling, suitable aesthetics
  • Runtime: 2m55s

E3 Prompt: "Use the thumbnail-creator skill for '10 AI Tools That Replace Entire Teams' — add text overlay 'AI TAKEOVER' using Option B (ImageMagick) as described in Step 4."

E3 Result: PASS. Focused test of ImageMagick workflow:

  • Generated base image via Gemini cloud proxy
  • Applied centered text overlay: -gravity Center -font Helvetica-Bold -pointsize 80 -fill white -stroke black -strokewidth 3
  • Note: Used Helvetica-Bold instead of Liberation-Sans-Bold (both valid)
  • Output: 1376×768 PNG, 1.3MB
  • Runtime: 1m43s

📊 Summary Scorecard

| Dimension | Score | Details |
| --- | --- | --- |
| Discoverability | 5/5 | Both natural prompts matched the skill via description |
| Specialized workflow used | 5/5 | All agents followed the thumbnail-creator workflow, not just raw Gemini |
| API integration | 5/5 | Gemini, Supadata, ImageMagick all worked correctly |
| Error handling | 5/5 | Cloud proxy fallback triggered and worked as documented |
| Output quality | 4/5 | All outputs valid 16:9 thumbnails. Minor: E3 agent couldn't view images due to path restrictions |
| Documentation clarity | 4/5 | Clear and followable. Minor: mentions specific API endpoint paths from foundational skills inline |

Overall: 28/30


🔍 Observations & Minor Notes

  1. Endpoint paths referenced inline: The SKILL.md says "Use the supadata skill's /metadata endpoint" — this works fine in practice (agents read the foundational skill anyway), but it slightly leaks implementation details into the specialized skill. Not a blocker.

  2. Font availability: The skill recommends Liberation-Sans-Bold, but not all agents found it. E3 used Helvetica-Bold as a fallback. The skill could mention both as options, or note that convert -list font | grep Bold should be run to find available fonts. Very minor.

  3. Image viewer path restrictions: E3 couldn't use the image tool to verify output because ~/Photos/ isn't in the allowed image analysis directories. This is an environment limitation, not a skill bug. The agent worked around it by using the read tool instead.

  4. All 5 agents produced valid output: Every session generated at least one complete thumbnail with text overlay. Zero failures. Zero crashes.

  5. Gemini JPEG-as-PNG behavior: All agents correctly handled Gemini returning JPEG data despite claiming PNG mimeType, as the skill documents. The warning in the skill is accurate and useful.
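
The font fallback suggested in observation 2 could look something like the sketch below. It assumes the output of `convert -list font` contains lines of the form `Font: <name>`; `pick_font` and the sample listing are illustrative, not taken from the skill.

```python
import re
from typing import List, Optional


def pick_font(font_listing: str, preferred: List[str]) -> Optional[str]:
    """Return the first preferred font named in `convert -list font` output."""
    available = set(
        re.findall(r"^\s*Font:\s*(\S+)", font_listing, flags=re.MULTILINE)
    )
    for name in preferred:
        if name in available:
            return name
    return None


# Illustrative listing; real output would come from running the command.
sample_listing = """\
  Font: DejaVu-Sans
    family: DejaVu Sans
  Font: Helvetica-Bold
    family: Helvetica
"""

chosen = pick_font(sample_listing, ["Liberation-Sans-Bold", "Helvetica-Bold"])
```

Here the first choice is missing from the listing, so the second preference is selected, which matches what E3 ended up doing.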


✅ Verdict: tested-pass

Zero bugs found. All 5 sub-agents (2 discovery + 3 explicit) successfully completed the full thumbnail creation workflow. Discovery agents matched the skill naturally without being told its name. The specialized workflow adds clear value over using foundational skills directly (design principles, prompt formula, text overlay instructions, validation steps). Ready to merge.

@MZULALI MZULALI added tested-pass Reviewer verified the skill works and removed needs-testing Builder finished, ready for reviewer to test labels Feb 28, 2026
Contributor Author

MZULALI commented Mar 1, 2026

🖼️ Skill Review: thumbnail-creator — Needs Changes

Reviewer: Choug
PR: #11 (skill/thumbnail-creator)
Commit: a5eda0d
Result: ⚠️ NEEDS CHANGES

Issue: Remove ImageMagick text overlay path

The skill currently treats AI-generated text in images as unreliable and recommends ImageMagick (convert) as the "recommended" approach for text overlays (Step 4, Option B). This is outdated — Gemini handles text in images well now and the ImageMagick path produces uglier results.

What to change:

  1. Step 4 — Remove Option B (ImageMagick) entirely. Delete the convert command block, the font notes, and all the ImageMagick-specific guidance. Replace the whole two-option structure with straightforward guidance: bake text into the Gemini prompt. One line mentioning that if Gemini ever gets text wrong you can regenerate or adjust the prompt is fine, but no ImageMagick fallback.

  2. Step 2 — Remove the blanket no-text prompt instruction. The prompt formula currently always appends "Do NOT include any text, words, letters, or numbers in the image" and there's an "Important" callout reinforcing this. Remove both — the default should be to include desired text in the generation prompt, not to suppress it. If the user wants a text-free base image they can say so.

  3. Step 4 intro language — Remove "AI-generated text is often unreliable (misspelled, distorted)." Gemini's text rendering is solid now. Just say something like "Include your desired text directly in the generation prompt. Keep it under 5 words, bold and high-contrast."

TL;DR: Trust Gemini for text. Drop ImageMagick completely.
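
A rough sketch of the requested behavior, under stated assumptions: the skill's actual prompt formula is not shown in this thread, so `build_prompt` and the wording of the text instruction are hypothetical. Only the no-text sentence and the "under 5 words" limit come from the review above.

```python
from typing import Optional

NO_TEXT_SUFFIX = "Do NOT include any text, words, letters, or numbers in the image."
MAX_WORDS = 4  # the review asks for "under 5 words"


def build_prompt(scene: str, overlay_text: Optional[str]) -> str:
    """Append either a text instruction or the opt-in no-text suffix."""
    scene = scene.strip().rstrip(".")
    if overlay_text is None:
        # Caller explicitly asked for a text-free base image.
        return f"{scene}. {NO_TEXT_SUFFIX}"
    words = overlay_text.split()
    if not words or len(words) > MAX_WORDS:
        raise ValueError(f"overlay text must be 1 to {MAX_WORDS} words")
    text = " ".join(words)
    # Hypothetical wording for baking the text into the generation prompt.
    return f'{scene}. Include the text "{text}" in bold, high-contrast lettering.'


prompt = build_prompt("Robot sitting at an office desk", "AI TAKEOVER")
```

Passing `None` is the explicit opt-out; any overlay text of five or more words is rejected rather than silently truncated.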

@MZULALI MZULALI added needs-changes Tests pass but issues found that builder should fix before merge and removed tested-pass Reviewer verified the skill works labels Mar 1, 2026
- Remove Option B (ImageMagick text overlay) entirely from Step 4
- Remove blanket 'Do NOT include any text' from Step 2 prompt formula
- Remove 'AI-generated text is often unreliable' language
- Default to baking text into Gemini prompt directly (under 5 words)
- Add 'no text needed' instruction as opt-in for text-free thumbnails
- Clean up font references and Quick Reference table
@MZULALI MZULALI added needs-testing Builder finished, ready for reviewer to test and removed needs-changes Tests pass but issues found that builder should fix before merge labels Mar 1, 2026
Contributor Author

MZULALI commented Mar 1, 2026

🖼️ Skill Review: thumbnail-creator (Specialized)

Reviewer: Choug (automated skill testing agent)
Date: 2026-03-01 21:10–21:20 UTC
PR: #11 — thumbnail-creator specialized skill
Test Type: 2 Discovery + 3 Explicit sub-agent sessions (5 total)


📋 Pre-Flight Checks

| Check | Status |
|---|---|
| SKILL.md valid frontmatter | ✅ name, description, emoji present |
| requires.env | ✅ No env required (relies on foundational skills) |
| Foundational dependencies | ✅ gemini-image, supadata, serpapi-youtube all referenced |
| No raw API endpoints (specialized rule) | ✅ Only references foundational skills and environment cloud proxy |
| Cloud proxy fallback documented | ✅ Correct pattern described |
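
The frontmatter check could be approximated as below. This is a sketch that assumes SKILL.md opens with a `---`-delimited block and that `name`, `description`, and `emoji` are top-level keys; the real layout may differ, and the sample content is made up for illustration.

```python
from typing import List

REQUIRED_KEYS = ("name", "description", "emoji")


def missing_frontmatter_keys(skill_md: str) -> List[str]:
    """List required keys absent from a leading `---` frontmatter block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return list(REQUIRED_KEYS)  # no frontmatter at all
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        # Only consider unindented `key: value` lines (top-level keys).
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.add(line.split(":", 1)[0].strip())
    return [key for key in REQUIRED_KEYS if key not in keys]


# Hypothetical file content, not the actual SKILL.md.
sample = """---
name: thumbnail-creator
description: Generate YouTube-style thumbnails.
emoji: "T"
---
# Thumbnail Creator
"""

missing = missing_frontmatter_keys(sample)
```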

🔬 Phase 1: Discovery Testing (2 sessions)

Discovery prompts describe tasks only — no mention of skill name, API, service, or brand.

D1 — "Create an eye-catching preview image for my espresso machines video"

Prompt:

I'm starting a YouTube channel about home espresso machines. Can you create an eye-catching preview image for my first video? The video is titled "5 Espresso Machines Under $200 That Actually Make Great Coffee". I want something bold and clickable that would stand out in a YouTube feed. Save the image to my workspace.

Result: ✅ PASS

  • Agent discovered and used the thumbnail-creator skill autonomously
  • Used gemini-image foundational skill for generation via cloud proxy
  • Generated 2 variations at 16:9 (1376×768): V1 bright/bold (9/10 click-worthy), V2 moody cinematic (6/10)
  • Correct file handling: saved to ~/Photos/thumbnails/, verified file sizes
  • Specialized workflow used (not just raw gemini-image): followed thumbnail design principles, crafted YouTube-specific prompts

D2 — "Make me a thumbnail image for this YouTube video URL"

Prompt:

I have this YouTube video: https://www.youtube.com/watch?v=dQw4w9WgXcQ — Can you make me a thumbnail image for it? Something that would get lots of clicks. Look at what the video is about first, then design a bold, attention-grabbing thumbnail image. Save the result to my workspace.

Result: ✅ PASS

  • Agent discovered and used the thumbnail-creator skill autonomously
  • Full URL workflow executed: Used supadata /metadata endpoint to extract video info (Rick Astley, 1.7B views), then crafted context-aware prompts
  • Generated 3 variations: retro 80s, meme-style, split-screen concept
  • All variations were thematically relevant to the actual video content
  • Even included a fake YouTube timestamp overlay — nice attention to detail
  • Key test: Discovery agent used the SPECIALIZED workflow (metadata extraction → context-aware prompt crafting → generation) rather than just calling gemini-image directly

Discovery Verdict: ✅ STRONG PASS

Both discovery agents found and used the thumbnail-creator skill's specialized workflow, including the URL-based metadata extraction pipeline. The skill description is well-written enough to be matched by task-based prompts.
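
As a small illustration of the first step in the URL-based path, the video ID can be pulled out of a watch URL with the standard library. Whether the supadata skill wants the ID or the full URL is not stated in this review, so treat `youtube_video_id` as a hypothetical input-validation helper.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube watch or youtu.be URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


video_id = youtube_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
```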


🎯 Phase 2: Explicit Testing (3 sessions)

E1 — Topic-based generation: "Why Everyone Is Switching to Linux in 2026"

Prompt: Told to read thumbnail-creator SKILL.md, follow workflow, generate 2+ variations.

Result: ✅ PASS

  • Read thumbnail-creator → gemini-image → environment skills in sequence
  • Followed all 5 steps: content analysis → prompt crafting → generation → text overlay → delivery
  • Generated 3 variations: reaction face (8/10), VS split-screen (6/10), epic pilgrimage (7.5/10)
  • Used cloud proxy correctly
  • Corrected .png → .jpg extensions (as documented in skill)
  • Visually inspected results using image tool
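
The .png → .jpg correction can be driven by the file's leading bytes rather than the reported mimeType. The sketch below relies on the standard JPEG and PNG signatures; the helper names and file name are hypothetical.

```python
from pathlib import Path
from typing import Optional

JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_image_extension(data: bytes) -> Optional[str]:
    """Guess the extension from the file signature; None if unrecognized."""
    if data.startswith(JPEG_MAGIC):
        return ".jpg"
    if data.startswith(PNG_MAGIC):
        return ".png"
    return None


def corrected_path(path: str, data: bytes) -> str:
    """Return `path` with its suffix replaced to match the actual bytes."""
    ext = sniff_image_extension(data)
    return str(Path(path).with_suffix(ext)) if ext else path


# JPEG bytes saved under a .png name, as described in the skill's warning.
fixed = corrected_path("thumb_v1.png", b"\xff\xd8\xff\xe0" + b"\x00" * 16)
```

Unrecognized data leaves the path unchanged, so the helper never renames a file it cannot identify.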

E2 — URL-based generation: 3Blue1Brown neural network video

Prompt: Told to use supadata/serpapi-youtube for context extraction, then generate.

Result: ✅ PASS

  • Read thumbnail-creator → gemini-image → supadata → environment skills
  • Step 1: Used supadata /metadata endpoint — extracted title, description, channel, 22M+ views
  • Step 2: Identified hook (demystifying neural networks), emotion (curiosity), key visuals
  • Step 3: Generated 3 variations via cloud proxy, all at 16:9
  • Step 4: Text overlays rendered correctly: "WHAT IS THIS?", "DEEP LEARNING", "HOW MACHINES LEARN"
  • Step 5: Saved with descriptive names, renamed to .jpg
  • V3 directly referenced the digit recognition concept from the actual video — excellent context usage

E3 — Text overlay + style variations: "I Ate Only Ramen For 30 Days"

Prompt: Told to follow Step 4 for text overlay, generate clickbaity + clean/professional variations.

Result: ✅ PASS

  • Generated all 3 in parallel (~30s each)
  • Main: "30 DAYS OF RAMEN" — bold white with black outline, excellent readability (★★★★★)
  • Clickbait: "ONLY RAMEN" + "30 DAYS" red badge — maximum energy, perfect clickbait style (★★★★☆)
  • Clean/Professional: "30 DAYS" — thin white sans-serif, elegant but low contrast (★★★☆☆)
  • File format note confirmed: Gemini returns JPEG with image/png mimeType
  • All 3 stylistically distinct and matched their intended variation type

📊 Aggregate Results

| Test | Discovery? | Skill Used | Workflow Complete | Images Generated | Text Overlays |
|---|---|---|---|---|---|
| D1 (espresso topic) | ✅ Found skill | thumbnail-creator + gemini-image | ✅ Steps 1-5 | 2 (690KB, 787KB) | ✅ Readable |
| D2 (rickroll URL) | ✅ Found skill | thumbnail-creator + supadata + gemini-image | ✅ Steps 1-5 | 3 (843-890KB) | ✅ Readable |
| E1 (Linux topic) | N/A | thumbnail-creator + gemini-image + environment | ✅ Steps 1-5 | 3 (706-838KB) | ✅ Readable |
| E2 (3B1B URL) | N/A | thumbnail-creator + supadata + gemini-image + environment | ✅ Steps 1-5 | 3 (584-754KB) | ✅ Readable |
| E3 (ramen + styles) | N/A | thumbnail-creator + gemini-image + environment | ✅ Steps 1-5 | 3 (596-996KB) | ✅ Readable |

Total: 14 thumbnails generated across 5 sessions, 0 failures


🔍 Visual Verification

I spot-checked several generated thumbnails. Confirmed:

  • ✅ Bold YouTube-style thumbnails with readable text at small sizes
  • ✅ 16:9 aspect ratio (1376×768)
  • ✅ High contrast, saturated colors
  • ✅ Text overlays baked into the image (not post-processed)
  • ✅ Quality appropriate for actual YouTube use
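
One note on the aspect-ratio check: 1376×768 reduces to 43:24, which is about 0.8% wider than an exact 16:9, so a tolerance-based comparison is the practical way to verify it. A minimal sketch, with hypothetical helper names and an assumed 1% tolerance:

```python
from fractions import Fraction

TARGET = Fraction(16, 9)


def aspect_ratio(width: int, height: int) -> Fraction:
    """Exact width:height ratio in lowest terms."""
    return Fraction(width, height)


def is_roughly_16_9(width: int, height: int, tolerance: float = 0.01) -> bool:
    """True if the ratio is within `tolerance` (relative) of 16:9."""
    deviation = abs(aspect_ratio(width, height) - TARGET) / TARGET
    return deviation <= tolerance


ratio = aspect_ratio(1376, 768)       # Fraction(43, 24)
passes = is_roughly_16_9(1376, 768)   # relative deviation is exactly 1/128
```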

✅ What Works Well

  1. Skill description triggers discovery — Both discovery agents found the skill from natural task descriptions
  2. Complete specialized workflow — URL → metadata extraction → context-aware prompt → generation → text overlay → delivery
  3. Foundational skill integration — Clean references to gemini-image, supadata, serpapi-youtube without duplicating their docs
  4. Cloud proxy fallback — Correctly documented and used by all agents
  5. Design principles — The thumbnail-specific guidance (bold text, emotional faces, high contrast) produces genuinely click-worthy results
  6. Style variations — Agents successfully generated distinct clickbait vs. clean vs. standard styles
  7. File format note — Correctly documents that Gemini returns JPEG with image/png mimeType

⚠️ Minor Observations (Not Bugs)

  1. Clean/professional variant text contrast — E3's clean variant had thin white text on marble background that could be hard to read at small sizes. The skill could add a note that even "clean" thumbnails need readable text for YouTube context. (Design guidance improvement, not a bug.)
  2. Inconsistent save location: Some agents saved to ~/Photos/thumbnails/, others to ~/.openclaw/workspace/Pictures/thumbnails/. The skill says "default save location is ~/Photos/" (from gemini-image), but the environment skill says "Images, screenshots go in Pictures/". This minor inconsistency is inherited from gemini-image and is not this skill's problem.

🏷️ Verdict: tested-pass

Zero bugs found. All 5 test sessions completed successfully. Discovery agents found and used the specialized workflow. URL-based metadata extraction pipeline works. Text overlays render correctly. The skill is well-structured, has clear foundational skill references, and produces genuinely useful output.

Ready to merge.

@MZULALI MZULALI added tested-pass Reviewer verified the skill works and removed needs-testing Builder finished, ready for reviewer to test labels Mar 1, 2026