# [prompt-clustering] 🔬 Copilot Agent Prompt Clustering Analysis - November 2025 #4433
NLP-based clustering analysis of 1,054 copilot agent task prompts to identify patterns, success factors, and optimization opportunities.
## Overview
This analysis uses machine learning (TF-IDF vectorization and K-means clustering) to automatically categorize copilot agent tasks from the last 30 days. By clustering similar prompts together, we can identify which types of tasks the agent excels at and where improvements are needed.
- **Analysis Period:** Last 30 days (October 21 - November 20, 2025)
- **Total Tasks Analyzed:** 1,054 copilot-created PRs
- **Clusters Identified:** 6 distinct task categories
- **Overall Success Rate:** 75.7% (798/1,054 PRs merged)
- **Silhouette Score:** 0.058 (weak cluster separation; scores near 0 indicate heavily overlapping clusters)
## Quick Insights

## Methodology & Technical Details

### Methodology

#### Data Collection

#### NLP Pipeline

1. **Text Preprocessing**
2. **Feature Extraction** (TF-IDF vectorization)
3. **Clustering** (K-means, k=6)
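The exact preprocessing and model parameters are not listed above; the following is a minimal sketch of the described pipeline (TF-IDF vectorization, K-means with k=6, silhouette scoring), where choices such as `ngram_range`, `random_state`, and the `copilot_prs.json` input file are illustrative assumptions rather than the report's actual settings:

```python
import json

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical dump of the 1,054 task prompts (e.g. copilot PR titles);
# see the data-collection sketch below for how it might be produced
with open("copilot_prs.json") as f:
    prompts = [pr["title"] for pr in json.load(f)]

# Feature extraction: TF-IDF over unigrams and bigrams (bigram terms such
# as "gh aw" and "pkg workflow" appear in the cluster characteristics)
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X = vectorizer.fit_transform(prompts)

# Clustering: K-means with k=6, fixed seed for reproducibility
kmeans = KMeans(n_clusters=6, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Silhouette score lies in [-1, 1]; values near 0, like the 0.058
# reported here, indicate heavily overlapping clusters
print(f"silhouette: {silhouette_score(X, labels):.3f}")
```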
#### Metrics Collected
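The collection script itself is not included in the post; below is a minimal sketch of how the per-PR metrics (merge status, files changed, lines added) could be gathered from the GitHub REST API, with pagination and copilot-author filtering omitted for brevity:

```python
import requests

BASE = "https://api.github.com/repos/githubnext/gh-aw/pulls"

# First page of closed PRs; a real run would paginate and keep only
# copilot-authored PRs
prs = requests.get(BASE, params={"state": "closed", "per_page": 100}).json()

rows = []
for pr in prs:
    # The per-PR detail endpoint exposes additions and changed_files
    detail = requests.get(f"{BASE}/{pr['number']}").json()
    rows.append({
        "title": pr["title"],                   # prompt text proxy
        "merged": pr["merged_at"] is not None,  # success signal
        "files": detail["changed_files"],
        "lines_added": detail["additions"],
    })

merged = sum(r["merged"] for r in rows)
print(f"success rate: {merged / len(rows):.1%}")
```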
## Cluster Analysis
### Cluster 1: Documentation Updates (34.6% of tasks)

**Size:** 365 tasks | **Success Rate:** 75.1% (274 merged) | **Avg Files:** 18.9 | **Avg Lines:** +673

**Characteristics:** `update`, `add`, `job`, `remove`, `json`, `make`, `error`, `command`

**Sample Tasks:**

**Insight:** Documentation tasks are the most common and among the most complex, touching many files (18.9 on average). The 75.1% success rate suggests these tasks are generally well handled, but the high file count indicates they may benefit from being broken into smaller subtasks.
### Cluster 2: Bug Fixes & Workflow Files (17.2% of tasks)

**Size:** 181 tasks | **Success Rate:** 75.7% (137 merged) | **Avg Files:** 12.2 | **Avg Lines:** +389

**Characteristics:** `workflows`, `aw`, `gh`, `gh aw`, `md`, `workflow`, `github`, `githubnext`

**Sample Tasks:**

**Insight:** Bug fixes in workflow files succeed at about the same rate as documentation tasks but are more focused, touching fewer files (12.2 on average). At 75.7%, this cluster matches the overall average exactly.
### Cluster 3: Code Refactoring & Package Work (15.8% of tasks) ⭐

**Size:** 167 tasks | **Success Rate:** 80.8% (135 merged) | **Avg Files:** 8.8 | **Avg Lines:** +486

**Characteristics:** `pkg`, `pkg workflow`, `workflow`, `functions`, `code`, `validation`

**Sample Tasks:**

**Insight:** 🎯 This is the agent's sweet spot! Code refactoring tasks have the highest success rate (80.8%) and the lowest average file count (8.8). These focused, technical tasks play to the agent's strengths.
### Cluster 4: New Features & Agentic Workflows (14.8% of tasks)

**Size:** 156 tasks | **Success Rate:** 73.1% (114 merged) | **Avg Files:** 10.5 | **Avg Lines:** +1,613

**Characteristics:** `agentic`, `workflow`, `workflows`, `update`, `add`, `daily`, `create`, `github`

**Sample Tasks:**

**Insight:** New feature additions have a slightly lower success rate (73.1%) but produce the most code (+1,613 lines on average). These ambitious tasks may benefit from more detailed requirements or iterative implementation.
### Cluster 5: CLI & Version Management (11.0% of tasks)

**Size:** 116 tasks | **Success Rate:** 72.4% (84 merged) | **Avg Files:** 18.9 | **Avg Lines:** +376

**Characteristics:** `version`, `cli`, `issue`, `section`, `release`, `update`, `help`, `changes`

**Sample Tasks:**

**Insight:** CLI and versioning tasks show the lowest success rate (72.4%) despite small diffs, and tie Cluster 1 for the highest average file count (18.9). These tasks may require more domain-specific knowledge or more careful testing.
### Cluster 6: Agent Instructions & MCP (6.5% of tasks)

**Size:** 69 tasks | **Success Rate:** 78.3% (54 merged) | **Avg Files:** 16.3 | **Avg Lines:** +1,046

**Characteristics:** `agent`, `copilot`, `instructions`, `docs`, `mcp`, `make`, `context`, `github`

**Sample Tasks:**

**Insight:** Self-improvement tasks (the agent working on its own infrastructure) achieve a good success rate (78.3%), showing the agent can effectively enhance its own capabilities.
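The "Characteristics" term lists above are the kind of output produced by ranking each centroid's TF-IDF weights; a minimal sketch, reusing the hypothetical `vectorizer` and `kmeans` objects from the methodology section:

```python
import numpy as np

terms = vectorizer.get_feature_names_out()
for i, centroid in enumerate(kmeans.cluster_centers_):
    # The highest-weighted terms in a centroid characterize its theme
    top = np.argsort(centroid)[::-1][:8]
    print(f"Cluster {i + 1}:", ", ".join(terms[j] for j in top))
```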
## Comparative Analysis
### Success Rate by Cluster

| Cluster | Tasks | Merged | Success Rate |
| --- | --- | --- | --- |
| 3. Code Refactoring & Package Work | 167 | 135 | 80.8% |
| 6. Agent Instructions & MCP | 69 | 54 | 78.3% |
| 2. Bug Fixes & Workflow Files | 181 | 137 | 75.7% |
| 1. Documentation Updates | 365 | 274 | 75.1% |
| 4. New Features & Agentic Workflows | 156 | 114 | 73.1% |
| 5. CLI & Version Management | 116 | 84 | 72.4% |
### Complexity vs Success

File count tracks success inversely: Cluster 3, which touches the fewest files on average (8.8), has the highest success rate (80.8%), while Clusters 1 and 5, which touch the most (18.9 each), sit at or below the 75.7% overall average.
### Lines Added vs Success

- Small changes (< 400 lines): Clusters 2, 5 - 72.4-75.7%
- Medium changes (400-700 lines): Clusters 1, 3 - 75.1-80.8%
- Large changes (> 1,000 lines): Clusters 4, 6 - 73.1-78.3%

**Key Finding:** Medium-sized changes (400-700 lines) include the best success rates, driven by Cluster 3's 80.8%. Very small changes may lack context, while very large changes increase complexity.
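As a sanity check on these complexity observations, the per-cluster averages reported above can be correlated directly; with only six data points this is illustrative rather than conclusive:

```python
import numpy as np

# Per-cluster averages from this report (clusters 1-6, in order)
avg_files = np.array([18.9, 12.2, 8.8, 10.5, 18.9, 16.3])
avg_lines = np.array([673, 389, 486, 1613, 376, 1046])
success = np.array([75.1, 75.7, 80.8, 73.1, 72.4, 78.3])

# Pearson correlations: file count tracks success negatively (about -0.44),
# while lines added is only weakly related (about -0.20)
print("files vs success:", round(np.corrcoef(avg_files, success)[0, 1], 2))
print("lines vs success:", round(np.corrcoef(avg_lines, success)[0, 1], 2))
```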
## Recommendations
### ✅ Leverage Success Patterns

1. **Code Refactoring Tasks Work Best (80.8% success)**: focused work on `pkg` code, function extraction, and validation plays to the agent's strengths.
2. **Agent Self-Improvement Is Effective (78.3% success)**
3. **Bug Fixes Are Reliable (75.7% success)**
### 🎯 Optimization Opportunities

1. **Break Down Complex Documentation Tasks**: Cluster 1 averages 18.9 files per PR.
2. **Improve CLI/Versioning Task Success** (currently 72.4%)
3. **Refine Feature Addition Prompts** (currently 73.1%)
### 📊 General Insights
Optimal Task Size:
Prompt Engineering Tips:
When to Use Copilot Agent (based on success data):
## Statistical Details

### Overall Metrics

### Distribution

### Cluster Size Distribution
## Appendix: Notable Examples

### Largest Successful Changes

### Representative Sample Across Clusters
## Next Steps

### For Prompt Writers

### For Tool Developers

### For Future Analysis
*Analysis Generated: 2025-11-20 19:29 UTC*
*Tools Used: Python 3.12, scikit-learn 1.6.0, pandas, numpy*
*Methodology: TF-IDF vectorization → K-means clustering (k=6) → Theme inference*
*Data Source: githubnext/gh-aw repository, copilot-created PRs from 2025-10-21 to 2025-11-20*