This evaluation system measures the quality of both the Blueprint (input) and the Code Implementation (output). The goal is to ensure the AI has enough context to work effectively and that the result meets production standards.
Scale: 0-100. Goal: > 85 points to start coding with AI.
- Clear Project Name (2pts)
- Vision Statement: Clear, concise, highlights value (4pts)
- Target Audience: Clearly defined users (2pts)
- Success Metrics: Measurable success indicators (2pts)
- Frontend: Framework, UI library, State management (4pts)
- Backend: Language, Framework, API style (REST/GraphQL) (4pts)
- Database: DB type, ORM/Query builder (4pts)
- Deployment/Infra: Hosting, CI/CD, Services (3pts)
- System Diagram/Flow: Data flow description (5pts)
- Authentication: Clear auth mechanism (5pts)
- Integrations: List of 3rd party services (5pts)
- Security: Basic security measures (5pts)
- Tables: Complete list of tables (5pts)
- Columns: Data types, constraints (NOT NULL, UNIQUE) (5pts)
- Relationships: FKs, 1:1, 1:N, N:M relationships (5pts)
- SQL: Sample CREATE TABLE statements (5pts)
- Indexes: Index definitions for performance (5pts)
- Phasing: Clear Weeks/Sprints (5pts)
- Logical Order: Foundation -> Core -> Polish (5pts)
- Specific Tasks: Detailed checklist for each phase (5pts)
- User Tiers/Roles: Clear permissions (5pts)
- Limits/Constraints: System limits (5pts)
- Business Logic: Data processing rules (5pts)
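The Blueprint checklist above adds up to 100 points, so a score is simply the sum of the points awarded per criterion. The following is a minimal sketch of that tally, not the actual `sbbl` implementation: the `Criterion` shape, the function names, and the sample awards are all hypothetical and exist only for illustration.

```typescript
// Hypothetical shape for one rubric line; not an actual sbbl API.
interface Criterion {
  name: string;
  maxPoints: number; // weight from the checklist, e.g. 4 for Vision Statement
  awarded: number;   // points the reviewer actually gave
}

// Goal stated in the document: more than 85 points before coding with AI.
const BLUEPRINT_GOAL = 85;

// Sum awarded points, rejecting values outside 0..maxPoints.
function totalScore(criteria: Criterion[]): number {
  return criteria.reduce((sum, c) => {
    if (c.awarded < 0 || c.awarded > c.maxPoints) {
      throw new RangeError(`${c.name}: expected 0..${c.maxPoints}, got ${c.awarded}`);
    }
    return sum + c.awarded;
  }, 0);
}

// The goal is a strict "greater than", so exactly 85 does not pass.
function readyToCode(score: number): boolean {
  return score > BLUEPRINT_GOAL;
}

// Illustrative, made-up review of the first four criteria only.
const overview: Criterion[] = [
  { name: "Clear Project Name", maxPoints: 2, awarded: 2 },
  { name: "Vision Statement", maxPoints: 4, awarded: 3 },
  { name: "Target Audience", maxPoints: 2, awarded: 2 },
  { name: "Success Metrics", maxPoints: 2, awarded: 1 },
];
const overviewScore = totalScore(overview); // 2 + 3 + 2 + 1 = 8 of a possible 10
```

Validating each award against its maximum keeps a single mistyped entry from pushing a weak Blueprint over the threshold.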
Scale: 0-100. Goal: > 90 points to deploy.
- Buildable: Builds and starts without errors (10pts)
- Functional: Meets requirements in Blueprint (10pts)
- Logic: Handles edge cases correctly (10pts)
- Type Safety: No `any`, full types (TypeScript) (10pts)
- Clean Code: Clear variable names, concise functions, DRY (5pts)
- Structure: Logical folder structure, modular (5pts)
- Comments: JSDoc for complex functions (5pts)
- Unit Tests: Coverage > 80% for core logic (10pts)
- Integration Tests: Test main flows (API, DB) (5pts)
- E2E Tests: Test critical user flows (5pts)
- Performance: No N+1 queries, lazy loading (5pts)
- Security: Input sanitization, Auth checks, RLS (5pts)
- Best Practices: Correct usage of framework features (5pts)
- README: Setup and run instructions (5pts)
- API Docs: Endpoint descriptions (if applicable) (5pts)
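To make the Type Safety criterion in the list above concrete, here is one possible way to handle untyped input without resorting to `any`: treat the value as `unknown` and narrow it with a type guard. The `Order` shape and both function names are assumptions made for this sketch; the document does not prescribe a particular pattern.

```typescript
// Hypothetical domain type, used only for illustration.
interface Order {
  id: string;
  total: number;
}

// Type guard: narrows an unknown value to Order after runtime checks.
function isOrder(value: unknown): value is Order {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.id === "string" && typeof record.total === "number";
}

// JSON.parse returns `any`; assigning it to `unknown` forces an explicit check.
function parseOrder(json: string): Order {
  const parsed: unknown = JSON.parse(json);
  if (!isOrder(parsed)) {
    throw new TypeError("Payload is not a valid Order");
  }
  return parsed; // narrowed to Order by the guard above
}
```

The guard keeps the runtime check and the static type in one place, so callers of `parseOrder` never see an unchecked value.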
These metrics measure the effectiveness of using AI in the project.
- Zero-shot Accuracy (%): Percentage of code generated correctly on the first try.
  - Over 80%: Excellent
  - 60-80%: Good
  - < 60%: Blueprint needs improvement
- Iteration Count: Number of re-prompts needed to complete one task.
  - 1 time: Excellent
  - 2-3 times: Normal
  - More than 3 times: Poor Prompt or Context
- Human Intervention (%): Percentage of code manually fixed.
  - < 10%: Excellent
  - 10-30%: Acceptable
  - Over 30%: Process needs review
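The three metrics above each map a number to a rating band, which can be sketched as small classifier functions. This is an illustrative sketch only: the function names are hypothetical, and it assumes the top band of each scale means "greater than" the stated value (more than 80%, more than 3 times, more than 30%), with boundary values such as exactly 80% falling into the middle band.

```typescript
// Zero-shot Accuracy: share of code generated correctly on the first try.
function rateZeroShot(pct: number): string {
  if (pct > 80) return "Excellent";
  if (pct >= 60) return "Good";
  return "Blueprint needs improvement";
}

// Iteration Count: re-prompts needed to complete one task.
// Assumption: anything at or below 1 counts as "Excellent".
function rateIterations(count: number): string {
  if (count <= 1) return "Excellent";
  if (count <= 3) return "Normal";
  return "Poor Prompt or Context";
}

// Human Intervention: share of code that was manually fixed.
function rateIntervention(pct: number): string {
  if (pct < 10) return "Excellent";
  if (pct <= 30) return "Acceptable";
  return "Process needs review";
}
```

For example, a task with 75% zero-shot accuracy, 2 re-prompts, and 12% manual fixes would rate "Good", "Normal", and "Acceptable" respectively.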
Based on the average of the Blueprint and Code scores:
| Grade | Avg Score | Description | Action |
|---|---|---|---|
| S (Elite) | 95-100 | Perfect, ready to scale | Deploy immediately |
| A (High) | 85-94 | High quality, minor issues | Light review & Deploy |
| B (Good) | 70-84 | MVP standard, needs polish | Refactor weak parts |
| C (Fair) | 50-69 | Functional but risky | Major refactor needed |
| D (Poor) | < 50 | Substandard | Rewrite Blueprint |
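The grade table can be read as a lookup on the mean of the two scores. The sketch below is one way to express it, with a hypothetical `gradeFor` function. Because the table lists whole-number ranges, it assumes a fractional average (for example 94.5) falls into the band whose lower bound it has reached.

```typescript
type Grade = "S" | "A" | "B" | "C" | "D";

// Average the Blueprint and Code scores, then map to a grade band.
// Assumption: lower bounds are inclusive (95, 85, 70, 50), per the table ranges.
function gradeFor(blueprintScore: number, codeScore: number): Grade {
  const average = (blueprintScore + codeScore) / 2;
  if (average >= 95) return "S";
  if (average >= 85) return "A";
  if (average >= 70) return "B";
  if (average >= 50) return "C";
  return "D";
}

// Blueprint 92 and Code 88 average to 90, which lands in the 85-94 band.
const overall = gradeFor(92, 88); // "A"
```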
We are developing the `sbbl evaluate` tool for automated scoring:
# Evaluate Blueprint
sbbl evaluate blueprint ./BLUEPRINT.md
# Evaluate Codebase
sbbl evaluate code ./src
# Evaluate entire project
sbbl evaluate project .
Sample Output:
📊 SBBL Evaluation Report
-------------------------
Blueprint Score: 92/100 (Grade A)
Code Score: 88/100 (Grade A)
-------------------------
Overall Grade: A (High Quality)
Recommendations:
- [Code] Add unit tests for `auth.service.ts` (+5 pts)
- [Blueprint] Add indexes for `orders` table (+3 pts)