[feat]add Agent Skill graders and skills evaluation cookbook by helloml0326 · Pull Request #162 · agentscope-ai/OpenJudge

helloml0326 · 2026-04-07T11:45:24Z

OpenJudge Version

[The version of OpenJudge you are working on, e.g. import openjudge; print(openjudge.__version__)]

Description

[Please describe the background, purpose, changes made, and how to test this PR]

Checklist

Please check the following items before code is ready to be reviewed.

Code has been formatted with pre-commit run --all-files command
All tests are passing
Docstrings are in Google style
Related documentation has been updated (e.g. links, examples, etc.)
Code is ready for review

Introduce SkillThreatAnalysisGrader, SkillDeclarationAlignmentGrader, and SkillDesignGrader, and refresh the completeness and relevance graders. Remove the legacy comprehensive, pairwise, safety, and structure skill graders and their tests. Add cookbooks/skills_evaluation with SkillsGradingRunner, loader models, and a README. Document the skill graders in docs/built_in_graders/skills.md and link that page from the overview. Announce Skill Graders in README and README_zh.
Made-with: Cursor

- completeness/design/relevance: suppress W0613 (unused-argument) for `script_contents` and `reference_contents`, which are kept for API parity (consumed via SkillsGradingRunner)
- declaration_alignment: disable too-many-lines (the module is 1138 lines) and move the unused `injection_fix` into the findings dict (previously flagged as W0612)
- test_skill_completeness: catch `openai.RateLimitError` in the consistency test and skip rather than fail
- test_skill_design: gate `test_accuracy_vs_expected` behind `RUN_ACCURACY_TESTS` (strong-model-only) to prevent false failures with qwen3.5-plus; add a RateLimitError skip guard to both quality tests
- .pre-commit-config.yaml: use `.venv/bin/python -m pytest` so pre-commit picks up the project venv where pytest is installed
Made-with: Cursor
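A minimal, standard-library-only sketch of the two test guards listed for test_skill_completeness and test_skill_design. It assumes `RUN_ACCURACY_TESTS` is an environment variable and that "1", "true" or "yes" enable it; `accuracy_tests_enabled`, `skip_on`, `FakeRateLimitError` and `ExampleQualityTest` are hypothetical names, not OpenJudge code, and the PR's actual tests may use `pytest.skip` or `pytest.mark.skipif` instead. Raising `unittest.SkipTest` is reported as a skip by both unittest and pytest.

```python
import functools
import os
import unittest


def accuracy_tests_enabled(env=None):
    """Return True when RUN_ACCURACY_TESTS is set to a truthy value.

    Assumption: the flag is an environment variable and "1", "true" or "yes"
    (case-insensitive) count as enabled.
    """
    env = os.environ if env is None else env
    return env.get("RUN_ACCURACY_TESTS", "").strip().lower() in {"1", "true", "yes"}


def skip_on(exc_type, reason):
    """Decorator that turns `exc_type` raised inside a test into a skip."""

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            try:
                return test_func(*args, **kwargs)
            except exc_type as exc:
                # Report the run as skipped instead of failed.
                raise unittest.SkipTest(f"{reason}: {exc}") from exc

        return wrapper

    return decorator


class FakeRateLimitError(Exception):
    """Hypothetical stand-in for openai.RateLimitError so the sketch runs without the SDK."""


class ExampleQualityTest(unittest.TestCase):
    # Strong-model-only check: skipped unless RUN_ACCURACY_TESTS is enabled.
    @unittest.skipUnless(accuracy_tests_enabled(), "set RUN_ACCURACY_TESTS=1 to run")
    def test_accuracy_vs_expected(self):
        self.assertEqual(1 + 1, 2)  # placeholder for the real accuracy assertion

    # A throttled provider call is reported as skipped, not failed.
    @skip_on(FakeRateLimitError, "provider rate limit")
    def test_consistency(self):
        raise FakeRateLimitError("429 Too Many Requests")  # simulate a throttled call
```

In a real test module you would pass `openai.RateLimitError` to `skip_on` in place of the stand-in. Skipping on rate limits keeps a throttled provider from turning CI red, while the environment gate keeps strong-model-only accuracy checks out of default runs.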

- Reorder imports and wrap SkillDeclarationAlignmentGrader imports
- Reflow textwrap.dedent prompt strings in threat_analysis grader
- Apply Black-style line breaks in runner, skill_models, evaluate_skills
- Expand long literals in declaration_alignment and threat_analysis tests
Made-with: Cursor
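The threat_analysis change reflows prompt strings built with `textwrap.dedent`. A minimal sketch of that pattern, using a hypothetical `THREAT_PROMPT` template and `build_prompt` helper (the grader's real prompt text is not shown on this page): the template follows the code's indentation in source, while the rendered prompt stays flush-left.

```python
import textwrap

# Hypothetical template; the backslash after the opening quotes drops the leading newline.
THREAT_PROMPT = textwrap.dedent(
    """\
    You are reviewing an Agent Skill for security threats.
    Skill name: {skill_name}

    List any findings, one per line.
    """
)


def build_prompt(skill_name: str) -> str:
    """Fill the dedented template; source indentation does not leak into the prompt."""
    return THREAT_PROMPT.format(skill_name=skill_name)
```

Because `dedent` strips only the common leading whitespace, reflowing the string to satisfy Black's line breaks does not change the text the model receives.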

Restore `python -m pytest` instead of `.venv/bin/python -m pytest`.
Made-with: Cursor

helloml0326 added 4 commits April 7, 2026 18:27

chore: revert pre-commit pytest hook to system python

ec635b0

XiaoBoAI approved these changes Apr 7, 2026

XiaoBoAI merged commit 22436a5 into agentscope-ai:main Apr 7, 2026
1 check passed

XiaoBoAI merged 4 commits into agentscope-ai:main from helloml0326:zhuohua/skills_graders