Skip to content

[feat]add Agent Skill graders and skills evaluation cookbook #162

Merged
XiaoBoAI merged 4 commits intoagentscope-ai:mainfrom
helloml0326:zhuohua/skills_graders
Apr 7, 2026
Merged

[feat]add Agent Skill graders and skills evaluation cookbook #162
XiaoBoAI merged 4 commits intoagentscope-ai:mainfrom
helloml0326:zhuohua/skills_graders

Conversation

@helloml0326
Copy link
Copy Markdown
Collaborator

OpenJudge Version

[The version of OpenJudge you are working on, e.g. import openjudge; print(openjudge.__version__)]

Description

[Please describe the background, purpose, changes made, and how to test this PR]

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has been formatted with pre-commit run --all-files command
  • All tests are passing
  • Docstrings are in Google style
  • Related documentation has been updated (e.g. links, examples, etc.)
  • Code is ready for review

Introduce SkillThreatAnalysisGrader, SkillDeclarationAlignmentGrader,
SkillDesignGrader, and refresh completeness/relevance graders. Remove legacy
comprehensive, pairwise, safety, and structure skill graders and their tests.

Add cookbooks/skills_evaluation with SkillsGradingRunner, loader models, and
README. Document skill graders in docs/built_in_graders/skills.md and link
from overview. Announce Skill Graders in README and README_zh.

Made-with: Cursor
- completeness/design/relevance: suppress W0613 for script_contents and
  reference_contents kept for API parity (consumed via SkillsGradingRunner)
- declaration_alignment: disable too-many-lines (1138 lines) and move unused
  injection_fix into the findings dict (was W0612)
- test_skill_completeness: catch openai.RateLimitError in consistency test
  and skip rather than fail
- test_skill_design: gate test_accuracy_vs_expected behind RUN_ACCURACY_TESTS
  (strong-model-only) to prevent false failures with qwen3.5-plus; add
  RateLimitError skip guard to both quality tests
- .pre-commit-config.yaml: use .venv/bin/python -m pytest so pre-commit
  picks up the project venv where pytest is installed

Made-with: Cursor
- Reorder imports and wrap SkillDeclarationAlignmentGrader imports
- Reflow textwrap.dedent prompt strings in threat_analysis grader
- Apply Black-style line breaks in runner, skill_models, evaluate_skills
- Expand long literals in declaration_alignment and threat_analysis tests

Made-with: Cursor
Restore `python -m pytest` instead of `.venv/bin/python -m pytest`.

Made-with: Cursor
@XiaoBoAI XiaoBoAI merged commit 22436a5 into agentscope-ai:main Apr 7, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants