We need a bench helper that can evaluate a skill against a structured set of eval fixtures, then run another variant of the skill against the same set and compare the results.
- support skill-scoped eval fixtures in a structured layout
- likely support positive and negative finding directories per skill
- enable testing and comparison of a skill variant against the eval set
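
As a rough illustration of the requirements above, here is a minimal sketch of what such a helper could look like. Everything in it is an assumption made for discussion, not an existing API: the `<fixture_root>/<skill_name>/positive` and `<fixture_root>/<skill_name>/negative` directory layout, the `Skill` callable type (fixture text in, "reported a finding" out), and the `EvalResult`, `evaluate`, and `compare` names are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict

# Hypothetical: a skill variant is any callable that takes fixture text and
# returns True when it reports a finding.
Skill = Callable[[str], bool]


@dataclass
class EvalResult:
    """Confusion-matrix counts for one skill variant over one eval set."""
    true_positives: int = 0
    false_negatives: int = 0
    true_negatives: int = 0
    false_positives: int = 0

    @property
    def accuracy(self) -> float:
        total = (self.true_positives + self.false_negatives
                 + self.true_negatives + self.false_positives)
        correct = self.true_positives + self.true_negatives
        return correct / total if total else 0.0


def evaluate(skill: Skill, fixture_root: Path, skill_name: str) -> EvalResult:
    """Run one variant over <fixture_root>/<skill_name>/{positive,negative}.

    The directory names are an assumed layout: fixtures under "positive"
    should produce a finding, fixtures under "negative" should not.
    """
    result = EvalResult()
    skill_dir = fixture_root / skill_name
    for label, expected in (("positive", True), ("negative", False)):
        label_dir = skill_dir / label
        if not label_dir.is_dir():
            continue  # a skill may have only one kind of fixture
        for fixture in sorted(label_dir.iterdir()):
            if not fixture.is_file():
                continue
            found = skill(fixture.read_text(encoding="utf-8"))
            if expected and found:
                result.true_positives += 1
            elif expected and not found:
                result.false_negatives += 1
            elif not expected and found:
                result.false_positives += 1
            else:
                result.true_negatives += 1
    return result


def compare(baseline: Skill, variant: Skill,
            fixture_root: Path, skill_name: str) -> Dict[str, EvalResult]:
    """Evaluate two variants of the same skill against the same fixtures."""
    return {
        "baseline": evaluate(baseline, fixture_root, skill_name),
        "variant": evaluate(variant, fixture_root, skill_name),
    }
```

Calling `compare(current_skill, candidate_skill, Path("evals"), "some-skill")` would return one `EvalResult` per variant, so the counts can be compared side by side. A real helper would probably need a richer result than a boolean per fixture, but the shape of the comparison should be similar.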
Action taken on behalf of David Cramer.