benchmark: commit0 #82
base: main
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state. |
|
@OpenHands please fix the failing actions on PR #82 at branch |
|
I'm on it! juanmichelini can track my progress at all-hands.dev |
- Fix duplicate pytest entries in uv.lock causing dependency conflicts
- Add proper test instance data for commit0 benchmark with required fields (repo, base_commit, test)
- Add commit0 metadata with correct prompt path for test evaluation
- Fix commit0 metrics collection by adding conversation.conversation_stats.get_combined_metrics()
- Fix type checking errors in commit0 evaluation:
  - Use list(conversation.state.events) instead of converting to dict for history
  - Import and use create_critic() function for proper critic instantiation
  - Replace critic_name parameter with critic instance in EvalMetadata

Co-authored-by: openhands <[email protected]>
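For readers following along, here is a minimal sketch of what checking a commit0 test instance for the required fields could look like. Only the field names (repo, base_commit, test) come from the commit message; the helper `missing_fields`, the constant `REQUIRED_FIELDS`, and the example values are hypothetical and are not taken from this PR's code.

```python
from typing import Any, List, Mapping

# Field names come from the commit message; the rest of this sketch is hypothetical.
REQUIRED_FIELDS = ("repo", "base_commit", "test")


def missing_fields(instance: Mapping[str, Any]) -> List[str]:
    """Return the required fields that are absent or empty in an instance."""
    return [field for field in REQUIRED_FIELDS if not instance.get(field)]


# Placeholder values for illustration only, not real benchmark data.
example_instance = {
    "repo": "example-org/example-repo",
    "base_commit": "0000000",
    "test": {"test_cmd": "pytest", "test_dir": "tests/"},  # assumed shape
}

# An instance that lacks two of the three required fields.
incomplete_instance = {"repo": "example-org/example-repo"}

if __name__ == "__main__":
    print(missing_fields(example_instance))
    print(missing_fields(incomplete_instance))
```

A check of this kind in a test fixture would surface incomplete instance data before the evaluation run starts, rather than partway through it.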
|
I have successfully fixed the failing GitHub Actions on PR #82 for the
✅ Request Completion Checklist:
🔧 Key Fixes Applied:
1. uv.lock Dependency Issue:
2. commit0 Test Failures:
3. Type Checking Errors:
📊 Results:
The changes are concise, targeted, and directly address the failing actions while maintaining the minimal diff requirement for uv.lock as requested. |
Draft. Fixes #31
Currently runs conversations and completes tasks.
Todo: