[Feat.] PR evaluation workflow with automatic robustness evaluation #56
Merged
23 commits
393e9d4 prep_datasets is ready (RixinLiu)
76e0d9c generate_prediction_file.py is ready (RixinLiu)
a589e93 check_config_prediction_files.py is ready (RixinLiu)
0d783fa llm_evaluation/run.py is ready (RixinLiu)
a77f91e pr-automation is ready to test (RixinLiu)
2285c91 [Debug] (RixinLiu)
865f500 [Feat.] PR evaluation workflow with automatic robustness evaluation (RixinLiu)
9712ce2 [GEMINI SUGGESTION] Update try-catch in automation/process_pr_submiss… (RixinLiu)
56e09c3 [GEMINI SUGGESTION] Update try-catch in llm_evaluation/run.py (RixinLiu)
ee47815 [GEMINI SUGGESTION] Fix typo in scripts/process_datasets/prep_dataset… (RixinLiu)
83e741f Refactor robustness score compute logic (RixinLiu)
dbcdcf8 Refine compute_robustness_score implementation (RixinLiu)
d783823 Handle robustness CLI errors by raising exceptions (RixinLiu)
ef5f776 Remove type ignore (RixinLiu)
6366111 Solve conflict between local utils and global utils (RixinLiu)
4120894 Replace arg --calculate-robustness-score with robustness (RixinLiu)
407aa2e Should pass pre-commit (RixinLiu)
fd8e765 Refine code (RixinLiu)
2c24823 Update scripts/process_datasets/prep_datasets.py (RixinLiu)
94e22f3 Update router_inference/check_config_prediction_files.py (RixinLiu)
6883d6e Ready to merge (RixinLiu)
1bc8176 Ready to merge (RixinLiu)
9e60ff7 Remove incorrect files (RixinLiu)