[CLI][Suggestion] Adding flags for evals to return only failed results and print to an output file #678
Adds two flags to `arcade evals`:

- `-f` or `--failed-only`: show only failed evals
- `-o` or `--output`: write the results to an output `.txt` file

Note
Adds failed-only filtering and file output to `arcade evals`, refactors the result display to support console and file output with accurate summaries, and adds tests; bumps the version to 1.6.0.

- `evals`: `--failed-only`, `-f` and `--output`, `-o`.
- Results are filtered via `utils.filter_failed_evaluations`, with the original counts passed to the display.
- `_display_results_to_console`: supports a `failed_only` disclaimer and an original-count summary.
- `output_file` writing (creates parent directories) with the same formatted output as the console.
- `filter_failed_evaluations(all_evaluations)` returns the filtered data and `(total, passed, failed, warned)`.
- `libs/tests/cli/test_display.py` and `libs/tests/cli/test_main_evals.py` cover display details, failed-only mode, file output, and filtering.
- Version bumped to `1.6.0`.

Written by Cursor Bugbot for commit baad441. This will update automatically on new commits.
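The filtering and file-output behavior summarized above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the data shape (each suite is a list of case dicts with a `status` key of `"passed"`, `"warned"`, or `"failed"`) and the `write_report` helper are hypothetical. Only the name `filter_failed_evaluations` and its `(total, passed, failed, warned)` tuple come from the summary. The key idea is that counts are taken over the unfiltered data, so the summary still reports the original totals in failed-only mode.

```python
from pathlib import Path
from typing import Any

# Hypothetical data shape: each suite is a list of case dicts whose "status"
# is "passed", "warned", or "failed". The real arcade data model may differ.
Case = dict[str, Any]


def filter_failed_evaluations(
    all_evaluations: list[list[Case]],
) -> tuple[list[list[Case]], tuple[int, int, int, int]]:
    """Keep only failed cases, but count over the original, unfiltered data."""
    total = passed = failed = warned = 0
    filtered: list[list[Case]] = []
    for suite in all_evaluations:
        for case in suite:
            total += 1
            status = case["status"]
            if status == "passed":
                passed += 1
            elif status == "warned":
                warned += 1
            else:
                failed += 1
        failed_cases = [c for c in suite if c["status"] == "failed"]
        if failed_cases:  # drop suites with nothing left to show
            filtered.append(failed_cases)
    return filtered, (total, passed, failed, warned)


def write_report(text: str, output_file: str) -> Path:
    """Hypothetical helper: write formatted results, creating parent dirs."""
    path = Path(output_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    suites = [
        [{"name": "a", "status": "passed"}, {"name": "b", "status": "failed"}],
        [{"name": "c", "status": "warned"}],
    ]
    filtered, counts = filter_failed_evaluations(suites)
    print(counts)  # (3, 1, 1, 1)
```

Returning the original counts alongside the filtered data lets the display layer print a correct summary without re-scanning the full results.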