Skip to content

fix(cli): allow evaluation-only invocation without dummy seed path#27

Open
shuofengzhang wants to merge 1 commit into
skydiscover-ai:mainfrom
shuofengzhang:fix-cli-evaluation-only-positional
Open

fix(cli): allow evaluation-only invocation without dummy seed path#27
shuofengzhang wants to merge 1 commit into
skydiscover-ai:mainfrom
shuofengzhang:fix-cli-evaluation-only-positional

Conversation

@shuofengzhang

Copy link
Copy Markdown
Contributor

What changed

  • Reworked CLI positional parsing to accept either:
    • skydiscover-run <evaluation_file> (evaluation-only mode)
    • skydiscover-run <initial_program> <evaluation_file> (seeded mode)
  • Replaced the ambiguous positional pair (initial_program? + required evaluation_file) with a single positional list and explicit normalization logic.
  • Added targeted tests in tests/test_cli_positional_paths.py for:
    • single-path invocation
    • two-path invocation
    • rejection of invalid 3+ positional paths

Why

  • The previous argparse shape made the documented optional-initial-program workflow fail in normal usage: if users passed only one positional path, argparse bound it to initial_program and still required evaluation_file.
  • This produced a confusing UX where an "optional" positional argument could not actually be omitted.

Insight / Why this matters

  • Root cause: optional positional arguments that appear before a required positional are not skippable in typical CLI syntax. Argparse greedily assigns the first positional token to the first positional argument.
  • Why easy to miss: the parser declaration looked logically correct (nargs='?' for initial program + required evaluation file), but argparse semantics make this pattern brittle.
  • Tradeoff: parsing now performs an explicit cardinality check (1 or 2 positional paths) instead of relying on implicit positional binding.
  • Impact: users can reliably run evaluation-only workflows without dummy arguments or parser workarounds, reducing startup friction for benchmark/evaluator-centric runs.

Testing

  • source .venv/bin/activate && pytest -q
    • 14 passed in 0.59s

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical usability issue in the command-line interface by overhauling how positional arguments are parsed. The change allows for more intuitive and reliable invocation of the tool, particularly for evaluation-only workflows, by moving from an ambiguous optional argument setup to explicit path handling. Additionally, it enhances the robustness of checkpoint discovery, contributing to overall system stability.

Highlights

  • CLI Positional Argument Parsing Reworked: The command-line interface's positional argument parsing has been refactored to explicitly handle one or two paths, resolving an issue where an optional initial program could not be omitted due to argparse's greedy behavior.
  • Flexible Invocation Modes: Users can now reliably invoke the CLI in two modes: skydiscover-run <evaluation_file> for evaluation-only, or skydiscover-run <initial_program> <evaluation_file> for seeded runs, without needing dummy arguments.
  • Robust Checkpoint Discovery: The internal _find_latest_checkpoint utility has been improved to more robustly identify the latest checkpoint directory by correctly parsing iteration numbers and ignoring non-numeric directory names.
  • Comprehensive Testing: New, targeted tests have been added for both the CLI's positional path parsing logic and the _find_latest_checkpoint function, ensuring the new behaviors are correctly implemented and maintained.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • skydiscover/cli.py
    • Replaced separate initial_program and evaluation_file positional arguments with a single paths argument using nargs='+'.
    • Implemented custom logic to interpret the paths argument, assigning values to initial_program (optional) and evaluation_file based on the number of provided paths.
    • Added error handling for invalid numbers of positional paths (more than two).
    • Refactored _find_latest_checkpoint to use a more robust parsing method for checkpoint directory names, ensuring only valid, numerically-suffixed directories are considered.
  • tests/test_cli_checkpoint_discovery.py
    • Added new test cases for _find_latest_checkpoint to verify correct identification of the highest iteration checkpoint.
    • Included tests to ensure _find_latest_checkpoint ignores non-numeric directories.
    • Added a test to confirm _find_latest_checkpoint returns None when no valid checkpoints are found.
  • tests/test_cli_positional_paths.py
    • Added new test cases for parse_args to validate single-path invocation correctly assigns to evaluation_file.
    • Included tests to verify two-path invocation correctly assigns to initial_program and evaluation_file.
    • Added a test to confirm that parse_args rejects invocations with more than two positional paths.
Activity
  • The author implemented a rework of CLI positional parsing to accept either one or two paths.
  • The author replaced the ambiguous positional pair with a single positional list and explicit normalization logic.
  • The author added targeted tests in tests/test_cli_positional_paths.py for single-path, two-path, and invalid 3+ positional path invocations.
  • The author confirmed that all tests passed with 14 passed in 0.59s.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively resolves the issue with the CLI's positional argument parsing, allowing for a more intuitive evaluation-only invocation. The approach of using a single paths argument with post-parsing normalization is a clean and robust solution. The accompanying refactor of _find_latest_checkpoint is also a great improvement, fixing a subtle bug and making the logic more resilient. Both changes are supported by clear and targeted tests. I have one minor suggestion to improve naming consistency for better maintainability.

Comment thread skydiscover/cli.py Outdated
Comment on lines 285 to 287
def parse_iteration(path: str) -> Optional[int]:
try:
return int(path.rsplit("_", 1)[-1])

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For clarity and consistency with the calling code on line 296 (iteration = parse_iteration(name)), the parameter path in the nested function parse_iteration should be renamed to name. This makes it clearer that the function operates on a directory name from os.listdir, not a full file system path.

Suggested change
def parse_iteration(path: str) -> Optional[int]:
try:
return int(path.rsplit("_", 1)[-1])
def parse_iteration(name: str) -> Optional[int]:
try:
return int(name.rsplit("_", 1)[-1])

@shuofengzhang shuofengzhang force-pushed the fix-cli-evaluation-only-positional branch from 9f9967c to cfb913c Compare March 15, 2026 19:15
@lynnliu030

Copy link
Copy Markdown
Collaborator

@shuofengzhang Thanks for the catch! I'm thinking whether this can be done by simply changing the argument order from <initial_solution> <eval> to <eval> <initial_solution>? Then the argument name can be more meaningful than paths, because paths can be output folders, and many other stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants