
feat: write evolution_trace.json at each checkpoint #28

Open
shubham3-ucb wants to merge 1 commit into skydiscover-ai:main from shubham3-ucb:feat/evolution-trace-json
Conversation

@shubham3-ucb

@shubham3-ucb shubham3-ucb commented Mar 16, 2026

Collaborator

Closes #17

Writes evolution_trace.json into each checkpoint directory alongside the existing metadata.json. The file lists every program in iteration order with id, score, metrics, parent_id, timestamp, and solution — so the full run history is readable from a single file without walking programs/*.json.

No changes to existing behaviour or artefacts. 16 tests added.
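
For readers who want to consume the file, here is a minimal sketch of loading a trace and computing a best-so-far score curve. It assumes the file's top level is a JSON list of entries and that higher scores are better; neither is confirmed in this thread. `load_trace` and `best_so_far` are hypothetical helper names, and the sample data is made up.

```python
import json
import tempfile
from pathlib import Path


def load_trace(checkpoint_dir):
    """Load evolution_trace.json from a checkpoint directory.

    Assumption: the file's top level is a JSON list of program entries.
    """
    path = Path(checkpoint_dir) / "evolution_trace.json"
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def best_so_far(trace):
    """Running maximum of `score` in file order.

    Entries whose score is None do not change the running best.
    Assumption: higher scores are better.
    """
    best = None
    curve = []
    for entry in trace:
        score = entry.get("score")
        if score is not None and (best is None or score > best):
            best = score
        curve.append(best)
    return curve


# Illustrative data only; not taken from a real run.
sample = [
    {"id": "a", "score": 0.2, "parent_id": None},
    {"id": "b", "score": None, "parent_id": "a"},
    {"id": "c", "score": 0.5, "parent_id": "a"},
    {"id": "d", "score": 0.4, "parent_id": "c"},
]

with tempfile.TemporaryDirectory() as tmp:
    trace_path = Path(tmp) / "evolution_trace.json"
    trace_path.write_text(json.dumps(sample), encoding="utf-8")
    trajectory = best_so_far(load_trace(tmp))
```

The resulting list has one value per program, so it can be passed directly to a plotting library as the y-axis.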

Every checkpoint directory now contains evolution_trace.json alongside
metadata.json. The file lists every program the search generated, sorted
by iteration_found, with id, score, metrics, parent_id, timestamp, and
solution — making it straightforward to plot score trajectories, inspect
lineage, or replay a run without walking individual program files.
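
Lineage inspection could look roughly like the sketch below, which follows `parent_id` links from one program back to its root. It assumes the trace has already been loaded into a list of dicts and that a root program has `parent_id` set to null; `lineage` is a hypothetical helper, not part of this PR.

```python
def lineage(trace, program_id):
    """Return program ids from `program_id` back to its root ancestor.

    Stops at a missing parent, a None parent, or a repeated id, so a
    malformed trace cannot cause an infinite loop.
    """
    by_id = {entry["id"]: entry for entry in trace}
    chain = []
    seen = set()
    current = program_id
    while current is not None and current in by_id and current not in seen:
        chain.append(current)
        seen.add(current)
        current = by_id[current].get("parent_id")
    return chain


# Illustrative data only; not taken from a real run.
sample_trace = [
    {"id": "a", "parent_id": None},
    {"id": "b", "parent_id": "a"},
    {"id": "c", "parent_id": "a"},
    {"id": "d", "parent_id": "c"},
]
ancestors_of_d = lineage(sample_trace, "d")
```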

Closes skydiscover-ai#17
@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the checkpointing mechanism by introducing an evolution_trace.json file within each checkpoint directory. This new file provides a comprehensive, sorted history of all programs generated during a run, including their scores, metrics, and lineage. The addition significantly simplifies the analysis of program evolution and score trajectories, offering a single source of truth for the entire discovery process without altering existing functionality.

Highlights

  • New evolution_trace.json file: Introduced a new evolution_trace.json file within each checkpoint directory, capturing the full history of generated programs, sorted by iteration.
  • Simplified Analysis: This new file centralizes program evolution data, so score trajectories, lineage, and run replays can be reconstructed from one file instead of by reading each programs/*.json file.
  • Additive Implementation: The change is purely additive, implemented via a new private method _write_evolution_trace in CheckpointManager, ensuring no impact on existing runner, API, or behavior.
  • Comprehensive Testing: Extensive unit and integration tests were added to validate the trace format, sorting, field correctness, parent ID preservation, score handling, and proper overwriting behavior.
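
The diff itself is not shown on this page, so the following is only a sketch of what a writer like `_write_evolution_trace` might do: collect the programs, sort by `iteration_found`, and serialize them. The `Program` dataclass here is a hypothetical stand-in, and deriving `score` from `metrics["combined_score"]` is an assumption, not something the PR confirms.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional


@dataclass
class Program:
    """Hypothetical stand-in for the project's Program class."""
    id: str
    solution: str
    metrics: Dict[str, Any] = field(default_factory=dict)
    parent_id: Optional[str] = None
    iteration_found: int = 0
    timestamp: float = 0.0


def write_evolution_trace(programs: Iterable[Program], checkpoint_dir: str) -> str:
    """Write evolution_trace.json into checkpoint_dir and return its path."""
    entries = [
        {
            "id": p.id,
            "iteration_found": p.iteration_found,
            # Assumption: score comes from a "combined_score" metric;
            # it is None when that metric is absent.
            "score": p.metrics.get("combined_score"),
            "metrics": p.metrics,
            "parent_id": p.parent_id,
            "timestamp": p.timestamp,
            "solution": p.solution,
        }
        for p in sorted(programs, key=lambda p: p.iteration_found)
    ]
    path = os.path.join(checkpoint_dir, "evolution_trace.json")
    # Mode "w" truncates, so re-saving a checkpoint overwrites the old trace.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
    return path
```

Because the function only reads from the programs and writes one extra file, a sketch like this stays additive in the sense the summary describes.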

Changelog

  • skydiscover/search/utils/checkpoint_manager.py
    • Added a call to _write_evolution_trace within the save method to generate the evolution trace file.
    • Implemented the _write_evolution_trace method to collect and serialize program data into evolution_trace.json.
  • tests/search/test_checkpoint_manager.py
    • Added a new test file containing unit tests for the evolution_trace.json functionality, covering file creation, field correctness, sorting, parent ID preservation, and overwrite behavior.
  • tests/search/test_evolution_trace_integration.py
    • Added a new test file with integration tests to verify the end-to-end generation and content of evolution_trace.json across multiple checkpoints during a simulated discovery run.

Activity

  • 7 unit tests were added to cover various aspects of the evolution_trace.json format, sorting, field accuracy, parent ID preservation, handling of None scores, and overwriting behavior on re-saves.
  • 9 integration tests were implemented to run the full pipeline with a mocked LLM over three iterations, confirming that the trace is correctly written in every checkpoint, scores match the evaluator, program lineage is intact, and all existing checkpoint artifacts remain present.
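
The PR's tests are not reproduced here. As a rough illustration of the properties they are described as checking (field presence, sort order, intact parent links), a validation helper might look like this. `trace_problems` and `REQUIRED_FIELDS` are hypothetical names, and the field list is taken from the PR description.

```python
REQUIRED_FIELDS = ("id", "score", "metrics", "parent_id", "timestamp", "solution")


def trace_problems(trace):
    """Return a list of human-readable problems found in a loaded trace."""
    problems = []
    known_ids = {entry.get("id") for entry in trace}
    for index, entry in enumerate(trace):
        missing = [name for name in REQUIRED_FIELDS if name not in entry]
        if missing:
            problems.append(f"entry {index}: missing fields {missing}")
        parent = entry.get("parent_id")
        if parent is not None and parent not in known_ids:
            problems.append(f"entry {index}: unknown parent_id {parent!r}")
    # Only entries that carry iteration_found take part in the order check.
    iterations = [e["iteration_found"] for e in trace if "iteration_found" in e]
    if iterations != sorted(iterations):
        problems.append("entries are not sorted by iteration_found")
    return problems


# Illustrative data only; not taken from a real run.
good_trace = [
    {"id": "a", "score": 0.1, "metrics": {}, "parent_id": None,
     "timestamp": 1.0, "solution": "pass", "iteration_found": 0},
    {"id": "b", "score": None, "metrics": {}, "parent_id": "a",
     "timestamp": 2.0, "solution": "pass", "iteration_found": 1},
]
good_result = trace_problems(good_trace)
```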

@shubham3-ucb shubham3-ucb requested a review from lynnliu030 March 16, 2026 09:03

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a valuable feature by writing an evolution_trace.json file at each checkpoint, which will greatly simplify the analysis of discovery runs. The implementation is clean, and it's accompanied by a comprehensive set of unit and integration tests, ensuring the new functionality is robust. I have one minor suggestion to make the code more idiomatic.

@shubham3-ucb shubham3-ucb requested review from akrentsel and mert-cemri and removed request for lynnliu030 March 16, 2026 09:03
@skydiscover-ai skydiscover-ai deleted a comment from gemini-code-assist Bot Mar 16, 2026
@shubham3-ucb shubham3-ucb removed the request for review from akrentsel March 16, 2026 09:05
@lynnliu030 lynnliu030 self-requested a review March 16, 2026 09:27

@mert-cemri mert-cemri left a comment

Collaborator

It looks mostly good, with two concerns:

  1. Including the full solution (source code) for every program makes the trace file potentially very large. For a run with hundreds of programs, this could be megabytes of JSON. Is this intended? Alternatively, consider making solution inclusion optional or providing a separate "compact" trace without solutions.

  2. Test: test_trace_no_metrics_score_is_none constructs Program directly with only three fields:
    prog = Program(id="x", solution="pass", metrics={})
    This relies on every other Program field having a default. It works today but will break if Program later adds a required field. Using the _make_program helper (with score=0) would be more consistent.
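
To make the first concern concrete, a "compact" trace could be derived from the full one by dropping the `solution` field. This is only a sketch of the idea raised in the review; `compact_trace` is a hypothetical helper and no such mode exists in this PR.

```python
import json


def compact_trace(trace, drop=("solution",)):
    """Return a copy of the trace with the given fields removed from each entry."""
    return [
        {key: value for key, value in entry.items() if key not in drop}
        for entry in trace
    ]


# Illustrative data only: a long solution string dominates the entry size.
full = [
    {"id": "a", "score": 0.3, "parent_id": None, "solution": "x = 1\n" * 200},
    {"id": "b", "score": 0.6, "parent_id": "a", "solution": "y = 2\n" * 200},
]
compact = compact_trace(full)

full_size = len(json.dumps(full))
compact_size = len(json.dumps(compact))
```

Comparing `full_size` and `compact_size` on a real trace would show how much of the file is source code, which is the number that decides whether a compact mode is worth adding.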

@shubham3-ucb

Collaborator Author

@mert-cemri thanks for the review!

  1. Solution size — intentional for now. The trace is meant for full-run replay and lineage inspection, so having the solution inline is useful. Happy to add a compact mode (omit solutions) in a follow-up if it becomes an issue in practice.

  2. Test fragility — good catch. Fixed in 92303b9 — now uses dataclasses.replace(_make_program(...), metrics={}) so the test won't break if Program gains new required fields.
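
The pattern in the fix can be illustrated in isolation. `dataclasses.replace` copies an existing instance and overrides only the named fields, so the test no longer has to list every constructor argument. The `Program` class and the `_make_program` signature below are hypothetical stand-ins; the real ones live in the repository and may differ.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass
class Program:
    """Hypothetical stand-in for the project's Program dataclass."""
    id: str
    solution: str
    metrics: Dict[str, Any] = field(default_factory=dict)


def _make_program(program_id="x", score=0.0):
    """Hypothetical test helper that fills in every required field."""
    return Program(id=program_id, solution="pass", metrics={"combined_score": score})


# Start from a fully populated program, then blank out only the metrics.
base = _make_program("x", score=0.0)
prog = replace(base, metrics={})
```

If `Program` later gains a required field, only `_make_program` needs updating; tests built with `replace` keep working.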

@passing2961

I noticed that the current implementation does not persist the user prompt and system message. To address this, I have modified the code based on the changes proposed in this PR.

Below is an example of the updated JSON structure (values omitted for brevity):

{
  "id": "...",
  "iteration_found": "...",
  "generation": "...",
  "score": "...",
  "metrics": {
    "c5_bound": "...",
    "combined_score": "...",
    "n_points": "...",
    "eval_time": "..."
  },
  "parent_id": "...",
  "timestamp": "...",
  "solution": "...",
  "prompts": {
    "diff_user_message": {
      "system": "...",
      "user": "...",
      "responses": "..."
    }
  }
}

By the way, is there any specific reason why you do not save the user prompt?
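
The modified code mentioned in this comment is not shown. As a sketch only, an entry with the `prompts` shape from the example could be built as follows; `with_prompts` is a hypothetical helper, and the type of `responses` (a list of strings here) is an assumption, since the example elides all values.

```python
import copy


def with_prompts(entry, template_key, system, user, responses):
    """Return a copy of a trace entry with one prompt record attached.

    The original entry is left untouched.
    """
    updated = copy.deepcopy(entry)
    updated.setdefault("prompts", {})[template_key] = {
        "system": system,
        "user": user,
        "responses": responses,
    }
    return updated


# Illustrative values only.
entry = {"id": "p1", "score": 0.5, "parent_id": None, "solution": "pass"}
entry_with_prompts = with_prompts(
    entry,
    "diff_user_message",
    system="example system message",
    user="example user prompt",
    responses=["example model response"],
)
```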

@lynnliu030

Collaborator

No particular reason: please feel free to add it!

@passing2961 if you’d like to open a new PR, that’s totally welcome. We can close this one and merge the restructured version instead. Since you’re more familiar with this feature, happy to defer to your implementation 🙂

@passing2961

@lynnliu030 I will open a new PR as soon as possible!

@lynnliu030

lynnliu030 commented Apr 7, 2026

Collaborator

@passing2961 Thanks!! Any progress on this?

@ohadeytan

ohadeytan commented Apr 14, 2026

Is this still planned? I think it's a needed feature. @passing2961 @lynnliu030

Development

Successfully merging this pull request may close these issues.

Question About Saving Evolution Trajectories in JSON

5 participants