feat: write evolution_trace.json at each checkpoint by shubham3-ucb · Pull Request #28 · skydiscover-ai/skydiscover

shubham3-ucb · 2026-03-16T09:01:44Z

Closes #17

Writes evolution_trace.json into each checkpoint directory alongside the existing metadata.json. The file lists every program in iteration order with id, score, metrics, parent_id, timestamp, and solution — so the full run history is readable from a single file without walking programs/*.json.

No changes to existing behaviour or artefacts. 16 tests added.

Every checkpoint directory now contains evolution_trace.json alongside metadata.json. The file lists every program the search generated, sorted by iteration_found, with id, score, metrics, parent_id, timestamp, and solution — making it straightforward to plot score trajectories, inspect lineage, or replay a run without walking individual program files. Closes skydiscover-ai#17
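
As a rough illustration of the use cases mentioned above, here is a minimal sketch of reading the trace and deriving a score trajectory and a lineage chain. It assumes the top level of evolution_trace.json is a JSON list of records carrying the fields named in the description, and that a higher score is better; neither assumption is confirmed in this thread. The function names `load_trace`, `best_so_far`, and `lineage` are hypothetical and not part of the PR.

```python
import json
from pathlib import Path


def load_trace(checkpoint_dir):
    """Load evolution_trace.json from a checkpoint directory.

    Assumption: the file's top level is a JSON list of program records.
    """
    path = Path(checkpoint_dir) / "evolution_trace.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def best_so_far(trace):
    """Return the running best score per entry, ignoring entries whose score is None.

    Assumption: higher scores are better.
    """
    best = None
    trajectory = []
    for entry in trace:
        score = entry.get("score")
        if score is not None and (best is None or score > best):
            best = score
        trajectory.append(best)
    return trajectory


def lineage(trace, program_id):
    """Follow parent_id links from program_id back to the root (child first)."""
    by_id = {entry["id"]: entry for entry in trace}
    chain = []
    seen = set()  # guards against accidental cycles in the data
    current = program_id
    while current is not None and current in by_id and current not in seen:
        chain.append(current)
        seen.add(current)
        current = by_id[current].get("parent_id")
    return chain
```

The trajectory returned by `best_so_far` can be passed directly to a plotting library, and `lineage` gives the ancestry of any program without opening the individual files under programs/.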

gemini-code-assist · 2026-03-16T09:02:04Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the checkpointing mechanism by introducing an evolution_trace.json file within each checkpoint directory. This new file provides a comprehensive, sorted history of all programs generated during a run, including their scores, metrics, and lineage. The addition significantly simplifies the analysis of program evolution and score trajectories, offering a single source of truth for the entire discovery process without altering existing functionality.

Highlights

New evolution_trace.json file: Introduced a new evolution_trace.json file within each checkpoint directory, capturing the full history of generated programs, sorted by iteration.
Simplified Analysis: This new file centralizes program evolution data, making it trivial to reconstruct score trajectories, inspect lineage, and replay runs, which previously required manual aggregation.
Additive Implementation: The change is purely additive, implemented via a new private method _write_evolution_trace in CheckpointManager, ensuring no impact on existing runner, API, or behavior.
Comprehensive Testing: Extensive unit and integration tests were added to validate the trace format, sorting, field correctness, parent ID preservation, score handling, and proper overwriting behavior.

Changelog

skydiscover/search/utils/checkpoint_manager.py
- Added a call to _write_evolution_trace within the save method to generate the evolution trace file.
- Implemented the _write_evolution_trace method to collect and serialize program data into evolution_trace.json.
tests/search/test_checkpoint_manager.py
- Added a new test file containing unit tests for the evolution_trace.json functionality, covering file creation, field correctness, sorting, parent ID preservation, and overwrite behavior.
tests/search/test_evolution_trace_integration.py
- Added a new test file with integration tests to verify the end-to-end generation and content of evolution_trace.json across multiple checkpoints during a simulated discovery run.
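
The changelog above does not show the method body, so the following is only a sketch of what a trace writer along these lines might look like. The `Program` dataclass is a hypothetical stand-in for the project's own class, `write_evolution_trace` is a free function rather than the private `CheckpointManager` method, and reading the score from a `combined_score` metric is an assumption.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass
class Program:
    """Hypothetical stand-in for the project's Program class."""
    id: str
    solution: str
    metrics: Dict[str, Any] = field(default_factory=dict)
    parent_id: Optional[str] = None
    iteration_found: int = 0
    timestamp: float = 0.0


def write_evolution_trace(programs, checkpoint_dir):
    """Write all programs, sorted by iteration_found, to evolution_trace.json."""
    entries = []
    for prog in sorted(programs, key=lambda p: p.iteration_found):
        entries.append({
            "id": prog.id,
            "iteration_found": prog.iteration_found,
            # Assumption: score comes from a "combined_score" metric; None if absent.
            "score": prog.metrics.get("combined_score"),
            "metrics": prog.metrics,
            "parent_id": prog.parent_id,
            "timestamp": prog.timestamp,
            "solution": prog.solution,
        })
    path = Path(checkpoint_dir) / "evolution_trace.json"
    # Opening in "w" mode overwrites any trace left by an earlier save.
    with path.open("w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
    return path
```

Sorting before serialization keeps the file in iteration order regardless of how the programs are stored in memory, which matches the sorting behaviour the tests are described as checking.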

Activity

7 unit tests were added to cover various aspects of the evolution_trace.json format, sorting, field accuracy, parent ID preservation, handling of None scores, and overwriting behavior on re-saves.
9 integration tests were implemented to run the full pipeline with a mocked LLM over three iterations, confirming that the trace is correctly written in every checkpoint, scores match the evaluator, program lineage is intact, and all existing checkpoint artifacts remain present.

gemini-code-assist

Code Review

This pull request introduces a valuable feature by writing an evolution_trace.json file at each checkpoint, which will greatly simplify the analysis of discovery runs. The implementation is clean, and it's accompanied by a comprehensive set of unit and integration tests, ensuring the new functionality is robust. I have one minor suggestion to make the code more idiomatic.

mert-cemri

It looks mostly good, some concerns:

- Including the full solution (source code) for every program makes the trace file potentially very large. For a run with hundreds of programs, this could be megabytes of JSON. Is this intended? Alternatively, consider making solution inclusion optional or providing a separate "compact" trace without solutions.
- Test: test_trace_no_metrics_score_is_none constructs a Program with only id, solution, and metrics:
  prog = Program(id="x", solution="pass", metrics={})
  This relies on all other Program fields having defaults. It works today but is fragile if Program adds a required field later. Using the _make_program helper (with score=0) would be more consistent.
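
One way the "compact" trace suggested above could be derived is by post-processing an existing trace. This is only a sketch: `compact_trace` is a hypothetical helper, not part of the PR, and it assumes the trace is a list of dicts with a `solution` key.

```python
import json


def compact_trace(trace, drop=("solution",)):
    """Return a copy of the trace with the named keys removed from each entry."""
    return [
        {key: value for key, value in entry.items() if key not in drop}
        for entry in trace
    ]


# Illustrative data only: one entry with a large inline solution.
full = [{"id": "p1", "score": 0.4, "parent_id": None, "solution": "x = 1\n" * 1000}]
compact = compact_trace(full)

# Compare serialized sizes to see how much the solutions contribute.
full_bytes = len(json.dumps(full))
compact_bytes = len(json.dumps(compact))
```

Because the helper builds new dicts, the full trace is left untouched and both variants can be written side by side if needed.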

shubham3-ucb · 2026-03-17T08:29:54Z

@mert-cemri thanks for the review!

- Solution size: intentional for now. The trace is meant for full-run replay and lineage inspection, so having the solution inline is useful. Happy to add a compact mode (omit solutions) in a follow-up if it becomes an issue in practice.
- Test fragility: good catch. Fixed in 92303b9; the test now uses dataclasses.replace(_make_program(...), metrics={}) so it won't break if Program gains new required fields.
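
For readers unfamiliar with the pattern, here is a small self-contained sketch of how `dataclasses.replace` can override a single field on an object built by a helper. The `Program` class and the `_make_program` signature below are hypothetical stand-ins; the real ones live in the repository and may differ.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Program:
    """Hypothetical stand-in for the project's Program class."""
    id: str
    solution: str
    metrics: Dict[str, Any] = field(default_factory=dict)


def _make_program(program_id="p0", score=0.0):
    """Hypothetical test helper that fills in every field Program requires."""
    return Program(id=program_id, solution="pass", metrics={"combined_score": score})


# Build a fully populated program, then swap in empty metrics.
# replace() returns a new instance and leaves the original unchanged.
base = _make_program(program_id="x", score=0)
prog = dataclasses.replace(base, metrics={})
```

If `Program` later gains a required field, only the helper needs updating; the test keeps working because it never calls the constructor directly.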

passing2961 · 2026-03-19T02:46:55Z

I noticed that the current implementation does not persist the user prompt and system message. To address this, I have modified the code based on the changes proposed in this PR.

Below is an example of the updated JSON structure (values omitted for brevity):

{
  "id": "...",
  "iteration_found": "...",
  "generation": "...",
  "score": "...",
  "metrics": {
    "c5_bound": "...",
    "combined_score": "...",
    "n_points": "...",
    "eval_time": "..."
  },
  "parent_id": "...",
  "timestamp": "...",
  "solution": "...",
  "prompts": {
    "diff_user_message": {
      "system": "...",
      "user": "...",
      "responses": "..."
    }
  }
}

By the way, is there any specific reason why you do not save the user prompt?
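
As an illustration of how the proposed `prompts` field might be consumed, here is a minimal sketch. It assumes each trace entry follows the structure shown above, with `prompts` mapping a template name to a dict holding `system`, `user`, and `responses`; `iter_prompts` is a hypothetical helper, not part of either PR.

```python
def iter_prompts(entry):
    """Yield (template_name, system, user) for each prompt stored on a trace entry.

    Entries written without a "prompts" field yield nothing.
    """
    for name, prompt in (entry.get("prompts") or {}).items():
        yield name, prompt.get("system"), prompt.get("user")


# Illustrative entry only; the values are placeholders.
example_entry = {
    "id": "p1",
    "prompts": {
        "diff_user_message": {"system": "sys text", "user": "user text", "responses": "..."}
    },
}
collected = list(iter_prompts(example_entry))
```

Treating a missing `prompts` key as empty keeps the reader compatible with traces written before the field existed.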

lynnliu030 · 2026-03-19T03:37:09Z

No particular reason: please feel free to add it!

@passing2961 if you’d like to open a new PR, that’s totally welcome. We can close this one and merge the restructured version instead. Since you’re more familiar with this feature, happy to defer to your implementation 🙂

passing2961 · 2026-03-19T14:06:58Z

@lynnliu030 I will open a new PR as soon as possible!

lynnliu030 · 2026-04-07T06:47:18Z

@passing2961 Thanks!! any progress on this?

ohadeytan · 2026-04-14T10:13:43Z

Is this still planned? I think it's a needed feature. @passing2961 @lynnliu030

shubham3-ucb requested a review from lynnliu030 March 16, 2026 09:03

gemini-code-assist Bot reviewed Mar 16, 2026

shubham3-ucb requested review from akrentsel and mert-cemri and removed request for lynnliu030 March 16, 2026 09:03

skydiscover-ai deleted a comment from gemini-code-assist Bot Mar 16, 2026

shubham3-ucb removed the request for review from akrentsel March 16, 2026 09:05

lynnliu030 self-requested a review March 16, 2026 09:27

mert-cemri reviewed Mar 16, 2026

shubham3-ucb wants to merge 1 commit into skydiscover-ai:main from shubham3-ucb:feat/evolution-trace-json
