
Conversation

@LeoRoccoBreedt (Contributor) commented Jun 26, 2025

Description

Include a summary of the changes and the related issue.

Related to: <ClickUp/JIRA task name>

Any expected test failures?


Add a [X] to relevant checklist items

❔ This change

  • adds a new feature
  • fixes breaking code
  • is cosmetic (refactoring/reformatting)

✔️ Pre-merge checklist

  • Refactored code (sourcery)
  • Tested code locally
  • Precommit installed and run before pushing changes
  • Added code to GitHub tests (notebooks, scripts)
  • Updated GitHub README
  • Updated the projects overview page on Notion

🧪 Test Configuration

  • OS: Windows
  • Python version: 3.12
  • Neptune version: 0.14.0
  • Affected libraries with version: torch, torchvision

Summary by Sourcery

Add an "evaluate-long-runs" how-to guide that includes a training script, an asynchronous evaluation script, and a notebook example to demonstrate logging, checkpointing, and splitting long-running evaluation tasks with Neptune.

New Features:

  • Add a train.py script that trains a ResNet18 on CIFAR10, logs training metrics to Neptune, and saves comprehensive checkpoints to disk
  • Add an async_evals.py script that polls saved checkpoints, computes validation accuracy, and logs evaluation metrics back to the same Neptune run
  • Include a Jupyter notebook example demonstrating how to split evaluation logging into even and odd steps and resume a Neptune run (see the sketch below)

Documentation:

  • Introduce a new how-to guide "evaluate-long-runs" with accompanying scripts and notebook for asynchronous evaluation
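
As a rough illustration of the notebook's split-logging pattern, the two cells might look like the sketch below. This is a minimal sketch, not the notebook itself: it assumes the neptune_scale client with the project and API token taken from environment variables, that a run can be reopened by its run_id with resume=True, and the run id and metric name are made up for illustration.

```python
from neptune_scale import Run

RUN_ID = "evaluation-runs-demo"  # illustrative id, shared by both cells

# Cell 1: create the run and log a metric at even steps only
run = Run(run_id=RUN_ID)  # project and API token are read from environment variables
for step in range(0, 10, 2):
    run.log_metrics(data={"eval/metric": 1.0 / (step + 1)}, step=step)
run.close()

# Cell 2: reopen the same run later and fill in the odd steps
# (assumes the backend accepts these earlier steps arriving after the even ones)
run = Run(run_id=RUN_ID, resume=True)
for step in range(1, 10, 2):
    run.log_metrics(data={"eval/metric": 1.0 / (step + 1)}, step=step)
run.close()
```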

sourcery-ai bot commented Jun 26, 2025

Reviewer's Guide

Demonstrates asynchronous long-run evaluation by providing a notebook walkthrough, a training script with metric logging and checkpoint persistence, and a polling-based evaluation script that resumes runs to log validation accuracy.

Sequence diagram for async evaluation and metric logging

```mermaid
sequenceDiagram
    actor User
    participant TrainScript as Training Script
    participant Checkpoint as Checkpoint File
    participant EvalScript as Evaluation Script
    participant Neptune as Neptune Run

    User->>TrainScript: Start training
    TrainScript->>Checkpoint: Save checkpoint (per epoch)
    TrainScript->>Neptune: Log training metrics
    User->>EvalScript: Start evaluation script
    loop Poll for new checkpoints
        EvalScript->>Checkpoint: Detect new checkpoint
        EvalScript->>EvalScript: Load model from checkpoint
        EvalScript->>EvalScript: Evaluate on validation data
        EvalScript->>Neptune: Log evaluation metrics (resume run)
    end
```
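
In code, the training side of this sequence might look roughly like the following. This is a minimal sketch rather than the actual train.py: the stand-in data loader, tags, metric names, and two-epoch loop are assumptions, and checkpoint writing is only referenced here (it is sketched after the class diagram below).

```python
import uuid

import torch
import torch.nn as nn
from torchvision.models import resnet18

from neptune_scale import Run

# Stand-in for the CIFAR10 training DataLoader used by the real script
train_loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))) for _ in range(5)]

model = resnet18(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

run = Run(run_id=f"train-{uuid.uuid4()}")  # project and API token are read from environment variables
run.add_tags(["training", "resnet18", "cifar10"])

global_step = 0
for epoch in range(2):
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        accuracy = (outputs.argmax(dim=1) == labels).float().mean().item()
        run.log_metrics(
            data={"train/loss": loss.item(), "train/accuracy": accuracy, "train/epoch": epoch},
            step=global_step,
        )
        global_step += 1

    # At the end of each epoch, train.py also writes a checkpoint that embeds the
    # run id -- see the save_checkpoint sketch after the class diagram below.

run.close()
```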

Class diagram for checkpoint structure and Neptune Run usage

```mermaid
classDiagram
    class Run {
        +add_tags(tags)
        +log_metrics(data, step, ...)
        +get_run_url()
        +close()
        _run_id
    }
    class Checkpoint {
        run_id
        epoch
        model_state_dict
        optimizer_state_dict
        loss
        accuracy
        global_step
        model_config
        training_config
    }
    Run <.. Checkpoint : run_id
    Checkpoint o-- model_config : contains
    Checkpoint o-- training_config : contains
```
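
A checkpoint with this structure could be written with torch.save roughly as follows. The keys mirror the diagram above; the function name, file path handling, and the example contents of model_config and training_config are assumptions.

```python
import os

import torch


def save_checkpoint(path, run_id, epoch, global_step, model, optimizer,
                    loss, accuracy, model_config, training_config):
    """Persist everything the evaluation script needs, keyed as in the diagram above."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    torch.save(
        {
            "run_id": run_id,  # ties the file back to the Neptune run
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
            "loss": loss,
            "accuracy": accuracy,
            "global_step": global_step,
            "model_config": model_config,  # e.g. {"arch": "resnet18", "num_classes": 10}
            "training_config": training_config,  # e.g. {"lr": 0.01, "batch_size": 128}
        },
        path,
    )
```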

File-Level Changes

Change: Introduce notebook demonstrating async run evaluation split across runs
Details:
  • Add notebook with an even-step logging cell
  • Add a cell resuming the run to log odd steps
Files: how-to-guides/evaluate-long-runs/evaluation_runs.ipynb

Change: Implement training script with Neptune logging and checkpointing
Details:
  • Configure ResNet18 training on CIFAR10
  • Log loss, accuracy, and epoch via global_step
  • Save detailed checkpoints including run and training metadata
Files: how-to-guides/evaluate-long-runs/train.py

Change: Add asynchronous evaluation script that polls checkpoints and logs eval metrics
Details:
  • Poll the checkpoint directory for new model_state files
  • Evaluate each model on the validation set to compute accuracy
  • Resume the Neptune run by run_id and log eval/accuracy at the matching step
Files: how-to-guides/evaluate-long-runs/async_evals.py
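
Put together, the polling loop in async_evals.py could be sketched roughly as follows. Assumptions: checkpoints follow the key layout shown in the class diagram (with model_config holding at least num_classes), file names match model_state_*.pt in a checkpoints/ directory, the neptune_scale run is reopened via run_id with resume=True, and a stand-in validation loader replaces the real CIFAR10 one.

```python
import glob
import os
import time

import torch
from torchvision.models import resnet18

from neptune_scale import Run

CKPT_DIR = "checkpoints"  # illustrative; must match wherever the training script writes checkpoints
POLL_SECONDS = 60

# Stand-in for the CIFAR10 validation DataLoader used by the real script
val_loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))) for _ in range(4)]


def evaluate(model):
    """Return top-1 accuracy over the validation batches."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total


seen = set()
while True:
    for path in sorted(glob.glob(os.path.join(CKPT_DIR, "model_state_*.pt"))):
        if path in seen:
            continue
        ckpt = torch.load(path, map_location="cpu")

        model = resnet18(num_classes=ckpt["model_config"]["num_classes"])
        model.load_state_dict(ckpt["model_state_dict"])
        accuracy = evaluate(model)

        # Reopen the training run and log the eval metric at the matching step
        run = Run(run_id=ckpt["run_id"], resume=True)
        run.log_metrics(data={"eval/accuracy": accuracy}, step=ckpt["global_step"])
        run.close()

        seen.add(path)
    time.sleep(POLL_SECONDS)
```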

