Add job resubmission feature for failed datasets by Copilot · Pull Request #424 · PocketCoffea/PocketCoffea

Copilot · 2025-11-04T21:37:37Z

Implements automatic tracking and selective resubmission of failed jobs when using --process-separately. The names of failed datasets/groups are saved to failed_jobs.json in the output folder, so that they can be reprocessed without re-running the jobs that succeeded.

Changes

- pocket_coffea/utils/utils.py: added save_failed_jobs() and load_failed_jobs() functions, with error handling for I/O and JSON-parsing errors
- pocket_coffea/scripts/runner.py:
  - New --resubmit-failed CLI flag (requires --process-separately)
  - Track failed jobs during processing and save them to failed_jobs.json
  - Filter filesets to only the failed groups/datasets when resubmitting
- tests/test_runner.py: added tests for the tracking behavior and the validation requirements
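A minimal sketch of what the two helpers in utils.py could look like. The real signatures are not shown in this description: passing the output directory as the first argument, returning None when saving fails, and returning an empty list when the file is missing or unreadable are all assumptions. Only the FAILED_JOBS_FILENAME constant name and the failed_jobs.json file name come from the PR.

```python
import json
import logging
import os

# Constant name taken from the PR commits; its value matches the documented file name
FAILED_JOBS_FILENAME = "failed_jobs.json"

logger = logging.getLogger(__name__)


def save_failed_jobs(output_dir, failed_jobs):
    """Write the failed dataset/group names as a flat JSON array.

    Hypothetical signature. Returns the path written, or None on I/O errors.
    """
    path = os.path.join(output_dir, FAILED_JOBS_FILENAME)
    try:
        with open(path, "w") as f:
            json.dump(list(failed_jobs), f, indent=2)
    except OSError as err:
        logger.error("Could not write %s: %s", path, err)
        return None
    return path


def load_failed_jobs(output_dir):
    """Read the failed dataset/group names.

    Hypothetical signature. Returns [] if the file is missing, unreadable,
    not valid JSON, or does not contain a JSON array.
    """
    path = os.path.join(output_dir, FAILED_JOBS_FILENAME)
    if not os.path.exists(path):
        return []
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError) as err:
        logger.error("Could not read %s: %s", path, err)
        return []
    if not isinstance(data, list):
        logger.error("%s does not contain a JSON array", path)
        return []
    return [str(name) for name in data]
```

Returning an empty list instead of raising lets the caller treat "nothing to resubmit" and "file broken" the same way after the error has been logged; a stricter variant could raise instead.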

Usage

```bash
# Initial run - failed jobs automatically saved
pocket-coffea run --cfg config.py -o output/ --process-separately
# WARNING: 2 job(s) failed. Failed jobs saved to output/failed_jobs.json

# Resubmit only failed jobs
pocket-coffea run --cfg config.py -o output/ --process-separately --resubmit-failed
# INFO: Resubmitting 2 failed jobs: ['dataset_A', 'group_XYZ']
```
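The tracking step in the first run can be pictured as a loop over the separately processed datasets/groups that catches each job's exception, records its name, and writes the list at the end. This is a hypothetical sketch: run_separately and the process callable stand in for the runner's actual per-group processing, which this description does not show.

```python
import json
import logging
import os

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger("runner-sketch")


def run_separately(filesets, process, output_dir):
    """Run `process` on each fileset entry independently and record failures.

    `process` is a hypothetical callable taking (name, fileset). Any exception
    it raises marks that job as failed instead of stopping the loop.
    Returns the list of failed names, in processing order.
    """
    failed = []
    for name, fileset in filesets.items():
        try:
            process(name, fileset)
        except Exception as err:  # "try and log error": continue with the next job
            logger.error("Job %s failed: %s", name, err)
            failed.append(name)
    if failed:
        path = os.path.join(output_dir, "failed_jobs.json")
        with open(path, "w") as f:
            json.dump(failed, f, indent=2)
        logger.warning("%d job(s) failed. Failed jobs saved to %s", len(failed), path)
    return failed
```

One design point the sketch leaves open: when a resubmission succeeds completely nothing is written, so a failed_jobs.json left over from the previous run stays in place unless it is removed or overwritten.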

The failed_jobs.json file contains a flat JSON array of the failed dataset/group names:

```json
["dataset_A", "group_XYZ"]
```
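On resubmission, restricting the filesets to the failed entries amounts to filtering the dictionary keys, as requested in the original issue. The sketch assumes filesets is a plain dict keyed by dataset/group name and that names which no longer exist in the configuration should be logged rather than silently ignored; filter_failed_filesets is a hypothetical helper name.

```python
import logging

logger = logging.getLogger(__name__)


def filter_failed_filesets(filesets, failed_jobs):
    """Keep only the fileset entries whose key appears in failed_jobs.

    Names listed in failed_jobs but absent from filesets (e.g. after a
    config change) are logged and skipped. The original order of
    filesets is preserved.
    """
    failed = set(failed_jobs)
    missing = sorted(failed - set(filesets))
    if missing:
        logger.warning("Failed jobs not found in current filesets: %s", missing)
    return {name: fs for name, fs in filesets.items() if name in failed}
```

For example, with the failed_jobs.json shown above, only the dataset_A and group_XYZ entries would be kept and every other dataset would be skipped.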

Original prompt

This section details the original issue you should resolve.

<issue_title>Job resubmission feature</issue_title>
<issue_description>It would be nice to have a feature that exploits the "try and log error" logic recently implemented in the runner.py script (in conjunction with the --process-separately functionality) to save a JSON file, failed_jobs.json, in the output folder containing the names of the datasets or groups that have failed.
An argument --resubmit-failed can be introduced in the runner such that, if the failed_jobs.json file exists, the processor is run again, but only on the failed datasets/groups listed in the JSON file.
In other words, --resubmit-failed should filter the filesets dictionary, restricting it to the keys of the failed datasets.</issue_description>

<agent_instructions>You should implement the new feature described in the issue and provide a PR.
You should modify as few files as possible: modify runner.py first. If needed, you can define external functions of general interest in utils/ to reduce the amount of code in runner.py.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes Job resubmission feature #423

Co-authored-by: mmarchegiani <37423409+mmarchegiani@users.noreply.github.com>

valsdav · 2025-12-15T14:02:23Z

@mmarchegiani should we merge this?

Initial plan

5ad219f

Copilot AI assigned Copilot and mmarchegiani Nov 4, 2025

Copilot started work on behalf of mmarchegiani November 4, 2025 21:37

Copilot AI and others added 3 commits November 4, 2025 21:44

Add job resubmission feature with failed_jobs.json tracking

18af389

Co-authored-by: mmarchegiani <37423409+mmarchegiani@users.noreply.github.com>

Address code review: use constant for filename and avoid redundant I/O

21d14ea

Co-authored-by: mmarchegiani <37423409+mmarchegiani@users.noreply.github.com>

Add error handling and use FAILED_JOBS_FILENAME constant consistently

116bb18

Co-authored-by: mmarchegiani <37423409+mmarchegiani@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add job resubmission feature to runner~~ Add job resubmission feature for failed datasets Nov 4, 2025

Copilot AI requested a review from mmarchegiani November 4, 2025 21:53

Copilot finished work on behalf of mmarchegiani November 4, 2025 21:53

Copilot wants to merge 4 commits into main from copilot/add-job-resubmission-feature
3 participants