Description
Context
From PR #1121 review — @mihow suggested simplifying the logic that determines if a Job is in the FAILURE state:
> I am thinking we should simplify the logic determining if a Job is in the FAILURE state. Let's just show the counts. Really we need a new state like "COMPLETED" instead of Celery's SUCCESS & FAILURE states. "Completed with errors". Then we can remove a number of checks related to the stage status & overall status.
Problem
Currently jobs use Celery's SUCCESS and FAILURE states, but real-world ML processing jobs often finish with some images failing (bad crops, missing files, timeouts) while the majority succeed. The current approach uses a failure ratio threshold to decide between SUCCESS and FAILURE, which requires threading a complete_state parameter through the progress stages and adds complexity.
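To make the problem concrete, here is a minimal sketch of how a ratio threshold collapses a mixed result into a binary state. It is not the code in ami/jobs/tasks.py: the function name `threshold_state`, the constant `FAILURE_RATIO_THRESHOLD`, its value of 0.5, and the handling of an empty job are all assumptions made for illustration.

```python
# Hypothetical cutoff, chosen only for illustration; the real value is not stated in this issue.
FAILURE_RATIO_THRESHOLD = 0.5


def threshold_state(total: int, failed: int, threshold: float = FAILURE_RATIO_THRESHOLD) -> str:
    """Collapse per-image results into one of Celery's two terminal states."""
    if total == 0:
        # Assumption: an empty job counts as a success.
        return "SUCCESS"
    failure_ratio = failed / total
    return "FAILURE" if failure_ratio > threshold else "SUCCESS"


# 49 and 51 failures out of 100 are nearly the same outcome,
# but they land on opposite sides of the cutoff.
print(threshold_state(total=100, failed=49))
print(threshold_state(total=100, failed=51))
```

Whatever cutoff is chosen, two nearly identical runs can end up with opposite labels, and the counts that would explain the difference are hidden behind the single state.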
Proposal
Add a COMPLETED (or COMPLETED_WITH_ERRORS) state to the Job status choices. A job that finishes processing all images would be COMPLETED regardless of individual failures. The UI would show the actual counts (processed, failed, detections, classifications) and let the user judge the outcome.
This would allow removing:
- The failure ratio threshold logic in ami/jobs/tasks.py
- The complete_state parameter threading through _update_job_progress
- Various checks related to per-stage status determining overall status
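The proposal could look roughly like the sketch below. It assumes a simplified, standalone model rather than the actual Job model in ami/jobs/models.py: `JobState`, `JobCounts`, `final_state`, and `summary` are hypothetical names, and `processed` is assumed to count every image that was attempted, including the failed ones.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(str, Enum):
    # Hypothetical subset of states; the real status choices live on the Job model.
    STARTED = "STARTED"
    COMPLETED = "COMPLETED"


@dataclass
class JobCounts:
    total: int
    processed: int  # assumption: images attempted, including those that failed
    failed: int
    detections: int = 0
    classifications: int = 0


def final_state(counts: JobCounts) -> JobState:
    """Return COMPLETED once every image has been attempted, regardless of failures."""
    return JobState.COMPLETED if counts.processed >= counts.total else JobState.STARTED


def summary(counts: JobCounts) -> str:
    """Build the line of counts the UI could show so the user can judge the outcome."""
    return (
        f"{counts.processed}/{counts.total} processed, {counts.failed} failed, "
        f"{counts.detections} detections, {counts.classifications} classifications"
    )


counts = JobCounts(total=10, processed=10, failed=3, detections=21, classifications=18)
print(final_state(counts).value, "-", summary(counts))
```

In this sketch the state depends only on whether processing finished, so no threshold or per-stage status has to be passed along; the failure count is reported next to the state instead of being folded into it.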
Related
- PR #1121, "PSv2: Track and display image count progress and state"
- ami/jobs/tasks.py: _update_job_progress, failure ratio logic
- ami/jobs/models.py: Job model, status field choices