Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline retries #908

Merged
merged 5 commits into from
Mar 24, 2025
Merged

Pipeline retries #908

merged 5 commits into from
Mar 24, 2025

Conversation

bkorycki
Copy link
Contributor

The pipeline runner (which used by run-job and run-csv-items) now will keep retrying calls to SUTs and annotators until it gets a successful response. It logs exceptions and retry counts so that a user can manually stop the command if they see that it is retrying an impossible exception a million times.

@bkorycki bkorycki requested review from wpietri and rogthefrog March 21, 2025 19:31
@bkorycki bkorycki requested a review from a team as a code owner March 21, 2025 19:31
Copy link

github-actions bot commented Mar 21, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Copy link
Contributor

@wpietri wpietri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

response = annotator.annotate(request)
break
except Exception as e:
logger.warning(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a pause before the retry?

response = sut.evaluate(request)
break
except Exception as e:
logger.warning(f"Exception calling SUT {sut.uid} on attempt {tries}: {e}\nRetrying.....", exc_info=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pause before retry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 10 seconds enough you think?

@@ -239,6 +242,22 @@ def test_annotator_worker_cache_different_annotators(annotators, tmp_path):
assert annotators["annotator_dict"].annotate_calls == 1


def test_annotator_worker_retries_until_success():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

@bkorycki bkorycki temporarily deployed to Scheduled Testing March 21, 2025 20:41 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 21, 2025 20:41 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 21, 2025 20:41 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 21, 2025 20:43 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 21, 2025 20:43 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 21, 2025 20:43 — with GitHub Actions Inactive
@bkorycki bkorycki merged commit f1d11af into main Mar 24, 2025
4 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 24, 2025
@bkorycki bkorycki deleted the pipeline-retries branch March 24, 2025 16:51
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants