Caches `should_stop` so we're not hammering Ray. #973

finbarrtimbers · 2025-08-30T05:01:38Z

As part of #859, I made a number of major changes. I'm breaking them out into separate PRs, partly to be a better software engineer and partly because there's a bug in #859 that I can't identify.

This PR caches the value of `should_stop` so we're not constantly hammering Ray.
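A minimal sketch of the caching idea follows. It is an illustration under stated assumptions, not the PR's implementation. A `concurrent.futures` thread pool stands in for Ray so the snippet runs standalone, and `fetch` plays the role of the remote actor call. The class name, `refresh_interval_s`, and `timeout_s` are hypothetical. The two attribute names are borrowed from the diff excerpt in the review below.

```python
import concurrent.futures
import time
from typing import Callable


class CachedShouldStop:
    """Caches a remotely computed stop flag and refreshes it at most once per interval.

    Hypothetical stand-in for the PR's logic: `fetch` plays the role of the Ray
    actor call, and a thread pool replaces Ray object refs.
    """

    def __init__(
        self,
        fetch: Callable[[], bool],
        refresh_interval_s: float = 1.0,  # hypothetical value
        timeout_s: float = 0.1,  # hypothetical value
        clock: Callable[[], float] = time.perf_counter,
    ) -> None:
        self._fetch = fetch
        self._refresh_interval_s = refresh_interval_s
        self._timeout_s = timeout_s
        self._clock = clock
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self._should_stop_value = False
        self._last_should_stop_update = float("-inf")

    def should_stop(self) -> bool:
        # Serve the cached value while it is still fresh: no remote call at all.
        if self._clock() - self._last_should_stop_update < self._refresh_interval_s:
            return self._should_stop_value
        future = self._executor.submit(self._fetch)
        done, _ = concurrent.futures.wait([future], timeout=self._timeout_s)
        if done:
            self._should_stop_value = future.result()
            self._last_should_stop_update = self._clock()
        else:
            # Best-effort cancel (analogous to ray.cancel()). A query that is
            # already running keeps running, but its result is ignored and the
            # last known value is returned instead.
            future.cancel()
        return self._should_stop_value
```

In this sketch, the timestamp is updated only when a query succeeds. A timed-out query is therefore retried on the next call instead of waiting a full refresh interval.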

Single-GPU run: Beaker.

hamishivi

Looks good to me; I gave it a quick, small test. One comment about adding some logging for an edge case.

hamishivi · 2025-08-30T18:27:53Z

open_instruct/vllm_utils3.py

```diff
+                self._should_stop_value = ray.get(ready_refs[0])
+                self._last_should_stop_update = time.perf_counter()
+            else:
+                ray.cancel(should_stop_ref)
```


Do you think it's worth adding some logging here? If the actor manager doesn't return at all or times out on this call, it seems like a sign that something is off.
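One way to act on this suggestion is sketched below, under stated assumptions: log a warning whenever the query times out, and keep the cached value. `poll_should_stop` is a hypothetical helper, not code from the PR. A `concurrent.futures.Future` stands in for a Ray object ref so the snippet runs without Ray.

```python
import concurrent.futures
import logging

logger = logging.getLogger(__name__)


def poll_should_stop(future: concurrent.futures.Future, last_value: bool, timeout_s: float) -> bool:
    """Hypothetical helper: return a fresh value if the query finishes in time.

    On timeout, log a warning (the reviewer's suggestion) and fall back to the
    last cached value.
    """
    done, _ = concurrent.futures.wait([future], timeout=timeout_s)
    if done:
        return future.result()
    future.cancel()  # best-effort, analogous to ray.cancel()
    logger.warning(
        "should_stop query did not return within %.3fs; keeping cached value %s",
        timeout_s,
        last_value,
    )
    return last_value
```

Logging at `warning` level keeps routine cache hits quiet while still surfacing a stuck or slow actor.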

finbarrtimbers added 4 commits August 29, 2025 23:00

Now, caches should_stop value.

87556b4

Merge branch 'main' into cache-stop

79bda33

Updated code to maintain last value on timeout.

ae8a4d8

Cleaned up conditional.

1b87348

finbarrtimbers requested a review from hamishivi August 30, 2025 05:10

finbarrtimbers marked this pull request as ready for review August 30, 2025 05:10

finbarrtimbers enabled auto-merge August 30, 2025 05:10

hamishivi approved these changes Aug 30, 2025

finbarrtimbers added this pull request to the merge queue Aug 30, 2025

Merged via the queue into main with commit cf92137 Aug 30, 2025
3 checks passed

finbarrtimbers deleted the cache-stop branch September 3, 2025 12:47

3 participants