Skip to content

Conversation

@finbarrtimbers
Copy link
Collaborator

@finbarrtimbers finbarrtimbers commented Aug 30, 2025

As part of #859, I made a number of major changes. To be a better software engineer, I'm breaking them out into separate PRs (and also because there's a bug in #859 that I can't identify).

This caches the value of should_stop so we're not constantly hammering Ray.

Single GPU run: Beaker.

@finbarrtimbers finbarrtimbers marked this pull request as ready for review August 30, 2025 05:10
Copy link
Collaborator

@hamishivi hamishivi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, gave it a quick small test. One comment about some logging on a weird case.

self._should_stop_value = ray.get(ready_refs[0])
self._last_should_stop_update = time.perf_counter()
else:
ray.cancel(should_stop_ref)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think its worth adding some logging here? It seems like if the actor manager doesn't return at all / times out for this call its a sign something is a bit off.

@finbarrtimbers finbarrtimbers added this pull request to the merge queue Aug 30, 2025
Merged via the queue into main with commit cf92137 Aug 30, 2025
3 checks passed
@finbarrtimbers finbarrtimbers deleted the cache-stop branch September 3, 2025 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants