Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shortfin assumes that function results do not alias outside allocations #980

Open
stellaraccident opened this issue Feb 19, 2025 · 0 comments

Comments

@stellaraccident
Copy link
Contributor

stellaraccident commented Feb 19, 2025

We never created a calling convention to differentiate between function results that may alias a global allocation and those that can be assumed uniquely transferred to the caller. The reason this matters is that when receiving a result of unknown providence, the caller needs to be able to decide whether it should do stream ordered deallocation scheduling or just rely on global refcount based destruction. Making the wrong decision can have very adverse outcomes:

  1. If taking ownership and scheduling async deallocation on globally held buffers, this can result in prematurely freed resources.
  2. If failing to take ownership and falling back on global deallocation, this comes at very large performance penalties on most backends as they have to adopt very conservative, stop-the-world deallocation behavior.

While we would ideally like to be conservatively correct, the performance implications are too great to ignore, especially considering that we have no example or real programs that violate the aliasing rule. While I can imagine them, with current frontends, it is not even straight-forward to construct such scenarios.

There is an override that can be used by the programmer when they know aliasing is possible:

For initial patch landing, the assume_no_alias flag will be on, forcing aggressive behavior. It can be enabled/disabled at the process or invocation level.

Per invocation:

inv = await some_function(foo, bar)
inv.assume_no_alias = False
(my_result,) = inv

Globally:

import shortfin as sf
sf.ProgramInvocation.global_assume_no_alias = False

Keeping this issue as documentation and to track complete resolution.

stellaraccident added a commit that referenced this issue Feb 19, 2025
* Device allocations are now async, queue ordered alloc/dealloc.
* Program invocations asynchronously deallocate function call results if
it can. If it ever cannot, then a small tracy zone
`SyncImportTimelineResource` will be emitted per result that cannot be
async deallocated.
* Adds `ProgramInvocation.assume_no_alias` instance boolean to disable
the assumption which allows async deallocation to work.
* Adds global `ProgramIncovation.global_no_alias` property to control
process-wide.

This is a very fiddly optimization which requires (esp in multi-device
cases) a number of things to line up. Tested on amdgpu and CPU with a
number of sample workloads (with logging enabled and visually
confirmed).

See #980 for detailed analysis and further work required.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant