We never created a calling convention to differentiate between function results that may alias a global allocation and those that can be assumed uniquely transferred to the caller. This matters because, when receiving a result of unknown provenance, the caller needs to decide whether to do stream-ordered deallocation scheduling or to rely on global refcount-based destruction. Making the wrong decision can have very adverse outcomes (see the sketch after the list):
* Taking ownership and scheduling async deallocation on a buffer that is actually globally held can result in prematurely freed resources (a use-after-free).
* Failing to take ownership and falling back on global deallocation carries a very large performance penalty on most backends, since they must adopt very conservative, stop-the-world deallocation behavior.
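To make the trade-off concrete, here is a minimal sketch of the caller-side decision. All names in it (`release_result`, `schedule_dealloc`, `release_ref`, the semaphore argument) are hypothetical stand-ins, not the actual runtime API:

```python
def release_result(queue, buffer, last_use_semaphore, *, assume_no_alias: bool):
    # Sketch only: `queue.schedule_dealloc` and `buffer.release_ref` stand in
    # for the runtime's real entry points.
    if assume_no_alias:
        # Aggressive path: the free is scheduled on the device queue, ordered
        # after the buffer's last use. If the result secretly aliases a
        # globally held allocation, that owner may still be using the memory
        # when the queued free executes: a premature free.
        queue.schedule_dealloc(buffer, wait_for=last_use_semaphore)
    else:
        # Conservative path: drop this reference and let global refcounting
        # destroy the buffer. Always safe, but on most backends it forces
        # stop-the-world deallocation behavior.
        buffer.release_ref()
```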
While we would ideally like to be conservatively correct, the performance implications are too great to ignore, especially considering that we have no examples of real programs that violate the aliasing rule. While I can imagine them, it is not even straightforward to construct such scenarios with current frontends.
There is an override that can be used by the programmer when they know aliasing is possible. For the initial patch landing, the `assume_no_alias` flag will be on, forcing the aggressive behavior. It can be enabled or disabled at the process or invocation level:
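Both knobs are sketched below using the attribute names from the change summary that follows; the surrounding objects and any import are hypothetical.

Per invocation:

```python
# Hypothetical usage: `invocation` is a ProgramInvocation instance for one
# call. Turning the flag off marks this call's results as possibly aliasing
# a global allocation, opting them out of aggressive async deallocation.
invocation.assume_no_alias = False
```

Globally:

```python
# Hypothetical usage: flips the process-wide default for all invocations.
ProgramInvocation.global_no_alias = False
```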
* Device allocations are now async, queue-ordered alloc/dealloc.
* Program invocations asynchronously deallocate function call results
  when they can. When they cannot, a small tracy zone
  `SyncImportTimelineResource` is emitted per result that cannot be
  async deallocated.
* Adds the `ProgramInvocation.assume_no_alias` instance boolean to
  disable the assumption that allows async deallocation to work.
* Adds a global `ProgramInvocation.global_no_alias` property to control
  this process-wide.
This is a very fiddly optimization which requires (especially in multi-device cases) a number of things to line up. Tested on amdgpu and CPU with a number of sample workloads (with logging enabled and visually confirmed).
See #980 for detailed analysis and further work required.
Keeping this issue open as documentation and to track complete resolution.