Define garbage collection rooting APIs #8011
Part of #5032
Subscribe to Label Action
cc @fitzgen, @peterhuene
This issue or pull request has been labeled: "cranelift", "fuzzing", "wasmtime:api", "wasmtime:c-api", "wasmtime:ref-types"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the
At a high level this all looks good to me. I think there's work we can do to build confidence in the internals, though. While I realize we can't remove all `unsafe` here, I suspect we don't need quite so many just-a-typed-`transmute` functions. It's really difficult to reason about their safety.
Additionally, I think this is definitely an area where we're going to be leaning on miri pretty heavily. Can you ensure that there are tests that run in miri exercising all the various bits and bobs of the API, other than actually calling into wasm?
}

#[cfg(feature = "gc")]
unsafe impl WasmTy for ManuallyRooted<ExternRef> {
For a future PR, and with more GC types, I think it'd be reasonable to move these impls to the `gc_ref.rs` module to avoid the `#[cfg]` here.
That's a good idea.
@alexcrichton I think I've addressed everything in your review. Going to start rebasing on
Rooting prevents GC objects from being collected while they are actively being used.

We have a few sometimes-conflicting goals with our GC rooting APIs:

1. Safety: It should never be possible to get a use-after-free bug because the user misused the rooting APIs, the collector "mistakenly" determined an object was unreachable and collected it, and then the user tried to access the object. This is our highest priority.
2. Moving GC: Our rooting APIs should support moving collectors (such as generational and compacting collectors) where an object might get relocated after a collection and we need to update the GC root's pointer to the moved object. This means we either need cooperation and internal mutability from individual GC roots as well as the ability to enumerate all GC roots on the native Rust stack, or we need a level of indirection.
3. Performance: Our rooting APIs should generally be as low-overhead as possible. They definitely shouldn't require synchronization and locking to create, access, and drop GC roots.
4. Ergonomics: Our rooting APIs should be, if not a pleasure, then at least not a burden for users. Additionally, the API's types should be `Sync` and `Send` so that they work well with async Rust.

For example, goals (3) and (4) are in conflict when we think about how to support (2). Ideally, for ergonomics, a root would automatically unroot itself when dropped. But in the general case that requires holding a reference to the store's root set, and that root set needs to be held simultaneously by all GC roots, and they each need to mutate the set to unroot themselves. That implies `Rc<RefCell<...>>` or `Arc<Mutex<...>>`! The former makes the store and GC root types not `Send` and not `Sync`. The latter imposes synchronization and locking overhead. So we instead make GC roots indirect and require passing in a store context explicitly to unroot in the general case. This trades worse ergonomics for better performance and support for moving GC and async Rust.
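To make the trade-off above concrete, here is a minimal sketch of a root that is nothing but an index into a store-owned root set. All names (`Store`, `GcRef`, `Root`, `unroot`, `live_roots`, `assert_send_sync`) are hypothetical and are not wasmtime's actual API. The point is only that such a handle is `Send + Sync` without any `Rc<RefCell<...>>` or `Arc<Mutex<...>>`, at the cost of passing the store in explicitly to unroot.

```rust
use std::marker::PhantomData;

/// Hypothetical raw GC pointer; only the store's root set ever holds these.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct GcRef(u32);

/// Hypothetical store. `&mut Store` stands for mutable access to the GC heap.
#[derive(Default)]
pub struct Store {
    /// The root set: `None` marks a slot that has been unrooted.
    root_set: Vec<Option<GcRef>>,
}

/// A root is only an index into the store's root set. It holds no `Rc`,
/// `RefCell`, or `Mutex`, so it is `Send + Sync` and needs no locking.
#[derive(Debug)]
pub struct Root<T> {
    index: usize,
    // `fn() -> T` keeps `Root<T>` `Send + Sync` regardless of `T`.
    _marker: PhantomData<fn() -> T>,
}

impl Store {
    /// Roots `gc_ref` by appending it to the root set.
    pub fn root<T>(&mut self, gc_ref: GcRef) -> Root<T> {
        self.root_set.push(Some(gc_ref));
        Root { index: self.root_set.len() - 1, _marker: PhantomData }
    }

    /// Accessing a rooted object also goes through the store.
    pub fn get<T>(&self, root: &Root<T>) -> Option<GcRef> {
        self.root_set.get(root.index).copied().flatten()
    }

    /// Number of slots that are still rooted.
    pub fn live_roots(&self) -> usize {
        self.root_set.iter().filter(|slot| slot.is_some()).count()
    }
}

impl<T> Root<T> {
    /// The root cannot reach the root set on its own, so the caller must
    /// pass the store in explicitly to unroot it.
    pub fn unroot(self, store: &mut Store) {
        store.root_set[self.index] = None;
    }
}

/// Compiles only if `T` is `Send + Sync`.
pub fn assert_send_sync<T: Send + Sync>() {}
```

Dropping a `Root` without calling `unroot` leaves its slot occupied for the rest of the store's lifetime, which mirrors the leak described for `ManuallyRooted<T>`. This sketch never reuses freed slots; a real implementation would presumably keep a free list.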
Okay, with that out of the way, this module provides two flavors of rooting API: one for the common, scoped-lifetime case, and another for the rare case where we really need a GC root with an arbitrary, non-LIFO/non-scoped lifetime:

1. `RootScope` and `Rooted<T>`: These are used for temporarily rooting GC objects for the duration of a scope. Upon exiting the scope, they are automatically unrooted. The internal implementation takes advantage of the LIFO property inherent in scopes, making creating and dropping `Rooted<T>`s and `RootScope`s super fast and roughly equivalent to bump allocation.

   This type is vaguely similar to V8's [`HandleScope`].

   [`HandleScope`]: https://v8.github.io/api/head/classv8_1_1HandleScope.html

   Note that `Rooted<T>` can't be statically tied to its context scope via a lifetime parameter, unfortunately, as that would allow the creation and use of only one `Rooted<T>` at a time, since the `Rooted<T>` would take a borrow of the whole context.

   This supports the common use case for rooting and provides good ergonomics.

2. `ManuallyRooted<T>`: This is the fully general rooting API used for holding onto non-LIFO GC roots with arbitrary lifetimes. However, users must manually unroot them. Failure to manually unroot a `ManuallyRooted<T>` before it is dropped will result in the GC object (and everything it transitively references) leaking for the duration of the `Store`'s lifetime.

   This type is roughly similar to SpiderMonkey's [`PersistentRooted<T>`], although they avoid the manual unrooting with internal mutation and shared references. (Our constraints mean we can't do those things, as explained above.)

   [`PersistentRooted<T>`]: http://devdoc.net/web/developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/JSAPI_reference/JS::PersistentRooted.html

At the end of the day, both `Rooted<T>` and `ManuallyRooted<T>` are just tagged indices into the store's `RootSet`. This indirection allows working with Rust's borrowing discipline (we use `&mut Store` to represent mutable access to the GC heap) while still allowing rooted references to be moved around without tying up the whole store in borrows. Additionally, and crucially, this indirection allows us to update the *actual* GC pointers in the `RootSet` and support moving GCs (again, as mentioned above).
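The two properties claimed for scoped roots, that they are cheap because they follow LIFO order and that tagged indices let a moving collector update the real pointers, can be illustrated together in a small sketch. Everything in it (`Store`, `GcRef`, `LifoRoot`, `Store::scope`, `Store::collect`) is hypothetical: the scope is a closure rather than an RAII `RootScope` guard, objects are bare `u64` payloads with no outgoing references, and the collector is a toy compactor. It is not wasmtime's implementation.

```rust
use std::marker::PhantomData;

/// Hypothetical raw GC pointer: an index into the heap vector.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct GcRef(usize);

/// One root-set entry, tagged with the scope generation it was pushed in.
#[derive(Clone, Copy)]
struct LifoRoot {
    generation: u64,
    gc_ref: GcRef,
}

/// Hypothetical store: a heap of payloads plus the LIFO root set.
#[derive(Default)]
pub struct Store {
    heap: Vec<u64>,
    lifo_roots: Vec<LifoRoot>,
    lifo_generation: u64,
}

/// What the embedder holds: a tagged index into the root set, never a raw `GcRef`.
pub struct Rooted<T> {
    index: usize,
    generation: u64,
    _marker: PhantomData<fn() -> T>,
}

// Implemented by hand so `Rooted<T>` is `Copy` without requiring `T: Copy`.
impl<T> Clone for Rooted<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Rooted<T> {}

impl Store {
    /// Allocates an object and roots it in the current scope.
    pub fn alloc<T>(&mut self, payload: u64) -> Rooted<T> {
        self.heap.push(payload);
        let gc_ref = GcRef(self.heap.len() - 1);
        let generation = self.lifo_generation;
        self.lifo_roots.push(LifoRoot { generation, gc_ref });
        Rooted { index: self.lifo_roots.len() - 1, generation, _marker: PhantomData }
    }

    /// Runs `f` in a new root scope. On exit, every root pushed inside it is
    /// popped with a single truncation (the bump-allocation-like fast path),
    /// and the generation is bumped so escaped handles no longer match.
    pub fn scope<R>(&mut self, f: impl FnOnce(&mut Store) -> R) -> R {
        let saved_len = self.lifo_roots.len();
        let result = f(self);
        self.lifo_roots.truncate(saved_len);
        self.lifo_generation += 1;
        result
    }

    /// Resolves a handle through the root set; `None` if its scope has exited.
    pub fn get<T>(&self, rooted: Rooted<T>) -> Option<u64> {
        match self.lifo_roots.get(rooted.index) {
            Some(root) if root.generation == rooted.generation => Some(self.heap[root.gc_ref.0]),
            _ => None,
        }
    }

    /// Toy compacting collection: rooted objects are moved to the front of a
    /// fresh heap and the root-set entries are rewritten in place. Objects
    /// that are not rooted are not copied, i.e. they are freed.
    pub fn collect(&mut self) {
        let mut new_heap = Vec::new();
        let mut forwarding: Vec<Option<GcRef>> = vec![None; self.heap.len()];
        for root in self.lifo_roots.iter_mut() {
            let old = root.gc_ref;
            let new = match forwarding[old.0] {
                Some(new) => new, // Already moved: follow the forwarding entry.
                None => {
                    new_heap.push(self.heap[old.0]);
                    let new = GcRef(new_heap.len() - 1);
                    forwarding[old.0] = Some(new);
                    new
                }
            };
            root.gc_ref = new;
        }
        self.heap = new_heap;
    }
}
```

Because `Rooted<T>` is a `Copy` tagged index rather than a borrow of the store, any number of handles can be live while code holds `&mut Store`, and `collect` can relocate objects without finding or patching anything on the native Rust stack.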
Remove some transmute methods, assert that `VMExternRef`s are the only valid `VMGcRef`, etc.
Until we can add proper GC rooting.
Handling the fallout for this in the wasmtime-cpp repository, I think that the requirement to take a … Anyway, that's a long way of asking: do you think we're going to indefinitely want to have a context argument on these functions into the future, even with the idea of an indexed heap?
Yes, unfortunately, I do. At least in Rust. I could see a C/C++-specific approach to alleviating this, however... Indexed GC heaps are orthogonal to the rooting approach here. In order to support moving GCs, we need to either:
Option (2) is a pretty big constraint on GC implementations. We would have to really bend over backwards to support that in our planned copying collector. So focusing on option (1), there are basically two approaches available to us for implementing rooting APIs:
So, if we wanted to make the C/C++ API a little more ergonomic, we could support the intrusive list option. Ideally not in the Rust embedder API, but maybe with some
Ok nah, that all sounds good. No need to go the intrusive list route just yet; I think it's ok if things are slightly less ergonomic in the C++ bindings API, unless a C++ wizard more adept than I can figure out a better solution.
The function you're calling takes a … You'll need to use a move iterator to force moving out of the container; that's one of the annoying bits about C++ iterators. You could use ranges/views to make this more seamless. As for dropping the copy constructor/assignment, I think that's fine, but it can be useful to have an explicit copy function that takes a context.
Makes sense! In bytecodealliance/wasmtime-cpp#48 I ended up adding overloaded versions for
Added a suggestion here: https://github.com/bytecodealliance/wasmtime-cpp/pull/48/files#r1525490495
* Update to recent Wasmtime C API changes regarding values.

  This includes updates for:
  - bytecodealliance/wasmtime#8451
  - bytecodealliance/wasmtime#8461
  - bytecodealliance/wasmtime#8011

  TODOs:
  - Allocating an `externref` can now fail (by `wasmtime_externref_new` returning `false`). Currently, we throw a `WasmtimeException` in that case. We need to check where that exception can be thrown, and whether we need to do any additional clean-up (e.g. when converting arguments for a function call).
  - Check whether it's ok to compare the `__private` field of externs (which has been renamed in the C API; previously it was `index`).
  - `anyref` type is not yet supported, but I'm not sure what exactly it is and whether we need to add it.

  Fixes #315

* Follow-Up: Make fields private.
* Ensure that `Value` instances are cleaned up (in arguments for a function call, and in results for an untyped callback) when, e.g., allocating an `externref` fails. We don't need to do such a clean-up for unchecked function calls that use `ValueRaw` because in that case we don't own `externref` values.
* Avoid accessing the `__private` fields in tests by checking the whole struct for equality (which holds when all members are equal).
* Use separate dictionaries for caching `Function`, `Memory`, and `Global` objects in the `Store`, which avoids having to explicitly access the `__private` field (because the whole struct is now compared). Additionally, it is more type-safe (since we don't need to cast the `object`).