From 3927d56ef25b13bb8554f623b1666af7720e05ed Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Wed, 4 Feb 2015 16:11:01 +0100 Subject: [PATCH 01/11] initial draft --- text/0000-box-and-in-for-stdlib.md | 520 +++++++++++++++++++++++++++++ 1 file changed, 520 insertions(+) create mode 100644 text/0000-box-and-in-for-stdlib.md diff --git a/text/0000-box-and-in-for-stdlib.md b/text/0000-box-and-in-for-stdlib.md new file mode 100644 index 00000000000..fe91171e2fd --- /dev/null +++ b/text/0000-box-and-in-for-stdlib.md @@ -0,0 +1,520 @@ +- Start Date: 2015-02-04 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary + + * Change placement-new syntax from: `box () ` instead + to: `in () `. + + * Change `box ` to an overloaded operator that chooses its + implementation based on the expected type. + + * Use unstable traits in `core::ops` for both operators, so that + libstd can provide support for the overloaded operators; the + traits are unstable so that the language designers are free to + revise the underlying protocol in the future post 1.0. + +# Motivation + +Goal 1: We want to support an operation analogous to C++'s placement +new, as discussed previously in [Placement Box RFC PR 470]. + +[Placement Box RFC PR 470]: https://github.com/rust-lang/rfcs/pull/470 + +Goal 2: We also would like to overload our `box` syntax so that more +types, such as `Rc` and `Arc` can gain the benefit of avoiding +intermediate copies (i.e. allowing expressions to install their result +value directly into the backing storage of the `Rc` or `Arc` +when it is created). + +However, during discussion of [Placement Box RFC PR 470], some things +became clear: + + * The syntax `in () ` is superior to `box () + ` for the operation analogous to placement-new. + + The proposed `in`-based syntax avoids ambiguities such as having + to write `box () ()` (or `box (alloc::HEAP) ()`) when + one wants to surround `` with parentheses. 
It allows the + parser to provide clearer error messages if a user accidentally + writes `in `. + + * It would be premature for Rust to commit to any particular + protocol for supporting placement-`in`. A number of participants in + the discussion of [Placement Box RFC PR 470] were unhappy with the + baroque protocol, especially since it did not support DST and + potential future language changes would allow the protocol + proposed there to be significantly simplified. + +Therefore, this RFC proposes a middle ground for 1.0: Support the +desired syntax, but do not provide stable support for end-user +implementations of the operators. The only stable ways to use the +overloaded `box ` or `in () ` operators will be in +tandem with types provided by the stdlib, such as `Box`. + +# Detailed design + +* Add traits to `core::ops` for supporting the new operators. + This RFC does not commit to any particular set of traits, + since they are not currently meant to be implemented outside + of the stdlib. (However, a demonstration of one working set + of traits is given in [Appendix A].) + + Any protocol that we adopt for the operators needs to properly + handle panics; i.e., `box ` must properly cleanup any + intermediate state if `` panics during its evaluation, + and likewise for `in () ` + + (See [Placement Box RFC PR 470] or [Appendix A] for discussion on + ways to accomplish this.) + +* Change `box ` from built-in syntax (tightly integrated with + `Box`) into an overloaded-`box` operator that uses the expected + return type to decide what kind of value to create. For example, if + `Rc` is extended with an implementation of the appropriate + operator trait, then + + ```rust + let x: Rc<_> = box format!("Hello"); + ``` + + could be a legal way to create an `Rc` without having to + invoke the `Rc::new` function. This will be more efficient for + building instances of `Rc` when `T` is a large type. 
(It is also + arguably much cleaner syntax to read, regardless of the type `T`.) + + Note that this change will require end-user code to no longer assume + that `box ` always produces a `Box`; such code will need to + either add a type annotation e.g. saying `Box<_>`, or will need to + call `Box::new()` instead of using `box `. + +* Add support for parsing `in () ` as the basis for the + placement operator. + + Remove support for `box () ` from the parser. + + Make `in () ` an overloaded operator that uses + the `` to determine what placement code to run. + +* The only stablized implementation for the `box ` operator + proposed by this RFC is `Box`. The question of which other types + should support integration with `box ` is a library design + issue and needs to go through the conventions and library + stabilization process. + + Similarly, this RFC does not propose *any* stablized implementation + for the `in () ` operator. (An obvious candidate for + `in () ` integration would be a `Vec::emplace_back` + method; but again, the choice of which such methods to add is a + library design issue, beyond the scope of this RFC.) + + (A sample implementation illustrating how to support the operators + on other types is given in [Appendix A].) + +# Drawbacks + +* End-users might be annoyed that they cannot add implementations of + the overloaded-`box` and placement-`in` operators themselves. But + such users who want to do such a thing will probably be using the + nightly release channel, which will not have the same stability + restrictions. + +# Alternatives + +* We could keep the `box () ` syntax. It is hard + to see what the advantage of that is, unless (1.) we can identify + many cases of types that benefit from supporting both + overloaded-`box` and placement-`in`, or unless (2.) we anticipate + some integration with `box` pattern syntax that would motivate using + the `box` keyword for placement. + +* Do nothing. I.e. 
do not even accept an unstable libstd-only protocol + for placement-`in` and overloaded-`box`. This would be okay, but + unfortunate, since in the past some users have identified + intermediate copies to be a source of inefficiency, and proper use + of `box ` and placement-`in` can help remove intermediate + copies. + +# Unresolved questions + +None + +# Appendices + +## Appendix A: sample operator traits +[Appendix A]: #appendix-a-sample-operator-traits + +The goal is to show that code like the following can be made to work +in Rust today via appropriate desugarings and trait definitions. + +```rust +fn main() { + use std::rc::Rc; + + let mut v = vec![1,2]; + in (v.emplace_back()) 3; // has return type `()` + println!("v: {:?}", v); // prints [1,2,3] + + let b4: Box = box 4; + println!("b4: {}", b4); + + let b5: Rc = box 5; + println!("b5: {}", b5); + + let b6 = in (HEAP) 6; // return type Box + println!("b6: {}", b6); +} +``` + +To demonstrate the above, this appendix provides code that runs today; +it demonstrates sample protocols for the proposed operators. +(The entire code-block below should work when e.g. cut-and-pasted into +http://play.rust-lang.org ) + +```rust +#![feature(unsafe_destructor)] // (hopefully unnecessary soon with RFC PR 769) +#![feature(alloc)] + +// The easiest way to illustrate the desugaring is by implementing +// it with macros. So, we will use the macro `in_` for placement-`in` +// and the macro `box_` for overloaded-`box`; you should read +// `in_!( () )` as if it were `in () ` +// and +// `box_!( )` as if it were `box `. + +// The two macros have been designed to both 1. work with current Rust +// syntax (which in some cases meant avoiding certain associated-item +// syntax that currently causes the compiler to ICE) and 2. infer the +// appropriate code to run based only on either `` (for +// placement-`in`) or on the expected result type (for +// overloaded-`box`). + +macro_rules!
in_ { + (($placer:expr) $value:expr) => { { + let p = $placer; + let mut place = ::protocol::Placer::make_place(p); + let raw_place = ::protocol::Place::pointer(&mut place); + let value = $value; + unsafe { + ::std::ptr::write(raw_place, value); + ::protocol::InPlace::finalize(place) + } + } } +} + +macro_rules! box_ { + ($value:expr) => { { + let mut place = ::protocol::BoxPlace::make_place(); + let raw_place = ::protocol::Place::pointer(&mut place); + let value = $value; + unsafe { + ::std::ptr::write(raw_place, value); + ::protocol::Boxed::finalize(place) + } + } } +} + +// Note that while both desugarings are very similar, there are some +// slight differences. In particular, the placement-`in` desugaring +// uses `InPlace::finalize(place)`, which is a `finalize` method that +// is overloaded based on the `place` argument (the type of which is +// derived from the `` input); on the other hand, the +// overloaded-`box` desugaring uses `Boxed::finalize(place)`, which is +// a `finalize` method that is overloaded based on the expected return +// type. Thus, the determination of which `finalize` method to call is +// derived from different sources in the two desugarings. + +// The above desugarings refer to traits in a `protocol` module; these +// are the traits that would be put into `std::ops`, and are given +// below. + +mod protocol { + +/// Both `in (PLACE) EXPR` and `box EXPR` desugar into expressions +/// that allocate an intermediate "place" that holds uninitialized +/// state. The desugaring evaluates EXPR, and writes the result at +/// the address returned by the `pointer` method of this trait. +/// +/// A `Place` can be thought of as a special representation for a +/// hypothetical `&uninit` reference (which Rust cannot currently +/// express directly). That is, it represents a pointer to +/// uninitialized storage. +/// +/// The client is responsible for two steps: First, initializing the +/// payload (it can access its address via `pointer`). 
Second, +/// converting the agent to an instance of the owning pointer, via the +/// appropriate `finalize` method (see the `InPlace` and `Boxed` traits). +/// +/// If evaluating EXPR fails, then the destructor for the +/// implementation of `Place` must clean up any intermediate state +/// (e.g. deallocate box storage, pop a stack, etc). +pub trait Place { + /// Returns the address where the input value will be written. + /// Note that the data at this address is generally uninitialized, + /// and thus one should use `ptr::write` for initializing it. + fn pointer(&mut self) -> *mut Data; +} + +/// Interface to implementations of `in (PLACE) EXPR`. +/// +/// `in (PLACE) EXPR` effectively desugars into: +/// +/// ``` +/// let p = PLACE; +/// let mut place = Placer::make_place(p); +/// let raw_place = Place::pointer(&mut place); +/// let value = EXPR; +/// unsafe { +/// std::ptr::write(raw_place, value); +/// InPlace::finalize(place) +/// } +/// ``` +/// +/// The type of `in (PLACE) EXPR` is derived from the type of `PLACE`; +/// if the type of `PLACE` is `P`, then the final type of the whole +/// expression is `P::Place::Owner` (see the `InPlace` and `Boxed` +/// traits). +/// +/// Values for types implementing this trait usually are transient +/// intermediate values (e.g. the return value of `Vec::emplace_back`) +/// or `Copy`, since the `make_place` method takes `self` by value. +pub trait Placer { + /// `Place` is the intermediate agent guarding the + /// uninitialized state for `Data`. + type Place: InPlace; + + /// Creates a fresh place from `self`. + fn make_place(self) -> Self::Place; +} + +/// Specialization of `Place` trait supporting `in (PLACE) EXPR`. +pub trait InPlace: Place { + /// `Owner` is the type of the end value of `in (PLACE) EXPR` + /// + /// Note that when `in (PLACE) EXPR` is solely used for + /// side-effecting an existing data-structure, + /// e.g. `Vec::emplace_back`, then `Owner` need not carry any + /// information at all (e.g.
it can be the unit type `()` in that + /// case). + type Owner; + + /// Converts self into the final value, shifting + /// deallocation/cleanup responsibilities (if any remain), over to + /// the returned instance of `Owner` and forgetting self. + unsafe fn finalize(self) -> Self::Owner; +} + +/// Core trait for the `box EXPR` form. +/// +/// `box EXPR` effectively desugars into: +/// +/// ``` +/// let mut place = BoxPlace::make_place(); +/// let raw_place = Place::pointer(&mut place); +/// let value = EXPR; +/// unsafe { +/// ::std::ptr::write(raw_place, value); +/// Boxed::finalize(place) +/// } +/// ``` +/// +/// The type of `box EXPR` is supplied from its surrounding +/// context; in the above expansion, the result type `T` is used +/// to determine which implementation of `Boxed` to use, and that +/// `` in turn determines which +/// implementation of `BoxPlace` to use, namely: +/// `<::Place as BoxPlace>`. +pub trait Boxed { + /// The kind of data that is stored in this kind of box. + type Data; /* (`Data` unused b/c cannot yet express below bound.) */ + type Place; /* should be bounded by BoxPlace */ + + /// Converts filled place into final owning value, shifting + /// deallocation/cleanup responsibilities (if any remain), over to + /// returned instance of `Self` and forgetting `filled`. + unsafe fn finalize(filled: Self::Place) -> Self; +} + +/// Specialization of `Place` trait supporting `box EXPR`. +pub trait BoxPlace : Place { + /// Creates a globally fresh place. + fn make_place() -> Self; +} + +} // end of `mod protocol` + +// Next, we need to see sample implementations of these traits. +// First, `Box` needs to support overloaded-`box`: (Note that this +// is not the desired end implementation; e.g. the `BoxPlace` +// representation here is less efficient than it could be. This is +// just meant to illustrate that an implementation *can* be made; +// i.e. that the overloading *works*.)
+// +// Also, just for kicks, I am throwing in `in (HEAP) ` support, +// though I do not think that needs to be part of the stable libstd. + +struct HEAP; + +mod impl_box_for_box { + use protocol as proto; + use std::mem; + use super::HEAP; + + struct BoxPlace { fake_box: Option> } + + fn make_place() -> BoxPlace { + let t: T = unsafe { mem::zeroed() }; + BoxPlace { fake_box: Some(Box::new(t)) } + } + + unsafe fn finalize(mut filled: BoxPlace) -> Box { + let mut ret = None; + mem::swap(&mut filled.fake_box, &mut ret); + ret.unwrap() + } + + impl<'a, T> proto::Placer for HEAP { + type Place = BoxPlace; + fn make_place(self) -> BoxPlace { make_place() } + } + + impl proto::Place for BoxPlace { + fn pointer(&mut self) -> *mut T { + match self.fake_box { + Some(ref mut b) => &mut **b as *mut T, + None => panic!("impossible"), + } + } + } + + impl proto::BoxPlace for BoxPlace { + fn make_place() -> BoxPlace { make_place() } + } + + impl proto::InPlace for BoxPlace { + type Owner = Box; + unsafe fn finalize(self) -> Box { finalize(self) } + } + + impl proto::Boxed for Box { + type Data = T; + type Place = BoxPlace; + unsafe fn finalize(filled: BoxPlace) -> Self { finalize(filled) } + } +} + +// Second, it might be nice if `Rc` supported overloaded-`box`. +// +// (Note again that this may not be the most efficient implementation; +// it is just meant to illustrate that an implementation *can* be +// made; i.e. that the overloading *works*.) 
+ +mod impl_box_for_rc { + use protocol as proto; + use std::mem; + use std::rc::{self, Rc}; + + struct RcPlace { fake_box: Option> } + + impl proto::Place for RcPlace { + fn pointer(&mut self) -> *mut T { + if let Some(ref mut b) = self.fake_box { + if let Some(r) = rc::get_mut(b) { + return r as *mut T + } + } + panic!("impossible"); + } + } + + impl proto::BoxPlace for RcPlace { + fn make_place() -> RcPlace { + unsafe { + let t: T = mem::zeroed(); + RcPlace { fake_box: Some(Rc::new(t)) } + } + } + } + + impl proto::Boxed for Rc { + type Data = T; + type Place = RcPlace; + unsafe fn finalize(mut filled: RcPlace) -> Self { + let mut ret = None; + mem::swap(&mut filled.fake_box, &mut ret); + ret.unwrap() + } + } +} + +// Third, we want something to demonstrate placement-`in`. Let us use +// `Vec::emplace_back` for that: + +mod impl_in_for_vec_emplace_back { + use protocol as proto; + + use std::mem; + + struct VecPlacer<'a, T:'a> { v: &'a mut Vec } + struct VecPlace<'a, T:'a> { v: &'a mut Vec } + + pub trait EmplaceBack { fn emplace_back(&mut self) -> VecPlacer; } + + impl EmplaceBack for Vec { + fn emplace_back(&mut self) -> VecPlacer { VecPlacer { v: self } } + } + + impl<'a, T> proto::Placer for VecPlacer<'a, T> { + type Place = VecPlace<'a, T>; + fn make_place(self) -> VecPlace<'a, T> { VecPlace { v: self.v } } + } + + impl<'a, T> proto::Place for VecPlace<'a, T> { + fn pointer(&mut self) -> *mut T { + unsafe { + let idx = self.v.len(); + self.v.push(mem::zeroed()); + &mut self.v[idx] + } + } + } + impl<'a, T> proto::InPlace for VecPlace<'a, T> { + type Owner = (); + unsafe fn finalize(self) -> () { + mem::forget(self); + } + } + + #[unsafe_destructor] + impl<'a, T> Drop for VecPlace<'a, T> { + fn drop(&mut self) { + unsafe { + mem::forget(self.v.pop()) + } + } + } +} + +// Okay, that's enough for us to actually demonstrate the syntax! 
+// Here's our `fn main`: + +fn main() { + use std::rc::Rc; + // get hacked-in `emplace_back` into scope + use impl_in_for_vec_emplace_back::EmplaceBack; + + let mut v = vec![1,2]; + in_!( (v.emplace_back()) 3 ); + println!("v: {:?}", v); + + let b4: Box = box_!( 4 ); + println!("b4: {}", b4); + + let b5: Rc = box_!( 5 ); + println!("b5: {}", b5); + + let b6 = in_!( (HEAP) 6 ); // return type Box + println!("b6: {}", b6); +} +``` From a980f3d2b4fe110a34dde39b0fad9b9ee603d121 Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Thu, 5 Feb 2015 18:40:11 +0100 Subject: [PATCH 02/11] mention more about alternative syntax. --- text/0000-box-and-in-for-stdlib.md | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/text/0000-box-and-in-for-stdlib.md b/text/0000-box-and-in-for-stdlib.md index fe91171e2fd..23004660c2c 100644 --- a/text/0000-box-and-in-for-stdlib.md +++ b/text/0000-box-and-in-for-stdlib.md @@ -129,6 +129,26 @@ tandem with types provided by the stdlib, such as `Box`. some integration with `box` pattern syntax that would motivate using the `box` keyword for placement. +* A number of other syntaxes for placement have been proposed in the + past; see for example discussion on [RFC PR 405] as well as + [the previous placement RFC][RFC Surface Syntax Discussion]. + + The main constraints I want to meet are: + 1. Do not introduce ambiguity into the grammar for Rust + 2. Maintain left-to-right evaluation order (so the place should + appear to the left of the value expression in the text). + + But otherwise I am not particularly attached to any single + syntax. + + One particular alternative that might placate those who object + to placement-`in`'s `box`-free form would be: + `box (in ) `. + +[RFC PR 405]: https://github.com/rust-lang/rfcs/issues/405 + +[RFC Surface Syntax Discussion]: https://github.com/pnkfelix/rfcs/blob/fsk-placement-box-rfc/text/0000-placement-box.md#same-semantics-but-different-surface-syntax + * Do nothing. I.e. 
do not even accept an unstable libstd-only protocol for placement-`in` and overloaded-`box`. This would be okay, but unfortunate, since in the past some users have identified From c081b9381fa87e84df9e134b6cd1447c4db38bec Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Thu, 5 Feb 2015 18:52:04 +0100 Subject: [PATCH 03/11] add new drawback based on observing inference weakness for `Box`. --- text/0000-box-and-in-for-stdlib.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/text/0000-box-and-in-for-stdlib.md b/text/0000-box-and-in-for-stdlib.md index 23004660c2c..c2ac3905729 100644 --- a/text/0000-box-and-in-for-stdlib.md +++ b/text/0000-box-and-in-for-stdlib.md @@ -120,6 +120,12 @@ tandem with types provided by the stdlib, such as `Box`. nightly release channel, which will not have the same stability restrictions. +* The currently-implemented desugaring does not infer that in an + expression like `box as Box`, the use of `box ` + should evaluate to some `Box<_>`. This may be due to a weakness + in the current desugaring, though pnkfelix suspects that it is + probably due to a weakness in compiler itself. + # Alternatives * We could keep the `box () ` syntax. It is hard From 10388e0e3742f9de5937284fe35f935113714b1a Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Mon, 9 Feb 2015 12:19:15 +0100 Subject: [PATCH 04/11] switched placement-`in` syntax to `in PLACE { BLOCK }`. (Though I have to admit, this may actually not win out in the end. The `in (PLACE) EXPR` form has two fewer spaces, and putting more delimiters around the `EXPR` may be ill-advised, since it is often the more complicated construction already when compared to PLACE.)
--- text/0000-box-and-in-for-stdlib.md | 58 ++++++++++++++++++------------ 1 file changed, 35 insertions(+), 23 deletions(-) diff --git a/text/0000-box-and-in-for-stdlib.md b/text/0000-box-and-in-for-stdlib.md index c2ac3905729..4116b893aa0 100644 --- a/text/0000-box-and-in-for-stdlib.md +++ b/text/0000-box-and-in-for-stdlib.md @@ -5,7 +5,7 @@ # Summary * Change placement-new syntax from: `box () ` instead - to: `in () `. + to: `in { }`. * Change `box ` to an overloaded operator that chooses its implementation based on the expected type. @@ -15,6 +15,13 @@ traits are unstable so that the language designers are free to revise the underlying protocol in the future post 1.0. +(Note that `` here denotes the interior of a block expression; i.e.: +``` + ::= [ ';' | ] * [ ] +``` +This is the same sense in which the `block` nonterminal is used in the +reference manual.) + # Motivation Goal 1: We want to support an operation analogous to C++'s placement @@ -31,14 +38,15 @@ when it is created). However, during discussion of [Placement Box RFC PR 470], some things became clear: - * The syntax `in () ` is superior to `box () + * Many syntaxes using the `in` keyword are superior to `box () ` for the operation analogous to placement-new. The proposed `in`-based syntax avoids ambiguities such as having to write `box () ()` (or `box (alloc::HEAP) ()`) when - one wants to surround `` with parentheses. It allows the - parser to provide clearer error messages if a user accidentally - writes `in `. + one wants to surround `` with parentheses. + It allows the parser to provide clearer error messages when + encountering `in ` (clearer compared to the previous + situation with `box `). * It would be premature for Rust to commit to any particular protocol for supporting placement-`in`. 
A number of participants in @@ -50,7 +58,7 @@ became clear: Therefore, this RFC proposes a middle ground for 1.0: Support the desired syntax, but do not provide stable support for end-user implementations of the operators. The only stable ways to use the -overloaded `box ` or `in () ` operators will be in +overloaded `box ` or `in { }` operators will be in tandem with types provided by the stdlib, such as `Box`. # Detailed design @@ -64,7 +72,7 @@ tandem with types provided by the stdlib, such as `Box`. Any protocol that we adopt for the operators needs to properly handle panics; i.e., `box ` must properly cleanup any intermediate state if `` panics during its evaluation, - and likewise for `in () ` + and likewise for `in { }` (See [Placement Box RFC PR 470] or [Appendix A] for discussion on ways to accomplish this.) @@ -89,12 +97,12 @@ tandem with types provided by the stdlib, such as `Box`. either add a type annotation e.g. saying `Box<_>`, or will need to call `Box::new()` instead of using `box `. -* Add support for parsing `in () ` as the basis for the +* Add support for parsing `in { }` as the basis for the placement operator. Remove support for `box () ` from the parser. - Make `in () ` an overloaded operator that uses + Make `in { }` an overloaded operator that uses the `` to determine what placement code to run. * The only stablized implementation for the `box ` operator @@ -104,8 +112,8 @@ tandem with types provided by the stdlib, such as `Box`. stabilization process. Similarly, this RFC does not propose *any* stablized implementation - for the `in () ` operator. (An obvious candidate for - `in () ` integration would be a `Vec::emplace_back` + for the `in { }` operator. (An obvious candidate for + `in { }` integration would be a `Vec::emplace_back` method; but again, the choice of which such methods to add is a library design issue, beyond the scope of this RFC.) @@ -135,6 +143,10 @@ tandem with types provided by the stdlib, such as `Box`. 
some integration with `box` pattern syntax that would motivate using the `box` keyword for placement. +* We could use the `in () ` syntax. An earlier + version of this RFC used this alternative. It is easier to implement + on the current code base, but I do not know of any other benefits. + * A number of other syntaxes for placement have been proposed in the past; see for example discussion on [RFC PR 405] as well as [the previous placement RFC][RFC Surface Syntax Discussion]. @@ -179,7 +191,7 @@ fn main() { use std::rc::Rc; let mut v = vec![1,2]; - in (v.emplace_back()) 3; // has return type `()` + in v.emplace_back() { 3 }; // has return type `()` println!("v: {:?}", v); // prints [1,2,3] let b4: Box = box 4; @@ -188,7 +200,7 @@ fn main() { let b5: Rc = box 5; println!("b5: {}", b5); - let b6 = in (HEAP) 6; // return type Box + let b6 = in HEAP { 6 }; // return type Box println!("b6: {}", b6); } ``` @@ -205,7 +217,7 @@ http::play.rust-lang.org ) // The easiest way to illustrate the desugaring is by implementing // it with macros. So, we will use the macro `in_` for placement-`in` // and the macro `box_` for overloaded-`box`; you should read -// `in_!( () )` as if it were `in () ` +// `in_!( () )` as if it were `in { }` // and // `box_!( )` as if it were `box `. @@ -257,7 +269,7 @@ macro_rules! box_ { mod protocol { -/// Both `in (PLACE) EXPR` and `box EXPR` desugar into expressions +/// Both `in PLACE { BLOCK }` and `box EXPR` desugar into expressions /// that allocate an intermediate "place" that holds uninitialized /// state. The desugaring evaluates EXPR, and writes the result at /// the address returned by the `pointer` method of this trait. @@ -282,22 +294,22 @@ pub trait Place { fn pointer(&mut self) -> *mut Data; } -/// Interface to implementations of `in (PLACE) EXPR`. +/// Interface to implementations of `in PLACE { BLOCK }`. 
/// -/// `in (PLACE) EXPR` effectively desugars into: +/// `in PLACE { BLOCK }` effectively desugars into: /// /// ``` /// let p = PLACE; /// let mut place = Placer::make_place(p); /// let raw_place = Place::pointer(&mut place); -/// let value = EXPR; +/// let value = { BLOCK }; /// unsafe { /// std::ptr::write(raw_place, value); /// InPlace::finalize(place) /// } /// ``` /// -/// The type of `in (PLACE) EXPR` is derived from the type of `PLACE`; +/// The type of `in PLACE { BLOCK }` is derived from the type of `PLACE`; /// if the type of `PLACE` is `P`, then the final type of the whole /// expression is `P::Place::Owner` (see the `InPlace` and `Boxed` /// traits). @@ -314,11 +326,11 @@ pub trait Placer { fn make_place(self) -> Self::Place; } -/// Specialization of `Place` trait supporting `in (PLACE) EXPR`. +/// Specialization of `Place` trait supporting `in PLACE { BLOCK }`. pub trait InPlace: Place { - /// `Owner` is the type of the end value of `in (PLACE) EXPR` + /// `Owner` is the type of the end value of `in PLACE { BLOCK }` /// - /// Note that when `in (PLACE) EXPR` is solely used for + /// Note that when `in PLACE { BLOCK }` is solely used for /// side-effecting an existing data-structure, /// e.g. `Vec::emplace_back`, then `Owner` need not carry any /// information at all (e.g. it can be the unit type `()` in that @@ -377,7 +389,7 @@ pub trait BoxPlace : Place { // just meant to illustrate that an implementation *can* be made; // i.e. that the overloading *works*.) // -// Also, just for kicks, I am throwing in `in (HEAP) ` support, +// Also, just for kicks, I am throwing in `in HEAP { }` support, // though I do not think that needs to be part of the stable libstd. struct HEAP; From 28a30fb32bfb795e9858d5e09d8abded625cffed Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Mon, 9 Feb 2015 12:25:07 +0100 Subject: [PATCH 05/11] Extend Drawbacks section's discussion of type-inference and coercion interactions. 
--- text/0000-box-and-in-for-stdlib.md | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/text/0000-box-and-in-for-stdlib.md b/text/0000-box-and-in-for-stdlib.md index 4116b893aa0..f357985866f 100644 --- a/text/0000-box-and-in-for-stdlib.md +++ b/text/0000-box-and-in-for-stdlib.md @@ -130,9 +130,23 @@ tandem with types provided by the stdlib, such as `Box`. * The currently-implemented desugaring does not infer that in an expression like `box as Box`, the use of `box ` - should evaluate to some `Box<_>`. This may be due to a weakness - in the current desugaring, though pnkfelix suspects that it is - probably due to a weakness in compiler itself. + should evaluate to some `Box<_>`. pnkfelix has found that this is + due to a weakness in the compiler itself ([Rust PR 22012]). + + Likewise, the currently-implemented desugaring does not interact + well with the combination of type-inference and implicit coercions + to trait objects. That is, when `box ` is used in a context + like this: + ``` + fn foo(Box) { ... } + foo(box some_expr()); + ``` + the type inference system attempts to unify the type `Box` + with the return-type of `::protocol::Boxed::finalize(place)`. + This may also be due to weakness in the compiler, but that is not + immediately obvious. + +[Rust PR 22012]: https://github.com/rust-lang/rust/pull/22012 # Alternatives From 9f4b6e8d6426e7776901b1afcb2a7619630845e4 Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Mon, 9 Feb 2015 19:13:12 +0100 Subject: [PATCH 06/11] Added more complete example of drawback, taken from gists on PR 22086.
--- text/0000-box-and-in-for-stdlib.md | 97 ++++++++++++++++++++++++++++++ 1 file changed, 97 insertions(+) diff --git a/text/0000-box-and-in-for-stdlib.md b/text/0000-box-and-in-for-stdlib.md index f357985866f..52f5fcd96fd 100644 --- a/text/0000-box-and-in-for-stdlib.md +++ b/text/0000-box-and-in-for-stdlib.md @@ -146,6 +146,10 @@ tandem with types provided by the stdlib, such as `Box`. This may also be due to weakness in the compiler, but that is not immediately obvious. + [Appendix B] has a complete code snippet (using a desugaring much like + the one found in the other appendix) that illustrates two cases of + interest where this weakness arises. + [Rust PR 22012]: https://github.com/rust-lang/rust/pull/22012 # Alternatives @@ -570,3 +574,96 @@ fn main() { println!("b6: {}", b6); } ``` + +## Appendix B: examples of interaction between desugaring, type-inference, and coercion +[Appendix B]: #appendix-b-examples-of-interaction-between-desugaring-type-inference-and-coercion + +The following code works with the current version of `box` syntax in Rust, but needs some sort +of type annotation in Rust as it stands today for the desugaring of `box` to work out. + +(The following code uses `cfg` attributes to make it easy to switch between slight variations +on the portions that expose the weakness.) + +``` +#![feature(box_syntax)] + +// NOTE: Scroll down to "START HERE" + +fn main() { } + +macro_rules! box_ { + ($value:expr) => { { + let mut place = ::BoxPlace::make(); + let raw_place = ::Place::pointer(&mut place); + let value = $value; + unsafe { ::std::ptr::write(raw_place, value); ::Boxed::fin(place) } + } } +} + +// (Support traits and impls for examples below.) 
+ +pub trait BoxPlace : Place { fn make() -> Self; } +pub trait Place { fn pointer(&mut self) -> *mut Data; } +pub trait Boxed { type Place; fn fin(filled: Self::Place) -> Self; } + +struct BP { _fake_box: Option> } + +impl BoxPlace for BP { fn make() -> BP { make_pl() } } +impl Place for BP { fn pointer(&mut self) -> *mut T { pointer(self) } } +impl Boxed for Box { type Place = BP; fn fin(x: BP) -> Self { finaliz(x) } } + +fn make_pl() -> BP { loop { } } +fn finaliz(mut _filled: BP) -> Box { loop { } } +fn pointer(_p: &mut BP) -> *mut T { loop { } } + +// START HERE + +pub type BoxFn<'a> = Box; + +#[cfg(all(not(coerce_works1),not(coerce_works2),not(coerce_works3)))] +pub fn coerce<'a, F>(f: F) -> BoxFn<'a> where F: Fn(), F: 'a { box_!( f ) } + +#[cfg(coerce_works1)] +pub fn coerce<'a, F>(f: F) -> BoxFn<'a> where F: Fn(), F: 'a { box f } + +#[cfg(coerce_works2)] +pub fn coerce<'a, F>(f: F) -> BoxFn<'a> where F: Fn(), F: 'a { let b: Box<_> = box_!( f ); b } + +#[cfg(coerce_works3)] // (This one assumes PR 22012 has landed) +pub fn coerce<'a, F>(f: F) -> BoxFn<'a> where F: Fn(), F: 'a { box_!( f ) as BoxFn } + + +trait Duh { fn duh() -> Self; } + +#[cfg(all(not(duh_works1),not(duh_works2)))] +impl Duh for Box<[T]> { fn duh() -> Box<[T]> { box_!( [] ) } } + +#[cfg(duh_works1)] +impl Duh for Box<[T]> { fn duh() -> Box<[T]> { box [] } } + +#[cfg(duh_works2)] +impl Duh for Box<[T]> { fn duh() -> Box<[T]> { let b: Box<[_; 0]> = box_!( [] ); b } } +``` + +You can pass `--cfg duh_worksN` and `--cfg coerce_worksM` for suitable +`N` and `M` to see them compile. The point I want to get across is +this: It looks like both of these cases can be worked around via +explicit type ascription. Whether or not this is an acceptable cost +is a reasonable question. + +The `fn coerce` example comes from uses of the `fn combine_structure` function in the +`libsyntax` crate. + +The `fn duh` example comes from the implementation of the `Default` +trait for `Box<[T]>`. 
+
+Both examples are instances of coercion; the `fn coerce` example is
+trying to express a coercion of a `Box` to a `Box`
+(i.e. making a trait-object), and the `fn duh` example is trying to
+express a coercion of a `Box<[T; k]>` (specifically `[T; 0]`) to a
+`Box<[T]>`. Both are going from a pointer-to-sized to a
+pointer-to-unsized.
+
+(Maybe there is a way to handle both of these cases in a generic
+fashion; pnkfelix is not sufficiently familiar with how coercions
+currently interact with type-inference in the first place.)

From 4ca4bd091b2206c245167d89fdfb917cad5fbcf3 Mon Sep 17 00:00:00 2001
From: "Felix S. Klock II"
Date: Mon, 9 Feb 2015 19:17:25 +0100
Subject: [PATCH 07/11] add transcript of compilation attempts, so the reader
 can see concretely the quality (or lack thereof) of the error messages.

---
 text/0000-box-and-in-for-stdlib.md | 38 ++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/text/0000-box-and-in-for-stdlib.md b/text/0000-box-and-in-for-stdlib.md
index 52f5fcd96fd..ee35b1a116f 100644
--- a/text/0000-box-and-in-for-stdlib.md
+++ b/text/0000-box-and-in-for-stdlib.md
@@ -584,7 +584,7 @@ of type annotation in Rust as it stands today for the desugaring of `box` to wor
 (The following code uses `cfg` attributes to make it easy to switch between slight variations
 on the portions that expose the weakness.)

-```
+```rust
 #![feature(box_syntax)]

 // NOTE: Scroll down to "START HERE"
@@ -646,7 +646,41 @@ impl Duh for Box<[T]> { fn duh() -> Box<[T]> { let b: Box<[_; 0]> = box_!( [
 ```

 You can pass `--cfg duh_worksN` and `--cfg coerce_worksM` for suitable
-`N` and `M` to see them compile. The point I want to get across is
+`N` and `M` to see them compile. Here is a transcript with those attempts,
+including the cases where type-inference fails in the desugaring.
+
+```
+% rustc /tmp/foo6.rs --cfg duh_works1 --cfg coerce_works1
+% rustc /tmp/foo6.rs --cfg duh_works1 --cfg coerce_works2
+% rustc /tmp/foo6.rs --cfg duh_works2 --cfg coerce_works1
+% rustc /tmp/foo6.rs --cfg duh_works1
+/tmp/foo6.rs:10:25: 10:41 error: the trait `Place` is not implemented for the type `BP` [E0277]
+/tmp/foo6.rs:10         let raw_place = ::Place::pointer(&mut place);
+                                        ^~~~~~~~~~~~~~~~
+/tmp/foo6.rs:7:1: 14:2 note: in expansion of box_!
+/tmp/foo6.rs:37:64: 37:76 note: expansion site
+/tmp/foo6.rs:9:25: 9:41 error: the trait `core::marker::Sized` is not implemented for the type `core::ops::Fn()` [E0277]
+/tmp/foo6.rs:9         let mut place = ::BoxPlace::make();
+                                       ^~~~~~~~~~~~~~~~
+/tmp/foo6.rs:7:1: 14:2 note: in expansion of box_!
+/tmp/foo6.rs:37:64: 37:76 note: expansion site
+error: aborting due to 2 previous errors
+% rustc /tmp/foo6.rs --cfg coerce_works1
+/tmp/foo6.rs:10:25: 10:41 error: the trait `Place<[_; 0]>` is not implemented for the type `BP<[T]>` [E0277]
+/tmp/foo6.rs:10         let raw_place = ::Place::pointer(&mut place);
+                                        ^~~~~~~~~~~~~~~~
+/tmp/foo6.rs:7:1: 14:2 note: in expansion of box_!
+/tmp/foo6.rs:52:51: 52:64 note: expansion site
+/tmp/foo6.rs:9:25: 9:41 error: the trait `core::marker::Sized` is not implemented for the type `[T]` [E0277]
+/tmp/foo6.rs:9         let mut place = ::BoxPlace::make();
+                                       ^~~~~~~~~~~~~~~~
+/tmp/foo6.rs:7:1: 14:2 note: in expansion of box_!
+/tmp/foo6.rs:52:51: 52:64 note: expansion site
+error: aborting due to 2 previous errors
+%
+```
+
+The point I want to get across is
 this: It looks like both of these cases can be worked around via
 explicit type ascription. Whether or not this is an acceptable cost
 is a reasonable question.

From 9d5379708d85a850c9edbcbbe63671a9d4b7dedc Mon Sep 17 00:00:00 2001
From: "Felix S. Klock II"
Date: Mon, 9 Feb 2015 19:20:48 +0100
Subject: [PATCH 08/11] Add one more note about ascription and a place where it
 is particularly annoying: fixed-length arrays.

---
 text/0000-box-and-in-for-stdlib.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/text/0000-box-and-in-for-stdlib.md b/text/0000-box-and-in-for-stdlib.md
index ee35b1a116f..a9fe32ccdaf 100644
--- a/text/0000-box-and-in-for-stdlib.md
+++ b/text/0000-box-and-in-for-stdlib.md
@@ -685,6 +685,12 @@ this: It looks like both of these cases can be worked around via
 explicit type ascription. Whether or not this is an acceptable cost
 is a reasonable question.

+  * Note that type ascription is especially annoying for the `fn duh` case,
+    where one needs to keep the array-length encoded in the type consistent
+    with the length of the array generated by the expression.
+    This might motivate extending the use of wildcard `_` within type expressions
+    to include wildcard constants, for use in the array length, i.e.: `[T; _]`.
+
 The `fn coerce` example comes from uses of the `fn combine_structure` function in the
 `libsyntax` crate.

From 4ec9be73df2949f5a7cb11634dea4a0f776b6c92 Mon Sep 17 00:00:00 2001
From: "Felix S. Klock II"
Date: Mon, 9 Feb 2015 19:31:40 +0100
Subject: [PATCH 09/11] Add notes on using distinct feature-names for gating
 the two syntaxes separately.

---
 text/0000-box-and-in-for-stdlib.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/text/0000-box-and-in-for-stdlib.md b/text/0000-box-and-in-for-stdlib.md
index a9fe32ccdaf..36ed2000377 100644
--- a/text/0000-box-and-in-for-stdlib.md
+++ b/text/0000-box-and-in-for-stdlib.md
@@ -1,3 +1,4 @@
+- Feature Name: box_syntax, placement_in_syntax
 - Start Date: 2015-02-04
 - RFC PR: (leave this empty)
 - Rust Issue: (leave this empty)
@@ -5,7 +6,7 @@
 # Summary

  * Change placement-new syntax from: `box () ` instead
-   to: `in { }`.
+   to: `in { }`.

  * Change `box ` to an overloaded operator that chooses its
    implementation based on the expected type.
@@ -15,6 +16,10 @@
    traits are unstable so that the language designers are free to
    revise the underlying protocol in the future post 1.0.

+ * Feature-gate the placement-`in` syntax via the feature name `placement_in_syntax`.
+
+ * The overloaded `box ` will reuse the `box_syntax` feature name.
+
 (Note that `` here denotes the interior of a block expression; i.e.:
 ```
  ::= [ ';' | ] * [ ]
@@ -120,6 +125,11 @@ tandem with types provided by the stdlib, such as `Box`.
   (A sample implementation illustrating how to support the operators
   on other types is given in [Appendix A].)

+* Feature-gate the two syntaxes under separate feature identifiers, so that we
+  have the option of removing the gate for one syntax without the other.
+  (I.e. we already have much experience with non-overloaded `box `,
+  but we have nearly no experience with placement-`in` as described here).
+
 # Drawbacks

 * End-users might be annoyed that they cannot add implementations of

From f6df740f054258d5e40deb9f792abda6ef2ba046 Mon Sep 17 00:00:00 2001
From: "Felix S. Klock II"
Date: Mon, 9 Feb 2015 19:35:44 +0100
Subject: [PATCH 10/11] Note potential reason to keep `in () ` over `in { }`

---
 text/0000-box-and-in-for-stdlib.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/text/0000-box-and-in-for-stdlib.md b/text/0000-box-and-in-for-stdlib.md
index 36ed2000377..dd8e110d746 100644
--- a/text/0000-box-and-in-for-stdlib.md
+++ b/text/0000-box-and-in-for-stdlib.md
@@ -174,6 +174,7 @@ tandem with types provided by the stdlib, such as `Box`.
 * We could use the `in () ` syntax. An earlier
   version of this RFC used this alternative. It is easier to implement
   on the current code base, but I do not know of any other benefits.
+  (Well, maybe parentheses are less "heavyweight" than curly-braces?)

 * A number of other syntaxes for placement have been proposed in the
   past; see for example discussion on [RFC PR 405] as well as

From 2880173f6da51ca1725f488ca0535a1b83c00f9b Mon Sep 17 00:00:00 2001
From: "Felix S. Klock II"
Date: Mon, 9 Feb 2015 19:41:16 +0100
Subject: [PATCH 11/11] Decided I care enough about the
 overloading+inference+coercion question to actually encode my concern in the
 form of an unresolved question.

---
 text/0000-box-and-in-for-stdlib.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/text/0000-box-and-in-for-stdlib.md b/text/0000-box-and-in-for-stdlib.md
index dd8e110d746..34a2c83420c 100644
--- a/text/0000-box-and-in-for-stdlib.md
+++ b/text/0000-box-and-in-for-stdlib.md
@@ -205,7 +205,18 @@ tandem with types provided by the stdlib, such as `Box`.

 # Unresolved questions

-None
+* Can the type-inference and coercion system of the compiler be
+  enriched to the point where overloaded `box` and `in` are
+  seamlessly usable? Or are type-ascriptions unavoidable when
+  supporting overloading?
+
+  In particular, I am assuming here that some amount of current
+  weakness cannot be blamed on any particular details of the
+  sample desugaring.
+
+  (See [Appendix B] for example code showing weaknesses in
+  `rustc` of today.)
+

 # Appendices
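
Note (not part of the patch series): the Appendix B workarounds rely on ascribing the sized type first and letting the pointer-to-sized value coerce to pointer-to-unsized afterwards. A minimal sketch of the same two coercions on a current stable toolchain follows. It assumes `Box::new` as a stand-in for the `box_!` desugaring and `dyn Fn()` as today's spelling of the trait-object type; the name `empty_slice` is hypothetical and only loosely mirrors `fn duh`.

```rust
// Sketch only: `Box::new` stands in for the RFC's `box_!` desugaring,
// and `dyn Fn()` is the current spelling of the trait-object type.
pub type BoxFn<'a> = Box<dyn Fn() + 'a>;

// Analogue of the `coerce_works2` variant: build a `Box<F>` with an explicit
// type first, then let the return position coerce it to the trait object.
pub fn coerce<'a, F>(f: F) -> BoxFn<'a>
where
    F: Fn() + 'a,
{
    let b: Box<F> = Box::new(f);
    b
}

// Analogue of the `duh_works2` variant (hypothetical name): ascribe the sized
// array type `[T; 0]`, then coerce `Box<[T; 0]>` to `Box<[T]>` on return.
pub fn empty_slice<T>() -> Box<[T]> {
    let b: Box<[T; 0]> = Box::new([]);
    b
}
```

In both functions the intermediate `let` plays the role of the type ascription discussed in Appendix B; whether an overloaded `box` could avoid it is the unresolved question recorded in PATCH 11/11.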