tweaks

RalfJung · RalfJung · commit c65b87bfbce9 · 2018-12-10T18:18:54.000+01:00
diff --git a/active_discussion/validity.md b/active_discussion/validity.md
@@ -2,8 +2,8 @@
 
 This discussion is meant to focus on the question: Which invariants derived from
 types are there that the compiler expects to be *always* maintained, and
-(equivalently) that unsafe code must *always* uphold.  This is what is called
-"validity invariant" in
+(equivalently) that unsafe code must *always* uphold (or else cause undefined
+behavior)?  This is what is called "validity invariant" in
 [Ralf's blog post](https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html),
 but we might also decide to change that name.
 
@@ -19,28 +19,40 @@ generating LLVM IR.  For example, we emit `aligned` attributes pretty much any
 time we can, which means it is probably a good idea to say that valid references
 must be aligned.
 
-### Extent of "always"
-
-One point we will have to figure out is what exactly "always" means.  Thinking
-in terms of a semantics for MIR, data most probably needs to be valid any time
-it is copied, which primarily happens when executing assignment statements (the
-other cases are passing of function arguments and return values).  However, it
-is less clear whether merely creating a place without accessing the data inside
-(such as in `&*x`) should require the data to be valid.
+Finally, another consideration to take into account is that ruling out certain
+behavior can be great for bug finding.  For example, if arithmetic overflow is
+defined to have two's-complement behavior, then bug finding tools can no longer
+use overflow as an indication of a software bug.  (This is a real problem with
+unsigned integer arithmetic in C/C++.)
 
 ### Possible bit patterns
 
 The validity invariant of a type is, basically, a set of bit patterns that is
 allowed to occur at that type.  ("Basically" because the invariant may also be
 allowed to depend on memory.)  To discuss this properly, we need to first agree
-on what "bit patterns" even are.  It is certainly not enough to just consider
-sequences of 0 and 1, because we also need to take uninitialized data into
-account.  For the purpose of this discussion, I think it is sufficient to
-consider every bit as being either 0, 1 or uninitialized.
+on what "bit patterns" even are.  It is not enough to just consider sequences of
+0 and 1, because we also need to take uninitialized data into account.  For the
+purpose of this discussion, I think it is sufficient to consider every bit as
+being either 0, 1 or uninitialized.
 [That is not always sufficient](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html),
 but I think we can mostly ignore the extra complications introduced by pointer
 values.
 
+### Extent of "always"
+
+One point we will have to figure out is what exactly "always" means.  Thinking
+in terms of a semantics for MIR, data most probably needs to be valid any time
+it is copied, which primarily happens when executing assignment statements (the
+other cases are passing of function arguments and return values).  However, it
+is less clear whether merely creating a place without accessing the data inside
+(such as in `&*x`) should require the data to be valid.
+
+The entire discussion here is only about validity invariants that have to hold
+when the compiler considers a variable initialized.  For example, `let b: bool;`
+may be left uninitialized because the compiler tracks that; however,
+`let b: bool = mem::uninitialized();` copies uninitialized data at type
+`bool` and hence violates `bool`'s validity invariant.
+
 ## Goals
 
 * For every primitive type, determine which assumptions (if any) the compiler
@@ -57,10 +69,11 @@ values.
 To start, we will create threads for each major category of types.
 
 * Integers and floating point types
-
   * Do we allow values that contain uninitialized bits?  If yes, what are the
     rules for arithmetic and logical operations involving uninitialized bits,
-    e.g. in cases like `x * 0`?
+    e.g. in cases like `x * 0`?  There is also some interaction with bug finding
+    here: tools can only flag uninitialized data at integer type as a bug if we
+    do not allow that to happen in unsafe code.
 
 * Raw pointers
   * Do we allow values that contain uninitialized bits?
@@ -81,6 +94,10 @@ To start, we will create threads for each major category of types.
   * Presumably, these must be non-NULL.  Anything else?  Can there ever be
     uninitialized bits?
 
+* Booleans
+  * Is there anything to say besides: A `bool` must be `0x0` or `0x1`?  Do we
+    allow the remaining bits to be uninitialized?
+
 * Unions
   * Do we make any restrictions here, or are unions just "bags of bits" that may
     contain anything?  That would mean we can do no layout optimizations.
@@ -94,9 +111,9 @@ To start, we will create threads for each major category of types.
   * Is there anything to say besides: All fields must be valid at their
     respective types?
   * The padding between fields can be anything, including uninitialized.  It was
-  * [recently determined][generators-maybe-uninit] that generators behave
-  * different from other aggregates here.  Are we okay with that?  Should we push
-  * for generator fields to reflect this in their types?
+    [recently determined][generators-maybe-uninit] that generators behave
+    differently from other aggregates here.  Are we okay with that?  Should we push
+    for generator fields to reflect this in their types?
 
 [RFC2582]: https://github.com/rust-lang/rfcs/pull/2582
 [generators-maybe-uninit]: https://github.com/rust-lang/rust/pull/56100
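
The `bool` case discussed in the patch can be sketched in Rust. This is a minimal illustration and not part of the commit. It assumes the commonly stated rule that only `0x00` and `0x01` are valid at type `bool`, which the patch itself still lists as an open question. It also uses `MaybeUninit`, which was stabilized after this commit, in place of `mem::uninitialized()`. The helper names `is_valid_bool_byte`, `bool_from_byte`, and `init_bool_late` are hypothetical.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: is this byte one of the two bit patterns that are
/// assumed here to be valid at type `bool`?
fn is_valid_bool_byte(byte: u8) -> bool {
    byte == 0x00 || byte == 0x01
}

/// Hypothetical helper: reinterpret a byte as a `bool`, but only after
/// checking the assumed validity invariant.
fn bool_from_byte(byte: u8) -> Option<bool> {
    if is_valid_bool_byte(byte) {
        // SAFETY: `u8` and `bool` both have size 1, and the byte was just
        // checked to be 0x00 or 0x01.
        Some(unsafe { std::mem::transmute::<u8, bool>(byte) })
    } else {
        // Producing a `bool` from any other byte would break the invariant,
        // so no `bool` is created at all.
        None
    }
}

/// Hypothetical helper: delay initialization without ever holding an
/// uninitialized value at type `bool`.
fn init_bool_late(value: bool) -> bool {
    // The slot has type `MaybeUninit<bool>`, so being uninitialized is fine.
    let mut slot = MaybeUninit::<bool>::uninit();
    slot.write(value);
    // SAFETY: the slot was fully initialized by `write` above.
    unsafe { slot.assume_init() }
}
```

Under these assumptions, `bool_from_byte(1)` yields `Some(true)` and `bool_from_byte(2)` yields `None`, while `init_bool_late` only produces a `bool` once a valid value has been written.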