This discussion is meant to focus on the question: Which invariants derived from
types are there that the compiler expects to be *always* maintained, and
- (equivalently) that unsafe code must *always* uphold. This is what is called
- "validity invariant" in
+ (equivalently) that unsafe code must *always* uphold (or else cause undefined
+ behavior)? This is what is called "validity invariant" in
[Ralf's blog post](https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html),
but we might also decide to change that name.
@@ -19,28 +19,40 @@ generating LLVM IR. For example, we emit `aligned` attributes pretty much any
time we can, which means it is probably a good idea to say that valid references
must be aligned.

- ### Extent of "always"
-
- One point we will have to figure out is what exactly "always" means. Thinking
- in terms of a semantics for MIR, data most probably needs to be valid any time
- it is copied, which primarily happens when executing assignment statements (the
- other cases are passing of function arguments and return values). However, it
- is less clear whether merely creating a place without accessing the data inside
- (such as in `&*x`) should require the data to be valid.
+ Finally, another consideration to take into account is that ruling out certain
+ behavior can be great for bug finding. For example, if arithmetic overflow is
+ defined to have two's-complement-behavior, then bug finding tools can no longer
+ use overflow as an indication of a software bug. (This is a real problem with
+ unsigned integer arithmetic in C/C++.)

### Possible bit patterns

The validity invariant of a type is, basically, a set of bit patterns that is
allowed to occur at that type. ("Basically" because the invariant may also be
allowed to depend on memory.) To discuss this properly, we need to first agree
- on what "bit patterns" even are. It is certainly not enough to just consider
- sequences of 0 and 1, because we also need to take uninitialized data into
- account. For the purpose of this discussion, I think it is sufficient to
- consider every bit as being either 0, 1 or uninitialized.
+ on what "bit patterns" even are. It is not enough to just consider sequences of
+ 0 and 1, because we also need to take uninitialized data into account. For the
+ purpose of this discussion, I think it is sufficient to consider every bit as
+ being either 0, 1 or uninitialized.
[That is not always sufficient](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html),
but I think we can mostly ignore the extra complications introduced by pointer
values.

+ ### Extent of "always"
+
+ One point we will have to figure out is what exactly "always" means. Thinking
+ in terms of a semantics for MIR, data most probably needs to be valid any time
+ it is copied, which primarily happens when executing assignment statements (the
+ other cases are passing of function arguments and return values). However, it
+ is less clear whether merely creating a place without accessing the data inside
+ (such as in `&*x`) should require the data to be valid.
+
+ The entire discussion here is only about validity invariants that have to hold
+ when the compiler considers a variable initialized. For example, `let b: bool;`
+ is completely okay to not be initialized because the compiler knows about that;
+ `let b: bool = mem::uninitialized();` however copies uninitialized data at type
+ `bool` and hence violates `bool`'s validity invariant.
+
## Goals

* For every primitive type, determine which assumptions (if any) the compiler
@@ -57,10 +69,11 @@ values.
To start, we will create threads for each major category of types.

* Integers and floating point types
-
* Do we allow values that contain uninitialized bits? If yes, what are the
rules for arithmetic and logical operations involving uninitialized bits,
- e.g. in cases like `x * 0`?
+ e.g. in cases like `x * 0`? There is also some interaction with bug finding
+ here: tools can only flag uninitialized data at integer type as a bug if we
+ do not allow that to happen in unsafe code.

* Raw pointers
* Do we allow values that contain uninitialized bits?
@@ -81,6 +94,10 @@ To start, we will create threads for each major category of types.
* Presumably, these must be non-NULL. Anything else? Can there ever be
uninitialized bits?

+ * Booleans
+ * Is there anything to say besides: A `bool` must be `0x0` or `0x1`? Do we
+ allow the remaining bits to be uninitialized?
+
* Unions
* Do we make any restrictions here, or are unions just "bags of bits" that may
contain anything? That would mean we can do no layout optimizations.
@@ -94,9 +111,9 @@ To start, we will create threads for each major category of types.
* Is there anything to say besides: All fields must be valid at their
respective types?
* The padding between fields can be anything, including uninitialized. It was
- * [recently determined][generators-maybe-uninit] that generators behave
- * different from other aggregates here. Are we okay with that? Should we push
- * for generator fields to reflect this in their types?
+ [recently determined][generators-maybe-uninit] that generators behave
+ different from other aggregates here. Are we okay with that? Should we push
+ for generator fields to reflect this in their types?

[RFC2582]: https://github.com/rust-lang/rfcs/pull/2582
[generators-maybe-uninit]: https://github.com/rust-lang/rust/pull/56100
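
As a rough illustration of the bug-finding remark added in the second hunk, the Rust sketch below contrasts addition that is defined to wrap with addition that reports overflow. The helper names `wrapping_sum` and `checked_sum` are made up for this example and are not part of the change; `wrapping_add` and `checked_add` are standard library methods on the integer types.

```rust
/// Hypothetical helper: sums with two's-complement wrap-around. Overflow is
/// defined behavior here, so a tool cannot treat it as evidence of a bug.
fn wrapping_sum(values: &[u8]) -> u8 {
    values.iter().fold(0u8, |acc, &v| acc.wrapping_add(v))
}

/// Hypothetical helper: sums and reports overflow as `None`, so overflow
/// stays observable as an error condition.
fn checked_sum(values: &[u8]) -> Option<u8> {
    values.iter().try_fold(0u8, |acc, &v| acc.checked_add(v))
}
```

With the wrapping version, an accidental overflow silently produces a small number. With the checked version, the same input surfaces as `None`, which is the kind of signal a bug-finding tool could rely on.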
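
The `bool` example added under 'Extent of "always"' can be sketched in Rust as follows. This is only an illustration under assumptions: it uses `std::mem::MaybeUninit`, which is not mentioned in the diff (`mem::uninitialized` has since been deprecated in its favor), and the function names `init_bool` and `bool_from_byte` are invented for the example. It also assumes the answer to the Booleans question is simply that a `bool` must be `0x0` or `0x1`.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: produces a `bool` without ever copying uninitialized
/// data at type `bool`. The uninitialized state lives in `MaybeUninit<bool>`.
fn init_bool(value: bool) -> bool {
    let mut slot = MaybeUninit::<bool>::uninit();
    slot.write(value);
    // SAFETY: `slot` was fully initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

/// Hypothetical helper: accepts a raw byte as a `bool` only if it is one of
/// the two bit patterns assumed valid here (0x0 or 0x1).
fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}
```

For instance, `bool_from_byte(2)` yields `None` instead of creating a `bool` with a bit pattern outside the assumed valid set.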