-
Notifications
You must be signed in to change notification settings - Fork 1
/
rapport.tex
614 lines (514 loc) · 24.5 KB
/
rapport.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{listings}
\usepackage{url}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{mathpartir}
% \usepackage[french]{babel}
\lstset{language=caml}
% Some macros:
\newcommand{\tfun}[2]{\mathtt{fun}~#1 \to #2}
\newcommand{\tapply}[2]{#1~#2}
\newcommand{\tlet}[3]{\mathtt{let~}#1=#2\mathtt{~in~}#3}
\newcommand{\Deref}{\mathsf{Deref}}
\newcommand{\Unguarded}{\mathsf{Unguarded}}
\newcommand{\Guarded}{\mathsf{Guarded}}
\newcommand{\Delayed}{\mathsf{Delayed}}
\newcommand{\Unused}{\mathsf{Unused}}
\newcommand{\inspect}{\mathsf{inspect}}
\newcommand{\guard}{\mathsf{guard}}
\newcommand{\discard}{\mathsf{discard}}
\newcommand{\delay}{\mathsf{delay}}
% \newcommand{\mode}[1]{\text{\textbf{#1}}}
\title{Internship Report: Rethinking OCaml Recursive Values}
\author{Alban Reynaud}
\date{}
% I know it's cheating!
\addtolength{\oddsidemargin}{0.1cm}
\addtolength{\evensidemargin}{0.1cm}
\begin{document}
\maketitle
This work has been done in an internship in Inria Saclay, team Parsifal, during
June and July 2018. It was supervised by Gabriel Scherer.
\pagebreak
\section{Introduction}
OCaml is a statically-types functional language of the ML family. One of the
features of the language is the \lstinline|let rec| operator. It is usually
used to define recursive functions. For example, the following code defines the
factorial function:
\begin{lstlisting}
let rec fac x =
if x = 0 then 1
else x * (fac (x - 1))
\end{lstlisting}
This operator could also be used to define recursive values, such as the
following piece of code that defines an infinite list containing only ones.
\begin{lstlisting}
let rec x = 1 :: x
\end{lstlisting}
Note that this infinite list is a cyclic list. It uses a finite amount of
memory, because it is only composed by a cons-cell referencing itself.
However, not all definitions can be computed. For example, definitions like:
\begin{lstlisting}
let rec x = x + 1
\end{lstlisting}
should be rejected by the compiler. In fact, it is impossible to give a meaning
to this definition. Furthermore, the variable \lstinline|x|, which is typed as
an integer, is used in its own definition. To compute \lstinline|x + 1|, the
variable \lstinline|x| should have a known address and a value. As it may not
have been allocated, and that its value must be undefined, accepting such a
definition could lead to segfaults or non-determinism.
Functional languages have different ways to deal with recursive values. The
ML language rejects all recursive definitions that are not recursive functions.
The Haskell language, due to lazy evaluation, accepts every well-typed recursive
definitions, although some of them lead to infinite computation.
In OCaml, this check previously relied on \textit{ad hoc} rules
\cite{PreviousRules}, which caused several issues. In 2016, Jeremy Yallop
proposed a new check, designed as a simplified typing system \cite{Yallop}.
One of the problem is that his system is too complex, and it is hard to reason
about.
During this internship, I investigated the current system and (following the
intuition of Gabriel Scherer) and rewrote it as an inference rules system. Then,
I designed a simpler system. I implemented it in the compiler and successfully
managed to pass the test-suite. Also, examining Jeremy Yallop's code, Gabriel
and I found a bug and sent a pull request to fix it.
\pagebreak
\section{The current system}
The current system, since OCaml 4.06, is the one that has been proposed by
Jeremy Yallop. Here is an overview of this system.
\subsection{Access modes}
Three \textit{access modes} are used in this system. These modes are:
\begin{description}
\item[Deref] : the value of a variable is accessed.
\item[Guarded] : the address of a variable is either placed in a constructor,
either in an expression that is lazily evaluated, either unused.
\item[Unguarded] : the address of a variable is not used in a guarded
context.
\end{description}
For example:
\begin{itemize}
\item In the expression \lstinline|x + 1|, the variable \lstinline|x| is
accessed in the $\Deref$ mode because its value must be accessed before
being used as a function argument.
\item In the expression \lstinline|1 :: x|, the variable \lstinline|x| is
accessed in the $\Guarded$ mode. The address of \lstinline|x| is used in
the construction of the list, be it does not require the evaluation of
\lstinline|x|.
\item In the expression \lstinline|fun () -> x|, the variable \lstinline|x|
is accessed in the $\Guarded$ mode, because the body of this function is
lazily evaluated, so it does not require the evaluation of \lstinline|x|.
\item Consider the expression \lstinline |let rec x = let y = x in ...|
In the definition of the variable \lstinline|y|, the variable
\lstinline|x| is neither evaluated, nor placed in a constructor, nor lazily
evaluated. \lstinline|x| is considered being used in the $\Unguarded$ mode.
\end{itemize}
\subsection{Types and Environments}
In this system, types $T$ are maps that associate an access mode to each
variable.
Intuitively, the type of an expression gives the use of variables in the
definition of the expression.
Environments $\Gamma$ are maps that associate a type to each variable.
In this system, environments are inputs and types are are outputs. The type of
an expression is evaluated given an environment.
\pagebreak
\subsection{Operations}
Some operations are defined on modes:
\begin{displaymath}
\begin{array}{lll}
\guard(\Unguarded) & = & \Guarded \\
\guard(m \neq \Unguarded) & = & m \\
\inspect(m) & = & \Deref \\
\delay(m) & = & \Guarded \\
\discard(m) & = & \guard(m)
\end{array}
\end{displaymath}
These operations are generalized on types.
% Should I describe more precesily what I mean, or is it clear?
% [Gabriel] "precesily" is a typo that would have caught by
% a spellchecker. You should find out how to run a spellcheck from
% your favorite text editor, and run it from times to times. In Emacs
% I use "M-x ispell".
It is possible to define an total order on modes:
$$\Deref > \Unguarded > \Guarded$$
We have $m > m'$ if the mode $m$ is more constraining than the mode $m'$.
So, let $m + m'$ denote the maximal mode between $m$ and $m'$, with respect to
this order relation. Note that this operation is called \textit{prec} in
Jeremy Yallop's code.
We can generalize this definition for types and environments:
\begin{displaymath}
\begin{array}{lll}
(T + T') & : x \rightarrow & T(x) + T'(x) \\
(\Gamma + \Gamma') & : x \rightarrow & \Gamma(x) + \Gamma'(x)
\end{array}
\end{displaymath}
with:
\begin{displaymath}
\begin{array}{lll}
T(x) = \Guarded & \text{if} & x \notin dom(T) \\
\Gamma(x) = \emptyset & \text{if} & x \notin dom(\Gamma)
\end{array}
\end{displaymath}
\subsection{The Check Algorithm}
The check is performed recursively. Every node of the abstract syntax tree
\footnote{defined in the module \texttt{Typedtree}, that is defined is the
files \texttt{typing/typedtree.mli} and \texttt{typing/typedtree.ml}} is
checked. If it is a node with the form:
\begin{lstlisting}[mathescape=true]
let rec $x_1$ = $e_1$ and ... and $x_n$ = $e_n$
\end{lstlisting}
An environment $\Gamma = (x_i: (x_i: \Unguarded))^i$ is created.
Then, the types of the expressions $e_i$ are computed in the environment
$\Gamma$. If a variable is used in an unguarded mode (\textit{i.e} in mode
$\Unguarded$ or $\Deref$) in one of these types, then the compiler rejects this
expression.
\pagebreak
\subsection{Inference Rules}
Originally, Jeremy Yallop just gave an implementation of this
check. Gabriel Scherer proposed a first draft of inference rules
corresponding to this implementation \cite{GascheComment1, GascheComment2}.
This system works as follow: conclusions have the form
$$\Gamma \vdash e: T$$
where $\Gamma$ is an environment, $e$ is an expression and $T$ a type. As
explained above, the environment is the input and the type is the output.
The main rules are:
\begin{mathpar}
\infer*{ }
{\Gamma, x: T \vdash x: T}
\infer*
{x \notin \Gamma}
{\Gamma \vdash x: \emptyset}
\\
\infer*{\Gamma \vdash e_1: T_1 \\
\Gamma \vdash e_2: T_2}
{\Gamma \vdash \tapply{e_1}{e_2}: \inspect(T_1 + T_2)}
\\
\infer*{\Gamma \vdash e: T}
{\Gamma \vdash \tfun x e: \delay(T)}
\\
\infer*
{(\Gamma \vdash e_i: T_i)^i}
{\Gamma \vdash K(e_1, ..., e_n): \guard(T_1 + ... + T_n)}
\\
\infer*
{\Gamma \vdash e_1: T_1 \\
\Gamma, x : T_1 \vdash e_2: T_2}
{\Gamma \vdash \tlet{x}{e_1}{e_2}: \discard(T_1) + T_2}
\end{mathpar}
% Should I give an explanation of the `let` rule or of another rule?
% The `let` rule is the more complex, but I'm not sure to understand
% "everything" about it. I guess that if I try to describe it, my explanation
% would be unclear.
% The other rules are pretty straightforward, it could also be intersting to
% describe one of them. Someone who reads this report quickly could get the idea
% of the system without going deep into details.
For example, the \lstinline|let| rule works as follow:
\begin{itemize}
\item The right expression $e_1$ is typed under the environment $\Gamma$. The
returned type is $T_1$.
\item The environment in which $e_2$ is evaluated contains one more variable,
that is $x$ containing the value of $e_1$. So $x: T_1$ is added to $\Gamma$
and $e_2$ is typed under this environment. The returned type is $T_2$.
\item The type of the overall expression is $\discard(T_1) + T_2$. Indeed,
a variable dereferenced in $e_1$ is used in the mode $\Deref$ in the overall
expression However, a variable in used in the mode $\Unguarded$ in $e_1$,
its access mode in the overall expression is its access mode in $e_2$.
\end{itemize}
\pagebreak
\subsection{Bug in the implementation}
When examining the code, Gabriel and I found a bug. It appears in the recursive
class declarations check.
Indeed, it is possible to define classes with the syntax:
\begin{lstlisting}[mathescape=true]
class $a$ = let $x_1$ = $e_1$ and ... and $x_n$ = $e_n$ in $e$
\end{lstlisting}
The compiler check that the \lstinline|let| bindings are correct. In
particular, it is not allowed to use the class $a$ that is being defined in the
$\Deref$ mode (for example, when creating an object with the keyword
\lstinline|new|).
The problem was that in the previous check, the expressions affected to
variables bound in the \lstinline|let| and the class expression $e$ were
typed separately. For example, in the definition:
\begin{lstlisting}[mathescape=true]
class a = let x = $e_x$ in let y = $e_y$ in $e$
\end{lstlisting}
it is possible to make an incorrect use of the variable \lstinline|x| in the
expression $e_y$.
Gabriel found that the definition:
\begin{lstlisting}
class a = let x () = new a in let _y = x () in object end
\end{lstlisting}
caused a segfault.
I wrote a patch that fix this bug, added new tests and sent a pull request,
with the help of Gabriel \cite{PullRequest}.
% Note that this bug only appears since the version 4.06.
\section{A new system}
One of the problem of the previous system is that it is quite complicated. The
checker manipulates environments that are maps of maps. It makes it harder to
consider proving that expressions accepted by the checker are well-formed.
Gabriel proposed the idea of another simpler system in September 2017, under the
discussion about Jeremy Yallop's proposal \cite{SchererRules}.
The next part of my work consisted in developing such a system.
\subsection{Modes, types and environments}
In Jeremy's system, access modes refer to way variables are used in expressions.
In this new system, modes are also used to describe the context in which
expressions are defined and evaluated.
Also, a problem of Jeremy's system is that the $\Guarded$ mode has a lot of
different meanings. To be more expressive, this mode is split into three modes
in this system: $\Guarded$, $\Delayed$ and $\Unused$. So there are now five
modes:
\begin{description}
\item[Deref]: it refer to an expression that is evaluated and inspected, or to
a variable such that its value is inspected.
\item[Guarded]: it refer to a variable such that its address is used in some
data-structure (for example: a constructor) but the variable has not been
inspected. It also refer to an expression such that its address will used
in such a context.
\item[Delayed]: it refer to an expression such that its evaluation is delayed,
or to a variable only used in delayed expressions.
\item[Unused]: it refer to an unused variable.
\item[Unguarded]: it refer to a variable such that its value is not inspected,
and that its address is directly placed in a variable, not placed in some
data-structure.
\end{description}
For example:
\begin{itemize}
\item In the expression \lstinline|x + (y * z)|, both the variable
\lstinline|x| and the expression \lstinline|y * z| are used in the $\Deref$
mode.
\item In the expression \lstinline|(f x) :: y|, both the variable
\lstinline|y| and the expression \lstinline|f x| are used in the $\Guarded$
mode.
\item In the expression \lstinline|fun () -> 1 + x|, since the expression
\lstinline|1 + x| is evaluated just when this function is called, this
expression is used in the $\Delayed$ mode. Consequently, the variable
\lstinline|x| is also used in the $\Delayed$ mode.
\item In the expression \lstinline|fun x -> y + z|, as \lstinline|x| does not
appear in the expression \lstinline|y + z|, \lstinline|x| is used in the
mode $\Unused$ in the last expression.
\item In the expression \lstinline|let y = x in y|, as the locally defined
variable \lstinline|y| uses the value \lstinline|x| but does not require the
inspection of \lstinline|x|, as the address of \lstinline|x| is not placed
in some data-structure, and as \lstinline|y| is returned, the variable
\lstinline|x| is used in the $\Unguarded$ mode in the overall expression.
\item A more subtle example: in the expression
\lstinline|let rec f = fun () -> f in f| (the option \verb|-rectypes| must
be passed to the compiler to accept it), the address of the variable
\lstinline|f| is used in the environment stored in this closure.
The variable \lstinline|f| is used in the $\Guarded$ mode.
\end{itemize}
In this system, a type is simply a mode $m$ in which an expression is
evaluated. An environment $\Gamma$ is a map that associate to variables a mode.
By default, if a variable does appear in the domain of this map, this variable
is associated to $\Unused$.
Contrary to the previous system, types are used as inputs and environments are
used as outputs. Intuitively, given an expression $e$ and a mode $m$, the
computed environment $\Gamma$ indicates how the variable not locally bound in
$e$ must be accessed to use $e$ in the mode $m$.
\pagebreak
\subsection{Relations between modes}
With these new modes, we can similarly define an order relation:
$$\Deref > \Unguarded > \Guarded > \Delayed > \Unused$$
We can define how to join two environments $\Gamma$ and $\Gamma'$ in the same
way, and use the same notation $\Gamma + \Gamma'$.
A key concept in this new system is the composition of modes. If a variable $x$
is used in an expression $e$ in mode $m$, and that this expression $e$ is used
in an expression $e'$ in mode $e'$, let $m'[m]$ denote the mode in which the
variable $x$ is used in the expression $e'$. This is called the
\textit{composition} of modes $m'$ and $m$. We can compute the composition of two modes in this way: \begin{displaymath} \begin{array}{lll} \Deref [m] & = & \Deref \\
\Delayed [m] & = & \Delayed \\
m[\Unguarded] & = & m \\
\Unguarded [m] & = & m \\
\Guarded [m] & = & m \\
\end{array}
\end{displaymath}
% Gabriel: You could write e[x] for e with x that occurs in it, and
% e'[e[x]] for e' with e[x] that occurs in it.
% Alban: I don't know exactly how to use this notation, so I used my own words.
For example:
\begin{itemize}
\item In the expression \lstinline|fun () -> x + 1|, the variable
\lstinline|x| is used in the $\Deref$ mode in the expression
\lstinline|x + 1|. But this expression is used in the mode $\Delayed$ in the
expression \lstinline|fun () -> x + 1|. We can notice that the evaluation of
x could be delayed in this function, so \lstinline|x| is used in mode
$\Delayed$ in the overall expression. Indeed, we have
$\Delayed [\Deref] = \Delayed$.
\item In the expression \lstinline|Some (x + 1)|, even if the expression
\lstinline|x + 1| is used in mode $\Guarded$, the variable \lstinline|x|
must be inspected to evaluate both the expression \lstinline|x + 1| and the
overall expression. Indeed, we have $\Guarded [\Deref] = \Deref$.
\end{itemize}
\subsection{The Check Algorithm}
Similarly to the previous system, the check is performed when expressions of the
form
\begin{lstlisting}[mathescape=true]
let rec $x_1$ = $e_1$ and ... and $x_n$ = $e_n$
\end{lstlisting}
are encountered.
The types of the expressions $e_i$ are computed with the mode $m = \Unguarded$
set as the input. If a variable among the $e_i$ is used in an unguarded mode,
then the compiler rejects this expression.
\pagebreak
\subsection{Inference rules}
In this system, deduction rules have the form: $$\Gamma \vdash e: m$$
where $\Gamma$ is a context, $e$ is an expression and $m$ is a mode.
As explained above, the mode $m$ is set as the input, and the environment
$\Gamma$ is the output.
The main rules are:
\begin{mathpar}
\infer*{c~\text{is a constant}}
{\emptyset \vdash c: m}
\infer*{ }
{x: m \vdash x: m}
\\
\infer*{\Gamma, x: \_ \vdash e: m[\Delayed]}
{\Gamma \vdash \tfun x e : m}
\\
\infer*{\Gamma_1 \vdash e_1: m[\Deref] \\
\Gamma_2 \vdash e_2: m[\Deref]}
{\Gamma_1 + \Gamma_2 \vdash \tapply{e_1}{e_2}: m}
\\
\infer*{(\Gamma_i \vdash e_i: m[\Guarded])^i}
{\Gamma_1 + ... + \Gamma_n \vdash K(e_1, ..., e_n): m}
\\
\infer*{\Gamma_1 \vdash e_1: m[m_x + \Guarded] \\
\Gamma_2, x: m_x \vdash e_2: m}
{\Gamma_1 + \Gamma_2 \vdash \tlet{x}{e_1}{e_2}: m}
\end{mathpar}
% I hope this explanation is somehow understandable. Have you got some idea to
% make it clearer?
For example, the \lstinline|let| rule works as follow:
\begin{itemize}
\item The right expression $e_2$ is typed under the environment $m$, because
$e_2$ is the returned expression. The resulting environment is $\Gamma_2$.
\item As the right expression $e_1$ is evaluated in the surrounding mode $m$
(there is no delayed evaluation), it is evaluated at least in the mode
$m[\Guarded]$. As its value is placed inside the variable $x$ which later
is used in the mode $m_x$ in $e_2$, it is at least in the mode $m[m_x]$.
Therefore, $e_1$ is evaluated under the mode
$m[\Guarded] + m[m_x] = m[m_x + \Guarded]$. The resulting environment is
$\Gamma_1$.
% [Gabriel]: maybe you could start by explaining m[m_x] (with the
% example where m_x is Deref?), and then point out that even if
% (e1) is only used under a delay in e2, it is computed right away
% by the let-binding (OCaml is a strict language), so its usage mode
% must be at least Guarded.
\end{itemize}
So, the environment to evaluate this expression is $\Gamma_1 + \Gamma_2$.
\pagebreak
\subsection{Implementation}
Here are some extracts of the implementation:
\begin{lstlisting}
open Asttypes
open Typedtree
open Types
type mode =
Dereferenced
| Guarded
| Delayed
| Unused
| Unguarded
let compos : mode -> mode -> mode = (* ... *)
module Env :
sig
type t
val join : t -> t -> t
(* ... *)
end =
struct
module M = Map.Make(Ident)
type t = mode M.t
(* ... *)
end
(* ... *)
let rec expression : mode -> Typedtree.expression -> Env.t =
fun mode exp -> match exp.exp_desc with
(* ... *)
| Texp_record { fields = es;
extended_expression = eo;
representation = rep } ->
let m' = match rep with
| Record_float -> compos mode Dereferenced
| Record_unboxed _ -> mode
| Record_regular | Record_inlined _
| Record_extension -> compos mode Guarded
in
let field m = function
_, Kept _ -> Env.empty
| _, Overridden (_, e) -> expression m e
in
Env.join (array field m' es)
(option expression mode eo)
(* ... *)
and option : 'a. (mode -> 'a -> Env.t) -> mode -> 'a option
-> Env.t = (* ... *)
and list : 'a. (mode -> 'a -> Env.t) -> mode -> 'a list
-> Env.t = (* ... *)
and array : 'a. (mode -> 'a -> Env.t) -> mode -> 'a array
-> Env.t = (* ... *)
and value_binding : Env.t -> Ident.t list -> mode
-> Typedtree.value_binding
-> Env.t = (* ... *)
and pattern : Env.t -> mode -> pattern -> mode = (* ... *)
\end{lstlisting}
The main function is the function \lstinline|expression|. It takes a mode (type
\lstinline|mode|), an expression (type \lstinline|Typedtree.expression|) and
returns an environment (type \lstinline|Env.t|).
% [Gabriel]: on the other side, it could be nice to also show the
% signature of `structure_item `value_bindings` and `pattern`, and
% briefly mention them in the text, to show that there are more
% functions for various syntactic categories of the language.
There are also functions such as \lstinline|option|, \lstinline|list| and
\lstinline|array| that take a data structure (an option, a list, an array) of
elements, that map these elements to environments (given a mode and a function
passed in argument) and return the join of these environments. For example, in
the case \lstinline|Texp_record|, the array \lstinline|eo| contains the
expression of the elements of the records. The element are typed with the mode
mode \lstinline|m'| with the function \lstinline|expression|, and their
environments are joined.
Also, there are functions such as \lstinline|pattern| that deals with pattern,
\lstinline|value_binding| that deals with the sub-part $p = e$ (with $p$ a
pattern and $e$ an expression) in \lstinline|let| rules, as well functions that
deal with other syntactic features of OCaml.
As we can see, there are many more subtleties in the implementation of the
check than in the rules, which cover only a restricted subset of OCaml. For
example, because of an optimization performed by the compiler, floating-point
numbers in all-float records are unboxed. As a consequence, codes such as:
\begin{lstlisting}
type wrapper = {value: float}
let rec x = 0.0 and w = {value = x}
\end{lstlisting}
are unsafe and should be refused, but would be safe if the field
\lstinline|value| of the type \lstinline|wrapper| was not \lstinline|float|.
This is why in the case of float-only records, the field expressions are typed
with the mode $m' = m[\Deref]$.
There are also special cases for unboxed records and extension records.
Though my implementation do not cover all the constructions that can be found in
the language, I managed to make it pass all the tests from the compiler's
test-suite. Gabriel sent a pull request that is currently being discussed
\cite{NewSystemPullRequest}.
\section*{Conclusion and Future Work}
During this internship, I analysed the current recursive value checker in OCaml
and formalized it as an inference rule system. I developed a new system and
implemented it. I passed all the tests from the compiler's test-suite. With
Gabriel Scherer, I found a bug in the implementation and proposed a fix.
In the future, if this system is integrated in the compiler, it could be
interesting to continue this work in different ways. Some constraints of the
system could be relaxed to accept safe recursive definitions that are currently
rejected. It could also be interesting to investigate the size checker which is
currently pretty conservative.
This system could also be used to formally prove that accepted definitions are
correct and prevent some bugs. It could also be interesting to compare this
system to other approaches of the problem proposed in the literature, such as
extensions proposals \cite{jeannin2012cocaml} or type systems
\cite{dreyer2004type}.
Finally, the computed environments could be used by the compiler in other
modules.
Personally, I enjoyed this experience in a research team. Now, I feel more
familiar with inference rules design. With the implementation part, I learnt a
lot about git and GitHub. My only regret is that I would have liked to have
more time to explore other parts of the subject.
\bibliographystyle{plain}
\bibliography{sources}
\end{document}