-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathintroduction.tex
268 lines (239 loc) · 10.9 KB
/
introduction.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
\section{Introduction}
\label{sec:introduction}
Java is one of the most important programming languages. In 2019, Java
was the second most popular language according to a study
based on GitHub
data.\footnote{\url{https://www.businessinsider.de/international/the-10-most-popular-programming-languages-according-to-github-2018-10/}} Estimates
for the number of Java programmers range between 7.6 and 9 million.\footnote{\url{https://www.zdnet.com/article/programming-languages-python-developers-now-outnumber-java-ones/},
\url{http://infomory.com/numbers/number-of-java-developers/}} Java
has been around since 1995 and progressed through 16 versions.
Swarms of programmers have taken their first steps in Java. Many more
have been introduced to object-oriented programming through Java, as
it is among the first mainstream languages supporting
object-orientation. Java is a class-based language with static single inheritance among
classes, hence it has nominal types with a specified subtyping
hierarchy. Besides classes there are interfaces to characterize common
traits independent of the inheritance hierarchy. Since version J2SE~5.0,
the Java language supports F-bounded polymorphism in the form of generics.
Java is generally explicitly typed with some amendments introduced in
recent versions. That is,
variables, fields, method parameters, and method returns must be
adorned with their type. Figure~\ref{fig:intro-example-generic-fj}
contains a simple example with generics.
% from Featherweight Java \cite{DBLP:journals/toplas/IgarashiPW01}
\begin{figure}[tp]
% \begin{subfigure}[t]{0.55\textwidth}
% \begin{lstlisting}
% class Pair {
% Object fst;
% Object snd;
% Pair(Object fst, Object snd) {
% this.fst=fst;
% this.snd=snd;
% }
% Pair setfst(Object fst) {
% return new Pair(fst, this.snd);
% }
% }
% \end{lstlisting}
% \end{subfigure}
\begin{subfigure}[t]{0.49\linewidth}
\begin{lstlisting}[style=fgj]
class Pair<X,Y> {
X fst;
Y snd;
Pair<X,Y>(X fst, Y snd) {
this.fst=fst;
this.snd=snd;
}
<Z> Pair<Z,Y> setfst(Z fst) {
return new Pair(fst, this.snd);
}
Pair<Y,X> swap() {
return new Pair(this.snd, this.fst);
}
}
\end{lstlisting}
\caption{Featherweight Generic Java (FGJ)}
\label{fig:intro-example-generic-fj}
\end{subfigure}
~
\begin{subfigure}[t]{0.49\linewidth}
\begin{lstlisting}[style=tfgj]
class Pair<X,Y> {
X fst;
Y snd;
Pair(fst, snd) {
this.fst=fst;
this.snd=snd;
}
setfst(fst) {
return new Pair(fst, this.snd);
}
swap() {
return new Pair(this.snd, this.fst);
}
}
\end{lstlisting}
\caption{FGJ with global type inference (\TFGJ)}
\label{fig:intro-example-generic-jtx}
\end{subfigure}
\caption{Example code}
\label{fig:intro-example-code}
\end{figure}
Our type inference algorithm is able to infer generalized method types
as shown in figure \ref{fig:intro-example-generic-fj}.
Our algorithm deducts those types out of the input in \ref{fig:intro-example-generic-jtx}.
But this is only possible by processing each class by itself as explained in chapter \ref{chapter:syntax}.
Lets assume for example, inside the class \texttt{Pair} there would be a call to the method \texttt{setfst} like so:
\texttt{setfst(new Integer())}.
This would cause the method \texttt{setfst} to get the type \texttt{Pair<Integer, Y> setfst(Integer fst)}.
This is due to the type rule \texttt{GT-CLASS} in figure \ref{fig:expression-typing}.
Inside the same class methods cannot be used in a polymorphic way.
We have to make this restriction to combat polymorphic recursion, which would render type inference undecidable.
While the overhead of explicit types look reasonable in the example,
realistic programs often contain variable initializations like
the following:\footnote{Taken from
\url{https://stackoverflow.com/questions/4120216/map-of-maps-how-to-keep-the-inner-maps-as-maps/4120268}.}
\begin{lstlisting}[basicstyle=\ttfamily\fontsize{8}{9.6}\selectfont,style=fgj]
HashMap<String, HashMap<String, Object>> outerMap =
new HashMap<String, HashMap<String, Object>>();
\end{lstlisting}
Java's \emph{local variable type inference} (since version 10\footnote{\url{https://openjdk.java.net/jeps/286}}) deals
satisfactorily with examples like the initialization of
\lstinline{outerMap}.
In many initialization scenarios for local variables, Java infers their type
if it is obvious from the context. In the
example, we can write
\begin{lstlisting}[basicstyle=\ttfamily\fontsize{8}{9.6}\selectfont,style=fgj]
var outerMap = new HashMap<String, HashMap<String, Object>>();
\end{lstlisting}
because the constructor of the map spells out the type in
full. More specifically, ``obvious'' means that the right side of the initialization is
\begin{itemize}
\item a constant of known type (e.g., a string),
\item a constructor call, or
\item a method call (the return type is known from the method
signature).
\end{itemize}
The \lstinline{var} declaration can also be used for an iteration
variable where the type can be obtained from the elements of the
container or from the initializer.
Alternatively, if the variable is used as the method's return value,
its type can be obtained from the current method's signature.
However, there are still many places where the programmer must provide types. In
particular, an explicit type must be given for
\begin{itemize}
\item a field of a class,
\item a local variable without initializer or initialized to \lstinline{NULL},
\item a method parameter, or
\item a method return type.
\end{itemize}
In this paper, we study \emph{global type inference} for Java. Our aim
is to write code that omits most type annotations, except for class
headers and field types. Returning to the \lstinline{Pair} example, it
is sufficient to write the code in Figure~\ref{fig:intro-example-generic-jtx}
and global type inference fills in the rest so that the result is
equivalent to Figure~\ref{fig:intro-example-generic-fj}. Our
motivation to study global type inference is threefold.
\begin{itemize}
\item Programmers are relieved from writing down obvious types.
\item Programmers may write types that leak implementation details. The
\lstinline{outerMap} example provides a good example of this
problem. From a software engineering
perspective, it would be better to use a more general abstract type like
\begin{lstlisting}[basicstyle=\ttfamily\fontsize{8}{9.6}\selectfont,style=fgj]
Map<String, Map<String, Object>> outerMap = ...
\end{lstlisting}
Global type inference finds most general types.
\item Programmers may write types that are more specific than
necessary instead of using generic types. Here, type
inference helps programmers to find the most general type. Suppose
we wanted to add a static method \texttt{eqPair} for pairs of integers to the
\lstinline/Pair/ class.
\begin{lstlisting}[basicstyle=\ttfamily\fontsize{8}{9.6}\selectfont,style=fgj]
boolean eqPair (Pair<Integer,Integer> p) {
return p.fst.equals(p.snd);
}
\end{lstlisting}
With global type inference it is sufficient to write the code on the
left of Figure~\ref{fig:equal-pair} and obtain the FGJ code with the most general type on the right.
\end{itemize}
\begin{figure}[t]
\begin{minipage}[t]{0.49\linewidth}
\begin{lstlisting}[style=tfgj]
eqPair (p) {
return p.fst.equals(p.snd);
}
\end{lstlisting}
\end{minipage}
\begin{minipage}[t]{0.49\linewidth}
\begin{lstlisting}[style=fgj]
<T> boolean eqPair (Pair<T,T> p){
return p.fst.equals<T>(p.snd);
}
\end{lstlisting}
\end{minipage}
\caption{\lstinline{eqPair} in \TFGJ and FGJ}
\label{fig:equal-pair}
\end{figure}
% \item Sometimes, it can be hard to find a correct typing at all.
% \todo[inline]{example?}
To make our investigation palatable, we focus on global type inference for Featherweight
Generic Java \cite{DBLP:journals/toplas/IgarashiPW01} (FGJ), a
functional Java core language with full support for generics. Our type inference algorithm
applies to FGJ programs that specify the full class header and all field types,
but omit all method signatures.
Given this input, our algorithm
infers a set of most general method signatures (parameter types and return types).
Inferred types are generic as much as possible and may contain
recursive upper bounds.
The inferred signatures have the following round-trip property
(relative completeness). If we
start with an FGJ program that does not make use of polymorphic
recursion (see Section~\ref{sec:polym-recurs}), strip all types from
method signatures, and run the algorithm on the
resulting stripped program, then at least one of the inferred typings is equivalent or more
general than the types in the original FGJ program.
\subsection*{Contributions}
\label{sec:contributions}
We specify syntax and type system of the language \FGJGT, which drops all method type
annotations from FGJ and the typing of which rules out polymorphic
recursion. This language is amenable to polymorphic type inference and
each \FGJGT program can be completed to an FGJ program (see Section~\ref{sec:preliminaries}).
We characterize uses of polymorphic recursion in FGJ and their impact
on signatures of generic methods (Section~\ref{sec:polym-recurs-form}).
We define a constraint-based algorithm that performs global type
inference for \FGJGT. This algorithm is sound and relatively complete
for FGJ programs without polymorphic recursion
(Sections~\ref{sec:type-infer-algor} and~\ref{sec:unify}). Our algorithm improves
on previous attempts at type inference for Java in the literature as detailed in
Section~\ref{sec:related-work}.
We investigate the complexity of global type inference and show its NP-completeness (Section~\ref{sec:complexity}).
We implemented a prototype of the type inference algorithm, which we
plan to submit for artifact evaluation.
\if0
\commentary{PT what else do you want to say? Do you want to point to
your implementation for full Java? Can we point to Andi's prototype
(check conditions in CFP)?
Submit as an artifact?}
Our contributions in this paper are an algorithm for global type inference for
FGJ. Therefore we redraft the typing rules of FGJ such that the programs without
type annotations could be correct and in this case polymorphic recursion is
excluded. We prove soundness and completeness of the algorithm about the rules.
The type inference algorithm is reduced to a constraint solving type unification
algorithm. We improved our type unification algorithm such that constraints
of the form $a \lessdot ty$, where $a$ is type variable and $ty$ is is a
non-type variable type,
are not resolved rather converted to bounded type parameters \texttt{a extends
ty}. This implicates an enormes reduction of solutions of the type
unification algorithm without restricting the generality of typings of
FGJ-programs.
We show that Global type inference for FGJ is NP complete.
Finally we have done an implementation of global type inference for a reduced
set of full Java.
\fi
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "TIforGFJ"
%%% End: