%-----------------------------------------------------------------------------
%
% Template for sigplanconf LaTeX Class
%
% Name: sigplanconf-template.tex
%
% Purpose: A template for sigplanconf.cls, which is a LaTeX 2e class
% file for SIGPLAN conference proceedings.
%
% Guide: Refer to "Author's Guide to the ACM SIGPLAN Class,"
% sigplanconf-guide.pdf
%
% Author: Paul C. Anagnostopoulos
% Windfall Software
% 978 371-2316
%
% Created: 15 February 2005
%
%-----------------------------------------------------------------------------
\documentclass[preprint]{sigplanconf}
% The following \documentclass options may be useful:
% preprint Remove this option only once the paper is in final form.
% 10pt To set in 10-point type instead of 9-point.
% 11pt To set in 11-point type instead of 9-point.
% authoryear To obtain author/year citation style instead of numeric.
\usepackage{amsmath}
\usepackage{graphicx}
\begin{document}
\special{papersize=8.5in,11in}
\setlength{\pdfpageheight}{\paperheight}
\setlength{\pdfpagewidth}{\paperwidth}
\conferenceinfo{CONF '15}{October 27, 2015, Pittsburgh, Pennsylvania, USA}
\copyrightyear{2015}
\copyrightdata{978-1-nnnn-nnnn-n/yy/mm}
\doi{nnnnnnn.nnnnnnn}
% Uncomment one of the following two, if you are not going for the
% traditional copyright transfer agreement.
%\exclusivelicense % ACM gets exclusive license to publish,
% you retain copyright
\permissiontopublish % ACM gets nonexclusive license to publish
% (paid open-access papers,
% short abstracts)
% \titlebanner{A Draft} % These are ignored unless
\preprintfooter{Dino Implementation} % 'preprint' option specified.
\title{Implementation of the Programming Language Dino -- A Case Study in Dynamic Language Performance}
%\subtitle{Subtitle Text, if any}
\authorinfo{Vladimir N. Makarov}
{Red Hat}
%\authorinfo{Name2\and Name3}
% {Affiliation2/3}
% {Email2/3}
\maketitle
\begin{abstract}
% XXX MOVED TO INTRODUCTION: "Programming language Dino has a long history. ..."
% TODOXXX
The article gives a brief overview of the current state of the programming
language Dino in order to see where it stands among other dynamic
programming languages. It then describes the current implementation,
the tools used, and the major implementation decisions, including how to
implement a stable, portable and simple JIT compiler.
We study the effect of major implementation decisions on the
performance of Dino on x86-64, AARCH64, and Powerpc64. In brief, the performance
of a model benchmark on x86-64 was improved by \textbf{3.1}
times after moving from a stack based virtual machine to
a register-transfer architecture, a further \textbf{1.5} times by adding byte code
combining, a further \textbf{2.3} times through the use of JIT, and a further \textbf
{4.4} times by performing type inference with byte code specialization, with a resulting overall performance improvement of about \textbf{47} times. To put
these results in context, we include performance comparisons
% XXX possibly list languages as Ruby, Python (including Python 3 and PyPy) and JavaScript
of Dino with widely used implementations of Ruby, Python 3, PyPy
and JavaScript on the three platforms mentioned above.
% XXX '... on the three platforms mentioned above.'
% TODOXXX MOVED FROM INTRODUCTION
The goal of this article is to share the experience of implementing Dino
with other dynamic language implementors, in the hope that it can help
them make implementations of popular dynamic languages faster and more
portable while using fewer developer resources, and perhaps help them avoid
some of the mistakes and dead ends encountered during Dino's development.
\end{abstract}
\category{D.3.4}{Processors}{Interpreters}
% general terms are not compulsory anymore,
% you may leave them out
\terms
Languages, Performance, Algorithms
\keywords
Dynamic Language, Language Design, Interpreters, Optimizations, JIT
\section{Introduction}
% TODOXXXX
The programming language Dino has a long history. Originally it was
designed, implemented, and used as a simple dynamically typed
scripting language in a small computer game company. During its long
life, the language and its implementation have changed dramatically:
the implementation was reworked many times with new tools, new
virtual machines, and new optimizations. Its development therefore
makes a good case study of a dynamic language implementation that is
simple and portable, with performance competitive with JIT
implementations of other popular dynamic languages.
% XXX REPLACED BY PARAGRAPH FROM ABSTRACT:
% Programming language Dino has long history.
%Originally it was designed and implemented as a simple scripting
%language for describing dinosaurs movements in small Russian computer
%game company ANIMATEK. During its long life, the language and its
%implementation were dramatically changed.
% XXX MOVED TO ABSTRACT: "The goal of this article..."
% The goal of this article is to share the experience of Dino
%implementation with other dynamic language implementors in hope that
%it can help them to improve implementation of popular dynamic
%languages to make them probably faster and more portable, using less
%resources, and may be to avoid some mistakes and wrong directions
%which were experienced during Dino development.
The first part of the article contains a brief overview of the current
version of the programming language Dino.
We describe the basic design of the language, its type system and particular features
such as multithreading, heterogeneous extensible arrays, array slices, associative tables,
first-class functions, pattern-matching, as well as Dino's unique approach to class
inheritance via the `use' class composition operator.
The second part of the article describes Dino's implementation.
We outline the overall structure of the Dino interpreter and just-in-time compiler (JIT) and the design of the byte code
and major optimizations. We also describe implementation details such as the garbage collection system, the algorithms underlying Dino's data structures,
Dino's built-in profiling system,
and the various tools and libraries used in the implementation. Our goal is to give an overview of the major implementation decisions
involved in a dynamic language, including how to implement a stable and portable JIT.
The third part studies the effect of design choices and major
optimizations on Dino's performance on x86-64, AARCH64, and Powerpc64.
We examine the choice to use a register-transfer virtual machine instead of a
stack-based design and the effects of byte code combining, just-in-time compilation,
and type inference with byte code specialization.
The fourth part compares Dino's performance with other
popular dynamic language implementations, in order to better understand
the combined effect of all described optimizations. Performance results
are given for the same three platforms as in the previous part.
We conclude by giving possible directions for further
research based on Dino, whether by testing new optimizations
or studying ways to improve the JIT.
\section{Language Overview}
In this section we describe the current version of the Dino
programming language.
This section does not aim to give a formal or comprehensive specification,
and the omitted details may raise questions for the reader. Our goal is
rather to give a high-level overview of the language in order to see how its
features compare to those of similar dynamic languages and to understand how
the implementation methods used for Dino are applicable to other languages.
%In this section we describes briefly the current state of Dino as a
%programming language. This article is not about the formal Dino
%definition and there is a possibility that informal introduction omits
%many details and might lead to numerous questions to the reader.
%However, we just need a quick start up to get high-level language
%understanding. It is necessary to see where the language stands up
%between other dynamic languages and understanding how its
%implementation methods are applicable to the other languages.
% XXX maybe replace 'The best way' or add 'In our opinion, the best way to proceed'
The best way to proceed is to give a series of small example programs. To start with, the following is a
Dino implementation of the Sieve of Eratosthenes:
%The best way to do this is to study it by small examples. Here is the
%first program on Dino -- Eratosthenes sieve:
{\footnotesize
\begin{verbatim}
1. val SieveSize = 8191;
2. var i, prime, k, count = 0, flags = [SieveSize : 1];
3. for (i = 0; i < SieveSize; i++)
4. if (flags[i]) {
5. prime = i + i + 3;
6. k = i + prime;
7. for (;;) {
8. if (k >= SieveSize)
9. break;
10. flags[k] = 0;
11. k += prime;
12. }
13. count++;
14. }
15. putln (count);
\end{verbatim}
}
% TODOXXX Suggestion
An initial design goal was to make Dino resemble C whenever reasonable
in order to reduce the learning curve for programmers familiar with C.
%An initial design goal was to
%make Dino resemble C whenever reasonable, as {at the time Dino was first being developed} %most programmers worked using C.
According to a commonly used classification of programming languages,
Dino can therefore be thought of as a \emph{curly-bracket language}.
%The program looks close to C language. An original design goal was to
%make Dino close to C as at that time most programmers worked using C.
%So Dino is a \emph{curly-bracket language} according to one used classification
%of programming languages.
Variables in Dino can hold values of any type, so Dino
is a \emph{dynamically typed language}. The scalar value types
include \emph{char} (Unicode character),
\emph{integer} (64-bit signed integers),
\emph{long} (multi-precision integers) and
\emph{float}\footnote{Dino floats are IEEE 754-1985 double precision
floating point numbers.}.
% POSSIBLY
Dino types are themselves values\footnote{The type of type values is denoted by the keyword \emph{type}.}.
% WAS: The type of type values is also a value denoted by the keyword \emph{type}.
Variables in Dino should be declared (lines 1 and 2).
The declaration
scope of a variable starts from the declaration point and finishes at the end of
the block containing the declaration\footnote{A block is a series of statements and declarations enclosed in curly-brackets.}.
% declaration block\footnote{Block is a code inside curly-brackets.}.
It can finish earlier if the identifier is redeclared in the block.
The declaration scope includes nested blocks but excludes scopes of other
declarations with the same identifier in the nested
blocks\footnote{Scope rules in Dino were originally more relaxed,
permitting an identifier to be referenced before its declaration,
% TODOXXX VERIFY
but
they were changed to the current ones when an interactive REPL environment
was implemented.}.
%blocks\footnote{Originally, scope rules in Dino were more relaxed, but
%they were changed to the current ones as an interactive environment,
%REPL, was implemented.}.
If a variable is declared by a \emph{val} declaration (line 1), then it is a constant
whose value cannot be changed.
\subsection{Arrays, Slices, and Tables}
The structured value types include \emph{arrays}. In the above
example, the variable {\tt flags} holds an array of integers of size {\tt
SieveSize}. Each element of the array is initialized to integer value 1
(line 2). Different elements of an array can hold values of different
types. Arrays are \emph{mutable} by default, i.e. their elements can be
modified, removed, and new elements can be inserted. An array can be
transformed to become \emph{immutable} by a special operation.
\emph{Slices} of an array can be referenced and used as operands of some operations.
For example, the Sieve of Eratosthenes program shown above can be rewritten using slices:
{\footnotesize
\begin{verbatim}
1. val SieveSize = 8191;
2. var i, prime, count = 0, flags = [SieveSize : 1];
3. for (i = 0; i < SieveSize; i++)
4. if (flags[i]) {
5. prime = i + i + 3;
6. flags[i + prime:SieveSize:prime] = 0;
7. count++;
8. }
9. putln (count);
\end{verbatim}
}
Line 6 in the above example contains an array slice. This slice refers to elements
at indices starting from {\tt i+prime} up to {\tt SieveSize} (exclusive)
with step {\tt prime}.
While array elements are referenced by integer indices,
Dino also includes the associative array type \emph{table}
whose elements are referenced by \emph{keys} that can be arbitrary values. % TODOXXX FAULKNER 'that'/'which' can be arbitrary values -- CHECK THROUGHOUT
%If array elements are accessed by their integer order numbers,
%\emph{indexes}, the tables in Dino, accessed by any value, called
%\emph{key}. The table values can look like
Table values can be built using the {\tt tab []} constructor:
{\footnotesize
\begin{verbatim}
1. tab ["string key" : 10.0
2. [1, 2] : 2,
3. tab [3, 4] : [1, 2]]
\end{verbatim}
}
The table element with value {\tt 10.0} is stored under a string key,
while an array and another table are used as keys for the table elements in lines 2 and 3 respectively.
%while other table elements
%are stored under an array and another table as their keys.
As in the case of arrays,
tables can be mutable or immutable, and elements of mutable tables can
be modified, added and removed.
\subsection{Functions, Closures, and Fibers}
Functions in Dino are first-class values. In other words, they
can be assigned to variables, passed as parameters, returned as
results from other functions and be stored within data structures. Here
are some example function declarations and function calls:
{\footnotesize
\begin{verbatim}
1. fun even;
2. fun odd (i) {i == 0 ? 0 : even (i - 1);}
3. fun even (i) {i == 0 ? 1 : odd (i - 1);}
4. putln (odd (1000000));
5.
6. filter (fun (a) {a > 0;}, v);
7. fold (fun (a, b) {a * b;}, v, 1);
8.
9. fun incr (base) {fun (incr_val) {base + incr_val;}}
\end{verbatim}
}
According to Dino scope rules, a declaration should be present before any usage of the
declared identifier. Thus, in the case of mutually recursive functions, we need to include
a forward declaration. Lines 1-4 of the above example provide an illustration.
%Lines 1-4 of the example present a pair of mutually recursive functions.
%In this case we need a forward declaration (line 1) as, according to
%Dino scope rules, a declaration should be present before any usage of the
%declared identifier.
Lines 6 and 7 provide an example of the use of \emph{anonymous} functions. We pass
the anonymous functions as parameters to the {\tt filter}
and {\tt fold} functions\footnote{Functions {\tt filter} and {\tt fold} are defined in the standard
Dino environment.}.
Function values in Dino always exist within a context, as the
functions can refer to outside declarations. A function value considered along with its
context is called a \emph{closure}. The last line in the above example
illustrates the use of closures. The function {\tt incr} returns an anonymous
function which in turn returns the sum of its parameter {\tt incr\_val} and the parameter {\tt
base} given to the corresponding call of {\tt incr}.
%Function values in Dino always exist in a context, as the
%functions can refer to outside declarations. Function values with its
%context are called \emph{closures}. The last line in the example
%illustrates the use of closures. The function {\tt incr} returns an anonymous
%function which returns sum of its parameter and value of parameter {\tt
%base} of the corresponding call of function {\tt incr}.
A \emph{fiber} in Dino is a function which executes concurrently from
its point of invocation. A fiber call being executed is called a
\emph{thread}. Here is an example:
{\footnotesize
\begin{verbatim}
1. fiber t (n) {for (var i = 0; i < n; i++) putln (i);}
2. t(100); // the following code will not wait for t to finish
3. for (var i = 0; i < 1000; i++) putln ("main", i);
\end{verbatim}
}
% TODOXXX CHECK \emph{}, \tt, and `' for keywords
To synchronize threads, Dino provides a basic {\tt wait} statement:
%To synchronize threads, Dino has a low level wait-statement:
% TODOXXX NEED TO GIVE AN EXAMPLE OR CLARIFY
{\footnotesize
\begin{verbatim}
wait (cond) [stmt];
\end{verbatim}
}
The wait statement suspends the executing thread until the condition {\tt cond} evaluates to a true value, after which the optional statement {\tt stmt} is executed.
% XXX CHECK CORRECTNESS OF THE BELOW DESCRIPTION
Threads are concurrent but not parallel. In other words, multithreading
is implemented by so-called \emph{green threads}, with the Dino interpreter alternately executing statements from different threads\footnote{The multithreading system in Dino
has existed without any change since its very first version,
as the original application of Dino (scripting dinosaur movements in a simulation)
did not require any more complex form of parallelism.
There are plans
to make threads parallel, e.g. by implementing them as OS-level threads.
This will likely change the semantics of Dino threads in the future.}.
Simple statements, e.g. statements that do not contain function calls, are atomic, i.e. cannot be interrupted by another thread.
%Thread execution is concurrent but not parallel, in other words the threads
%are implemented by so called \emph{green threads}. Simple statements,
%e.g. assignments without calls, are atomic, i.e. can not be
%interrupted by another thread execution\footnote{Threads in Dino
%exists since its very first version without any change as the first
%usage of Dino was to describe dinosaurs movements. There are plans
%to make threads parallel, e.g. implementing by OS-threads and this
%probably will change semantics of Dino threads.}.
\subsection{Classes and Objects}
Some programming languages attempt to unify the functional and
object-oriented approaches. For example, in Scala a function is just
another kind of object,
and a function call invokes the method {\tt apply} on the
corresponding object.
% TODOXXX MAY NEED TO CLARIFY == rewrite to 'block instance', 'this'
Dino takes a somewhat different approach to unifying the two concepts.
A class in Dino is defined as % <-- NOTE!
a special
kind of function which returns an entity called a \emph{block instance}\footnote{We prefer to use the
term \emph{block instance} instead of the widely used `activation record' or `stack frame'
as an activation record is usually allocated on the call stack, whereas a Dino block
instance is allocated in the heap.} representing the created
object. The inner declarations of the block instance are publicly visible by default.
Here are some example class declarations:
{\footnotesize
\begin{verbatim}
1. class num (i) {fun print {put (i);}}
2. class binop (l, r) {
3. fun print_op;
4. fun print {l.print(); print_op (); r.print ();}
5. }
\end{verbatim}
}
Line 1 contains the declaration of a class {\tt num} with one function {\tt print} and one public variable {\tt i} whose value is initialized by the call to the class. Lines 2 to 5
contain the declaration of an abstract class {\tt binop} with two variables {\tt
l} and {\tt r}, a defined function {\tt print} and a function {\tt print\_op} which is declared but not yet defined.
Dino has a powerful class/function composition operator
\emph{use} which can be used to \emph{inlay} declarations from one class into another,
that is, to insert them into the using class as if they had been written there directly.
This operator can be used to emulate \emph{inheritance} (including multiple inheritance), \emph{traits}, and
\emph{dynamic dispatching}.
Here is a continuation of the above example:
{\footnotesize
\begin{verbatim}
6. class add (l, r) {
7. use binop former l, r later print_op;
8. fun print_op {put (" + ");}
9. }
\end{verbatim}
}
Line 7 contains a \emph{use}-clause which inlays the
declarations (functions and variables) from class {\tt binop}, making them available within the scope of {\tt add}.
The \emph{former}-clause overrides the {\tt l} and {\tt r} variables from {\tt binop}
with the {\tt l} and {\tt r} variables defined earlier in the declaration of {\tt add}.
Likewise, the \emph{later}-clause overrides the {\tt print\_op} function inlaid from {\tt binop} with the {\tt print\_op} function defined later in the declaration of {\tt add}.
%Line 2 means inclusion of declarations of class {\tt binop} but
%instead of {\tt l} and {\tt r} from {\tt binop} usage of declarations
%with the same names from class {\tt add} defined earlier and instead
%of {\tt print\_op} from {\tt binop} usage of one defined in {\tt add}
%after the use operator.
The \emph{use}-clause provides a safe and powerful way to support object-oriented
programming. It has the following semantics:
\begin{enumerate}
\item Declarations from the class mentioned by the \emph{use}-clause are inlaid into the current class.
\item Declarations from the current class that occur before the \emph{use}-clause override inlaid declarations mentioned in the \emph{former}-clause.
\item Declarations from the current class that occur after the \emph{use}-clause override inlaid declarations mentioned in the \emph{later}-clause.
\item Declarations in the current class must \emph{match} any inlaid declarations that they override.\footnote{An
exact definition of what it means for two declarations to match
is not given here. An example
of matching declarations is a pair of functions with the same number of parameters.}
\item A declaration from the original class mentioned by the \emph{use}-clause can be \emph{renamed} and thus made available in the current class instead of simply being overridden.
\end{enumerate}
%\begin{enumerate}
% \item Declarations of class mentioned in \emph{use} are inlayed.
% \item Declarations before the use rewrite corresponding inlayed declarations mentioned in \emph{former}-clause.
% \item Declarations after the use rewrite corresponding inserted declarations mentioned in \emph{later}-clause.
% \item The declarations should be \emph{matched}\footnote{The
% exact definition of matching is not given here. An example
% of matched declarations is Dino functions with the same number of parameters.}.
% \item The original and new declarations should be \emph{present} if they are given in former- or later-clause.
% \item The original declaration can be \emph{renamed} and still used instead of just rewriting if necessary.
%\end{enumerate}
A class that references another class via a \emph{use}-clause becomes a \emph{subtype} of the referenced class. The subtyping relation is transitive.
To test subtyping of a class or object, the standard function \emph{isa}
can be used\footnote{Pattern matching can also be used for this purpose.}:
%To check sub-typing of class or object, standard function \emph{isa}
%can be used\footnote{Pattern matching also can be used for these purposes.}:
{\footnotesize
\begin{verbatim}
1. isa (add, binop);
2. isa (add (num (1), num (2)), binop);
\end{verbatim}
}
Dino provides syntactic sugar for declaring a singleton, which functions as
an abbreviation for declaring an anonymous class, creating an object of the class
and assigning it to a variable declared using \emph{val}:
%Dino provides a syntactic sugar for singleton description which is
%simply abbreviation of anonymous class, corresponding object creation,
%and assigning it to the val variable.
{\footnotesize
\begin{verbatim}
1. obj coords {
2. var x = 450, y = -200;
3. }
\end{verbatim}
}
\subsection{Pattern Matching}
Pattern matching is a useful instrument for distinguishing and decomposing
values and extracting their parts into variables.
Dino's pattern matching mechanism can
be used in a variable declaration or in a \emph{pmatch-statement}.
The following is an example of pattern matching in a declaration:
{\footnotesize
\begin{verbatim}
1. try {
2. var [a, b, 2, ...] = v;
3. ...
4. } catch (patternmatch) {
5. putln ("your assumption is wrong");
6. }
\end{verbatim}
}
The declaration on line 2 checks that the value of {\tt v} is an array of at
least 3 elements and that the third element of {\tt v} is equal to 2. If this
is not the case, an exception will occur.
The values of the first and second elements of {\tt v} will be assigned to newly declared variables {\tt a} and {\tt b} respectively.
%The declaration on line 2 means that {\tt v} value should be an array of at
%least 3 elements and the third element should be equal to 2. If this
%is not true, an exception will occur and processed if necessary which
%is demonstrated in the example. Value of the first and the
%second elements will be assigned correspondingly to newly declared
%variables {\tt a} and {\tt b}.
The following is an example of pattern matching in a pmatch-statement:
{\footnotesize
\begin{verbatim}
1. pmatch (v) {
2. case [...]: putln ("array"); continue;
3. case [a, ...]:
4. if (a == 0) break;
5. putln ("array with non-zero 1st element");
6. case node (v) if v != 0:
7. putln ("object of class node with non-zero value");
8. case _: putln ("anything else");
9. }
\end{verbatim}
}
%Instead of throwing an exception in the case of a failed match,
The pmatch-statement tries to match
a value with the patterns in the case clauses and
executes the code corresponding to the first matched pattern. The scope
of variables defined in the pattern extends to statements within the case clause.
A \emph{continue} statement within a case clause means that pattern matching should resume and continue to the subsequent case clauses. A \emph{break} statement exits the pmatch
statement. There is an implicit break at the end of each case clause.
The above example illustrates the possibilities of the pmatch statement, but
the program used for the example is artificial. The following is a more
realistic program using classes and pattern matching:
%The previous example represents possibilities of pmatch statement but
%the example itself is artificial. The following example is more
%of a real-life program using classes and functions and pattern matching:
{\scriptsize
\begin{verbatim}
1. class tree {}
2. class leaf (i) {use tree;}
3. class node (l, r) {use tree;}
4. fun exists_leaf (test, t) {
5. pmatch (t) {
6. case leaf (v): test (v);
7. case node (l, r):
8. exists_leaf (test, l) || exists_leaf (test, r);
9. }
10. }
11. fun has_odd_leaf (t) {
12. exists_leaf (fun (n) {type (n) == int && n % 2 == 1;}, t);
13. }
\end{verbatim}
}
\subsection{Standard Environment}
The Dino standard environment contains a variety of built-in functions
(input/output, higher order functions, regexp functions, functions for
array/table manipulations, etc.) and classes (mostly describing
various exceptions).
This environment is small in comparison with the standard libraries of popular dynamic
languages. The creation of a full set of standard libraries requires a lot of
effort and has not been a priority, as Dino was more of a research project for a long time. That said, there
is one unique feature in the Dino standard environment: a
predefined class for the creation of parsers for programming languages or
natural languages.
The class implements an enhanced \emph{Earley parser} algorithm with
\emph{simple syntax directed translation}.
It can parse according to ambiguous grammars, producing either a compact representation of \emph{all} possible parse trees or a \emph{minimal cost} parse tree when costs are assigned to the grammar rules.
%It can parse according to
%ambiguous grammars producing a compact representation of \emph{all} possible
%parse trees or \emph{minimal cost} parsing trees if costs of the grammar
%rules are defined.
The parser can perform syntax recovery, finding the \emph{minimal}
number of tokens to ignore in order to produce a correct abstract syntax tree (AST). It is also fairly fast, being capable of parsing about 250,000 lines of C
language code per second on modern CPUs\footnote{The implementation of Earley's parser used in Dino achieves 255,700 lines per second when parsing a preprocessed 67,000 line C file on 4.2GHz Intel i7-4790K. Peak memory usage for parser internal data was about 3.9MB.}.
%\footnote{Speed of Earley's parser on
% 4.2GHz Intel i7-4790k achieves 255,700 lines per second on a
% preprocessed 67,000 lines C file created from a big Msta source file
% concatenated many times. To compile this file, peak memory usage
% for the parser internal data was about 3.9MB.}.
Here is an example where the Earley parser built into Dino is used to parse a tiny programming language:
%Here is an example
%how to use it to parse a tiny programming language:
{\scriptsize
\begin{verbatim}
expose yaep.*;
val grammar = "TERM ident=301, num=302, if=303,
then=304, for=305, do=307, var=308;
program = program stmt # list (0 1)
stmt = ident '=' expr ';' # asgn (0 2)
| if expr then stmt else stmt # if (1 3 5)
| for ident '=' expr expr do stmt # for (1 3 4 6)
| '{' program '}' # block (1)
| var ident ';' # var (1)
| error
expr = expr '+' factor # plus (0 2)
factor = factor '*' term # mult (0 2)
term = ident # 0
| '(' expr ')' # 1";
val p = parser (); // create an Earley parser
p.set_grammar (grammar); // set grammar
fun syntax_error; // forward decl of error reporting func
val abstract_tree = p.parse (token_vector, syntax_error);
\end{verbatim}
}
\begin{center}
\begin{figure*}[ht]
%\onecolumn
\begin{center}
\includegraphics[scale=0.5]{Dino_Flow.eps}
\end{center}
\caption{Overall Dino data flow}
\label{fig:Dino-flow}
\end{figure*}
\end{center}
\section{Implementation}
Unfortunately, the article size does not permit a description
of all design decisions, algorithms and optimizations in full detail.
Only a high level overview is given, and only some important elements
of the implementation are described in more detail.
% Unfortunately, the article size does not permit to describe all
%design decisions, used implementation algorithms, and optimizations
%in full details. Only a high level description of them is given
%here. And only some important ones are described later in more
%details.
Figure \ref{fig:Dino-flow} shows the general structure of the current Dino
implementation \cite{Makarov} and the flow of data through it.
The Dino interpreter can be used in one of two modes. The first mode is a read-eval-print
loop (REPL) where a completed statement undergoes all
processing stages including execution. % was 'implementation'.
The second mode is the standard batch mode, where the interpreter
first processes all program files to generate byte code of the entire
program and then executes the byte code. Program byte code can
be saved in a readable form, modified and read back for execution.
The interpreter includes a function-level JIT compiler which is implemented
with the aid of a C compiler.
A Dino program can load and execute PIC object files created from
C code using a foreign function interface.
The interpreter uses a memory heap with garbage collection. The
heap can be automatically extended on demand. The interpreter performs a
simple escape analysis to transform heap allocations into stack ones.
The garbage collector uses a combination
% of the Mark and Sweep and fast Mark and Copy algorithms % TODOXXX check
of the mark-and-sweep and the fast mark-and-copy strategies
to prevent heap fragmentation and
decrease program memory requirements \cite{GCH}.
The first implementation of associative tables in Dino was based on classical hash
tables with buckets. The implementation was later changed to
resizable hash tables without buckets\footnote{The hash function used in Dino is based on MurMur hashing \cite{Appleby}.}. % TODOXXX CHECK FOOTNOTE
Conflict resolution is
achieved by secondary hashing. Such an approach reduces
pointer chasing and results in more compact hash table
representations and better data locality.
% The first Dino associative table implementation were classical hash
%tables with buckets. The implementation was changed later by
%resizable hash tables without buckets. The conflict resolution is
%achieved by secondary hashing. Such approach permits to decrease
%pointer chasing and results in more compact hash table
%representations and better data locality.
With the right choice
of maximal load factor
for the hash table, the implementation using secondary hashing can be 1.5 times faster
on modern processors than the implementation using buckets\footnote{Both hash table implementations provide the same
functionality and approximately the same overall table sizes and
rate of growth.
Hashing benchmarks from the old version of the computer language
shootout were used for the comparison \cite{Benchmarks}.}.
% XXX An analogous approach is used for the hash tables in the GCC compiler. % -- for this version, it is not clear why the GCC connection is mentioned.
% XXX ALTERNATIVE: The secondary hashing approach is analogous to the one used in the GCC compiler.
The hash table implementation based on secondary hashing was later adapted for use in the GCC compiler.
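To make this concrete, the following C fragment sketches a lookup in an
open-addressing hash table with secondary hashing. The types, names, and
details are illustrative only and are not taken from the Dino sources.
Because there are no buckets, a probe sequence walks only the flat entry
array itself, which is what reduces pointer chasing and improves locality.
{\scriptsize
\begin{verbatim}
#include <stddef.h>

/* One flat array of entries; no per-entry bucket chains. */
typedef struct { unsigned long key; void *val; int busy; } entry_t;

/* size is a power of two; hash2 must be odd so that the probe
   step is coprime with size and every slot can be reached.    */
void *htab_find (entry_t *tab, size_t size, unsigned long key,
                 unsigned long hash1, unsigned long hash2)
{
  size_t i = hash1 & (size - 1);

  while (tab[i].busy)
    {
      if (tab[i].key == key)
        return tab[i].val;           /* hit */
      i = (i + hash2) & (size - 1);  /* secondary hash gives the step */
    }
  return NULL;                       /* reached an empty slot: miss */
}
\end{verbatim}
}
As in any resizable open-addressing table, when the load factor exceeds the
chosen maximum the entry array is enlarged and all elements are rehashed.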
The Dino interpreter employs numerous optimizations to improve performance.
Below we discuss optimizations which are important from the performance point of view
or are widely described in the research literature.
%The most important optimizations from the performance point of view or widely
%described in a research literature are discussed later in the article.
\subsection{Byte Code Dispatch}
Although many articles and discussions about dynamic language
implementations focus on \emph{byte code dispatch}, in our experience
the choice of dispatch
method is relatively unimportant for the performance of the Dino implementation.
% A lot of articles and discussions about dynamic language
%implementations are devoted to \emph{byte code dispatch}. The dispatch is
%actually found to be less important in Dino implementation.
In the \emph{direct threaded code} method, each byte code
instruction contains the address of the code responsible for execution of
the next byte code instruction \cite{Bell} \cite{Ertl}. For example, the Ruby VM
%defaults to using direct threaded code \cite{Sasada}.
uses direct threaded code by default \cite{Sasada}.
Although the code required for dispatch is very simple (read the address and jump to it), implementing the direct threaded strategy requires a nonstandard extension to the C language which allows labels to be treated as values \cite{GCC}. The use of this extension prevents the compiler from performing some optimizations on the interpreter code (e.g. predictive commoning).
The \emph{direct dispatch} method is based on a standard C switch
statement. Each case of the switch statement contains the implementation of a different byte code instruction. Experience with the Dino
implementation shows that a \emph{proper} direct dispatch implementation is
more efficient on modern architectures. By `proper', we mean that the compiler
should be forced to generate optimized code for the switch statement
if the number of all byte code instructions is less than 256\footnote{For example, it is possible to force the GCC compiler to omit range checking and zero extension of the opcode value. The resulting code for x86/x86-64 has less than half the instructions compared to the code generated for a switch statement by default.}. The optimization results in the code for direct dispatch being compiled to a number of machine instructions comparable to that of the machine code for direct threaded dispatch.
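The difference between the two techniques can be seen in the following C
sketch of a toy interpreter with only two instructions; it is purely
illustrative and does not reflect the Dino sources. The switch version
relies on the compiler producing a single indexed jump, while the threaded
version stores handler addresses directly in the code array and dispatches
with the GCC labels-as-values extension.
{\scriptsize
\begin{verbatim}
enum { OP_INCR, OP_HALT };

/* Direct dispatch: a plain C switch selects the handler. */
long run_switch (const unsigned char *pc, long acc)
{
  for (;;)
    switch (*pc++)
      {
      case OP_INCR: acc++; break;
      case OP_HALT: return acc;
      }
}

/* Direct threaded dispatch: every code slot holds the address
   of a handler, so dispatch is just a load plus an indirect
   jump (GCC labels-as-values extension).                      */
long run_threaded (long acc)
{
  void *code[] = { &&incr, &&incr, &&halt }; /* incr; incr; halt */
  void **pc = code;

  goto *(*pc++);
 incr: acc++; goto *(*pc++);
 halt: return acc;
}
\end{verbatim}
}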
To take advantage of hardware branch prediction, some authors proposed
replicating instructions or placing a direct switch
at the end of the code implementing each byte code instruction \cite{Casey}. This
might work well when we have a small benchmark that takes advantage of the hot path,
but in most real programs the bytecode will generally have different instructions adjacent to one another. % one type of instruction after another.
The larger and more complex the benchmarks, and the more different types of
byte code instructions the VM provides, the worse this approach will perform.
%The bigger and more complex benchmarks and the more different types of
%byte code instructions we have\footnote{Dino implementation uses more
% 250 byte code instructions.}, the worse result we have.
The performance rapidly becomes worse than that of direct dispatch.
%The
%performance becomes very soon worse than one for the direct
%dispatch.
In early Dino implementations, the use of a proper direct dispatch implementation improved
code performance for a simple loop benchmark by up to 3\% (on a modern
x86-64 architecture) compared to the next-best direct threaded
dispatch method. Unfortunately, the latest Dino implementation uses more
than 256 byte code instructions. Therefore we switched to the direct threaded code
method as the best possible dispatch method for such a number of byte
code instructions.
% For Dino the proper direct dispatch implementation improves code
% up to 3\% on a simple loop benchmark on modern x86-64 architecture over
% the next best direct threaded dispatch.
A much greater performance improvement can be reached by focusing on other
issues than the choice of byte code dispatch. The most important aspect of the design
is choosing the right byte code architecture for efficient execution.
% Much bigger improvement than choosing the right byte code dispatch
%can be reached by other solutions. And the most important one is to
%choose the right byte code design for execution.
\subsection{Virtual Machine and Bytecode Design}
\emph{Stack-based} (SB) virtual machines (VMs) are a fairly popular design for
implementation of dynamic language interpreters. For example, the Ruby
VM uses a stack-based architecture \cite{Sasada} \cite{Shaughnessy}. Stack-based byte
code is compact and simple.
The previous implementation of the Dino VM used a stack-based design as well.
The current implementation is based on a \emph{register transfer
language} (RTL). Changing the virtual machine architecture from a stack-based to
a register-transfer-language VM architecture improved code
%The transfer from SB VM to RTL one improved code
performance up to \textbf{3} times,
which is bigger than the speedup reported
in \cite{Shi}\footnote{There are many RTL design details
which could influence performance
and may account for the difference in speedup.
For example, the addressing system in Dino bytecode
allows an instruction to address both local and temporary variables as well as
variables in global scope.}.
The main advantage of RTL is a reduced number of instructions, as one RTL instruction can correspond to 3 or more SB
byte code instructions. In many cases, the reduced number of instructions decreases
the overhead of byte code dispatch and reduces the amount of operand shuffling and
type checking. Moreover, there is greater scope for
applying compiler optimizations to the bytecode interpreter itself
if it is based on an RTL rather than an SB design.
%RTL instruction execution decreases byte code dispatch
%overhead, decreases number of operand shuffling and operand type
%checking in many cases, and permit a bigger scope for use of compiler
%optimizations on the code implementing the instruction.
The benefit of RTL is even greater when we take into account the ability to
perform optimizations on the byte code, including instruction combining, instruction specialization, and colour based variable allocation. These optimizations are
very difficult or even impossible to implement for SB virtual machines.
% The importance of RTL increases even more when other optimizations
%including instruction combining, instruction specializations, or colour based
%variable allocation are taken into account. Such optimizations are
%impossible or hard to implement for SB machines.
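As a rough C illustration (not Dino's actual instruction formats), consider
the handlers needed to execute the assignment {\tt a = b + c}. A stack-based
VM executes four instructions (load {\tt b}; load {\tt c}; add; store
{\tt a}) and shuffles the operands through the stack, whereas an RTL VM
executes a single three-operand add, so dispatch and type checking are paid
for once instead of four times:
{\scriptsize
\begin{verbatim}
typedef long val_t;  /* real Dino values also carry a type tag */

/* Stack-based handlers: "a = b + c" takes
   load b; load c; add; store a -- four dispatches. */
void sb_load (val_t *st, int *sp, val_t *frame, int v)
  { st[(*sp)++] = frame[v]; }
void sb_add (val_t *st, int *sp)
  { (*sp)--; st[*sp - 1] = st[*sp - 1] + st[*sp]; }
void sb_store (val_t *st, int *sp, val_t *frame, int v)
  { frame[v] = st[--(*sp)]; }

/* RTL handler: "add a, b, c" -- one dispatch, no shuffling. */
void rtl_add (val_t *frame, int a, int b, int c)
  { frame[a] = frame[b] + frame[c]; }
\end{verbatim}
}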
The current Dino byte code format includes multi-operand instructions
with 1-5 operands (usually 3), control flow instructions (blocks, branches, calls, etc.),
as well as instructions representing declarations such as vdecl (variable declaration)
and fdecl (function, class, and fiber declarations).
% The current Dino byte code consists of declarations including vdecl
%(variable declaration) and fdecl (function, class, thread
%declarations), control flow instructions (blocks, branches, calls etc) and
%multi-operand instructions with 1-5 operands (usually 3 operands).
Dino byte code has two representations: an in-memory format for execution
and a human-readable representation which can be output, modified, and read
back by the interpreter for further execution.
The Dino code
{\footnotesize
\begin{verbatim}
var i, n = 1000;
for (i = 0; i < n; i++);
\end{verbatim}
}
is compiled to byte code which includes the following segments (given in human-readable representation):
% has the following excerpts of readable code representation:
{\scriptsize
\begin{verbatim}
0 block fn="ex.d" ln=1 pos=1 next=730 vars_num=29 tvars_num=3
...
372 vdecl ... ident=i ident_num=268 decl_scope=0 var_num=27
373 vdecl ... ident=n ident_num=269 decl_scope=0 var_num=28
...
788 ldi ... op1=28 op2=1000 // 28 <- i1000
789 ldi ... next=791 op1=27 op2=0 // 27 <- i0
790 btltinc ... next=792 op1=27 bcmp_op2=28 bcmp_res=29 pc=790
791 btlt ... op1=27 bcmp_op2=28 bcmp_res=29 pc=790
792 bend ...
\end{verbatim}
}
\subsection{Optimizations}
Choosing the right byte code representation is
crucial, but there are other techniques % TODOXXX REMOVE?? and optimizations
which are important for attaining good performance.
\subsubsection{Byte Code Combining}
Reducing the number of executed instructions provides a performance advantage,
due to reduced dynamic dispatch overhead and an
increased scope for applying compiler optimizations when building the Dino bytecode interpreter. The use of RTL representation is one way of achieving this.
% The advantage of RTL byte code is to decrease the number of executed
%instructions and as a result to decrease dynamic dispatch overhead and to
%increase scope for compiler optimization.
The same result can also be achieved by creating new bytecode instructions and using them to
replace frequently executed chains of operations. Such an optimization
is performed by the Dino interpreter. An analogous approach can be found in
\cite{Abdelrahman}, where this optimization is called \emph{concatenation}, or in
\cite{Casey}, where it is called \emph{combining instructions into superinstructions}.
% The same result can be achieved by creating new instructions which
%implement frequently executed chains of instructions. Such optimization
%is done by Dino interpreter. The analogous approaches can be found in
%\cite{Abdelrahman}, where it is called \emph{concatenation}, or in
%\cite{Casey}, where it is called \emph{superinstruction}.
% The following example of transformations of empty loop byte code
%illustrates how it is done in Dino interpreter:
The following example illustrates how the Dino interpreter transforms byte code
corresponding to an empty for loop:
{\scriptsize
\begin{verbatim}
label: addi op1, op1, i1; lt res, op1, op2; bt res, label =>
label: addi op1, op1, i1; blt res, op1, op2, label =>
label: btltinc op1, op2, i1, res, label
\end{verbatim}
}
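In C terms the effect of this transformation can be sketched as follows (the
handlers are illustrative and are not taken from the Dino sources): the
combined {\tt btltinc} handler does the work of the three original handlers
in a single dispatch, and the C compiler can optimize across the three steps
when the interpreter is built.
{\scriptsize
\begin{verbatim}
typedef long val_t;

/* Separate handlers: three dispatches per loop iteration. */
void op_addi (val_t *f, int op1, long i)
  { f[op1] += i; }
void op_lt (val_t *f, int res, int op1, int op2)
  { f[res] = f[op1] < f[op2]; }
int op_bt (val_t *f, int res)      /* returns "branch taken" */
  { return f[res] != 0; }

/* Combined handler: one dispatch per loop iteration. */
int op_btltinc (val_t *f, int op1, int op2, long i, int res)
{
  f[op1] += i;
  f[res] = f[op1] < f[op2];
  return f[res] != 0;
}
\end{verbatim}
}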
Byte code combining improves the performance of the model benchmarks described in section \ref{model_benchmarks} by about 1.5 times.
% Byte code combining improves the performance in 1.5 times for some
%model benchmark described later.
\subsubsection{Pure Function Calls}
A function is called pure if it has no side effects and
its result always depends only on its argument values.
When a pure function is called many times with
the same arguments, its result does not need to be calculated again.
The Dino interpreter can save the argument values and corresponding
results of pure function calls and then reuse the results instead of
performing the calculations again. This optimization can improve the performance
of some benchmarks such as factorial or Fibonacci by an order of magnitude.
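The general technique can be sketched in C as follows. The cache layout and
the names are hypothetical, and the Dino interpreter applies the idea to
byte code level calls rather than requiring anything from user code: a call
first looks for a previously computed result for the same argument and falls
back to the real computation only on a cache miss.
{\scriptsize
\begin{verbatim}
#define CACHE_SIZE 1024

static struct { long arg; long res; char valid; }
  cache[CACHE_SIZE];

static long fib (long n);        /* the pure computation */

/* Call fib through the cache: recompute only on a miss. */
static long cached_fib (long n)
{
  unsigned long i = (unsigned long) n % CACHE_SIZE;

  if (cache[i].valid && cache[i].arg == n)
    return cache[i].res;
  long r = fib (n);
  cache[i].arg = n; cache[i].res = r; cache[i].valid = 1;
  return r;
}

static long fib (long n)
  { return n <= 1 ? n : cached_fib (n - 1) + cached_fib (n - 2); }
\end{verbatim}
}
For the naive Fibonacci above, the cache turns an exponential number of
calls into a roughly linear one, which is the kind of order-of-magnitude
effect mentioned above.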
\subsubsection{Just-in-time Compilation}
Performance of frequently executed code can be improved by just-in-time
compilation (JIT) to machine code. There are a number of approaches to selecting which unit of
code to compile. The most frequently used approach is \emph{trace-based} selection.
Examples of trace-based JIT implementations include the PyPy implementation of Python and
some JavaScript implementations.
JIT compilation can be implemented through a specialized JIT framework
(e.g. JVM\footnote{If JVM is used for the implementation, the bytecode obviously has to be Java bytecode.})
%\footnote{Obviously, a JVM JIT framework can only be used to compile Java bytecode.})
or by using more general compiler frameworks such as LLVM or the GCC JIT plugin \cite{Lattner} \cite{Malcolm}.
The Dino interpreter uses a function-level JIT. Function compilation is
triggered by the first call. The function's byte code is translated
into C function code.
For each type of byte code instruction there is a corresponding
C inline function. When building the Dino interpreter, these functions
are copied to a C-preprocessed file, which is included in the generated C
code during JIT compilation\footnote{The C inline functions implementing the byte code instructions
are declared \emph{static}. Thus, GCC does not
optimize or generate separate machine code for these functions;
they are inlined only if they are used.
This considerably increases the speed of JIT compilation.
Another important factor that increases the compilation speed
is the fact that the preprocessed file contains only declarations that
are used by the inline functions, rather than the full set of declarations
from the Dino interpreter.
%Another
%important way to increase the compilation speed is to ensure that the preprocessed file
%contains only declarations
%that are used by the inline functions.
% This is done by generating the preprocessed file when building the Dino interpreter. A small Dino script is used to copy the necessary declarations out of the main interpreter codebase.
% A small Dino script removes 3713 such declarations out of 4018 all declarations.
}. Thus, the
C translation of a function's byte code consists mostly of calls to
these functions.
%Each byte code instruction has a
%corresponding inline function in the interpreter. So the
%translation of a function byte code is to write mostly calls of
%corresponding C functions. The necessary inline C functions
%contained in a C preprocessed file are added to generated C
A C compiler is called with the generated C code provided through a UNIX pipe. The
C compiler produces a position independent code (PIC) object file which is
stored in the operating system's standard temporary directory\footnote{These days, UNIX systems usually store their temporary directory in memory.}.
After successful compilation, the interpreter loads the
object file and calls the compiled function.
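The compile-and-load part of this scheme can be sketched with POSIX calls as
follows. The file names, the function name, and the function signature here
are hypothetical, and the real interpreter feeds the generated C code to the
compiler through a pipe rather than writing it to a source file first.
{\scriptsize
\begin{verbatim}
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*jit_fn_t) (void *env); /* hypothetical signature */

jit_fn_t jit_compile (const char *src, const char *obj,
                      const char *fn_name)
{
  char cmd[1024];

  /* produce a PIC shared object from the generated C code */
  snprintf (cmd, sizeof cmd,
            "cc -O2 -fPIC -shared -o %s %s", obj, src);
  if (system (cmd) != 0)
    return NULL;
  /* load the object file and resolve the compiled function */
  void *handle = dlopen (obj, RTLD_NOW);
  return handle ? (jit_fn_t) dlsym (handle, fn_name) : NULL;
}
\end{verbatim}
}
A call such as {\tt jit\_compile ("/tmp/f123.c", "/tmp/f123.so", "f123")}
(again, hypothetical names) then yields a function pointer that the
interpreter invokes in place of interpreting the function's byte code.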
This approach has many advantages. First of all, it results in a
simple and portable JIT implementation. In the Dino interpreter, the JIT implementation
required less than 100 new lines of C code, and can be used with any native C compiler.
As the C language has a stable and well-defined standard,
Dino's portable JIT implementation will continue to work in the future.
On the other hand, if we had used the special-purpose interfaces provided
by LLVM and GCC for JIT compilation, we would potentially need to update the interpreter
in order to be compatible with future compiler versions.
Machine code generation by passing a C source file to the GCC frontend gives only a slight performance penalty compared to generation
%Machine code generation using the GCC frontend is only a bit slower than
using the GCC JIT plugin\footnote{Experiments with the JIT plugin benchmark from
GCC's testsuite show that compilation using the GCC JIT plugin is about
20\% faster than compiling from source code on a Xeon E5-2697 v3
machine.}. In fact, JIT compilation of a small Dino function takes only about
% a small function code generation takes about
50-70ms using GCC with -O3 on modern Intel CPUs\footnote{A heapsort function
was used for the measurement on a Xeon E5-2697 v3 machine.}.
\subsubsection{Type-inference and Byte code Type Specialization}
Dino is a dynamically typed programming language. Nevertheless, the types of
many expressions and operands can be recognized at compile time.
Unfortunately, even the peak optimization modes of industrial C compilers used for JIT are not powerful enough to figure out the types of
byte code operands and to remove unnecessary operand type checking.