Skip to content

Latest commit

 

History

History
246 lines (182 loc) · 9.9 KB

code-overview.md

File metadata and controls

246 lines (182 loc) · 9.9 KB

Code Overview

Project Structure

This repository contains three projects:

  • monads: a small library providing a number of monads used by the Sail backend.
  • nanosail: relies on monads and contains all of the actual translation logic.
  • sail_plugin: a very small project that imports nanosail and registers itself with Sail as a plugin.

Note regarding dependencies When Sail loads the plugin, it does not automatically also load the dependencies needed by this plugin. It is therefore necessary to statically link nanosail's dependencies. See the dune file for the plugin project (embed_in_plugin_libraries).

Project structure

Translation Overview

The translation takes place in two phases:

  • Sail is translated into nanosail, an intermediate language bridging Sail and muSail.
  • Nanosail is translated into muSail: user-provided templates describe which translation belongs where in which file.

This top level logic resides in the sail_plugin project.

Ast Module

The Ast module contains all definitions related to the nanosail intermediate language. When Sail and muSail disagree in their representations, nanosail tends to side with muSail.

  • Nanosail makes the distinction between expressions and statements the same way muSail does.
  • Nanosail keeps track of some extra information that cannot be directly translated to muSail, such as numeric expressions and constraints, which come in handy for monomorphization.
  • Similarly to Sail, a nanosail program (see Ast.Program.t) consists of a series of definitions. See Ast.Definition for which types of definitions exist.

Ast Module

Sail to Nanosail

All Sail to nanosail translation logic is grouped in the SailToNanosail module. To assist in the translation, we defined a monad TranslationContext, which is basically a combination of the State and the Result monads. It provides the following functionality:

  • It can generate unique identifiers using generate_unique_identifier and generate_unique_identifiers.
  • It keeps track of all translated definitions.
    • store_definition adds a new definition to the list.
    • lookup_definition/lookup_definition_opt looks for a definition that satisfies a predicate. Useful predicates are readily available in Ast.Definition.Select.
    • Other helper functions such as lookup_register_type, is_register, lookup_variant_by_constructor are also available.
  • It keeps track of calls made to polymorphic functions (register_polymorphic_function_call_type_arguments), which is useful for reporting them to the user so that they know which monomorphizations to ask for.
  • There are two types of failures: NotYetImplemented and Failure
    • A Failure causes the translation to be halted and should be used sparingly.
    • Each Sail definition is translated in turn. When a NotYetImplemented is signaled, the current definition being translated is given up on: a Ast.Definition.UntranslatedDefinition gets registered and the translation process moves on to the next definition.

SailToNanosail Module

Nanosail to muSail

All nanosail to muSail functionality can be found in the NanosailToSail module. Similarly to the first translation phase, we defined a monad, named GenerationContext, which provide us with some useful functionality:

  • It holds a list of all definitions gathered during the first phase. These can be looked up with helper functions such as select_definitions and lookup_definition_opt.
  • Translations can be annotated in multiple ways:
    • First, generated code can be annotated with information about which functions were involved in the generation of it (a bit like a stack trace). This information is not inherent to GenerationContext, but is functionality provided by Document, see below.
    • Generated code can be grouped in blocks. A block can be annotated with comments and annotations. See section below.

NanosailToMicrosail Module

Blocks

Generated code can be annotated with comments:

(* Some comments *)
Some generated code

This can be achieved by defining a block and making a call to add_comment inside of it:

GenerationContext.block begin
  let* () = add_comment @@ PP.string "Some comments"
  in
  PP.string "Some generated code"
end

add_comment will attach comments to the current block. Blocks can be nested. Comments and annotations are always attached to the current innermost block.

One can also add annotations, which are basically numbered comments:

(*
  Some comments
  
  [1] An annotation
  
  [2] Another annotation
*)
Some generated code, a reference to annotation 1 and annotation 2.

is generated by

GenerationContext.block begin
  let* () = add_comment @@ PP.string "Some comments"
  and* ref1 = GenerationContext.add_annotation @@ PP.string "An annotation"
  and* ref2 = GenerationContext.add_annotation @@ PP.string "Another annotation"
  in
  PP.format "Some generated code, a reference to annotation %d and annotation %2" ref1 ref2
end

These numbered annotations are often used to leave "not-yet-implemented holes" in the translation:

let translate_bool (b : bool) : PP.t GenerationContext.t =
  match b with
  | true  -> GenerationContext.return @@ P.string "true"
  | false -> GenerationContext.not_yet_implemented [%here]

Logging

The Logging module can be used to log messages. We currently support four levels (error, warning, info, debug). Using the environment variable VERBOSE one can choose up to which level log messages should be shown:

  • VERBOSE=0: quiet mode, no logging takes place.
  • VERBOSE=1: only errors are shown.
  • ...
  • VERBOSE=4: all messages are printed.

F-Expressions

OCaml does not provide a standard way of converting values to strings. Originally we intended to implement all kinds of string_of_ functions but the one dimensional nature of strings impeded readability.

F-Expressions are more structured than plain strings and allow for better pretty printing. Apart from integers, booleans, strings and lists, F-expressions have function application written Head[pos1, pos2, ..., keyword1=val1, keyword2=val2], .... See the FExpr module for more details.

We defined to_fexpr functions for most types, which can be helpful for debugging. A FExpr.pp_diff function is available to highlight differences between two F-expressions, which is useful while testing.

PP

The PP module offers limited pretty print functionality. Originally, the more powerful PPrint module was used, but we wanted to be able to annotate each generated piece of text with information about which functions were responsible for its generation. Later, we also added support for decoration: foreground/background colors, bold, underline, ...

horizontal [
  string "Charles";
  space;
  vertical [ string "Jason"; string "Henry" ]  
]

gives

Charles Jason
        Henri

StringOf Module

This module was originally meant to group all string_of_ functionality but ultimately ended up only providing this functionality for Sail types.

Most of the functionality has been taken straight from Libsail. Having our own module, however, allowed us to add string conversions for types neglected by Libsail (e.g. aval) and customize the conversion if we felt the Libsail implementation fell short.

For our own types, we preferred defining to_fexpr functions in their respective modules, e.g., Ast.Type.to_fexpr for nanosail types.

Coding Style

Below we justify our coding style, which may be considered to be rather unidiomatic.

No wildcards

We shied away from using wildcards in match expressions, instead opting to handle each case explicitly. Whenever a type is extended with a new case (something that was expected to occur frequently as development progressed), a simple rebuild would lead the compiler to point out all the locations in the code that required an update to deal with this new case.

Note This has somewhat been thwarted by what seems to be a bug in the OCaml compiler: it happened more than once that the build process simply got stuck until typing errors were fixed.

Module per Type

Many types are defined in their own personal module. We felt this led to much cleaner code:

  • No prefixes necessary for constructor: E_Variable but simply Variable.
  • Clearer names: Ast.Expression.Variable instead of E_Variable.
  • Typing context often allows us to omit the Ast.Expression. prefix in many cases such as in match patterns.
  • open Ast.Expression is always available if one gets tired of the long names.
  • Related functionality can be bundled in the same module, e.g., Ast.Variable.equal.
  • More consistent naming for related functions, i.e., all equality functions can be called equal.

Explicit equals

Given that the standard equality operator = cannot be customized for specific types, it seemed better to eschew it completely. Instead, we chose to define equal function for each type, accepting that large amounts boilerplate code is the cost for more robust code.

Explicit Typing

We chose to annotate most of our code with types.

  • In our opinion, it leads to clearer code.
  • It avoids issues with records with share field names.
  • Our functions often receive "too many" parameters, i.e., they were made to receive all information that was available, not just information that was actually required. We chose this approach due to the very incremental/iterative nature of the development process: the first implementation of a function would contain only very specific functionality, and it was unclear what information would be necessary when the function would later be fleshed out more. Having functions receive all available information from the get-go made it much easier when we later had to revisit the function to extend its functionality, as we had a clear view of all available puzzle pieces.