nbros edited this page Sep 13, 2010 · 9 revisions

Current implementation

Camlp4 syntax extensions are currently handled in OcaIDE by:

  • annotating each source file with a comment that indicates whether the file must be pre-processed, and with which syntax extension
  • calling the Camlp4 pre-processor before parsing the file, and making it output the result in standard O’Caml syntax (with location annotations as comments)
  • using the regular OcaIDE parser for parsing the Camlp4 output file
  • setting AST node locations to what was found in the Camlp4 output file’s comments (those added by Camlp4)

Limitations

This method works, but it is extremely slow compared to parsing an O’Caml file that uses the standard syntax with no extensions.
This is mostly because Camlp4 annotates each AST node with a comment containing the full file name and the node’s location in the file, which makes the output file disproportionately large compared to the original, un-preprocessed source.
Additionally, Camlp4 only works on syntactically correct inputs.

As a result, this method cannot be used to parse files in real time, as they are being edited.

Possible solutions

Custom Camlp4 printer

The default Camlp4 printer is much too verbose: we don’t need all the information it outputs. In particular, the file’s full path, which appears in every AST node’s location comment, is superfluous. A new printer that outputs only the needed information, in a terser format, would save a lot of space and time.
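To get a rough feel for the savings, here is a minimal sketch that compares the space taken by location comments with and without the file path. Both comment formats are made up for illustration (the real Camlp4 output differs in detail), and the function names are hypothetical.

```python
from typing import Tuple


# Hypothetical comment formats: the real Camlp4 output differs in detail.
def verbose_comment(path: str, line: int, start: int, end: int) -> str:
    """Location comment that repeats the full file path on every node."""
    return f'(*loc: "{path}" {line}:{start}-{end}*)'


def terse_comment(line: int, start: int, end: int) -> str:
    """Terse variant: the path is implied, so only the numbers are printed."""
    return f"(*{line}:{start}-{end}*)"


def annotation_overhead(path: str, node_count: int) -> Tuple[int, int]:
    """Total characters spent on location comments for `node_count` nodes."""
    lines = range(1, node_count + 1)
    verbose = sum(len(verbose_comment(path, n, 0, 10)) for n in lines)
    terse = sum(len(terse_comment(n, 0, 10)) for n in lines)
    return verbose, terse


if __name__ == "__main__":
    v, t = annotation_overhead("/home/user/workspace/project/src/parser.ml", 5000)
    print(f"verbose: {v} chars, terse: {t} chars")
```

In this sketch the difference grows linearly with the number of AST nodes and with the length of the path, which is why dropping the path from every comment pays off on large files.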

Parse on save + markers

To make the editor responsive in spite of the pre-processing slowness, make the following changes in the presence of a syntax extension:

  • don’t parse the file while it is being edited
  • only parse on save
  • instead of associating AST nodes with locations represented by integer offsets, associate them with Eclipse markers on the document. Eclipse then moves the markers automatically as the text is edited, so that they stay in sync with the text.

This allows the outline and code navigation tools to work correctly (hyperlinks, etc.).

But this doesn’t allow contextual completion, since the Camlp4 parser doesn’t work on incomplete files.
Completion on modules can still be done though.
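
The marker idea described above can be illustrated with a small sketch. It does not use the Eclipse API: `Marker` and `apply_edit` are hypothetical stand-ins that only show how a tracked range could follow the text as it is edited between two parses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Marker:
    """A tracked text range, standing in for an Eclipse marker."""
    name: str    # AST node this range belongs to, e.g. a function name
    offset: int  # start offset in the document
    length: int  # number of characters covered


def apply_edit(markers: List[Marker], edit_offset: int, removed: int, inserted: int) -> None:
    """Update markers after `removed` chars at `edit_offset` are replaced by `inserted` chars."""
    delta = inserted - removed
    edit_end = edit_offset + removed
    for m in markers:
        if edit_end <= m.offset:
            # The edit lies entirely before the marker: move the marker.
            m.offset += delta
        elif edit_offset >= m.offset and edit_end <= m.offset + m.length:
            # The edit lies inside the marker: grow or shrink it.
            m.length += delta
        # Edits after the marker, or straddling its boundary, are ignored here.


if __name__ == "__main__":
    # Document: "let f x = x\nlet g y = y\n"
    markers = [Marker("f", 0, 11), Marker("g", 12, 11)]
    apply_edit(markers, 0, 0, 10)  # the user types a 10-character comment at the top
    print(markers)
```

With something along these lines, the outline and hyperlinks computed at the last save keep pointing at the right text until the next successful parse.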

Pre-processing also in OcaIDE

Extension points could be added in OcaIDE, to allow users to plug in custom pre-processors:

  • The pre-processor can choose to work on bare text, before lexing and parsing take place in OcaIDE
  • The pre-processor can alter the token stream after the lexing phase

These pre-processors would then duplicate the behavior of the syntax extension they choose to support.
This approach would work well for simple syntax extensions, but could be cumbersome for elaborate ones.
It would allow OcaIDE to work as well with modified syntaxes as with the original one.

  • Allow extension writers to totally replace the OcaIDE lexer and parser, and build the AST expected by OcaIDE.
    This is the most powerful method, but also possibly the hardest for syntax extension providers to implement. The OcaIDE lexing and parsing code can be used as a starting point though.
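
As a rough illustration of the first two kinds of extension points, the sketch below runs text-level hooks before lexing and token-level hooks after it. No such API exists in OcaIDE today: `PreprocessorRegistry`, the hook types and the toy rewrites are all hypothetical, and the "lexer" is just whitespace splitting.

```python
from typing import Callable, List

# Hypothetical hook types: the names are illustrative only.
Token = str
TextHook = Callable[[str], str]                   # runs on raw text, before lexing
TokenHook = Callable[[List[Token]], List[Token]]  # runs on the token stream, after lexing


class PreprocessorRegistry:
    """Collects plugged-in hooks and runs them around a lexer."""

    def __init__(self) -> None:
        self.text_hooks: List[TextHook] = []
        self.token_hooks: List[TokenHook] = []

    def process(self, source: str, lexer: Callable[[str], List[Token]]) -> List[Token]:
        for text_hook in self.text_hooks:
            source = text_hook(source)
        tokens = lexer(source)
        for token_hook in self.token_hooks:
            tokens = token_hook(tokens)
        return tokens


if __name__ == "__main__":
    registry = PreprocessorRegistry()
    # Toy text-level rewrite for a made-up extension that writes "=>" for "->".
    registry.text_hooks.append(lambda src: src.replace("=>", "->"))
    # Toy token-level rewrite for a made-up "define" keyword.
    registry.token_hooks.append(lambda toks: ["let" if t == "define" else t for t in toks])
    print(registry.process("define f = fun x => x", str.split))
```

The third option, replacing the lexer and parser entirely, roughly corresponds to passing a different `lexer` (and a different parser after it) instead of registering hooks.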

Use Camlp4 as main parser

Instead of considering Camlp4 merely as a pre-processing tool and parsing its output with a Java parser, parse directly using Camlp4.
This has the advantage of removing the overhead of parsing a second time.

Camlp4 can then output the AST in a binary format (-printer Camlp4AstDumper). An O’Caml program can deserialize this output, act as an O’Caml code indexing server, and communicate the results to OcaIDE on demand.
A further advantage is that O’Caml is better suited than Java to working with its own AST.

Limitations

Camlp4 doesn’t produce an AST when a file has syntax errors, and it doesn’t try to recover from errors automatically, unlike the Beaver parser currently used.
So, all the IDE’s services that need the AST (outline, hyperlinks, …) would only work on error-free files.

A way to work around this limitation is to keep the information from the last successful parse, and to track the locations of the AST nodes in the editor with the help of Eclipse markers. This information is then updated each time the file parses successfully.
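
The "keep the last good result" part of this workaround can be sketched as follows. `OutlineCache` is a hypothetical name, and `toy_parse` is only a stand-in for the Camlp4-based parser: it pretends that unbalanced parentheses are the only possible syntax error.

```python
from typing import Callable, List, Optional


def toy_parse(source: str) -> Optional[List[str]]:
    """Stand-in parser: returns the names bound by `let`, or None on a syntax error."""
    if source.count("(") != source.count(")"):
        return None  # pretend unbalanced parentheses are the only syntax error
    words = source.split()
    return [words[i + 1] for i, w in enumerate(words[:-1]) if w == "let"]


class OutlineCache:
    """Serves the outline from the last successful parse."""

    def __init__(self, parse: Callable[[str], Optional[List[str]]]) -> None:
        self._parse = parse
        self._last_good: List[str] = []
        self.stale = False  # True when the outline predates the current text

    def update(self, source: str) -> List[str]:
        result = self._parse(source)
        if result is None:
            self.stale = True  # keep serving the previous outline
        else:
            self._last_good = result
            self.stale = False
        return self._last_good


if __name__ == "__main__":
    cache = OutlineCache(toy_parse)
    print(cache.update("let f x = (x)"))
    print(cache.update("let f x = (x) let g y = (y"), cache.stale)
```

While the file has errors, the IDE would show the cached outline (possibly flagged as out of date), with markers keeping the cached locations aligned with the text.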

Syntax extension for completion

Implement a syntax extension for Camlp4 that makes it possible to parse an input that ends abruptly (at the completion point). This would allow contextual completion to work, as long as all the code before the completion point is syntactically correct.

Additional extensions

Add extension points:

  • to provide custom syntax coloring rules
  • to replace, add to, or remove from the list of existing O’Caml keywords
  • to customize auto edit strategies (as-you-type formatter), since the buffer on which these work is always un-preprocessed
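
For the keyword extension point, a minimal sketch of the three operations (replace, add to, remove from) could look like this. `customize_keywords` is a hypothetical helper, and `BASE_KEYWORDS` holds only a handful of the standard O’Caml keywords.

```python
from typing import FrozenSet, Iterable, Optional

# A few standard OCaml keywords; the real list is longer.
BASE_KEYWORDS: FrozenSet[str] = frozenset(
    {"let", "in", "match", "with", "fun", "type", "module"}
)


def customize_keywords(
    base: Iterable[str],
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
    replace: Optional[Iterable[str]] = None,
) -> FrozenSet[str]:
    """Return the keyword set a syntax extension wants the editor to use."""
    # `replace`, when given, discards the base list entirely.
    keywords = set(base if replace is None else replace)
    keywords.update(add)
    keywords.difference_update(remove)
    return frozenset(keywords)


if __name__ == "__main__":
    # Example: an extension that introduces a `parser` keyword.
    print(sorted(customize_keywords(BASE_KEYWORDS, add=["parser"])))
```

The same set could then feed both the syntax coloring rules and the lexer, so that the two stay consistent.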