coAST is a universal abstract syntax tree that makes it easy to analyze any programming language. Adding new languages should be easy and generic.
- Describe languages using theoretical components, aimed at human comprehension, so that further understanding of concepts used by a language can be obtained by reading online resources rather than code.
- Provide multiple usable levels of parse-ability, so that a file can be accurately split into parts, including parts which are not yet parse-able or which the use case gains nothing from parsing, and the parts may be modified and re-joined into an otherwise semantically equivalent file.
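As an illustration of the coarsest useful level of parse-ability, the following Python sketch splits a file into comment parts and opaque parts, then re-joins them without loss. The names `Part`, `split_parts` and `join_parts` are hypothetical and not part of coAST, and the comment pattern is deliberately naive: it would misread a `#` inside a string literal.

```python
import re
from dataclasses import dataclass


@dataclass
class Part:
    kind: str  # "comment" or "opaque" (hypothetical labels)
    text: str


# Assumption: a language whose line comments start with "#".
COMMENT = re.compile(r"#[^\n]*")


def split_parts(source):
    """Split source into comment parts and unparsed (opaque) parts."""
    parts = []
    pos = 0
    for match in COMMENT.finditer(source):
        if match.start() > pos:
            parts.append(Part("opaque", source[pos:match.start()]))
        parts.append(Part("comment", match.group()))
        pos = match.end()
    if pos < len(source):
        parts.append(Part("opaque", source[pos:]))
    return parts


def join_parts(parts):
    """Re-join parts; unmodified parts reproduce the input exactly."""
    return "".join(part.text for part in parts)


source = "x = 1  # set x\ny = 2\n"
parts = split_parts(source)
assert join_parts(parts) == source

# Modify only the parts that were understood, then re-join.
for part in parts:
    if part.kind == "comment":
        part.text = part.text.upper()
print(join_parts(parts))
```

Only the comment parts are touched, so the opaque parts, and with them the meaning of the code, are carried through unchanged.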
Performance and algorithmic beauty are not goals. Reversibility, as offered by augeas, is not a goal either, as that requires a Context Sensitive Tree.
To achieve the first goal, the primary output of this repository is a static website which allows the reader to understand the definitions contained there and link to other online resources where more information can be obtained.
Links to Wikidata, Antlr definitions, (E)BNF files, and example files will be integral components of the definitions there.
Terminology used to describe language components will be consistent across languages wherever possible, and defer to terminology used in academic literature or study guides to make these definitions more accessible and useful to students of language theory.
- Organically grow a human-readable fact-based database of any syntax -- stored in YAML files -- covering any language, from large and complicated programming languages down to strings like URLs, especially focusing on style descriptions which describe a subset of a language.
- Create programs to load these definitions and convert input files into a universal AST, primarily for building a test suite to verify that the language definitions are able to parse files at useful levels of detail. Again, the focus is on style-defined subsets of languages, which are easier to describe and also more useful.
These programs may use existing parsers by converting the coAST definitions into metasyntax used by other parsing toolkits, such as BNF and derivatives, Antlr .g4, and augeas.
- Standardise the definition schema once a sufficiently large number of language definitions have been verified to determine that the schema can usefully describe most concepts found in commonly-used grammars.
These phases will overlap slightly.
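The phase of loading a definition and converting input into a universal tree might be sketched as follows. This is a minimal illustration, not the coAST schema: `DEFINITION` is a hypothetical Python dict standing in for the contents of a YAML file, and the flat list of `Node` objects stands in for a real tree. Text that the definition does not cover is kept as `unparsed` nodes rather than rejected.

```python
import re
from typing import NamedTuple


class Node(NamedTuple):
    kind: str
    text: str


# Hypothetical definition; a real one would be loaded from a YAML file.
DEFINITION = {
    "name": "toy-assignment",
    "tokens": {
        "identifier": r"[A-Za-z_]\w*",
        "number": r"\d+",
        "operator": r"=",
        "whitespace": r"\s+",
    },
}


def compile_definition(definition):
    """Combine the token patterns into one regex with named groups."""
    alternatives = (
        f"(?P<{name}>{pattern})"
        for name, pattern in definition["tokens"].items()
    )
    return re.compile("|".join(alternatives))


def to_nodes(definition, text):
    """Convert text into nodes, keeping unknown characters as 'unparsed'."""
    pattern = compile_definition(definition)
    nodes = []
    pos = 0
    while pos < len(text):
        match = pattern.match(text, pos)
        if match and match.end() > pos:
            nodes.append(Node(match.lastgroup, match.group()))
            pos = match.end()
        else:
            nodes.append(Node("unparsed", text[pos]))
            pos += 1
    return nodes


for node in to_nodes(DEFINITION, "x = 42;"):
    print(node)
```

A test suite along these lines could check that joining the node texts reproduces the input, and that the share of `unparsed` nodes stays low for files the definition claims to cover.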
The language definitions found at https://github.com/coala/coala/tree/master/coalib/bearlib/languages will be manually added as language definitions, growing the schema as necessary. Once the import of facts is complete, a generator will create the coala language definitions from a snapshot of the coAST language definitions, putting the collated coAST definitions into use.
There are many other collections of language definitions. Initially, the coAST definitions will only link to these external resources. In the second phase, those external grammars will be converted into coAST facts using batch import tools, or manually where necessary.
In this phase, tools to convert the coAST definitions into other syntaxes will be needed to round-trip the language definitions. This will provide verification that the imports are complete, or that partial definitions allow correct partial parsing of those languages where complete parsing is too complex.
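A round-trip check of this kind could look like the sketch below. It assumes a hypothetical in-memory form of a definition (a dict mapping rule names to lists of alternatives) and a very small BNF dialect in which symbols contain no spaces and no `|`. The functions `to_bnf` and `from_bnf` are illustrative only and are not existing coAST tools.

```python
def to_bnf(rules):
    """Export rules as BNF text, one rule per line."""
    lines = []
    for name, alternatives in rules.items():
        rhs = " | ".join(" ".join(symbols) for symbols in alternatives)
        lines.append(f"<{name}> ::= {rhs}")
    return "\n".join(lines)


def from_bnf(text):
    """Import BNF text produced by to_bnf back into the rules form."""
    rules = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        left, right = line.split("::=", 1)
        name = left.strip()[1:-1]  # drop the surrounding "<" and ">"
        rules[name] = [alt.split() for alt in right.split("|")]
    return rules


# Hypothetical definition of a binary number.
RULES = {
    "digit": [['"0"'], ['"1"']],
    "number": [["<digit>"], ["<digit>", "<number>"]],
}

exported = to_bnf(RULES)
print(exported)
# <digit> ::= "0" | "1"
# <number> ::= <digit> | <digit> <number>

# The import is considered complete if nothing was lost on the way.
assert from_bnf(exported) == RULES
```

The same comparison works in the other direction: importing an external grammar, exporting it again, and comparing the result with the original file shows which constructs the schema cannot yet express.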
Create declarative descriptions of common styles, such as the Google Python coding guidelines and Airbnb JavaScript style.
The schema for describing styles will borrow from the coala aspects definitions, and should allow users to define their own custom styles. The priority, however, will be accurately describing well-established style guides, and important features of commonly used linters for various languages.
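To make the idea of a declarative style concrete, the sketch below describes a style as plain data and lets a custom style override a base one. The keys `max_line_length` and `indent_width`, the style names, and the functions `derive_style` and `check` are all assumptions made for illustration; they are not taken from the coala aspects definitions or from any published style guide.

```python
# Hypothetical base style, written as the dict a YAML file might load into.
BASE_STYLE = {
    "name": "example-base",
    "max_line_length": 80,
    "indent_width": 4,
}


def derive_style(base, name, **overrides):
    """Create a custom style by overriding selected settings of a base."""
    return {**base, "name": name, **overrides}


def check(style, source):
    """Return (line number, message) pairs for lines that break the style."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > style["max_line_length"]:
            problems.append((number, "line too long"))
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        # Only space indentation of non-blank lines is checked here.
        if stripped and indent % style["indent_width"] != 0:
            problems.append((number, "unexpected indentation"))
    return problems


source = "def f():\n  return 1\n"
custom = derive_style(BASE_STYLE, "example-custom", indent_width=2)

print(check(BASE_STYLE, source))  # [(2, 'unexpected indentation')]
print(check(custom, source))      # []
```

Because the style is data rather than code, the same description could in principle drive a checker, a formatter, or documentation.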
coala aspects development is driven by the needs of users, the complexity of bears, and the pre-existing implementation choices of coala.
To avoid causing incorrect design decisions in coAST, importing of aspects will not be considered until after style definitions are in place.
coAST is maintained by the coala community. Contact us on Gitter!
The facts in this repository are inherently public domain, and are explicitly released under the CC0 1.0 license. https://creativecommons.org/publicdomain/zero/1.0/
The website templates and assets included in this repository are released under the Creative Commons Attribution-ShareAlike 4.0 license. https://creativecommons.org/licenses/by-sa/4.0/
Any code in this repository is to be released under the MIT license. https://opensource.org/licenses/MIT