antlr2datalog

This is a source code analysis framework where analysis logic is written in Datalog. The inputs are source code files (or directories) and the outputs are tab-separated tables.

When analyzing code, the framework uses a "workspace" directory that holds all required information. The Driver class orchestrates the pipeline that parses and then analyzes the code.

Parsing and fact generation

The source code is parsed and the parse trees are output as Datalog "facts", i.e. input tables for the logic. These facts are generated by traversing the source code with an appropriate ANTLR parser. Currently, a few different source languages are supported, using the ANTLR parsers in the grammars-v4 repository.

The framework is parser-agnostic: it only assumes that the parser is written in ANTLR4. The ParserConfiguration type is populated with the currently supported parsers. Every parser is reflectively instantiated and traversed (by SchemaFinder) to generate a parser-specific "schema" (output as <workspace>/schema.dl).
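As a rough illustration of the reflective step, the sketch below lists the nested `*Context` classes of a parser class (ANTLR4 generates one such class per grammar rule) and derives a language-prefixed relation name for each. `ToyParser`, `SchemaSketch`, and the `is<Rule>` naming scheme are hypothetical stand-ins chosen to mirror the db_KOTLIN.isClassBody() example in this document. They are not the actual SchemaFinder code or the real schema.dl format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an ANTLR4-generated parser: real generated
// parsers contain one nested <Rule>Context class per grammar rule.
final class ToyParser {
    static class ClassBodyContext {}
    static class FunctionDeclarationContext {}
    static class Helper {} // not a rule context, so it is skipped below
}

final class SchemaSketch {
    private static final String SUFFIX = "Context";

    // Derives one relation name per nested *Context class of the parser.
    // The "db_<LANGUAGE>.is<Rule>" shape is an assumption for illustration.
    static List<String> relationNames(Class<?> parserClass, String language) {
        List<String> names = new ArrayList<>();
        for (Class<?> nested : parserClass.getDeclaredClasses()) {
            String simple = nested.getSimpleName();
            if (simple.endsWith(SUFFIX) && simple.length() > SUFFIX.length()) {
                String rule = simple.substring(0, simple.length() - SUFFIX.length());
                names.add("db_" + language + ".is" + rule);
            }
        }
        // getDeclaredClasses() returns classes in no particular order,
        // so sort to keep the generated schema stable between runs.
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        relationNames(ToyParser.class, "TOY").forEach(System.out::println);
    }
}
```

Because the schema is derived from the parser class alone, no source code needs to be parsed before the relation names are known.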

Once the parser schema is established, the parser runs on the source and every parse rule or token is output as a tuple in an appropriately named relation. For tokens, position information is also generated.

Note that facts are written only after all source code has been traversed, to support running different parsers (each with its own schema) at the same time.

The facts are written to the "facts" directory inside the workspace directory.
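The traversal described in this section can be sketched as follows. A toy parse tree is walked, every rule or token node becomes one tab-separated row in a relation named after it, and rows are buffered in memory until the walk is finished. The `Node` and `FactSketch` classes, the column layout (node id, parent id, then text, line, and column for tokens), and the `<relation>.facts` file naming are assumptions made for illustration. The framework's real relations may carry different columns.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parse-tree node: a rule node has children, a token node has text.
final class Node {
    final String name;      // rule name, or token type for leaves
    final String text;      // token text; null for rule nodes
    final int line, column; // token position; unused for rule nodes
    final List<Node> children = new ArrayList<>();

    Node(String rule) { this(rule, null, 0, 0); }

    Node(String type, String text, int line, int column) {
        this.name = type;
        this.text = text;
        this.line = line;
        this.column = column;
    }

    Node add(Node child) { children.add(child); return this; }
}

final class FactSketch {
    // Relation name -> buffered rows; nothing touches the disk during the walk.
    private final Map<String, List<String>> facts = new LinkedHashMap<>();
    private int nextId = 0;

    // Pre-order walk that records one row per node and returns the node's id.
    int visit(Node node, int parentId) {
        int id = nextId++;
        String row = node.text == null
                ? id + "\t" + parentId
                : id + "\t" + parentId + "\t" + node.text + "\t" + node.line + "\t" + node.column;
        facts.computeIfAbsent(node.name, k -> new ArrayList<>()).add(row);
        for (Node child : node.children) {
            visit(child, id);
        }
        return id;
    }

    Map<String, List<String>> facts() { return facts; }

    // Writes every buffered relation as <relation>.facts, one row per line.
    void writeAll(Path dir) throws IOException {
        Files.createDirectories(dir);
        for (Map.Entry<String, List<String>> e : facts.entrySet()) {
            Files.write(dir.resolve(e.getKey() + ".facts"), e.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        Node root = new Node("classBody")
                .add(new Node("LBRACE", "{", 1, 10))
                .add(new Node("RBRACE", "}", 1, 11));
        FactSketch sketch = new FactSketch();
        sketch.visit(root, -1); // -1 marks "no parent" in this sketch
        sketch.writeAll(Files.createTempDirectory("facts"));
        sketch.facts().forEach((rel, rows) -> System.out.println(rel + " " + rows));
    }
}
```

Buffering rows per relation and writing them in one final step is what lets several parsers contribute facts during the same run.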

Datalog

The analysis logic is given as Datalog rules (in directory logic) that run using the Souffle Datalog engine. The logic either runs interpreted (the default) or compiled (see command-line options).

  • The Datalog declarations for the input facts are found in the generated file <workspace>/schema.dl. Input facts are split per language with a prefix; for example, the predicate isClassBody() for Kotlin is named db_KOTLIN.isClassBody().
  • Base logic, common to all languages, is found in logic/base-logic.dl. These rules define basic features that are shared across programming languages, such as function definitions, variables, and types. Relations in this file start with BASE_.
  • For every supported language, file logic/<language>-logic.dl contains the Datalog rules for this language. These rules either populate relations from base-logic or compute their own relations.
  • To match patterns against paths in the parse tree, file logic/patterns.dl provides a set of C preprocessor macros.
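The two execution modes mentioned in this section map onto Souffle's command line. The sketch below builds such a command: `-F` (fact directory), `-D` (output directory), and `-c` (compile, then run) are standard Souffle options. The `SouffleCommandSketch` class and the paths used in `main` are hypothetical, and the framework's Driver may assemble its invocation differently.

```java
import java.util.ArrayList;
import java.util.List;

final class SouffleCommandSketch {
    // Builds a Souffle invocation; interpreted unless 'compiled' is set.
    static List<String> command(String logicFile, String factsDir,
                                String outputDir, boolean compiled) {
        List<String> cmd = new ArrayList<>();
        cmd.add("souffle");
        cmd.add("-F");
        cmd.add(factsDir);  // where the input .facts files are read from
        cmd.add("-D");
        cmd.add(outputDir); // where the output tables are written
        if (compiled) {
            cmd.add("-c");  // generate C++, compile it, then run the binary
        }
        cmd.add(logicFile);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical paths; the list could be passed to ProcessBuilder as-is.
        List<String> cmd = command("logic/example.dl", "workspace/facts",
                                   "workspace/out", false);
        System.out.println(String.join(" ", cmd));
    }
}
```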

Source code metadata

The framework can output some of its base relations as code metadata, so that it can be used as a language front end for visualization or code navigation. The output is in JSON format via the metadata-model library.