This is a source code analysis framework where analysis logic is written in Datalog. The inputs are source code files (or directories) and the outputs are tab-separated tables.
When analyzing code, the analysis uses a "workspace" directory that holds all required information. Class `Driver` guides the pipeline of parsing/analyzing the code.
The source code is parsed and the parse trees are output as Datalog "facts", i.e. input tables for the logic. These facts are generated by traversing the source code with an appropriate ANTLR parser. Currently, a few different source languages are supported, using the ANTLR parsers in the grammars-v4 repository.
The framework is parser-agnostic: it only assumes that the parser is written in ANTLR4. Type `ParserConfiguration` is populated with the parsers currently supported. Every parser is reflectively instantiated and traversed (by `SchemaFinder`) to generate a parser-specific "schema" (output as `<workspace>/schema.dl`).
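The reflective schema step can be pictured with a minimal Java sketch. It relies on one known property of ANTLR4's Java target: each grammar rule produces a nested `<Rule>Context` class inside the generated parser. Everything else here is an assumption made for illustration: `ToyParser` is a hypothetical stand-in for a generated parser, the `is<Rule>` naming is extrapolated from the `isClassBody()` example, and the column layout is invented. This is not the actual `SchemaFinder` code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SchemaSketch {
    // Hypothetical stand-in for an ANTLR4-generated parser, which contains
    // one nested "<Rule>Context" class per grammar rule.
    static class ToyParser {
        static class ClassBodyContext {}
        static class FunctionDeclarationContext {}
        static class Helper {} // not a rule context, so it is skipped
    }

    /** Returns one Souffle-style declaration per rule context class found reflectively. */
    static List<String> schemaFor(Class<?> parserClass) {
        Class<?>[] nested = parserClass.getDeclaredClasses();
        // getDeclaredClasses() guarantees no order, so sort for stable output.
        Arrays.sort(nested, (a, b) -> a.getSimpleName().compareTo(b.getSimpleName()));
        List<String> decls = new ArrayList<>();
        for (Class<?> c : nested) {
            String name = c.getSimpleName();
            if (!name.endsWith("Context")) continue;
            String rule = name.substring(0, name.length() - "Context".length());
            // Assumed column layout: a node id plus the source file.
            decls.add(".decl is" + rule + "(node: symbol, file: symbol)");
        }
        return decls;
    }

    public static void main(String[] args) throws Exception {
        // Load the parser class by name, as a reflective driver might.
        Class<?> parser = Class.forName(ToyParser.class.getName());
        schemaFor(parser).forEach(System.out::println);
    }
}
```

Because the schema is derived from the parser class alone, supporting a new grammar in this sketch requires no hand-written declarations.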
When the parser schema is established, the parser runs on the source and each parse rule/token is output as a tuple in an appropriately named relation. For tokens, position information is also generated.
Note that facts are written only after all source code has been traversed, to support running different parsers (each with its own schema) at the same time.
The facts are written to the "facts" directory inside the workspace directory.
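A minimal Java sketch of this buffering strategy follows: tuples are collected per relation during traversal and flushed to disk in a single pass at the end. The class `FactsSketch`, the relation names, the node identifiers, and the column layout are all hypothetical. The one-file-per-relation `<relation>.facts` naming follows Souffle's default input convention, but how the framework actually names its files is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FactsSketch {
    // Rows are buffered per relation while source files are traversed.
    private final Map<String, List<String>> rows = new TreeMap<>();

    /** Buffers one tuple; columns are joined with tabs. */
    void add(String relation, String... columns) {
        rows.computeIfAbsent(relation, k -> new ArrayList<>())
            .add(String.join("\t", columns));
    }

    /** Returns the buffered lines of a relation (empty if none were added). */
    List<String> lines(String relation) {
        return rows.getOrDefault(relation, Collections.emptyList());
    }

    /** Writes one "<relation>.facts" file per relation, after traversal is done. */
    void writeAll(Path factsDir) throws IOException {
        Files.createDirectories(factsDir);
        for (Map.Entry<String, List<String>> e : rows.entrySet()) {
            Files.write(factsDir.resolve(e.getKey() + ".facts"), e.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        FactsSketch facts = new FactsSketch();
        // A parse rule becomes a tuple in a relation named after the rule.
        facts.add("isClassBody", "node1", "A.kt");
        // A token additionally carries position information (line, column).
        facts.add("isIdentifier", "node2", "A.kt", "3", "7");
        Path workspace = Files.createTempDirectory("workspace");
        facts.writeAll(workspace.resolve("facts"));
        System.out.println("Facts written under " + workspace.resolve("facts"));
    }
}
```

Deferring `writeAll` until every file has been visited means several parsers can contribute tuples to the same buffer before anything touches the disk.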
The analysis logic is given as Datalog rules (in directory `logic`) that run using the Souffle Datalog engine. The logic either runs interpreted (the default) or compiled (see command-line options).
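As a rough illustration of the two modes, the Java sketch below builds a Souffle command line. The flags are Souffle's own: `-F` sets the fact directory, `-D` sets the output directory, and `-c` compiles the program before running it. The class name, the paths, and the choice of entry-point file are assumptions. The framework's actual invocation and its own command-line options are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

class SouffleCommandSketch {
    /** Builds a Souffle command line; compiled mode adds Souffle's -c flag. */
    static List<String> command(String logicFile, String factsDir,
                                String outDir, boolean compiled) {
        List<String> cmd = new ArrayList<>();
        cmd.add("souffle");
        if (compiled) cmd.add("-c");        // compile to C++ and run the binary
        cmd.add("-F"); cmd.add(factsDir);   // directory holding the input facts
        cmd.add("-D"); cmd.add(outDir);     // directory for the output tables
        cmd.add(logicFile);
        return cmd;
    }

    public static void main(String[] args) {
        // Paths are hypothetical; which .dl file is the entry point is an assumption.
        List<String> cmd = command("logic/base-logic.dl", "workspace/facts",
                                   "workspace/out", false);
        System.out.println(String.join(" ", cmd));
        // To run it for real (requires souffle on the PATH):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```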
- The Datalog declarations for the input facts are found in the generated file `<workspace>/schema.dl`. Input facts are split per language with a prefix; for example, predicate `isClassBody()` for Kotlin will be named `db_KOTLIN.isClassBody()`.
- Base logic, common to all languages, is found in `logic/base-logic.dl`. These rules define basic features that are common across programming languages, such as function definitions, variables, and types. Relations in this file start with `BASE_`.
- For every supported language, file `logic/<language>-logic.dl` contains the Datalog rules for this language. These rules either populate relations from base-logic or compute their own relations.
- To match patterns against paths in the parse tree, file `logic/patterns.dl` provides a set of C preprocessor macros.
The framework can output some of its base relations as code metadata, so that it can be used as a language front end for visualization or code navigation. The output is in JSON format via the metadata-model library.