This repository contains 10 families of LTLf learning benchmarks. We include scripts to generate instances for all families, with many parameters allowing to generate instances with various number of traces and trace lengths.
There are over 15_000 learning problems, which makes for 2GB of files, by only 100MB in zipped format so we included the zip file as releaset asset.
The benchmark files are structured in JSON format, with each file representing a learning problem. Each problem provides a set of positive and negative traces under the keys positive_traces
and negative_traces
respectively. Each trace is a dictionary where keys are atomic propositions (e.g., "a0"
, "a1"
) and values are lists of binary values representing the truth assignment of the proposition at each step in the trace. The atomic_propositions
field explicitly lists all propositions used. Additional metadata such as number_traces
, number_positive_traces
, number_negative_traces
, max_length_traces
, and trace_type
provide summary statistics about the trace sets. Fields like smallest_known_formula
, generating_formula
, generating_seed
, original_repository
, name
, and parameters
offer further information about the origin and characteristics of the benchmark problem.
{
"positive_traces": [
{
"a0": [0, 1, 1, 0],
"a1": [0, 1, 0, 1]
}
],
"negative_traces": [
{
"a0": [1, 1, 1, 0],
"a1": [0, 1, 0, 1]
},
{
"a0": [0, 1, 1, 0],
"a1": [0, 1, 1, 1]
},
{
"a0": [0, 1, 1, 1],
"a1": [0, 1, 0, 1]
},
{
"a0": [0, 1, 1, 0],
"a1": [0, 1, 0, 0]
},
],
"atomic_propositions": ["a0", "a1"],
"number_atomic_propositions": 2,
"number_traces": 5,
"number_positive_traces": 1,
"number_negative_traces": 4,
"max_length_traces": 4,
"trace_type": "finite",
"smallest_known_formula": "",
"generating_formula": "",
"generating_seed": "",
"original_repository": "",
"name": "",
"parameters": {}
}
The main script is sample.py
. It takes as an input a text file (such as this one) containing a formula. It creates an LTLf learning instance by sampling a given number of positive and negative traces of a given length uniformly at random. It builds upon the approach developed in Scarlet, compiling the LTL formula into a deterministic automaton and using it for sampling traces.
-
Fixed_Formulas
includes:- a file
formulas.txt
where each line is a triple (formula, atomic_propositions, name). The list is extracted from Flie, which was inspired by Property specification patterns for finite-state verification by Dwyer, Avrunin, and Corbett. - instance files can be generated from the formulas in
formulas.txt
usingsample.py
. An instance is then calledfixed_formulas_trace_numbers=X_trace_length=Y.json
. We suggest the following parameters:X
: few = 5, much = 20, many = 100Y
: short = 16, medium = 32, long = 64
- a file
-
Generating_Instances
includes scripts namedgen_X.py
which take someX
-specific parameters and return an instance in JSON format:
Remark: we could first generate formulas using the scripts for these four families, and then use sample.py
to generate traces. The trace generation for each of these four families is specific and therefore builds harder instances.
Generating_Formulas
includes scripts namedgen_X.py
which take someX
-specific parameters and return a pair (formula, atomic_propositions):X = SingleCounter
finite-synthesisX = DoubleCounter
finite-synthesisX = Nim
finite-synthesisX = RandomConjunctsFromBasis
reproducing Symbolic LTLf Synthesis, inspired by Improved Automata Generation for Linear Temporal Logic