This repo contains the scripts necessary to run the benchmark suite for TreePPL. The purpose of this suite is to a) verify the accuracy of inference and b) produce performance metrics, such as the time and memory required to achieve that accuracy. The intent is also to be able to compare these metrics across different versions of the compiler and model implementations, which is the primary reason the suite lives in a separate repository from the compiler.
We divide the scripts into two major categories: those that run experiments to produce data, and those that analyze that data. The former is a pair of fish scripts, while the latter is a small Python application.
The experiment runner is structured as two scripts:

- `collect/run.fish` is the main entry point and defines a number of functions with which to define and configure tests. It can be run with, e.g., `./run.fish --help` to see the available command line flags. This script is written to be quite general and has no TreePPL-specific functionality, though it is very much designed to be precisely what this particular suite needs.
- `collect/config.fish` specifies the repositories to fetch and build (TreePPL and its direct dependencies), sets the environment variables necessary for the repositories to see each other, and then defines the tests and what data to collect. A hypothetical sketch of such a configuration follows this list.
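To give a feel for the shape of a configuration file, here is a purely hypothetical sketch; the helper functions and variable names are placeholders, not the actual API defined by `collect/run.fish`:

```fish
# Purely hypothetical sketch -- the real helper functions and variable names
# come from collect/run.fish and will differ from the placeholders below.

# Placeholder helper: fetch and build TreePPL and its direct dependencies.
fetch-and-build treeppl

# Placeholder variable: let the built repositories see each other.
set -gx TREEPPL_SRC (pwd)/treeppl

# Placeholder helper: define a test and the data to collect for it.
define-test my-model --collect time,memory
```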
Typical usage is as follows:

```fish
# Run all tests, put data in `out.zip`
collect/run.fish out.zip --config-file collect/config.fish
```
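Since one goal is to compare metrics across compiler versions and model implementations, a natural workflow is to produce one archive per version and compare them during analysis. The archive names below are arbitrary examples:

```fish
# Collect one archive per compiler version; the archives can later be
# compared with analyze/performance.py.
collect/run.fish before.zip --config-file collect/config.fish
# ...switch to the other compiler version or model implementation...
collect/run.fish after.zip --config-file collect/config.fish
```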
Analysis can be done by either:

- Importing `analyze/correctness.py` or `analyze/performance.py` in a Python session and using the exposed functions, or
- Running `python3 analyze/correctness.py out.zip` or `python3 analyze/performance.py out.zip`. Note that `performance.py` accepts an arbitrary number of archives to compare, while `correctness.py` expects a single archive.
The former is more flexible, while the latter gives a reasonable default analysis.
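For the import route, a minimal session sketch is shown below. It assumes the session is started from the repository root and makes no assumption about which functions the modules expose; use `help()` to discover them.

```python
# Minimal sketch of the import route, started from the repository root.
import sys

# Make the analysis scripts importable as modules.
sys.path.insert(0, "analyze")

import correctness
import performance

# The exposed function names depend on the scripts themselves, so inspect
# them interactively rather than assuming any particular API.
help(correctness)
help(performance)
```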