TreePPL Benchmark Suite

This repo contains the scripts necessary to run the benchmark suite for TreePPL. The purpose of the suite is to (a) verify the accuracy of inference and (b) produce performance metrics, such as the time and memory required to achieve that accuracy. The intent is also to be able to compare these metrics across different versions of the compiler and model implementations, which is the primary reason the suite lives in a separate repository from the compiler.

We divide the scripts into two major categories: those that run experiments to produce data, and those that analyze that data. The former is a pair of fish scripts, while the latter is a small Python application.

Configuring and Running Experiments

The experiment runner is structured as two scripts:

  • collect/run.fish is the main entry point and defines a number of functions with which to define and configure tests. It can be run with, e.g., ./run.fish --help to see the available command-line flags. The script is written to be quite general and contains no TreePPL-specific functionality, though it is very much designed around what this particular suite needs.
  • collect/config.fish specifies the repositories to fetch and build (TreePPL and its direct dependencies), sets the environment variables necessary for the repositories to see each other, and then defines tests and what data to collect.

Typical usage is as follows:

# Run all tests, put data in `out.zip`
collect/run.fish out.zip --config-file collect/config.fish
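
The runner can be invoked the same way to inspect its flags or to produce additional archives for later comparison. The sketch below is only illustrative: the archive name baseline.zip is hypothetical, and it assumes that any differences between runs (e.g., the compiler version) are set up in collect/config.fish.

# List the available command line flags
collect/run.fish --help

# Produce a second archive (hypothetical name) for later comparison,
# e.g., after pointing collect/config.fish at a different compiler version
collect/run.fish baseline.zip --config-file collect/config.fish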

Analyzing Collected Data

Analysis can be done by either:

  • Importing analyze/correctness.py or analyze/performance.py in a Python session and using the exposed functions, or
  • Running python3 analyze/correctness.py out.zip or python3 analyze/performance.py out.zip. Note that performance.py accepts an arbitrary number of archives to compare, while correctness.py expects a single archive.

The former is more flexible, while the latter gives a reasonable default analysis.
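
For example, after producing archives with the runner, a default command-line analysis might look as follows; baseline.zip is again a hypothetical second archive produced from a different compiler version or model implementation.

# Check inference accuracy for a single archive
python3 analyze/correctness.py out.zip

# Compare performance metrics (e.g., time and memory) across two archives
python3 analyze/performance.py out.zip baseline.zip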
