btraven00
Contributor

This script aggregates performance metrics and clustering scores from several runs.

It expects the backend and run timestamp to be encoded in the name of the output folder, in the form:

out_BACKEND_TIMESTAMP

as produced by the Makefile in this repo.
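
A minimal sketch of how such a folder name could be split into its parts. This is not the script's actual code: the helper `parse_run_dir`, the `RunInfo` type, and the example backend names are hypothetical, and it assumes the timestamp itself contains no underscore (so the last `_` separates backend from timestamp).

```python
from pathlib import Path
from typing import NamedTuple, Optional


class RunInfo(NamedTuple):
    """Backend and run timestamp recovered from an output folder name."""
    backend: str
    timestamp: str


def parse_run_dir(path) -> Optional[RunInfo]:
    """Parse a folder named out_BACKEND_TIMESTAMP.

    Assumption: the timestamp contains no underscore, so the last
    underscore in the name separates BACKEND from TIMESTAMP.
    Returns None when the name does not follow the convention.
    """
    name = Path(path).name
    prefix = "out_"
    if not name.startswith(prefix):
        return None
    backend, sep, timestamp = name[len(prefix):].rpartition("_")
    if not sep or not backend or not timestamp:
        return None
    return RunInfo(backend, timestamp)


# Hypothetical folder names, for illustration only.
info = parse_run_dir("results/out_conda_20250624")
```

Splitting on the last underscore keeps backends whose names contain underscores intact; if the real timestamps include underscores, the split rule would need to change accordingly.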

The script can produce both CSV and Parquet dataframes.

It produces two separate files to keep the data structure normalized: some of the metadata are properties of the method leaves, and others are properties of the metric leaves.

Indexing by run ID (the timestamp), together with dataset x method x seed x metric, should be enough to JOIN the two dataframes, if needed.
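
A rough sketch of what such a JOIN could look like with pandas. The column names (`run_id`, `dataset`, `method`, `seed`, `metric`, `runtime_s`, `value`) and all values are made up for illustration and are not taken from the script's output; the sketch also assumes the method-level table has one row per run x dataset x method x seed, while the metric-level table adds `metric` to that key.

```python
import pandas as pd

# Hypothetical key shared by both tables; the metric-level table
# additionally distinguishes its rows by "metric".
KEYS = ["run_id", "dataset", "method", "seed"]

# Toy method-level table (performance per method leaf).
performance = pd.DataFrame(
    {
        "run_id": ["20250624", "20250624"],
        "dataset": ["d1", "d1"],
        "method": ["m1", "m2"],
        "seed": [1, 1],
        "runtime_s": [1.5, 2.5],
    }
)

# Toy metric-level table (one score per metric leaf).
scores = pd.DataFrame(
    {
        "run_id": ["20250624"] * 4,
        "dataset": ["d1"] * 4,
        "method": ["m1", "m1", "m2", "m2"],
        "seed": [1] * 4,
        "metric": ["a", "b", "a", "b"],
        "value": [0.1, 0.2, 0.3, 0.4],
    }
)

# Attach the method-level columns to every metric row.
# validate="many_to_one" raises if the keys are not unique in `performance`.
joined = scores.merge(performance, on=KEYS, how="left", validate="many_to_one")
```

A left join from the metric table keeps one row per score, so the method-level values are simply repeated for each metric of the same method leaf.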

@btraven00
Contributor Author

Posting this here just for documentation purposes, since this script was used to produce the dataframes in #5.

@btraven00 btraven00 self-assigned this Jun 24, 2025
@btraven00 btraven00 changed the title from "feat: add convenience aggregation script for multiple runs" to "feat: add convenience aggregation script for multiple runs, closes #2" Jun 24, 2025