
scorer.py with event detection #11

@francisco-liam


scorer.py does not support event detection algorithms

Summary

scorer.py is built around comparing two continuous, time-aligned signals. This makes it unsuitable for evaluating algorithms whose output is a set of discrete events rather than a continuous data stream (e.g., sit-to-stand transition detection, step detection, gesture recognition).

Current Behavior

scorer.py, via its resample() function, expects two CSV files, each containing:

  • A column of continuous numeric values (e.g., heart rate, SpO2)
  • A column of timestamps

It trims both files to their overlapping time range, resamples both to a common frequency using ebsig.periodize(), and then computes point-to-point correlation statistics (Pearson r, Spearman ρ, Kendall τ) and a Bland-Altman mean difference plot.
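For reference, a minimal sketch of that continuous-signal comparison, using pandas/scipy in place of the project's ebsig.periodize(); the column names and the function below are illustrative assumptions, not scorer.py's actual interface:

```python
import pandas as pd
from scipy import stats

def score_continuous(algo_csv, ref_csv, freq="1s"):
    # Column names ("timestamp", "value") are assumptions for illustration.
    algo = pd.read_csv(algo_csv, parse_dates=["timestamp"]).set_index("timestamp")
    ref = pd.read_csv(ref_csv, parse_dates=["timestamp"]).set_index("timestamp")

    # Trim both files to their overlapping time range.
    start = max(algo.index.min(), ref.index.min())
    end = min(algo.index.max(), ref.index.max())

    # Resample both to a common frequency (scorer.py does this via ebsig.periodize()).
    a = algo.loc[start:end, "value"].resample(freq).mean().interpolate()
    r = ref.loc[start:end, "value"].resample(freq).mean().interpolate()
    pair = pd.concat({"algo": a, "ref": r}, axis=1).dropna()

    # Point-to-point correlation statistics plus the Bland-Altman mean difference.
    return {
        "pearson_r": stats.pearsonr(pair["algo"], pair["ref"])[0],
        "spearman_rho": stats.spearmanr(pair["algo"], pair["ref"])[0],
        "kendall_tau": stats.kendalltau(pair["algo"], pair["ref"])[0],
        "mean_difference": (pair["algo"] - pair["ref"]).mean(),
    }
```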

This pipeline assumes:

  1. The algorithm output is a dense, regularly-sampled signal
  2. Ground truth is also a dense, regularly-sampled signal from a reference device
  3. The two signals can be meaningfully compared value-by-value at each timestep

Problem

Some algorithms produce discrete event outputs — a list of detected timestamps or intervals — rather than a continuous signal. Examples include:

  • Sit-to-stand / stand-to-sit transition detection
  • Step detection
  • Fall detection
  • Gesture or posture classification

For these algorithms:

  • There is no continuous stream to resample or correlate
  • Ground truth is typically an annotation file (e.g., a JSON with labeled start/end times), not a reference device CSV
  • The appropriate metrics are precision, recall, and F1 score, computed by matching detected events to ground-truth events within some time tolerance window
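For concreteness, a minimal sketch of that tolerance-based matching (the function and its inputs are hypothetical, not part of the current codebase):

```python
def match_events(detected, truth, tolerance_s=2.0):
    """Greedily match detected event times to ground-truth event times
    (both given in seconds) within +/- tolerance_s, then report
    precision, recall, and F1. A sketch only, not existing scorer.py code."""
    unmatched_truth = sorted(truth)
    true_positives = 0
    for t_det in sorted(detected):
        # Find the closest still-unmatched ground-truth event.
        best = min(unmatched_truth, key=lambda t: abs(t - t_det), default=None)
        if best is not None and abs(best - t_det) <= tolerance_s:
            true_positives += 1
            unmatched_truth.remove(best)

    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, match_events([10.2, 35.0, 80.1], [10.0, 34.5, 60.0]) matches the first two detections and misses the 60.0 s event, giving precision and recall of 2/3.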

scorer.py cannot evaluate this class of algorithm without significant changes, yet the current README implies that scorer.py is the standard evaluation path for all algorithms.

Proposed Solution

Add support for event detection evaluation, either as:

  1. A new script (e.g., event_scorer.py) that:

    • Accepts a ground-truth annotation JSON (list of labeled intervals with start_time / end_time)
    • Accepts an algorithm output file (list of detected event timestamps or intervals)
    • Matches detections to ground-truth events within a configurable time tolerance
    • Reports per-class and overall precision, recall, and F1
    • Optionally generates a timeline visualization overlaying detections and ground truth
  2. An extension to scorer.py with a new score_events() function that handles this case
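Roughly, option 1 could look like the sketch below. The file formats, JSON keys, and the event_scorer.py name are proposals for discussion, not an existing schema, and it builds on the match_events() matching sketch given earlier in this issue:

```python
import json
from collections import defaultdict

def score_events(annotation_json, detections_json, tolerance_s=2.0):
    # Ground truth: [{"label": "sit_to_stand", "start_time": 12.4, "end_time": 13.1}, ...]
    # Detections:   [{"label": "sit_to_stand", "time": 12.6}, ...]
    # Both formats are proposed, not an existing schema.
    with open(annotation_json) as f:
        truth = json.load(f)
    with open(detections_json) as f:
        detected = json.load(f)

    truth_by_label = defaultdict(list)
    for event in truth:
        truth_by_label[event["label"]].append(event["start_time"])
    detected_by_label = defaultdict(list)
    for event in detected:
        detected_by_label[event["label"]].append(event["time"])

    # Per-class precision/recall/F1 via tolerance matching
    # (match_events() as sketched earlier in this issue).
    return {
        label: match_events(detected_by_label[label], times, tolerance_s)
        for label, times in truth_by_label.items()
    }
```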

The README's "evaluate a new algorithm" section should also be updated to describe both evaluation paths — continuous signal comparison and event detection — so users know which tool to use for their algorithm type.
