the dataset leaks future information #227

@siaochuan

The dataset leaks future information into every sample via per-window normalization. In finetune_base_model.py (lines 127 and 132), x_mean and x_std are computed from the full window, which already includes the nominal forecast horizon, so the normalized inputs carry statistics of the values the model is supposed to predict. Both the tokenizer and the predictor use this same dataset path, so the leakage contaminates the whole pipeline; see finetune_tokenizer.py (line 20) and finetune_tokenizer.py (line 97). If your goal is unseen-future prediction, this is a stop-ship issue.
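To make the problem concrete, here is a minimal sketch of the difference between full-window and history-only normalization. It is not the repository's code: the function names, the `lookback`/`horizon` split, the `eps` constant, and the `(time, features)` array layout are all assumptions chosen for illustration.

```python
import numpy as np


def normalize_full_window(window, eps=1e-5):
    """Leaky variant: statistics come from the whole window, horizon included."""
    x_mean = window.mean(axis=0)
    x_std = window.std(axis=0)
    return (window - x_mean) / (x_std + eps)


def normalize_history_only(window, lookback, eps=1e-5):
    """Statistics come only from the first `lookback` steps (hypothetical fix)."""
    history = window[:lookback]
    x_mean = history.mean(axis=0)
    x_std = history.std(axis=0)
    # The same history-derived statistics are applied to the horizon as well.
    return (window - x_mean) / (x_std + eps)


# Synthetic window, assumed layout: (lookback + horizon, n_features).
rng = np.random.default_rng(0)
lookback, horizon, n_features = 8, 4, 3
window = rng.normal(size=(lookback + horizon, n_features))

# Same history, different future: shift only the horizon rows.
altered = window.copy()
altered[lookback:] += 10.0

# With full-window statistics, the normalized history depends on the future.
leaky_changed = not np.allclose(
    normalize_full_window(window)[:lookback],
    normalize_full_window(altered)[:lookback],
)

# With history-only statistics, the normalized history is unaffected.
safe_unchanged = np.allclose(
    normalize_history_only(window, lookback)[:lookback],
    normalize_history_only(altered, lookback)[:lookback],
)

print("full-window stats: history changes with future ->", leaky_changed)
print("history-only stats: history independent of future ->", safe_unchanged)
```

If the dataset were changed along these lines, the model inputs for a given history would be identical regardless of what happens in the horizon, which is the property an unseen-future evaluation needs.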
