Skip to content

PySATL/pysatl-cpd

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PySATL-CPD

Checks MIT License

PySATL Change point detection subproject (abbreviated pysatl-cpd) is a module, designed for detecting anomalies in time series data, which refer to significant deviations from expected patterns or trends. Anomalies can indicate unusual events or changes in a system, making them crucial for monitoring and analysis in various fields such as finance, healthcare, and network security.

At the moment, the module implements the following CPD algorithms:

  • Bayesian algorithm (scrubbing, online and linear heuristic online versions)
  • Density based algorithms:
    • KLIEP
    • RuLSIF
  • Graph algorithm
  • k-NN based algorithm
  • Algorithms, based on classifiers:
    • SVM
    • KNN
    • Decision Tree
    • Logistic Regression
    • Random Forest

Requirements

  • Python 3.10+

  • Poetry 2.1.0+

Installation

Clone repository:

git clone https://github.com/PySATL/pysatl-cpd.git

Linux

Go to repository directory and run:

cd pysatl-cpd
poetry install

Windows

Check if python and poetry in PATH (Optional)

py --version
poetry --version

If python not in the PATH, reinstall it and make sure that you checked the box "Add Python to PATH"

If poetry not in the PATH, add it by running:

setx PATH "%PATH%;%APPDATA%\Python\Scripts"

After that, do not forget to restart the terminal.


Go to repository folder and run:

Set-Location pysatl-cpd
poetry install

Change point detection example:

from pathlib import Path

from pysatl_cpd.labeled_data import LabeledCpdData

# import change point detection solver
from pysatl_cpd.online_cpd_solver import OnlineCpdSolver
from pysatl_cpd.core.problem import CpdProblem

# import algorithm
from pysatl_cpd.core.algorithms.bayesian_online_algorithm import BayesianOnline
from pysatl_cpd.core.algorithms.bayesian.likelihoods.gaussian_conjugate import GaussianConjugate
from pysatl_cpd.core.algorithms.bayesian.hazards.constant import ConstantHazard
from pysatl_cpd.core.algorithms.bayesian.detectors.threshold import ThresholdDetector
from pysatl_cpd.core.algorithms.bayesian.localizers.argmax import ArgmaxLocalizer


labeled_data = LabeledCpdData.generate_cp_datasets(Path("examples/configs/test_config_exp.yml"))["example"]

# specify CPD algorithm with parameters
algorithm = BayesianOnline(
    learning_sample_size=5,
    likelihood=GaussianConjugate(),
    hazard=ConstantHazard(rate=1.0 / (1.0 - 0.5 ** (1.0 / 500))),
    detector=ThresholdDetector(threshold=0.005),
    localizer=ArgmaxLocalizer(),
)
# make a solver object
solver = OnlineCpdSolver(CpdProblem(True), algorithm, labeled_data)


# then run algorithm
cpd_results = solver.run()

# print the results
print(cpd_results)
# output:
# Located change points: (200;400)
# Expected change point: (200;400)
# Difference: ()
# Computation time (sec): 0.2

# visualize data with located changepoints
cpd_results.visualize()

example_of_output

Experiment pipeline example:

from pathlib import Path

from benchmarking.pipeline.pipeline import Pipeline
from benchmarking.steps.data_generation_step.data_generation_step import DataGenerationStep
from benchmarking.steps.data_generation_step.data_handlers.generators.cpd_generator import CpdGenerator
from benchmarking.steps.experiment_execution_step.experiment_execution_step import ExperimentExecutionStep
from benchmarking.steps.experiment_execution_step.workers.run_complete_algorithm_worker import (
  RunCompleteAlgorithmWorker,
)
from benchmarking.steps.report_generation_step.report_builders.change_point_builder import CpBuilder
from benchmarking.steps.report_generation_step.report_generation_step import ReportGenerationStep
from benchmarking.steps.report_generation_step.report_visualizers.change_point_text_visualizer import CpTextVisualizer
from benchmarking.steps.report_generation_step.reporters.reporter import Reporter
from pysatl_cpd.core.algorithms.bayesian.detectors.threshold import ThresholdDetector
from pysatl_cpd.core.algorithms.bayesian.hazards.constant import ConstantHazard
from pysatl_cpd.core.algorithms.bayesian.likelihoods.heuristic_gaussian_vs_exponential import (
  HeuristicGaussianVsExponential,
)
from pysatl_cpd.core.algorithms.bayesian.localizers.argmax import ArgmaxLocalizer
from pysatl_cpd.core.algorithms.bayesian_algorithm import BayesianAlgorithm

# Generate data with example config and save as my_experiment_dataset
generator = CpdGenerator(
  name="cpd_generator", output_storage_names={"example"}, config=Path("examples/configs/test_config_exp.yml")
)
step_1 = DataGenerationStep(
  data_handler=generator,
  name="cpd_generation_test_config_exp_step",
  output_storage_names={"example": "my_experiment_dataset"},
)

# Initialize BayesianAlgorithm and run with generated data
algorithm = BayesianAlgorithm(
  learning_steps=5,
  likelihood=HeuristicGaussianVsExponential(),
  hazard=ConstantHazard(rate=1.0 / (1.0 - 0.5 ** (1.0 / 500))),
  detector=ThresholdDetector(threshold=0.005),
  localizer=ArgmaxLocalizer(),
)
algo_worker = RunCompleteAlgorithmWorker(algorithm=algorithm, name="run_bayesian_algorithm_worker")
step_2 = ExperimentExecutionStep(
  worker=algo_worker, name="run_bayesian_algorithm_step", input_storage_names={"my_experiment_dataset": "dataset"}
)

# Generate text report with change points from Result Storage
builder = CpBuilder()
visualizer = CpTextVisualizer(file_name="my_experiment_change_points_report")
reporter = Reporter(builder, visualizer, name="text_reporter")
step_3 = ReportGenerationStep(reporter, name="ReportGeneration", input_storage_names={"change_points"})

# configure pipeline and start the experiment
steps = [step_1, step_2, step_3]
pipeline = Pipeline(steps)
pipeline.run()

output in results/my_experiment_change_points_report.txt:

Located change points: [25, 201, 396]

Development

If you want to contribute, you should create development environment as follows:

poetry install --with-dev

Pre-commit

Install pre-commit hooks:

poetry run pre-commit install

Starting manually:

poetry run pre-commit run --all-files --color always --verbose --show-diff-on-failure

License

This project is licensed under the terms of the MIT license. See the LICENSE for more information.

About

Python module for change point detection in a time series with experimental environment

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 10

Languages