PySATL Change point detection subproject (abbreviated pysatl-cpd) is a module, designed for detecting anomalies in time series data, which refer to significant deviations from expected patterns or trends. Anomalies can indicate unusual events or changes in a system, making them crucial for monitoring and analysis in various fields such as finance, healthcare, and network security.
At the moment, the module implements the following CPD algorithms:
- Bayesian algorithm (scrubbing, online and linear heuristic online versions)
- Density based algorithms:
- KLIEP
- RuLSIF
- Graph algorithm
- k-NN based algorithm
- Algorithms, based on classifiers:
- SVM
- KNN
- Decision Tree
- Logistic Regression
- Random Forest
-
Python 3.10+
-
Poetry 2.1.0+
Clone repository:
git clone https://github.com/PySATL/pysatl-cpd.git
Go to repository directory and run:
cd pysatl-cpd
poetry install
py --version
poetry --version
If python not in the PATH, reinstall it and make sure that you checked the box "Add Python to PATH"
If poetry not in the PATH, add it by running:
setx PATH "%PATH%;%APPDATA%\Python\Scripts"
After that, do not forget to restart the terminal.
Go to repository folder and run:
Set-Location pysatl-cpd
poetry install
from pathlib import Path
from pysatl_cpd.labeled_data import LabeledCpdData
# import change point detection solver
from pysatl_cpd.online_cpd_solver import OnlineCpdSolver
from pysatl_cpd.core.problem import CpdProblem
# import algorithm
from pysatl_cpd.core.algorithms.bayesian_online_algorithm import BayesianOnline
from pysatl_cpd.core.algorithms.bayesian.likelihoods.gaussian_conjugate import GaussianConjugate
from pysatl_cpd.core.algorithms.bayesian.hazards.constant import ConstantHazard
from pysatl_cpd.core.algorithms.bayesian.detectors.threshold import ThresholdDetector
from pysatl_cpd.core.algorithms.bayesian.localizers.argmax import ArgmaxLocalizer
labeled_data = LabeledCpdData.generate_cp_datasets(Path("examples/configs/test_config_exp.yml"))["example"]
# specify CPD algorithm with parameters
algorithm = BayesianOnline(
learning_sample_size=5,
likelihood=GaussianConjugate(),
hazard=ConstantHazard(rate=1.0 / (1.0 - 0.5 ** (1.0 / 500))),
detector=ThresholdDetector(threshold=0.005),
localizer=ArgmaxLocalizer(),
)
# make a solver object
solver = OnlineCpdSolver(CpdProblem(True), algorithm, labeled_data)
# then run algorithm
cpd_results = solver.run()
# print the results
print(cpd_results)
# output:
# Located change points: (200;400)
# Expected change point: (200;400)
# Difference: ()
# Computation time (sec): 0.2
# visualize data with located changepoints
cpd_results.visualize()
from pathlib import Path
from benchmarking.pipeline.pipeline import Pipeline
from benchmarking.steps.data_generation_step.data_generation_step import DataGenerationStep
from benchmarking.steps.data_generation_step.data_handlers.generators.cpd_generator import CpdGenerator
from benchmarking.steps.experiment_execution_step.experiment_execution_step import ExperimentExecutionStep
from benchmarking.steps.experiment_execution_step.workers.run_complete_algorithm_worker import (
RunCompleteAlgorithmWorker,
)
from benchmarking.steps.report_generation_step.report_builders.change_point_builder import CpBuilder
from benchmarking.steps.report_generation_step.report_generation_step import ReportGenerationStep
from benchmarking.steps.report_generation_step.report_visualizers.change_point_text_visualizer import CpTextVisualizer
from benchmarking.steps.report_generation_step.reporters.reporter import Reporter
from pysatl_cpd.core.algorithms.bayesian.detectors.threshold import ThresholdDetector
from pysatl_cpd.core.algorithms.bayesian.hazards.constant import ConstantHazard
from pysatl_cpd.core.algorithms.bayesian.likelihoods.heuristic_gaussian_vs_exponential import (
HeuristicGaussianVsExponential,
)
from pysatl_cpd.core.algorithms.bayesian.localizers.argmax import ArgmaxLocalizer
from pysatl_cpd.core.algorithms.bayesian_algorithm import BayesianAlgorithm
# Generate data with example config and save as my_experiment_dataset
generator = CpdGenerator(
name="cpd_generator", output_storage_names={"example"}, config=Path("examples/configs/test_config_exp.yml")
)
step_1 = DataGenerationStep(
data_handler=generator,
name="cpd_generation_test_config_exp_step",
output_storage_names={"example": "my_experiment_dataset"},
)
# Initialize BayesianAlgorithm and run with generated data
algorithm = BayesianAlgorithm(
learning_steps=5,
likelihood=HeuristicGaussianVsExponential(),
hazard=ConstantHazard(rate=1.0 / (1.0 - 0.5 ** (1.0 / 500))),
detector=ThresholdDetector(threshold=0.005),
localizer=ArgmaxLocalizer(),
)
algo_worker = RunCompleteAlgorithmWorker(algorithm=algorithm, name="run_bayesian_algorithm_worker")
step_2 = ExperimentExecutionStep(
worker=algo_worker, name="run_bayesian_algorithm_step", input_storage_names={"my_experiment_dataset": "dataset"}
)
# Generate text report with change points from Result Storage
builder = CpBuilder()
visualizer = CpTextVisualizer(file_name="my_experiment_change_points_report")
reporter = Reporter(builder, visualizer, name="text_reporter")
step_3 = ReportGenerationStep(reporter, name="ReportGeneration", input_storage_names={"change_points"})
# configure pipeline and start the experiment
steps = [step_1, step_2, step_3]
pipeline = Pipeline(steps)
pipeline.run()
output in results/my_experiment_change_points_report.txt
:
Located change points: [25, 201, 396]
If you want to contribute, you should create development environment as follows:
poetry install --with-dev
Install pre-commit hooks:
poetry run pre-commit install
Starting manually:
poetry run pre-commit run --all-files --color always --verbose --show-diff-on-failure
This project is licensed under the terms of the MIT license. See the LICENSE for more information.