fufuhd/NumCompute-Stream

NumCompute-Stream

NumCompute-Stream is a small NumPy-based streaming machine learning package. It supports incremental preprocessing, streaming metrics, a decision tree classifier, a tree ensemble classifier, pipeline training, benchmarking, and matplotlib visualisation.

Features

  • CSV loading and stream chunk generation
  • partial_fit() support across the preprocessing, tree, ensemble, pipeline, and stream trainer components
  • Decision tree classifier with Gini or entropy splitting
  • Random-forest-style ensemble using bootstrap sampling and majority voting
  • Streaming accuracy, precision, recall, F1, and confusion matrix
  • Streaming statistics including mean, variance, quantile, and histogram
  • Built-in visualisation through visualise.py
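
As an illustration of the streaming statistics listed above, the sketch below keeps a per-feature running mean and variance and merges each new chunk into them without storing earlier data. It is not taken from the package: the class name RunningMoments and its attributes are hypothetical, and it uses the standard pairwise merge formula for means and sums of squared deviations, which may differ from what NumCompute-Stream does internally.

```python
import numpy as np


class RunningMoments:
    """Hypothetical helper (not part of NumCompute-Stream):
    per-feature running mean and population variance over chunks."""

    def __init__(self):
        self.count = 0
        self.mean = None
        self.m2 = None  # running sum of squared deviations from the mean

    def partial_fit(self, X):
        X = np.asarray(X, dtype=float)
        n_b = X.shape[0]
        if n_b == 0:
            return self
        # Statistics of the incoming chunk on its own.
        mean_b = X.mean(axis=0)
        m2_b = ((X - mean_b) ** 2).sum(axis=0)
        if self.count == 0:
            self.count, self.mean, self.m2 = n_b, mean_b, m2_b
            return self
        # Merge the chunk statistics into the running statistics.
        n_a = self.count
        n = n_a + n_b
        delta = mean_b - self.mean
        self.mean = self.mean + delta * n_b / n
        self.m2 = self.m2 + m2_b + delta ** 2 * n_a * n_b / n
        self.count = n
        return self

    @property
    def variance(self):
        # Population variance (divides by the number of samples seen).
        return self.m2 / self.count


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))

moments = RunningMoments()
for chunk in np.array_split(data, 7):
    moments.partial_fit(chunk)
```

After the loop, moments.mean and moments.variance should agree with data.mean(axis=0) and data.var(axis=0) up to floating-point error, even though no single call saw the full array.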

Installation

python -m pip install -e .

Run tests

pytest

Run demo

python demo/stream_demo.py

The demo loads demo/sample_stream_data.csv, splits it into chunks, trains a single tree and an ensemble incrementally, then saves plots into the demo/ folder.
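
The load-and-chunk step described above can be sketched in plain NumPy. This is an assumption-laden illustration, not the demo's code: iter_chunks is a hypothetical helper, the CSV is a tiny in-memory stand-in, and the layout (feature columns followed by a final label column, with a header row) is assumed rather than taken from demo/sample_stream_data.csv.

```python
import io

import numpy as np


def iter_chunks(X, y, chunk_size):
    """Hypothetical helper: yield consecutive (X_chunk, y_chunk) slices.
    The final chunk may be shorter than chunk_size."""
    for start in range(0, len(X), chunk_size):
        stop = start + chunk_size
        yield X[start:stop], y[start:stop]


# In-memory CSV standing in for a real file; the column layout is assumed.
csv_text = (
    "f1,f2,label\n"
    "0.1,1.0,0\n"
    "0.4,0.9,0\n"
    "0.8,0.2,1\n"
    "0.9,0.1,1\n"
    "0.5,0.5,0\n"
)
table = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
X, y = table[:, :-1], table[:, -1].astype(int)

# Five rows in chunks of two give chunk lengths 2, 2, and 1.
chunks = list(iter_chunks(X, y, chunk_size=2))
```

Each (X_chunk, y_chunk) pair produced this way could then be passed to a model's partial_fit() in turn.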

Run benchmark

python benchmark/benchmark_stream.py

Example

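import numpy as np

from numcompute_stream.preprocessing import StandardScaler
from numcompute_stream.ensemble import EnsembleClassifier
from numcompute_stream.pipeline import Pipeline

# Placeholder chunk for illustration; in practice each chunk comes from the stream.
rng = np.random.default_rng(0)
X_chunk = rng.normal(size=(32, 4))
y_chunk = (X_chunk[:, 0] > 0).astype(int)

# Scale the features, then classify with an ensemble of five trees (max depth 4).
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("forest", EnsembleClassifier(n_estimators=5, max_depth=4)),
])

# Update the pipeline on one chunk, then predict on the same chunk.
pipe.partial_fit(X_chunk, y_chunk)
predictions = pipe.predict(X_chunk)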
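
Predictions made chunk by chunk can be scored incrementally by accumulating a confusion matrix and deriving accuracy, precision, recall, and F1 from it. The sketch below shows one way to do that in plain NumPy. StreamingConfusion and its methods are hypothetical names for illustration and are not the package's metrics API; labels are assumed to be integers in the range 0 to n_classes - 1.

```python
import numpy as np


class StreamingConfusion:
    """Hypothetical helper (not part of NumCompute-Stream):
    accumulates a confusion matrix over chunks of predictions."""

    def __init__(self, n_classes):
        # Rows index the true label, columns the predicted label.
        self.matrix = np.zeros((n_classes, n_classes), dtype=np.int64)

    def update(self, y_true, y_pred):
        y_true = np.asarray(y_true, dtype=np.int64)
        y_pred = np.asarray(y_pred, dtype=np.int64)
        # np.add.at handles repeated (true, pred) pairs correctly.
        np.add.at(self.matrix, (y_true, y_pred), 1)
        return self

    def accuracy(self):
        total = self.matrix.sum()
        return float(np.trace(self.matrix) / total) if total else 0.0

    def precision_recall_f1(self, label):
        tp = self.matrix[label, label]
        predicted = self.matrix[:, label].sum()  # all samples predicted as label
        actual = self.matrix[label, :].sum()     # all samples truly of label
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        return float(precision), float(recall), float(f1)


cm = StreamingConfusion(n_classes=2)
cm.update([0, 0, 1, 1], [0, 1, 1, 1])  # first chunk
cm.update([1, 0], [0, 0])              # second chunk
```

Because only the matrix of counts is stored, memory use depends on the number of classes rather than on the length of the stream.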
