Scikit Learn Clone is a custom implementation of various machine learning algorithms and utilities inspired by Scikit-Learn. This project is designed to provide a comprehensive set of tools for data preprocessing, model selection, evaluation, and supervised learning.
The project is organized into several modules, each focusing on a specific aspect of machine learning. Below is the detailed structure:
```
scikit_learn_clone/
├── data/
├── ensemble_methods/
│   ├── __init__.py
│   ├── adaBoost.py
│   ├── bagging.py
│   ├── gradient_boosting.py
│   ├── randomForest.py
│   └── stacking.py
├── metrics_model_evaluation/
│   ├── __init__.py
│   ├── accuracy.py
│   ├── confusion_matrix.py
│   ├── f1_score.py
│   ├── mean_absolute_error.py
│   ├── mean_squared_error.py
│   ├── precision.py
│   ├── r2_score.py
│   └── recall.py
├── model_selection/
│   ├── __init__.py
│   ├── cross_validation.py
│   ├── grid_search.py
│   ├── kfold.py
│   ├── make_scorer.py
│   ├── param_grid.py
│   └── train_test_split.py
├── preprocessing/
│   ├── __init__.py
│   ├── impute_missing_values_mean.py
│   ├── impute_missing_values_median.py
│   ├── impute_missing_values_mode.py
│   ├── labelencoder.py
│   ├── normalize_features.py
│   ├── numerical_categorical_variable.py
│   ├── onehotencoder.py
│   ├── outliers.py
│   ├── scale_features_min_max.py
│   ├── scale_features_standard.py
│   └── select_features.py
├── supervised_learning/
│   ├── __init__.py
│   ├── DecisionTree.py
│   ├── knn.py
│   ├── LinearRegression.py
│   ├── LogisticRegression.py
│   ├── NaiveBayes.py
│   └── NeuralNetwork.py
├── testing/
├── utilities/
│   ├── __init__.py
│   ├── Estimator.py
│   ├── MetaEstimator.py
│   ├── ModelSelector.py
│   ├── Pipeline.py
│   ├── Predictor.py
│   └── Transformer.py
├── .gitignore
├── README.md
└── setup.py
```
To install the package, use pip:

```bash
pip install sktlearn-clone
```
Here are some examples of how to use the various modules in this package.
Training and evaluating a decision tree classifier:

```python
from scikit_learn_clone.supervised_learning.DecisionTree import DecisionTreeClassifier
from scikit_learn_clone.model_selection.train_test_split import train_test_split
from scikit_learn_clone.metrics_model_evaluation.accuracy import accuracy_score

# Sample dataset
X = [[0, 0], [1, 1]]
y = [0, 1]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

# Initialize and train the classifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```
Running k-fold cross-validation with the linear regression model:

```python
from scikit_learn_clone.model_selection.kfold import KFold
from scikit_learn_clone.supervised_learning.LinearRegression import LinearRegression

# Sample dataset
X = [[i] for i in range(10)]
y = [2 * i for i in range(10)]

# Initialize KFold
kf = KFold(n_splits=5)

# Initialize model
model = LinearRegression()

# Perform k-fold cross-validation
for train_index, test_index in kf.split(X):
    X_train, X_test = [X[i] for i in train_index], [X[i] for i in test_index]
    y_train, y_test = [y[i] for i in train_index], [y[i] for i in test_index]
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    print(f"Fold results: {predictions}")
```
The main features, grouped by module:

Ensemble methods (a usage sketch follows the list):
- AdaBoost: Adaptive boosting, re-weighting training samples toward previous mistakes.
- Bagging: Bootstrap aggregating over resampled subsets of the data.
- Gradient Boosting: Sequentially fits new models to the ensemble's residual errors.
- Random Forest: An ensemble of decision trees.
- Stacking: Stacked generalization, combining base models via a meta-learner.
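As a rough illustration, the random forest might be used as follows. This is a minimal sketch: the class name `RandomForestClassifier` and the `n_estimators` parameter are guesses based on `randomForest.py`, not a confirmed API.

```python
# Hypothetical sketch: the class name and constructor parameter are assumed
# from the file layout in ensemble_methods/, not taken from the actual module.
from scikit_learn_clone.ensemble_methods.randomForest import RandomForestClassifier

X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 1, 0]

forest = RandomForestClassifier(n_estimators=10)  # number of trees (assumed parameter)
forest.fit(X, y)
print(forest.predict([[1, 0]]))
```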
Metrics and model evaluation (see the sketch after this list):
- Accuracy: Fraction of predictions that are correct.
- Confusion Matrix: Tabulates predicted versus actual classes.
- F1 Score: Harmonic mean of precision and recall.
- Mean Absolute Error: Average absolute prediction error (regression).
- Mean Squared Error: Average squared prediction error (regression).
- Precision: Fraction of predicted positives that are truly positive.
- R2 Score: Coefficient of determination.
- Recall: Fraction of actual positives correctly identified.
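For example, the classification metrics can be combined to summarize one set of predictions. `accuracy_score` mirrors the earlier example; `precision_score`, `recall_score`, and `f1_score` are assumed names inferred from the module files.

```python
from scikit_learn_clone.metrics_model_evaluation.accuracy import accuracy_score
# The three imports below use assumed function names inferred from the file names.
from scikit_learn_clone.metrics_model_evaluation.precision import precision_score
from scikit_learn_clone.metrics_model_evaluation.recall import recall_score
from scikit_learn_clone.metrics_model_evaluation.f1_score import f1_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
```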
Model selection (see the sketch after this list):
- Cross-Validation: Splits the dataset into k consecutive folds for repeated train/test evaluation.
- Grid Search: Exhaustive search over specified parameter values.
- K-Fold: K-fold cross-validation iterator.
- Make Scorer: Converts metrics into scoring callables.
- Param Grid: Defines the parameter grid for a search.
- Train-Test Split: Splits arrays or matrices into random train and test subsets.
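A grid-search sketch, assuming `grid_search.py` exposes a `GridSearch` class that accepts an estimator, a parameter grid, and a fold count; the class name, constructor signature, `KNNClassifier` name, and `best_params_` attribute are all assumptions for illustration.

```python
# Hypothetical sketch: every name below except the module paths is assumed.
from scikit_learn_clone.model_selection.grid_search import GridSearch
from scikit_learn_clone.supervised_learning.knn import KNNClassifier

X = [[i] for i in range(20)]
y = [i % 2 for i in range(20)]

# Exhaustively try each candidate value of k with 5-fold cross-validation.
search = GridSearch(estimator=KNNClassifier(), param_grid={"k": [1, 3, 5]}, cv=5)
search.fit(X, y)
print(search.best_params_)  # attribute name assumed (scikit-learn convention)
```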
Preprocessing (a usage sketch follows the list):
- Imputation: Fills missing values using the mean, median, or mode.
- Label Encoding: Encodes categorical features as an integer array.
- Normalization: Scales input vectors individually to unit norm.
- One-Hot Encoding: Encodes categorical integer features as a one-hot numeric array.
- Outlier Detection: Identifies and handles outliers in the data.
- Feature Scaling: Standardizes features to zero mean and unit variance, or rescales them to a min-max range.
- Feature Selection: Selects features based on importance or correlation.
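A preprocessing sketch, assuming the imputation and scaling modules expose functions named after their files (`impute_missing_values_mean`, `scale_features_standard`); the exact names and signatures are assumptions.

```python
# Hypothetical sketch: function names are guessed from the file names.
from scikit_learn_clone.preprocessing.impute_missing_values_mean import impute_missing_values_mean
from scikit_learn_clone.preprocessing.scale_features_standard import scale_features_standard

X = [[1.0, 2.0], [None, 4.0], [3.0, None]]

X = impute_missing_values_mean(X)  # fill missing entries with column means
X = scale_features_standard(X)     # standardize to zero mean, unit variance
print(X)
```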
Supervised learning (a usage sketch follows the list):
- Decision Tree: Decision tree classifier.
- k-Nearest Neighbors: Predicts from the k closest training points.
- Linear Regression: Fits a linear model to continuous targets.
- Logistic Regression: Linear classifier with a sigmoid output.
- Naive Bayes: Probabilistic classifier assuming conditional feature independence.
- Neural Network: A basic neural network model.
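Logistic regression can be used the same way as the decision tree above. The `LogisticRegression` class name follows its module file, as `LinearRegression` does in the earlier example, but the hyperparameters shown are assumptions.

```python
from scikit_learn_clone.supervised_learning.LogisticRegression import LogisticRegression

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

# learning_rate and n_iterations are assumed hyperparameter names.
model = LogisticRegression(learning_rate=0.1, n_iterations=1000)
model.fit(X, y)
print(model.predict([[2.5]]))
```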
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch (`git checkout -b feature/your-feature`).
- Make your changes.
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature/your-feature`).
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.