Scikit Learn Clone

Scikit Learn Clone is a custom implementation of various machine learning algorithms and utilities inspired by Scikit-Learn. This project is designed to provide a comprehensive set of tools for data preprocessing, model selection, evaluation, and supervised learning.

Project Structure

The project is organized into several modules, each focusing on a specific aspect of machine learning. Below is the detailed structure:

scikit_learn_clone/  
├── data/  
├── ensemble_methods/  
│   ├── __init__.py  
│   ├── adaBoost.py  
│   ├── bagging.py  
│   ├── gradient_boosting.py  
│   ├── randomForest.py  
│   ├── stacking.py  
├── metrics_model_evaluation/  
│   ├── __init__.py  
│   ├── accuracy.py  
│   ├── confusion_matrix.py  
│   ├── f1_score.py  
│   ├── mean_absolute_error.py  
│   ├── mean_squared_error.py  
│   ├── precision.py  
│   ├── r2_score.py  
│   ├── recall.py  
├── model_selection/  
│   ├── __init__.py  
│   ├── cross_validation.py  
│   ├── grid_search.py  
│   ├── kfold.py  
│   ├── make_scorer.py  
│   ├── param_grid.py  
│   ├── train_test_split.py  
├── preprocessing/  
│   ├── __init__.py  
│   ├── impute_missing_values_mean.py  
│   ├── impute_missing_values_median.py  
│   ├── impute_missing_values_mode.py  
│   ├── labelencoder.py  
│   ├── normalize_features.py  
│   ├── numerical_categorical_variable.py  
│   ├── onehotencoder.py  
│   ├── outliers.py  
│   ├── scale_features_min_max.py  
│   ├── scale_features_standard.py  
│   ├── select_features.py  
├── supervised_learning/  
│   ├── __init__.py  
│   ├── DecisionTree.py  
│   ├── knn.py  
│   ├── LinearRegression.py  
│   ├── LogisticRegression.py
│   ├── NaiveBayes.py 
│   ├── NeuralNetwork.py 
├── testing/  
├── utilities/  
│   ├── __init__.py  
│   ├── Estimator.py  
│   ├── MetaEstimator.py  
│   ├── ModelSelector.py  
│   ├── Pipeline.py  
│   ├── Predictor.py  
│   ├── Transformer.py  
├── .gitignore  
├── README.md  
└── setup.py  

Installation

To install the package, use pip:

pip install sktlearn-clone

Usage

Here are some examples of how to use the various modules in this package.

Example: Decision Tree Classifier

from scikit_learn_clone.supervised_learning.DecisionTree import DecisionTreeClassifier
from scikit_learn_clone.model_selection.train_test_split import train_test_split
from scikit_learn_clone.metrics_model_evaluation.accuracy import accuracy_score

# Sample dataset
X = [[0, 0], [1, 1]]
y = [0, 1]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

# Initialize and train the classifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Example: K-Fold Cross-Validation

from scikit_learn_clone.model_selection.kfold import KFold
from scikit_learn_clone.supervised_learning.LinearRegression import LinearRegression

# Sample dataset
X = [[i] for i in range(10)]
y = [2 * i for i in range(10)]

# Initialize KFold
kf = KFold(n_splits=5)

# Initialize model
model = LinearRegression()

# Perform K-Fold Cross-Validation
for train_index, test_index in kf.split(X):
    X_train, X_test = [X[i] for i in train_index], [X[i] for i in test_index]
    y_train, y_test = [y[i] for i in train_index], [y[i] for i in test_index]
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    print(f"Fold results: {predictions}")

Features

Ensemble Methods

  • AdaBoost: Adaptive boosting, reweighting training examples after each round.
  • Bagging: Bootstrap aggregating; base learners trained on resampled data, combined by voting.
  • Gradient Boosting: Sequentially fitting learners to the residual errors of the current ensemble.
  • Random Forest: An ensemble of decision trees trained on bootstrapped samples.
  • Stacking: Stacked generalization, combining base learners through a meta-model.
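To illustrate the idea behind bagging, here is a minimal, self-contained sketch in plain Python (illustrative only; it is independent of the package's own bagging module and uses a trivial 1-nearest-neighbor base learner for brevity):

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    # Draw len(X) examples with replacement (one bootstrap replicate).
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_1nn(X, y):
    # A trivial base learner: 1-nearest-neighbor on the given sample.
    def predict(x):
        _, label = min(
            (sum((a - b) ** 2 for a, b in zip(row, x)), label)
            for row, label in zip(X, y)
        )
        return label
    return predict

def bagging_predict(models, x):
    # Majority vote across the ensemble's per-model predictions.
    votes = [m(x) for m in models]
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)
X = [[0], [1], [2], [8], [9], [10]]
y = [0, 0, 0, 1, 1, 1]
models = [fit_1nn(*bootstrap_sample(X, y, rng)) for _ in range(5)]
print(bagging_predict(models, [1.5]))
```

Each base learner sees a slightly different resampling of the data, so the ensemble vote is more stable than any single learner.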

Metrics and Model Evaluation

  • Accuracy: Fraction of correct classifications.
  • Confusion Matrix: Counts of true/false positives and negatives per class.
  • F1 Score: Harmonic mean of precision and recall.
  • Mean Absolute Error: Average absolute difference between predictions and targets.
  • Mean Squared Error: Average squared difference between predictions and targets.
  • Precision: Fraction of positive predictions that are correct.
  • R2 Score: Coefficient of determination.
  • Recall: Fraction of actual positives that are identified.
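The classification metrics above reduce to simple counting formulas. The following self-contained sketch (plain Python, not the package's own modules) shows the computations for a binary problem:

```python
def accuracy_score(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_score(y_true, y_pred):
    # Of the examples predicted positive, how many really are.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if tp + fp else 0.0

def recall_score(y_true, y_pred):
    # Of the truly positive examples, how many were found.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall.
    p, r = precision_score(y_true, y_pred), recall_score(y_true, y_pred)
    return 2 * p * r / (p + r) if p + r else 0.0

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
print(accuracy_score(y_true, y_pred))   # 0.666...
print(f1_score(y_true, y_pred))         # 0.75
```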

Model Selection

  • Cross-Validation: Split the dataset into k consecutive folds.
  • Grid Search: Exhaustive search over specified parameter values.
  • K-Fold: K-Fold cross-validation iterator.
  • Make Scorer: Convert metrics into callables.
  • Param Grid: Define the parameter grid for search.
  • Train-Test Split: Split arrays or matrices into random train and test subsets.
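A train/test split boils down to shuffling indices and carving off a test fraction. Here is an illustrative sketch in plain Python (not the package's train_test_split implementation; the seed parameter is an assumption for reproducibility):

```python
import random

def train_test_split(X, y, test_size=0.25, seed=None):
    # Shuffle indices, then slice off the last test_size fraction.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])

X = [[i] for i in range(10)]
y = list(range(10))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, seed=0)
print(len(X_train), len(X_test))  # 7 3
```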

Preprocessing

  • Imputation: Handle missing values.
    • Mean, Median, Mode imputation.
  • Label Encoding: Encode categorical features as an integer array.
  • Normalization: Scale input vectors individually to unit norm.
  • One-Hot Encoding: Encode categorical integer features as a one-hot numeric array.
  • Outlier Detection: Identify and handle outliers in the data.
  • Feature Scaling: Rescale features to a common range or distribution.
    • Min-Max scaling (to a fixed range such as [0, 1]).
    • Standard scaling (remove the mean, scale to unit variance).
  • Feature Selection: Select features based on importance or correlation.
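The two scaling strategies can be sketched in a few lines of plain Python (illustrative only, applied to a single column; the package's versions live in scale_features_min_max.py and scale_features_standard.py):

```python
def scale_min_max(values):
    # Map each value into [0, 1] using the column's min and max.
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant columns map to 0.0
    return [(v - lo) / span for v in values]

def scale_standard(values):
    # Center on the mean and divide by the (population) standard deviation.
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5 or 1.0
    return [(v - mean) / std for v in values]

print(scale_min_max([10, 20, 30]))  # [0.0, 0.5, 1.0]
```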

Supervised Learning

  • Decision Tree: Tree-based classifier that splits on feature thresholds.
  • k-Nearest Neighbors: Prediction by majority vote (or average) of the k closest training points.
  • Linear Regression: Fits a linear model by minimizing squared error.
  • Logistic Regression: Linear model for classification via the sigmoid function.
  • Naive Bayes: Probabilistic classifier assuming conditional feature independence.
  • Neural Network: Feed-forward neural network.
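As a flavor of how simple some of these algorithms are at their core, the heart of k-Nearest Neighbors fits in a few lines. This is an illustrative sketch in plain Python, not the package's knn module:

```python
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Squared Euclidean distance to every training point,
    # then a majority vote among the k nearest labels.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = [label for _, label in dists[:k]]
    return Counter(votes).most_common(1)[0][0]

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [0.5, 0.5]))  # 0
print(knn_predict(X_train, y_train, [5.5, 5.5]))  # 1
```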

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/your-feature).
  3. Make your changes.
  4. Commit your changes (git commit -m 'Add some feature').
  5. Push to the branch (git push origin feature/your-feature).
  6. Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

