SwiftLogisticReg: Accelerated Logistic Regression Package for High-Performance Computing

What is it?

SwiftLogisticReg is a Python package that offers efficient implementations of logistic regression using high-performance computing techniques, with support for both CPU and GPU architectures. The algorithms are implemented in Python 3.8, and the GPU version uses CUDA programming to significantly accelerate training.

Main Features

  • logistic_cpu: Implementation of logistic regression using multi-core parallelism on the CPU, designed to handle large datasets efficiently for binary classification tasks.

  • logistic_gpu: Implementation of logistic regression using CUDA programming to harness the power of modern GPUs and speed up training, particularly in big-data scenarios. A minimal usage sketch of both classes follows this list.
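
A minimal usage sketch of the shared fit/predict interface, using synthetic data for illustration only (the array shapes follow the HIGGS walkthrough in the Documentation section below; this is not a formal API specification):

import numpy as np
from SwiftLogisticReg.logistic_cpu import LogisticRegression
# from SwiftLogisticReg.logistic_gpu import LogisticRegressionGPU  # same interface, requires a CUDA-capable GPU

# Synthetic data: features as an (n_samples, n_features) array,
# labels as a (1, n_samples) row vector
X = np.random.rand(1000, 28)
y = (np.random.rand(1000) > 0.5).astype(float).reshape(1, -1)

model = LogisticRegression()  # or LogisticRegressionGPU()
model.fit(X, y)
y_pred = model.predict(X)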

Where to get it

The source code is currently hosted on GitHub at: https://github.com/NechbaMohammed/SwiftLogisticReg

Binary installers for the latest released version are available at the Python Package Index (PyPI):

# PyPI
pip install SwiftLogisticReg

Dependencies

Documentation

This documentation covers the GPU and CPU hardware used, a description of the data, and a performance comparison between the different versions of logistic regression.

GPU and CPU Information

GPU Information

Index  Name      Memory Total [MiB]  Memory Used [MiB]  Memory Free [MiB]  Temperature [°C]
0      Tesla T4  15360               0                  15101              58

CPU Information

  • Current CPU frequency: 2199.998 MHz
  • Minimum CPU frequency: 0.0 MHz
  • Maximum CPU frequency: 0.0 MHz
  • Number of CPU cores: 2

CPU Usage per Core

  • Core 0: 60.8%
  • Core 1: 62.7%
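
The figures above can be collected with standard tooling; a sketch, assuming nvidia-smi is on the PATH and psutil is installed (neither is a dependency of the package itself):

import subprocess
import psutil

# GPU: query nvidia-smi for the fields shown in the table above
print(subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,memory.total,memory.used,memory.free,temperature.gpu",
     "--format=csv"],
    capture_output=True, text=True).stdout)

# CPU: frequency, core count and per-core utilisation
freq = psutil.cpu_freq()
print(f"Current CPU frequency: {freq.current} MHz")
print(f"Minimum CPU frequency: {freq.min} MHz")
print(f"Maximum CPU frequency: {freq.max} MHz")
print(f"Number of CPU cores: {psutil.cpu_count()}")
for i, pct in enumerate(psutil.cpu_percent(percpu=True, interval=1)):
    print(f"Core {i}: {pct}%")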

Data Description

The HIGGS dataset was introduced by Pierre Baldi, Peter Sadowski, and Daniel Whiteson in "Searching for Exotic Particles in High-Energy Physics with Deep Learning" (Nature Communications, 2014). The dataset is used for searching for exotic particles in high-energy physics with deep learning.

Reference: Baldi, P., Sadowski, P., & Whiteson, D. (2014). Searching for exotic particles in high-energy physics with deep learning. Nature Communications, 5, 4308.
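
The walkthrough below reads a 2M-row CSV from ./data/. A sketch of how such a subset might be prepared from the full UCI HIGGS archive; the download URL and the generated feature names are assumptions, not part of the package:

import os
import pandas as pd

# Assumed location of the full 11M-row dataset on the UCI Machine Learning Repository
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz"

# The original file has no header row: column 0 is the class label,
# columns 1-28 are kinematic features (names below are placeholders)
cols = ["label"] + [f"feature_{i}" for i in range(1, 29)]
df = pd.read_csv(url, compression="gzip", header=None, names=cols, nrows=2_000_000)

os.makedirs("./data", exist_ok=True)
df.to_csv("./data/HIGGS_2M_Row.csv", index=False)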

Load data:

import numpy as np
import pandas as pd

# Read the 2M-row HIGGS subset
df = pd.read_csv("./data/HIGGS_2M_Row.csv")

# Split into labels and features
y = df['label']
X = df.drop('label', axis=1)

# Convert to NumPy arrays; labels are reshaped to a (1, n_samples) row vector,
# the layout expected by the fit/predict calls below
X = X.to_numpy()
y = y.to_numpy().reshape(1, y.shape[0])

Logistic Regression GPU-version

from SwiftLogisticReg.logistic_gpu import LogisticRegressionGPU
from sklearn.metrics import f1_score
import time

# Measure the execution time of training the model on the GPU
start_time = time.time()
model = LogisticRegressionGPU()
model.fit(X, y)
end_time = time.time()

# Print the execution time
print("Execution time:", end_time - start_time, "seconds")

# Use the trained model to make predictions on the training data
y_pred = model.predict(X)

# Calculate and print F1 score
f1Score = f1_score(y[0], y_pred[0])  # f1_score expects (y_true, y_pred)

print("f1_score is", f1Score)

Results:

Execution time: 9.497852802276611 seconds
f1_score is 0.6741570975215551

Logistic Regression CPU-version

from SwiftLogisticReg.logistic_cpu import LogisticRegression
from sklearn.metrics import f1_score
import time

# Measure the execution time of training the model on the CPU
start_time = time.time()

# Create an instance of the LogisticRegression class
log_reg = LogisticRegression()

# Train the model on the training data
log_reg.fit(X, y)

end_time = time.time()

predictions = log_reg.predict(X)
# Print the execution time
print("Execution time:", end_time - start_time, "seconds")


# Reshape predictions to a (1, n_samples) row vector to match the shape of y
predictions = predictions.reshape(1, predictions.shape[0])

# Calculate F1 score (f1_score expects y_true first, then y_pred)
f1Score = f1_score(y[0], predictions[0])

# Print the results
print("f1_score is", f1Score)

Results:

Execution time: 19.423811674118042 seconds
f1_score is 0.6689305175558841

Logistic Regression Sklearn-version

import time
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Create a LogisticRegression classifier with the specified parameters
# (note: l1_ratio only takes effect with penalty='elasticnet' and solver='saga';
# with the default penalty scikit-learn ignores it and emits a warning)
clf = LogisticRegression(l1_ratio=0.2, tol=2e-2)

# Start timer for model training
t1 = time.time()

# Fit the classifier to the data
clf.fit(X, y[0])

# End timer for model training
t2 = time.time()
print("The execution time: ", t2 - t1)

# Make predictions on the data
y_pred = clf.predict(X)

# Calculate F1 score (y_true first, then y_pred)
f1Score = f1_score(y[0], y_pred)

# Print the results
print("F1 score: ", f1Score)

Results:

The execution time:  22.600391149520874
F1 score:  0.6869399278895209
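
Summary of the results reported above (2M-row HIGGS subset, hardware as listed in the GPU and CPU Information section):

Version               Execution time (s)  F1 score
SwiftLogisticReg GPU   9.50               0.674
SwiftLogisticReg CPU  19.42               0.669
scikit-learn          22.60               0.687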

Authors and Contributors

👤 Mohammed Nechba

👤 Mohamed Mouhajir

👤 Yassine Sedjari
