jieweigrantli/EV_ToolBox_Regreesion

EV Toolbox Regression

Overview

The EV Toolbox Regression project runs linear regression on datasets stored in Box. The workflow downloads the datasets, combines them on a key column, fits the regression, and uploads the results back to Box, so the whole process runs from a single script.

The tool is written in Python. It uses pandas for data manipulation, scikit-learn for the regression, and the Box SDK for file transfer and authentication.


Features

  • Box Integration: Securely connect to Box to download and upload files.
  • Dataset Combination: Automatically combine datasets using a specified key column.
  • Regression Analysis: Perform linear regression and output:
    • Regression coefficients in a single-row CSV format.
    • R² score, Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) in the terminal.
  • File Overwriting: Automatically replaces existing files in Box when uploading new results.
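
For reference, the three statistics listed above follow the standard definitions. The sketch below computes them in plain Python on made-up numbers. It is not the project's code: the function name regression_metrics is hypothetical, and the project itself relies on scikit-learn, whose exact calls are not shown in this README.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, mse, rmse) for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual sum of squares: squared gaps between observed and predicted values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared gaps between observed values and their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mse = ss_res / n
    rmse = math.sqrt(mse)
    r2 = 1.0 - ss_res / ss_tot
    return r2, mse, rmse


# Illustrative values only, not taken from the project's datasets.
r2, mse, rmse = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 9.5])
print(f"R² Score: {r2:.4f}")
print(f"Mean Squared Error: {mse:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")
```

RMSE is the square root of MSE, so it is expressed in the same units as the target column.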

Requirements

  • Python: Version 3.8 or later.
  • Libraries:
    • pandas
    • scikit-learn
    • boxsdk
    • python-dotenv (imported as dotenv)
    • requests
  • Box Account: Required for file storage and authentication.

Setup Instructions

1. Clone the Repository

git clone https://github.com/your-repo/ev-toolbox-regression.git
cd ev-toolbox-regression

2. Install Dependencies

Create a virtual environment and install the required Python packages:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

3. Set Up .env File

Create a .env file in the project root with the following content:

# Box Credentials
BOX_CLIENT_ID=your_client_id
BOX_CLIENT_SECRET=your_client_secret
BOX_ACCESS_TOKEN=your_access_token
BOX_REFRESH_TOKEN=your_refresh_token
BOX_REDIRECT_URI=http://localhost

# File IDs
BOX_FILE_IDS=your_file_id_1,your_file_id_2

# Output Folder ID
BOX_OUTPUT_FOLDER_ID=your_output_folder_id

# Regression Parameters
TARGET_COLUMN=Y_house_price_of_unit_area
FEATURE_COLUMNS=X1_transaction_date,X2_house_age,X3_distance_to_MRT,X4_number_of_convenience_stores,X5_latitude,X6_longitude
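
BOX_FILE_IDS and FEATURE_COLUMNS hold comma-separated lists, so they need to be split before use. The following is a minimal sketch of one way to read these settings. The helper names split_csv and read_settings are hypothetical and not part of the project. It reads from os.environ, which python-dotenv's load_dotenv() would typically populate from the .env file beforehand.

```python
import os


def split_csv(value):
    """Split a comma-separated string into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def read_settings(env=None):
    """Collect the workflow settings from a mapping of environment variables.

    Assumption: variable names match the .env template above.
    """
    env = os.environ if env is None else env
    return {
        "file_ids": split_csv(env.get("BOX_FILE_IDS", "")),
        "output_folder_id": env.get("BOX_OUTPUT_FOLDER_ID", ""),
        "target_column": env.get("TARGET_COLUMN", ""),
        "feature_columns": split_csv(env.get("FEATURE_COLUMNS", "")),
    }


# Example with an explicit mapping instead of the real environment.
settings = read_settings({
    "BOX_FILE_IDS": "your_file_id_1, your_file_id_2",
    "TARGET_COLUMN": "Y_house_price_of_unit_area",
    "FEATURE_COLUMNS": "X1_transaction_date,X2_house_age",
})
print(settings["file_ids"])
print(settings["feature_columns"])
```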

Usage

1. Run the Workflow

Run the main.py script to execute the end-to-end process:

python main.py

The workflow performs the following steps:

1.	Authenticate with Box.

2.	Download datasets specified in BOX_FILE_IDS.

3.	Combine the datasets using the No column as the key.

4.	Perform regression analysis:

    • Outputs regression coefficients as a CSV (reg_coefficients.csv).

    • Displays model statistics (R², MSE, RMSE) in the terminal.

5.	Upload the results back to the Box folder specified in BOX_OUTPUT_FOLDER_ID.
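
Steps 3 and 4 can be sketched with pandas and scikit-learn as shown below. This is an illustration under stated assumptions, not the code in data_utils.py or regression_utils.py: the function names are hypothetical, the inner join is an assumption, and the statistics are computed on the same rows used for fitting because the README does not say whether the project holds out a test set. The toy columns X1, X2, and Y are invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def combine_on_key(frames, key="No"):
    """Merge a list of DataFrames on a shared key column (inner join assumed)."""
    combined = frames[0]
    for frame in frames[1:]:
        combined = combined.merge(frame, on=key, how="inner")
    return combined


def fit_and_summarize(data, feature_columns, target_column):
    """Fit a linear regression; return a single-row coefficient table and statistics."""
    X = data[feature_columns]
    y = data[target_column]
    model = LinearRegression().fit(X, y)
    predictions = model.predict(X)
    mse = mean_squared_error(y, predictions)
    stats = {"r2": r2_score(y, predictions), "mse": mse, "rmse": float(np.sqrt(mse))}
    # One column per feature plus the intercept, all in a single row.
    row = dict(zip(feature_columns, model.coef_))
    row["Intercept"] = model.intercept_
    return pd.DataFrame([row]), stats


# Toy data where Y = 2*X1 + 3*X2 + 1 exactly.
left = pd.DataFrame({"No": [1, 2, 3, 4], "X1": [1.0, 2.0, 3.0, 4.0]})
right = pd.DataFrame({"No": [1, 2, 3, 4], "X2": [0.0, 1.0, 0.0, 1.0],
                      "Y": [3.0, 8.0, 7.0, 12.0]})

combined = combine_on_key([left, right])
coefficients, stats = fit_and_summarize(combined, ["X1", "X2"], "Y")
print(coefficients.to_csv(index=False))  # CSV text: header row plus one data row
print(stats)
```

Passing a file path to to_csv, for example coefficients.to_csv("reg_coefficients.csv", index=False), would write the same content to disk.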

2. Reauthorize the Application

If the tokens expire, the script prompts you to reauthorize the application:

1.	Open the authorization URL printed in the terminal.

2.	Log in to your Box account and click "Grant Access".

3.	Copy the authorization code from the end of the redirected page's URL (the value after "code=") and paste it into the terminal.
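
If copying the code by hand is error-prone, the value can be pulled out of the redirected URL programmatically. The sketch below uses only the standard library. The helper extract_auth_code is hypothetical and not part of token_manager.py, and the URL in the example is a made-up placeholder.

```python
from urllib.parse import parse_qs, urlparse


def extract_auth_code(redirected_url):
    """Return the value of the 'code' query parameter from a redirected URL."""
    query = parse_qs(urlparse(redirected_url).query)
    codes = query.get("code")
    if not codes:
        raise ValueError("No 'code' parameter found in the redirected URL.")
    return codes[0]


# Placeholder URL for illustration; a real one comes from the browser's address bar.
example_url = "http://localhost/?state=xyz&code=abc123"
print(extract_auth_code(example_url))
```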

File Descriptions

  • main.py: Orchestrates the entire workflow, including authentication, file handling, dataset combination, regression, and result upload.
  • data_utils.py: Contains functions for combining datasets and handling file downloads/uploads with Box.
  • regression_utils.py: Handles regression analysis, calculates statistics, and saves coefficients to a CSV file.
  • token_manager.py: Manages Box API authentication, token refresh, and reauthorization.
  • .env: Configuration file for Box credentials, file IDs, output folder ID, and regression parameters.

Example Output

Terminal Output

Authenticated User: John Doe (ID: 12345678901)
Datasets combined successfully.
Regression Statistics:
R² Score: 0.5824
Mean Squared Error: 77.1317
Root Mean Squared Error: 8.7825
Regression coefficients saved to data/reg_coefficients.csv.
File uploaded successfully: reg_coefficients.csv

reg_coefficients.csv

| X1_transaction_date | X2_house_age | X3_distance_to_MRT | X4_number_of_convenience_stores | X5_latitude | X6_longitude | Intercept |
| --- | --- | --- | --- | --- | --- | --- |
| 5.146227462979936 | -0.269695448 | -0.004487461 | 1.133276905 | 225.472976 | -12.423601 | -14437.101 |

Troubleshooting

  • ValueError: One or more specified columns are missing in the combined dataset.

    • Ensure all specified columns in the .env file exist in the combined dataset.
    • Verify the input datasets contain the required columns and are properly combined.
  • BoxAPIException: item_name_in_use

    • This error occurs when attempting to upload a file that already exists in the Box folder.
    • The script is configured to overwrite files in Box. If the issue persists, verify that the upload_file_to_box function is correctly replacing files.
  • Authentication Error: Refresh token has expired

    • Reauthorize the application by following the URL provided in the terminal and entering the new authorization code.
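
To diagnose the missing-columns ValueError, it helps to see exactly which names are absent. The following is a minimal sketch of such a check, assuming the column names come from TARGET_COLUMN and FEATURE_COLUMNS in the .env file. The helpers find_missing_columns and require_columns are hypothetical and may differ from how the project validates its input.

```python
def find_missing_columns(available, required):
    """Return the required column names that are not among the available ones."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


def require_columns(available, required):
    """Raise a ValueError naming every required column that is missing."""
    missing = find_missing_columns(available, required)
    if missing:
        raise ValueError(
            "One or more specified columns are missing in the combined dataset: "
            + ", ".join(missing)
        )


# Example: the combined dataset lacks X2_house_age.
combined_columns = ["No", "X1_transaction_date", "Y_house_price_of_unit_area"]
wanted = ["X1_transaction_date", "X2_house_age", "Y_house_price_of_unit_area"]
print(find_missing_columns(combined_columns, wanted))
```

With a pandas DataFrame, the available names can be passed as list(df.columns).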

License

This project is licensed under the MIT License.
