This project focuses on handling supply chain data, inserting it into MongoDB, and performing predictive analysis using machine learning techniques in the Databricks environment.
- **Data Insertion (`insert_data_mongo.ipynb`)**
  - Reads supply chain data from a CSV file.
  - Inserts the data into MongoDB under the `supply_chain_data` collection.
- **Predictive Analysis (`predictive_analysis_supply_chain.ipynb`)**
  - Loads supply chain data from MongoDB.
  - Uses Apache Spark in Databricks for data preprocessing.
  - Trains a Random Forest Regressor to predict inventory pricing.
  - Stores predictions in MongoDB under the `prediction_data` collection.
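Before training, each document needs light cleaning. A minimal pure-Python sketch of that rule (the column names `Price` and `Stock Levels` come from the model description; MongoDB's auto-generated `_id` field is dropped):

```python
def clean_record(doc):
    """Drop MongoDB's _id and cast the numeric columns, skipping bad rows."""
    cleaned = {k: v for k, v in doc.items() if k != "_id"}
    try:
        cleaned["Price"] = float(cleaned["Price"])
        cleaned["Stock Levels"] = float(cleaned["Stock Levels"])
    except (KeyError, TypeError, ValueError):
        return None  # row is unusable for training
    return cleaned
```

In the notebook itself this kind of cleaning would be done with Spark DataFrame operations (`dropna`, `cast`); the helper above only illustrates the rule being applied.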
- MongoDB (for data storage)
- Databricks (Apache Spark environment, used for predictive analysis)
- Python Libraries:
  - `pymongo` (for MongoDB interaction)
  - `pyspark` (for machine learning and data processing in Databricks)
  - `pandas` (for handling CSV data)
- **Install Dependencies** (needed only when running locally; Databricks already includes `pyspark`):
  `pip install pymongo pyspark pandas`
- **Insert Data into MongoDB**
  - Run `insert_data_mongo.ipynb` in Databricks to load supply chain data into MongoDB.
- **Run Predictive Analysis**
  - Run `predictive_analysis_supply_chain.ipynb` in Databricks to train the model and store predictions.
- Uses Random Forest Regression for price prediction.
- Features:
Price
andStock Levels
. - Trained model predicts pricing trends in supply chain management.
- Predictions are stored in MongoDB for further business insights.
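The training step can be outlined with Spark ML as below. This is a hedged sketch, not the notebook's exact code: the column names, `numTrees`, and the feature/label split are assumptions, and the Spark imports are deferred into the function so the sketch can be read without a Spark installation.

```python
FEATURE_COLS = ["Stock Levels"]   # assumed feature column
LABEL_COL = "Price"               # assumed label column

def build_pipeline(num_trees=50):
    """Assemble features and attach a Random Forest regressor (Spark ML)."""
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import RandomForestRegressor

    # Spark ML estimators expect a single vector column of features.
    assembler = VectorAssembler(inputCols=FEATURE_COLS, outputCol="features")
    rf = RandomForestRegressor(featuresCol="features", labelCol=LABEL_COL,
                               numTrees=num_trees)
    return Pipeline(stages=[assembler, rf])

# Inside the notebook, roughly:
#   model = build_pipeline().fit(train_df)
#   predictions = model.transform(test_df)
# The resulting predictions are then written to the prediction_data collection.
```

Fitting requires an active `SparkSession` (provided automatically in Databricks notebooks).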
- Ensure Databricks Runtime includes Apache Spark.
- Upload and run notebooks in Databricks.
- Connect Databricks to MongoDB for data storage and retrieval.
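Connecting typically means attaching the MongoDB Spark Connector library to the cluster and pointing it at your deployment. A hypothetical read helper, assuming connector v10+ (`format("mongodb")`) and placeholder credentials:

```python
def read_collection(spark, uri, database="supply_chain",
                    collection="supply_chain_data"):
    """Read a MongoDB collection into a Spark DataFrame.

    `spark` is the SparkSession predefined in Databricks notebooks; `uri` is
    your deployment's connection string, e.g.
    "mongodb+srv://<user>:<password>@<cluster>/" (placeholders, not real values).
    """
    return (
        spark.read.format("mongodb")
        .option("connection.uri", uri)
        .option("database", database)
        .option("collection", collection)
        .load()
    )
```

Writing predictions back works the same way with `df.write.format("mongodb")` and the `prediction_data` collection.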