Date: 13-Dec-24
END TO END ML PROJECTS - KRISH NAIK YOUTUBE CHANNEL
https://www.youtube.com/watch?v=Rv6UFGNmNZg&list=PLZoTAELRMXVPS-dOaVbAux22vzqdgoGhG&index=2
I. Set up project with GitHub
- Data Ingestion
- Data Transformation
- Model Trainer
- Model Evaluation
- Model Deployment
II. CI/CD Pipelines - GitHub Actions
III. Deployment AWS
1. Set up the GitHub repository
a) new environment
Create new venv in VScode terminal
conda create -p venv python=3.12 -y
conda activate venv/
git init
git add README.md (first create README.md file in VScode)
git commit -m "first commit"
Add this for first time:
git config --global user.email "[email protected]"
git config --global user.name "Akshay Kumar"
git branch -M main
git remote add origin https://github.com/akshaykr7/mlproject.git
git remote -v (optional)
git push -u origin main
create a .gitignore file on the GitHub webpage – Add file > Create new file, then select the Python template
git pull (to merge changes done in GitHub webpage)
b) setup.py - build the ML application as a package that can be published to PyPI
Whenever setup.py runs, it searches for every folder containing an __init__.py file and treats that folder as a package.
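A minimal setup.py along the lines above might look like this (the package name, author, and the get_requirements helper are sketched from the tutorial's pattern, not verbatim):

```python
from setuptools import find_packages, setup

HYPHEN_E_DOT = "-e ."

def get_requirements(file_path: str) -> list[str]:
    """Read requirements.txt, dropping blanks and the '-e .' marker."""
    with open(file_path) as f:
        requirements = [line.strip() for line in f if line.strip()]
    if HYPHEN_E_DOT in requirements:
        requirements.remove(HYPHEN_E_DOT)
    return requirements

if __name__ == "__main__":
    setup(
        name="mlproject",
        version="0.0.1",
        author="<your name>",
        packages=find_packages(),  # any folder with __init__.py becomes a package
        install_requires=get_requirements("requirements.txt"),
    )
```

find_packages() is what implements the "search for folders with __init__.py" behaviour described above.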
c) requirements.txt
-e . -> maps to setup.py: when requirements are installed, it installs the project itself in editable mode
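A requirements.txt for this project might look like the following (the package list is an assumption; the last line is the one discussed above):

```
pandas
numpy
scikit-learn
Flask
-e .
```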
d) src folder and build the packages
pip install -r requirements.txt
- Create components folder in src folder, to hold the modules like data ingestion and data transformation.
- Create pipeline folder – training and prediction pipeline.
- Create logger.py, exception.py, utils.py
exception.py -> create custom error class
logger.py -> logs every execution into a text file
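The two helpers can be sketched together as follows (log file name, format string, and the exact message wording are assumptions in the spirit of the tutorial):

```python
import logging
import sys

# logger.py sketch: write every run's logs into a text file
logging.basicConfig(
    filename="app.log",
    format="[%(asctime)s] %(lineno)d %(name)s - %(levelname)s - %(message)s",
    level=logging.INFO,
)

# exception.py sketch: custom error class that records the script and line number
class CustomException(Exception):
    def __init__(self, error, error_detail=sys):
        super().__init__(str(error))
        _, _, exc_tb = error_detail.exc_info()  # traceback of the active exception
        self.error_message = (
            f"Error occurred in script [{exc_tb.tb_frame.f_code.co_filename}] "
            f"line [{exc_tb.tb_lineno}] message [{error}]"
        )

    def __str__(self):
        return self.error_message
```

Raise CustomException(e, sys) inside an except block so the traceback is still active when exc_info() is read.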
ML Basics - This part will be reused in the modular code (i.e. data_ingestion.py, etc.) for deployment.
- Project - Student data link
- EDA - Exploratory Data Analysis
- Model training steps
Load data, train test split, save data
data preprocessing - OneHotEncoder, StandardScaler
save preprocessor.pkl file
Model training and evaluation
Run data_ingestion.py file to create .pkl file in artifacts folder
save model.pkl file -> it stores the best model found by src/components/model_trainer.py
GridSearchCV
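The training steps above can be sketched end to end in one script (the toy data and column names are stand-ins for the student dataset; in the project this is split across data_ingestion.py, data_transformation.py, and model_trainer.py):

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# load data: toy stand-in for the student dataset (columns are assumptions)
df = pd.DataFrame({
    "reading_score": [70, 80, 90, 60, 75, 85, 65, 95],
    "gender": ["male", "female", "female", "male"] * 2,
    "math_score": [68, 78, 88, 58, 72, 82, 62, 92],
})
X, y = df.drop(columns=["math_score"]), df["math_score"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# data preprocessing: scale numeric columns, one-hot encode categoricals
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["reading_score"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender"]),
])

# model training with a small GridSearchCV hyperparameter search
model = Pipeline([
    ("pre", preprocessor),
    ("rf", RandomForestRegressor(random_state=42)),
])
params = {"rf__n_estimators": [10, 50], "rf__max_depth": [2, None]}
search = GridSearchCV(model, params, cv=2)
search.fit(X_train, y_train)

# save preprocessor.pkl and model.pkl (the best model) as in the tutorial
with open("preprocessor.pkl", "wb") as f:
    pickle.dump(search.best_estimator_.named_steps["pre"], f)
with open("model.pkl", "wb") as f:
    pickle.dump(search.best_estimator_, f)
```

In the real project the two .pkl files land in the artifacts folder and are loaded back by the prediction pipeline.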
- create app.py
- create templates folder > index.html file inside it (for UI part)
- create home.html file under templates -> remember: the form action in home.html must point to the same predict_datapoint route defined in app.py.
- src/pipeline/predict_pipeline.py
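A minimal app.py sketch tying these pieces together (route names follow the notes above; the call into predict_pipeline.py is stubbed out, and the port is an assumption):

```python
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/")
def index():
    # landing page: templates/index.html
    return render_template("index.html")

# this route must match the form action in templates/home.html
@app.route("/predict_datapoint", methods=["GET", "POST"])
def predict_datapoint():
    if request.method == "GET":
        return render_template("home.html")
    # POST: read the form fields, build the feature DataFrame via
    # src/pipeline/predict_pipeline.py, and run the saved model.pkl
    # (stubbed out in this sketch)
    results = None
    return render_template("home.html", results=results)

# uncomment to serve locally:
# app.run(host="0.0.0.0", port=5000)
```

If the form action and the route name disagree, the browser gets a 404 on submit, which is the most common bug at this step.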