This project predicts whether a breast tumor is malignant (cancerous) or benign (non-cancerous) using machine learning classification models. The dataset comes from the UCI Machine Learning Repository; a simplified version is included with the project.
Benign tumors:
- Non-cancerous growths that do not invade nearby tissues.
- Typically have well-defined borders.
- Are often removed to relieve symptoms or to rule out cancer.
Malignant tumors:
- Cancerous growths with the potential to invade surrounding tissues and spread to other parts of the body.
- May grow uncontrollably, posing a threat to health.
- Early detection and treatment are crucial for improved outcomes.
The project employs seven classification models to predict tumor type:
- Support Vector Machine (SVM)
- Random Forest
- Naive Bayes
- Logistic Regression
- Kernel SVM
- K-Nearest Neighbors (KNN)
- Decision Tree
The accuracy scores for each classification model before hyperparameter tuning are as follows:
- SVM: 94.152%
- Random Forest: 92.982%
- Naive Bayes: 96.491%
- Logistic Regression: 97.076%
- Kernel SVM: 96.491%
- KNN: 98.245%
- Decision Tree: 95.906%
All seven models exceed 92% accuracy. K-Nearest Neighbors (KNN) achieved the highest accuracy at 98.245%, followed by Logistic Regression at 97.076%, while Naive Bayes and Kernel SVM tied at 96.491%. These results indicate that machine learning can distinguish malignant from benign tumors well on this dataset, although the scores come from models that have not yet been tuned.
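KNN classifies a sample by looking at the labels of the k most similar training samples. The following is a minimal pure-Python sketch of that rule, assuming numeric feature vectors and Euclidean distance; the function name `knn_predict`, the default `k=5`, and the toy data are illustrative assumptions, not the project's code or its settings.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point, paired with that point's label
    distances = [(math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)]
    # Keep the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote; with two classes, an odd k (no larger than the training set) avoids ties
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data with two features per sample (not the project's dataset)
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

print(knn_predict(train_X, train_y, (1.1, 0.9), k=3))  # benign
print(knn_predict(train_X, train_y, (4.8, 5.0), k=3))  # malignant
```

Because KNN compares raw distances, features measured on larger scales dominate the vote, so features are normally standardized before fitting; the README does not say whether this project did so.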
Accuracy alone does not give a complete picture. Precision, recall, and F1 score should also be examined, because the two kinds of error are not equally costly here: classifying a malignant tumor as benign (a false negative) can delay treatment, whereas the reverse mainly leads to further testing. The best model therefore depends on which metric matters most for the application, as well as on the available computational resources.
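These metrics can be computed from the four cells of the confusion matrix, treating malignant as the positive class. The sketch below is illustrative: the function name `binary_metrics` and the label strings are assumptions, and the example labels are made up rather than taken from the project's results.

```python
def binary_metrics(y_true, y_pred, positive="malignant"):
    """Accuracy, precision, recall and F1, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # malignant, caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # benign, flagged as malignant
    fn = sum(t == positive and p != positive for t, p in pairs)  # malignant, missed
    tn = len(pairs) - tp - fp - fn                               # benign, correctly cleared
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 4 malignant and 6 benign tumors
y_true = ["malignant"] * 4 + ["benign"] * 6
y_pred = ["malignant"] * 3 + ["benign"] + ["malignant"] + ["benign"] * 5

print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

In this made-up example accuracy is 80%, yet a recall of 0.75 means one in four malignant tumors was missed, which is exactly the kind of error that accuracy alone hides.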
In summary, this project demonstrates the potential of machine learning for breast cancer prediction as an aid to early diagnosis and intervention. Hyperparameter tuning and further evaluation could improve the models' performance and robustness.
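For KNN, hyperparameter tuning usually means choosing k by cross-validation: each candidate k is scored on held-out folds and the best-scoring value is kept. The following is a minimal pure-Python sketch under stated assumptions. The helper names (`knn_predict`, `cv_accuracy`, `select_k`), the candidate values, and the toy data are hypothetical. Folds are assigned by index (`i % n_folds`) rather than stratified, and the compact k-NN rule is restated so the block runs on its own.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    # Majority vote among the k nearest training points (Euclidean distance)
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(x, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(X, y, k, n_folds=5):
    """Mean k-NN accuracy over n_folds folds; sample i goes to fold i % n_folds.

    Assumes len(X) >= n_folds so that no test fold is empty.
    """
    scores = []
    for fold in range(n_folds):
        train = [(xi, yi) for i, (xi, yi) in enumerate(zip(X, y)) if i % n_folds != fold]
        test = [(xi, yi) for i, (xi, yi) in enumerate(zip(X, y)) if i % n_folds == fold]
        train_X = [xi for xi, _ in train]
        train_y = [yi for _, yi in train]
        correct = sum(knn_predict(train_X, train_y, xi, k) == yi for xi, yi in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def select_k(X, y, candidates=(1, 3, 5, 7), n_folds=5):
    """Return the candidate k with the highest cross-validated accuracy (first one on ties)."""
    return max(candidates, key=lambda k: cv_accuracy(X, y, k, n_folds))


# Hypothetical toy data: two well-separated clusters, classes interleaved
X = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8),
     (0.9, 1.1), (4.9, 5.1), (1.1, 1.2), (5.1, 5.2)]
y = ["benign", "malignant"] * 4

for k in (1, 3, 5):
    print(k, cv_accuracy(X, y, k, n_folds=4))
print("best k:", select_k(X, y, candidates=(1, 3, 5), n_folds=4))
```

On this toy data each held-out point has only two same-class neighbors left in the training folds, so k = 5 is always outvoted and scores 0.0, while k = 1 and k = 3 both score 1.0 and `max` keeps the first, k = 1. On a real dataset the samples should be shuffled before fold assignment; scikit-learn offers shuffled, stratified equivalents (`GridSearchCV` combined with `StratifiedKFold`).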