Automated Classification and Cell Counting of Lung Cancer from Histopathological Images Using Machine Learning (LPAI)
LPAI integrates a convolutional neural network model in PyTorch for automatic classification and quantification of lung cancer cells in histopathological images. The project implements preprocessing to improve image analysis, provides model training and performance evaluation, and visualizes predictions alongside cell counts.
| Timesheet | Slack channel | Project report |
|---|---|---|
Running src/train.py in a virtual environment produces outputs such as the following:
The classification report below corresponds to the confusion matrix above and summarizes the model's performance on the same subset of 5000 images. It was generated with scikit-learn's classification_report function and printed in the virtual environment terminal. It lists the precision, recall, F1-score, and support for each class based on the model's predictions:
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| lung_aca | 0.98 | 0.93 | 0.95 | 336 |
| lung_n | 0.99 | 1.00 | 1.00 | 344 |
| lung_scc | 0.94 | 0.97 | 0.96 | 320 |
| accuracy | | | 0.97 | 1000 |
| macro avg | 0.97 | 0.97 | 0.97 | 1000 |
| weighted avg | 0.97 | 0.97 | 0.97 | 1000 |
Overall Accuracy: 0.97
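The summary rows of the report can be cross-checked against the per-class rows. The sketch below uses only the rounded values printed in the table above, so its results are approximate. It recomputes the macro and weighted averages and relies on the fact that overall accuracy equals the support-weighted average of per-class recall. The names `rows` and `averages` are illustrative and are not part of the project code.

```python
# Per-class rows copied from the classification report above
# (values are rounded to two decimals, so results are approximate).
rows = {
    "lung_aca": {"precision": 0.98, "recall": 0.93, "f1": 0.95, "support": 336},
    "lung_n":   {"precision": 0.99, "recall": 1.00, "f1": 1.00, "support": 344},
    "lung_scc": {"precision": 0.94, "recall": 0.97, "f1": 0.96, "support": 320},
}


def averages(rows):
    """Return (macro, weighted, total_support) for precision, recall and F1."""
    total = sum(r["support"] for r in rows.values())
    macro, weighted = {}, {}
    for metric in ("precision", "recall", "f1"):
        # Macro average: every class counts equally.
        macro[metric] = sum(r[metric] for r in rows.values()) / len(rows)
        # Weighted average: each class counts in proportion to its support.
        weighted[metric] = (
            sum(r[metric] * r["support"] for r in rows.values()) / total
        )
    return macro, weighted, total


macro, weighted, total = averages(rows)
# Accuracy is the fraction of correct predictions, which equals the
# support-weighted average of per-class recall.
accuracy = weighted["recall"]

print(f"support total: {total}")
for name, avg in (("macro avg", macro), ("weighted avg", weighted)):
    print(name, {k: round(v, 2) for k, v in avg.items()})
print(f"accuracy: {accuracy:.2f}")
```

Rounded to two decimals, each recomputed average agrees with the 0.97 shown in the report.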
The visualization below showcases the model's predictions compared to the actual classifications for four randomly selected histopathological images from the loaded data. Each image is accompanied by a label indicating the true class and the class predicted by the model, along with the estimated cell count detected within the image.
Training data from Kaggle can be found here.
After downloading the datasets, unzip, then move lung_image_sets into your local repository as below.
2024_1_project_06/
│
├── data/                     # upload training data downloaded from Kaggle
│   └── lung_image_sets/
│       ├── lung_aca/
│       ├── lung_n/
│       └── lung_scc/
│
├── src/
│   ├── __init__.py           # makes src a Python package
│   ├── train.py              # training script
│   └── test/
│       └── test.py
│
├── LICENSE
├── README.md
└── requirements.txt
git clone https://github.com/sfu-cmpt340/2024_1_project_06.git
cd 2024_1_project_06
Create a virtual environment and activate it:
For Windows:
python -m venv env
env\Scripts\activate
For Unix systems (including Linux and macOS):
python -m venv env
source env/bin/activate
Install PyTorch by selecting the appropriate installation command from the PyTorch official Get Started page. The command you choose should correspond to your operating system, package manager (pip), Python version, and whether you need CUDA support.
For example, if you're installing PyTorch with CUDA 11.8 support:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
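Once the install command finishes, a short check like the one below can confirm that PyTorch is importable and whether it can see a CUDA device. This is a sketch and not part of the project; the function name `describe_torch` is hypothetical.

```python
import importlib.util


def describe_torch():
    """Report whether PyTorch is installed and whether CUDA is usable."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "cuda": False}
    import torch  # imported lazily so the check also runs without PyTorch

    return {
        "installed": True,
        "version": torch.__version__,
        "cuda": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch())
```

If `cuda` is reported as False on a machine with an NVIDIA GPU, the installed wheel may be a CPU-only build, or the driver may not match the chosen CUDA version.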
After PyTorch is installed, install the remaining project dependencies:
pip install -r requirements.txt
To start training, run the main script, train.py:
python src/train.py
To reproduce the results of the LPAI project after cloning and setting up your development environment:
- Set up your dataset and follow the installation process.
- Navigate to the root of your project repository (the directory that contains src).
- To start training the model, run the train.py script:
python src/train.py
The script train.py is configured with predefined training parameters. To adjust settings such as total_subset_size or the number of epochs, edit them directly in the script.
For example, to change the total_subset_size, locate and modify the following line in train.py:
train_loader, test_loader, class_names = load_data(total_subset_size=400)  # Adjust this number as needed
After making the necessary changes, save the script and execute it as shown above. The training process will begin using the new subset size you specified.
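The source does not show how load_data uses total_subset_size. As a rough illustration only, the sketch below assumes the value caps how many images are sampled before a train/test split. The function `split_subset`, the 80/20 split, the fixed seed, and the pool size of 15000 images are all assumptions and do not describe the project's implementation.

```python
import random


def split_subset(n_images, total_subset_size, test_fraction=0.2, seed=0):
    """Sample a subset of image indices and split it into train/test parts.

    Hypothetical helper: it only illustrates how a subset size could
    translate into train and test set sizes.
    """
    if not 0 < total_subset_size <= n_images:
        raise ValueError("total_subset_size must be between 1 and n_images")
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    subset = rng.sample(range(n_images), total_subset_size)
    n_test = int(round(total_subset_size * test_fraction))
    # The first n_test sampled indices form the test set, the rest train.
    return subset[n_test:], subset[:n_test]


# Assumed pool of 15000 images; 400 matches the example line above.
train_idx, test_idx = split_subset(n_images=15000, total_subset_size=400)
print(len(train_idx), len(test_idx))
```

Under these assumptions, a larger subset lengthens each epoch roughly in proportion, so a small value such as 400 may be convenient for a quick smoke test before a full run.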



