Unified Mentor Internship Projects
Project 1: Animal Image Classification
Project 2: Forest Cover Type Prediction
This project involves building a deep learning model to classify images of animals into one of 15 distinct categories. The dataset consists of images for each class, and the model is designed to identify the correct animal based on the input image. Techniques such as Convolutional Neural Networks (CNN) and Transfer Learning are utilized to build an efficient and accurate classifier.
- Project Overview
- Dataset
- Project Structure
- Model Architecture
- Training Process
- Evaluation and Results
- How to Run
- Future Improvements
The dataset consists of 15 folders, each containing images of a particular animal species. All images are 224x224 pixels with 3 color channels (RGB), making them suitable for image classification tasks.
- Bear
- Bird
- Cat
- Cow
- Deer
- Dog
- Dolphin
- Elephant
- Giraffe
- Horse
- Kangaroo
- Lion
- Panda
- Tiger
- Zebra
- Image Dimensions: 224x224 pixels
- Color Channels: 3 (RGB)
- Total Classes: 15
├── data/
│   ├── train/
│   │   ├── Bear/
│   │   ├── Bird/
│   │   └── ... (13 other classes)
│   └── test/
│       ├── Bear/
│       ├── Bird/
│       └── ... (13 other classes)
├── notebooks/
│   └── Image_Classification.ipynb
├── models/
│   └── model.h5 # Trained model saved here
├── README.md
└── requirements.txt
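Before training, it can help to confirm that both splits contain the same class folders and to see how balanced the classes are. Below is a minimal standard-library sketch. It assumes the data/train and data/test layout above and image files ending in .jpg, .jpeg or .png. The extension set and the helper names count_images and mismatched_classes are hypothetical.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed image file extensions


def count_images(split_dir):
    """Return {class_name: image_count} for each class folder in one split."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            # Count only files whose extension looks like an image
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_EXTS
            )
    return counts


def mismatched_classes(train_dir, test_dir):
    """Return class folders that appear in only one of the two splits."""
    return sorted(set(count_images(train_dir)) ^ set(count_images(test_dir)))
```

For the dataset described above, count_images("data/train") should report 15 classes. mismatched_classes("data/train", "data/test") should return an empty list.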
The project uses Convolutional Neural Networks (CNN) to extract features from the images and classify them into the correct animal category. Additionally, Transfer Learning is employed using pre-trained models like VGG16 or ResNet50 to improve performance.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
# Baseline CNN: three convolution + pooling blocks followed by a dense classifier
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten()) # Convert the 3D feature maps into a 1D vector for the dense layers
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5)) # Dropout rate of 0.5 is an assumed value
model.add(Dense(15, activation='softmax')) # 15 classes
# categorical_crossentropy expects one-hot labels (class_mode='categorical')
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
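Every Conv2D layer above uses Keras's default 'valid' padding with stride 1. Every MaxPooling2D layer halves the spatial size and rounds down. That means the feature-map size can be traced by hand. The sketch below uses a hypothetical helper, trace_shapes, to show why the step from the final feature map into Dense(512) holds most of the model's parameters.

```python
def trace_shapes(size=224, filters=(32, 64, 128), kernel=3, pool=2):
    """Trace spatial size through Conv2D('valid', stride 1) + MaxPooling2D blocks."""
    for _ in filters:
        size = size - kernel + 1  # 'valid' convolution shrinks by kernel - 1
        size = size // pool       # 2x2 pooling with stride 2 rounds down
    return size, filters[-1]


side, channels = trace_shapes()
flat = side * side * channels      # length of the flattened feature vector
dense_params = flat * 512 + 512    # weights + biases of Dense(512)
print(side, flat, dense_params)    # 26 86528 44302848
```

More than 44 million weights in a single layer is a plausible contributor to overfitting on a 15-class dataset. The pooled transfer-learning head below avoids this.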
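from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Sequential
# Load VGG16 pre-trained on ImageNet, without its original classification head
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False # Freeze the base model
# Stack a new classification head on top of the frozen convolutional base
model = Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dense(512, activation='relu'),
    Dense(15, activation='softmax') # 15 classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])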
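With 224x224 inputs and include_top=False, VGG16 outputs a 7x7x512 feature map per image. GlobalAveragePooling2D averages each of the 512 channels over that 7x7 grid, which yields one 512-length vector per image. The NumPy sketch below illustrates the operation; random numbers stand in for real VGG16 features.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a batch of 4 VGG16 feature maps: (batch, height, width, channels)
features = rng.random((4, 7, 7, 512))

# Global average pooling: mean over the two spatial axes
pooled = features.mean(axis=(1, 2))
print(pooled.shape)  # (4, 512)

# Parameters of a Dense(512) head after pooling vs. after flattening 7x7x512
print(512 * 512 + 512)          # 262656
print(7 * 7 * 512 * 512 + 512)  # 12845568
```

Pooling discards the spatial layout but shrinks the new head by roughly a factor of 50. This is a common choice when the frozen base already encodes most of the useful features.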
The model is trained with data augmentation to make it more robust and to reduce overfitting. 20% of the training images are held out for validation.
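from tensorflow.keras.preprocessing.image import ImageDataGenerator
img_height, img_width = 224, 224
batch_size = 32 # Assumed value; not specified in the original
# Augmentation settings below are illustrative assumptions
train_datagen = ImageDataGenerator(
    rescale=1./255, rotation_range=20, zoom_range=0.2, horizontal_flip=True,
    validation_split=0.2 # 20% of data for validation
)
train_generator = train_datagen.flow_from_directory(
    'data/train', target_size=(img_height, img_width),
    batch_size=batch_size, class_mode='categorical', subset='training'
)
validation_generator = train_datagen.flow_from_directory(
    'data/train', target_size=(img_height, img_width),
    batch_size=batch_size, class_mode='categorical', subset='validation'
)
epochs = 10
history = model.fit(
    train_generator,
    epochs=epochs,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size
)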
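ImageDataGenerator applies random transforms on the fly, so each epoch sees slightly different versions of the same images. The minimal NumPy sketch below illustrates two common transforms: rescaling to [0, 1] and a random horizontal flip. It is an illustration of the idea, not Keras's internal implementation. The function name augment and the 0.5 flip probability are assumptions.

```python
import numpy as np


def augment(image, rng, flip_prob=0.5):
    """Rescale an HxWxC uint8 image to [0, 1] and randomly flip it left-right."""
    out = image.astype(np.float32) / 255.0  # same effect as rescale=1./255
    if rng.random() < flip_prob:
        out = out[:, ::-1, :]               # mirror along the width axis
    return out


rng = np.random.default_rng(42)
img = rng.integers(0, 256, size=(224, 224, 3)).astype(np.uint8)
aug = augment(img, rng)
print(aug.shape)  # (224, 224, 3)
```

A horizontal flip keeps the label valid for all 15 animal classes. Vertical flips would produce unrealistic upside-down animals, so they are usually left out.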
After training, the model was evaluated on the validation set:
loss, accuracy = model.evaluate(validation_generator)
print(f'Validation accuracy: {accuracy * 100:.2f}%')
The current model achieved a validation accuracy of 54.31%. Techniques such as Transfer Learning and Hyperparameter Tuning may improve this result.
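A single accuracy figure hides which animals are being confused with each other. A per-class breakdown shows where the errors concentrate. The NumPy sketch below builds a confusion matrix and computes per-class recall from integer labels; the helper name per_class_recall is hypothetical. In this project, y_true would come from validation_generator.classes and y_pred from model.predict(validation_generator).argmax(axis=1). The two only line up if the generator is created with shuffle=False, which is an assumption: flow_from_directory shuffles by default.

```python
import numpy as np


def per_class_recall(y_true, y_pred, num_classes):
    """Build a confusion matrix (rows = true, cols = predicted) and per-class recall."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    support = cm.sum(axis=1)            # number of samples per true class
    recall = np.divide(cm.diagonal(), support,
                       out=np.zeros(num_classes), where=support > 0)
    return cm, recall


# Toy labels for 3 classes (illustrative only, not project results)
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm, recall = per_class_recall(y_true, y_pred, 3)
print(recall)
```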
- Python 3.x
- TensorFlow 2.x
- Keras
Clone the repository:
git clone https://github.com/yourusername/animal-image-classification.git
Install the required dependencies:
pip install -r requirements.txt
Organize the dataset into train and test folders, with one subfolder per class, following the project structure above.
Train the model by running the Jupyter notebook:
cd notebooks
jupyter notebook Image_Classification.ipynb
- Transfer Learning: Implementing pre-trained models such as ResNet50, InceptionV3, or EfficientNet to boost model performance.
- Fine-Tuning: Unfreezing some layers of the pre-trained models for fine-tuning on the specific dataset.
- Hyperparameter Tuning: Experimenting with batch size, learning rate, and number of layers for better accuracy.
- Data Augmentation: Further experimenting with data augmentation techniques like brightness and contrast shifts.
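Brightness and contrast shifts are simple per-image transforms. Keras's ImageDataGenerator already offers a brightness_range argument. The minimal NumPy sketch below shows both transforms explicitly, assuming images already rescaled to [0, 1]. The helper name brightness_contrast and the ranges (±0.2 brightness, 0.8 to 1.2 contrast) are illustrative assumptions.

```python
import numpy as np


def brightness_contrast(image, rng, max_delta=0.2, contrast_range=(0.8, 1.2)):
    """Randomly shift brightness and scale contrast of a [0, 1] float image."""
    delta = rng.uniform(-max_delta, max_delta)    # additive brightness shift
    factor = rng.uniform(*contrast_range)         # multiplicative contrast factor
    mean = image.mean()
    out = (image - mean) * factor + mean + delta  # stretch around the mean, then shift
    return np.clip(out, 0.0, 1.0)                 # keep pixels in the valid range
```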
Feel free to fork this repository and contribute by creating pull requests!
The objective of this project is to build a machine learning model that predicts the type of forest cover in a given area from environmental features. The dataset comes from the Roosevelt National Forest in northern Colorado and is based on US Forest Service (USFS) data.
We aim to classify the type of forest cover in a 30m x 30m patch of land into one of the seven forest cover types:
- Spruce/Fir
- Lodgepole Pine
- Ponderosa Pine
- Cottonwood/Willow
- Aspen
- Douglas-fir
- Krummholz
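In the UCI Covertype data, Cover_Type is stored as an integer from 1 to 7 in the order listed above. The lookup table below turns model predictions back into readable names. It assumes forest_cover.csv uses the same encoding; the names COVER_TYPES and decode are hypothetical.

```python
COVER_TYPES = {
    1: "Spruce/Fir",
    2: "Lodgepole Pine",
    3: "Ponderosa Pine",
    4: "Cottonwood/Willow",
    5: "Aspen",
    6: "Douglas-fir",
    7: "Krummholz",
}


def decode(predictions):
    """Map integer Cover_Type codes to forest cover names."""
    return [COVER_TYPES[int(code)] for code in predictions]


print(decode([2, 1, 7]))  # ['Lodgepole Pine', 'Spruce/Fir', 'Krummholz']
```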
The model will use 12 primary features, including elevation, slope, soil type, and other environmental variables, to make accurate predictions.
The dataset consists of both categorical and numerical features. The key features include:
- Elevation: Elevation in meters.
- Aspect: Aspect in degrees azimuth.
- Slope: Slope in degrees.
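- Horizontal and Vertical Distances: Horizontal distances to the nearest surface water (hydrology), roadway, and wildfire ignition point, plus the vertical distance to the nearest surface water.
- Hillshade: Hillshade index (0 to 255) at 9 a.m., noon, and 3 p.m. on the summer solstice.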
- Wilderness Area: Binary columns indicating different wilderness areas.
- Soil Type: Binary columns for different soil types.
- Cover_Type: The target variable representing the forest cover type.
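In the UCI Covertype release, each row has exactly one 1 among the 4 Wilderness Area columns and exactly one among the 40 Soil Type columns. Together with the 10 numerical measures, this expands the 12 features to 54 input columns. If forest_cover.csv follows that layout, a single categorical index can be recovered with argmax, which is convenient for EDA plots. The helper name collapse_one_hot is hypothetical.

```python
import numpy as np


def collapse_one_hot(binary_block):
    """Turn an (n_rows, n_categories) 0/1 block into 1-based category numbers."""
    block = np.asarray(binary_block)
    if not np.all(block.sum(axis=1) == 1):
        raise ValueError("each row must contain exactly one 1")
    return block.argmax(axis=1) + 1  # +1 so the first column maps to category 1


# Toy example: 3 rows, 4 wilderness-area columns
wilderness = [[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0]]
print(collapse_one_hot(wilderness))  # [1 3 2]
```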
- Data Preprocessing: The dataset is preprocessed, including handling missing values (if any), feature scaling, and encoding categorical variables.
- Exploratory Data Analysis (EDA): Initial exploration of the dataset to understand feature distributions and relationships.
- Model Building: Various machine learning models are tested, including:
  - Random Forest Classifier
  - Support Vector Machine (SVM)
  - XGBoost
- Model Evaluation: Models are evaluated using metrics like accuracy, precision, recall, and a confusion matrix.
- Model Saving: The trained model is saved with joblib for future use or deployment.
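Precision and recall follow directly from the confusion matrix. This assumes rows are true classes and columns are predictions, which is the layout scikit-learn's confusion_matrix uses. Precision for class k is the diagonal entry divided by its column sum; recall is the diagonal entry divided by its row sum; F1 is their harmonic mean. The NumPy sketch below works through a toy 3-class matrix. The helper name precision_recall_f1 is hypothetical, and it assumes every class is predicted at least once.

```python
import numpy as np


def precision_recall_f1(cm):
    """Per-class precision, recall and F1 from a confusion matrix (rows = true)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)  # column sums = all predictions of each class
    recall = tp / cm.sum(axis=1)     # row sums = all true members of each class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy confusion matrix (illustrative only, not project results)
cm = [[8, 2, 0],
      [1, 6, 3],
      [0, 0, 10]]
p, r, f1 = precision_recall_f1(cm)
```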
To run this project, you'll need to install the following libraries:
pip install numpy pandas scikit-learn matplotlib seaborn joblib
- Clone this repository:
git clone https://github.com/your_username/forest-cover-type-prediction.git
- Upload the dataset to the working directory (the code below loads it from forest_cover.csv).
- Open the Jupyter notebook or Google Colab file and follow the steps provided in the notebook.
├── forest_cover_type_model.pkl # Trained Random Forest model
├── forest_cover.csv # Dataset used for training and testing
├── README.md # Project documentation
├── requirements.txt # Required libraries
└── notebook.ipynb # Jupyter/Colab notebook for the project
First, load the dataset into a pandas DataFrame:
import pandas as pd
df = pd.read_csv('forest_cover.csv')
Split the data into training and test sets, then scale the numerical features:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Separate features and target
X = df.drop('Cover_Type', axis=1)
y = df['Cover_Type']
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Scale numerical features; assumes the first 10 columns are the numerical ones
num_cols = X.columns[:10]
scaler = StandardScaler()
X_train_scaled = X_train.copy() # Copies keep the binary Wilderness/Soil columns unchanged
X_test_scaled = X_test.copy()
X_train_scaled[num_cols] = scaler.fit_transform(X_train[num_cols]) # Fit on training data only
X_test_scaled[num_cols] = scaler.transform(X_test[num_cols])
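StandardScaler standardizes each column as z = (x - μ) / σ. The mean μ and the population standard deviation σ are computed on the training rows only and then reused for the test rows, which keeps test-set statistics out of training. The same arithmetic in plain NumPy, on toy values:

```python
import numpy as np

# Toy numerical features (e.g. elevation, slope); values are illustrative only
X_train_num = np.array([[2000.0, 10.0],
                        [3000.0, 20.0],
                        [4000.0, 30.0]])
X_test_num = np.array([[3500.0, 25.0]])

mu = X_train_num.mean(axis=0)         # per-column mean from training data only
sigma = X_train_num.std(axis=0)       # population std (ddof=0), as StandardScaler uses
X_train_z = (X_train_num - mu) / sigma
X_test_z = (X_test_num - mu) / sigma  # test rows reuse the training statistics
```

Random Forest splits on per-feature thresholds, so its predictions are essentially unaffected by this scaling. Standardization matters more for the SVM that the workflow also tests.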
Train the Random Forest Classifier:
from sklearn.ensemble import RandomForestClassifier
# Initialize and train the model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train_scaled, y_train)
Evaluate the model on test data:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Predictions
y_pred = rf_model.predict(X_test_scaled)
# Accuracy and Classification Report
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')
print(f'Classification Report:\n{classification_report(y_test, y_pred)}')
# Confusion Matrix
print(f'Confusion Matrix:\n{confusion_matrix(y_test, y_pred)}')
Save the trained model for future use:
import joblib
joblib.dump(rf_model, 'forest_cover_type_model.pkl')
# Load the model for future predictions
loaded_model = joblib.load('forest_cover_type_model.pkl')
The Random Forest model achieved an accuracy of XX.XX% on the test set.
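Confusion Matrix: a 7 x 7 array with one row per true cover type and one column per predicted cover type; correct predictions lie on the diagonal.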
| Class | Precision | Recall | F1-score |
|---|---|---|---|
| Spruce/Fir | x.xx | x.xx | x.xx |
| Lodgepole Pine | x.xx | x.xx | x.xx |
| ... | ... | ... | ... |
- Saurabh Yadav - GitHub Profile