Asma123-code/Assignment10

Assignment10:

This assignment uses publicly available datasets from the awesome-public-datasets GitHub repository (https://github.com/awesomedata/awesome-public-datasets/tree/master/Datasets) to perform data exploration, preprocessing, machine learning modeling, and result visualization, using Python only.

Datasets:

Dataset 1: Titanic Dataset Analysis and Machine Learning

This repository contains code for analyzing the Titanic dataset and implementing machine learning models to predict passenger survival. The Titanic dataset is widely used for data analysis and machine learning tasks; it contains information about passengers on the Titanic, including their age, sex, class, fare, and survival status.

Dataset Selection Rationale: The Titanic dataset was chosen for its popularity and the rich information it provides about the passengers. It is an excellent dataset for practicing data preprocessing, exploratory data analysis (EDA), and machine learning algorithms.
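As a quick illustration of the kind of exploration described above, the sketch below applies two typical first EDA steps with pandas. It uses a tiny in-memory sample that mimics the Kaggle column layout (Survived, Pclass, Sex, Age, Fare); the real analysis would load the full CSV instead.

```python
import pandas as pd

# Tiny in-memory sample mimicking the Kaggle Titanic columns;
# the real analysis loads the full train split from CSV.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass":   [3, 1, 3, 3, 2],
    "Sex":      ["male", "female", "female", "male", "female"],
    "Age":      [22.0, 38.0, 26.0, None, 27.0],
    "Fare":     [7.25, 71.28, 7.92, 8.05, 21.07],
})

# Missing values per column -- Age typically needs imputation.
print(df.isna().sum())

# Survival rate by sex, the classic first cut on this dataset.
rate = df.groupby("Sex")["Survived"].mean()
print(rate)
```

On the full dataset these two summaries already surface the dominant pattern (much higher survival among female passengers) and the columns that need cleaning before modeling.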

Dataset 2: College Scorecard

  - Scorecard.csv: a single CSV file with all years' data combined. Categorical variables represented by integer keys in the original data have been converted to their labels, and a Year column has been added.
  - database.sqlite: a SQLite database containing a single Scorecard table with the same information as Scorecard.csv.
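Since the SQLite file holds a single Scorecard table, it can be queried directly into a DataFrame. The sketch below builds a tiny stand-in database in memory so it is self-contained; the column names (INSTNM, SAT_AVG) are illustrative assumptions about the real schema, and against the real file you would connect to database.sqlite instead.

```python
import sqlite3
import pandas as pd

# Stand-in for database.sqlite: a single Scorecard table with a Year
# column, as described above (illustrative columns, not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scorecard (INSTNM TEXT, Year INTEGER, SAT_AVG REAL)")
conn.executemany(
    "INSERT INTO Scorecard VALUES (?, ?, ?)",
    [("College A", 2012, 1100.0),
     ("College A", 2013, 1120.0),
     ("College B", 2013, 980.0)],
)

# Same pattern against the real file:
#   conn = sqlite3.connect("database.sqlite")
df = pd.read_sql_query("SELECT * FROM Scorecard WHERE Year = 2013", conn)
print(df)
```

Filtering by the added Year column in SQL keeps memory use low compared with loading the full multi-year CSV and filtering in pandas.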

Instructions for Running the Code: To run the code and perform the analysis, follow these steps:


  1. Clone this repository to your local machine.
  2. Make sure you have the required Python packages installed. If not, you can install them using pip:

     pip install rpy2 pandas matplotlib seaborn scikit-learn
  3. Download the Titanic dataset from Kaggle (https://www.kaggle.com/c/titanic/data) and save it as "titanic.csv" in the "Python_Assignment10" folder.
  4. Open the "Assignment10.py" file in your preferred Python IDE or text editor.
  5. Execute the code in your Python environment. The code will read the dataset, perform data preprocessing, implement two machine learning models (Logistic Regression and Random Forest Classifier), and print the accuracy and confusion matrix for each model.
  6. After executing the code, you will see a heatmap visualization representing the correlation between features in the dataset.

Note: Ensure that both your R and Python environments are set up correctly. The code uses the "rpy2" package to interact with R and perform some operations in R. If you encounter any issues related to the R environment setup or R packages, please refer to the relevant documentation.
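The modeling part of step 5 can be sketched as follows. This is a minimal, self-contained illustration: it trains the two models named above (Logistic Regression and Random Forest) and reports accuracy and a confusion matrix for each, but it uses a handful of synthetic rows in place of titanic.csv, and assumes Sex has already been encoded as 0/1 and missing values handled, as the real preprocessing would do.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in for the preprocessed Titanic data: in the real script these
# rows come from titanic.csv after encoding Sex and imputing missing Age.
df = pd.DataFrame({
    "Pclass":   [3, 1, 3, 1, 2, 3, 1, 3, 2, 3, 1, 2],
    "Sex":      [0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0],  # 0 = male, 1 = female
    "Age":      [22, 38, 26, 35, 35, 54, 2, 27, 14, 20, 58, 39],
    "Fare":     [7.2, 71.3, 7.9, 53.1, 8.1, 51.9, 21.1, 11.1, 30.1, 7.8, 26.6, 31.3],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
})
X, y = df.drop(columns="Survived"), df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

results = {}
for name, model in [
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
    ("Random Forest", RandomForestClassifier(n_estimators=100, random_state=42)),
]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = accuracy_score(y_test, pred)
    print(name, "accuracy:", results[name])
    print(confusion_matrix(y_test, pred))

# Step 6's correlation heatmap would follow, e.g.:
#   import seaborn as sns; sns.heatmap(df.corr(), annot=True)
```

With only a dozen synthetic rows the reported accuracies are not meaningful; the point is the shape of the pipeline (split, fit, predict, score) that the full dataset plugs into.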
