Assignment10:

This assignment uses publicly available datasets from the awesome-public-datasets GitHub repository (https://github.com/awesomedata/awesome-public-datasets/tree/master/Datasets) to perform data exploration and preprocessing, implement machine learning models, and visualize the results, using Python programming only.

Dataset 1: Titanic Dataset Analysis and Machine Learning

This repository contains code for analyzing the Titanic dataset and implementing machine learning models to predict passenger survival. The Titanic dataset is widely used for data analysis and machine learning tasks. It contains information about passengers on the Titanic, including their age, sex, class, fare, and survival status.

Dataset Selection Rationale: The Titanic dataset was chosen for its popularity and the rich information it provides about the passengers. It is well suited to practicing data preprocessing, exploratory data analysis (EDA), and implementing machine learning algorithms.
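As a first exploratory step, a minimal sketch like the following could report missing values and survival rates per group. The column names `Survived` and `Sex` are assumed from the Kaggle version of the dataset, and `summarize` is a hypothetical helper, not a function from this repository.

```python
import pandas as pd


def summarize(df, target="Survived", by="Sex"):
    """Return (missing-value counts per column, mean of `target` per `by` group).

    The default column names are assumptions based on the Kaggle Titanic layout.
    """
    missing = df.isna().sum()
    survival_rate = df.groupby(by)[target].mean()
    return missing, survival_rate
```

With the real data, this could be called as `summarize(pd.read_csv("titanic.csv"))`, assuming the file uses those column names.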

Dataset 2: Scorecard

Scorecard.csv: a single CSV file with the data for all years combined. In it, categorical variables represented by integer keys in the original data have been converted to their labels, and a Year column has been added.

database.sqlite: a SQLite database containing a single Scorecard table with the same information as Scorecard.csv.
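To illustrate how the Scorecard table might be queried, here is a minimal sketch using Python's built-in sqlite3 module. Only the table name Scorecard and the Year column come from the description above. The `Institution` column, the sample rows, and both helper functions are made up for illustration, and the in-memory database is a stand-in so the example runs without the real file.

```python
import sqlite3


def rows_per_year(conn):
    """Count the rows in the Scorecard table for each value of Year."""
    query = "SELECT Year, COUNT(*) FROM Scorecard GROUP BY Year ORDER BY Year"
    return conn.execute(query).fetchall()


def make_demo_db():
    """Build a tiny in-memory stand-in for database.sqlite.

    The Institution column and all rows are hypothetical sample data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Scorecard (Year INTEGER, Institution TEXT)")
    conn.executemany(
        "INSERT INTO Scorecard (Year, Institution) VALUES (?, ?)",
        [
            (2012, "Example College A"),
            (2013, "Example College A"),
            (2013, "Example College B"),
        ],
    )
    conn.commit()
    return conn
```

With the real file, `sqlite3.connect("database.sqlite")` would take the place of `make_demo_db()`.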

Instructions for Running the Code: To run the code and perform the analysis, follow these steps:

Clone this repository to your local machine.
Make sure you have the required Python packages installed. If not, you can install them using pip:

    pip install rpy2 pandas matplotlib seaborn scikit-learn

Download the Titanic dataset from Kaggle (https://www.kaggle.com/c/titanic/data) and save it as "titanic.csv" in the "Python_Assignment10" folder.
Open the "Assignment10.py" file in your preferred Python IDE or text editor.
Execute the code in your Python environment. The code will read the dataset, perform data preprocessing, implement two machine learning models (Logistic Regression and Random Forest Classifier), and print the accuracy and confusion matrix for each model.
After executing the code, you will see a heatmap visualization representing the correlation between features in the dataset.

Note: Ensure that you have both R and Python environments set up correctly. The code uses the "rpy2" package to interact with R and perform some operations in R. If you encounter any issues related to R environment setup or R packages, please refer to the relevant documentation.
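The modeling and visualization steps above can be sketched roughly as follows. This is not the repository's actual code: the column names (`Pclass`, `Sex`, `Age`, `Fare`, `Survived`) are assumed from the Kaggle Titanic layout, and the function names, the 80/20 split, and the model settings are illustrative choices.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumed column names (Kaggle Titanic layout); adjust them to match your CSV.
FEATURES = ["Pclass", "Sex", "Age", "Fare"]
TARGET = "Survived"


def preprocess(df):
    """Return (X, y) with missing Age/Fare filled by the median and Sex encoded as 0/1."""
    data = df[FEATURES + [TARGET]].copy()
    data["Age"] = data["Age"].fillna(data["Age"].median())
    data["Fare"] = data["Fare"].fillna(data["Fare"].median())
    # Assumes Sex holds only the strings "male" and "female".
    data["Sex"] = data["Sex"].map({"male": 0, "female": 1})
    return data[FEATURES], data[TARGET]


def evaluate_models(df, seed=42):
    """Fit both classifiers and return accuracy and confusion matrix for each."""
    X, y = preprocess(df)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, predictions),
            "confusion_matrix": confusion_matrix(y_test, predictions, labels=[0, 1]),
        }
    return results


def plot_correlation_heatmap(df):
    """Draw a heatmap of pairwise correlations between the numeric columns."""
    # Imported here so the modeling functions work without a plotting backend.
    import matplotlib.pyplot as plt
    import seaborn as sns

    corr = df.select_dtypes("number").corr()
    ax = sns.heatmap(corr, annot=True, cmap="coolwarm")
    plt.tight_layout()
    return ax
```

A typical call would be `evaluate_models(pd.read_csv("titanic.csv"))`, followed by printing each model's accuracy and confusion matrix, then `plot_correlation_heatmap` on the same DataFrame and `plt.show()` to display the figure.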
