This Python script analyzes Netflix original shows and movies and predicts their IMDB scores with a machine learning model. It covers data preprocessing, feature engineering, model building, and evaluation. The sections below explain the main components and how to use the script.
To run this code, you need the following libraries and tools installed:
- Python 3.x
- NumPy
- pandas
- Matplotlib
- Seaborn
- scikit-learn
- Jupyter Notebook (optional, for interactive exploration)
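
A quick way to confirm that the libraries listed above are importable is sketched below. It uses only the standard library; the `REQUIRED_MODULES` list and the `missing_modules` helper are illustrative names, not part of the script itself.

```python
from importlib.util import find_spec

# Import names for the libraries listed above. The import name for
# scikit-learn is "sklearn", which differs from its package name.
REQUIRED_MODULES = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names from `modules` that cannot be found for import."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

Anything reported as missing can usually be installed with `pip install <package>`, using `scikit-learn` as the package name for `sklearn`.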
- Clone or download this repository to your local machine.
- Make sure you have Python and the required libraries installed.
- Place your dataset "NetflixOriginals.csv" in the same directory as the script.
- Open the Python script in your preferred environment (e.g., Jupyter Notebook, a Python IDE, or the command line).
- Run the code in sections or as a whole to perform the following tasks:
  - Data Loading and Preprocessing
  - Data Exploration and Visualization
  - Data Preparation for Machine Learning
  - Model Building and Training
  - Model Evaluation
- Review the model evaluation results, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R^2), to assess the model's performance.
- Customize the code as needed, such as trying different machine learning algorithms or hyperparameter tuning, to improve the model's performance.
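
To make the three evaluation metrics from the steps above concrete, here is a minimal sketch that computes them by hand in plain Python. The function name `regression_metrics` is hypothetical and the sample scores are made up for illustration; the script itself may compute these values differently.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average size of the error, in the same units as the score.
    mae = sum(abs(e) for e in errors) / n
    # MSE: average squared error, which penalizes large misses more heavily.
    mse = sum(e * e for e in errors) / n

    # R^2: 1 minus (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    total_ss = sum((t - mean_true) ** 2 for t in y_true)
    residual_ss = sum(e * e for e in errors)
    # R^2 is undefined when every true value is identical.
    r2 = 1.0 - residual_ss / total_ss if total_ss > 0 else float("nan")
    return mae, mse, r2


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(regression_metrics([6.0, 7.0, 8.0], [6.5, 7.0, 7.5]))
```

Lower MAE and MSE indicate smaller errors, while an R^2 closer to 1 means the model explains more of the variation in the scores. scikit-learn provides the same metrics as `mean_absolute_error`, `mean_squared_error`, and `r2_score` in `sklearn.metrics`.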
The code expects a CSV file named "NetflixOriginals.csv" with data on Netflix original shows and movies. The dataset should include columns such as Title, Genre, Premiere Date, Runtime, Language, and IMDB Score. IMDB Score is the value the model predicts; the other columns supply the input features.
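
Before running the script, it can help to check that the file has the expected columns. The sketch below reads only the CSV header with the standard library. The column spellings in `EXPECTED_COLUMNS` are assumptions taken from the description above and may not match the exact header names in your copy of the file; `missing_columns` is a hypothetical helper, not part of the script.

```python
import csv

# Assumed header names, taken from the description above. Adjust them to
# match the exact spelling in your copy of the file.
EXPECTED_COLUMNS = [
    "Title", "Genre", "Premiere Date", "Runtime", "Language", "IMDB Score",
]


def missing_columns(csv_path, expected=EXPECTED_COLUMNS, encoding="utf-8"):
    """Return the expected column names that are absent from the CSV header."""
    with open(csv_path, newline="", encoding=encoding) as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    print(missing_columns("NetflixOriginals.csv"))
```

An empty list means every expected column was found. If the file fails to decode, passing a different `encoding` value may help.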
The code uses a Random Forest Regressor to predict IMDB scores. You can explore and experiment with other regression algorithms based on your needs.
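
As a rough illustration of this approach, the sketch below trains a `RandomForestRegressor` on a held-out split and reports MAE, MSE, and R^2. It is not the script's actual code: the `train_and_evaluate` helper, the 80/20 split, the `n_estimators=100` setting, and the synthetic data are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(X, y, test_size=0.2, random_state=42):
    """Fit a Random Forest on a train split and score it on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    metrics = {
        "mae": mean_absolute_error(y_test, predictions),
        "mse": mean_squared_error(y_test, predictions),
        "r2": r2_score(y_test, predictions),
    }
    return model, metrics


if __name__ == "__main__":
    # Synthetic stand-in data, not the Netflix dataset: three numeric
    # features and a target that depends mostly on the first one.
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(200, 3))
    y = 5.0 + 2.0 * X[:, 0] + rng.normal(0, 0.1, size=200)
    _, metrics = train_and_evaluate(X, y)
    print(metrics)
```

To try another algorithm, swap `RandomForestRegressor` for a different scikit-learn regressor such as `GradientBoostingRegressor` or `LinearRegression`; the rest of the function can stay the same because these estimators share the `fit` and `predict` interface.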
Contributions, bug fixes, and suggestions for improvements are welcome. Please create an issue or a pull request on the GitHub repository.
This code is provided under the MIT License. You are free to use, modify, and distribute it as needed. See the LICENSE file for details.