📅 Duration: 3 Months
🎯 Objective: Predict daily audience counts for 827 theatres across India.
This repository includes the baseline notebook, the final high-scoring solution, and all original datasets used during the competition.
Forecast daily audience attendance using multi-source data, combining:
- BookNow platform visits & bookings
- Theatre metadata
- Calendar features (weekday, weekend, holidays)
- Historical audience behaviour
This forms a panel time-series forecasting problem with strong seasonality, structural shifts, and theatre-level variability.
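Because each theatre is its own series within the panel, lag and rolling features have to be computed per theatre so that one theatre's history never leaks into another's. The following is a minimal sketch, not the repository's code. It assumes a long-format frame with one row per theatre per day and no missing days (so a lag of N rows equals N days), and it uses hypothetical column names `theater_id`, `date` and `audience`. Holiday flags would be joined from date_info.csv, whose columns are not described here, so they are left out.

```python
import pandas as pd


def add_panel_features(df, id_col="theater_id", date_col="date", target_col="audience", lags=(1, 7, 14)):
    """Add calendar flags and per-theatre lag/rolling features to a long-format panel.

    Hypothetical column names; assumes one row per theatre per day with no gaps.
    """
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    # Sort so that shift() moves forward in time inside each theatre's series
    out = out.sort_values([id_col, date_col]).reset_index(drop=True)

    # Calendar features (Monday=0 ... Sunday=6)
    out["weekday"] = out[date_col].dt.dayofweek
    out["is_weekend"] = (out["weekday"] >= 5).astype(int)

    # Lags are computed within each theatre, so histories never mix across theatres
    grouped = out.groupby(id_col)[target_col]
    for lag in lags:
        out[f"lag_{lag}"] = grouped.shift(lag)

    # 7-day mean of strictly past values (shift(1) excludes the current day's target)
    out["roll_mean_7"] = grouped.transform(lambda s: s.shift(1).rolling(7, min_periods=1).mean())
    return out


# Toy panel: two theatres, ten consecutive days each, starting on Monday 2024-01-01
panel = pd.DataFrame({
    "theater_id": ["A"] * 10 + ["B"] * 10,
    "date": list(pd.date_range("2024-01-01", periods=10)) * 2,
    "audience": list(range(10)) + list(range(100, 110)),
})
feats = add_panel_features(panel, lags=(1, 7))
```

Shifting before taking the rolling mean restricts every feature to past values only, so the same function can build features for validation rows without leaking the target.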
The original Kaggle dataset consisted of seven data CSVs plus a sample submission file:
- cinePOS_theaters.csv – CinePOS theatre metadata
- booknow_theaters.csv – BookNow theatre metadata
- movie_theater_id_relation.csv – Mapping between CinePOS and BookNow theatres
- cinePOS_booking.csv – CinePOS bookings
- booknow_booking.csv – BookNow bookings
- booknow_visits.csv – Daily audience counts
- date_info.csv – Calendar information
- sample_submission.csv – Submission ID structure
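The mapping file, movie_theater_id_relation.csv, is what allows the CinePOS files to be joined onto the BookNow audience target. Below is a minimal sketch of that join. The column names (`cine_theater_id`, `book_theater_id`, `show_date`, `tickets_booked`, `audience_count`) are hypothetical, and small in-memory frames stand in for the CSVs, which would in practice be loaded with `pd.read_csv`.

```python
import pandas as pd


def join_cinepos_to_booknow(visits, cine_bookings, id_map):
    """Attach daily CinePOS ticket totals to BookNow visit rows via the ID mapping (hypothetical columns)."""
    # One row per CinePOS theatre per day
    cine_daily = cine_bookings.groupby(["cine_theater_id", "show_date"], as_index=False)["tickets_booked"].sum()
    # Translate CinePOS IDs to BookNow IDs; unmapped CinePOS theatres are dropped
    cine_daily = cine_daily.merge(id_map, on="cine_theater_id", how="inner")
    # Re-aggregate in case several CinePOS IDs map to the same BookNow theatre
    cine_daily = (
        cine_daily.groupby(["book_theater_id", "show_date"], as_index=False)["tickets_booked"].sum()
        .rename(columns={"tickets_booked": "cine_tickets"})
    )
    # Left join keeps every target row; validate= raises if the right side has duplicate keys
    merged = visits.merge(cine_daily, on=["book_theater_id", "show_date"], how="left", validate="many_to_one")
    # Assumption: no CinePOS record for a theatre-day means zero CinePOS bookings
    merged["cine_tickets"] = merged["cine_tickets"].fillna(0)
    return merged


visits = pd.DataFrame({
    "book_theater_id": ["b1", "b1", "b2"],
    "show_date": ["2024-01-01", "2024-01-02", "2024-01-01"],
    "audience_count": [120, 95, 40],
})
cine_bookings = pd.DataFrame({
    "cine_theater_id": ["c1", "c1", "c1", "c9"],
    "show_date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-01"],
    "tickets_booked": [3, 2, 4, 7],
})
id_map = pd.DataFrame({"cine_theater_id": ["c1"], "book_theater_id": ["b1"]})

frame = join_cinepos_to_booknow(visits, cine_bookings, id_map)
```

Starting from the target table and left-joining onto it keeps one row per theatre-day to be predicted. Filling missing bookings with zero is one design choice; leaving them as NaN for tree models is an equally reasonable alternative.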
- Cleaned and explored each dataset individually
- Merged relevant files into a unified modeling dataframe
- Performed time-based train/validation split
- Experimented with multiple ML models (Gradient Boosting Regressor, LightGBM, XGBoost, Random Forest)
- Applied RandomizedSearchCV for hyperparameter tuning
- Retrained the best model on the full dataset
- Created the final predictions in the required submission format
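The split, tuning and refit steps above can be sketched with scikit-learn. This is an illustration, not the repository's code. It uses GradientBoostingRegressor on a synthetic panel with hypothetical columns, an arbitrary cutoff date, a small illustrative search space, and R² as a stand-in metric (the competition's scoring metric is not stated here). TimeSeriesSplit splits by row order, so the training frame is sorted by date first. Its folds may cut through a single date, which is acceptable for a sketch.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit


def time_split(df, date_col, cutoff):
    """Time-based split: rows before `cutoff` train, rows on or after it validate."""
    df = df.sort_values(date_col, kind="stable")
    cutoff = pd.Timestamp(cutoff)
    return df[df[date_col] < cutoff], df[df[date_col] >= cutoff]


# Synthetic stand-in panel (hypothetical columns): 5 theatres x 120 days
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=120)
panel = pd.DataFrame({"date": dates.repeat(5), "theater_id": np.tile(np.arange(5), 120)})
panel["weekday"] = panel["date"].dt.dayofweek
panel["audience"] = (
    50 + 10 * panel["theater_id"] + 20 * (panel["weekday"] >= 5) + rng.normal(0, 3, len(panel))
)
features = ["theater_id", "weekday"]

train, valid = time_split(panel, "date", "2024-04-01")

# Randomized search over an illustrative grid; CV folds respect chronological row order
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [2, 3, 4],
        "learning_rate": [0.03, 0.1, 0.3],
    },
    n_iter=5,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="r2",
    random_state=0,
)
search.fit(train[features], train["audience"])
val_score = r2_score(valid["audience"], search.predict(valid[features]))

# Refit the tuned configuration on all labelled rows before forecasting the test period
final_model = GradientBoostingRegressor(random_state=0, **search.best_params_)
final_model.fit(panel[features], panel["audience"])
```

LightGBM's `LGBMRegressor` and XGBoost's `XGBRegressor` follow the scikit-learn estimator interface, so either could replace GradientBoostingRegressor in the same search without changing the surrounding code.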