diff --git a/.ipynb_checkpoints/README-checkpoint.md b/.ipynb_checkpoints/README-checkpoint.md new file mode 100644 index 0000000..fa828b0 --- /dev/null +++ b/.ipynb_checkpoints/README-checkpoint.md @@ -0,0 +1,75 @@ +# The Hotel Dilemma +- UW FinTech Boot Camp - Project 3 Submission - May 2021 + +## **Team Members** +- Monique T. +- Tony H. +- April A. +- Nick N. + +## **Motivation** + +- After a 1-year pandemic, we all need a vacation... +- Our group wanted to conduct a project using the tools learned in both Module 1 & Module 2 of the boot camp (python + machine learning). We set out looking for data sets and questions related to supply chain management and demand forecasting, which led us to a very interesting prompt related to forecasting hotel reservation cancellations. Given we all need a good vacation, we pursued the following project... + +## **Research Questions** +- Using data and machine learning models, can hotel reservation cancellations be predicted? +- If yes to the above question, what models and methods most accurately predict hotel reservation cancellations? + +## **Objectives** +- Build an ML model that can predict whether a hotel reservation will be cancelled +- Analyze and understand data via organization, visualization, and dashboards + +## **Data Sources** +- [Hotel Booking Demand](https://www.kaggle.com/jessemostipak/hotel-booking-demand) + +## **Action Items** +- Data Cleaning & Shaping + - Data comes from/affiliated with an article: Hotel Booking Demand Datasets + - Data was cleaned by Thomas Mock and Antoine Bichat (additional cleaning and shaping conducted by our team) + - Is there further noise/info we want to weed out? Label encoding? +- Machine Learning Model + - Which model to use (try multiple models) + - Ensemble/Classifier/Decision Tree/Regression? Pick several and also apply resampling techniques if needed. 
We predict classifier models will be the most effective given we will be classifying a binary outcome (cancelled vs not cancelled) + - Which parameters/inputs produce the best outcomes (train/test split; different inputs for each ML model type aka reference documentation; which models are most efficient; what features in the dataset can we eliminate) + - Look at data in different ways? Is the model/data better for predicting in the summer/winter/fall/etc? Should we try forecasting for specific date ranges, like spring break, holiday breaks, etc. This will be a reach if we have time. +- Data Visualization + - Visualize by city hotel & resort hotel + - Visualize by season + - Visualize different demographics + - Visualize different ML model outcomes? + - Explore other means and methods of visualization that may give unique insight into the data set + +## **Work Assignments** +- ML Models & Workbook - April + Nick +- Data Visualizations & Dashboard - Monique + Tony +- Slide Show - Whole Team + +## **Technologies** +- Jupyter lab +- Python +- Pandas +- Numpy +- Sklearn +- Pyviz +- More to be imported and utilized in our python files + +## **Attachments** +- [Analysis Folder](Analysis/final_analysis.ipynb) - ML python files +- [Visualizations Folder](Visualizations/Dashboard.ipynb) - visualization python files +- [Data Folder](Data/hotel_bookings.csv) - original data set +- Final presentation deck + +## **Outcomes** +- Using machine learning models, our Team was able to predict hotel cancellations with confidence (particularly using the SMOTEENN Resampling + BalancedRandomForestClassifier model, which rendered a ~90% accuracy score). Visualizations of our final accuracy score outcomes are below. Please see our uploaded slide show for more information on the outcomes. 
+ +![](Images/accuracy_score_grid.PNG) + +![](Images/accuracy_score_viz.PNG) + +## **Presentation Assignments** +- Intro & Hypothesis: Tony +- Visualizations & Intro to Data: Monique +- Data Preparation & Model Selection: Nick +- Model Outcomes & Takeaways: April + diff --git a/Analysis/final_analysis.ipynb b/Analysis/final_analysis.ipynb index d5d7ee0..c443396 100644 --- a/Analysis/final_analysis.ipynb +++ b/Analysis/final_analysis.ipynb @@ -9,7 +9,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 44, "metadata": {}, "outputs": [], "source": [ @@ -30,7 +30,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 45, "metadata": {}, "outputs": [ { @@ -249,7 +249,7 @@ "[5 rows x 32 columns]" ] }, - "execution_count": 2, + "execution_count": 45, "metadata": {}, "output_type": "execute_result" } @@ -309,7 +309,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 46, "metadata": {}, "outputs": [ { @@ -318,7 +318,7 @@ "(119390, 32)" ] }, - "execution_count": 3, + "execution_count": 46, "metadata": {}, "output_type": "execute_result" } @@ -330,7 +330,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 47, "metadata": {}, "outputs": [ { @@ -371,7 +371,7 @@ "dtype: int64" ] }, - "execution_count": 4, + "execution_count": 47, "metadata": {}, "output_type": "execute_result" } @@ -383,7 +383,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 48, "metadata": {}, "outputs": [], "source": [ @@ -400,7 +400,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 49, "metadata": {}, "outputs": [ { @@ -420,7 +420,7 @@ "Length: 119390, dtype: bool" ] }, - "execution_count": 6, + "execution_count": 49, "metadata": {}, "output_type": "execute_result" } @@ -432,7 +432,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 50, "metadata": {}, "outputs": [ { @@ -472,7 +472,7 @@ "dtype: int64" ] }, - "execution_count": 7, + 
"execution_count": 50, "metadata": {}, "output_type": "execute_result" } @@ -491,7 +491,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 51, "metadata": {}, "outputs": [ { @@ -789,7 +789,7 @@ "max 8.000000 5.000000 " ] }, - "execution_count": 8, + "execution_count": 51, "metadata": {}, "output_type": "execute_result" } @@ -801,7 +801,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 52, "metadata": {}, "outputs": [ { @@ -826,7 +826,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 53, "metadata": {}, "outputs": [ { @@ -1045,7 +1045,7 @@ "[5 rows x 29 columns]" ] }, - "execution_count": 10, + "execution_count": 53, "metadata": {}, "output_type": "execute_result" } @@ -1060,7 +1060,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 54, "metadata": {}, "outputs": [ { @@ -1113,7 +1113,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 55, "metadata": {}, "outputs": [ { @@ -1207,7 +1207,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 56, "metadata": {}, "outputs": [ { @@ -1426,7 +1426,7 @@ "[5 rows x 29 columns]" ] }, - "execution_count": 13, + "execution_count": 56, "metadata": {}, "output_type": "execute_result" } @@ -1438,7 +1438,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 57, "metadata": {}, "outputs": [], "source": [ @@ -1456,7 +1456,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 58, "metadata": {}, "outputs": [], "source": [ @@ -1467,7 +1467,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 59, "metadata": {}, "outputs": [ { @@ -1478,7 +1478,7 @@ "Name: is_canceled, dtype: int64" ] }, - "execution_count": 16, + "execution_count": 59, "metadata": {}, "output_type": "execute_result" } @@ -1490,7 +1490,39 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 61, + "metadata": {}, + "outputs": [ + { + 
"data": { + "text/plain": [ + "" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZIAAAEHCAYAAACEKcAKAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAWYUlEQVR4nO3df5Bd5X3f8ffHko2xYzA/FkoksGjRuAVqQ9BQNZ66rpUGZZJYNAZ33TooiWbkYahrz7TNQDvTtPWoNflRatzAjBJsJJIaVMUOaqaYUhHKtFYlLzYGBNawNQ4oIkgGArgeSMV8+8d9Nlyt7q5We3R32ez7NXPmnPu9z/PsczR4Pj4/b6oKSZJm6y3zPQFJ0sJmkEiSOjFIJEmdGCSSpE4MEklSJ0vnewJz7cwzz6wVK1bM9zQkaUF56KGHvl9VI4O+W3RBsmLFCsbGxuZ7GpK0oCT5o6m+89SWJKkTg0SS1IlBIknqxCCRJHVikEiSOjFIJEmdGCSSpE4MEklSJwaJJKmTRfdk+4lw2T/bOt9T0JvQQ792zXxPQZoXHpFIkjoxSCRJnRgkkqRODBJJUicGiSSpE4NEktSJQSJJ6sQgkSR1YpBIkjoxSCRJnQwtSJK8N8nDfcvLST6T5PQk9yV5sq1P6+tzQ5LxJPuSXNFXvyzJo+27m5Ok1U9Kcler706yYlj7I0kabGhBUlX7quqSqroEuAz4IfBV4HpgZ1WtBHa2zyS5EBgFLgLWArckWdKGuxXYCKxsy9pW3wC8WFUXADcBNw5rfyRJg83Vqa01wP+pqj8C1gFbWn0LcGXbXgfcWVWvVdVTwDhweZJzgFOqaldVFbB1Up+JsbYDayaOViRJc2OugmQU+HLbPruqngVo67NafRnwTF+f/a22rG1Prh/Rp6oOAy8BZ0z+40k2JhlLMnbo0KETskOSpJ6hB0mStwEfAf7zsZoOqNU09en6HFmo2lxVq6pq1cjIyDGmIUk6HnNxRPJTwDer6rn2+bl2uoq2Ptjq+4Fz+/otBw60+vIB9SP6JFkKnAq8MIR9kCRNYS6C5OO8cVoLYAewvm2vB+7uq4+2O7HOp3dRfU87/fVKktXt+sc1k/pMjHUVcH+7jiJJmiND/YXEJO8A/i7wyb7y54BtSTYATwNXA1TV3iTbgMeBw8B1VfV663MtcDtwMnBPWwBuA+5IMk7vSGR0mPsjSTraUIOkqn7IpIvfVfU8vbu4BrXfBGwaUB8DLh5Qf5UWRJKk+eGT7ZKkTgwSSVInBokkqRODRJLUiUEiSerEIJEkdWKQSJI6MUgkSZ0YJJKkTgwSSVInBokkqRODRJLUiUEiSerEIJEkdWKQSJI6MUgkSZ0YJJKkTgwSSVInQw2SJO9Osj3Jd5I8keRvJjk9yX1Jnmzr0/ra35BkPMm+JFf01S9L8mj77uYkafWTktzV6ruTrBjm/kiSjjbsI5LPA1+rqr8KvB94Arge2FlVK4Gd7TNJLgRGgYuAtcAtSZa0cW4FNgIr27K21TcAL1bVBcBNwI1D3h9J0iRDC5IkpwAfBG4DqKo/q6o/BdYBW1qzLcCVbXsdcGdVvVZVTwHjwOVJzgFOqapdVVXA1kl9JsbaDqyZOFqRJM2NYR6R/GXgEPClJN9K8ttJ3gmcXVXPArT1Wa39MuCZvv77W21Z255cP6JPVR0GXgLOGM7uSJIGGWaQLAV+DLi1qi4F/i/tNNYUBh1J1DT16focOXCyMclYkrFDhw5NP2tJ0nEZZpDsB/ZX1e72eTu9YHmuna6irQ/2tT+3r/9y4ECrLx9QP6JPkqXAqcALkydSVZuralVVrRoZGTkBuyZJmjC0
IKmqPwGeSfLeVloDPA7sANa32nrg7ra9Axhtd2KdT++i+p52+uuVJKvb9Y9rJvWZGOsq4P52HUWSNEeWDnn8TwG/m+RtwHeBX6QXXtuSbACeBq4GqKq9SbbRC5vDwHVV9Xob51rgduBk4J62QO9C/h1JxukdiYwOeX8kSZMMNUiq6mFg1YCv1kzRfhOwaUB9DLh4QP1VWhBJkuaHT7ZLkjoxSCRJnRgkkqRODBJJUicGiSSpE4NEktSJQSJJ6sQgkSR1YpBIkjoxSCRJnRgkkqRODBJJUicGiSSpE4NEktSJQSJJ6sQgkSR1YpBIkjoxSCRJnRgkkqROhhokSb6X5NEkDycZa7XTk9yX5Mm2Pq2v/Q1JxpPsS3JFX/2yNs54kpuTpNVPSnJXq+9OsmKY+yNJOtpcHJH8naq6pKpWtc/XAzuraiWws30myYXAKHARsBa4JcmS1udWYCOwsi1rW30D8GJVXQDcBNw4B/sjSeozH6e21gFb2vYW4Mq++p1V9VpVPQWMA5cnOQc4pap2VVUBWyf1mRhrO7Bm4mhFkjQ3hh0kBfy3JA8l2dhqZ1fVswBtfVarLwOe6eu7v9WWte3J9SP6VNVh4CXgjMmTSLIxyViSsUOHDp2QHZMk9Swd8vgfqKoDSc4C7kvynWnaDjqSqGnq0/U5slC1GdgMsGrVqqO+lyTN3lCPSKrqQFsfBL4KXA48105X0dYHW/P9wLl93ZcDB1p9+YD6EX2SLAVOBV4Yxr5IkgYbWpAkeWeSd01sAz8JPAbsANa3ZuuBu9v2DmC03Yl1Pr2L6nva6a9Xkqxu1z+umdRnYqyrgPvbdRRJ0hwZ5qmts4GvtmvfS4H/VFVfS/INYFuSDcDTwNUAVbU3yTbgceAwcF1Vvd7Guha4HTgZuKctALcBdyQZp3ckMjrE/ZEkDTC0IKmq7wLvH1B/HlgzRZ9NwKYB9THg4gH1V2lBJEmaHz7ZLknqxCCRJHVikEiSOjFIJEmdGCSSpE4MEklSJwaJJKkTg0SS1IlBIknqxCCRJHUy7NfIS5pDT/+bvz7fU9Cb0Hn/8tGhju8RiSSpkxkFSZKdM6lJkhafaU9tJXk78A7gzCSn8cYvEp4C/OiQ5yZJWgCOdY3kk8Bn6IXGQ7wRJC8Dvzm8aUmSFoppg6SqPg98PsmnquoLczQnSdICMqO7tqrqC0l+HFjR36eqtg5pXpKkBWJGQZLkDuCvAA8DEz9/W4BBIkmL3EyfI1kFXFhVdbx/IMkSYAz446r6mSSnA3fRO7r5HvCxqnqxtb0B2EAvrP5xVd3b6pfxxm+2/1fg01VVSU6iF2aXAc8Df7+qvne8c5Qkzd5MnyN5DPhLs/wbnwae6Pt8PbCzqlYCO9tnklwIjAIXAWuBW1oIAdwKbARWtmVtq28AXqyqC4CbgBtnOUdJ0izNNEjOBB5Pcm+SHRPLsTolWQ78NPDbfeV1wJa2vQW4sq9+Z1W9VlVPAePA5UnOAU6pql3tiGjrpD4TY20H1iSZuLNMkjQHZnpq61/Ncvz/APwy8K6+2tlV9SxAVT2b5KxWXwb87752+1vt/7XtyfWJPs+0sQ4neQk4A/j+LOcrSTpOM71r638c78BJfgY4WFUPJfnQTLoM+tPT1KfrM3kuG+mdGuO8886bwVQkSTM101ekvJLk5ba8muT1JC8fo9sHgI8k+R5wJ/DhJL8DPNdOV9HWB1v7/cC5ff2XAwdaffmA+hF9kiwFTgVemDyRqtpcVauqatXIyMhMdlmSNEMzCpKqeldVndKWtwMfBf7jMfrcUFXLq2oFvYvo91fVJ4AdwPrWbD1wd9veAYwmOSnJ+fQuqu9pp8FeSbK6Xf+4ZlKfibGuan/juO8skyTN3qxeI19Vv5/k+ln+zc8B25JsAJ4Grm5j7k2yDXgcOAxcV1UTz6xcyxu3/97TFoDbgDuSjNM7Ehmd5ZwkSbM00wcSf67v41voPVcy4//nX1UPAA+07eeBNVO02wRs
GlAfAy4eUH+VFkSSpPkx0yOSn+3bPkzvQcJ1J3w2kqQFZ6Z3bf3isCciSVqYZnrX1vIkX01yMMlzSX6vPWwoSVrkZvpk+5fo3SH1o/QeAvwvrSZJWuRmGiQjVfWlqjrcltsBH8iQJM04SL6f5BNJlrTlE/TetitJWuRmGiS/BHwM+BPgWXoP/3kBXpI049t/Pwus7/vdkNOBX6cXMJKkRWymRyTvmwgRgKp6Abh0OFOSJC0kMw2StyQ5beJDOyKZ1etVJEl/scw0DH4D+HqS7fRejfIxBrzKRJK0+Mz0yfatScaAD9P7DZCfq6rHhzozSdKCMOPTUy04DA9J0hFmeo1EkqSBDBJJUicGiSSpE4NEktSJQSJJ6sQgkSR1MrQgSfL2JHuSfDvJ3iT/utVPT3Jfkifbuv+J+RuSjCfZl+SKvvplSR5t392cJK1+UpK7Wn13khXD2h9J0mDDPCJ5DfhwVb0fuARYm2Q1cD2ws6pWAjvbZ5JcCIwCFwFrgVuSLGlj3QpsBFa2ZW2rbwBerKoLgJuAG4e4P5KkAYYWJNXzg/bxrW0pYB2wpdW3AFe27XXAnVX1WlU9BYwDlyc5BzilqnZVVQFbJ/WZGGs7sGbiaEWSNDeGeo2k/QjWw8BB4L6q2g2cXVXPArT1Wa35MuCZvu77W21Z255cP6JPVR0GXgLOGDCPjUnGkowdOnToBO2dJAmGHCRV9XpVXQIsp3d0cfE0zQcdSdQ09en6TJ7H5qpaVVWrRkb8hWBJOpHm5K6tqvpT4AF61zaea6eraOuDrdl+4Ny+bsuBA62+fED9iD5JlgKnAi8MYx8kSYMN866tkSTvbtsnAz8BfAfYAaxvzdYDd7ftHcBouxPrfHoX1fe001+vJFndrn9cM6nPxFhXAfe36yiSpDkyzB+nOgfY0u68eguwrar+IMkuYFuSDcDTwNUAVbU3yTZ6bxg+DFxXVa+3sa4FbgdOBu5pC8BtwB1JxukdiYwOcX8kSQMMLUiq6hEG/BxvVT0PrJmizyYG/GBWVY0BR11fqapXaUEkSZofPtkuSerEIJEkdWKQSJI6MUgkSZ0YJJKkTgwSSVInBokkqRODRJLUiUEiSerEIJEkdWKQSJI6MUgkSZ0YJJKkTgwSSVInBokkqRODRJLUiUEiSerEIJEkdWKQSJI6GVqQJDk3yR8meSLJ3iSfbvXTk9yX5Mm2Pq2vzw1JxpPsS3JFX/2yJI+2725OklY/Kcldrb47yYph7Y8kabBhHpEcBv5JVf01YDVwXZILgeuBnVW1EtjZPtO+GwUuAtYCtyRZ0sa6FdgIrGzL2lbfALxYVRcANwE3DnF/JEkDDC1IqurZqvpm234FeAJYBqwDtrRmW4Ar2/Y64M6qeq2qngLGgcuTnAOcUlW7qqqArZP6TIy1HVgzcbQiSZobc3KNpJ1yuhTYDZxdVc9CL2yAs1qzZcAzfd32t9qytj25fkSfqjoMvAScMeDvb0wylmTs0KFDJ2ivJEkwB0GS5EeA3wM+U1UvT9d0QK2mqU/X58hC1eaqWlVVq0ZGRo41ZUnScRhqkCR5K70Q+d2q+korP9dOV9HWB1t9P3BuX/flwIFWXz6gfkSfJEuBU4EXTvyeSJKmMsy7tgLcBjxRVf++76sdwPq2vR64u68+2u7EOp/eRfU97fTXK0lWtzGvmdRnYqyrgPvbdRRJ0hxZOsSxPwD8PPBokodb7Z8DnwO2JdkAPA1cDVBVe5NsAx6nd8fXdVX1eut3LXA7cDJwT1ugF1R3JBmndyQyOsT9kSQNMLQgqar/yeBrGABrpuizCdg0oD4GXDyg/iotiCRJ88Mn2yVJnRgkkqRODBJJUicGiSSpE4NEktSJQSJJ6sQgkSR1YpBIkjoxSCRJnRgkkqRODBJJUicGiSSpE4NEktSJQSJJ6sQgkSR1YpBIkjoxSCRJnRgkkqROhhYkSb6Y5GCSx/pqpye5L8mTbX1a33c3JBlPsi/JFX31y5I8
2r67OUla/aQkd7X67iQrhrUvkqSpDfOI5HZg7aTa9cDOqloJ7GyfSXIhMApc1PrckmRJ63MrsBFY2ZaJMTcAL1bVBcBNwI1D2xNJ0pSGFiRV9SDwwqTyOmBL294CXNlXv7OqXquqp4Bx4PIk5wCnVNWuqipg66Q+E2NtB9ZMHK1IkubOXF8jObuqngVo67NafRnwTF+7/a22rG1Prh/Rp6oOAy8BZwz6o0k2JhlLMnbo0KETtCuSJHjzXGwfdCRR09Sn63N0sWpzVa2qqlUjIyOznKIkaZC5DpLn2ukq2vpgq+8Hzu1rtxw40OrLB9SP6JNkKXAqR59KkyQN2VwHyQ5gfdteD9zdVx9td2KdT++i+p52+uuVJKvb9Y9rJvWZGOsq4P52HUWSNIeWDmvgJF8GPgScmWQ/8CvA54BtSTYATwNXA1TV3iTbgMeBw8B1VfV6G+paeneAnQzc0xaA24A7kozTOxIZHda+SJKmNrQgqaqPT/HVminabwI2DaiPARcPqL9KCyJJ0vx5s1xslyQtUAaJJKkTg0SS1IlBIknqxCCRJHVikEiSOjFIJEmdGCSSpE4MEklSJwaJJKkTg0SS1IlBIknqxCCRJHVikEiSOjFIJEmdGCSSpE4MEklSJwaJJKkTg0SS1MmCD5Ika5PsSzKe5Pr5no8kLTYLOkiSLAF+E/gp4ELg40kunN9ZSdLisqCDBLgcGK+q71bVnwF3AuvmeU6StKgsne8JdLQMeKbv837gb0xulGQjsLF9/EGSfXMwt8XiTOD78z2JN4P8+vr5noKO5H+bE34lJ2KU90z1xUIPkkH/OnVUoWozsHn401l8koxV1ar5noc0mf9tzp2FfmprP3Bu3+flwIF5moskLUoLPUi+AaxMcn6StwGjwI55npMkLSoL+tRWVR1O8o+Ae4ElwBerau88T2ux8ZSh3qz8b3OOpOqoSwqSJM3YQj+1JUmaZwaJJKkTg0Sz4qtp9GaV5ItJDiZ5bL7nslgYJDpuvppGb3K3A2vnexKLiUGi2fDVNHrTqqoHgRfmex6LiUGi2Rj0appl8zQXSfPMINFszOjVNJIWB4NEs+GraST9OYNEs+GraST9OYNEx62qDgMTr6Z5Atjmq2n0ZpHky8Au4L1J9ifZMN9z+ovOV6RIkjrxiESS1IlBIknqxCCRJHVikEiSOjFIJEmdGCSSpE4MEmkKSb4+33OYiSQfSvIHx9nngSSrhjUnLS4GiTSFqvrx+Z6DtBAYJNIUkvygrc9J8mCSh5M8luRvTdNnbZJvJvl2kp2tdnmSryf5Vlu/t9V/IclXknwtyZNJfvUY47yz/WjTN9pYR726f6o2SU5OcmeSR5LcBZx8Qv+xtKgtne8JSAvAPwDurapN7Ue93jGoUZIR4LeAD1bVU0lOb199p9UOJ/kJ4N8CH23fXQJcCrwG7EvyBeDVKcb5F8D9VfVLSd4N7Eny3ydNY6o2nwR+WFXvS/I+4Jud/kWkPgaJdGzfAL6Y5K3A71fVw1O0Ww08WFVPAVTVxI8rnQpsSbKS3uv239rXZ2dVvQSQ5HHgPcBpU4zzk8BHkvzT9vntwHmT5jBVmw8CN7fxHknyyHHsvzQtg0Q6hqp6MMkHgZ8G7kjya1W1dUDTMPh3WT4L/GFV/b0kK4AH+r57rW/7dXr/m5xqnAAfrap9RxSTs2fQhinGlDrzGol0DEneAxysqt8CbgN+bIqmu4C/neT81m/ilNSpwB+37V+YwZ+capx7gU+lpUKSSwf0narNg8A/bLWLgffNYB7SjBgk0rF9CHg4ybfoXdv4/KBGVXUI2Ah8Jcm3gbvaV78K/Lsk/wtYcqw/Ns04n6V3WuyRJI+1z5NN1eZW4EfaKa1fBvYcax7STPkaeUlSJx6RSJI68WK7NAtJdgMnTSr/fFU9Oh/zkeaTp7YkSZ14akuS1IlBIknqxCCRJHVikEiSOvn/7s+aj965v/oAAAAASUVORK5CYII=\n", + 
"text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "sns.countplot(data = hotel, x = 'is_canceled')" + ] + }, + { + "cell_type": "code", + "execution_count": 62, "metadata": {}, "outputs": [], "source": [ @@ -1519,7 +1551,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 63, "metadata": {}, "outputs": [ { @@ -1528,7 +1560,7 @@ "BalancedRandomForestClassifier(random_state=1)" ] }, - "execution_count": 18, + "execution_count": 63, "metadata": {}, "output_type": "execute_result" } @@ -1542,7 +1574,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 64, "metadata": {}, "outputs": [], "source": [ @@ -1552,7 +1584,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 65, "metadata": {}, "outputs": [ { @@ -1591,7 +1623,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 66, "metadata": {}, "outputs": [ { @@ -1627,7 +1659,7 @@ " (0.0008159363355435853, 'babies')]" ] }, - "execution_count": 21, + "execution_count": 66, "metadata": {}, "output_type": "execute_result" } @@ -1640,7 +1672,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 67, "metadata": {}, "outputs": [ { @@ -1649,7 +1681,7 @@ "" ] }, - "execution_count": 22, + "execution_count": 67, "metadata": {}, "output_type": "execute_result" }, @@ -1685,7 +1717,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 68, "metadata": {}, "outputs": [ { @@ -1694,7 +1726,7 @@ "LogisticRegression(random_state=1)" ] }, - "execution_count": 23, + "execution_count": 68, "metadata": {}, "output_type": "execute_result" } @@ -1708,7 +1740,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 69, "metadata": {}, "outputs": [], "source": [ @@ -1718,7 +1750,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 70, "metadata": {}, "outputs": [ { @@ -1727,7 +1759,7 @@ 
"0.7966697936210131" ] }, - "execution_count": 25, + "execution_count": 70, "metadata": {}, "output_type": "execute_result" } @@ -1740,7 +1772,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 71, "metadata": {}, "outputs": [ { @@ -1772,7 +1804,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 72, "metadata": {}, "outputs": [ { @@ -1781,7 +1813,7 @@ "EasyEnsembleClassifier(n_estimators=100, random_state=1)" ] }, - "execution_count": 27, + "execution_count": 72, "metadata": {}, "output_type": "execute_result" } @@ -1795,7 +1827,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 73, "metadata": {}, "outputs": [], "source": [ @@ -1805,7 +1837,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 74, "metadata": {}, "outputs": [ { @@ -1814,7 +1846,7 @@ "0.824316620485913" ] }, - "execution_count": 29, + "execution_count": 74, "metadata": {}, "output_type": "execute_result" } @@ -1828,7 +1860,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 75, "metadata": {}, "outputs": [ { @@ -1859,7 +1891,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 76, "metadata": {}, "outputs": [ { @@ -1868,7 +1900,7 @@ "Counter({0: 56313, 1: 56313})" ] }, - "execution_count": 31, + "execution_count": 76, "metadata": {}, "output_type": "execute_result" } @@ -1885,7 +1917,7 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 77, "metadata": {}, "outputs": [ { @@ -1894,7 +1926,7 @@ "BalancedRandomForestClassifier(random_state=1)" ] }, - "execution_count": 32, + "execution_count": 77, "metadata": {}, "output_type": "execute_result" } @@ -1906,7 +1938,7 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 78, "metadata": {}, "outputs": [], "source": [ @@ -1916,7 +1948,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 79, "metadata": {}, "outputs": [ { @@ -1925,7 +1957,7 
@@ "0.8841154223841445" ] }, - "execution_count": 34, + "execution_count": 79, "metadata": {}, "output_type": "execute_result" } @@ -1938,7 +1970,7 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 80, "metadata": {}, "outputs": [ { @@ -1969,7 +2001,7 @@ }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 81, "metadata": {}, "outputs": [ { @@ -1978,7 +2010,7 @@ "Counter({0: 46617, 1: 56485})" ] }, - "execution_count": 36, + "execution_count": 81, "metadata": {}, "output_type": "execute_result" } @@ -1994,7 +2026,7 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 82, "metadata": {}, "outputs": [ { @@ -2003,7 +2035,7 @@ "BalancedRandomForestClassifier(random_state=1)" ] }, - "execution_count": 37, + "execution_count": 82, "metadata": {}, "output_type": "execute_result" } @@ -2015,7 +2047,7 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 83, "metadata": {}, "outputs": [], "source": [ @@ -2025,7 +2057,7 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 84, "metadata": {}, "outputs": [ { @@ -2034,7 +2066,7 @@ "0.9027391310000517" ] }, - "execution_count": 39, + "execution_count": 84, "metadata": {}, "output_type": "execute_result" } @@ -2047,7 +2079,7 @@ }, { "cell_type": "code", - "execution_count": 40, + "execution_count": 85, "metadata": {}, "outputs": [ { @@ -2078,7 +2110,7 @@ }, { "cell_type": "code", - "execution_count": 42, + "execution_count": 86, "metadata": {}, "outputs": [ { @@ -2104,7 +2136,7 @@ }, { "cell_type": "code", - "execution_count": 46, + "execution_count": 87, "metadata": {}, "outputs": [ { @@ -2165,7 +2197,7 @@ "SMOTEENN Oversampling 0.90" ] }, - "execution_count": 46, + "execution_count": 87, "metadata": {}, "output_type": "execute_result" } @@ -2182,7 +2214,7 @@ }, { "cell_type": "code", - "execution_count": 52, + "execution_count": 88, "metadata": {}, "outputs": [ { diff --git a/PresentationDeck.pdf b/PresentationDeck.pdf 
new file mode 100644 index 0000000..d7044f3 Binary files /dev/null and b/PresentationDeck.pdf differ diff --git a/README.md b/README.md index eb9ffae..fa828b0 100644 --- a/README.md +++ b/README.md @@ -55,10 +55,10 @@ - More to be imported and utilized in our python files ## **Attachments** -- Analysis Folder - Please find our ML python files -- Visualizations Folder - Please find our visualization python files -- Data Folder - Please find our original data set -- Main Branch - Please find a pdf of our final presentation +- [Analysis Folder](Analysis/final_analysis.ipynb) - ML python files +- [Visualizations Folder](Visualizations/Dashboard.ipynb) - visualization python files +- [Data Folder](Data/hotel_bookings.csv) - original data set +- Final presentation deck ## **Outcomes** - Using machine learning models, our Team was able to predict hotel cancellations with confidence (particularly using the SMOTEENN Resampling + BalancedRandomForestClassifier model, which rendered a ~90% accuracy score). Visualizations of our final accuracy score outcomes are below. Please see our uploaded slide show for more information on the outcomes. @@ -71,5 +71,5 @@ - Intro & Hypothesis: Tony - Visualizations & Intro to Data: Monique - Data Preparation & Model Selection: Nick -- Model Outcomes, Conclusions, & Takeaways: April +- Model Outcomes & Takeaways: April