# Interactive Data Science HW2


# Assignment Goals

1. Inspecting trends in the number of travellers from different countries across years, seasons and months of the year.
2. Comparing and analyzing the length of stay at the hotels based on the number of adults and children travelling together.
3. Finding trends in cancellations with respect to the number of days between the booking and the arrival date, and identifying which types of customers and which types of hotels are more prone to cancellations.

# Dataset Description

Our dataset is the **Hotel booking demand** dataset [[Dataset Link](https://www.kaggle.com/jessemostipak/hotel-booking-demand)]. This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things.

The data is originally from the article [Hotel Booking Demand Datasets](https://www.sciencedirect.com/science/article/pii/S2352340918315191), written by Nuno Antonio, Ana Almeida, and Luis Nunes for Data in Brief, Volume 22, February 2019.

# Questions and Explanations

**What is the trend in travellers from different countries across the different years and months of the year?**

**Hypothesis**

We aim to analyze whether there is a seasonal trend in travellers from different countries. For example, for cold countries such as Canada or Russia, we hypothesized that there would be more travellers in the winter months of November, December, January and February, when it gets very cold at home, because people look for respite from the cold weather. We also want to find out whether there were particular years with more travellers from a given country. As explained below, the data does not support this hypothesis.

**Explanation**

There are other, more powerful factors governing the most popular months for travel. We find that travellers from most countries, such as the United States, Russia, Brazil and France, prefer to travel in April, May, June and July. A possible explanation is that children have school holidays during part of this period, so families are able to travel together. Another noticeable trend, contrary to our hypothesis, is that December and January, which are generally the coldest months, have the fewest travellers. A possible explanation is that December and January are the Christmas and New Year months, so there are very few business travellers and people prefer to spend time at home with their families. We also find that there is much less data for 2015 than for 2016 and 2017, since the dataset only covers arrivals from July 2015 onwards. For 2016 and 2017, the number of travellers from each country is broadly similar in both years.
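The per-country monthly counts behind this comparison can be sketched in plain Python. This is a minimal illustration, not the code in `streamlit_app.py`: it assumes rows read with `csv.DictReader` from `hotel_bookings.csv` using the Kaggle column names `country`, `arrival_date_year`, `arrival_date_month` and `is_canceled`, it counts bookings rather than individual guests, and by default it excludes cancelled bookings. The function name `bookings_per_month` and the toy rows are hypothetical.

```python
from collections import Counter

# Month names as they appear in the arrival_date_month column.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def bookings_per_month(rows, country, year=None, include_canceled=False):
    """Count bookings per arrival month for one country (optionally one year).

    Assumption: a booking counts as one "traveller" entry, and cancelled
    bookings are skipped unless include_canceled is True.
    """
    counts = Counter()
    for row in rows:
        if row["country"] != country:
            continue
        if year is not None and int(row["arrival_date_year"]) != year:
            continue
        if not include_canceled and int(row["is_canceled"]) == 1:
            continue
        counts[row["arrival_date_month"]] += 1
    # Every month in calendar order, including months with zero bookings.
    return {month: counts[month] for month in MONTHS}


# Toy rows (hypothetical) shaped like csv.DictReader output.
toy_rows = [
    {"country": "RUS", "arrival_date_year": "2016", "arrival_date_month": "July", "is_canceled": "0"},
    {"country": "RUS", "arrival_date_year": "2016", "arrival_date_month": "July", "is_canceled": "1"},
    {"country": "RUS", "arrival_date_year": "2017", "arrival_date_month": "May", "is_canceled": "0"},
    {"country": "PRT", "arrival_date_year": "2016", "arrival_date_month": "July", "is_canceled": "0"},
]

result = bookings_per_month(toy_rows, "RUS", year=2016)
print({month: n for month, n in result.items() if n})  # {'July': 1}
```

With the real file, the toy rows would be replaced by `list(csv.DictReader(f))` for `f` opened on `hotel_bookings.csv`; returning all twelve months keeps the bar order stable when a country has no bookings in some months.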

**How does the length of stay vary depending on the number of adults and children who are travelling together?**

**Hypothesis**

Our aim is to analyze whether there is a correlation between the length of stay and the number of adults and children travelling together. We expect the length of stay to be longer when a person is travelling with family, i.e. with more adults and children, than when they are travelling alone.

**Explanation**

Considering parties without children, we find that when only 1 adult is travelling, the stay is shorter than 3 nights in over 60% of the observations. With 3 adults, fewer than 30% of the observations are stays shorter than 3 nights. This indicates that groups tend to stay longer than single adults. With 1 adult and 1 child, more than 70% of the stays are 3 nights or longer, and with 1 adult and 3 children, more than 75% of the stays are 5 nights or longer. A similar trend is seen with 2 adults: with either 1 or 2 children, around 67.5% of stays are 3 nights or longer.

**What is the trend of cancellations with respect to the number of days before which a reservation is made?**

**Hypothesis**

Here, our aim is to analyze whether there is a correlation between the booking gap (i.e. the number of days between reservation and arrival) and cancellations. We expect that the earlier someone books a stay, the more likely they are to cancel it, since unforeseen circumstances have more time to arise. In the graph, we also stack the data by other variables: customer type and hotel. These variables give more insight into the cancellations; for example, 'Group' customers seem less likely to cancel a reservation when it is close to their arrival date than when it is farther away.

**Explanation**

From this dataset, we find that the largest number of cancellations occurs when the time between reservation and arrival is short. We expected people to cancel more often when their arrival dates are far away, but that is not what we observe in this data, which contradicts our hypothesis. From the graph, we can also observe that transient customers are more prone to cancelling their reservations. This is expected, because transient customers are not part of any group or contract, so they are more likely to change their minds. Next, we observe that city hotels had more cancellations than resort hotels. This is plausible because city hotels have more business travellers, whose plans may change, whereas resort bookings are usually planned vacations and hence less likely to be cancelled.
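The stacked histogram can be sketched as a lead-time binning step. This is an illustrative sketch under assumptions, not the app's code: it uses the Kaggle columns `is_canceled`, `lead_time` (days between booking and arrival), `customer_type` and `hotel`, treats the hypothetical `bin_width` parameter as a stand-in for the zoom level, and runs on toy rows.

```python
from collections import defaultdict


def cancellations_by_lead_time(rows, bin_width=30, stack_by="customer_type"):
    """Count cancelled bookings per lead-time bin, split by a second column.

    Returns {bin_start_in_days: {stack_value: count}} with bins in ascending
    order. A smaller bin_width gives a finer-grained histogram.
    """
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if int(row["is_canceled"]) != 1:
            continue  # only cancelled bookings contribute to the bars
        lead_time = int(row["lead_time"])
        bin_start = (lead_time // bin_width) * bin_width
        table[bin_start][row[stack_by]] += 1
    return {start: dict(table[start]) for start in sorted(table)}


# Toy rows (hypothetical) shaped like csv.DictReader output.
toy_rows = [
    {"is_canceled": "1", "lead_time": "5", "customer_type": "Transient", "hotel": "City Hotel"},
    {"is_canceled": "1", "lead_time": "20", "customer_type": "Group", "hotel": "Resort Hotel"},
    {"is_canceled": "0", "lead_time": "10", "customer_type": "Transient", "hotel": "City Hotel"},
    {"is_canceled": "1", "lead_time": "45", "customer_type": "Transient", "hotel": "City Hotel"},
]

print(cancellations_by_lead_time(toy_rows))
# {0: {'Transient': 1, 'Group': 1}, 30: {'Transient': 1}}
print(cancellations_by_lead_time(toy_rows, stack_by="hotel"))
# {0: {'City Hotel': 1, 'Resort Hotel': 1}, 30: {'City Hotel': 1}}
```

Switching `stack_by` between `"customer_type"` and `"hotel"` reproduces the two stacking views without re-binning logic; note these are absolute counts, so bins with more bookings overall will also tend to show more cancellations.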

# Interactions and design decisions

1. The map shows the number of travellers from each country. Hovering over a country shows its name and number of travellers, and the color intensity indicates the percentage of all travellers who come from that country. This design conveys the relative number of travellers from different countries without requiring the exact numerical values, and by selecting different years and months we can quickly compare countries visually. A world map represents geographical data better than other forms of visualization.
2. Length of stay is an important feature for analyzing hotel stays. We plot only the top 10 stay-length values, since the data is very sparse for longer stays. The number of adults and children can be changed with sliders, whose ranges we chose so that sufficient data is available for the selectable values. A bar plot makes it easy to compare the different night-stay durations.
3. We used a stacked histogram to find the trend between the number of cancellations and the type of customer or hotel. A stacked histogram is well suited to showing the number of cancellations made within a particular range of days, and the stacks show the influence of another variable, i.e. what fraction of each bar belongs to a particular hotel or customer type. We also added a zoom in/out feature to change the granularity or the displayed range of days.
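The README states that `country_codes.csv` is used to obtain country IDs for plotting. One plausible reading, sketched here as an assumption rather than the app's actual code, is that the alpha-3 codes in the booking data's `country` column are mapped to ISO numeric codes (the key used by world-map geometries such as the world-110m TopoJSON in Altair's examples), and that each country's share of all travellers drives the color scale. The helper names are hypothetical, and the sample CSV text reuses three rows of `country_codes.csv`.

```python
import csv
import io
from collections import Counter


def load_numeric_ids(csv_file):
    """Map ISO alpha-3 codes to ISO numeric codes using country_codes.csv columns."""
    reader = csv.DictReader(csv_file)
    return {row["Alpha-3 code"]: int(row["Numeric"]) for row in reader}


def traveller_share_by_id(countries, numeric_ids):
    """Percentage of all bookings per country, keyed by ISO numeric id.

    The denominator counts every booking; codes missing from numeric_ids
    (e.g. non-ISO values) are simply left off the map.
    """
    countries = list(countries)
    total = len(countries)
    if total == 0:
        return {}
    counts = Counter(c for c in countries if c in numeric_ids)
    return {numeric_ids[code]: 100 * n / total for code, n in counts.items()}


# Three rows copied from country_codes.csv; open the real file with encoding="utf-8".
SAMPLE_CODES = """English short name,Alpha-2 code,Alpha-3 code,Numeric
Portugal,PT,PRT,620
Russian Federation (the),RU,RUS,643
United States of America (the),US,USA,840
"""

ids = load_numeric_ids(io.StringIO(SAMPLE_CODES))
# Toy country column: "XYZ" stands in for a code with no ISO match.
shares = traveller_share_by_id(["PRT", "PRT", "RUS", "XYZ"], ids)
print(shares)  # {620: 50.0, 643: 25.0}
```

Keeping the unmatched bookings in the denominator matches the description of the color as a share of all travellers overall; dropping them instead would slightly inflate every mapped country's percentage.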


# Development Process

We worked in a team of two for this project. The first step was to find a dataset that could provide interesting and innovative questions to answer. We browsed various Kaggle datasets and chose this one because it provides geographical data along with plenty of other useful columns about hotel bookings. Several other hotel booking datasets were available, but this was the most comprehensive in terms of features and observations. We began by exploring the dataset, examining the available features and performing exploratory data analysis. Then we worked out the pertinent questions that could be answered from the data.

Once the questions were determined, we looked at which features and feature correlations could help answer them. The next step was to choose the visualizations best suited to the proposed questions. In total, the assignment took us 25 hours. Much of this time was spent finding an appropriate dataset and preprocessing the data into the format needed for plotting the geographical map.

# Components of the Assignment

1. `hotel_bookings.csv`: original dataset from Kaggle
2. `hotel_bookings_mod1.csv`: preprocessed data
3. `country_codes.csv`: used for obtaining the country IDs for plotting
4. `streamlit_app.py`: Streamlit visualization code
5. `Report.pdf`: analysis of the various components of the assignment

`country_codes.csv`
English short name,Alpha-2 code,Alpha-3 code,Numeric
Afghanistan,AF,AFG,4
Albania,AL,ALB,8
Algeria,DZ,DZA,12
American Samoa,AS,ASM,16
Andorra,AD,AND,20
Angola,AO,AGO,24
Anguilla,AI,AIA,660
Antarctica,AQ,ATA,10
Antigua and Barbuda,AG,ATG,28
Argentina,AR,ARG,32
Armenia,AM,ARM,51
Aruba,AW,ABW,533
Australia,AU,AUS,36
Austria,AT,AUT,40
Azerbaijan,AZ,AZE,31
Bahamas (the),BS,BHS,44
Bahrain,BH,BHR,48
Bangladesh,BD,BGD,50
Barbados,BB,BRB,52
Belarus,BY,BLR,112
Belgium,BE,BEL,56
Belize,BZ,BLZ,84
Benin,BJ,BEN,204
Bermuda,BM,BMU,60
Bhutan,BT,BTN,64
Bolivia (Plurinational State of),BO,BOL,68
"Bonaire, Sint Eustatius and Saba",BQ,BES,535
Bosnia and Herzegovina,BA,BIH,70
Botswana,BW,BWA,72
Bouvet Island,BV,BVT,74
Brazil,BR,BRA,76
British Indian Ocean Territory (the),IO,IOT,86
Brunei Darussalam,BN,BRN,96
Bulgaria,BG,BGR,100
Burkina Faso,BF,BFA,854
Burundi,BI,BDI,108
Cabo Verde,CV,CPV,132
Cambodia,KH,KHM,116
Cameroon,CM,CMR,120
Canada,CA,CAN,124
Cayman Islands (the),KY,CYM,136
Central African Republic (the),CF,CAF,140
Chad,TD,TCD,148
Chile,CL,CHL,152
China,CN,CHN,156
Christmas Island,CX,CXR,162
Cocos (Keeling) Islands (the),CC,CCK,166
Colombia,CO,COL,170
Comoros (the),KM,COM,174
Congo (the Democratic Republic of the),CD,COD,180
Congo (the),CG,COG,178
Cook Islands (the),CK,COK,184
Costa Rica,CR,CRI,188
Croatia,HR,HRV,191
Cuba,CU,CUB,192
Curaçao,CW,CUW,531
Cyprus,CY,CYP,196
Czechia,CZ,CZE,203
Côte d'Ivoire,CI,CIV,384
Denmark,DK,DNK,208
Djibouti,DJ,DJI,262
Dominica,DM,DMA,212
Dominican Republic (the),DO,DOM,214
Ecuador,EC,ECU,218
Egypt,EG,EGY,818
El Salvador,SV,SLV,222
Equatorial Guinea,GQ,GNQ,226
Eritrea,ER,ERI,232
Estonia,EE,EST,233
Eswatini,SZ,SWZ,748
Ethiopia,ET,ETH,231
Falkland Islands (the) [Malvinas],FK,FLK,238
Faroe Islands (the),FO,FRO,234
Fiji,FJ,FJI,242
Finland,FI,FIN,246
France,FR,FRA,250
French Guiana,GF,GUF,254
French Polynesia,PF,PYF,258
French Southern Territories (the),TF,ATF,260
Gabon,GA,GAB,266
Gambia (the),GM,GMB,270
Georgia,GE,GEO,268
Germany,DE,DEU,276
Ghana,GH,GHA,288
Gibraltar,GI,GIB,292
Greece,GR,GRC,300
Greenland,GL,GRL,304
Grenada,GD,GRD,308
Guadeloupe,GP,GLP,312
Guam,GU,GUM,316
Guatemala,GT,GTM,320
Guernsey,GG,GGY,831
Guinea,GN,GIN,324
Guinea-Bissau,GW,GNB,624
Guyana,GY,GUY,328
Haiti,HT,HTI,332
Heard Island and McDonald Islands,HM,HMD,334
Holy See (the),VA,VAT,336
Honduras,HN,HND,340
Hong Kong,HK,HKG,344
Hungary,HU,HUN,348
Iceland,IS,ISL,352
India,IN,IND,356
Indonesia,ID,IDN,360
Iran (Islamic Republic of),IR,IRN,364
Iraq,IQ,IRQ,368
Ireland,IE,IRL,372
Isle of Man,IM,IMN,833
Israel,IL,ISR,376
Italy,IT,ITA,380
Jamaica,JM,JAM,388
Japan,JP,JPN,392
Jersey,JE,JEY,832
Jordan,JO,JOR,400
Kazakhstan,KZ,KAZ,398
Kenya,KE,KEN,404
Kiribati,KI,KIR,296
Korea (the Democratic People's Republic of),KP,PRK,408
Korea (the Republic of),KR,KOR,410
Kuwait,KW,KWT,414
Kyrgyzstan,KG,KGZ,417
Lao People's Democratic Republic (the),LA,LAO,418
Latvia,LV,LVA,428
Lebanon,LB,LBN,422
Lesotho,LS,LSO,426
Liberia,LR,LBR,430
Libya,LY,LBY,434
Liechtenstein,LI,LIE,438
Lithuania,LT,LTU,440
Luxembourg,LU,LUX,442
Macao,MO,MAC,446
Madagascar,MG,MDG,450
Malawi,MW,MWI,454
Malaysia,MY,MYS,458
Maldives,MV,MDV,462
Mali,ML,MLI,466
Malta,MT,MLT,470
Marshall Islands (the),MH,MHL,584
Martinique,MQ,MTQ,474
Mauritania,MR,MRT,478
Mauritius,MU,MUS,480
Mayotte,YT,MYT,175
Mexico,MX,MEX,484
Micronesia (Federated States of),FM,FSM,583
Moldova (the Republic of),MD,MDA,498
Monaco,MC,MCO,492
Mongolia,MN,MNG,496
Montenegro,ME,MNE,499
Montserrat,MS,MSR,500
Morocco,MA,MAR,504
Mozambique,MZ,MOZ,508
Myanmar,MM,MMR,104
Namibia,NA,NAM,516
Nauru,NR,NRU,520
Nepal,NP,NPL,524
Netherlands (the),NL,NLD,528
New Caledonia,NC,NCL,540
New Zealand,NZ,NZL,554
Nicaragua,NI,NIC,558
Niger (the),NE,NER,562
Nigeria,NG,NGA,566
Niue,NU,NIU,570
Norfolk Island,NF,NFK,574
North Macedonia,MK,MKD,807
Northern Mariana Islands (the),MP,MNP,580
Norway,NO,NOR,578
Oman,OM,OMN,512
Pakistan,PK,PAK,586
Palau,PW,PLW,585
"Palestine, State of",PS,PSE,275
Panama,PA,PAN,591
Papua New Guinea,PG,PNG,598
Paraguay,PY,PRY,600
Peru,PE,PER,604
Philippines (the),PH,PHL,608
Pitcairn,PN,PCN,612
Poland,PL,POL,616
Portugal,PT,PRT,620
Puerto Rico,PR,PRI,630
Qatar,QA,QAT,634
Romania,RO,ROU,642
Russian Federation (the),RU,RUS,643
Rwanda,RW,RWA,646
Réunion,RE,REU,638
Saint Barthélemy,BL,BLM,652
"Saint Helena, Ascension and Tristan da Cunha",SH,SHN,654
Saint Kitts and Nevis,KN,KNA,659
Saint Lucia,LC,LCA,662
Saint Martin (French part),MF,MAF,663
Saint Pierre and Miquelon,PM,SPM,666
Saint Vincent and the Grenadines,VC,VCT,670
Samoa,WS,WSM,882
San Marino,SM,SMR,674
Sao Tome and Principe,ST,STP,678
Saudi Arabia,SA,SAU,682
Senegal,SN,SEN,686
Serbia,RS,SRB,688
Seychelles,SC,SYC,690
Sierra Leone,SL,SLE,694
Singapore,SG,SGP,702
Sint Maarten (Dutch part),SX,SXM,534
Slovakia,SK,SVK,703
Slovenia,SI,SVN,705
Solomon Islands,SB,SLB,90
Somalia,SO,SOM,706
South Africa,ZA,ZAF,710
South Georgia and the South Sandwich Islands,GS,SGS,239
South Sudan,SS,SSD,728
Spain,ES,ESP,724
Sri Lanka,LK,LKA,144
Sudan (the),SD,SDN,729
Suriname,SR,SUR,740
Svalbard and Jan Mayen,SJ,SJM,744
Sweden,SE,SWE,752
Switzerland,CH,CHE,756
Syrian Arab Republic (the),SY,SYR,760
Taiwan (Province of China),TW,TWN,158
Tajikistan,TJ,TJK,762
"Tanzania, the United Republic of",TZ,TZA,834
Thailand,TH,THA,764
Timor-Leste,TL,TLS,626
Togo,TG,TGO,768
Tokelau,TK,TKL,772
Tonga,TO,TON,776
Trinidad and Tobago,TT,TTO,780
Tunisia,TN,TUN,788
Turkey,TR,TUR,792
Turkmenistan,TM,TKM,795
Turks and Caicos Islands (the),TC,TCA,796
Tuvalu,TV,TUV,798
Uganda,UG,UGA,800
Ukraine,UA,UKR,804
United Arab Emirates (the),AE,ARE,784
United Kingdom of Great Britain and Northern Ireland (the),GB,GBR,826
United States Minor Outlying Islands (the),UM,UMI,581
United States of America (the),US,USA,840
Uruguay,UY,URY,858
Uzbekistan,UZ,UZB,860
Vanuatu,VU,VUT,548
Venezuela (Bolivarian Republic of),VE,VEN,862
Viet Nam,VN,VNM,704
Virgin Islands (British),VG,VGB,92
Virgin Islands (U.S.),VI,VIR,850
Wallis and Futuna,WF,WLF,876
Western Sahara*,EH,ESH,732
Yemen,YE,YEM,887
Zambia,ZM,ZMB,894
Zimbabwe,ZW,ZWE,716
Åland Islands,AX,ALA,248