Course: Google Data Analytics Capstone: Complete a Case Study
Cyclistic Cyclistic, a bike-share program operating with over 5,800 bicycles and 600 docking stations, distinguishes itself by providing a variety of bike options, including reclining bikes, hand tricycles, and cargo bikes, catering to individuals with disabilities and those who prefer alternative bike designs. While the majority of users opt for traditional bikes, approximately 8% utilize assistive options. Cyclistic's user base primarily comprises leisure riders, although roughly 30% use the service for daily commutes.
Previously, Cyclistic's marketing strategy focused on raising overall awareness and appealing to broad consumer demographics, facilitated by flexible pricing plans such as single-ride passes, full-day passes, and annual memberships. Customers purchasing single-ride or full-day passes are classified as casual riders, while those opting for annual memberships are Cyclistic members.
Financial analysis reveals that annual members are significantly more profitable than casual riders. Consequently, Moreno, the marketing director, believes that driving annual membership subscriptions will drive future growth. Rather than targeting entirely new customers, Moreno aims to convert casual riders into members, leveraging their existing awareness of the Cyclistic program and their preference for the service.
Moreno has outlined a clear objective: Develop marketing strategies focused on transitioning casual riders into annual members. To achieve this goal, the marketing analyst team must gain a deeper understanding of the differences between annual members and casual riders, discern why casual riders might opt for a membership, and assess how digital media could influence marketing tactics. Moreno and her team intend to analyze historical bike trip data to uncover relevant trends.
Scenario As a junior data analyst within Cyclistic's marketing team, my role involves assisting in the analysis of user behavior to inform the development of a new marketing strategy aimed at converting casual riders into annual members. Our recommendations must be supported by compelling data insights and professional data visualizations to secure executive approval.
Business Task Devise marketing strategies to convert casual riders to members.
Analysis Questions Three questions will guide the future marketing program:
- How do annual members and casual riders use Cyclistic bikes differently?
- Why would casual riders buy Cyclistic annual memberships?
- How can Cyclistic use digital media to influence casual riders to become members?
Moreno has assigned me the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?
Data Source
I will use Cyclistic’s historical trip data to analyze and identify trends from Jan 2023 to Dec 2023 which can be downloaded from divvy_tripdata. The data has been made available by Motivate International Inc under this license. This is public data that can be used to explore how different customer types are using Cyclistic bikes. But note that data-privacy issues prohibit from using riders’ personally identifiable information. This means that we won’t be able to connect pass purchases to credit card numbers to determine if casual riders live in the Cyclistic service area or if they have purchased multiple single passes.
Data Organization There are 12 files with naming convention of YYYYMM-divvy-tripdata and each file includes information for one month, such as the ride id, bike type, start time, end time, start station, end station, start location, end location, and whether the rider is a member or not. The corresponding column names are ride_id, rideable_type, started_at, ended_at, start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng and member_casual.
BigQuery is used to combine the various datasets into one dataset and clean it.
Reason: A worksheet can only have 1,048,576 rows in Microsoft Excel because of its inability to manage large amounts of data, MySQL Workbench couldn’t handle the data size as well. Since the Cyclistic dataset has more than 5.6 million rows, it is essential to use a platform like BigQuery that supports huge volumes of data.
Combining the Data:
SQL Query: Data Combining
12 csv files are uploaded as tables in the dataset 'GDA. Another table named GDA.2023_combined_data
is created, containing 5719877 rows of data for the entire year.
SQL Query:Data Exploration
Before cleaning the data, I am familiarizing myself with the data to find the inconsistencies.
Observations:
-
The table below shows the all column names and their data types. The ride_id column is our primary key.
-
The following table shows number of null values in each column.
-
We observed that certain columns exhibit identical numbers of missing values. This suggests a potential issue where entire rows might be missing specific data points. For example, a row might lack both the station name and ID for the starting station, or the latitude and longitude for the ending station.
As ride_id has no null values, let's use it to check for duplicates.
There are no duplicate rows in the data.
-
All ride_id values have length of 16 so no need to clean it.
-
There are 3 unique types of bikes(rideable_type) in our data.
-
Our data contains timestamps for the start (started_at) and end (ended_at) of each trip, both in YYYY-MM-DD hh:mm:ss UTC format. To analyze trip durations, we can create a new column named ride_length that calculates the difference between these timestamps. There are 6418 trips which has duration longer than a day and 151069 trips having less than a minute duration or having end time earlier than start time so need to remove them. Other columns day_of_week and month can also be helpful in analysis of trips at different times in a year.
- Total of 875848 rows have both start_station_name and start_station_id missing which needs to be removed.
- Total of 929343 rows have both end_station_name and end_station_id missing which needs to be removed.
- Total of 6990 rows have both end_lat and end_lng missing which needs to be removed.
- member_casual column has 2 uniqued values as member or casual rider.
- Columns that need to be removed are start_station_id and end_station_id as they do not add value to analysis of our current problem.
SQL Query: Data Cleaning
-
To ensure data accuracy, any rows with missing values were eliminated.
-
Three new columns were added to enhance the analysis: ride_length to capture trip duration, day_of_week to understand usage patterns by day, and month to identify seasonal trends.
-
Trips with unrealistic durations (less than a minute or longer than a day) were excluded as outliers. This step resulted in the removal of 1,476,445 rows.
Data Visualization: Tableau
Having thoroughly cleaned and organized the data, we're ready to dive into the analysis. By querying relevant datasets and visualizing them using Tableau, we can answer a key question:_ How does usage of Cyclistic bikes differ between annual members and casual riders?_
To begin, let's compare the types of bikes preferred by each user group.
Our data shows that members account for a significant portion of ridership, at 64.53% (2,738,451 trips). Casual riders make up the remaining 35.47% (1,504,981 trips).
Each bike type chart shows percentage from the total. Most used bike is classic bike followed by the electric bike. Docked bikes are used the least and by only casual riders. To understand usage patterns better, we'll examine how trip volume is distributed across months, days of the week, and hours of the day.
Monthly Trends: Interestingly, both casual and member riders exhibit similar overall trip patterns across months. Notably, usage peaks in July and dips towards the year's end, likely due to seasonal temperature changes.
Day of the Week Variations: Casual and member riders demonstrate distinct day-of-week usage patterns. Casual riders show a surge in weekend trips, while members experience a decrease, suggesting members are likely workers and students using the program for weekday commutes. Casual riders, on the other hand, likely utilize the program for leisure activities on weekends.
Hourly Ridership: Member trips exhibit two distinct peaks: early mornings (6 am - 8 am) and evenings (4 pm - 8 pm), coinciding with typical commute times. Conversely, casual rider trips show a steady increase throughout the day, peaking in the evening before declining. This further reinforces the hypothesis that members prioritize commuting, while casual riders use the program for recreational purposes.
Overall: The data suggests members primarily use bikes for weekday work/school commutes, while casual riders utilize them for leisure activities throughout the day, particularly on weekends. Both user groups exhibit peak activity in spring and summer.
By comparing ride durations, we can understand the distinct behaviors of casual and member riders.
Ride Duration Analysis:
Casual Riders: Interestingly, casual riders take significantly longer trips compared to members, with an average journey length roughly twice that of members. This duration, however, varies throughout the year, week, and day. Casual riders embark on their longest rides during weekends, spring, and summer months, and between 10 am and 2 pm. Conversely, their commutes during weekdays (particularly between 5-8 am) tend to be much shorter.
Members: In contrast, members exhibit consistent ride durations regardless of season, weekday/weekend, or time of day. Their trips are typically shorter in comparison to casual riders.
Usage Patterns**
Casual Riders: Favor riding throughout the day, with peak usage on weekends during spring and summer. These rides are typically for leisure activities and tend to be twice as long as member rides but occur less frequently.
Members: Primarily use bikes for commuting on weekdays, with peak activity during morning and evening commutes (around 8 am and 5 pm) in spring and summer. Members take more frequent rides, but their trip durations are roughly half that of casual riders.
Understanding casual riders is key to attracting them as members. By analyzing the differences between casual and member riders, we can develop targeted marketing strategies to convert them. Here are some recommendations:
- Seasonal Marketing: Focus marketing campaigns in spring and summer at popular tourist and recreational locations frequented by them.
- Flexible Membership Options: Since casual riders are most active on weekends and during warmer months, consider offering seasonal or weekend-only memberships to cater to their preferences.
- Ride-Time Incentives: We observed that casual riders use their bikes for longer durations. Offering discounts for longer rides can attract casual riders and encourage existing members to extend their trips.
- Targeted Discounts for Busy Riders: Knowing that members are likely to be workers and students, consider special discounts encouraging them to ride more, particularly on weekends.
- Off-Season Strategy: To address the sales drop during colder months, it's recommended to invest in promoting alternative transportation options suitable for those weather conditions.