Tech Catalyst Data Engineering Final Capstone - The PANDAS

Project Overview

For our capstone project, we were tasked with building a full ETL pipeline using High Volume For-Hire Vehicle (HVFHV) data, Yellow Taxi data, and Green Taxi data. Our goal was to extract data from a raw AWS S3 bucket, clean it, and load it into a conformed S3 bucket. We then performed the necessary transformations and loaded the results into a transformed S3 bucket, from which we loaded the data into Snowflake for final analysis before visualization. We used a range of AWS services and other tools: AWS S3, AWS Glue Crawler, AWS Athena, Databricks/PySpark, GitHub Codespaces, Amazon Bedrock, Snowflake, ThoughtSpot, and Tableau.
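The raw, conformed, and transformed stages described above can be sketched in plain Python. This is a minimal illustration, not the project's PySpark code: the sample rows, the cleaning rules, and the helper names `to_conformed` and `to_transformed` are assumptions made for the example. The column names follow the public Yellow Taxi schema.

```python
from datetime import datetime

# Hypothetical sample standing in for rows read from the raw bucket.
RAW_TRIPS = [
    {"tpep_pickup_datetime": "2024-01-05 08:00:00",
     "tpep_dropoff_datetime": "2024-01-05 08:30:00",
     "trip_distance": 3.0},
    {"tpep_pickup_datetime": "2024-01-05 09:00:00",
     "tpep_dropoff_datetime": "2024-01-05 08:50:00",  # dropoff before pickup
     "trip_distance": 1.2},
    {"tpep_pickup_datetime": "2024-01-05 10:00:00",
     "tpep_dropoff_datetime": "2024-01-05 10:10:00",
     "trip_distance": None},  # missing distance
]

FMT = "%Y-%m-%d %H:%M:%S"


def to_conformed(rows):
    """Raw -> conformed: parse timestamps and drop clearly invalid rows.

    The rules here (positive distance, dropoff after pickup) are assumed
    for illustration; the project's actual cleaning rules may differ.
    """
    cleaned = []
    for row in rows:
        distance = row["trip_distance"]
        if distance is None or distance <= 0:
            continue
        pickup = datetime.strptime(row["tpep_pickup_datetime"], FMT)
        dropoff = datetime.strptime(row["tpep_dropoff_datetime"], FMT)
        if dropoff <= pickup:
            continue
        cleaned.append(
            {"pickup": pickup, "dropoff": dropoff, "trip_distance": distance}
        )
    return cleaned


def to_transformed(rows):
    """Conformed -> transformed: add a derived trip duration in minutes."""
    return [
        {**row,
         "trip_minutes": (row["dropoff"] - row["pickup"]).total_seconds() / 60}
        for row in rows
    ]


transformed = to_transformed(to_conformed(RAW_TRIPS))
```

Keeping the cleaning step separate from the derivation step mirrors the separate conformed and transformed buckets: each stage's output can be inspected on its own before the next stage runs.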

Problem Statement

For our use case, we focused on traffic congestion patterns within New York City boroughs and on how the distribution of landmarks relates to those patterns. To do this, we used the taxi datasets provided to us and supplemented them with external landmark data for our analysis, visualizations, and solutions. Our end goal was to give tourists a better way to navigate NYC while reducing travel time.
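One simple way to quantify congestion from trip records is average speed per borough and pickup hour, where lower speed suggests heavier traffic. The sketch below assumes that metric purely for illustration; the project's actual congestion measure is not stated here, and the sample trips and the function name `average_speed_mph` are hypothetical.

```python
from collections import defaultdict

# Hypothetical, already-cleaned trips:
# (pickup_borough, pickup_hour, trip_miles, trip_minutes)
TRIPS = [
    ("Manhattan", 8, 2.0, 20.0),
    ("Manhattan", 8, 1.0, 10.0),
    ("Queens", 8, 6.0, 20.0),
]


def average_speed_mph(trips):
    """Return {(borough, hour): mph}, using total miles over total time."""
    totals = defaultdict(lambda: [0.0, 0.0])  # [miles, minutes]
    for borough, hour, miles, minutes in trips:
        totals[(borough, hour)][0] += miles
        totals[(borough, hour)][1] += minutes
    return {
        key: miles * 60 / minutes
        for key, (miles, minutes) in totals.items()
        if minutes > 0  # skip groups with no recorded travel time
    }


speeds = average_speed_mph(TRIPS)
```

Dividing total miles by total time, rather than averaging per-trip speeds, keeps very short trips from dominating the result for a group.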

Data Description

Yellow Taxi Data: 9 files (September 2023 to May 2024)
Green Taxi Data: 9 files (September 2023 to May 2024)
HVFHV Data: 5 files (January 2024 to May 2024)
Individual Landmark Data: 1 file (17 columns)
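The file counts above follow from one file per month over each date range. As a quick check, the sketch below enumerates the months and builds file names. The naming pattern (`yellow_tripdata_YYYY-MM.parquet`, and `fhvhv` as the HVFHV prefix) follows the public NYC TLC convention and is an assumption here; the object keys in the project's S3 buckets may differ. The helper `monthly_files` is hypothetical.

```python
def monthly_files(prefix, start, end):
    """Return one file name per month from start to end, inclusive.

    start and end are (year, month) tuples.
    """
    year, month = start
    names = []
    while (year, month) <= end:
        names.append(f"{prefix}_tripdata_{year}-{month:02d}.parquet")
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return names


yellow = monthly_files("yellow", (2023, 9), (2024, 5))
green = monthly_files("green", (2023, 9), (2024, 5))
hvfhv = monthly_files("fhvhv", (2024, 1), (2024, 5))

print(len(yellow), len(green), len(hvfhv))
```

A list like this can serve as a completeness check after extraction: any expected name missing from the raw bucket listing points to a month that was not ingested.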

Contacts

Name	Email
Peter Alonzo	[email protected]
Nithila Annadurai	[email protected]
Alina Baby	[email protected]

notnithila/tech-cat-capstone

Folders and files

Latest commit

History

Repository files navigation

Tech Catalyst Data Engineering Final Capstone - The PANDAS

Project Overview

Problem Statement

Data Description

Contacts

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages