notnithila/tech-cat-capstone

Tech Catalyst Data Engineering Final Capstone - The PANDAS

Project Overview

For our capstone project, we were tasked with building a full ETL pipeline using High Volume For-Hire Vehicle (HVFHV) data, Yellow Taxi data, and Green Taxi data. Our goal was to extract data from a raw S3 bucket, clean it, and load it into a conformed S3 bucket. We then performed the necessary transformations and loaded the results into a transformed S3 bucket, from which we loaded the data into Snowflake for final analysis and visualization. We leveraged AWS services and other tools, including Amazon S3, AWS Glue Crawler, Amazon Athena, Databricks/PySpark, GitHub Codespaces, Amazon Bedrock, Snowflake, ThoughtSpot, and Tableau.
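The raw, conformed, and transformed stages described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual code, which ran on Databricks/PySpark: the in-memory lists stand in for the three S3 buckets, the function names `clean` and `transform` are hypothetical, the cleaning rules (drop rows with missing fields or a non-positive distance) are assumed, and the sample rows are made up. The input column names follow the public Yellow Taxi schema.

```python
from datetime import datetime

# Hypothetical stand-ins for the raw, conformed, and transformed S3 buckets.
buckets = {"raw": [], "conformed": [], "transformed": []}

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def clean(rows):
    """Assumed cleaning rules: drop rows with missing fields or non-positive distance."""
    cleaned = []
    for row in rows:
        pickup = row.get("tpep_pickup_datetime")
        dropoff = row.get("tpep_dropoff_datetime")
        distance = row.get("trip_distance")
        if not pickup or not dropoff or distance is None:
            continue  # incomplete record
        if float(distance) <= 0:
            continue  # implausible trip
        cleaned.append({
            "pickup": datetime.strptime(pickup, TIMESTAMP_FORMAT),
            "dropoff": datetime.strptime(dropoff, TIMESTAMP_FORMAT),
            "distance_miles": float(distance),
        })
    return cleaned


def transform(rows):
    """Assumed transformation: add trip duration in minutes and the pickup hour."""
    transformed = []
    for row in rows:
        minutes = (row["dropoff"] - row["pickup"]).total_seconds() / 60
        transformed.append({
            **row,
            "duration_min": minutes,
            "pickup_hour": row["pickup"].hour,
        })
    return transformed


# Made-up sample rows, for illustration only.
buckets["raw"] = [
    {"tpep_pickup_datetime": "2024-01-01 08:00:00",
     "tpep_dropoff_datetime": "2024-01-01 08:30:00",
     "trip_distance": "3.0"},
    {"tpep_pickup_datetime": None,
     "tpep_dropoff_datetime": "2024-01-01 09:10:00",
     "trip_distance": "1.2"},
    {"tpep_pickup_datetime": "2024-01-01 10:00:00",
     "tpep_dropoff_datetime": "2024-01-01 10:05:00",
     "trip_distance": "0"},
]

# Raw -> conformed -> transformed, mirroring the three-bucket flow.
buckets["conformed"] = clean(buckets["raw"])
buckets["transformed"] = transform(buckets["conformed"])
```

Keeping each stage as its own function mirrors the bucket layout: the conformed output can be inspected or re-used without re-running the cleaning step.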

Problem Statement

For our use case, we focused on analyzing traffic congestion patterns across New York City boroughs and on how the distribution of landmarks relates to those patterns. To do this, we used the taxi datasets provided to us and supplemented them with external landmark data for our analysis, visualizations, and solutions. Our end goal was to help tourists navigate NYC more easily while reducing travel time.
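One way to make "congestion pattern" concrete is to compute the average trip speed per borough and pickup hour, where a lower speed suggests heavier congestion. The sketch below is an assumption about how such a metric could be computed, not the method the project used: the function name `average_speed_by_borough_hour` is hypothetical, the `borough` field is assumed to have been attached to each trip already (for example by looking up the pickup zone), and the sample trips are made up.

```python
from collections import defaultdict


def average_speed_by_borough_hour(trips):
    """Return {(borough, pickup_hour): average speed in mph}.

    Speed is total miles divided by total hours within each group,
    used here as an assumed proxy for congestion.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # [miles, hours]
    for trip in trips:
        key = (trip["borough"], trip["pickup_hour"])
        totals[key][0] += trip["distance_miles"]
        totals[key][1] += trip["duration_min"] / 60
    return {
        key: miles / hours
        for key, (miles, hours) in totals.items()
        if hours > 0  # skip groups with no recorded travel time
    }


# Made-up sample trips, for illustration only.
sample_trips = [
    {"borough": "Manhattan", "pickup_hour": 8, "distance_miles": 3.0, "duration_min": 30},
    {"borough": "Manhattan", "pickup_hour": 8, "distance_miles": 2.0, "duration_min": 30},
    {"borough": "Queens", "pickup_hour": 8, "distance_miles": 6.0, "duration_min": 30},
]

speeds = average_speed_by_borough_hour(sample_trips)
for (borough, hour), mph in sorted(speeds.items()):
    print(f"{borough} at {hour}:00 -> {mph:.1f} mph")
```

Dividing total miles by total hours, rather than averaging per-trip speeds, keeps very short trips from dominating the result.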

Data Description

Contacts

Name               Email
Peter Alonzo       [email protected]
Nithila Annadurai  [email protected]
Alina Baby         [email protected]
