This ETL (Extract, Transform, Load) pipeline retrieves weather data from the OpenWeatherMap API, processes it, and loads it into MySQL for analysis and visualization in Power BI.
- Python 3.12+
- Apache Airflow
- OpenWeatherMap API Key
- AWS S3 Bucket
- MySQL Database
- Power BI Desktop
- Fetch Data from OpenWeather API: Use the OpenWeather API to retrieve the weather data for your analysis. This is the primary data source for the ETL pipeline.
- Analyze the Data Structure: Examine the data returned from the OpenWeather API to understand the schema, data types, and any relevant metadata that will inform your data processing.
- Construct a mock production data lake in AWS S3: Create an S3 bucket and the necessary table schema to serve as the data source for your project.
- Build an ETL Pipeline using Airflow
- Set up a MySQL instance on Amazon RDS and make sure MySQL Workbench is set up on your computer
- Configure AWS Glue and Crawlers
  - Utilize the AWS Glue service to automate the data cataloging and schema management tasks.
  - Set up Glue crawlers to discover and ingest the data stored in your S3 data lake.
- Visualize in Power BI
  - Connect the processed data from the MySQL database to Power BI.
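
The extract and transform steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the OpenWeather current weather endpoint with `units=imperial` (so temperatures arrive in Fahrenheit), and the helper names (`build_url`, `fetch_weather`, `to_local`, `flatten`) and the output column names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_url(city: str, api_key: str, units: str = "imperial") -> str:
    """Build the request URL; units="imperial" asks for Fahrenheit and mph."""
    query = urlencode({"q": city, "appid": api_key, "units": units})
    return f"{BASE_URL}?{query}"


def fetch_weather(city: str, api_key: str) -> dict:
    """Call the API and return the decoded JSON payload (needs network access)."""
    with urlopen(build_url(city, api_key), timeout=10) as resp:
        return json.load(resp)


def to_local(unix_ts: int, offset_seconds: int) -> str:
    """Convert a UTC Unix timestamp to a string in the city's local time."""
    tz = timezone(timedelta(seconds=offset_seconds))
    return datetime.fromtimestamp(unix_ts, tz).strftime("%Y-%m-%d %H:%M:%S")


def flatten(payload: dict) -> dict:
    """Flatten the nested API response into one row (hypothetical column names)."""
    offset = payload["timezone"]  # shift from UTC, in seconds
    return {
        "city": payload["name"],
        "observed_at": to_local(payload["dt"], offset),
        "temp_f": payload["main"]["temp"],
        "feels_like_f": payload["main"]["feels_like"],
        "humidity_pct": payload["main"]["humidity"],
        "wind_speed_mph": payload["wind"]["speed"],
        "sunrise_local": to_local(payload["sys"]["sunrise"], offset),
        "sunset_local": to_local(payload["sys"]["sunset"], offset),
    }
```

Keeping `flatten` separate from `fetch_weather` means the transformation can be checked against a saved sample response without calling the API.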
- OpenWeather API: Primary data source for weather information
- Airflow: Scheduling and orchestration of data pipeline workflows
- Amazon S3: Data lake for storing raw, unprocessed weather data
- MySQL: Data warehouse for structured and transformed weather data
- Power BI: Data visualization and dashboard creation
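
To make the hand-off between the data lake and the warehouse concrete, here is a small sketch under stated assumptions. The S3 key layout, the `weather_observations` table, and its columns are hypothetical. The load step uses Python's built-in `sqlite3` as a local stand-in for MySQL so the example runs without a database server. In MySQL, a similar idempotent write could use `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3
from datetime import datetime, timezone


def raw_key(city: str, observed_utc: datetime) -> str:
    """Hypothetical Hive-style key (city=.../date=...) for a raw JSON object."""
    slug = city.lower().replace(" ", "_")
    return (
        f"raw/weather/city={slug}/"
        f"date={observed_utc:%Y-%m-%d}/{observed_utc:%H%M%S}.json"
    )


CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS weather_observations (
    city           TEXT NOT NULL,
    observed_at    TEXT NOT NULL,
    temp_f         REAL,
    feels_like_f   REAL,
    humidity_pct   INTEGER,
    wind_speed_mph REAL,
    PRIMARY KEY (city, observed_at)
)
"""

INSERT_ROW = """
INSERT OR REPLACE INTO weather_observations
VALUES (:city, :observed_at, :temp_f, :feels_like_f,
        :humidity_pct, :wind_speed_mph)
"""


def load_rows(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Insert rows idempotently and return the total row count in the table."""
    conn.execute(CREATE_TABLE)
    # The primary key makes a retried run overwrite rows instead of duplicating.
    conn.executemany(INSERT_ROW, rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM weather_observations").fetchone()[0]
```

The `key=value` path segments are chosen because Glue crawlers can pick up Hive-style prefixes as partition columns. The composite primary key is there so that a retried Airflow task does not write the same observation twice.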
- Date Slider: The date slider filters the dashboard to a chosen date range, which makes it easy to track changes over time.
- Avg Temp By Humidity: This chart shows the relationship between average temperature and humidity levels. The fluctuations in the line graph indicate how these two factors vary together.
- Sunrise and Sunset Local Times: Displaying the local sunrise and sunset times is helpful for understanding the daylight hours.
- Avg Wind Speed by Month: This bar chart compares the average wind speeds between November and December, providing insight into seasonal wind patterns.
- Actual Temp (F) vs Feels Like (F): The scatter plot visualizes the relationship between the actual temperature and the "feels like" temperature, which takes into account factors like wind and humidity.
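
For readers who want to sanity-check the dashboard numbers outside Power BI, the sketch below shows one way the monthly wind average and the gap between actual and "feels like" temperature could be computed. It assumes rows with hypothetical fields `observed_at` (a `YYYY-MM-DD HH:MM:SS` string), `wind_speed_mph`, `temp_f`, and `feels_like_f`. It does not reproduce the report's own measures.

```python
from collections import defaultdict
from statistics import mean


def avg_wind_by_month(rows: list[dict]) -> dict[str, float]:
    """Average wind speed per calendar month, keyed by "YYYY-MM"."""
    buckets = defaultdict(list)
    for row in rows:
        month = row["observed_at"][:7]  # first 7 characters are "YYYY-MM"
        buckets[month].append(row["wind_speed_mph"])
    return {m: round(mean(v), 2) for m, v in sorted(buckets.items())}


def feels_like_gap(row: dict) -> float:
    """Feels-like minus actual temperature; negative means it feels colder."""
    return round(row["feels_like_f"] - row["temp_f"], 2)
```

A large negative gap on windy days, or a positive gap on humid days, is what the scatter plot would show as points falling away from the diagonal.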
- Machine learning weather prediction models
- Real-time alerting for extreme weather conditions
- Expand geographical coverage
- Implement more advanced data visualization techniques