🤖📈📊 midterm assignment

For this assignment, you will conduct exploratory analysis on a dataset and present your findings with polished visualizations. You will work in Python in a Jupyter notebook, using pandas for your initial analysis. To create visualizations, you may use matplotlib, seaborn, or a combination of the two.

You may start with the following steps:

Pick 1 dataset from the options listed here: sports data (Major League Soccer), astronomical data (near-earth objects), climate data (earth surface temperature data), or NYC Open Data on film permits in the city.
Examine the dataset's basic attributes and determine a question or 2 that you would like to answer about the data. What relationships do you hope to explore between various attributes? What trends do you wonder about?
Research any basic aspects of the dataset that you need to understand. What do certain attributes mean in the domain (subject matter)? Do you understand what all the attributes and variables mean?
Explore with rough visualizations, using features such as seaborn's pairplot to get a visual overview of the dataset.
Finalize 2 visualizations that help answer your initial questions.
Write a short report about your process, your conclusions, and your final visualizations, including any further questions you have about the data. (This written element should be 500-700 words.)

datasets!

⚽️ major league soccer, matches dataset
- every match from 1996 - 2022, various stats
- download dataset
- sources: ESPN data, via jvmohr + kaggle
💫 open asteroid dataset
- 33k+ "near-earth objects" and their recorded data
- download dataset
- sources: NASA JPL + kaggle
- note: for a "data dictionary" of what each column means, check out this page in the "Output Selection Controls" section
🌎 climate, global land temperatures dataset
- 3 tables: temperatures by major city, country, and global overview, 1750-today
- download dataset
- sources: Berkeley Earth + kaggle
🎬 nyc film permits dataset
- information about all film permits given by the city in 2023
- download dataset
- sources: NYC Open Data

💥 remember, you do not have to visualize every attribute! what columns will you focus on?
have fun!

🔎 evaluation criteria

Your work will be evaluated based on your analysis process, final visualizations, and your short report. In general I am looking for:

Clear research questions related to the chosen data set
Relevant data transformation in pandas or with other tools (CSV, Excel, etc.)
Breadth of analysis, exploring multiple questions
Depth of analysis, with appropriate follow up questions
Clear explanation of your process and choices in your report
2 polished final visualizations that can "stand alone" and tell a story
Expressive and effective visualizations, good design choices based on what we've learned (color, shape, other channels, etc.)
Clear and relevant captions, titles, axes, and any other necessary labels
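
As a rough illustration of the transformation and labeling criteria above, here is a minimal sketch that aggregates a table in pandas and then draws a labeled matplotlib chart. The data and the column names (`year`, `avg_temp_c`) are invented for this example and do not come from any of the datasets.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data (made-up values, two readings per year)
df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2021, 2021],
    "avg_temp_c": [14.1, 14.5, 14.4, 14.8, 14.6, 15.0],
})

# Transformation: collapse multiple readings into one mean per year
yearly = df.groupby("year")["avg_temp_c"].mean()

# Visualization with an explicit title, axis labels, and tick marks
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(yearly.index, yearly.values, marker="o", color="tab:red")
ax.set_title("Mean temperature by year (made-up data)")
ax.set_xlabel("Year")
ax.set_ylabel("Mean temperature (°C)")
ax.set_xticks(yearly.index)  # show whole years only, no fractional ticks
fig.tight_layout()
```

The point of the sketch is that a reader should be able to understand the chart without reading the surrounding notebook: the title states what is shown, and both axes name their quantity and unit.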

✉️ submit your work

The report text can go inside the Jupyter notebook, at the end, after your final visualizations.
Please use this form to submit your work by Oct. 29, 11:59pm.
