🤖📈📊 midterm assignment

For this assignment, you will conduct exploratory analysis on a dataset and present your findings with polished visualizations. You will work in Python in a Jupyter notebook, using pandas for your initial analysis. To create visualizations, you may use matplotlib, seaborn, or a combination of the two.

You may start with the following steps:

  1. Pick one dataset from the options listed here: sports data (Major League Soccer), astronomical data (near-earth objects), climate data (earth surface temperature data), or NYC Open Data on film permits in the city.
  2. Examine the dataset's basic attributes and determine a question or two that you would like to answer about the data. What relationships do you hope to explore between various attributes? What trends do you wonder about?
  3. Research any aspects of the dataset that you need to understand: what do certain attributes mean in the domain (subject matter)? Do you understand what all the attributes and variables mean?
  4. Explore with rough visualizations, using features such as pairplot to get a visual overview of the dataset.
  5. Finalize 2 visualizations that help answer your initial questions.
  6. Write a short report about your process, your conclusions, and your final visualizations, including any further questions you have about the data. (This written element should be 500-700 words.)

datasets!

  • ⚽️ major league soccer, matches dataset

  • 💫 open asteroid dataset

    • 33k+ "near-earth objects" and their recorded data
    • download dataset
    • sources: NASA JPL + kaggle
    • note: for a "data dictionary" of what each column means, check out this page in the "Output Selection Controls" section
  • 🌎 climate, global land temperatures dataset

  • 🎬 nyc film permits dataset

💥 remember, you do not have to visualize every attribute! what columns will you focus on?
have fun!

🔎 evaluation criteria

Your work will be evaluated based on your analysis process, final visualizations, and your short report. In general I am looking for:

  • Clear research questions related to the chosen data set
  • Relevant data transformation in pandas or other methods (csv, excel, etc.)
  • Breadth of analysis, exploring multiple questions
  • Depth of analysis, with appropriate follow up questions
  • Clear explanation of your process and choices in your report
  • 2 polished final visualizations that can "stand alone" and tell a story
  • Expressive and effective visualizations, good design choices based on what we've learned (color, shape, other channels, etc.)
  • Clear and relevant captions, titles, axes, and any other necessary labels
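To make a few of these criteria concrete, the sketch below shows one possible path from a pandas transformation to a labeled figure. The data is invented for illustration and the column names are hypothetical; it is one way to approach a final visualization, not a required template.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented yearly readings, used only to make the example runnable
df = pd.DataFrame({
    "year": [1990, 1995, 2000, 2005, 2010, 2015],
    "avg_temp_c": [14.1, 14.3, 14.2, 14.6, 14.7, 14.9],
})

# Data transformation: bucket years into decades, then average per decade
df["decade"] = (df["year"] // 10) * 10
by_decade = df.groupby("decade", as_index=False)["avg_temp_c"].mean()

# Visualization with a title, axis labels (including units), and a caption
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(by_decade["decade"], by_decade["avg_temp_c"],
        marker="o", color="tab:red")
ax.set_title("Average temperature by decade (illustrative data)")
ax.set_xlabel("Decade")
ax.set_ylabel("Average temperature (°C)")
ax.set_xticks(by_decade["decade"])  # one tick per decade, no fractional years

# Short caption placed below the axes
fig.text(0.5, 0.01, "Each point is the mean of the yearly values in that decade.",
         ha="center", fontsize=8)
fig.tight_layout(rect=(0, 0.05, 1, 1))  # leave room at the bottom for the caption
```

A figure like this can "stand alone" because the title says what is plotted, the axes carry units, and the caption explains how the values were computed.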

✉️ submit your work

  • The report text can go inside the Jupyter notebook, at the end, after your final visualizations.
  • Please use this form to submit your work by Oct. 29, 11:59pm.