31 changes: 31 additions & 0 deletions 02_activities/assignments/Assignment_3/Assignment3_vis1_Qs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
Assignment 3

The first image shows the averaged trends across all lakes.
![Annual Chlorophyll trends](Assignment_3/Assignment3_graph1.png)

Answering the questions

1. What software did you use to create your data visualization?
This graph was created with Python, mostly using code learned in this course.

2. Who is your intended audience?
The intended audience is policy makers or researchers.

3. What information or message are you trying to convey with your visualization?
This graph indicates that water nutrient levels in Ontario's lakes have been declining for many years.

4. What aspects of design did you consider when making your visualization? How did you apply them? With what elements of your plots?
Since I wanted to demonstrate the trend in water nutrients, the graph needed to show changes between years efficiently, so my initial decision was to use a line graph. I then considered how to make the graph more convincing and rigorous. Because mean values may be biased or strongly influenced by skewness or outliers, I included a graph of the median values in addition to the means. The consistency between these two graphs supports the robustness of the declining pattern. In addition, since sampling was not consistent across years, I included a third subplot to show the pattern after accounting for the different sampling frequencies in different years. Lastly, I did not include error bars for the plotted averages. In this context, variability primarily reflects seasonal effects rather than random measurement error. Because seasonal variability was not the focus of this figure, including error bars could obscure the primary interannual trend and reduce interpretability.
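
The mean-versus-median check described above can be sketched with the Python standard library alone. This is a minimal illustration, not the code from the submitted notebook: the records, the `annual_summary` helper, and the month-balanced mean are all hypothetical, and averaging within each month first is only one assumed way to account for uneven sampling frequency.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (year, month, chlorophyll) records; values are illustrative only.
records = [
    (2000, 5, 2.0), (2000, 5, 4.0), (2000, 8, 9.0),
    (2001, 5, 1.0), (2001, 8, 3.0), (2001, 8, 5.0), (2001, 8, 40.0),
]


def annual_summary(rows):
    """Return {year: (mean, median, month_balanced_mean)} for the given rows."""
    by_year = defaultdict(list)
    by_year_month = defaultdict(lambda: defaultdict(list))
    for year, month, value in rows:
        by_year[year].append(value)
        by_year_month[year][month].append(value)

    summary = {}
    for year, values in by_year.items():
        # Average within each month first so that heavily sampled months
        # do not dominate the annual figure (an assumed adjustment).
        monthly_means = [mean(v) for v in by_year_month[year].values()]
        summary[year] = (mean(values), median(values), mean(monthly_means))
    return summary


summary = annual_summary(records)
for year in sorted(summary):
    avg, med, balanced = summary[year]
    print(f"{year}: mean={avg:.2f} median={med:.2f} month-balanced={balanced:.2f}")
```

In the toy data, a single extreme value in 2001 pulls the mean well above the median. Plotting both lets a reader spot that kind of divergence.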

5. How did you ensure that your data visualizations are reproducible? If the tool you used to make your data visualization is not reproducible, how will this impact your data visualization?
Neither the data nor my graph production procedure involves randomization, so the graph should be reproducible as long as people use the same dataset and my code.

6. How did you ensure that your data visualization is accessible?
This graph was not designed for the general public. For the intended audience, terms like 'mean' or 'median' should be interpretable. Appropriate units were also included on the y-axis to ensure interpretability.

7. Who are the individuals and communities who might be impacted by your visualization?
This graph indicates a declining trend in water nutrients, which should be an issue for environment departments to consider.

8. How did you choose which features of your chosen dataset to include or exclude from your visualization?
What ‘underwater labour’ contributed to your final data visualization product?
Chlorophyll was chosen as the representative indicator of water nutrients because it was the most consistently measured chemical in the dataset and is widely recognized as a proxy for nutrient levels. Including all measured chemicals would reduce interpretability for non-specialist readers and introduce redundancy, unless there were reason to expect opposing patterns between chlorophyll and other nutrient indicators.
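
One way to check which parameter is "most consistently measured" is to count the distinct years in which each parameter appears. The sketch below assumes that definition of consistency. The parameter names, the records, and the `years_covered` helper are hypothetical and do not come from the dataset or the submitted code.

```python
from collections import defaultdict

# Hypothetical (parameter, year) pairs, one per measurement; illustrative only.
measurements = [
    ("chlorophyll a", 1976), ("chlorophyll a", 1977), ("chlorophyll a", 1978),
    ("total phosphorus", 1976), ("total phosphorus", 1978),
    ("nitrate", 1977),
]


def years_covered(rows):
    """Count the distinct years in which each parameter was measured."""
    years = defaultdict(set)
    for parameter, year in rows:
        years[parameter].add(year)
    return {parameter: len(seen) for parameter, seen in years.items()}


coverage = years_covered(measurements)
# Rank parameters from most to least consistently measured.
ranked = sorted(coverage, key=coverage.get, reverse=True)
print(ranked)
```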
32 changes: 32 additions & 0 deletions 02_activities/assignments/Assignment_3/Assignment3_vis2_Qs.md
@@ -0,0 +1,32 @@
Assignment 3

The second image shows the lake-specific water nutrient changes.
![Lake Specific water nutrient changes](Assignment3_graph2.png)


Answering the questions

1. What software did you use to create your data visualization?
This graph was created with R code in RStudio.

2. Who is your intended audience?
The intended audience is policy makers or the general public.

3. What information or message are you trying to convey with your visualization?
The main message is that Ontario's major lakes have distinct, relatively stable nutrient identities, indicating strong spatial heterogeneity (rather than a uniform pattern) across lakes. In addition, the overall shift in color from warmer to colder indicates a long-term decline in nutrients (despite some modest improvement in certain lakes recently), consistent with the pattern revealed in the first visualization.

4. What aspects of design did you consider when making your visualization? How did you apply them? With what elements of your plots?
This visualization is designed to highlight the distinct nutrient profiles of different lakes and the spatial heterogeneity of lakes across Ontario. Because temporal dynamics are not the primary focus, a heatmap was chosen instead of a multi-line graph, as it conveys the main patterns more clearly and visually. To avoid unintended emphasis from color choices, non-salient colors were selected, with warmer tones used to indicate higher nutrient levels. A numeric legend was placed on the right of the heatmap to facilitate quantitative interpretation of the patterns.

5. How did you ensure that your data visualizations are reproducible? If the tool you used to make your data visualization is not reproducible, how will this impact your data visualization?
Same as for the first visualization. Neither the data nor my graph production procedure involves randomization, so the graph should be reproducible as long as people use the same dataset and my code.

6. How did you ensure that your data visualization is accessible?
Since the intended audience of this graph includes the general public, it does not include any statistical or academic terms. The title is also more straightforward and conveys the message directly.

7. Who are the individuals and communities who might be impacted by your visualization?
This graph highlights the distinct nutrient profiles of individual lakes, which may be of interest to water companies when evaluating potential water sources for their products. It also demonstrates substantial geographic variation in lake nutrient levels across Ontario. Investigating how nutrient concentrations relate to geographic factors—and identifying the drivers of these patterns—could help environmental agencies develop more targeted and effective strategies to improve water quality in different regions.

8. How did you choose which features of your chosen dataset to include or exclude from your visualization?
What ‘underwater labour’ contributed to your final data visualization product?
This graph continues to use chlorophyll as a representative indicator of water nutrients. Log₁₀-transformed values, rather than raw concentrations, were displayed to improve visual interpretability. The transformation reduces the influence of extreme values, allowing variation across both low- and high-concentration lakes to be represented more evenly and preventing highly nutrient-rich lakes from dominating the color scale.
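
The effect of the log₁₀ transformation on a color scale can be illustrated with a small Python sketch. The lake names and values are hypothetical, and `relative_position` is an illustrative helper, not part of the submitted code. It reports where a value falls between the minimum and maximum of the scale, which is roughly what determines its color.

```python
import math

# Hypothetical annual mean chlorophyll values for three lakes; illustrative only.
lake_means = {"Lake A": 0.5, "Lake B": 5.0, "Lake C": 50.0}


def relative_position(value, values):
    """Return where `value` sits between min(values) and max(values), from 0 to 1."""
    low, high = min(values), max(values)
    return (value - low) / (high - low)


raw_values = list(lake_means.values())
log_values = [math.log10(v) for v in raw_values]

# Position of the middle lake on each scale.
raw_pos = relative_position(lake_means["Lake B"], raw_values)
log_pos = relative_position(math.log10(lake_means["Lake B"]), log_values)
print(f"raw scale: {raw_pos:.2f}, log10 scale: {log_pos:.2f}")
```

On the raw scale the middle lake is squeezed close to the low end because the nutrient-rich lake sets the range. On the log₁₀ scale it sits near the midpoint, so differences among low-concentration lakes stay visible.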
48 changes: 48 additions & 0 deletions 02_activities/assignments/Assignment_3/Graphing_code_assg3.R
@@ -0,0 +1,48 @@

###### Assignment 3 Second Visualization R code

### import necessary packages
library(readr)
library(dplyr)
library(ggplot2)
library(stringr)
library(readxl)
library(viridis)

### load the dataset
URL <- "https://files.ontario.ca/moe_mapping/downloads/2Water/GLIP/All_Lakes_GLIP_1976_2024.xlsx"
temp_file <- tempfile(fileext = ".xlsx")

download.file(URL, temp_file, mode = "wb")
water_quality <- read_excel(temp_file)


## obtain the data related to chlorophyll
chl <- water_quality %>%
  filter(str_detect(tolower(PARAMETER), "chlorophyll")) %>%
  filter(!is.na(RESULT_VALUE), !is.na(YEAR), !is.na(LAKE))

## calculate the lake-specific chlorophyll means

lake_depend_chl_mean <- chl %>%
  group_by(LAKE, YEAR) %>%
  summarise(mean_chl = mean(RESULT_VALUE), .groups = "drop") %>%
  mutate(log10_chl = log10(mean_chl))


## create a heatmap graph for lake nutrient stability
ggplot(lake_depend_chl_mean, aes(x = YEAR, y = LAKE, fill = log10_chl)) +
  geom_tile() +
  scale_fill_viridis_c(name = "log10 Chlorophyll (mg/L)", na.value = "white") +
  labs(
    title = "Lake Specific Nutrient Enrichment (Chlorophyll) Dynamics from 1976 to 2024",
    x = "Year",
    y = "Lake"
  ) +
  theme_minimal(base_size = 12)






162 changes: 162 additions & 0 deletions 02_activities/assignments/Assignment_3/Graphing_code_assg3.ipynb

Large diffs are not rendered by default.

69 changes: 69 additions & 0 deletions 02_activities/assignments/Assignment_3/assignment_3.md
@@ -0,0 +1,69 @@
# Data Visualization

## Assignment 3: Final Project

### Requirements:
- We will finish this class by giving you the chance to use what you have learned in a practical context, by creating data visualizations from raw data.
- Choose a dataset of interest from the [City of Toronto’s Open Data Portal](https://www.toronto.ca/city-government/data-research-maps/open-data/) or [Ontario’s Open Data Catalogue](https://data.ontario.ca/).
- Using Python and one other data visualization software (Excel or free alternative, Tableau Public, any other tool you prefer), create two distinct visualizations from your dataset of choice.
- For each visualization, describe and justify:
> What software did you use to create your data visualization?

> Who is your intended audience?

> What information or message are you trying to convey with your visualization?

> What aspects of design did you consider when making your visualization? How did you apply them? With what elements of your plots?

> How did you ensure that your data visualizations are reproducible? If the tool you used to make your data visualization is not reproducible, how will this impact your data visualization?

> How did you ensure that your data visualization is accessible?

> Who are the individuals and communities who might be impacted by your visualization?

> How did you choose which features of your chosen dataset to include or exclude from your visualization?

> What ‘underwater labour’ contributed to your final data visualization product?

- This assignment is intentionally open-ended - you are free to create static or dynamic data visualizations, maps, or whatever form of data visualization you think best communicates your information to your audience of choice!
- Total word count should not exceed **(as a maximum) 1000 words**

### Why am I doing this assignment?:
- This ongoing assignment ensures active participation in the course, and assesses the learning outcomes:
    * Create and customize data visualizations from start to finish in Python
    * Apply general design principles to create accessible and equitable data visualizations
    * Use data visualization to tell a story
- This would be a great project to include in your GitHub Portfolio – put in the effort to make it something worthy of showing prospective employers!

### Rubric:

| Component | Scoring | Requirement |
|-------------------|----------|-----------------------------------------------------------------------------|
| Data Visualizations | Complete/Incomplete | - Data visualizations are distinct from each other<br>- Data visualizations are clearly identified<br>- Different sources/rationales (text with two images of data, if visualizations are labeled)<br>- High-quality visuals (high resolution and clear data)<br>- Data visualizations follow best practices of accessibility |
| Written Explanations | Complete/Incomplete | - All questions from assignment description are answered for each visualization<br>- Explanations are supported by course content or scholarly sources, where needed |
| Code | Complete/Incomplete | - All code is included as an appendix with your final submissions<br>- Code is clearly commented and reproducible |

## Submission Information

🚨 **Please review our [Assignment Submission Guide](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md)** 🚨 for detailed instructions on how to format, branch, and submit your work. Following these guidelines is crucial for your submissions to be evaluated correctly.

### Submission Parameters:
* Submission Due Date: `23:59 - 11/02/2025`
* The branch name for your repo should be: `assignment-3`
* What to submit for this assignment:
    * A folder/directory containing:
        * This file (assignment_3.md)
        * Two data visualizations
        * Two markdown files, one for each visualization, with their written descriptions.
        * Link to your dataset of choice.
        * Complete and commented code as an appendix (for your visualization made with Python, and for the other, if relevant)
* What the pull request link should look like for this assignment: `https://github.com/<your_github_username>/visualization/pull/<pr_id>`
* Open a private window in your browser. Copy and paste the link to your pull request into the address bar. Make sure you can see your pull request properly. This helps the technical facilitator and learning support staff review your submission easily.

Checklist:
- [ ] Create a branch called `assignment-3`.
- [ ] Ensure that the repository is public.
- [ ] Review [the PR description guidelines](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md#guidelines-for-pull-request-descriptions) and adhere to them.
- [ ] Verify that the link is accessible in a private browser window.

If you encounter any difficulties or have questions, please don't hesitate to reach out to our team via our Slack. Our Technical Facilitators and Learning Support staff are here to help you navigate any challenges.
1 change: 1 addition & 0 deletions pyproject.toml
@@ -6,6 +6,7 @@ dependencies = [
"ipykernel>=6.30.1",
"matplotlib>=3.10.6",
"numpy>=2.3.3",
"openpyxl>=3.1.5",
"pandas>=2.3.2",
"seaborn>=0.13.2",
]