-
Notifications
You must be signed in to change notification settings - Fork 2
Expand file tree
/
Copy pathplots-more.qmd
More file actions
714 lines (517 loc) · 36.8 KB
/
Copy pathplots-more.qmd
File metadata and controls
714 lines (517 loc) · 36.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
# Deeper into ggplot
In the last chapter, you were introduced to [ggplot2](https://ggplot2.tidyverse.org/index.html), the `tidyverse` package that helps you build graphics, charts, and figures. In this chapter, we'll take your `ggplot` knowledge to the next level. We encourage you to treat this chapter as a reference.
## Learning goals for this chapter
In this chapter, we will cover the following topics:
- How to prepare and build column and line charts
- How to use themes to change the looks of a chart
- More about aesthetics in layers!
- Faceting, or making multiple charts from the same data
- How to make interactive plots with `Plotly`
## References
`ggplot2` has a LOT to it and we'll cover only the basics. Check out the [Resources](resources.qmd) chapter for more references, tutorials, etc.
## Set up your notebook
This week, we'll return to our Military Surplus project, but add a new Quarto Document.
1. Open your `name-military-surplus` project.
2. Create a new Quarto Document.
3. Set the title as "Military Surplus figures"
4. Remove the rest of the boilerplate template.
5. Save the file and name it `03-ggplot.qmd`.
6. Create a setup chunk and load the following libraries: `tidyverse`, `janitor`, `palmerpenguins` and `scales`. Even though this is the first time we've used the [scales](https://ggplot2.tidyverse.org/reference/index.html#scales) package you shouldn't have to install it as it comes with the tidyverse package.
```{r}
#| label: setup
#| echo: false
#| message: false
#| warning: false
library(tidyverse)
library(janitor)
library(palmerpenguins)
library(scales)
```
And lastly we need to create a folder that we'll need later.
7. In your **Files** pane, use the **New Folder** icon to create a new folder and call it `figures`. This is where we'll export copies of our plotted charts.
### Let's get the data
We'll be importing the cleaned data that has all the states.
1. Create a new section that notes you are importing
2. Use `read_rds()` to import the file and save it into an object called `leso`. The file should be at `data-processed/01-leso-all.rds` if you named it correctly in our [analysis](leso-cleaning.qmd#leso-export) .
4. Filter the data so it only has `control_type` of **TRUE**.
3. Glimpse the data so we can see the column names
```{r}
#| label: l-import
#| code-fold: true
#| code-summary: "You should be able to do this by now!"
#| message: false
#| warning: false
leso <-
read_rds("data-processed/01-leso-all.rds") |>
filter(control_type == TRUE)
glimpse(leso)
```
Above we can see the columns were working with.
## Goal 1: Vertical column chart
For our first chart here we are looking to plot the total acquisitions value for each year in the state of Texas. It can sometimes help to draw on paper the chart you are hoping to make so you can see how to format the data to make it.
This is the data and chart we hope to build:
```{r}
#| label: t-example
#| echo: false
tx_values_exmpl <- leso |>
filter(
state == "TX",
ship_date < "2025-01-01"
) |>
group_by(year = floor_date(ship_date, unit = "year")) |>
summarize(yearly_value = sum(total_value)) |>
filter(year < "2025-12-31") # not full year
tx_values_exmpl
tx_values_exmpl |>
ggplot(aes(x = year, y = yearly_value)) +
geom_col() +
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) +
scale_x_datetime(minor_breaks = breaks_width("1 year"),
) +
labs(title = "Military surplus acquisitions in decline",
subtitle = str_wrap("After a spike in 2014, the value of controlled military surplus equipment transfered from the U.S. military to local law enforcement agencies has dropped in recent years. Controlled equipment includes items like weapons that must be returned to the military for disposal."),
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office") +
theme_minimal()
```
In order to make that, we'll need to summarize our data by the year of the shipping date, then sum together all the value of the equipment.
But we don't have this data in the format, so we need to make it.
### Working through logic
Data visualization is an iterative process: you may prepare the data, do the visualization, and then realize you need to prepare the data a different way. There can be a lot of trial and error: you're often testing different things to see what works.
There are lots of choices we could make on *how* to do this, and I explored several different paths while making this chart. While I won't go through every possible scenario, I'll try to explain why I made the decisions I did. It's not like other ways are wrong, they just have different pros and cons.
Walking through this with the chart first:
- Along the X axis we want to plot each year. Since our data starts as individual transfers and their date, this is a clue that we need to group by the year of that shipping date.
- Along the Y axis we want to plot the value of all the equipment in Texas. That's a clue that we need to summarize our data to add up the value of the equipment.
The most exploration came around how I grouped this data by year. There are a myriad of ways to do this, like using `year()` to pluck the year from the shipping date to make a number like `2023` or convert that to text as "2023". I chose to create a "floor date" so I end up with a real date because it gives me more flexibility with plotting. ggplot has some tools where I can format the date the way I want, though I admit I had to go learn about them to do this lesson!
The function [`floor_date()`](https://lubridate.tidyverse.org/reference/round_date.html) rounds a date *down* to a common value, but it remains a real date. If I convert both "2023-10-18" and "2023-10-28" to a floor_date unit of "year" then they both end up as "2023-01-01", the first day of the year. If I choose to set floor_date as a unit of "month" then both dates end up as "2023-10-01", the first day of the month in each year. It's like we are rounding the date down to a specific unit. The key here is it remains a real date with properties like year, month, week and day. I'll draw upon those properties later.
We are going to "create" this date as we group our data. You can read more about this in appendix chapter [Grouping by dates](group-dates.qmd).
### Wrangling the data
Let's filter to the data we want (Texas data before 2025, since it is not a full year in this data) and group it by the year of the `ship_date` and summarize to sum the `total_value`.
1. Add a new section: Prepare Texas total values data
2. Add a new chunk and the code below. We'll pick it apart after.
```{r}
#| label: tx-data-prep
tx_values <- leso |> # <1>
filter(
state == "TX", # <2>
ship_date < "2025-01-01" # <2>
) |>
group_by(year = floor_date(ship_date, unit = "year")) |> # <3>
summarize(yearly_value = sum(total_value)) # <4>
tx_values # <5>
```
1. We start by creating a new object to shove all this stuff into: `tx_values`. Then we start with the data ...
2. You should be familiar enough with the filtering by now to understand what we are doing, but might not realize *why*. Filtering for just Texas rows makes sense because that is our focus, but you might not snap that the data for the most recent year is incomplete until you see the plot. It's important to recognize the date range of the data you are plotting so you don't compare a partial year to a complete one.
3. Inside this `group_by()` we have the code that makes our floor date.
We start as we often do by naming the new column made by the group: `year`.
We then use the `floor_date()` function on the `ship_date` variable, setting the unit to "year".
You can see the result of this in the return: Any row with a date within the year "2010" was given a date of "2010-01-01" and then summed together.
4. The last line is the `summarize()` function where we add all the `total_value`s together. We first named the column `yearly_value` since that is what we are creating.
5. At the end, we print out the contents of the `tx_values` object we created.
### Plot the chart
Alright, so now let's get ready to use the `ggplot()` function. I want you to create the plot here one step at a time so you can review how the layers are added.
1. Start a new section. Note we are plotting Texas values.
2. Add and run the ggplot() line first (but without the `+`)
3. Then add the `+` and the `geom_col()` and run it.
5. Then add the `+` and `labs()`, and run everything.
```{r}
#| label: t-plot
ggplot(tx_values, aes(x = year, y = yearly_value)) + # <1>
geom_col() + # <2>
labs( # <3>
title = "Military surplus acquisitions in decline",
subtitle = str_wrap("After a spike in 2014, the value of controlled military surplus equipment transfered from the U.S. military to local law enforcement agencies has dropped in recent years. Controlled equipment includes items like weapons that must be returned to the military for disposal."),
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office"
)
```
1. we create the graph
2. adding the columns
3. we add the labels. Note the `str_wrap()` function we've used with `subtitle =` which keeps the text from running outside the chart.
::: callout-important
### A new way to wrap text
I don't have time to rework the book and videos right now, but I did recently learn about another, perhaps easier way to wrap headlines and subtitles. This [R for the Rest of Us video](https://www.youtube.com/watch?v=N_bjKtckK7Q) explains it. Feel free to try that.
:::
### Cleaning Up
Alright, so we have a working plot! But there's a couple things that are a bit ugly about this plot. First, I'm not digging the weird numbers on the side (what the heck does `1e+07` even mean?!). If we go back up and look at the output from `tx_values` we can see the `yearly_value` numbers are pretty large. These large numbers are causing R to read our numbers as "scientific notation" (a math-y way of reading large numbers). For example, the total cost of supplies in 2014 was `42,949,729` (that's the first spike in our figure, around the `4e + 07` mark). But what a pain to read!
#### Using scales
There are a number of ways we could fix this. We could go back into the code block where we prepare the data and divide that value by "1000000" and call the values "per million". Or, we can do what I did and use the [scales](https://scales.r-lib.org/) package to "scale" the output for presentation without changing the underlying number.
Using functions from the scales package can be esoteric and unintuitive, but most of the things you would regularly need are covered in the [Position and Scales chapter of the ggplot2 book](https://ggplot2-book.org/scales-position.html). It is from there that we learn about using a function `scale_y_continuous()`, the `label =` argument and the the `scale_dollar()` function.
1. After the `geom_line()` line, add the `scale_y_continuous` function as indicated below.
```{r}
#| label: t-label-collar
ggplot(tx_values, aes(x = year, y = yearly_value)) +
geom_col() +
scale_y_continuous(labels = label_dollar()) + # <1>
labs(
title = "Military surplus acquisitions in decline",
subtitle = str_wrap("After a spike in 2014, the value of controlled military surplus equipment transfered from the U.S. military to local law enforcement agencies has dropped in recent years. Controlled equipment includes items like weapons that must be returned to the military for disposal."),
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office"
)
```
1. This is a layer to adjust the y axis scale.
I added this in the middle of the code because I like to keep the `labs()` at the bottom of the chart. It doesn't really matter. What it did was change the labels on the Y axis to use a currency format. Let's walk through it:
- The [`scale_y_continous()`](https://ggplot2.tidyverse.org/reference/scale_continuous.html) function allows modification to the Y axis, and there is a companion for the X axis. The term "continuous" refers to the values being numbers. There are similar functions for categories ([`scale_*_discrete()`](https://ggplot2.tidyverse.org/reference/scale_discrete.html)) and dates ([`scale_*_date()`](https://ggplot2.tidyverse.org/reference/scale_date.html)). They all allow modification of things like the labels, grid lines and axis names. (The `*` there is placeholder for `x` or `y`.)
- We use that function to set our "labels" with another function from the scales package, `label_dollar()`.
Again, the best resource for these type of things is the [Scales](https://ggplot2-book.org/scales.html) chapters in the ggplot2 book.
To show just how much granularity you have with ggplot, let's make one more change to this chart. Those dollar values are still pretty large, and it might be nice to show them in millions of dollars, like "\$10M".
2. **Edit** the `label_dollar()` function to add this argument inside: `scale_cut = cut_long_scale()`.
```{r}
#| label: t-scale-cut
ggplot(tx_values, aes(x = year, y = yearly_value)) +
geom_col() +
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) + # <1>
labs(
title = "Military surplus acquisitions in decline",
subtitle = str_wrap("After a spike in 2014, the value of controlled military surplus equipment transfered from the U.S. military to local law enforcement agencies has dropped in recent years. Controlled equipment includes items like weapons that must be returned to the military for disposal."),
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office"
)
```
1. We're editing this line to add the `scale_cut` argument to `label_dollar()`.
Now those Y axis labels are nice and easy to read and understand.
#### Setting breaks
Let's show one more change. Notice how the grid lines along the X axis don't fall along every year? We are getting our "breaks" set every five years, but our "minor breaks" (the fainter lines between the five year increments) sit in the middle of a year. That makes it a little harder to read each year, or even to understand each of those dots fall on a year. We can use the `scale_x_date()` function to change those by adjust those minor breaks.
3. In the global `aes()` function, wrap the `date` value inside a function called `date()`. Currently our "date" is really a calendar date/time datatype called `<POSIXct>`. I've found our next function can be pretty picky about wanting a specific `<date>` datatype so we are adjusting it to avoid an error.
4. After the scale_y layer, add a new `scale_x_date()` later as indicated below.
```{r}
#| label: t-minor-breaks
tx_values |>
ggplot(aes(x = date(year), y = yearly_value)) + # <1>
geom_col() +
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) +
scale_x_date(minor_breaks = breaks_width("1 year")) + # <2>
labs(
title = "Military surplus acquisitions in decline",
subtitle = str_wrap("After a spike in 2014, the value of controlled military surplus equipment transfered from the U.S. military to local law enforcement agencies has dropped in recent years. Controlled equipment includes items like weapons that must be returned to the military for disposal."),
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office"
)
```
1. Surround "year" with `date()` function.
2. This is the added later for `scale_x_date()`.
OK, now the fainter "minor breaks" lines fall on each year, making them a little easier for the eye to track. Again, I had to [do some digging](https://ggplot2-book.org/scales-position.html#sec-date-minor-breaks) to figure that out.
### Saving plots as an objects
Sometimes it is helpful to push the results of a plot into an R object to "save" those configurations. You can continue to add layers to this object, but you won't need to rebuild the main portions of the chart each time. We'll do that here so we can explore themes next.
1. **Edit** your Texas plot chunk to save it into an R object called `tx_plot`.
2. Then call it after the code so you can see it.
```{r}
#| label: t-object
tx_plot <- ggplot(tx_values, aes(x = date(year), y = yearly_value)) + # <1>
geom_col() +
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) +
scale_x_date(minor_breaks = breaks_width("1 year")) +
labs(
title = "Military surplus acquisitions in decline",
subtitle = str_wrap("After a spike in 2014, the value of controlled military surplus equipment transfered from the U.S. military to local law enforcement agencies has dropped in recent years. Controlled equipment includes items like weapons that must be returned to the military for disposal."),
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office"
)
tx_plot # <2>
```
1. We edit this line to add the object at the beginning.
2. Since we saved the plot into an R object above, we have to call it again to see it. We save graphs like this so we can reuse them.
We can continue to build upon the `tx_plot` object like we do below with themes, but those changes won't be "saved" into the R environment unless you assign it to an R object.
## Themes
The *look* of the graph is controlled by the theme. There are a number of [preset complete themes](https://ggplot2.tidyverse.org/reference/ggtheme.html) you can use. Let's look at a couple.
1. Create a new section saying we'll explore themes.
2. Add the chunk below and run it.
```{r theme-minimal}
tx_plot +
theme_minimal()
```
This takes our existing `tx_plot` and then applies the `theme_minimal()` look to it. This is actually our "Goal graphic" that we were shooting for at the beginning of the lesson!
There are a number of themes built into ggplot, most are pretty simplistic.
1. Edit your existing chunk to try different themes. Some you might try are `theme_classic()`, `theme_dark()` and `theme_void()`. You can see [a list here](https://ggplot2.tidyverse.org/reference/ggtheme.html).
### More with ggthemes
There are a number of other packages that build upon `ggplot2`, including [`ggthemes`](https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/).
1. In your R console, install the `ggthemes` package using the `install.packages()` function: `install.packages("ggthemes")`
2. Add the `library(ggthemes)` at the top of your current chunk.
3. Update the theme line to view some of the others options noted below.
```{r theme-economist}
library(ggthemes)
tx_plot +
theme_economist()
```
```{r theme-538}
tx_plot +
theme_fivethirtyeight()
```
```{r theme-stata}
tx_plot +
theme_stata()
```
There is also a `theme()` function that allows you individually [adjust about every visual element](https://ggplot2.tidyverse.org/reference/theme.html) on your plot. It's a quite robust function with many arguments, but the website [ggplot2tor has a great app to help](https://ggplot2tor.com/theme).
We do a wee bit of that later.
## Goal 2: Multiple line chart
In this new plot, we'll learn about a new `geom` layer: `geom_line()` (recall that in [the last chapter](plots.qmd), we learned about `geom_point()` and `geom_col()`).
OK, our Texas military surplus information is fine ... but how does that compare to neighboring states? Let's work through building a new chart that shows all those steps. We can't use a column chart like we did above, because we need to display more than one state at a time to compare them.
Here is the data and chart we are trying to make:
```{r}
#| label: region-example
#| echo: false
#| warning: false
options(dplyr.summarise.inform = FALSE)
region_data_exp <- leso |>
filter(
state %in% c("TX", "OK", "AR", "NM", "LA")
) |>
mutate(year = floor_date(ship_date, unit = "year") |> date()) |>
group_by(state, year) |>
summarize(yearly_value = sum(total_value))
region_data_exp
ggplot(region_data_exp,
aes(x = year, y = yearly_value)) +
geom_point(aes(color = state)) +
geom_line(aes(color = state)) +
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) +
scale_x_date(minor_breaks = breaks_width("1 year")) +
labs(title = "Military surplus acquisitions track among Texas, bordering states",
subtitle = str_wrap("State and local law enforcement agencies can request surplus military equipment from the U.S. Department of Defence, paying only for the shipping. Agencies in Texas and bordering states have followed similar trends in requests since 2010, with declining requests in recent years."),
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office",
color = "State")
```
### Prepare the data
We need to go back to our original `leso` data to get the additional states. But we also know now that we need to create a "floor_date" that has a `<date>` datatype. Let's do that as we prepare the data instead of while we are grouping. I'll sometimes go back into my cleaning notebook to add something like this, but we won't go quite that far with this.
In the interest of time, I'll supply the code with annotations to explain each line.
1. Add a new headline and text to describe you are building this multi-state chart.
2. Add the code chunk below. Make sure you understand what each line is accomplishing. (You might even build it line by line so you can see that!)
```{r}
#| label: region-data
#| code-fold: false
#| code-summary: "Give it a go, but the code is here if you need it."
region_data <- leso |> #<1>
filter(state %in% c("TX", "OK", "AR", "NM", "LA")) |> #<2>
mutate(
year = floor_date(ship_date, unit = "year") |> date() #<3>
) |>
group_by(state, year) |> #<4>
summarize(yearly_value = sum(total_value)) #<5>
region_data #<6>
```
1. We create a new object to hold the data. We fill that starting with `leso`.
2. We filter the `state` column for our list of five states. Note we are using `%in%` like we did in Chapter 6, but we're doing it right inside the filter.
3. Here, inside a `mutate()`, we create the new variable `year` and fill it with the floor_date of our shipping date based on the year. We then turn that into a `<date>` datatype.
4. We group by `state` and `year`. This is much cleaner now that we've already created a `year` value.
5. Our summarize to total the costs.
6. And lastly, we print the new object so we can see it.
### Plot the chart
In our plot we need a different line for each state. To do this you would use the color aesthetic `aes()` in the `geom_line()` geom. Recall that geoms can have their own `aes()` variable information. This is especially useful for working with a third variable (like when making a stacked bar chart or line plot with multiple lines). Notice that the color aesthetic (meaning that it is in `aes`) takes a variable, not a color. You can learn how to change these colors [here](http://www.sthda.com/english/wiki/ggplot2-colors-how-to-change-colors-automatically-and-manually). For now, though, we'll use the `ggplot` default colors.
1. Add a note that we'll now build the chart.
2. Add the code chunk below and run it. Look through the comments so you understand it.
```{r}
#| label: region-lcolor
ggplot(region_data, # <1>
aes(x = year, y = yearly_value)) + # <2>
geom_point() + # <3>
geom_line(aes(color = state)) + # <4>
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) + # <5>
scale_x_date(minor_breaks = breaks_width("1 year")) + # <6>
labs(title = "Military surplus acquisitions track among Texas, bordering states",
subtitle = str_wrap("State and local law enforcement agencies can request surplus military equipment from the U.S. Department of Defence, paying only for the shipping. Agencies in Texas and bordering states have followed similar trends in requests since 2010, with declining requests in recent years."),
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office") # <7>
```
1. Plotting the `region_data` object
2. Set set the x and y axis
3. Adds the points
4. Adds the line, coloring them based on the `state` values
5. Fixes our labels on the y axis.
6. Fixes our minor breaks on the x axis.
7. Adding all our labels with `labs()` arguments.
Adding this aesthetic not only added color to the lines, it added a legend so we can tell which is which. It's important to understand that we had to use an `aes()` function here because we were pulling from the data to decide how many colors to use. If we had tried to build this plot without the added color aesthetic, it wouldn't plot right because it wouldn't how how to split the lines.
Notice that only the lines changed colors, and not the points? This is because we only included the aesthetic in the `geom_line()` geom and not the `geom_point()` geom.
1. **Edit** your `geom_point()` to add `aes(color = state)`.
2. Also edit the `labs()` function to add `color = "State"`.
```{r}
#| label: region-pcolor
ggplot(region_data,
aes(x = year, y = yearly_value)) +
geom_point(aes(color = state)) +#<1>
geom_line(aes(color = state)) +
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) +
scale_x_date(minor_breaks = breaks_width("1 year")) +
labs(title = "Military surplus acquisitions track among Texas, bordering states",
subtitle = str_wrap("State and local law enforcement agencies can request surplus military equipment from the U.S. Department of Defence, paying only for the shipping. Agencies in Texas and bordering states have followed similar trends in requests since 2010, with declining requests in recent years."),
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office",
color = "State") #<2>
```
1. Color aesthetic added here.
2. Adding `color = "State"` here redraws the title above the color legend.
OK, you have a line chart for reference!
## On your own
I want you to make a line chart of military surplus acquisitions in three states that are **different** from the five we used above. This is very similar to the chart you just made, but with different values.
Some things to do/consider:
1. Do this in a new section and explain it.
2. You'll need to prepare the data just like we did above to get the right data points and the right states.
3. I really suggest you build both chunks (the data prep and the chart) one line at a time so you can see what each step adds.
## Tour of some other adjustments
These examples show how to tweak different parts of charts.
1. Add a headline/text that says you'll tour plot variables.
2. Add the code below and run it.
3. As you go through the other examples in this section you can just adjust your current chunk instead of adding new ones.
### Line width
```{r line-width}
ggplot(region_data,
aes(x = year, y = yearly_value)) +
geom_point(aes(color = state)) +
geom_line(aes(color = state), size = 1.5) + #<1>
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) +
scale_x_date(minor_breaks = breaks_width("1 year")) +
labs(x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office",
color = "State")
```
1. You can set the line width for a geom, but note it is taking an inputted value, not one from the data. This is why it is outside the `aes()` function.
### Line type
This example adds a `linetype = state` to the ggplot aesthetic. This gives each state a different type of line.
```{r}
#| label: linetype
ggplot(region_data,
aes(x = year, y = yearly_value)) +
geom_point(aes(color = state)) +
geom_line(aes(color = state, linetype = state), size = .5) + #<1>
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) +
scale_x_date(minor_breaks = breaks_width("1 year")) +
labs(x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office",
color = "State")
```
1. We add the linetype here, but have to put it INSIDE the `aes()` now because we are "mapping" the type based on values in the `state` variable.
But note that ggplot added a new legend because it takes the name from the variable "state". We can fix this through overiding it in the `labs()` function.
```{r}
#| label: linetype-lab
ggplot(region_data,
aes(x = year, y = yearly_value)) +
geom_point(aes(color = state)) +
geom_line(aes(color = state, linetype = state), size = .5) +
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) +
scale_x_date(minor_breaks = breaks_width("1 year")) +
labs(x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office",
color = "State", linetype = "State") #<1>
```
1. We add `linetype = "State"` to the `labs()` function, which would collapse everything into a single legend.
## Facets
Facets are a way to make multiple graphs based on a variable in the data, as it creates a bunch of mini charts you can compare among each other. There are two types, the `facet_wrap()` and the `facet_grid()`. There is a good explanation of these in [R for Data Science](https://r4ds.hadley.nz/layers.html#facets).
We'll start with the chart we have already, and then split it into facets by state.
1. Start a new section about facets
2. Add the code below to create your chart and view it. This is the same plot we've already created
```{r}
#| label: facet-wrap
ggplot(region_data,
aes(x = year, y = yearly_value)) +
geom_point(aes(color = state)) +
geom_line(aes(color = state, linetype = state), size = .5) +
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) +
scale_x_date(minor_breaks = breaks_width("1 year")) +
labs(title = "Military surplus acquisitions track among Texas, bordering states",
subtitle = str_wrap("State and local law enforcement agencies can request surplus military equipment from the U.S. Department of Defence, paying only for the shipping. Agencies in Texas and bordering states have followed similar trends in requests since 2010, with declining requests in recent years."),
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office",
color = "State", linetype = "State") +
facet_wrap(~ state) #<1>
```
1. This is the only line we added, setting a facet wrap to split on the state.
Since each state facet is labeled, we don't really need the color legend, so we can use the [`theme()`](https://ggplot2.tidyverse.org/reference/theme.html) function to override that.
3. Add the last line here to remove the legend.
```{r}
#| label: facet-legend
ggplot(region_data,
aes(x = year, y = yearly_value)) +
geom_point(aes(color = state)) +
geom_line(aes(color = state, linetype = state), size = .5) +
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) +
scale_x_date(minor_breaks = breaks_width("1 year")) +
labs(title = "Military surplus acquisitions track among Texas, bordering states",
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office",
color = "State", linetype = "State") +
facet_wrap(~ state) +
theme(legend.position = "none") #<1>
```
1. Here we set the legend.position to "none", which is the answer to remove it. We could also set that to "bottom" or "left" to move it, if that was our desire. It is not, but you can try it if you like.
### Facet grids
A `facet_grid()` allows you to plot on a combination of variables. We don't really have enough variables to compare with our military surplus data but we can show this with the `penguins` data we used in the last chapter. You should've added the library in your setup section.
This chart compares the flipper length and body mass, but it does so for each species of bird by their gender.
```{r}
#| label: facet-grid
#| message: false
#| warning: false
penguins |>
drop_na() |> #<1>
ggplot(aes(x = flipper_length_mm, y = body_mass_g)) + #<2>
geom_point(aes(color = sex)) + # <3>
facet_grid(sex ~ species) # <4>
```
1. This drops any rows that have missing values for the variables we use, as some don't have sex marked.
2. Our ggplot chart, much like we did in the last chapter, comparing flippers to weight.
3. We add the points and color them to distinguish the sex of the birds.
4. We add the `facet_grid` that splits the chart by both sex and species.
## Saving plots as files
Getting a ggplot chart to look just right can be challenging because the rendering of it in your notebook is dependent on your screen size and a bunch of other variables. When I'm having trouble sizing everything right, I follow these tips:
- I use `str_wrap()` around my subtitle to keep it from running off the page.
- Once I get the labs and themes as close as I can, I push the chart into an R object.
- BEFORE printing that object to my screen, I save the plot to my hard drive using the `ggsave()` function. It's important to save it to disc before printing to your screen because that changes the sizing. Following R convention, I save plots into a folder called `figures/`. (You might need to create a "figures" folder first!)
- I then use [Markdown to print the graphic](https://quarto.org/docs/authoring/markdown-basics.html#links-images) to the notebook as an image using using this syntax. It's like making a hyperlink, but it starts with `!` before the square brackets. The text inside the square brackets end up as caption displayed below the chart: ``.
Here is how I would do this with our first line chart:
1. Make a new section saying you are practicing saving a plot
1. In your Files pane, use the **New folder** button to create a folder called `figures`.
1. Add the chunk below and run it.
```{r}
#| label: chart-to-save
tx_plot_2save <- ggplot(tx_values, aes(x = date(year), y = yearly_value)) + # <1>
geom_col() +
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) +
scale_x_date(minor_breaks = breaks_width("1 year")) +
labs(
title = "Military surplus acquisitions in decline",
subtitle = str_wrap("After a spike in 2014, the value of controlled military surplus equipment transfered from the U.S. military to local law enforcement agencies has dropped in recent years. Controlled equipment includes items like weapons that must be returned to the military for disposal."),
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office"
) +
theme_minimal()
ggsave("figures/tx-plot.png")
```
Now that the chart is saved to your computer, you can use Markdown to display the image in your notebook.
1. After your code chunk add the Markdown code below. This is in the text part of your notebook like a sentence or headline, but you are inserting an image. Note I'm leaving out the caption text between the square brackets because we already have a headline on the chart.
```md

```
When you add this, it should renders like this in your notebook:

It might not look very different here, but in my experience it is the most consistent method to get a nice plot display.
Another key advantage to this is you can now use can reference that same plot image on your project index when you are explaining the findings of your project.
## Interactive plots
Want to make your plot interactive? You can use [plotly](https://plotly.com/r/)'s `ggplotly()` function to transform your graph into an interactive chart.
To use plotly, you'll want to install the plotly package, add the library, and then use the ggplotly() function:
1. In your R Console, run `install.packages("plotly")`. (You only have to do this once on your computer.)
2. Add a new section to note you are creating an interactive chart.
3. Add the code below and run it. Then play with the chart!
```{r}
#| label: plotly
#| message: false
#| warning: false
library(plotly)
region_plot <- ggplot(region_data, #<1>
aes(x = year, y = yearly_value)) +
geom_point(aes(color = state)) +#<1>
geom_line(aes(color = state)) +
scale_y_continuous(labels = label_dollar(scale_cut = cut_long_scale())) +
scale_x_date(minor_breaks = breaks_width("1 year")) +
labs(title = "Military police",
x = "Year", y = "Value of acquisitions",
caption = "Source: Law Enforcement Support Office",
color = "State")
region_plot |>
ggplotly() #<2>
```
1. We saved our good plot into an object.
2. Then we pipe that object into the `ggplotly()` function, which makes it interactive.
Now you have tool tips on your points when you hover over them.
The `ggplotly()` function is not perfect. For instance, I've shortened the title to the point of uselessness because it ran on top of the controls. It also doesn't support the labs `subtitle = ` argument. I'm sure these issues could be overcome with some effort. Alternatively, you can use plotly's own syntax to build some quite interesting charts, but it's a whole [new syntax to master](https://plotly.com/r/).
## What we learned
There is so much more to `ggplot2` than what we've shown here, but these are the basics that should get you through the class. At the top of this chapter are a list of other resources to learn more.