---
title: "02-report-template-example"
params:
  album: "1989"
echo: true
---

``` yaml
title: "02-report-template-example"
params:
  album: "1989"
echo: true
```
If you are looking at this code in RStudio you might see the YAML twice. We repeat it so we can show the YAML when we render this file on the Quarto website.

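What does all of this YAML mean?

* **title**: the title of the rendered document.
* **params**: the values passed into the document. Here `album` is the album (or comma-separated albums) we're filtering for.
* **echo**: `true` shows all code chunks in the rendered file, `false` hides them.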
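
The `album` value in `params` is only a default, and it can be overridden when the file is rendered. As a minimal sketch, assuming the quarto R package is installed and this file is saved as `02-report-template-example.qmd`, a render script might look something like this. The helper names `album_param()` and `render_album_report()` are hypothetical and not part of this project.

```r
# Hypothetical helper: collapse several album names into the single
# comma-separated string that the `album` parameter expects.
album_param <- function(albums) {
  paste(albums, collapse = ",")
}

# Hypothetical wrapper around quarto::quarto_render().
# The input file name is an assumption; adjust it to your project.
render_album_report <- function(albums,
                                input = "02-report-template-example.qmd") {
  quarto::quarto_render(
    input = input,
    execute_params = list(album = album_param(albums))
  )
}

# Build the string that would be passed in as params$album.
album_param(c("1989", "Lover"))
```

Calling `render_album_report(c("1989", "Lover"))` would then render the report with `params$album` set to `"1989,Lover"`.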


## Set up

```{r}
#| label: setup
#| message: false
#| warning: false
#| echo: false

library(tidyverse)
library(janitor)

```


### Importing our files

```{r}
#| label: import

taylor_songs <- read_rds("data-processed-taylor/taylor_disco.rds")

taylor_songs |> glimpse()

```


# Defining parameter(s)

Here we'll turn our parameter into a vector of album names.

```{r}
albums <- str_split_1( #<1>
  params$album, ",") #<2>

albums
```
1. Use `str_split_1()` to split a single string into pieces and return them as one character vector.
2. We pass `params$album` as the string for `str_split_1()` to split. In the initial template, that is the single value '1989'. The comma tells `str_split_1()` where one value stops and the next begins, which is useful when you want to pass in more than one album.
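
To see what the split does with more than one album, here is a small sketch using made-up parameter strings (they are illustrative values, not read from `params`). It also shows one thing to watch for: a space after the comma stays attached to the next name.

```r
library(stringr)

# A single album comes back as a length-one vector.
str_split_1("1989", ",")

# Two albums separated by a bare comma split cleanly.
str_split_1("1989,Lover", ",")

# A space after the comma stays attached to the second name,
# so " Lover" would not match "Lover" in a later filter.
with_space <- str_split_1("1989, Lover", ",")

# Trimming whitespace from each piece repairs that.
cleaned <- str_trim(with_space)
cleaned
```

The template itself does not trim, so it assumes the album names are passed with no spaces around the commas.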


Album Names: `r params$album` \
Use this to check your work ^


# Songs from `r params$album`

Now let's filter our data for the album(s) we are looking at.

```{r}
songs <- taylor_songs |>
  filter(album %in% albums) #<1>

songs
```

1. Keep every row where `album` (the column) matches any of the values in `albums` (the vector we built from the parameter).
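
Here is a rough sketch of how `%in%` behaves, using a tiny made-up table in place of `taylor_songs` (the song names are placeholders, not real data).

```r
library(dplyr)

# Toy data standing in for taylor_songs (values are made up for illustration).
toy_songs <- tibble(
  name  = c("Song A", "Song B", "Song C"),
  album = c("1989", "Lover", "Red")
)

albums <- c("1989", "Lover")

# %in% checks each row's album against every value in `albums`
# and returns one TRUE or FALSE per row.
toy_songs$album %in% albums

# filter() keeps only the rows where that check is TRUE.
kept <- toy_songs |> filter(album %in% albums)
kept
```

Because `%in%` compares against the whole vector, the same filter works whether we pass in one album or several.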


## Let's do some analysis!

Let's look at acousticness first. On Kaggle, where we got the data, the author defines acousticness like this: "A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic."

```{r}
acousticness <- songs |> arrange(acousticness |> desc()) |>
  select(
    name, album, acousticness
  )

acousticness
```

Now I want to look at the most danceable songs. Our data author tells us: "Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable."

```{r}
danceability <- songs |> arrange(danceability |> desc()) |>
  select(
    name, album, danceability
  )

danceability
```

What about popularity? It's unclear how popularity is calculated, but it is on a scale from 0 to 100.

```{r}
popularity <- songs |> arrange(popularity |> desc()) |>
  select(
    name, album, popularity
  )

popularity
```
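
The three chunks above follow the same pattern: sort by one column, highest first, then keep the name, album, and that column. One possible way to avoid repeating ourselves is a small helper function. This is only a sketch: `top_by()` is a hypothetical name, and the toy data below is made up for illustration.

```r
library(dplyr)

# Hypothetical helper: sort songs by one column, highest first,
# and keep just the name, album, and that column.
# {{ column }} lets us pass a bare column name into dplyr verbs.
top_by <- function(data, column) {
  data |>
    arrange(desc({{ column }})) |>
    select(name, album, {{ column }})
}

# Toy data standing in for `songs` (values are made up).
toy_songs <- tibble(
  name = c("Song A", "Song B", "Song C"),
  album = c("1989", "1989", "1989"),
  popularity = c(40, 90, 65),
  tempo = c(120, 95, 160)
)

ranked <- top_by(toy_songs, popularity)
ranked
```

With the real data, something like `top_by(songs, danceability)` would do the same job as the longer chunk above.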

Now, I want to look at tempo vs. danceability. We already saw the definition of danceability; tempo is measured in beats per minute (BPM). Let's make a chart.

```{r}
ggplot(songs, aes(x = tempo, y = danceability)) +
  geom_point() +
  scale_x_continuous(name = "Tempo (BPM)", n.breaks = 10) +
  scale_y_continuous(name = "Danceability", limits = c(0,1)) +
  labs(title = str_wrap(str_glue("How does the tempo affect danceability of Taylor Swift songs from album(s): {album_names}", album_names = params$album)))
```

Note that above we used `params$album` in the chart title, so the title changes based on which albums we pass in from our render file.
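
To look at the title text on its own, here is a small sketch that builds the same string outside of ggplot. The value of `album_value` is a stand-in for `params$album`, and the narrow `width` is chosen only to make the wrapping easy to see.

```r
library(stringr)

# Stand-in for params$album (illustrative value).
album_value <- "1989,Lover"

# str_glue() fills {album_names} with the value passed by name.
title <- str_glue(
  "How does the tempo affect danceability of Taylor Swift songs from album(s): {album_names}",
  album_names = album_value
)

# str_wrap() inserts line breaks so the title fits the given width.
# width = 40 is used here only to make the wrapping visible.
wrapped <- str_wrap(title, width = 40)
cat(wrapped)
```

In the chart above, `str_wrap()` is called without a `width`, so it uses its default of 80 characters.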