-
Notifications
You must be signed in to change notification settings - Fork 2
Expand file tree
/
Copy pathintro.qmd
More file actions
412 lines (255 loc) · 23.8 KB
/
Copy pathintro.qmd
File metadata and controls
412 lines (255 loc) · 23.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
# Introduction to R {#intro}
## About Quarto, R and scripted journalism
Before we dive into RStudio and programming and all that, I want to show you where we are heading, so you can "visualize success". As I wrote in the book intro, I'm a believer in Scripted Journalism ... data journalism that is repeatable, transparent and annotated. As such, the whole purpose of this book is to train you **to create documents to share your work**.
The best way to explain this is to show you an example.
1. Go to this link in a new browser window: [Major League Soccer salaries](https://christianmcdonald-mls-salaries.share.connect.posit.cloud).
This is a website with all the code from a data journalism project. The home page has an overview of the findings from the analysis. If you click on the navigation link for [Cleaning](https://christianmcdonald-mls-salaries.share.connect.posit.cloud/01-cleaning.html) you can read where the data come from and see all the steps I went through -- with code and explanation -- to process the data so I could work with it. And in the [Analysis](https://christianmcdonald-mls-salaries.share.connect.posit.cloud/02-analysis.html) notebook you'll see I set out with some questions for the data, and then I wrote the code to find my answers. Along with the way I wrote explanations of how and why I did what I did.

This website was created using Quarto and R, and the tool I used to write it was the RStudio IDE. Here's the crazy part: **I didn't have to write any HTML code,** I just wrote my thoughts in text and my code in R. With a few short lines of configuration and a two-word command `quarto publish` I was able to publish my work on the Internet for free.
Keep this in mind:
- We can also easily publish the same work in other formats, like PDF, Word or even a slide show.
- We can also choose NOT to "publish" our work. We don't *have* to share our work on the internet, we are just ready if we want to.
Creating shareable work is our goal **for every project**. Let's get started.
## RStudio tour
::: callout-warning
If you are using the online **posit.cloud version of RStudio**, then some steps have to be done differently, mainly in dealing with project creation and exporting. I'll try to note differences, but may not go into great detail in the book. See the [Using posit.cloud](posit-cloud.qmd) for more details.
In this case, you need go to your posit.cloud account and then use the blue **New Project** button to launch a new RStudio project, and then continue below.
:::
When you launch RStudio, you'll get a screen that looks like this (but without the pretty arrows, which we'll cover in class).

## Updating preferences
There are some preferences in RStudio that I would like you to change.
### Saving workspace
By default, the program wants to save the state of your work (all the variables and such) when you close a project, [but that is typically not good practice](https://www.r-bloggers.com/2017/04/using-r-dont-save-your-workspace/). We'll change that.
1. Go to the **Tools** menu and choose **Global Options**.
2. Under the **General** tab, uncheck the first four boxes.
3. On the option "Save Workspace to .Rdata on exit", change that to **Never**.
4. Click *Apply* to save the change (but don't close the box yet).

### Setting native pipe
Next we will set some values in the **Code** pane.
1. On the left options, click on the **Code** pane.
2. Check the box for **Use native pipe operator, \|\>**.
3. Click **OK** to save and close the box.

We'll get into why we did this part later.
### Setting list style
Lastly, we'll set my preferred list style.
1. Click on the **R Markdown** option
1. Click on the **Visual** tab.
1. For **Default spacing between list items**, choose _tight_.

Explaining this change in full gets far into the weeds, but basically we are setting the space between list items when we write Markdown.
## The R Package environment
R is an open-source language, which means that other programmers can contribute to how it works. It is what makes R beautiful.
What happens is developers will find it difficult to do a certain task, so they will write code that solves that problem and save it into an R "package" so they can use it later. They share that code with the community, and suddenly the R garage has an ["ultimate set of tools"](https://youtu.be/Y1En6FKd5Pk?t=24) that would make Spicoli's dad proud.
One set of these tools is the [tidyverse](https://www.tidyverse.org/) developed by Hadley Wickham and other developers at Posit. It's a set of R packages for data science that are designed to work together in similar ways. I worship the tidyverse worldview and we'll use these tools extensively. While not required reading, I highly recommend Wickham's book [R for Data Science](https://r4ds.hadley.nz/), which is free.
There are also a series of useful tidyverse [cheatsheets](https://posit.co/resources/cheatsheets/) that can help you as you use the packages and functions from the tidyverse. We'll refer to these throughout the course.
We will use these tidyverse packages extensively throughout the course. We'll install some other packages later when we need them.
### How we use packages
I'll show you how in a sec, but know there are two steps to using an R package:
- **Install the package** onto your computer by using `install.packages("package_name")`. You only have to do this once for each computer, so I usually do it using the R Console instead of in a notebook.
- **Include the library** using `library(package_name)`. This has to be done for each notebook or script that uses it, so it is usually one of the first things you'll see in a notebook.
::: callout-note
We use "quotes" around the package name when we are installing, but we DON'T need quotes when we load the library.
:::
### Install some packages
We need to install some packages before we can go further. To do this, we will use the **Console**, which we haven't talked about much yet.

1. Use the image above to orient yourself to the R Console and Terminal. (But note the code we use is a little different.)
2. Copy the code below and paste it into the Console, then hit **Return**.
``` r
install.packages(c("quarto", "rmarkdown", "tidyverse", "janitor"))
```
You'll see a bunch of response fly by in the Console. It's probably all fine unless it ends the last response with an error.
If you are using the RStudio IDE app on your computer, you only have to use `install.packages()` on each package once. However, if you are using the online posit.cloud version of RStudio, you'll have to do this for **each new project** because each project is a new virtual computer. Unless you start with our template, which is explained in the [Using posit.cloud](posit-cloud.qmd) chapter.
OK, we're done with all the computer setup. Let's get to work.
## Starting a new Quarto Website
::: callout-important
This is a case where posit.cloud differs greatly. See [Using posit.cloud](posit-cloud.qmd) to see how to use our template our build a new Quarto Website project.
:::
When we work in RStudio, we will create "Projects" to hold all the files related to one another. This sets the "working directory", which is a sort of home base for the project.
1. In the icon list at the top of your program, click on the second button that is a hexagon with a green `+R` sign.
2. That brings up a box to create the project with several options. You want **New Directory** in this case.
3. For **Project Type**, choose **Quarto Website**.
4. Next, for the **Directory name**, choose a new name for your project folder. For this project, use "firstname-practice" but use YOUR firstname.
5. For the sub directory, you want to use the **Browse** button to find your `rwd` folder we created earlier.
I want you to be anal retentive about naming your folders. It's a good programming habit.
- Use lowercase characters.
- Don't use spaces. Use dashes.
- For this class, start with your first name.

When you hit **Create Project**, your RStudio window will refresh and two things will happen:
- You'll see a number of files listed in your **Files** window.
- A new document window will open on the left.
### The files pane

Let's walk through the files created and explain what they are in order of importance.
- The `_quarto.yml` file is a YAML configuration file for your project. This allows us to set publishing rules for our Quarto project. In the future we'll adjust this file a bit, but not today.
- The `index.qmd` file is a Quarto document that makes the "home page" of your website. In this book you'll use the index to describe your project and record other important information related to it. Each Quarto file you create can become a new page in your website, and they all end in `.qmd`, which stands for Quarto Markdown. (We might also encounter and edit `.Rmd` files, which are very similar, but just less awesome.) The `about.qmd` file is another Quarto document created as a placeholder. We'll usually rename/reuse that or delete it.
- The `christian-practice.Rproj` file is your **Project** file. It sets the *working directory* or what is essentially "home base" for your project. When you go to open your project again, this is the file you will open.
- `styles.css` is file where you can assert extra control on your website. We won't use it.
- The `about.qmd` files is another Quarto document created as an example for you. We'll rarely use those after today.
The big thing to remember is this: Most of our work is done in files with the `.qmd` suffix.
## The Quarto document
The document that opened on the left is our Quarto document where we will do our work. The Quarto document is a method of authoring programming documents where the result can be output in a multiple formats: As HTML, PDFs, slideshows, books, etc. In fact, this very book you are reading is written in R using Quarto.

I could wax poetic about the birth of Quarto and the evolution of what came before it, but you won't really care. Let's break this down to what you need to know:
- These documents allow us to weave together our **thoughts** and our **code**. We'll use [Markdown](https://quarto.org/docs/authoring/markdown-basics.html) to record our thoughts, and R to write our code.
- We *write* the document and we *run* the code to see the results, but all the while we are creating documents that we will be **rendered** to share with others. Our goal with these documents is to explain to an "audience" what we are doing with our data analysis. Often our most important audience is our future self.
Think of *Render* like exporting or publishing a pretty version of your work.
### Render the document
Let's see what this basic document looks like when when we Render it.
1. In the toolbar of the document you'll see a blue arrow with **Render** after it.
2. Click on that button or word.
What this should do is open up the Console below the document and you'll see a bunch of feedback, but on the right side of RStudio. You should end up with the **Viewer** pane that shows your document. Here is a tour of sorts:

For the posit.cloud users, this will launch in a new web browser window. The gear next to the Render button has the option to view previews in a new browser Window or the RStudio Viewer pane.
### The metadata
The top of our document has what we call the **metadata** of our document written in YAML. These are commands to control the output of our document. Right now it only has the title of our document.
``` txt
---
title: "christian-first-project"
---
```
When we created our project, RStudio added the name of our folder here, **which kinda sucks** because that is not what the title is for. It should describe a title for your project like the top of a Microsoft Word or Google Docs document.
1. Edit the text inside the quotes to be more like a title you would find in a Word or Google Doc document, like below, but with your name:
``` txt
---
title: "Christian's First Quarto project"
---
```
2. **Re-render** the document so you can see the updates. (You can click on the **Render** button or do **Cmd-Shift-K** to update it.)
### R code chunks
The next bit of our document shows an R chunk:

This is where we write and execute our programming code. Let's break down parts of this:
- The three tick marks \` at the beginning and end indicate where the code chunk starts and ends.
- The `{r}` part notes that this is R code. (Quarto supports other languages like `{python}`, but we'll stick with R.)
- The `1 + 1` part is the code. In this case, it is some basic math. We do cooler stuff later.
- The green triangle that points to the right will **run all the code in this chunk**.
- The gray triangle that points down with the green bar will **run all the code above this chunk**.
Let's run this chunk of code to see what happens.
1. Click on the green triangle.

Doing this executes the code that is in the R block and it will print the result to your document below the chunk. You might have noticed something similar when you rendered your document earlier.
::: callout-important
There are about five keyboard commands that I will ***implore*** you to learn. Here are the first three. Remember if you are on a PC use **Cntl** instead of **Cmd**.
- **Cmd+option+i** will insert a code block.
- **Cmd+Return** will run a single line (or selection) of code within a code block.
- **Cmd+Shift+Return** will run a whole code chunk.
:::
## Let's do some data analysis
Remember the goal of this class is to use data analysis to make sense of the world and then explain to others what we are doing. There are multiple parts to this process:
- What are we thinking? What is our goal?
- Our code that gets us those answer.
- Our thoughts about the results, if any.
Today we will explore the [socially-acceptable age to date someone older or younger than ourselves](https://www.psychologytoday.com/us/blog/meet-catch-and-keep/201405/who-is-too-young-or-too-old-you-date).
I'll give you an example of this, starting with what we are thinking.
### Use Markdown to describe thoughts, goals
1. Copy the text below. Note you can use the copy-clipboard button in the book if you roll your cursor over the code and click on the clipboard icon at the top right.
2. Paste the code at the bottom of your notebook.
3. Re-render your document to see what it looks like.
``` txt
## My upper dating age
The following section details the [socially-acceptable maximum age of anyone you should date](https://en.wikipedia.org/wiki/Age_disparity_in_sexual_relationships#%22Half-your-age-plus-seven%22_rule).
The math works like this:
- Take your age
- subtract 7
- Double the result
```
Let's walk through this Markdown code. You might bookmark this [Markdown basics](https://quarto.org/docs/authoring/markdown-basics.html) page so you can refer back to it as you learn.
- The `##` line is a "Header 2" headline, meaning it is the second biggest. (The title is an H1/biggest.) The more hashmarks `###` you add the smaller headline you get.
- There is a full blank return between each element, including paragraphs of text. The exception is the bullet list.
- In the first paragraph we have embedded a hyperlink. We put the words we want to show inside square brackets and the URL in parenthesis DIRECTLY after the closing square bracket: `[words to link](https://the_url.org)`.
- Using `-` at the beginning of a line creates a bullet list. (You can also use `*`). Those lines need to be one after another without blank lines.
::: callout-note
I should note at this point there is a "Visual" editor where RStudio gives you more formatted look as your editing as it writes Markdown underneath the hood. I want you to use the "Source" editor so you can see and learn the underlying Markdown syntax. All my examples will be written in "Source" mode. After you pass this class, you can use "Visual" mode.
:::
### Adding new code chunks
Let's add a new R chunk to add the code that will calculate the maximum age of someone we should date.
1. Add a few blank lines after your Markdown text.
2. Use the keyboard command *Cmd+Option+i* to add an R chunk.
3. Your cursor will be inserted into the middle of the chunk. Type or copy this code in the space provided:
```{r}
#| output: false
# update 21 to your actual age
age <- 21
(age - 7) * 2
```
4. Change the number to your real age.
5. With your cursor somewhere in the code block, use the key command *Cmd+Shift+Return*, which is the key command to RUN ALL LINES of code chunk.
Congratulations! The answer given at the bottom of that code chunk is the upper end age of someone socially acceptable for you to date.
Throwing aside whether the formula is sound, let's break down the code.
- `# update 21 to your age` is a comment. It's a way to explain what is happening in the code without being considered part of the code. We create comments inside code chunks by starting with `#`. You can also add a comment at the end of a line. Comments will appear in grey text in your code chunks if formatted correctly.
- `age <- 21` is assigning a number (`21`) to an R object/variable called (`age`). A variable is a placeholder. It can hold numbers, text or even groups of numbers. Variables are key to programming because they allow you to change a value as you go along.
- The next part is simple math: `(age - 7) * 2` takes the value of `age` and subtracts `7`, then multiplies by `2`.
- When you run it, you get below it the result of the math equation, `[1] 28` in this case. That means there was one observation \[1\], and the value was "28", meaning a 21-year-old shouldn't date anyone older than 28.
Now you can play with the number assigned to the age variable to test out different ages.
### Practice adding code chunks
Now, I want you to add a similar section that calculates the **minimum** age of someone you should date, but using the formula `(age / 2) + 7`.
1. Add a Markdown headline and text describing what you are doing.
2. Use the keyboard command to create a new code chunk.
3. Include a comment within the code block that explains you are using the `age` variable already established.
4. Add the math statement `(age / 2) + 7` to the chunk.
5. Run the chunk and re-render your document.
```{r}
#| code-fold: true
#| code-summary: "Try it on your own before you peek here!"
#| output: false
# Using age variable from above
(age / 2) + 7
```
Now you know the youngest a person should be that you date. FWIW, we don't need to recreate the `age` variable since we already have it. A Quarto document is designed to run from top to bottom so that all the pieces work together.
### The toolbar
One last thing to point out in the document window: The toolbar that runs across the top of the document window. The image below explains some of the more useful tools, but you *REALLY* should learn and use the keyboard commands instead.

## A quick look back at files
We've been concentrating on editing the Quarto document, but let's peek back at the Files pane to note some new files that were created.
1. In the window with the viewer, you'll notice several other tabs: Files, Plots, Packages, etc. Click on the **Files** tab.

Note there is a new folder there, `_site`. All your rendered versions to into the folder to make it easy to share or publish them.
With our next project, we'll create some other folders to store our data and such.
## Publish your project
Let's publish our work to the Internet!!!!
1. Go to [Posit's Connect Cloud](https://connect.posit.cloud/) and sign up for a free account.
2. Come back to RStudio and down by the Console, click on the **Terminal** [^intro-1] pane.
3. Type in `quarto publish` and hit **Return** on your keyboard.
4. Use your arrow key to click down to **Posit Connect Cloud** (NOT just Posit Connect)
5. It will ask you to authorize, and you need to click return ...
6. And then it will open a browser window where the authentication code is pre-entered. You should be able to just click **Continue** and then **Authorize**.
7. When you get back to your RStudio Terminal, it should be publishing your page. Once it is done, it will reply with a URL that you can click on to get to your project.
8. This takes you to the _Settings_ for that project. To get the PUBLIC url, click on the **Gear icon** to get the settings, then choose the **URL** option.
[^intro-1]: The Terminal is where you can send commands to your computer using text vs. pointing and clicking through "normal" actions. It's super powerful and useful, but this is probably the only terminal command we'll use.
A bunch of stuff will happen in your Terminal pane. Here is a video showing this:
{{< video https://www.youtube.com/watch?v=ArWYlOJhGSI >}}
A couple of things about this the video clip above:
- The first time you run `quarto publish`, you'll be asked to authenticate to Posit Connect Cloud, so it will be a little different.
- I forgot to show your project index page, but you can just click on **Home** to see all your projects.
## On your own: Update the About page
Try some things on your own! Go into the `about.qmd` page and write some things about yourself, like your favorite hobbies. Use the [Markdown](https://quarto.org/docs/authoring/markdown-basics.html) guide to write headlines, lists and maybe even try an image? (You can use a URL from an image on the web.)
- Re-render your pages
- Re-publish your pages by going back to Terminal and using the `quarto publish` command.
## Turning in our projects
::: callout-note
This is a bit different for posit.cloud users. See below.
:::
The best way to turn in all of those files into Canvas is to compress the project folder into a single `.zip` file that you can upload to the assignment.
1. Make sure all your files are saved and rendered.
1. In your computer's Finder, open the `Documents/rwd` folder.
2. Follow the directions for your operating system linked below to create a compressed version of your `yourname-final-project` folder.
- [Compress files on a Mac](https://www.macinstruct.com/tutorials/how-to-compress-zip-files-and-folders-on-a-mac/).
- [Compress flies on Windows](https://www.laptopmag.com/articles/how-to-zip-files-windows-10).
3. Upload the resulting `.zip` file to the assignment for this week in Canvas.
If you later make changes to your R files after you've zipped your folder, you'll need to to `zip` it again. Make sure you send me the new version (or delete the old one first).
Because we are building "repeatable" code, I'll be able to download your `.zip` files, uncompress them, and the re-run them to get the same results.
Well done!
### Exporting for posit.cloud
Exporting is a little different for **posit.cloud users**. You can export your project pretty easily by following the directions in this screenshot:

This will save the `.zip` file to your downloads folder. Submit that `.zip` file to Canvas.
## Review of what we've learned so far
- You've learned about the RStudio IDE and how you write and render R code with it.
- You used `install.packages()` to download R packages to your computer. Typically executed from within the Console and only once per computer. We installed a lot of packages including the [tidyverse](https://www.tidyverse.org/packages/).
- You created a Quarto Website and related Quarto documents with code on them.
- You published your work on Posit Connect Cloud to share over the internet.