Commit 6a1bb7a

Fix some typos (#1701)
* Fix some typos
* Stick with the file name `bake-sale.xlsx` instead of using `bake_sale.xlsx`
1 parent f3b95c4 commit 6a1bb7a

12 files changed, +21 -22 lines changed
EDA.qmd

+1 -1
@@ -73,7 +73,7 @@ You can see variation easily in real life; if you measure any continuous variabl
 This is true even if you measure quantities that are constant, like the speed of light.
 Each of your measurements will include a small amount of error that varies from measurement to measurement.
 Variables can also vary if you measure across different subjects (e.g., the eye colors of different people) or at different times (e.g., the energy levels of an electron at different moments).
-Every variable has its own pattern of variation, which can reveal interesting information about how that it varies between measurements on the same observation as well as across observations.
+Every variable has its own pattern of variation, which can reveal interesting information about how it varies between measurements on the same observation as well as across observations.
 The best way to understand that pattern is to visualize the distribution of the variable's values, which you've learned about in @sec-data-visualization.

 We'll start our exploration by visualizing the distribution of weights (`carat`) of \~54,000 diamonds from the `diamonds` dataset.

data-import.qmd

+1 -1
@@ -56,7 +56,7 @@ read_csv("data/students.csv") |>

 We can read this file into R using `read_csv()`.
 The first argument is the most important: the path to the file.
-You can think about the path as the address of the file: the file is called `students.csv` and that it lives in the `data` folder.
+You can think about the path as the address of the file: the file is called `students.csv` and it lives in the `data` folder.

 ```{r}
 #| message: true

databases.qmd

+4 -5
@@ -53,7 +53,7 @@ There are three high level differences between data frames and database tables:

 Databases are run by database management systems (**DBMS**'s for short), which come in three basic forms:

-- **Client-server** DBMS's run on a powerful central server, which you connect from your computer (the client). They are great for sharing data with multiple people in an organization. Popular client-server DBMS's include PostgreSQL, MariaDB, SQL Server, and Oracle.
+- **Client-server** DBMS's run on a powerful central server, which you connect to from your computer (the client). They are great for sharing data with multiple people in an organization. Popular client-server DBMS's include PostgreSQL, MariaDB, SQL Server, and Oracle.
 - **Cloud** DBMS's, like Snowflake, Amazon's RedShift, and Google's BigQuery, are similar to client server DBMS's, but they run in the cloud. This means that they can easily handle extremely large datasets and can automatically provide more compute resources as needed.
 - **In-process** DBMS's, like SQLite or duckdb, run entirely on your computer. They're great for working with large datasets where you're the primary user.

@@ -295,7 +295,7 @@ flights |>
 There are two important differences between dplyr verbs and SELECT clauses:

 - In SQL, case doesn't matter: you can write `select`, `SELECT`, or even `SeLeCt`. In this book we'll stick with the common convention of writing SQL keywords in uppercase to distinguish them from table or variables names.
-- In SQL, order matters: you must always write the clauses in the order `SELECT`, `FROM`, `WHERE`, `GROUP BY`, `ORDER BY`. Confusingly, this order doesn't match how the clauses actually evaluated which is first `FROM`, then `WHERE`, `GROUP BY`, `SELECT`, and `ORDER BY`.
+- In SQL, order matters: you must always write the clauses in the order `SELECT`, `FROM`, `WHERE`, `GROUP BY`, `ORDER BY`. Confusingly, this order doesn't match how the clauses are actually evaluated which is first `FROM`, then `WHERE`, `GROUP BY`, `SELECT`, and `ORDER BY`.

 The following sections explore each clause in more detail.

@@ -385,7 +385,7 @@ diamonds_db |>
 show_query()
 ```

-We'll come back to what's happening with translation `n()` and `mean()` in @sec-sql-expressions.
+We'll come back to what's happening with the translation of `n()` and `mean()` in @sec-sql-expressions.

 ### WHERE

@@ -656,8 +656,7 @@ dbplyr's translations are certainly not perfect, and there are many R functions
 In this chapter you learned how to access data from databases.
 We focused on dbplyr, a dplyr "backend" that allows you to write the dplyr code you're familiar with, and have it be automatically translated to SQL.
 We used that translation to teach you a little SQL; it's important to learn some SQL because it's *the* most commonly used language for working with data and knowing some will make it easier for you to communicate with other data folks who don't use R.
-If you've finished this chapter and would like to learn more about SQL.
-We have two recommendations:
+If you've finished this chapter and would like to learn more about SQL, we have two recommendations:

 - [*SQL for Data Scientists*](https://sqlfordatascientists.com) by Renée M. P. Teate is an introduction to SQL designed specifically for the needs of data scientists, and includes examples of the sort of highly interconnected data you're likely to encounter in real organizations.
 - [*Practical SQL*](https://www.practicalsql.com) by Anthony DeBarros is written from the perspective of a data journalist (a data scientist specialized in telling compelling stories) and goes into more detail about getting your data into a database and running your own DBMS.

functions.qmd

+3 -3
@@ -28,7 +28,7 @@ In this chapter, you'll learn about three useful types of functions:
 - Plot functions that take a data frame as input and return a plot as output.

 Each of these sections includes many examples to help you generalize the patterns that you see.
-These examples wouldn't be possible without the help of folks of twitter, and we encourage follow the links in the comment to see original inspirations.
+These examples wouldn't be possible without the help of folks of twitter, and we encourage you to follow the links in the comments to see the original inspirations.
 You might also want to read the original motivating tweets for [general functions](https://twitter.com/hadleywickham/status/1571603361350164486) and [plotting functions](https://twitter.com/hadleywickham/status/1574373127349575680) to see even more functions.

 ### Prerequisites
@@ -175,7 +175,7 @@ These changes illustrate an important benefit of functions: because we've moved

 ### Mutate functions

-Now you've got the basic idea of functions, let's take a look at a whole bunch of examples.
+Now that you've got the basic idea of functions, let's take a look at a whole bunch of examples.
 We'll start by looking at "mutate" functions, i.e. functions that work well inside of `mutate()` and `filter()` because they return an output of the same length as the input.

 Let's start with a simple variation of `rescale01()`.
@@ -460,7 +460,7 @@ diamonds |>
 summary6(carat)
 ```

-Furthermore, since the arguments to summarize are data-masking also means that the `var` argument to `summary6()` is data-masking.
+Furthermore, since the arguments to summarize are data-masking, so is the `var` argument to `summary6()`.
 That means you can also summarize computed variables:

 ```{r}

iteration.qmd

+1 -1
@@ -220,7 +220,7 @@ df_miss |>

 If you look carefully, you might intuit that the columns are named using a glue specification (@sec-glue) like `{.col}_{.fn}` where `.col` is the name of the original column and `.fn` is the name of the function.
 That's not a coincidence!
-As you'll learn in the next section, you can use `.names` argument to supply your own glue spec.
+As you'll learn in the next section, you can use the `.names` argument to supply your own glue spec.

 ### Column names

joins.qmd

+1 -1
@@ -111,7 +111,7 @@ knitr::include_graphics("diagrams/relational.png", dpi = 270)

 You'll notice a nice feature in the design of these keys: the primary and foreign keys almost always have the same names, which, as you'll see shortly, will make your joining life much easier.
 It's also worth noting the opposite relationship: almost every variable name used in multiple tables has the same meaning in each place.
-There's only one exception: `year` means year of departure in `flights` and year of manufacturer in `planes`.
+There's only one exception: `year` means year of departure in `flights` and year manufactured in `planes`.
 This will become important when we start actually joining tables together.

 ### Checking primary keys

numbers.qmd

+3 -3
@@ -449,7 +449,7 @@ df |>

 ### Offsets

-`dplyr::lead()` and `dplyr::lag()` allow you to refer the values just before or just after the "current" value.
+`dplyr::lead()` and `dplyr::lag()` allow you to refer to the values just before or just after the "current" value.
 They return a vector of the same length as the input, padded with `NA`s at the start or end:

 ```{r}
@@ -475,7 +475,7 @@ You can lead or lag by more than one position by using the second argument, `n`.
 ### Consecutive identifiers

 Sometimes you want to start a new group every time some event occurs.
-For example, when you're looking at website data, it's common to want to break up events into sessions, where you begin a new session after gap of more than `x` minutes since the last activity.
+For example, when you're looking at website data, it's common to want to break up events into sessions, where you begin a new session after a gap of more than `x` minutes since the last activity.
 For example, imagine you have the times when someone visited a website:

 ```{r}
@@ -573,7 +573,7 @@ Here is a selection that you might find useful.

 So far, we've mostly used `mean()` to summarize the center of a vector of values.
 As we've seen in @sec-sample-size, because the mean is the sum divided by the count, it is sensitive to even just a few unusually high or low values.
-An alternative is to use the `median()`, which finds a value that lies in the "middle" of the vector, i.e. 50% of the values is above it and 50% are below it.
+An alternative is to use the `median()`, which finds a value that lies in the "middle" of the vector, i.e. 50% of the values are above it and 50% are below it.
 Depending on the shape of the distribution of the variable you're interested in, mean or median might be a better measure of center.
 For example, for symmetric distributions we generally report the mean while for skewed distributions we usually report the median.

program.qmd

+1 -1
@@ -49,4 +49,4 @@ The goal of these chapters is to teach you the minimum about programming that yo
 Once you have mastered the material here, we strongly recommend that you continue to invest in your programming skills.
 We've written two books that you might find helpful.
 [*Hands on Programming with R*](https://rstudio-education.github.io/hopr/), by Garrett Grolemund, is an introduction to R as a programming language and is a great place to start if R is your first programming language.
-[*Advanced R*](https://adv-r.hadley.nz/) by Hadley Wickham dives into the details of R the programming language; it's great place to start if you have existing programming experience and great next step once you've internalized the ideas in these chapters.
+[*Advanced R*](https://adv-r.hadley.nz/) by Hadley Wickham dives into the details of R the programming language; it's a great place to start if you have existing programming experience and a great next step once you've internalized the ideas in these chapters.

spreadsheets.qmd

+3 -3
@@ -46,7 +46,7 @@ For the rest of the chapter we will focus on using `read_excel()`.

 ### Reading Excel spreadsheets {#sec-reading-spreadsheets-excel}

-@fig-students-excel shows what the spreadsheet we're going to read into R looks like in Excel. This spreadsheet can be downloaded an Excel file from <https://docs.google.com/spreadsheets/d/1V1nPp1tzOuutXFLb3G9Eyxi3qxeEhnOXUzL5_BcCQ0w/>.
+@fig-students-excel shows what the spreadsheet we're going to read into R looks like in Excel. This spreadsheet can be downloaded as an Excel file from <https://docs.google.com/spreadsheets/d/1V1nPp1tzOuutXFLb3G9Eyxi3qxeEhnOXUzL5_BcCQ0w/>.

 ```{r}
 #| label: fig-students-excel
@@ -342,7 +342,7 @@ bake_sale <- tibble(
 bake_sale
 ```

-You can write data back to disk as an Excel file using the `write_xlsx()` from the [writexl package](https://docs.ropensci.org/writexl/):
+You can write data back to disk as an Excel file using the `write_xlsx()` function from the [writexl package](https://docs.ropensci.org/writexl/):

 ```{r}
 #| eval: false
@@ -359,7 +359,7 @@ These can be turned off by setting `col_names` and `format_headers` arguments to
 #| echo: false
 #| fig-width: 5
 #| fig-cap: |
-#| Spreadsheet called bake_sale.xlsx in Excel.
+#| Spreadsheet called bake-sale.xlsx in Excel.
 #| fig-alt: |
 #| Bake sale data frame created earlier in Excel.

strings.qmd

+1 -1
@@ -622,7 +622,7 @@ If you don't already know the code for your language, [Wikipedia](https://en.wik
 Base R string functions automatically use the locale set by your operating system.
 This means that base R string functions do what you expect for your language, but your code might work differently if you share it with someone who lives in a different country.
 To avoid this problem, stringr defaults to English rules by using the "en" locale and requires you to specify the `locale` argument to override it.
-Fortunately, there are two sets of functions where the locale really matters: changing case and sorting.
+Fortunately, there are only two sets of functions where the locale really matters: changing case and sorting.

 The rules for changing cases differ among languages.
 For example, Turkish has two i's: with and without a dot.

transform.qmd

+1 -1
@@ -13,7 +13,7 @@ In this part of the book, you'll learn about the most important types of variabl
 #| label: fig-ds-transform
 #| echo: false
 #| fig-cap: |
-#| The options for data transformation depends heavily on the type of
+#| The options for data transformation depend heavily on the type of
 #| data involved, the subject of this part of the book.
 #| fig-alt: |
 #| Our data science model, with transform highlighted in blue.

workflow-scripts.qmd

+1 -1
@@ -153,7 +153,7 @@ report-2022-04-02.qmd
 report-draft-notes.txt
 ```

-Numbering the key scripts make it obvious in which order to run them and a consistent naming scheme makes it easier to see what varies.
+Numbering the key scripts makes it obvious in which order to run them and a consistent naming scheme makes it easier to see what varies.
 Additionally, the figures are labelled similarly, the reports are distinguished by dates included in the file names, and `temp` is renamed to `report-draft-notes` to better describe its contents.
 If you have a lot of files in a directory, taking organization one step further and placing different types of files (scripts, figures, etc.) in different directories is recommended.
