This repository was archived by the owner on Sep 30, 2022. It is now read-only.
---
title: Project-oriented workflow
author: Mauro Lepore
date: '2020-01-14'
slug: project-oriented-workflow
categories: []
tags:
- rstudio
- here
- fs
- workflow
description: ''
---
# 2020-01-14: Project-oriented workflow
[Project-oriented workflow](https://www.tidyverse.org/blog/2017/12/workflow-vs-script/)
<img src="https://i.imgur.com/jKWxztR.png" align="center" width = 750 />
--Jenny Bryan
## Workflow versus product
### Definitions
* Workflow: personal taste and habits.
* Product: essence of your project.
**Don't hardwire your workflow into your product.**
### Which is workflow or product?
1. The editor you use to write your R code.
2. The raw data.
3. The name of your home directory.
4. The R code someone needs to run on your raw data to get your results, including the explicit `library()` calls that load the necessary packages.
### Example: Remove workflow
The name of the home directory is workflow, not product.
```{r}
home <- "C:/Users/Mauro/Documents/" # Workflow
proj_path <- "path/to/project"
paste0(home, proj_path)
```
Better
```{r}
proj_path <- "path/to/project"
fs::path_home_r(proj_path)
```
Best
```{r}
fs::path_home_r("path", "to", "project")
```
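If adding fs as a dependency is not an option, base R can express the same idea. The sketch below is an assumed equivalent, not part of the original material: `path.expand("~")` returns the directory that R treats as home on the current machine, and `file.path()` joins the components with forward slashes.

```R
# Base R sketch: build the project path from components,
# without hardwiring anyone's home directory.
home <- path.expand("~")                          # resolved at run time on each machine
proj <- file.path(home, "path", "to", "project")  # same placeholder path as above

proj
```

Either way, the home directory is looked up when the code runs instead of being typed into the script.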
## Self-contained projects
### Self-contained projects can be moved around on your computer or onto other computers and will still "just work".
> It’s like agreeing that we will all drive on the left or the right. A hallmark of civilization is following conventions that constrain your behavior a little, in the name of public safety.
--Jenny Bryan
### What do they look like?
1. The project folder contains all relevant files.
2. Any .R script can run from a fresh R process with the working directory set to the project root.
3. Any .R script creates everything it needs, in its own workspace or folder.
4. Any .R script touches nothing it didn't create (e.g. it doesn't install packages).
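A minimal sketch of a script that follows the rules above. The folder and file names are made up for illustration, and a temporary directory stands in for the project root so the example can run anywhere without touching your files.

```R
# Stand-in for a project root; in a real project this is the folder
# that holds your .Rproj file.
root <- tempfile("my-project-")
dir.create(root)

# The script creates the folders it needs (rule 3) ...
data_dir <- file.path(root, "data")
out_dir  <- file.path(root, "output")
dir.create(data_dir)
dir.create(out_dir)

# ... and reads and writes only inside the project (rules 1 and 4).
raw <- data.frame(x = 1:3, y = c(2, 4, 6))
write.csv(raw, file.path(data_dir, "raw.csv"), row.names = FALSE)

dat <- read.csv(file.path(data_dir, "raw.csv"))
dat$ratio <- dat$y / dat$x
write.csv(dat, file.path(out_dir, "ratio.csv"), row.names = FALSE)

# Everything the script produced lives under the project root.
list.files(root, recursive = TRUE)
```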
### Violations ...
### What should you do instead of this?
```{r}
path_to_data <- "../datasets/my-data.csv"
```
### What should you do instead?
<img src="https://i.imgur.com/V4EkuWY.png" align="center" height = 550 />
### What should you do instead of this?
```R
pacman::p_load(random)
```
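One possible answer, sketched here as an assumption rather than the original slide's solution: `pacman::p_load()` installs a package that is missing, which changes someone else's machine. A plain `library(random)` call is enough to declare the dependency. If you want a friendlier error message, a small check like the hypothetical `require_installed()` below fails loudly without installing anything.

```R
# Hypothetical helper: stop with a clear message instead of installing.
require_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is required. ",
      "Install it with install.packages(\"", pkg, "\").",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

require_installed("stats")  # ships with R, so this passes silently
# require_installed("random") would raise an error if random is missing,
# but it would never install anything on its own.
```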
## setwd( )
### What's wrong?
```R
library(ggplot2)
setwd("/Users/jenny/cuddly_broccoli/verbose_funicular/foofy/data")
df <- read.delim("raw_foofy_data.csv")
p <- ggplot(df, aes(x, y)) + geom_point()
ggsave("../figs/foofy_scatterplot.png")
```
### What's wrong?
* Paths work for nobody besides the author.
* Project not self-contained and portable.
* To run, it first needs to be hand edited.
* Suggests that the useR does all of their work in one R process:
    * Unpleasant to work on more than one project at a time.
    * Easy for work done on one project to accidentally leak into another (e.g., objects, loaded packages, session options).
### What should you do instead?
* Use RStudio projects, and/or
* Use the here package (works well with .Rmd files)
```R
library(ggplot2)
library(here)
df <- read.delim(here("data", "raw_foofy_data.csv"))
p <- ggplot(df, aes(x, y)) + geom_point()
ggsave(here("figs", "foofy_scatterplot.png"))
```
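To make the idea behind here less magical, the sketch below walks up from a starting directory until it finds a marker file, then builds paths from that root. It is a simplified illustration, not here's actual implementation; the helper `find_root()` and the choice of an `.Rproj` file as the only marker are assumptions.

```R
# Hypothetical helper: walk up from `start` until a directory
# contains an .Rproj file, and return that directory.
find_root <- function(start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (length(list.files(dir, pattern = "\\.Rproj$")) > 0) {
      return(dir)
    }
    parent <- dirname(dir)
    if (identical(parent, dir)) {
      stop("No .Rproj file found above ", start, call. = FALSE)
    }
    dir <- parent
  }
}

# Demo in a throwaway project: <root>/foofy.Rproj and <root>/analysis/
root <- tempfile("foofy-")
dir.create(file.path(root, "analysis"), recursive = TRUE)
file.create(file.path(root, "foofy.Rproj"))

# Called from a subfolder, the root is still found, so the same
# components resolve to the same file from anywhere in the project.
found <- find_root(file.path(root, "analysis"))
file.path(found, "data", "raw_foofy_data.csv")
```

Because the root is discovered at run time, the script needs no hand editing when the project folder moves.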
## rm(list = ls( ))
### What's wrong?
* Suggests the useR works in one long-running (not fresh) R process.
* Does NOT, in fact, create a fresh R process -- it only deletes objects from the global workspace and leaves behind other state that makes your script vulnerable to hidden dependencies (e.g. attached packages, options, working directory).
* Is hostile to anyone that you ask to help you with your R problems.
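The second point is easy to check. The sketch below runs the same kind of cleanup on a scratch environment (an assumption made so the example does not wipe your real workspace) and shows that an option and an attached package survive it. The tools package is used only because it ships with R.

```R
# Scratch environment standing in for the global workspace.
scratch <- new.env()
assign("x", 1:10, envir = scratch)

old <- options(digits = 3)  # a session option lives outside any workspace
library(tools)              # attaching a package changes the search path

# The equivalent of rm(list = ls()), applied to the scratch environment.
rm(list = ls(envir = scratch), envir = scratch)

length(ls(envir = scratch))    # 0: the objects are gone
getOption("digits")            # 3: the option is still set
"package:tools" %in% search()  # TRUE: the package is still attached

options(old)  # restore the previous option value
```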
### What's better?
* Start from blank slate.
* Restart R very often.
* Re-run your under-development script from the top. For long-running processes:
    * Isolate the slow bit in its own script; save its result with `saveRDS()` and read it back with `readRDS()`, or
    * Use [drake](https://docs.ropensci.org/drake/).
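The `saveRDS()`/`readRDS()` option can be sketched as follows. `slow_step()` is a hypothetical stand-in for an expensive computation, and a temporary file stands in for a path inside your project.

```R
# Hypothetical stand-in for an expensive computation.
slow_step <- function() {
  Sys.sleep(0.1)
  sum(1:100)
}

# In a real project this file would live inside the project folder;
# tempfile() just keeps the sketch runnable anywhere.
cache <- tempfile(fileext = ".rds")

# Script 1: run the slow step once and save the result.
saveRDS(slow_step(), cache)

# Script 2 (the one you re-run from the top): read the saved result
# instead of recomputing it.
result <- readRDS(cache)
result
```

With this split, restarting R and re-running the main script stays cheap, because only the first script pays for the slow step.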
### Discuss: Must have or nice to have?
> The importance of these practices has a lot to do with whether your code will be run by other people, on other machines, and in the future. If your current practices serve your purposes, then go forth and be happy.
-- Jenny Bryan
### Learn more
* [What They Forgot to Teach You About R (Jenny Bryan & Jim Hester)](https://rstats.wtf/).