-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
365 lines (252 loc) · 8.18 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
[![Travis-CI Build Status](https://travis-ci.org/arendsee/rmonad.svg?branch=master)](https://travis-ci.org/arendsee/rmonad)
[![Coverage Status](https://img.shields.io/codecov/c/github/arendsee/rmonad/master.svg)](https://codecov.io/github/arendsee/rmonad?branch=master)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/rmonad)](https://cran.r-project.org/package=rmonad)
[![CRAN downloads](http://cranlogs.r-pkg.org/badges/rmonad)](http://cran.rstudio.com/web/packages/rmonad/index.html)
[![total downloads](http://cranlogs.r-pkg.org/badges/grand-total/rmonad)](http://cranlogs.r-pkg.org/badges/grand-total/rmonad)
[![DOI](https://zenodo.org/badge/95687512.svg)](https://zenodo.org/badge/latestdoi/95687512)
```{r, echo = FALSE}
set.seed(210)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# `rmonad`
Chain monadic sequences into stateful, branching pipelines. As nodes in the
pipeline are run, they are merged into a graph of all past operations. The
resulting structure can be computed on to access not only the final results,
but also node documentation, intermediate data, performance stats, and any raised
messages, warnings or errors. `rmonad` intercepts all exceptions, which allows
for pure error handling.
`rmond` complements, rather than competes with non-monadic pipelines packages
such as `magrittr` or `pipeR`. These can be used to perform operations where
preservation of state is not desired. Also they are needed to operate on
monadic containers themselves.
## Funding
This work is funded by the National Science Foundation grant:
[NSF-IOS 1546858](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1546858)
Orphan Genes: An Untapped Genetic Reservoir of Novel Traits
## Installation
You can install from CRAN with:
```{r cran-installation, eval=FALSE}
install.packages("rmonad")
```
The newest `rmonad` code will always be in the github `dev` branch.
You can install this with:
```{r gh-installation, eval = FALSE}
devtools::install_github("arendsee/rmonad", ref="dev")
```
## Examples
For details, see the vignette. Here are a few excerpts
```{r}
library(rmonad)
```
### Record history and access inner values
```{r}
1:5 %>>%
sqrt %v>% # record an intermediate value
sqrt %>>%
sqrt
```
### Add effects inside a pipeline
```{r, eval=FALSE}
# Both plots and summarizes an input table
cars %>_% plot(xlab="index", ylab="value") %>>% summary
```
### Use first successful result
```{r}
x <- list()
# return first value in a list, otherwise return NULL
if(length(x) > 0) {
x[[1]]
} else {
NULL
}
# this does the same
x[[1]] %||% NULL %>% esc
```
### Independent evaluation of multiple expressions
```{r}
funnel(
runif(5),
stop("stop, drop and die"),
runif("df"),
1:10
)
```
### Build branching pipelines
```{r, eval=FALSE}
funnel(
read.csv("a.csv") %>>% do_analysis_a,
read.csv("b.csv") %>>% do_analysis_b,
k = 5
) %*>% joint_analysis
```
```{r, eval=FALSE}
foo <- {
"This is nothing"
NA
} %>>% {
"This the length of nothing"
length(.)
}
bar <- {
"These are cars"
cars
} %>>% {
"There are this many of them"
length(.)
}
baz <- "oz" %>>%
funnel(f=foo, b=bar) %*>%
{
"This definitely won't work"
. + f + b
}
```
### Caches, tags, and views
`rmonad` provides a flexible system for managing caches and tagging nodes for
later access.
```{r}
# tag each step you want to reuse
evalwrap(256) %>% tag('a1') %>>% sqrt %>% tag('a2') %__%
evalwrap(144) %>% tag('b1') %>>% sqrt %>% tag('b2') %__%
evalwrap(333) %>% tag('c') -> m
# sum values across three nodes of the pipeline
funnel(view(m, 'a2'), view(m, 'b2'), view(m, 'c')) %*>% sum %>% plot(label='value')
```
### Chain independent pipelines, with documentation
```{r}
analysis <-
{
"This analysis begins with 5 uniform random variables"
runif(5)
} %>>% '^'(2) %>>% sum %__%
{
"The next step is to take 6 normal random variables"
rnorm(6)
} %>>% '^'(2) %>>% sum %__%
{
"And this is were the magic happens, we take 'a' random normal variables"
rnorm("a")
} %>>% '^'(2) %>>% sum %__%
{
"Then, just for good measure, we toss in six exponentials"
rexp(6)
} %>>% '^'(2) %>>% sum
analysis
```
### Add metadata to chunk
```{r, eval=FALSE}
evalwrap({
"This is data describing a chunk"
list(
foo = "this is metadata, you can put anything you want in here",
bar = "maybe can pass parameters to an Rmarkdown chunk",
baz = "or store stuff in state, for example:",
sysinfo = devtools::session_info()
)
# this is the actual thing computed
1 + 1
})
```
### Build Markdown report from a pipeline
`rmonad` stores the description of a pipeline as a graphical object. This
object may be queried to access all data needed to build a report. These could
be detailed reports where the code, documentation, and metadata for every node
is written to a linked HTML file. Or a report may be more specialized, e.g. a
benchmarking or debugging report. A report generating function may be branched,
with certain elements generated only if some condition is met. Overall,
`rmonad` offers a more dynamic approach to literate programming.
This potential is mostly unrealized currently. `rmonad` offers the prototype
report generator `mreport`.
```{r, eval=FALSE}
x <-
{
"# Report
This is a pipeline report
"
} %__% {
"this is a docstring"
5
} %>>% {
"this is too"
sqrt(.)
} %>_% {
"# Conclusion
optional closing remarks
"
NULL
}
report(x)
```
### Graphing pipelines
Internally an `Rmonad` object wraps an `igraph` object, and can thus be easily
plotted:
```{r workflow-plot}
# here I use the `->` operator, which is the little known twin of `<-`.
funnel(
"a" %v>% paste("b"), # %v>% stores the input (%>>% doesn't)
"c" %v>% paste("d")
) %*>% # %*>% bind argument list from funnel to paste
paste %>% # funnel joins monads, so we pass in the full monad here, with
funnel( # '%>%', rather than use '%>>'% to get the wrapped value
"e" %v>% paste("f"),
"g" %v>% paste("h")
) %*>%
paste %>% # the remaining steps are all operating _on_ the monad
plot(label='value')
```
Nested pipelines can also be plotted:
```{r nested-workflow-plot}
foo <- function(x){
'c' %v>% paste(x) %v>% paste('d')
}
'a' %v>% foo %>% plot(label='value')
```
### Docstrings
This allows chunks of code to be annotated without the extra boilerplate of
`%>%doc(...`, that was used in the previous example.
```{r, eval=FALSE}
{
"This is a docstring"
1
} %>>% {
"The docstrings can be used to document specific chunks of code. It is a lot
cleaner than piping the monad into the `doc` function."
( . + . ) * ( . + . )
} %>_% {
"If you are interested in docstrings and the newer rmonad features, see the
github dev branch"
NULL
}
```
## Scaling up
`rmonad` can be used to mediate very large pipelines. Below is a plot of an in
house pipeline. Green nodes are passing and yellow nodes produced warnings.
![Plot of a large rmonad pipeline](README-big-pipeline.png)
## Recursion
```{r recursion}
countdown <- function(x) {
x %>_% {if(. == 0) stop('boom')} %>>% { countdown(.-1) }
}
10 %>>% countdown %>% plot
```
## rmonad v0.6.0 aspirational goals
- [ ] Record all operations, even those not run. Currently if an input to a
node fails, the node is ignored. So the ultimate graph is truncated at the
first error.
- [ ] Add function to align two `rmonad` pipelines. This function would be the
basis for `diff` and `patch` functions. Where a `patch` function takes an
unevaluated `rmonad` object, aligns it to a broken pipeline, and resumes
evaluation from the failing nodes using the patch object code.
- [ ] Full code regeneration from the `rmonad` object. Currently `rmonad`
stores each node's code, but it loses information.
- [ ] Store file and line number when possible (e.g. if given a source).
- [ ] Job submission handling
- [ ] Add a shiny app for interactive exploration of a pipeline
- [ ] Use `DiagrammeR` for plotting. I stopped using it when I rebuilt the
internals using `igraph`. I thought it would just be easier to use the
`igraph` plot functions. However, making `igraph` plots that are
consistently good across scales has been difficult.