---
title: "Jacobian Calculations for nls()"
author:
- Arkajyoti Bhattacharjee, Indian Institute of Technology, Kanpur
- John C. Nash, University of Ottawa, Canada
date: "26/05/2021"
output:
  pdf_document:
    keep_tex: false
    toc: true
bibliography: ImproveNLS.bib
---
<!-- - Heather Turner, University of Warwick, UK -->
```{r setup, include=FALSE}
rm(list=ls()) # clear the workspace for this document
knitr::opts_chunk$set(echo = TRUE)
## setup for this document
library(nlsr) # So we have the analytic/symbolic derivatives
library(numDeriv) # numerical derivatives package
library(microbenchmark) # timing
printsum <- function(xx){ print(summary(xx))} # May be needed
traceval <- TRUE # traceval set TRUE to debug or give full history
# Set to FALSE when we don't need extensive output
```
# ISSUES
- The ExDerivs.R file causes a number of failures in the ORIGINAL numericDeriv.
- Need to verify that the nlsalt:: version of numericDeriv() matches the nlspkg:: version in all cases.
- Do we need to get a model frame? How? And how would we use it?
## TODOS (mostly from nlsr vignette nlsr-devdoc.Rmd)
- how to insert numerical derivatives when Deriv is unable to get a result (nlsr)
- approximations for `jacfn` beyond the forward-difference approximation. How should these be specified?
- how to force numerical approximations in nlfb() in a
manner consistent with that used in `optimx::optimr()`, that is, to
surround the name of `jacfn` with quotes if it is a numerical approximation,
or to provide a logical control to `nlxb()` for this purpose.
# Jacobians in nls()
This document source is in file **DerivsNLS.Rmd**.
`nls()` and other nonlinear least squares programs in R need a Jacobian matrix
calculated at the current set of trial nonlinear model parameters to set up
the Gauss-Newton equations or their stabilized modifications in methods such as
that of Marquardt (@Marquardt1963). Unfortunately, `nls()` calls
the Jacobian the "gradient", and it uses the function `numericDeriv()` to compute it.
This document is an attempt to describe different ways to compute the Jacobian
for use in nls() and related software, and to evaluate these approaches from
several perspectives.
In evaluating performance, we need to know the conditions under which the evaluation
was conducted. Thus the computations included in this document, which is built using
`Rmarkdown`, are specific to the computer on which the document is processed. We
will add tables that give the results for different computing environments at the
end of the document.
# An example problem
We will use the Hobbs weed infestation problem (@jncnm79, page 120).
```{r hobbsex}
# Data for Hobbs problem
ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat <- seq_along(ydat) # for testing
# A simple starting vector -- must have named parameters for nlxb, nls, wrapnlsr.
start1 <- c(b1=1, b2=1, b3=1)
eunsc <- y ~ b1/(1+b2*exp(-b3*tt)) # formula -- display structure with str(eunsc)
# Can we convert a string form of this "model" to a formula?
ceunsc <- " y ~ b1/(1+b2*exp(-b3*tt))" # This will give character form: str(ceunsc)
# The next line will be TRUE if the conversion succeeded
print(as.formula(ceunsc)==eunsc)
weeddata1 <- data.frame(y=ydat, tt=tdat) ## LOCAL DATA IN DATA FRAMES
weedenv <- list2env(weeddata1) ## Put data in an Environment
# Add the parameter data as "variables"
weedenv$b1 <- start1[[1]]; weedenv$b2 <- start1[[2]]; weedenv$b3 <- start1[[3]]
# Display content of the Environment with ## ls.str(weedenv)
# We are now set up for computations
```
# Tools for Jacobians
There are a number of ways to get the Jacobian in R.
## numericDeriv() original version from base R
`numericDeriv()` is the R function used by `nls()` to evaluate Jacobians for its Gauss-Newton
equations. The R source code is in the file `nls.R`; it calls a C function `numeric_deriv`
in `nls.c`. These have been extracted in R package form as `nlspkg` by Duncan Murdoch,
as described in our document **PkgFromRbase.Rmd: Making a package from base R files**, and
we will use that version here.
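For orientation, here is a minimal pure-R sketch (our own simplification, not the actual source) of the forward-difference calculation that `numericDeriv()` performs. The real code, reproduced in Appendix 1, adds central differences, the `dir` argument, and more careful step-size and error handling.
```{r fdjacsketch}
# Minimal forward-difference Jacobian sketch (simplified; see Appendix 1 for the real code)
fdjac <- function(expr, theta, rho, eps = .Machine$double.eps^(1/2)) {
  r0 <- eval(expr, rho)                        # baseline residuals
  J <- matrix(0, length(r0), length(theta))
  for (j in seq_along(theta)) {
    old <- get(theta[j], envir = rho)
    delta <- eps * max(abs(old), eps)          # simplified relative step
    assign(theta[j], old + delta, envir = rho)
    J[, j] <- (eval(expr, rho) - r0) / delta   # one Jacobian column per parameter
    assign(theta[j], old, envir = rho)         # restore the parameter
  }
  attr(r0, "gradient") <- J
  r0
}
```
With the objects defined in the next chunk, `fdjac(rexpr, names(start1), weedenv)` would mimic the default `numericDeriv()` call.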
In the following we will test and time `numericDeriv()` along with several of its options.
```{r nd1}
rexpr<-call("-",eunsc[[3]], eunsc[[2]]) # Generate the residual "call"
res0<-eval(rexpr, weedenv) # Get the residuals
print(res0) # the base residuals
cat("Sumsquares at 1,1,1 is ",sum(res0^2),"\n")
treseval<-microbenchmark(res0<-eval(rexpr, weedenv))
print(treseval)
## Try the numericDeriv option
theta<-names(start1)
## suppressMessages(library(nlspkg))
suppressMessages(ndnls<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv))
print(ndnls)
print(sum(ndnls^2))
tndnls<-microbenchmark(ndnls<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv))
print(tndnls)
## numericDeriv also has central difference option, as well as choice of eps parameter
## Central diff
ndnlsc<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE)
print(ndnlsc)
print(sum(ndnlsc^2))
tndnlsc<-microbenchmark(ndnlsc<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE))
print(tndnlsc)
## Forward diff with smaller eps
ndnlsx<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, eps=1e-10)
print(ndnlsx)
print(sum(ndnlsx^2))
tndnlsx<-microbenchmark(ndnlsx<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, eps=1e-10))
print(tndnlsx)
## Central diff with smaller eps
ndnlscx<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, eps=1e-10)
print(ndnlscx)
print(sum(ndnlscx^2))
tndnlscx<-microbenchmark(ndnlscx<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, eps=1e-10))
print(tndnlscx)
## Add dir parameter -- the direction of the parameter shift
ndnlsd<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, dir=-1)
# Does dir make a difference? It should for the forward difference: dir=-1 gives a backward step.
max(abs(attr(ndnlsd,"gradient")-attr(ndnls,"gradient")))
ndnlscd<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, dir=-1)
# Does dir make a difference? For the central difference it should NOT!
max(abs(attr(ndnlscd,"gradient")-attr(ndnlsc,"gradient")))
```
## numericDeriv() alternative pure-R version
This version (see Appendix 2) replaces the C code with pure-R equivalents.
```{r and1}
## Try ExDerivs.R ??
suppressMessages(andnls<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv))
# print(andnls); print(sum(andnls^2))
tandnls<-microbenchmark(andnls<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv))
print(tandnls)
## numericDeriv also has central difference option, as well as choice of eps parameter
## Central diff
andnlsc<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE)
ndnlsc<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE)
# print(andnlsc); print(sum(andnlsc^2))
tandnlsc<-microbenchmark(andnlsc<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE))
print(tandnlsc)
## Forward diff with smaller eps
andnlsx<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, eps=1e-10)
ndnlsx<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, eps=1e-10)
# print(andnlsx); print(sum(andnlsx^2))
tandnlsx<-microbenchmark(andnlsx<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, eps=1e-10))
print(tandnlsx)
## Central diff with smaller eps
andnlscx<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, eps=1e-10)
ndnlscx<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, eps=1e-10)
# print(andnlscx) ; print(sum(andnlscx^2))
tandnlscx<-microbenchmark(andnlscx<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, eps=1e-10))
print(tandnlscx)
## Comparisons of the Jacobian between nlspkg and nlsalt, i.e., R-plus-C vs pure R
max(abs(attr(ndnls, "gradient")-attr(andnls,"gradient")))
max(abs(attr(ndnlsc, "gradient")-attr(andnlsc,"gradient")))
max(abs(attr(ndnlsx, "gradient")-attr(andnlsx,"gradient")))
max(abs(attr(ndnlscx, "gradient")-attr(andnlscx,"gradient")))
## Using dir
cat("eps (regular) = ",.Machine$double.eps^(1/2),
" eps (central) =",.Machine$double.eps^(1/3),"\n")
andnlsd<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, dir=-1)
max(abs(attr(andnlsd, "gradient")-attr(ndnls,"gradient")))
max(abs(attr(andnlsd, "gradient")-attr(andnls,"gradient")))
andnlscd<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, dir=-1)
max(abs(attr(andnlscd, "gradient")-attr(ndnlsc,"gradient")))
## Try comparisons over different eps sizes
for (ee in 3:10){
andnlsd<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, dir=-1, eps=10^(-ee))
andnlscd<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, dir=-1, eps=10^(-ee))
andnlsx<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, eps=10^(-ee))
andnlscx<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, eps=10^(-ee))
cat("Regular diff, eps=10^(-",ee,"):",
max(abs(attr(andnlsd,"gradient")-attr(andnlsx,"gradient"))),"\n")
cat("Central diff, eps=10^(-",ee,"):",
max(abs(attr(andnlscd,"gradient")-attr(andnlscx,"gradient"))),"\n")
}
```
The `dir` parameter allows us to use a backward difference for the derivative. This
appears in `nlsModel()` for the case where a parameter is on an upper bound when
`algorithm="port"`. The code does not check for nearness to the bound, and for a
lower bound it assumes that we are stepping AWAY from the bound in the default
direction (`dir=+1`). None of the code addresses the situation where bounds are
closer together than the step used for the finite difference, so there are
cases where we could crash the code. Nor does the code check whether a central
difference has been specified near a bound. In particular (see the sketch after
this list):
- For lower bounds, a central difference can overstep the bound when a parameter
is close to, or on, the bound.
- For an upper bound, changing `dir` does not change the central-difference
approximation expression, and steps in both the forward and backward directions
of the parameter are still taken.
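A toy illustration of the lower-bound hazard (our own sketch; none of this code comes from `nls()` or `nlsModel()`): with a parameter sitting on a lower bound at 0 inside a `log()`, the forward evaluation point is admissible but the backward point that a central difference also requires is not.
```{r boundstep}
flog <- function(b) log(b + 1e-12)  # inadmissible for b < -1e-12
blo <- 0                            # parameter sitting on its lower bound
delta <- .Machine$double.eps^(1/3)  # a step of the size central differences use
flog(blo + delta)                   # forward point: finite
flog(blo - delta)                   # backward point: NaN, with a warning
```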
## Symbolic methods from `nlsr`
The package `nlsr` has a function `model2rjfun()` that converts an expression
describing how the residual functions are computed into an R function that
computes the residuals at a particular set of parameters and sets the
**attribute** "gradient" of the vector of residual values to the Jacobian at
those parameters. `model2rjfun()` does much the same work
as the `res0<-eval(rexpr, weedenv)` expression evaluation above, but adds derivative
expressions to the function.
```{r nlsr1}
# nlsr has function model2rjfun. We can evaluate just the residuals
res0<-model2rjfun(eunsc, start1, data=weeddata1, jacobian=FALSE)
res0(start1)
tres0nlsr<-microbenchmark(res0(start1)) # time it
print(tres0nlsr)
# or the residuals and jacobian
funsc <- model2rjfun(eunsc, start1, data=weeddata1) # from nlsr: creates a function
tmodel2rjfun <- microbenchmark(model2rjfun(eunsc, start1, data=weeddata1))
print(tmodel2rjfun)
## Ways to display information about the residual/jacobian function
# print(funsc); print(funsc(start1)); print(environment(funsc)); print(ls.str(environment(funsc)))
# print(ls(environment(funsc)$data)); eval(eunsc, environment(funsc))
vfunsc<-funsc(start1)
print(vfunsc)
tfunsc<-microbenchmark(funsc(start1))
print(tfunsc)
```
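As a cross-check on the symbolic route (our own comparison, not part of `nlsr`), base R's `deriv()` can build a similar residual-plus-gradient function for this model; `model2rjfun()` does the same job more generally from a model formula.
```{r derivsketch}
# Build an analytic residual+gradient function with base R's deriv()
dres <- deriv(~ b1/(1 + b2*exp(-b3*tt)) - y, c("b1", "b2", "b3"),
              function.arg = c("b1", "b2", "b3", "tt", "y"))
jd <- do.call(dres, c(as.list(start1), list(tt = tdat, y = ydat)))
max(abs(attr(jd, "gradient") - attr(vfunsc, "gradient"))) # both analytic, so ~0
```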
## The `numDeriv` package
The package `numDeriv` includes a function `jacobian()` that acts on a user
function `resid()` to produce the Jacobian at a set of parameters by several
choices of approximation.
```{r numDeriv1}
# We use the residual function (without gradient attribute) from nlsr
jnumd<-jacobian(res0, start1) # uses default "Richardson" method
jnumd
# Timing of the numerical jacobian calculation (default Richardson)
tjnumd<-microbenchmark(jnumd<-jacobian(res0, start1))
print(tjnumd)
jnumds<-jacobian(res0, start1, method="simple") # simple forward-difference method
jnumds
# Timing of the numerical jacobian calculation (simple)
tjnumds<-microbenchmark(jnumds<-jacobian(res0, start1, method="simple"))
print(tjnumds)
jnumdc<-jacobian(res0, start1, method="complex") # complex-step method
jnumdc
# Timing of the numerical jacobian calculation (complex step)
tjnumdc<-microbenchmark(jnumdc<-jacobian(res0, start1, method="complex"))
print(tjnumdc)
```
Note that the manual pages for `numDeriv` offer many options for the functions in
the package. We have yet to explore many of these; one possibility is sketched below.
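As one illustration (a sketch only; the `method.args` entries are those documented in the `numDeriv` manual), the Richardson extrapolation can be tuned, here by raising the number of improvement iterations `r` from its default of 4:
```{r numdargs}
jnumd6 <- jacobian(res0, start1, method = "Richardson",
                   method.args = list(r = 6)) # r: number of Richardson iterations
max(abs(jnumd6 - attr(vfunsc, "gradient")))
```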
# Comparisons
In the following, we are comparing to `vfunsc`, which is the evaluated
residual vector at `start1=c(1,1,1)` with "gradient" attribute (jacobian)
included, as developed using package `nlsr`. This is taken as the "correct"
result, even though it is possible that the generated order of calculations
may introduce inaccuracies in the supposedly analytic derivatives.
`numericDeriv` computes a similar structure (residuals with "gradient" attribute):
- `ndnls`: forward difference with default `eps` (`.Machine$double.eps^(1/2)`)
- `ndnlsc`: central difference with default `eps` (`.Machine$double.eps^(1/3)`)
- `ndnlsx`: forward difference with the smaller `eps=1e-10`
- `ndnlscx`: central difference with the smaller `eps=1e-10`
From `numDeriv::jacobian()`:
- `jnumd`: default "Richardson" method
- `jnumds`: `method="simple"` (forward difference)
- `jnumdc`: `method="complex"` (complex step)
```{r compjac1}
## Matrix comparisons -- uncomment code to show these, which use page space
# attr(ndnls, "gradient")-attr(vfunsc,"gradient")
# attr(ndnlsc, "gradient")-attr(vfunsc,"gradient")
# attr(ndnlsx, "gradient")-attr(vfunsc,"gradient")
# attr(ndnlscx, "gradient")-attr(vfunsc,"gradient")
# jnumd - attr(vfunsc,"gradient")
# jnumds - attr(vfunsc,"gradient")
# jnumdc - attr(vfunsc,"gradient")
## Summary comparisons - maximum absolute differences
max(abs(attr(ndnls, "gradient")-attr(vfunsc,"gradient")))
max(abs(attr(ndnlsc, "gradient")-attr(vfunsc,"gradient")))
max(abs(attr(ndnlsx, "gradient")-attr(vfunsc,"gradient")))
max(abs(attr(ndnlscx, "gradient")-attr(vfunsc,"gradient")))
max(abs(jnumd - attr(vfunsc,"gradient")))
max(abs(jnumds - attr(vfunsc,"gradient")))
max(abs(jnumdc - attr(vfunsc,"gradient")))
```
## Observations
Some particular notes:
- the mean time for the default `numericDeriv()` of `nls()` is quite
fast, with a coefficient of variation (sd/mean) of around 0.42. The
timings are actually very slightly faster than those for the analytic expressions
of `nlsr`, though the latter has a COV of 0.36.
- the Jacobian from this default method unfortunately deviates from the analytic
computation by a relatively large amount (of the order of 1e-6 for our
example).
- the central difference version of `numericDeriv()` does better (about
three orders of magnitude smaller deviation from the analytic result),
and the time is comparable with the analytic evaluation.
- making the `eps` parameter smaller degrades the accuracy of the Jacobian
computed via `numericDeriv()`. This may be counter-intuitive for those
unfamiliar with numerical methods. Essentially, a smaller `eps` forces the
subtraction of very close residual values, so rounding error dominates
(see the sketch after this list).
- the "simple" option of the `numDeriv` function `jacobian()` gives similarly
poor accuracy.
- the "Richardson" (default) results of `numDeriv` are of similar accuracy to
the central difference option of `numericDeriv()` but at a much greater time
cost -- about 23 times slower.
- on the other hand, the "complex" option gets an essentially analytic result,
approximately 8 orders of magnitude better than the central difference approximation
of `numericDeriv`, for a time cost only 3.5 times as great. Unfortunately, not all
models are amenable to the complex step approximation.
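The step-size tradeoff behind several of these observations can be seen in a tiny self-contained experiment (our own sketch, unrelated to the Hobbs problem). For f(x) = exp(x) at x = 1 the true derivative is exp(1); the forward-difference error shrinks as the step h decreases, then grows again once cancellation dominates.
```{r stepsize}
fx <- function(x) exp(x)
x0 <- 1
for (ee in c(4, 6, 8, 10, 12)) {
  h <- 10^(-ee)
  fwd <- (fx(x0 + h) - fx(x0)) / h       # forward difference approximation
  cat("h = 1e-", ee, "  abs error = ", abs(fwd - exp(x0)), "\n", sep = "")
}
```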
# Cautionary notes on performance results
The results here have been evaluated on a single computer. While we could
process the Rmarkdown file on any of several machines, the work was mainly carried out
on a machine characterized by the string
```
M21:john-Linux-5.11.0-25-generic|Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz|33482145792bytesRAM
```
This is a relatively capable tower PC, but otherwise unremarkable.
Nevertheless, we found that running the timings more than once while other tasks were in
progress did produce variations of several percent in the mean and standard deviation of
the timings. We would expect both absolute differences in times and changes in relative
performance with different processors and operating systems, and had thought to carry out some
investigation of such differences. However, such effort seems less valuable than pursuing more
capable nonlinear least squares and derivative code.
# Some notes on derivative computation for nonlinear least squares
In no particular order, we comment on some issues relating to the Jacobian calculations
in nonlinear least squares.
## Nomenclature
R is not in step with many other areas of numerical computation in labeling the different
objects of the nonlinear least squares problem. In particular, R uses the term "gradient"
when the object of interest is the Jacobian matrix. Since it is useful, when performing
iterations of the Gauss-Newton or related equations, to have the Jacobian associated with
the residuals, and since the rows of the Jacobian matrix are the gradients of the respective
residuals, we can accept the attribute name "gradient" as a way to select the required information.
Moreover, as in package `nlsr`, it is very useful to have the Jacobian matrix as an
attribute of the residual vector, since the main solver function, in this case `nlsr::nlfb()`,
can then be called with the same input for the arguments `resfn` and `jacfn`. These are the
functions required to compute the residual and the Jacobian; using the same function for both
is very convenient, but it needs some way to return both the residual vector and the Jacobian
matrix in a coherent fashion.
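A hand-coded example of this pattern (our own sketch; the derivative expressions were worked out by hand for the Hobbs model) returns the residuals with the Jacobian in the "gradient" attribute, in the form `nlsr::nlfb()` accepts for both `resfn` and `jacfn`:
```{r rjsketch}
rjhobbs <- function(par) {
  b1 <- par[[1]]; b2 <- par[[2]]; b3 <- par[[3]]
  ee  <- exp(-b3 * tdat)
  den <- 1 + b2 * ee
  res <- b1/den - ydat                         # residuals (model - data)
  J <- cbind(b1 = 1/den,                       # d res / d b1
             b2 = -b1 * ee / den^2,            # d res / d b2
             b3 = b1 * b2 * tdat * ee / den^2) # d res / d b3
  attr(res, "gradient") <- J
  res
}
max(abs(attr(rjhobbs(start1), "gradient") - attr(vfunsc, "gradient"))) # ~0
# e.g., ans <- nlsr::nlfb(start1, resfn = rjhobbs, jacfn = rjhobbs)
```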
## Numerical approximation near constraints
As far as we are aware, there is no software that implements a fully safeguarded scheme for
computing numerical approximations of the Jacobian (or of gradients in general optimization) near
constraints. The same statement applies even to the much simpler case of bounds constraints.
Users have a perverse tendency to devise ways to foil our best efforts. For example, they may
decide that a good way to specify fixed (i.e., masked) parameters that they do not want to vary
during a particular calculation is to specify the lower and upper bound of a parameter at the
same value. Later runs may want the parameter constraints relaxed.
In `nlsr::nlxb()`, users may, in fact, specify masked parameters this way. This is a case of
"if you can't beat them, join them", but it does provide an easily understood way for users to
fix values (a sketch follows).
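A sketch of this usage (illustrative only; consult the `nlsr` documentation for the precise bounds handling):
```{r masksketch}
mfix <- nlsr::nlxb(eunsc, start = start1, data = weeddata1,
                   lower = c(-Inf, -Inf, 1),   # bounds in parameter order b1, b2, b3
                   upper = c( Inf,  Inf, 1))   # equal bounds fix (mask) b3 at 1
print(mfix)
```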
More tricky is dealing with constraints that are close together. Note that these may arise,
for example, from two linear (planar) constraints that approach each other at a narrow angle.
Near the apex where these constraints intersect, we have tight bounds on the parameters. If
the constraint is not one imposed by the nature of the residual or objective function (for
example, a log() or square root near zero), then we can generally proceed and allow the
derivative approximation to evaluate outside the constraints. Things are decidedly nastier
if we do have inadmissible values of the parameters. This is where analytic Jacobian
evaluation is very helpful.
The issue of constraints and the need for a step in parameter values for derivative approximations
was one of the motivations for trying to find analytic derivatives in package `nlsr` and the
continuing effort to bring them into other R tools.
## A case where a parameter is close to inadmissible
The following example shows that `numericDeriv()` does a reasonable job of computing
the Jacobian, but the result is nonetheless singular.
```{r code=xfun::read_utf8('Tests/Examples/badJlogmod.R')}
```
# Appendix 1: Base R numericDeriv code
This code is in two files, `nls.R` and `nls.c`, and is extracted here.
## From nls.R
```{r nlsR, comment=NA, echo=FALSE}
cat(readLines('archived/nls-numericDeriv.R'), sep = '\n')
```
## From nls.c
```{r nlsc, comment=NA, echo=FALSE}
cat(readLines('archived/nls-numericDeriv.c'), sep = '\n')
```
# Appendix 2: numericDeriv() from nlsalt package (all in R)
```{r nlsndnew, comment=NA, echo=FALSE}
cat(readLines('./nlsalt/R/nlsnd.R'), sep = '\n')
```
# References