---
title: "Jacobian Calculations for nls()"
author:
- Arkajyoti Bhattacharjee, Indian Institute of Technology, Kanpur
- John C. Nash, University of Ottawa, Canada
date: "26/05/2021"
output:
  pdf_document:
    keep_tex: false
    toc: true
bibliography: ImproveNLS.bib
---
<!-- - Heather Turner, University of Warwick, UK -->
```{r setup, include=FALSE}
rm(list=ls()) # clear the workspace for this document
knitr::opts_chunk$set(echo = TRUE)
## setup for this document
library(nlsr) # So we have the analytic/symbolic derivatives
library(numDeriv) # numerical derivatives package
library(microbenchmark) # timing
printsum <- function(xx){ print(summary(xx))} # May be needed
traceval <- TRUE # traceval set TRUE to debug or give full history
# Set to FALSE when we don't need extensive output
```
# ISSUES
- The ExDerivs.R file causes a number of failures in the ORIGINAL numericDeriv.
- Need to verify that the nlsalt:: version of numericDeriv() matches the nlspkg:: version in all cases.
- Do we need to get a model frame? How? And how would we use it?
## TODOS (mostly from nlsr vignette nlsr-devdoc.Rmd)
- how to insert numerical derivatives when Deriv is unable to get a result (nlsr)
- approximations for `jacfn` beyond the forward-difference approximation. How should these be specified?
- how to force numerical approximations in nlfb() in a
manner consistent with that used in `optimx::optimr()`, that is, to
surround the name of `jacfn` with quotes if it is a numerical approximation,
or to provide a logical control to `nlxb()` for this purpose.
# Jacobians in nls()
This document source is in file **DerivsNLS.Rmd**.
`nls()` and other nonlinear least squares programs in R need a Jacobian matrix
calculated at the current set of trial nonlinear model parameters to set up
the Gauss-Newton equations or their stabilized modifications in methods such as
that of Marquardt (@Marquardt1963). Unfortunately, `nls()` calls
the Jacobian the "gradient", and it uses the function `numericDeriv()` to compute it.
This document is an attempt to describe different ways to compute the Jacobian
for use in nls() and related software, and to evaluate these approaches from
several perspectives.
In evaluating performance, we need to know the conditions under which the evaluation
was conducted. Thus the computations included in this document, which is built using
`Rmarkdown`, are specific to the computer on which the document is processed. We
will add tables that give the results for different computing environments at the
end of the document.
# An example problem
We will use the Hobbs weed infestation problem (@jncnm79, page 120).
```{r hobbsex}
# Data for Hobbs problem
ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat <- seq_along(ydat) # for testing
# A simple starting vector -- must have named parameters for nlxb, nls, wrapnlsr.
start1 <- c(b1=1, b2=1, b3=1)
eunsc <- y ~ b1/(1+b2*exp(-b3*tt)) # formula -- display structure with str(eunsc)
# Can we convert a string form of this "model" to a formula?
ceunsc <- " y ~ b1/(1+b2*exp(-b3*tt))" # This will give character form: str(ceunsc)
# The next line will be TRUE if the conversion succeeded
print(as.formula(ceunsc)==eunsc)
weeddata1 <- data.frame(y=ydat, tt=tdat) ## LOCAL DATA IN DATA FRAMES
weedenv <- list2env(weeddata1) ## Put data in an Environment
# Add the parameter data as "variables"
weedenv$b1 <- start1[[1]]; weedenv$b2 <- start1[[2]]; weedenv$b3 <- start1[[3]]
# Display content of the Environment with ## ls.str(weedenv)
# We are now set up for computations
```
# Tools for Jacobians
There are a number of ways to get the Jacobian in R.
## numericDeriv() original version from base R
`numericDeriv()` is the R function used by `nls()` to evaluate Jacobians for its Gauss-Newton
equations. The R source code is in the file `nls.R`; it calls a C function `numeric_deriv`
in `nls.c`. These have been extracted in R package form as `nlspkg` by Duncan Murdoch,
as described in our document **PkgFromRbase.Rmd: Making a package from base R files**, and
we will use that version here.
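For orientation, here is a minimal pure-R sketch (our own simplification, not the actual source) of the forward-difference calculation that `numericDeriv()` performs. The real code, reproduced in Appendix 1, adds central differences, the `dir` argument, and more careful step-size and error handling.
```{r fdjacsketch}
# Minimal forward-difference Jacobian sketch (simplified; see Appendix 1 for the real code)
fdjac <- function(expr, theta, rho, eps = .Machine$double.eps^(1/2)) {
  r0 <- eval(expr, rho)                        # baseline residuals
  J <- matrix(0, length(r0), length(theta))
  for (j in seq_along(theta)) {
    old <- get(theta[j], envir = rho)
    delta <- eps * max(abs(old), eps)          # simplified relative step
    assign(theta[j], old + delta, envir = rho)
    J[, j] <- (eval(expr, rho) - r0) / delta   # one Jacobian column per parameter
    assign(theta[j], old, envir = rho)         # restore the parameter
  }
  attr(r0, "gradient") <- J
  r0
}
```
With the objects defined in the next chunk, `fdjac(rexpr, names(start1), weedenv)` would mimic the default `numericDeriv()` call.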
In the following we will test and time `numericDeriv()` along with several of its options.
```{r nd1}
rexpr<-call("-",eunsc[[3]], eunsc[[2]]) # Generate the residual "call"
res0<-eval(rexpr, weedenv) # Get the residuals
print(res0) # the base residuals
cat("Sumsquares at 1,1,1 is ",sum(res0^2),"\n")
treseval<-microbenchmark(res0<-eval(rexpr, weedenv))
print(treseval)
## Try the numericDeriv option
theta<-names(start1)
## suppressMessages(library(nlspkg))
suppressMessages(ndnls<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv))
print(ndnls)
print(sum(ndnls^2))
tndnls<-microbenchmark(ndnls<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv))
print(tndnls)
## numericDeriv also has central difference option, as well as choice of eps parameter
## Central diff
ndnlsc<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE)
print(ndnlsc)
print(sum(ndnlsc^2))
tndnlsc<-microbenchmark(ndnlsc<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE))
print(tndnlsc)
## Forward diff with smaller eps
ndnlsx<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, eps=1e-10)
print(ndnlsx)
print(sum(ndnlsx^2))
tndnlsx<-microbenchmark(ndnlsx<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, eps=1e-10))
print(tndnlsx)
## Central diff with smaller eps
ndnlscx<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, eps=1e-10)
print(ndnlscx)
print(sum(ndnlscx^2))
tndnlscx<-microbenchmark(ndnlscx<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, eps=1e-10))
print(tndnlscx)
## Add dir parameter -- the direction of the parameter shift
ndnlsd<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, dir=-1)
# Does dir make a difference? It should for the forward difference: dir=-1 gives a backward step.
max(abs(attr(ndnlsd,"gradient")-attr(ndnls,"gradient")))
ndnlscd<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, dir=-1)
# Does dir make a difference? For the central difference it should NOT!
max(abs(attr(ndnlscd,"gradient")-attr(ndnlsc,"gradient")))
```
## numericDeriv() alternative pure-R version
This version (see Appendix 2) replaces the C code with pure-R equivalents.
```{r and1}
## Try ExDerivs.R ??
suppressMessages(andnls<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv))
# print(andnls); print(sum(andnls^2))
tandnls<-microbenchmark(andnls<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv))
print(tandnls)
## numericDeriv also has central difference option, as well as choice of eps parameter
## Central diff
andnlsc<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE)
ndnlsc<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE)
# print(andnlsc); print(sum(andnlsc^2))
tandnlsc<-microbenchmark(andnlsc<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE))
print(tandnlsc)
## Forward diff with smaller eps
andnlsx<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, eps=1e-10)
ndnlsx<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, eps=1e-10)
# print(andnlsx); print(sum(andnlsx^2))
tandnlsx<-microbenchmark(andnlsx<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, eps=1e-10))
print(tandnlsx)
## Central diff with smaller eps
andnlscx<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, eps=1e-10)
ndnlscx<-nlspkg::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, eps=1e-10)
# print(andnlscx) ; print(sum(andnlscx^2))
tandnlscx<-microbenchmark(andnlscx<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, eps=1e-10))
print(tandnlscx)
## Comparisons of the Jacobian between nlspkg and nlsalt, i.e., R-plus-C vs pure R
max(abs(attr(ndnls, "gradient")-attr(andnls,"gradient")))
max(abs(attr(ndnlsc, "gradient")-attr(andnlsc,"gradient")))
max(abs(attr(ndnlsx, "gradient")-attr(andnlsx,"gradient")))
max(abs(attr(ndnlscx, "gradient")-attr(andnlscx,"gradient")))
## Using dir
cat("eps (regular) = ",.Machine$double.eps^(1/2),
" eps (central) =",.Machine$double.eps^(1/3),"\n")
andnlsd<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, dir=-1)
max(abs(attr(andnlsd, "gradient")-attr(ndnls,"gradient")))
max(abs(attr(andnlsd, "gradient")-attr(andnls,"gradient")))
andnlscd<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, dir=-1)
max(abs(attr(andnlscd, "gradient")-attr(ndnlsc,"gradient")))
## Try comparisons over different eps sizes
for (ee in 3:10){
andnlsd<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, dir=-1, eps=10^(-ee))
andnlscd<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, dir=-1, eps=10^(-ee))
andnlsx<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, eps=10^(-ee))
andnlscx<-nlsalt::numericDeriv(rexpr, theta, rho=weedenv, central=TRUE, eps=10^(-ee))
cat("Regular diff, eps=10^(-",ee,"):",
max(abs(attr(andnlsd,"gradient")-attr(andnlsx,"gradient"))),"\n")
cat("Central diff, eps=10^(-",ee,"):",
max(abs(attr(andnlscd,"gradient")-attr(andnlscx,"gradient"))),"\n")
}
```
The `dir` parameter allows us to use a backward difference for the derivative. This
appears in `nlsModel()` for the case where a parameter is on an upper bound when
`algorithm="port"`. The code does not check for nearness to the bound, and for a
lower bound it assumes that we are stepping AWAY from the bound in the default
direction (`dir=+1`). None of the code addresses the situation where bounds are
closer together than the step used for the finite difference, so there are
cases where we could crash the code. Nor does the code check whether a central
difference has been specified near a bound. In particular (see the sketch after
this list):
- For lower bounds, a central difference can overstep the bound when a parameter
is close to, or on, the bound.
- For an upper bound, changing `dir` does not change the central-difference
approximation expression, and steps in both the forward and backward directions
of the parameter are still taken.
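A toy illustration of the lower-bound hazard (our own sketch; none of this code comes from `nls()` or `nlsModel()`): with a parameter sitting on a lower bound at 0 inside a `log()`, the forward evaluation point is admissible but the backward point that a central difference also requires is not.
```{r boundstep}
flog <- function(b) log(b + 1e-12)  # inadmissible for b < -1e-12
blo <- 0                            # parameter sitting on its lower bound
delta <- .Machine$double.eps^(1/3)  # a step of the size central differences use
flog(blo + delta)                   # forward point: finite
flog(blo - delta)                   # backward point: NaN, with a warning
```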
## Symbolic methods from `nlsr`
The package `nlsr` has a function `model2rjfun()` that converts an expression
describing how the residual functions are computed into an R function that
computes the residuals at a particular set of parameters and sets the
**attribute** "gradient" of the vector of residual values to the Jacobian at
those parameters. `model2rjfun()` does much the same work
as the `res0<-eval(rexpr, weedenv)` expression evaluation above, but adds derivative
expressions to the function.
```{r nlsr1}
# nlsr has function model2rjfun. We can evaluate just the residuals
res0<-model2rjfun(eunsc, start1, data=weeddata1, jacobian=FALSE)
res0(start1)
tres0nlsr<-microbenchmark(res0(start1)) # time it
print(tres0nlsr)
# or the residuals and jacobian
funsc <- model2rjfun(eunsc, start1, data=weeddata1) # from nlsr: creates a function
tmodel2rjfun <- microbenchmark(model2rjfun(eunsc, start1, data=weeddata1))
print(tmodel2rjfun)
## Ways to display information about the residual/jacobian function
# print(funsc); print(funsc(start1)); print(environment(funsc)); print(ls.str(environment(funsc)))
# print(ls(environment(funsc)$data)); eval(eunsc, environment(funsc))
vfunsc<-funsc(start1)
print(vfunsc)
tfunsc<-microbenchmark(funsc(start1))
print(tfunsc)
```
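As a cross-check on the symbolic route (our own comparison, not part of `nlsr`), base R's `deriv()` can build a similar residual-plus-gradient function for this model; `model2rjfun()` does the same job more generally from a model formula.
```{r derivsketch}
# Build an analytic residual+gradient function with base R's deriv()
dres <- deriv(~ b1/(1 + b2*exp(-b3*tt)) - y, c("b1", "b2", "b3"),
              function.arg = c("b1", "b2", "b3", "tt", "y"))
jd <- do.call(dres, c(as.list(start1), list(tt = tdat, y = ydat)))
max(abs(attr(jd, "gradient") - attr(vfunsc, "gradient"))) # both analytic, so ~0
```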
## The `numDeriv` package
The package `numDeriv` includes a function `jacobian()` that acts on a user
function `resid()` to produce the Jacobian at a set of parameters by several
choices of approximation.
```{r numDeriv1}
# We use the residual function (without gradient attribute) from nlsr
jnumd<-jacobian(res0, start1) # uses default "Richardson" method
jnumd
# Timing of the numerical jacobian calculation (default Richardson)
tjnumd<-microbenchmark(jnumd<-jacobian(res0, start1))
print(tjnumd)
jnumds<-jacobian(res0, start1, method="simple") # simple forward-difference method
jnumds
# Timing of the numerical jacobian calculation (simple)
tjnumds<-microbenchmark(jnumds<-jacobian(res0, start1, method="simple"))
print(tjnumds)
jnumdc<-jacobian(res0, start1, method="complex") # complex-step method
jnumdc
# Timing of the numerical jacobian calculation (complex step)
tjnumdc<-microbenchmark(jnumdc<-jacobian(res0, start1, method="complex"))
print(tjnumdc)
```
Note that the manual pages for `numDeriv` offer many options for the functions in
the package. We have yet to explore many of these; one possibility is sketched below.
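As one illustration (a sketch only; the `method.args` entries are those documented in the `numDeriv` manual), the Richardson extrapolation can be tuned, here by raising the number of improvement iterations `r` from its default of 4:
```{r numdargs}
jnumd6 <- jacobian(res0, start1, method = "Richardson",
                   method.args = list(r = 6)) # r: number of Richardson iterations
max(abs(jnumd6 - attr(vfunsc, "gradient")))
```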
# Comparisons
In the following, we are comparing to `vfunsc`, which is the evaluated
residual vector at `start1=c(1,1,1)` with "gradient" attribute (jacobian)
included, as developed using package `nlsr`. This is taken as the "correct"
result, even though it is possible that the generated order of calculations
may introduce inaccuracies in the supposedly analytic derivatives.
`numericDeriv` computes a similar structure (residuals with "gradient" attribute):
- `ndnls`: forward difference with default `eps` (`.Machine$double.eps^(1/2)`)
- `ndnlsc`: central difference with default `eps` (`.Machine$double.eps^(1/3)`)
- `ndnlsx`: forward difference with the smaller `eps=1e-10`
- `ndnlscx`: central difference with the smaller `eps=1e-10`
From `numDeriv::jacobian()`:
- `jnumd`: default "Richardson" method
- `jnumds`: `method="simple"` (forward difference)
- `jnumdc`: `method="complex"` (complex step)
```{r compjac1}
## Matrix comparisons -- uncomment code to show these, which use page space
# attr(ndnls, "gradient")-attr(vfunsc,"gradient")
# attr(ndnlsc, "gradient")-attr(vfunsc,"gradient")
# attr(ndnlsx, "gradient")-attr(vfunsc,"gradient")
# attr(ndnlscx, "gradient")-attr(vfunsc,"gradient")
# jnumd - attr(vfunsc,"gradient")
# jnumds - attr(vfunsc,"gradient")
# jnumdc - attr(vfunsc,"gradient")
## Summary comparisons - maximum absolute differences
max(abs(attr(ndnls, "gradient")-attr(vfunsc,"gradient")))
max(abs(attr(ndnlsc, "gradient")-attr(vfunsc,"gradient")))
max(abs(attr(ndnlsx, "gradient")-attr(vfunsc,"gradient")))
max(abs(attr(ndnlscx, "gradient")-attr(vfunsc,"gradient")))
max(abs(jnumd - attr(vfunsc,"gradient")))
max(abs(jnumds - attr(vfunsc,"gradient")))
max(abs(jnumdc - attr(vfunsc,"gradient")))
```
## Observations
Some particular notes:
- the mean time for the default `numericDeriv()` of `nls()` is quite
fast, with a coefficient of variation (sd/mean) of around 0.42. The
timings are actually very slightly faster than those for the analytic expressions
of `nlsr`, though the latter has a COV of 0.36.
- the Jacobian from this default method unfortunately deviates from the analytic
computation by a relatively large amount (of the order of 1e-6 for our
example).
- the central difference version of `numericDeriv()` does better (about
three orders of magnitude smaller deviation from the analytic result),
and the time is comparable with the analytic evaluation.
- making the `eps` parameter smaller degrades the accuracy of the Jacobian
computed via `numericDeriv()`. This may be counter-intuitive for those
unfamiliar with numerical methods. Essentially, a smaller `eps` forces the
subtraction of very close residual values, so rounding error dominates
(see the sketch after this list).
- the "simple" option of the `numDeriv` function `jacobian()` gives similarly
poor accuracy.
- the "Richardson" (default) results of `numDeriv` are of similar accuracy to
the central difference option of `numericDeriv()` but at a much greater time
cost -- about 23 times slower.
- on the other hand, the "complex" option gets an essentially analytic result,
approximately 8 orders of magnitude better than the central difference approximation
of `numericDeriv`, for a time cost only 3.5 times as great. Unfortunately, not all
models are amenable to the complex step approximation.
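The step-size tradeoff behind several of these observations can be seen in a tiny self-contained experiment (our own sketch, unrelated to the Hobbs problem). For f(x) = exp(x) at x = 1 the true derivative is exp(1); the forward-difference error shrinks as the step h decreases, then grows again once cancellation dominates.
```{r stepsize}
fx <- function(x) exp(x)
x0 <- 1
for (ee in c(4, 6, 8, 10, 12)) {
  h <- 10^(-ee)
  fwd <- (fx(x0 + h) - fx(x0)) / h       # forward difference approximation
  cat("h = 1e-", ee, "  abs error = ", abs(fwd - exp(x0)), "\n", sep = "")
}
```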
# Cautionary notes on performance results
The results here have been evaluated on a single computer. While we could
process the Rmarkdown file on any of several machines, the work was mainly carried out
on a machine characterized by the string
```
M21:john-Linux-5.11.0-25-generic|Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz|33482145792bytesRAM
```
This is a relatively capable tower PC, but otherwise unremarkable.
Nevertheless, we found that running the timings more than once while other tasks were in
progress did produce variations of several percent in the mean and standard deviation of
the timings. We would expect both absolute differences in times and changes in relative
performance with different processors and operating systems, and had thought to carry out some
investigation of such differences. However, such effort seems less valuable than pursuing more
capable nonlinear least squares and derivative code.
# Some notes on derivative computation for nonlinear least squares
In no particular order, we comment on some issues relating to the Jacobian calculations
in nonlinear least squares.
## Nomenclature
R is not in step with many other areas of numerical computation in labeling the different
objects of the nonlinear least squares problem. In particular, R uses the term "gradient"
when the object of interest is the Jacobian matrix. Since it is useful, when performing
iterations of the Gauss-Newton or related equations, to have the Jacobian associated with
the residuals, and since the rows of the Jacobian matrix are the gradients of the respective
residuals, we can accept the attribute name "gradient" as a way to select the required information.
Moreover, as in package `nlsr`, it is very useful to have the Jacobian matrix as an
attribute of the residual vector, since the main solver function, in this case `nlsr::nlfb()`,
can then be called with the same input for the arguments `resfn` and `jacfn`. These are the
functions required to compute the residual and the Jacobian; using the same function for both
is very convenient, but it needs some way to return both the residual vector and the Jacobian
matrix in a coherent fashion.
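A hand-coded example of this pattern (our own sketch; the derivative expressions were worked out by hand for the Hobbs model) returns the residuals with the Jacobian in the "gradient" attribute, in the form `nlsr::nlfb()` accepts for both `resfn` and `jacfn`:
```{r rjsketch}
rjhobbs <- function(par) {
  b1 <- par[[1]]; b2 <- par[[2]]; b3 <- par[[3]]
  ee  <- exp(-b3 * tdat)
  den <- 1 + b2 * ee
  res <- b1/den - ydat                         # residuals (model - data)
  J <- cbind(b1 = 1/den,                       # d res / d b1
             b2 = -b1 * ee / den^2,            # d res / d b2
             b3 = b1 * b2 * tdat * ee / den^2) # d res / d b3
  attr(res, "gradient") <- J
  res
}
max(abs(attr(rjhobbs(start1), "gradient") - attr(vfunsc, "gradient"))) # ~0
# e.g., ans <- nlsr::nlfb(start1, resfn = rjhobbs, jacfn = rjhobbs)
```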
## Numerical approximation near constraints
As far as we are aware, there is no software that implements a fully safeguarded scheme for
computing numerical approximations of the Jacobian (or of gradients in general optimization) near
constraints. The same statement applies even to the much simpler case of bounds constraints.
Users have a perverse tendency to devise ways to foil our best efforts. For example, they may
decide that a good way to specify fixed (i.e., masked) parameters that they do not want to vary
during a particular calculation is to specify the lower and upper bound of a parameter at the
same value. Later runs may want the parameter constraints relaxed.
In `nlsr::nlxb()`, users may, in fact, specify masked parameters this way. This is a case of
"if you can't beat them, join them", but it does provide an easily understood way for users to
fix values (a sketch follows).
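A sketch of this usage (illustrative only; consult the `nlsr` documentation for the precise bounds handling):
```{r masksketch}
mfix <- nlsr::nlxb(eunsc, start = start1, data = weeddata1,
                   lower = c(-Inf, -Inf, 1),   # bounds in parameter order b1, b2, b3
                   upper = c( Inf,  Inf, 1))   # equal bounds fix (mask) b3 at 1
print(mfix)
```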
More tricky is dealing with constraints that are close together. Note that these may arise,
for example, from two linear (planar) constraints that approach each other at a narrow angle.
Near the apex where these constraints intersect, we have tight bounds on the parameters. If
the constraint is not one imposed by the nature of the residual or objective function (for
example, a log() or square root near zero), then we can generally proceed and allow the
derivative approximation to evaluate outside the constraints. Things are decidedly nastier
if we do have inadmissible values of the parameters. This is where analytic Jacobian
evaluation is very helpful.
The issue of constraints and the need for a step in parameter values for derivative approximations
was one of the motivations for trying to find analytic derivatives in package `nlsr` and the
continuing effort to bring them into other R tools.
## A case where a parameter is close to inadmissible
The following example shows that `numericDeriv()` does a reasonable job of computing
the Jacobian, but the result is nonetheless singular.
```{r code=xfun::read_utf8('Tests/Examples/badJlogmod.R')}
```
# Appendix 1: Base R numericDeriv code
This code is in two files, `nls.R` and `nls.c`, and is extracted here.
## From nls.R
```{r nlsR, comment=NA, echo=FALSE}
cat(readLines('archived/nls-numericDeriv.R'), sep = '\n')
```
## From nls.c
```{r nlsc, comment=NA, echo=FALSE}
cat(readLines('archived/nls-numericDeriv.c'), sep = '\n')
```
# Appendix 2: numericDeriv() from nlsalt package (all in R)
```{r nlsndnew, comment=NA, echo=FALSE}
cat(readLines('./nlsalt/R/nlsnd.R'), sep = '\n')
```
# References