modify docs to include scipy solver
rtqichen committed Dec 18, 2020
1 parent 8a11aba commit 75c78a4
Showing 3 changed files with 18 additions and 9 deletions.
8 changes: 5 additions & 3 deletions FAQ.md
@@ -19,6 +19,8 @@
- `explicit_adams` Explicit Adams.
- `implicit_adams` Implicit Adams.

- `scipy_solver`: Wraps a SciPy solver.
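
For illustration, here is a minimal sketch of selecting a solver via the `method` argument of `odeint` (the dynamics, tensors, and the `'RK45'` name are placeholders; any method name accepted by `scipy.integrate.solve_ivp` can be passed to the SciPy wrapper):

```python
import torch
from torchdiffeq import odeint

def func(t, y):                      # dy/dt = f(t, y); placeholder dynamics
    return -y

y0 = torch.tensor([1.0])
t = torch.linspace(0., 1., 5)

ys = odeint(func, y0, t, method='dopri5')                                     # adaptive solver
ys = odeint(func, y0, t, method='rk4', options=dict(step_size=0.1))           # fixed-step solver
ys = odeint(func, y0, t, method='scipy_solver', options=dict(solver='RK45'))  # SciPy wrapper
```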


**What are `NFE-F` and `NFE-B`?**<br>
Number of function evaluations for forward and backward pass.
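
One common way to track `NFE-F` yourself is to increment a counter inside the function passed to the solver; the `ODEFunc` module below is a made-up example, not part of the library:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ODEFunc(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 50), nn.Tanh(), nn.Linear(50, 2))
        self.nfe = 0                  # number of function evaluations so far

    def forward(self, t, y):
        self.nfe += 1                 # one evaluation of the dynamics
        return self.net(y)

func = ODEFunc()
y0 = torch.randn(2)
t = torch.linspace(0., 1., 10)
ys = odeint(func, y0, t, method='dopri5')
print('NFE-F:', func.nfe)             # evaluations used by the forward solve
```
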
@@ -31,7 +33,7 @@ The basic idea is each adaptive solver can produce an error estimate of the current step.
[Error Tolerances for Variable-Step Solvers](https://www.mathworks.com/help/simulink/ug/types-of-solvers.html#f11-44943)

**How is the error tolerance calculated?**<br>
The error tolerance is [calculated](https://github.com/rtqichen/torchdiffeq/blob/master/torchdiffeq/_impl/misc.py#L74) as `atol + rtol * norm of current state`, where the norm being used is a mixed L-infinity/RMS norm.

**Where is the code that computes the error tolerance?**<br>
It is computed [here.](https://github.com/rtqichen/torchdiffeq/blob/c4c9c61c939c630b9b88267aa56ddaaec319cb16/torchdiffeq/_impl/misc.py#L94)
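
As a rough sketch of the accept/reject test described above (simplified for illustration, not the library's exact code):

```python
import torch

def rms_norm(x):
    # root-mean-square over all elements
    return x.pow(2).mean().sqrt()

def error_ratio(error_estimate, y0, y1, rtol=1e-7, atol=1e-9):
    # tolerance grows with the magnitude of the previous/current state
    tolerance = atol + rtol * torch.max(y0.abs(), y1.abs())
    # the step is accepted when this ratio is <= 1
    return rms_norm(error_estimate / tolerance)
```
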
@@ -72,7 +74,7 @@ https://stackoverflow.com/questions/52528955/installing-a-python-module-from-git

**What is the most memory-expensive operation during training?**<br>
The most memory-expensive operation is the single [backward call](https://github.com/rtqichen/torchdiffeq/blob/master/torchdiffeq/_impl/adjoint.py#L75) made to the network.

**My Neural ODE's numerical solution is farther away from the target than the initial value**<br>
Most tricks for initializing residual nets (like zeroing the weights of the last layer) should help for ODEs as well. This will initialize the ODE as an identity.
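
For example, a minimal sketch of zeroing the final layer of a (made-up) dynamics network so the ODE starts out as the identity map:

```python
import torch.nn as nn

class ODEFunc(nn.Module):
    def __init__(self, dim=2, hidden=50):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        # Zero the last layer so dy/dt = 0 at initialization,
        # which makes the learned map start out (approximately) as the identity.
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, t, y):
        return self.net(y)
```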

@@ -82,4 +84,4 @@ This might be because you're running on CPU. Being extremely slow on CPU is expected.


**My Neural ODE produces underflow in dt when using adaptive solvers like `dopri5`**<br>
This is a problem of the ODE becoming stiff, essentially acting too erratic in a region and the step size becomes so close to zero that no progress can be made in the solver. We were able to avoid this with regularization such as weight decay and using "nice" activation functions, but YMMV. Other potential options are just to accept a larger error by increasing `atol`, `rtol`, or by switching to a fixed solver.
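
For instance (with placeholder dynamics and tensors, and tolerance values chosen only for illustration):

```python
import torch
from torchdiffeq import odeint

func = lambda t, y: -y ** 3          # placeholder dynamics
y0 = torch.tensor([1.0])
t = torch.linspace(0., 1., 5)

# Accept a larger error by loosening the tolerances...
ys = odeint(func, y0, t, rtol=1e-3, atol=1e-4)

# ...or avoid adaptive stepping altogether with a fixed-step solver.
ys = odeint(func, y0, t, method='rk4', options=dict(step_size=0.01))
```
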
13 changes: 9 additions & 4 deletions FURTHER_DOCUMENTATION.md
@@ -46,10 +46,15 @@ For this solver, `rtol` and `atol` correspond to the tolerance for convergence of the Adams-Moulton corrector.

- `max_iters`: The maximum number of iterations to run the Adams-Moulton corrector for.

**scipy_solver:**<br>
- `solver`: which SciPy solver to use; corresponds to the `'method'` argument of `scipy.integrate.solve_ivp`.
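
For example (`'LSODA'` is just one of the method names accepted by `scipy.integrate.solve_ivp`; the dynamics and tensors are placeholders):

```python
import torch
from torchdiffeq import odeint

func = lambda t, y: -y               # placeholder dynamics
y0 = torch.tensor([1.0])
t = torch.linspace(0., 1., 5)

ys = odeint(func, y0, t, method='scipy_solver', options=dict(solver='LSODA'))
```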

## Adjoint options

The function `odeint_adjoint` offers some adjoint-specific options.

- `adjoint_rtol`,<br>`adjoint_atol`,<br>`adjoint_method`,<br>`adjoint_options`:<br>The `rtol, atol, method, options` to use for the backward pass. Defaults to the values used for the forward pass.

- `adjoint_params`: The parameters to compute gradients with respect to in the backward pass. Should be a tuple of tensors. Defaults to `tuple(func.parameters())`. If passed then `func` does not have to be a `torch.nn.Module`.

- `adjoint_params`: The parameters to compute gradients with respect to in the backward pass. Should be a tuple of tensors. Defaults to `tuple(func.parameters())`.
  - If passed then `func` does not have to be a `torch.nn.Module`.
  - If `func` has no parameters, `adjoint_params=()` must be specified.
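
Putting these together, a minimal sketch (the network and tolerance values are illustrative, not prescriptive):

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint

class ODEFunc(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 50), nn.Tanh(), nn.Linear(50, 2))

    def forward(self, t, y):
        return self.net(y)

func = ODEFunc()
y0 = torch.randn(2, requires_grad=True)
t = torch.linspace(0., 1., 10)

ys = odeint_adjoint(
    func, y0, t,
    method='dopri5',
    adjoint_rtol=1e-6, adjoint_atol=1e-8,       # tolerances for the backward pass
    adjoint_method='dopri5',
    adjoint_params=tuple(func.parameters()),    # already the default when func is an nn.Module
)
ys.sum().backward()                             # gradients flow to y0 and func's parameters
```
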
6 changes: 4 additions & 2 deletions README.md
@@ -72,7 +74,7 @@ Fixed-step:
- `rk4` Fourth-order Runge-Kutta with 3/8 rule.
- `explicit_adams` Explicit Adams-Bashforth.
- `implicit_adams` Implicit Adams-Bashforth-Moulton.


Additionally, all solvers available through SciPy are wrapped for use with `scipy_solver`.

For most problems, good choices are the default `dopri5`, or `rk4` with `options=dict(step_size=...)` set appropriately small. Adjusting the tolerances (adaptive solvers) or step size (fixed solvers) will allow for trade-offs between speed and accuracy.

#### Frequently Asked Questions
@@ -94,4 +96,4 @@ If you found this library useful in your research, please consider citing
journal={Advances in Neural Information Processing Systems},
year={2018}
}
```
