Introduce differentiation interface #2137
base: main
Conversation
@gdalle it would be nice if you could have a look at 64091bc. This already works very nicely! We might be able to get rid of our direct …
When testing dense Jacobian + ForwardDiff I get an error like this:
I wonder whether that has something to do with passing the AD backend to both …
Taking a look now
core/src/model.jl (Outdated)

```julia
@@ -28,6 +28,33 @@ struct Model{T}
    end
end

function get_jac_eval(du::Vector, u::Vector, p::Parameters, solver::Solver)
    backend = if solver.autodiff
        AutoForwardDiff()
```
Maybe you want to configure the tag here if it is available from somewhere else (perhaps the `solver`?)
`solver` is our own solver config object, but it works if I consistently specify the same tag everywhere 👍
```julia
    p.all_nodes_active = false
    jac_prototype
end

# Custom overloads
```
If you want to get rid of the SCT dependency, you may want to

- put those overloads in an SCT package extension (see the sketch below)
- ask the user to provide an AD backend, and if it is an `AutoSparse`, retrieve its `sparsity_detector` instead of providing your own
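For illustration, such an extension could be shaped like this. This is only a sketch: the module name is hypothetical, it assumes `reduction_factor` is owned by the Ribasim module, and a matching `[weakdeps]`/`[extensions]` entry for SparseConnectivityTracer would be needed in Project.toml.

```julia
# ext/RibasimSparseConnectivityTracerExt.jl (hypothetical module name)
module RibasimSparseConnectivityTracerExt

using Ribasim: Ribasim
using SparseConnectivityTracer: GradientTracer

# The manual overload from this PR, moved behind the extension barrier so
# SCT becomes a weak dependency of the main package.
Ribasim.reduction_factor(x::GradientTracer, ::Real) = x

end
```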
Hmm this probably doesn't fit our application as described in #2137 (comment). Keeping the SCT dependency is fine
```julia
# Custom overloads
reduction_factor(x::GradientTracer, threshold::Real) = x
reduction_factor(x::GradientTracer, ::Real) = x
```
Note that this is explicitly discouraged in the SCT docs: see the following page to add overloads properly
https://adrianhill.de/SparseConnectivityTracer.jl/stable/internals/adding_overloads/
I know I know 🙃
Some of these functions have more than 2 arguments, but for all of them we only care about the derivative with respect to one input. It's not clear to me whether/how that fits within the overload functionality
It makes sense to use our mechanisms. They will generate a couple of superfluous methods, but I don't see the harm in that.
To be fair, this specific line of code looks harmless. But once you have more than a handful of overloads, you might want to create an SCT extension and follow our docs. The generated code will be more future-proof and compatible with local and global Jacobian and Hessian tracers. Your current code only supports global Jacobians.
core/src/model.jl (Outdated)

```julia
# Activate all nodes to catch all possible state dependencies
p.all_nodes_active = true
prep = prepare_jacobian((du, u) -> water_balance!(du, u, p, t), du, backend, u)
```
```diff
-prep = prepare_jacobian((du, u) -> water_balance!(du, u, p, t), du, backend, u)
+prep = prepare_jacobian(water_balance!, du, backend, u, Constant(p), Constant(t))
```
From the docs:

> Another option would be creating a closure, but that is sometimes undesirable.

When is a closure undesirable?

```julia
gradient(f, backend, x, Constant(c))
gradient(f, backend, x, Cache(c))
```

> In the first call, `c` is kept unchanged throughout the function evaluation. In the second call, `c` can be mutated with values computed during the function.

Our `p` contains caches for in-place computations in our RHS (hence the discussion on PreallocationTools etc. in the related issue). Does that mean that we should use `Cache(p)`?
Should `SciMLStructures.jl` come in here for more granular control?
> When is a closure undesirable?

With Enzyme in particular it can make things slower or even error. With other backends it doesn't make much of a difference, but explicitly laying out the `Context`s also allows taking into account element types (e.g. for handling translation to `Dual` with `Cache`s).

> Our `p` contains caches for in-place computations in our RHS (hence the discussion on PreallocationTools etc. in the related issue). Does that mean that we should use `Cache(p)`?

Does `p` contain anything whose value you care about? In general, you might want to split it between a `Constant` part and a `Cache` part.

> Should SciMLStructures.jl come in here for more granular control?

DI has no specific support for SciMLStructures
@gdalle I'm working on a refactor where e.g. the `prep` definition now looks like this:

```julia
prep = prepare_jacobian(
    water_balance!,
    du,
    backend,
    u,
    Constant(p_non_diff),
    Cache(p_diff),
    Constant(p_mutable),
    Constant(t),
)
```

This now fails in the sparsity detection because of an attempt to write `GradientTracer` values to a `Vector{Float64}` field of `p_diff::ParametersDiff`. I made `ParametersDiff` parametric so `ParametersDiff{GradientTracer{...}}` can exist, and I half expected this to be constructed internally. This probably worked before because of PreallocationTools.
Oh I just saw this warning in the docs:

> Most backends require any Cache context to be an AbstractArray.

Let's see what I can do with that.
It doesn't look like that quickly solves the problem. I just naively subtyped `ParametersDiff{T} <: AbstractVector{T}`. Maybe I need to overload some methods?
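For reference, subtyping alone is indeed not enough; the wrapper also has to implement the core `AbstractArray` interface. A minimal sketch, assuming `ParametersDiff` wraps a single flat vector (the `data` field is hypothetical, the real struct has more fields):

```julia
# Hypothetical single-vector wrapper.
struct ParametersDiff{T} <: AbstractVector{T}
    data::Vector{T}
end

# Core AbstractArray interface (see the Julia interfaces docs):
Base.size(p::ParametersDiff) = size(p.data)
Base.getindex(p::ParametersDiff, i::Int) = p.data[i]
Base.setindex!(p::ParametersDiff, v, i::Int) = (p.data[i] = v; p)

# `similar` with a new element type is what lets a backend allocate a
# Dual- or tracer-valued copy of the cache.
Base.similar(p::ParametersDiff, ::Type{T}) where {T} =
    ParametersDiff{T}(Vector{T}(undef, length(p)))
```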
Thanks, that's a tricky one and it's indeed on me. If you don't want to wait for a DI fix (ETA ~ days), a short-term solution would be to use PreallocationTools and a closure, even if it makes Enzyme angry.
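Spelled out, that short-term workaround might look roughly like this. A sketch only: `DiffCache`/`get_tmp` are the actual PreallocationTools API, but the `scratch` keyword on `water_balance!` is hypothetical, purely to show where the cache retrieval would go.

```julia
using DifferentiationInterface: prepare_jacobian
using PreallocationTools: DiffCache, get_tmp

# get_tmp returns scratch storage whose element type matches eltype(u),
# e.g. plain Float64 during solves and Dual numbers under ForwardDiff.
scratch = DiffCache(zeros(length(u)))
rhs! = (du, u) -> water_balance!(du, u, p, t; scratch = get_tmp(scratch, u))
prep = prepare_jacobian(rhs!, du, backend, u)
```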
core/src/model.jl (Outdated)

```julia
jac =
    (J, u, p, t) ->
        jacobian!((du, u) -> water_balance!(du, u, p, t), du, J, prep, backend, u)
```
```diff
-jac =
-    (J, u, p, t) ->
-        jacobian!((du, u) -> water_balance!(du, u, p, t), du, J, prep, backend, u)
+jac(J, u, p, t) = jacobian!(water_balance!, du, J, prep, backend, u, Constant(p), Constant(t))
```
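For context on how this `jac` presumably gets consumed: it matches the standard SciML in-place Jacobian signature, so the wiring would look something like the sketch below (`u0`, `tspan`, and the exact Ribasim call site are assumptions, not code from this PR):

```julia
using SciMLBase: ODEFunction, ODEProblem

# Hook the prepared Jacobian and its sparsity pattern into the ODE problem.
rhs = ODEFunction(water_balance!; jac, jac_prototype)
prob = ODEProblem(rhs, u0, tspan, p)
```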
Love the confident commit naming. That's the spirit we wanna see
core/src/model.jl (Outdated)

```julia
diff_cache_SCT =
    zeros(GradientTracer{IndexSetGradientPattern{Int64, BitSet}}, length(diff_cache))
```
With this PR you can use `SCT.jacobian_buffer` instead, and with the update to DI I'll make once that is merged, you probably won't need any tweak at all
Thanks!
@SouthEndMusic can you take the branch from JuliaDiff/DifferentiationInterface.jl#739 for a spin, see if it works? Warning: I am aware that using …
@gdalle I took the main branch, and it indeed works 👍
@SouthEndMusic with the branch from JuliaDiff/DifferentiationInterface.jl#741 the …
The changes have been released
Just a few remarks in passing.
Do you still need PreallocationTools in the end?
```julia
@@ -290,6 +291,9 @@ function function_accepts_kwarg(f, kwarg)::Bool
    return false
end

get_ad_type(solver::Solver) =
    solver.autodiff ? AutoForwardDiff(; tag = :Ribasim) : AutoFiniteDiff()
```
In case you want to perform higher-order autodiff, you may want to use `ForwardDiff.Tag` to create your package tag, see e.g. https://www.stochasticlifestyle.com/improved-forwarddiff-jl-stacktraces-with-package-tags/
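Following that post, the tag would be a package-owned type passed to the backend, roughly like this sketch (`RibasimTag` is a hypothetical name, and `Float64` an assumption about `eltype(u)`):

```julia
import ForwardDiff
using ADTypes: AutoForwardDiff

# Hypothetical package-owned tag type, as in the linked blog post.
struct RibasimTag end

# A concrete ForwardDiff.Tag instead of a Symbol keeps stacktraces readable
# and avoids tag clashes in nested/higher-order differentiation.
backend = AutoForwardDiff(; tag = ForwardDiff.Tag{RibasimTag, Float64}())
```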
```julia
# Activate all nodes to catch all possible state dependencies
p_mutable.all_nodes_active = true
jac_prep = prepare_jacobian(
```
If you want to be extra-safe, this huge-ass PR contains additional checking mechanisms to make sure that your preparation is consistent with your execution. All you have to do to activate them is go `prepare_jacobian(blablabla; strict=Val(true))`. It will be the default in the next breaking release of DI.
Since testing different AD backends is now 1-2 LOC I tested using Enzyme, but running the basic test model seems to hang indefinitely.
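For context, those 1-2 LOC are just a different ADTypes object, e.g. (a sketch using the default `AutoEnzyme()` constructor, without mode or activity annotations):

```julia
using ADTypes: AutoEnzyme
using Enzyme  # the backend package itself must be loaded for DI to use it

# Replace AutoForwardDiff()/AutoFiniteDiff() with:
backend = AutoEnzyme()
```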
The compilation times can be a bit painful on such a complex function
Nope, we no longer need this as a direct dependency
Is there work to improve this, so that the compilation time for Enzyme is comparable to that of Julia?
Idk, you'd have to ask Billy
Do you have any clue if you got performance gains from the switch to DI?
Also, do you have ready examples of the type of Jacobians you're handling? DI can also perform mixed-mode sparse AD, which wasn't possible before. That can come in handy for settings with dense columns and dense rows.
There might be some, because I believe the cache retrieval with …
Our models are graphs of river systems, with roughly a state per edge. So we know our Jacobians are sparse, but other than that I don't think we can say much, there generally aren't dense columns or rows
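For a pattern like that, the standard DI recipe is to wrap the dense backend in `AutoSparse`, echoing the earlier `sparsity_detector` suggestion. A sketch with the usual SCT detector and greedy coloring (these choices are illustrative, not Ribasim's actual configuration):

```julia
using ADTypes: AutoSparse, AutoForwardDiff
using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm

# SCT detects the sparsity pattern once during preparation; the coloring
# compresses structurally orthogonal columns into shared dual evaluations.
backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector = TracerSparsityDetector(),
    coloring_algorithm = GreedyColoringAlgorithm(),
)
```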
Where do those come from? I was wondering whether the limitation of …
Yep, that's exactly what I did. I tried overloading the right methods to make a struct work, but I gave up on that after a while. Can you take inspiration from Enzyme's …? We also did this for our state vector earlier. That used to be a …
No, replicating Enzyme's behavior would be a huge pain, but what I can do is support tuples of arrays. How many arrays are we talking here? Because you're not limited on the number of …
With the current possibilities, I could already make several cache arguments indeed. It would be nice to be able to combine them in a single object, though, so that function signatures do not have to be updated when a new cache is added. I'm constrained to the … This splitting of … Hopefully in the not-so-distant future most of these things are handled by the integration of …
Yeah we're running into exactly the same problem inside the OrdinaryDiffEq PR: |
Nice
How is that different from …
`Constant` is explicitly not differentiated, which would cause issues if it contains cache memory. We don't see these issues with ForwardDiff or FiniteDiff, but with Enzyme the …
Fixes #2134