
Introduce differentiation interface #2137

Open · wants to merge 22 commits into base: main

Conversation

SouthEndMusic
Collaborator

Fixes #2134

@SouthEndMusic
Collaborator Author

@gdalle it would be nice if you could have a look at 64091bc. This already works very nicely! We might be able to get rid of our direct SparseConnectivityTracer dependency, were it not that we (I) did some naughty overloads using GradientTracer. What would the next step towards Enzyme support look like?

@SouthEndMusic
Collaborator Author

When testing dense Jacobian + ForwardDiff I get an error like this:

ERROR: LoadError: Invalid Tag object: ...
Observed ForwardDiff.Tag{...}

I wonder whether that has something to do with passing the AD backend to both DifferentiationInterface.jacobian and the algorithm constructor. I also wonder how this works for algorithms that require derivatives w.r.t. time.

@gdalle

gdalle commented Mar 10, 2025

Taking a look now

@@ -28,6 +28,33 @@ struct Model{T}
    end
end

function get_jac_eval(du::Vector, u::Vector, p::Parameters, solver::Solver)
    backend = if solver.autodiff
        AutoForwardDiff()

Maybe you want to configure the tag here if it is available from somewhere else (perhaps the solver?)

Collaborator Author

solver is our own solver config object, but it works if I consistently specify the same tag everywhere 👍

    p.all_nodes_active = false
    jac_prototype
end

# Custom overloads

If you want to get rid of the SCT dependency, you may want to

  • put those overloads in an SCT package extension
  • ask the user to provide an AD backend, and if it is an AutoSparse, retrieve its sparsity_detector instead of providing your own

Collaborator Author

Hmm this probably doesn't fit our application as described in #2137 (comment). Keeping the SCT dependency is fine

# Custom overloads
reduction_factor(x::GradientTracer, ::Real) = x

Note that this is explicitly discouraged in the SCT docs: see the following page to add overloads properly

https://adrianhill.de/SparseConnectivityTracer.jl/stable/internals/adding_overloads/

Collaborator Author

I know I know 🙃

Collaborator Author

Some of these functions have more than 2 arguments, but for all of them we only care about the derivative with respect to one input. It's not clear to me whether/how that fits within the overload functionality



It makes sense to use our mechanisms. They will generate a couple of superfluous methods, but I don't see the harm in that.


To be fair, this specific line of code looks harmless. But once you have more than a handful of overloads, you might want to create an SCT extension and follow our docs. The generated code will be more future-proof and compatible with local and global Jacobian and Hessian tracers. Your current code only supports global Jacobians.


# Activate all nodes to catch all possible state dependencies
p.all_nodes_active = true
prep = prepare_jacobian((du, u) -> water_balance!(du, u, p, t), du, backend, u)

Suggested change
prep = prepare_jacobian((du, u) -> water_balance!(du, u, p, t), du, backend, u)
prep = prepare_jacobian(water_balance!, du, backend, u, Constant(p), Constant(t))

See https://juliadiff.org/DifferentiationInterface.jl/DifferentiationInterface/stable/explanation/advanced/#Contexts

Collaborator Author

From the docs:

Another option would be creating a closure, but that is sometimes undesirable.

When is a closure undesirable?

gradient(f, backend, x, Constant(c))
gradient(f, backend, x, Cache(c))

In the first call, c is kept unchanged throughout the function evaluation. In the second call, c can be mutated with values computed during the function.

Our p contains caches for in-place computations in our RHS (hence the discussion on PreallocationTools etc. in the related issue). Does that mean that we should use Cache(p)?

Collaborator Author

Should SciMLStructures.jl come in here for more granular control?

@gdalle Mar 10, 2025

When is a closure undesirable?

With Enzyme in particular it can make things slower or even error. With other backends it doesn't make much of a difference, but explicitly laying out the Contexts also allows taking element types into account (e.g. for handling translation to Dual with Caches).

Our p contains caches for in-place computations in our RHS (hence the discussion on PreallocationTools etc. in the related issue). Does that mean that we should use Cache(p)?

Does p contain anything whose value you care about?
In general, you might want to split it between a Constant part and a Cache part.

Should SciMLStructures.jl come in here for more granular control?

DI has no specific support for SciMLStructures

Collaborator Author (Mar 11, 2025)

@gdalle I'm working on a refactor where e.g. the prep definition now looks like this:

prep = prepare_jacobian(
    water_balance!,
    du,
    backend,
    u,
    Constant(p_non_diff),
    Cache(p_diff),
    Constant(p_mutable),
    Constant(t),
)

This now fails during sparsity detection because it attempts to write GradientTracer values into a Vector{Float64} field of p_diff::ParametersDiff. I made ParametersDiff parametric so that ParametersDiff{GradientTracer{...}} can exist, and I half expected this to be constructed internally. This probably worked before because of PreallocationTools.

Collaborator Author

Oh I just saw this warning in the docs:

Most backends require any Cache context to be an AbstractArray.

Let's see what I can do with that.

Collaborator Author (Mar 11, 2025)

It doesn't look like that quickly solves the problem. I just naively subtyped ParametersDiff{T} <: AbstractVector{T}. Maybe I need to overload some methods?



Thanks, that's a tricky one and it's indeed on me. If you don't want to wait for a DI fix (ETA ~ days), a short-term solution would be to use PreallocationTools and a closure, even if it makes Enzyme angry.

Comment on lines 51 to 53
jac =
    (J, u, p, t) ->
        jacobian!((du, u) -> water_balance!(du, u, p, t), du, J, prep, backend, u)

Suggested change
jac =
    (J, u, p, t) ->
        jacobian!((du, u) -> water_balance!(du, u, p, t), du, J, prep, backend, u)
jac(J, u, p, t) = jacobian!(water_balance!, du, J, prep, backend, u, Constant(p), Constant(t))

@SouthEndMusic SouthEndMusic marked this pull request as draft March 10, 2025 15:58
@gdalle

gdalle commented Mar 13, 2025

Pass most tests

Love the confident commit naming. That's the spirit we wanna see

Comment on lines 50 to 51
diff_cache_SCT =
    zeros(GradientTracer{IndexSetGradientPattern{Int64, BitSet}}, length(diff_cache))

With this PR you can use SCT.jacobian_buffer instead, and with the update to DI I'll make once that is merged, you probably won't need any tweak at all

Collaborator Author

Thanks!

@gdalle

gdalle commented Mar 13, 2025

@SouthEndMusic can you take the branch from JuliaDiff/DifferentiationInterface.jl#739 for a spin, see if it works?

Warning

I am aware that using DI.Cache still gives rise to allocations during each Jacobian computation. You're the first person actually using it, so I plan to fix it, but first I want to know if it runs

@SouthEndMusic
Collaborator Author

@gdalle I took the main branch, and it indeed works 👍

@gdalle

gdalle commented Mar 14, 2025

@SouthEndMusic with the branch from JuliaDiff/DifferentiationInterface.jl#741 the Caches should be allocation-free after preparation.

@gdalle

gdalle commented Mar 15, 2025

The changes have been released

@SouthEndMusic SouthEndMusic requested a review from visr March 17, 2025 14:43
@SouthEndMusic SouthEndMusic marked this pull request as ready for review March 17, 2025 14:44
@gdalle left a comment

Just a few remarks in passing.
Do you still need PreallocationTools in the end?

@@ -290,6 +291,9 @@ function function_accepts_kwarg(f, kwarg)::Bool
    return false
end

get_ad_type(solver::Solver) =
    solver.autodiff ? AutoForwardDiff(; tag = :Ribasim) : AutoFiniteDiff()

In case you want to perform higher-order autodiff, you may want to use ForwardDiff.Tag to create your package tag, see e.g. https://www.stochasticlifestyle.com/improved-forwarddiff-jl-stacktraces-with-package-tags/


# Activate all nodes to catch all possible state dependencies
p_mutable.all_nodes_active = true
jac_prep = prepare_jacobian(

If you want to be extra-safe, this huge-ass PR contains additional checking mechanisms, to make sure that your preparation is consistent with your execution. All you have to do to activate them is go prepare_jacobian(blablabla; strict=Val(true)). It will be the default in the next breaking release of DI.

@SouthEndMusic
Collaborator Author

Since testing different AD backends is now 1-2 LOC, I tested Enzyme, but running the basic test model seems to hang indefinitely.

@gdalle

gdalle commented Mar 17, 2025

The compilation times can be a bit painful on such a complex function

@SouthEndMusic
Collaborator Author

Do you still need PreallocationTools in the end?

Nope, we no longer need this as a direct dependency

The compilation times can be a bit painful on such a complex function

Is there work to improve this, so that the compilation time for Enzyme is comparable to that of Julia?

@gdalle

gdalle commented Mar 17, 2025

Is there work to improve this, so that the compilation time for Enzyme is comparable to that of Julia?

Idk, you'd have to ask Billy

@gdalle

gdalle commented Mar 17, 2025

Do you have any clue if you got performance gains from the switch to DI?

@gdalle

gdalle commented Mar 17, 2025

Also, do you have ready examples of the type of Jacobians you're handling? DI can also perform mixed-mode sparse AD, which wasn't possible before. That can come in handy for settings with dense columns and dense rows.

@SouthEndMusic
Collaborator Author

Do you have any clue if you got performance gains from the switch to DI?

There might be some, because I believe the cache retrieval with PreallocationTools.jl wasn't optimal, so we minimized those retrievals. Now there are a bunch of view creations; hopefully those aren't slow. Regardless, the ODE portion of our code is optimized pretty well; often a significant portion of time is spent in matrix factorization. We'll have a look at the benchmarks.

@SouthEndMusic
Collaborator Author

Also, do you have ready examples of the type of Jacobians you're handling? DI can also perform mixed-mode sparse AD, which wasn't possible before. That can come in handy for settings with dense columns and dense rows.

Our models are graphs of river systems, with roughly one state per edge. So we know our Jacobians are sparse, but other than that I don't think we can say much; there generally aren't dense columns or rows.

@gdalle

gdalle commented Mar 17, 2025

Now there are a bunch of view creations

Where do those come from? I was wondering whether the limitation of DI.Cache to AbstractArray would force people to stuff everything they care about into one array.
It would be pretty straightforward for me to support tuples of arrays too (although you get essentially the same behavior by using several Caches).

@SouthEndMusic
Collaborator Author

SouthEndMusic commented Mar 17, 2025

I was wondering whether the limitation of DI.Cache to AbstractArray would force people to stuff everything they care about into one array.

Yep, that's exactly what I did. I tried overloading the right methods to make a struct work, but I gave up on that after a while. Can you take inspiration from Enzyme's make_zero?

We also did this for our state vector earlier. That used to be a ComponentVector, but those encode the ranges in the type, which causes latency in our (non-statically-compiled) executable.

@gdalle

gdalle commented Mar 17, 2025

Yep, that's exactly what I did. I tried overloading the right methods to make a struct work, but I gave up on that after a while. Can you take inspiration from Enzyme's make_zero?

No, replicating Enzyme's behavior would be a huge pain, but what I can do is support tuples of arrays. How many arrays are we talking about here? Because you're not limited in the number of Cache arguments passed to DI.jacobian either.

@SouthEndMusic
Collaborator Author

With the current possibilities I could indeed already pass several cache arguments. It would be nice to be able to combine them into a single object, though, so that function signatures do not have to be updated whenever a new cache is added.

I'm constrained to the f!(du, u, p, t) signature by OrdinaryDiffEq.jl, so I had to make an intermediate method of the RHS with p distributed into Cache and Constant arguments to be parsed by DifferentiationInterface.jl (and another one with the arguments in the right order for derivatives w.r.t. time, for solver algorithms that require those). That change in the structure of p is the main reason this PR is so big.

This splitting of p into Cache and Constant sections is, I believe, one of the main use cases of SciMLStructures.jl, which is why I mentioned it earlier.

Hopefully in the not so distant future most of these things will be handled by the integration of DifferentiationInterface.jl into OrdinaryDiffEq.jl.

@gdalle

gdalle commented Mar 17, 2025

Yeah, we're running into exactly the same problem inside the OrdinaryDiffEq PR: p mixing constants and caches is not great for DI. I'll probably create a third context type that is just passed along, but it will require PreallocationTools to work properly. Alternatively, tuples or singly-nested structs of arrays should be easy to support as caches in DI, which would solve your immediate problem.

@SouthEndMusic
Collaborator Author

Nice

I'll probably create a third context type that is just passed along

How is that different from Constant?

@gdalle

gdalle commented Mar 17, 2025

Constant is explicitly not differentiated, which would cause issues if it contains cache memory. We don't see these issues with ForwardDiff or FiniteDiff, but with Enzyme, the Const annotation that semantically corresponds to DI.Constant will stop the propagation of derivatives through p.

Successfully merging this pull request may close these issues:

Introduce DifferentiationInterface.jl for Jacobian computation