
Introduce differentiation interface #2137

Open · wants to merge 22 commits into base: main

Conversation

SouthEndMusic
Collaborator

Fixes #2134

@SouthEndMusic
Collaborator Author

@gdalle it would be nice if you could have a look at 64091bc. This already works very nicely! We might be able to get rid of our direct SparseConnectivityTracer dependency, were it not that we (I) did some naughty overloads using GradientTracer. What would the next step towards Enzyme support look like?

@SouthEndMusic
Collaborator Author

When testing dense Jacobian + ForwardDiff I get an error like this:

ERROR: LoadError: Invalid Tag object: ...
Observed ForwardDiff.Tag{...}

I wonder whether that has something to do with passing the AD backend to both DifferentiationInterface.jacobian and the algorithm constructor. I also wonder how this works for algorithms that require derivatives w.r.t. time.

@gdalle

gdalle commented Mar 10, 2025

Taking a look now

@@ -28,6 +28,33 @@ struct Model{T}
    end
end

function get_jac_eval(du::Vector, u::Vector, p::Parameters, solver::Solver)
    backend = if solver.autodiff
        AutoForwardDiff()

Maybe you want to configure the tag here if it is available from somewhere else (perhaps the solver?)

Collaborator Author

solver is our own solver config object, but it works if I consistently specify the same tag everywhere 👍

    p.all_nodes_active = false
    jac_prototype
end

# Custom overloads

If you want to get rid of the SCT dependency, you may want to

  • put those overloads in an SCT package extension
  • ask the user to provide an AD backend, and if it is an AutoSparse, retrieve its sparsity_detector instead of providing your own

Collaborator Author

Hmm this probably doesn't fit our application as described in #2137 (comment). Keeping the SCT dependency is fine

# Custom overloads
reduction_factor(x::GradientTracer, ::Real) = x

Note that this is explicitly discouraged in the SCT docs: see the following page to add overloads properly

https://adrianhill.de/SparseConnectivityTracer.jl/stable/internals/adding_overloads/

Collaborator Author

I know I know 🙃

Collaborator Author

Some of these functions have more than 2 arguments, but for all of them we only care about the derivative with respect to one input. It's not clear to me whether/how that fits within the overload functionality



It makes sense to use our mechanisms. They will generate a couple of superfluous methods, but I don't see the harm in that.


To be fair, this specific line of code looks harmless. But once you have more than a handful of overloads, you might want to create an SCT extension and follow our docs. The generated code will be more future-proof and compatible with local and global Jacobian and Hessian tracers. Your current code only supports global Jacobians.


# Activate all nodes to catch all possible state dependencies
p.all_nodes_active = true
prep = prepare_jacobian((du, u) -> water_balance!(du, u, p, t), du, backend, u)

Suggested change
prep = prepare_jacobian((du, u) -> water_balance!(du, u, p, t), du, backend, u)
prep = prepare_jacobian(water_balance!, du, backend, u, Constant(p), Constant(t))

See https://juliadiff.org/DifferentiationInterface.jl/DifferentiationInterface/stable/explanation/advanced/#Contexts

Collaborator Author

From the docs:

Another option would be creating a closure, but that is sometimes undesirable.

When is a closure undesirable?

gradient(f, backend, x, Constant(c))
gradient(f, backend, x, Cache(c))

In the first call, c is kept unchanged throughout the function evaluation. In the second call, c can be mutated with values computed during the function.

Our p contains caches for in-place computations in our RHS (hence the discussion on PreallocationTools etc. in the related issue). Does that mean that we should use Cache(p)?

Collaborator Author

Should SciMLStructures.jl come in here for more granular control?

@gdalle Mar 10, 2025

When is a closure undesirable?

With Enzyme in particular it can make things slower or even error. With other backends it doesn't make much of a difference, but explicitly laying out the Contexts also allows taking element types into account (e.g. for handling translation to Dual with Caches).

Our p contains caches for in-place computations in our RHS (hence the discussion on PreallocationTools etc. in the related issue). Does that mean that we should use Cache(p)?

Does p contain anything whose value you care about?
In general, you might want to split it between a Constant part and a Cache part.

Should SciMLStructures.jl come in here for more granular control?

DI has no specific support for SciMLStructures

Collaborator Author (Mar 11, 2025)

@gdalle I'm working on a refactor where e.g. the prep definition now looks like this:

prep = prepare_jacobian(
    water_balance!,
    du,
    backend,
    u,
    Constant(p_non_diff),
    Cache(p_diff),
    Constant(p_mutable),
    Constant(t),
)

This now fails during sparsity detection because it attempts to write GradientTracer values into a Vector{Float64} field of p_diff::ParametersDiff. I made ParametersDiff parametric so that ParametersDiff{GradientTracer{...}} can exist, and I half expected this to be constructed internally. This probably worked before because of PreallocationTools.

Collaborator Author

Oh I just saw this warning in the docs:

Most backends require any Cache context to be an AbstractArray.

Let's see what I can do with that.

Collaborator Author (Mar 11, 2025)

It doesn't look like that quickly solves the problem. I just naively subtyped ParametersDiff{T} <: AbstractVector{T}. Maybe I need to overload some methods?



Thanks, that's a tricky one and it's indeed on me. If you don't want to wait for a DI fix (ETA ~ days), a short-term solution would be to use PreallocationTools and a closure, even if it makes Enzyme angry.

Comment on lines 51 to 53
jac =
    (J, u, p, t) ->
        jacobian!((du, u) -> water_balance!(du, u, p, t), du, J, prep, backend, u)

Suggested change
jac =
    (J, u, p, t) ->
        jacobian!((du, u) -> water_balance!(du, u, p, t), du, J, prep, backend, u)
jac(J, u, p, t) = jacobian!(water_balance!, du, J, prep, backend, u, Constant(p), Constant(t))

@SouthEndMusic SouthEndMusic marked this pull request as draft March 10, 2025 15:58
@gdalle

gdalle commented Mar 13, 2025

Pass most tests

Love the confident commit naming. That's the spirit we wanna see

Comment on lines 50 to 51
diff_cache_SCT =
    zeros(GradientTracer{IndexSetGradientPattern{Int64, BitSet}}, length(diff_cache))

With this PR you can use SCT.jacobian_buffer instead, and with the update to DI I'll make once that is merged, you probably won't need any tweak at all

Collaborator Author

Thanks!

@gdalle

gdalle commented Mar 13, 2025

@SouthEndMusic can you take the branch from JuliaDiff/DifferentiationInterface.jl#739 for a spin, see if it works?

Warning

I am aware that using DI.Cache still gives rise to allocations during each Jacobian computation. You're the first person actually using it, so I plan to fix it, but first I want to know if it runs

@SouthEndMusic
Collaborator Author

@gdalle I took the main branch, and it indeed works 👍

@gdalle

gdalle commented Mar 14, 2025

@SouthEndMusic with the branch from JuliaDiff/DifferentiationInterface.jl#741 the Caches should be allocation-free after preparation.

@gdalle

gdalle commented Mar 15, 2025

The changes have been released

@SouthEndMusic SouthEndMusic requested a review from visr March 17, 2025 14:43
@SouthEndMusic SouthEndMusic marked this pull request as ready for review March 17, 2025 14:44
@gdalle left a comment

Just a few remarks in passing.
Do you still need PreallocationTools in the end?

@@ -290,6 +291,9 @@ function function_accepts_kwarg(f, kwarg)::Bool
    return false
end

get_ad_type(solver::Solver) =
    solver.autodiff ? AutoForwardDiff(; tag = :Ribasim) : AutoFiniteDiff()

In case you want to perform higher-order autodiff, you may want to use ForwardDiff.Tag to create your package tag, see e.g. https://www.stochasticlifestyle.com/improved-forwarddiff-jl-stacktraces-with-package-tags/


# Activate all nodes to catch all possible state dependencies
p_mutable.all_nodes_active = true
jac_prep = prepare_jacobian(

If you want to be extra-safe, this huge-ass PR contains additional checking mechanisms, to make sure that your preparation is consistent with your execution. All you have to do to activate them is go prepare_jacobian(blablabla; strict=Val(true)). It will be the default in the next breaking release of DI.

@SouthEndMusic
Collaborator Author

Since testing different AD backends is now 1-2 LOC, I tested Enzyme, but running the basic test model seems to hang indefinitely.

@gdalle

gdalle commented Mar 17, 2025

The compilation times can be a bit painful on such a complex function

@SouthEndMusic
Collaborator Author

Do you still need PreallocationTools in the end?

Nope, we no longer need this as a direct dependency

The compilation times can be a bit painful on such a complex function

Is there work to improve this, so that the compilation time for Enzyme is comparable to that of Julia?

@gdalle

gdalle commented Mar 17, 2025

Is there work to improve this, so that the compilation time for Enzyme is comparable to that of Julia?

Idk, you'd have to ask Billy

@gdalle

gdalle commented Mar 17, 2025

Do you have any clue if you got performance gains from the switch to DI?

@gdalle

gdalle commented Mar 17, 2025

Also, do you have ready examples of the type of Jacobians you're handling? DI can also perform mixed-mode sparse AD, which wasn't possible before. That can come in handy for settings with dense columns and dense rows.

@SouthEndMusic
Collaborator Author

Do you have any clue if you got performance gains from the switch to DI?

There might be some, because I believe the cache retrieval with PreallocationTools.jl wasn't optimal, so we minimized those retrievals. Now there are a bunch of view creations; hopefully those aren't slow. Regardless, the ODE portion of our code is optimized pretty well; often a significant portion of time is spent in matrix factorization. We'll have a look at the benchmarks.

@SouthEndMusic
Collaborator Author

Also, do you have ready examples of the type of Jacobians you're handling? DI can also perform mixed-mode sparse AD, which wasn't possible before. That can come in handy for settings with dense columns and dense rows.

Our models are graphs of river systems, with roughly one state per edge. So we know our Jacobians are sparse, but other than that I don't think we can say much; there generally aren't dense columns or rows.

@gdalle

gdalle commented Mar 17, 2025

Now there are a bunch of view creations

Where do those come from? I was wondering whether the limitation of DI.Cache to AbstractArray would force people to stuff everything they care about into one array.
It would be pretty straightforward for me to support tuples of arrays too (although you get essentially the same behavior by using several Caches).

@SouthEndMusic
Collaborator Author

SouthEndMusic commented Mar 17, 2025

I was wondering whether the limitation of DI.Cache to AbstractArray would force people to stuff everything they care about into one array.

Yep, that's exactly what I did. I tried overloading the right methods to make a struct work, but I gave up on that after a while. Can you take inspiration from Enzyme's make_zero?

We also did this for our state vector earlier. That used to be a ComponentVector, but those encode the ranges in the type, which causes latency in our (non-statically-compiled) executable.

@gdalle

gdalle commented Mar 17, 2025

Yep, that's exactly what I did. I tried overloading the right methods to make a struct work, but I gave up on that after a while. Can you take inspiration from Enzyme's make_zero?

No, replicating Enzyme's behavior would be a huge pain, but what I can do is support tuples of arrays. How many arrays are we talking about here? Because you're not limited in the number of Cache arguments passed to DI.jacobian either.

@SouthEndMusic
Collaborator Author

With the current possibilities I could indeed already pass several cache arguments. It would be nice to be able to combine them into a single object, though, so that function signatures do not have to be updated whenever a new cache is added.

I'm constrained to the f!(du, u, p, t) signature by OrdinaryDiffEq.jl, so I had to make an intermediate method of the RHS with p distributed into Cache and Constant arguments to be parsed by DifferentiationInterface.jl (and another one with the arguments in the right order for derivatives w.r.t. time, for solver algorithms that require those). That change in the structure of p is the main reason this PR is so big.

This splitting of p into Cache and Constant sections is, I believe, one of the main use cases of SciMLStructures.jl, which is why I mentioned it earlier.

Hopefully in the not so distant future most of these things will be handled by the integration of DifferentiationInterface.jl into OrdinaryDiffEq.jl.

@gdalle

gdalle commented Mar 17, 2025

Yeah, we're running into exactly the same problem inside the OrdinaryDiffEq PR: p mixing constants and caches is not great for DI. I'll probably create a third context type that is just passed along, but it will require PreallocationTools to work properly. Alternatively, tuples or singly-nested structs of arrays should be easy to support as caches in DI, which would solve your immediate problem.

@SouthEndMusic
Collaborator Author

Nice

I'll probably create a third context type that is just passed along

How is that different from Constant?

@gdalle

gdalle commented Mar 17, 2025

Constant is explicitly not differentiated, which would cause issues if it contains cache memory. We don't see these issues with ForwardDiff or FiniteDiff, but with Enzyme, the Const annotation that semantically corresponds to DI.Constant will stop the propagation of derivatives through p.

Successfully merging this pull request may close these issues:

Introduce DifferentiationInterface.jl for Jacobian computation