
Commit e7a14c5

Merge pull request #24 from pzimbrod/23-Implement-DeepONet
🆕 initial DeepONet implementation
2 parents ac0fd84 + 7f6de3f commit e7a14c5


8 files changed, +289 -10 lines changed


README.md

Lines changed: 32 additions & 5 deletions
````diff
@@ -12,9 +12,9 @@
 
 A Package that provides Layers for the learning of (nonlinear) operators in order to solve parametric PDEs.
 
-For now, this package contains the Fourier Neural Operator originally proposed by Li et al.
+For now, this package contains the Fourier Neural Operator originally proposed by Li et al. [1] as well as the DeepONet conceived by Lu et al. [2].
 
-I decided to implement this method in Julia because coding up a layer using PyTorch in Python is rather cumbersome in comparison and Julia as a whole simply runs at comparable or faster speed than Python. Please do check out the [original work](https://github.com/zongyi-li/fourier_neural_operator) at GitHub as well.
+I decided to implement this method in Julia because coding up a layer using PyTorch in Python is rather cumbersome in comparison and Julia as a whole simply runs at comparable or faster speed than Python.
 
 The implementation of the layers is influenced heavily by the basic layers provided in the [Flux.jl](https://github.com/FluxML/Flux.jl) package.
 
@@ -28,6 +28,8 @@ pkg> add OperatorLearning
 
 ## Usage/Examples
 
+### Fourier Layer
+
 The basic workflow is more or less in line with the layer architectures that `Flux` provides, i.e. you construct individual layers, chain them if desired and pass the inputs as arguments to the layers.
 
 The Fourier Layer performs a linear transform as well as convolution (linear transform in fourier space), adds them and passes it through the activation.
@@ -47,11 +49,34 @@ model = FourierLayer(101, 101, 100, 16, σ)
 model = FourierLayer(101, 101, 100, 16, σ; bias_fourier=false)
 ```
 
-To see a full implementation, check the Burgers equation example at `examples/burgers.jl`.
+To see a full implementation, check the Burgers equation example at `examples/burgers_FNO.jl`.
 Compared to the original implementation by [Li et al.](https://github.com/zongyi-li/fourier_neural_operator/blob/master/fourier_1d.py) using PyTorch, this version written in Julia clocks in about 20 - 25% faster when running on a NVIDIA RTX A5000 GPU.
 
 If you'd like to replicate the example, you need to get the dataset for learning the Burgers equation. You can get it [here](https://drive.google.com/drive/folders/1UnbQh2WWc6knEHbLn-ZaXrKUZhp7pjt-) or alternatively use the provided [scripts](https://github.com/zongyi-li/fourier_neural_operator/tree/master/data_generation/burgers).
 
+### DeepONet
+
+The `DeepONet` constructor sets up two separate Flux `Chain` structs, the branch and trunk net, and combines their outputs into a single array via an einsum/dot product.
+
+You can either set up a "vanilla" DeepONet via the constructor function, which creates `Dense` layers for you, or, if you feel fancy, pass two Chains directly to the function so you can use other architectures such as CNNs or RNNs as well.
+The former takes two tuples that describe each architecture, e.g. `(32,64,72)` sets up a DNN with 32 neurons in the first, 64 in the second and 72 in the last layer.
+
+```julia
+using OperatorLearning
+using Flux
+
+# Create a DeepONet with branch 32 -> 64 -> 72 and sigmoid activation
+# and trunk 24 -> 64 -> 72 and tanh activation without biases
+model = DeepONet((32,64,72), (24,64,72), σ, tanh; init_branch=Flux.glorot_normal, bias_trunk=false)
+
+# Alternatively, set up your own nets altogether and pass them to DeepONet
+branch = Chain(Dense(2,128),Dense(128,64),Dense(64,72))
+trunk = Chain(Dense(1,24),Dense(24,72))
+model = DeepONet(branch,trunk)
+```
+
+For usage, check the Burgers equation example at `examples/burgers_DeepONet.jl`.
+
 ## License
 
 [MIT](https://choosealicense.com/licenses/mit/)
@@ -60,7 +85,7 @@ If you'd like to replicate the example, you need to get the dataset for learning
 
 - [x] 1D Fourier Layer
 - [ ] 2D / 3D Fourier Layer
-- [ ] DeepONet
+- [x] DeepONet
 - [ ] Physics informed Loss
 
 ## Contributing
@@ -69,4 +94,6 @@ Contributions are always welcome! Please submit a PR if you'd like to participat
 
 ## References
 
-- Li et al., 2020 [arXiv:2010.08895](https://arxiv.org/abs/2010.08895)
+[1] Z. Li et al., "Fourier Neural Operator for Parametric Partial Differential Equations", [arXiv:2010.08895](https://arxiv.org/abs/2010.08895) [cs, math], May 2021
+
+[2] L. Lu, P. Jin, and G. E. Karniadakis, "DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators", [arXiv:1910.03193](http://arxiv.org/abs/1910.03193) [cs, stat], Apr. 2020
````
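The README snippet shows construction only; a minimal sketch of the corresponding forward pass (the batch size and number of query points below are arbitrary assumptions, chosen to match the `(32,64,72)`/`(24,64,72)` constructor call):

```julia
using OperatorLearning, Flux

model = DeepONet((32,64,72), (24,64,72), σ, tanh)

x = rand(Float32, 32, 16)   # 16 input-function samples, each evaluated at 32 sensors
y = rand(Float32, 24, 8)    # 8 query points, each described by 24 coordinates

# branch(x)' * trunk(y) contracts the shared 72-dimensional latent space,
# giving one value per (sample, query point) pair
u = model(x, y)
size(u)  # (16, 8)
```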

examples/burgers_DeepONet.jl

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
```julia
using Flux: length, reshape, train!, throttle, @epochs
using OperatorLearning, Flux, MAT

device = cpu;

#=
We would like to implement and train a DeepONet that infers the solution
u(x) of the Burgers equation on a grid of 1024 points at time one, based
on the initial condition a(x) = u(x,0)
=#

# Read the data from the MAT file and store it in a dict
# key "a" is the IC
# key "u" is the desired solution at time 1
vars = matread("burgers_data_R10.mat") |> device

# For trial purposes, we might want to train with different resolutions
# So we sample only every n-th element
subsample = 2^3;

# create the x training array, according to our desired grid size
xtrain = vars["a"][1:1000, 1:subsample:end]' |> device;
# create the x test array
xtest = vars["a"][end-99:end, 1:subsample:end]' |> device;

# Create the y training array
ytrain = vars["u"][1:1000, 1:subsample:end] |> device;
# Create the y test array
ytest = vars["u"][end-99:end, 1:subsample:end] |> device;

# The data is missing grid data, so we create it
# `collect` converts data type `range` into an array
grid = collect(range(0, 1, length=1024))' |> device

# Pass the data to the Flux DataLoader and give it a batch of 20
#train_loader = Flux.Data.DataLoader((xtrain, ytrain), batchsize=20, shuffle=true) |> device
#test_loader = Flux.Data.DataLoader((xtest, ytest), batchsize=20, shuffle=false) |> device

# Create the DeepONet:
# IC is given on a grid of 1024 points, and we solve for a fixed time t in one
# spatial dimension x, making the branch input of size 1024 and the trunk input of size 1
# We choose GeLU activation for both subnets
model = DeepONet((1024,1024,1024),(1,1024,1024),gelu,gelu) |> device

# We use the ADAM optimizer for training
learning_rate = 0.001
opt = ADAM(learning_rate)

# Specify the model parameters
parameters = params(model)

# The loss function
# We can't use the "vanilla" implementation of the mse here since we have
# two distinct inputs to our DeepONet, so the loss takes the sensor locations
# (the grid) as an additional argument
loss(xtrain,ytrain,sensor) = Flux.Losses.mse(model(xtrain,sensor),ytrain)

# Define a callback function that gives some output during training
evalcb() = @show(loss(xtest,ytest,grid))
# Print the callback only every 5 seconds
throttled_cb = throttle(evalcb, 5)

# Do the training loop
Flux.@epochs 500 train!(loss, parameters, [(xtrain,ytrain,grid)], opt, cb = throttled_cb)
```
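After the loop finishes, a quick way to gauge generalization is to evaluate the loss on the held-out samples; a minimal sketch using the arrays defined above (the choice of metric is an assumption, the example file itself stops after training):

```julia
# model(xtest, grid) returns a 100 x 1024 array, matching ytest,
# so the test error is simply the same mse evaluated on the held-out ICs
test_mse = loss(xtest, ytest, grid)
@show test_mse
```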

examples/burgers.jl renamed to examples/burgers_FNO.jl

Lines changed: 6 additions & 4 deletions
```diff
@@ -1,4 +1,4 @@
-using Flux: length, reshape, train!, @epochs
+using Flux: length, reshape, train!, throttle, @epochs
 using OperatorLearning, Flux, MAT
 
 device = gpu;
@@ -74,10 +74,12 @@ parameters = params(model)
 loss(x,y) = Flux.Losses.mse(model(x),y)
 
 # Define a callback function that gives some output during training
-evalcb() = @show(loss(x,y))
+evalcb() = @show(loss(xtest,ytest))
+# Print the callback only every 5 seconds
+throttled_cb = throttle(evalcb, 5)
 
 # Do the training loop
-Flux.@epochs 500 train!(loss, parameters, train_loader, opt, cb = evalcb)
+Flux.@epochs 500 train!(loss, parameters, train_loader, opt, cb = throttled_cb)
 
 # Accuracy metrics
 val_loader = Flux.Data.DataLoader((xtest, ytest), batchsize=1, shuffle=false) |> device
@@ -86,4 +88,4 @@ loss = 0.0 |> device
 for (x,y) in val_loader
     ŷ = model(x)
     loss += Flux.Losses.mse(ŷ,y)
-end
+end
```

src/DeepONet.jl

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
````julia
"""
`DeepONet(architecture_branch::Tuple, architecture_trunk::Tuple,
            act_branch = identity, act_trunk = identity;
            init_branch = Flux.glorot_uniform,
            init_trunk = Flux.glorot_uniform,
            bias_branch=true, bias_trunk=true)`
`DeepONet(branch_net::Flux.Chain, trunk_net::Flux.Chain)`

Create an (unstacked) DeepONet architecture as proposed by Lu et al.
arXiv:1910.03193

The model works as follows:

x --- branch --
               |
                -⊠--u-
               |
y --- trunk ---

Where `x` represents the input function, discretely evaluated at its respective sensors. So the input is of shape [m] for one instance or [m x b] for a training set.
`y` are the probing locations for the operator to be trained. It has shape [N x n] for N different variables in the PDE (i.e. spatial and temporal coordinates) with each n distinct evaluation points.
`u` is the solution of the queried instance of the PDE, given by the specific choice of parameters.

The outputs of the branch and trunk net are combined via the dot product Σᵢ bᵢⱼ tᵢₖ.

You can set up this architecture in two ways:

1. By specifying the architecture and all its parameters as given above. This always creates `Dense` layers for the branch and trunk net and corresponds to the DeepONet proposed by Lu et al.

2. By passing two architectures in the form of two Chain structs directly. Do this if you want more flexibility and e.g. use an RNN or CNN instead of simple `Dense` layers.

Strictly speaking, DeepONet does not imply either of the branch or trunk net to be a simple DNN. Usually though, this is the case, which is why it's treated as the default case here.

# Example

Consider a transient 1D advection problem ∂ₜu + u ⋅ ∇u = 0, with an IC u(x,0) = g(x).
We are given several (b = 200) instances of the IC, discretized at 50 points each, and want to query the solution for 100 different locations and times in [0;1].

That makes the branch input of shape [50 x 200] and the trunk input of shape [2 x 100]. So the input size for the branch net is 50 and for the trunk net it is 2.

# Usage

```julia
julia> model = DeepONet((32,64,72), (24,64,72))
DeepONet with
branch net: (Chain(Dense(32, 64), Dense(64, 72)))
Trunk net: (Chain(Dense(24, 64), Dense(64, 72)))

julia> model = DeepONet((32,64,72), (24,64,72), σ, tanh; init_branch=Flux.glorot_normal, bias_trunk=false)
DeepONet with
branch net: (Chain(Dense(32, 64, σ), Dense(64, 72, σ)))
Trunk net: (Chain(Dense(24, 64, tanh; bias=false), Dense(64, 72, tanh; bias=false)))

julia> branch = Chain(Dense(2,128),Dense(128,64),Dense(64,72))
Chain(
  Dense(2, 128),        # 384 parameters
  Dense(128, 64),       # 8_256 parameters
  Dense(64, 72),        # 4_680 parameters
)                       # Total: 6 arrays, 13_320 parameters, 52.406 KiB.

julia> trunk = Chain(Dense(1,24),Dense(24,72))
Chain(
  Dense(1, 24),         # 48 parameters
  Dense(24, 72),        # 1_800 parameters
)                       # Total: 4 arrays, 1_848 parameters, 7.469 KiB.

julia> model = DeepONet(branch,trunk)
DeepONet with
branch net: (Chain(Dense(2, 128), Dense(128, 64), Dense(64, 72)))
Trunk net: (Chain(Dense(1, 24), Dense(24, 72)))
```
"""
struct DeepONet
    branch_net::Flux.Chain
    trunk_net::Flux.Chain
end

# Declare the function that assigns Weights and biases to the layer
function DeepONet(architecture_branch::Tuple, architecture_trunk::Tuple,
                    act_branch = identity, act_trunk = identity;
                    init_branch = Flux.glorot_uniform,
                    init_trunk = Flux.glorot_uniform,
                    bias_branch=true, bias_trunk=true)

    @assert architecture_branch[end] == architecture_trunk[end] "Branch and Trunk net must share the same amount of nodes in the last layer. Otherwise Σᵢ bᵢⱼ tᵢₖ won't work."

    # To construct the subnets we use the helper function in subnets.jl
    # Initialize the branch net
    branch_net = construct_subnet(architecture_branch, act_branch;
                                    init=init_branch, bias=bias_branch)
    # Initialize the trunk net
    trunk_net = construct_subnet(architecture_trunk, act_trunk;
                                    init=init_trunk, bias=bias_trunk)

    return DeepONet(branch_net, trunk_net)
end

Flux.@functor DeepONet

#= The actual layer that does stuff
x is the input function, evaluated at m locations (or m x b in case of batches)
y is the array of sensors, i.e. the variables of the output function
with shape (N x n) - N different variables with each n evaluation points =#
function (a::DeepONet)(x::AbstractArray, y::AbstractVecOrMat)
    # Assign the parameters
    branch, trunk = a.branch_net, a.trunk_net

    #= Dot product needs a dim to contract
    However, we perform the transformations by the NNs always in the first dim
    so we need to adjust (i.e. transpose) one of the inputs,
    which we do on the branch input here =#
    return branch(x)' * trunk(y)
end

# Sensors stay the same and shouldn't be batched
(a::DeepONet)(x::AbstractArray, y::AbstractArray) =
    throw(ArgumentError("Sensor locations fed to trunk net can't be batched."))

# Print nicely
function Base.show(io::IO, l::DeepONet)
    print(io, "DeepONet with\nbranch net: (", l.branch_net)
    print(io, ")\n")
    print(io, "Trunk net: (", l.trunk_net)
    print(io, ")\n")
end
````

src/OperatorLearning.jl

Lines changed: 3 additions & 1 deletion
```diff
@@ -10,10 +10,12 @@ using Random: AbstractRNG
 using Flux: nfan, glorot_uniform, batch
 using OMEinsum
 
-export FourierLayer
+export FourierLayer, DeepONet
 
 include("FourierLayer.jl")
+include("DeepONet.jl")
 include("ComplexWeights.jl")
 include("batched.jl")
+include("subnets.jl")
 
 end # module
```

src/subnets.jl

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
````julia
"""
Construct a Chain of `Dense` layers from a given tuple of integers.

Input:
A tuple (m,n,o,p) of integer type numbers that each describe the width of the i-th `Dense` layer to construct

Output:
A `Flux` Chain with one `Dense` layer per pair of adjacent tuple elements (i.e. one layer fewer than the tuple length), with the individual widths given by the tuple elements

# Example

```julia
julia> model = OperatorLearning.construct_subnet((2,128,64,32,1))
Chain(
  Dense(2, 128),        # 384 parameters
  Dense(128, 64),       # 8_256 parameters
  Dense(64, 32),        # 2_080 parameters
  Dense(32, 1),         # 33 parameters
)                       # Total: 8 arrays, 10_753 parameters, 42.504 KiB.

julia> model([2,1])
1-element Vector{Float32}:
 -0.7630446
```
"""
function construct_subnet(architecture::Tuple, σ = identity;
                            init=Flux.glorot_uniform, bias=true)
    # First, create an array that contains all Dense layers independently
    # Given an n-element architecture, this constructs n-1 layers
    layers = Array{Flux.Dense}(undef, length(architecture)-1)
    @inbounds for i ∈ 2:length(architecture)
        layers[i-1] = Flux.Dense(architecture[i-1], architecture[i], σ;
                                    init=init, bias=bias)
    end

    # Concatenate the layers to a string, chain them and parse them into
    # the Flux Chain constructor syntax
    return Meta.parse("Chain("*join(layers,",")*")") |> eval
end
````
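Since `Chain` accepts layers as a vararg, splatting the layer array would avoid the string/`Meta.parse`/`eval` round-trip, which rebuilds the layers from their printed form and can silently drop options such as a non-default `init`. A possible simplification, sketched here under a hypothetical name rather than as the committed implementation:

```julia
using Flux

function construct_subnet_splat(architecture::Tuple, σ = identity;
                                init=Flux.glorot_uniform, bias=true)
    # Build one Dense layer per pair of adjacent widths, then splat into Chain
    layers = [Flux.Dense(architecture[i-1], architecture[i], σ; init=init, bias=bias)
              for i ∈ 2:length(architecture)]
    return Flux.Chain(layers...)
end
```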

test/deeponet.jl

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
```julia
using Test, Random, Flux

@testset "DeepONet" begin
    @testset "dimensions" begin
        # Test the proper construction
        # Branch net
        @test size(DeepONet((32,64,72), (24,48,72), σ, tanh).branch_net.layers[end].weight) == (72,64)
        @test size(DeepONet((32,64,72), (24,48,72), σ, tanh).branch_net.layers[end].bias) == (72,)
        # Trunk net
        @test size(DeepONet((32,64,72), (24,48,72), σ, tanh).trunk_net.layers[end].weight) == (72,48)
        @test size(DeepONet((32,64,72), (24,48,72), σ, tanh).trunk_net.layers[end].bias) == (72,)
    end

    # Accept only Int as architecture parameters
    @test_throws MethodError DeepONet((32.5,64,72), (24,48,72), σ, tanh)
    @test_throws MethodError DeepONet((32,64,72), (24.1,48,72))
end
```
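The tests above only cover construction. A sketch of an additional forward-pass check that could sit alongside the dimension tests; the dummy input sizes are assumptions matching the constructor calls already used:

```julia
@testset "forward pass" begin
    model = DeepONet((32,64,72), (24,48,72), σ, tanh)
    x = rand(Float32, 32, 10)   # 10 function samples at 32 sensors
    y = rand(Float32, 24, 5)    # 5 query points with 24 coordinates each
    # One output value per (sample, query point) pair
    @test size(model(x, y)) == (10, 5)
end
```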

test/runtests.jl

Lines changed: 4 additions & 0 deletions
```diff
@@ -8,6 +8,10 @@ Random.seed!(0)
     include("fourierlayer.jl")
 end
 
+@testset "DeepONet" begin
+    include("deeponet.jl")
+end
+
 @testset "Weights" begin
     include("complexweights.jl")
 end
```
