
Commit 256bb24

📖 update docs
1 parent e7a14c5 commit 256bb24

6 files changed, +258 -10 lines changed


docs/make.jl

Lines changed: 4 additions & 1 deletion
@@ -15,8 +15,11 @@ makedocs(;
     ),
     pages=[
         "Home" => "index.md",
-        "Module Reference" => "reference.md",
+        "Examples" =>
+            ["Burgers Equation with FNO" => "examples/burgers_FNO.md",
+             "Burgers Equation with DeepONet" => "examples/burgers_DeepONet.md"],
         "Frequently Asked Questions" => "faq.md",
+        "Module Reference" => "reference.md",
     ],
 )

docs/src/examples/burgers_DeepONet.md

Lines changed: 98 additions & 0 deletions
# Solving the Burgers Equation with DeepONet

This example mostly adapts the original work by [Li et al.](https://github.com/zongyi-li/fourier_neural_operator/blob/master/fourier_1d.py) to the DeepONet architecture and is intended to provide an analogue to [the FNO example](burgers_FNO.md).

We try to create an operator for the Burgers equation

$$ \partial_t u(x,t) + \partial_x (u^2(x,t)/2) = \nu \partial_{xx} u(x,t) $$

in one dimension on a unit spatial and temporal domain. The operator maps the initial condition $u(x,0) = u_0(x)$ to the flow field at the final time $u(x,1)$.

So overall, we need an approximation function that does the following:

```julia
function foo(u0, x)
    # Do something
    return u1
end
```

We sample from a dataset that contains several instances of the initial condition (`a`) and the final velocity field (`u`).
The data is given on a grid of 8192 points; however, we only want to sample 1024 of them.

```julia
using Flux: length, reshape, train!, throttle, @epochs
using OperatorLearning, Flux, MAT

device = cpu;

#=
We would like to implement and train a DeepONet that infers the solution
u(x) of the Burgers equation on a grid of 1024 points at time one based
on the initial condition a(x) = u(x,0)
=#

# Read the data from the MAT file and store it in a dict
# key "a" is the IC
# key "u" is the desired solution at time 1
vars = matread("burgers_data_R10.mat") |> device

# For trial purposes, we might want to train with different resolutions
# So we sample only every n-th element
subsample = 2^3;

# Create the x training array, according to our desired grid size
xtrain = vars["a"][1:1000, 1:subsample:end]' |> device;
# Create the x test array
xtest = vars["a"][end-99:end, 1:subsample:end]' |> device;

# Create the y training array
ytrain = vars["u"][1:1000, 1:subsample:end] |> device;
# Create the y test array
ytest = vars["u"][end-99:end, 1:subsample:end] |> device;
```
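
If you are unsure what the MAT file actually contains, a quick optional inspection of its keys and array sizes can help before subsampling. This is plain Julia and not specific to OperatorLearning; the key names `a` and `u` are the ones described above.

```julia
# Optional: list every key in the MAT file together with a summary of its value
# (for this dataset, "a" and "u" should show up with 8192 columns each)
for (key, value) in vars
    println(key, " => ", summary(value))
end
```
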
One particular thing to note here is that we need to permute the array containing the initial condition so that the inner product of DeepONet works. This is because we need to perform the following contraction:

$$ \sum\limits_i t_{ji} b_{ik} = u_{jk} $$
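
Written out with some hypothetical sizes, this contraction is just a matrix product. The sketch below only illustrates the index convention; it is not the library's internal code, and the sizes are made up.

```julia
# t: trunk output,  one row per evaluation point j, one column per latent index i
# b: branch output, one row per latent index i, one column per sample k
t = rand(16, 8)    # 16 evaluation points, latent dimension 8
b = rand(8, 4)     # latent dimension 8, 4 samples
u = t * b          # u[j,k] = Σᵢ t[j,i] * b[i,k], size (16, 4)
```
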
For now, we only have one input and one output array. In addition, we need another input array that provides the probing locations for the operator $u_1(x) = \mathcal{G}(u_0)(x)$. In theory, we could choose those arbitrarily. For the sake of simplicity, though, we simply recreate the equispaced grid that the original data was sampled on, i.e. a 1-D grid of 1024 equispaced points in [0;1]. Again, we need to transpose the array so that the dimension handled by the trunk network comes first - otherwise the inner product would be much more cumbersome to handle.

```julia
# The dataset does not include the grid, so we create it ourselves
# `collect` converts the `range` object into an array
grid = collect(range(0, 1, length=1024))' |> device
```
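
Since the probing locations are in principle arbitrary, nothing stops you from evaluating the operator at, say, randomly drawn points instead. The following is a hypothetical alternative that is not used in the remainder of this example; the name `grid_random` is purely illustrative.

```julia
# Hypothetical alternative: 1024 sorted, randomly drawn sensor locations in [0, 1]
# (kept as a row vector for the same reason as above)
grid_random = sort(rand(1024))' |> device
```
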
We can now set up the DeepONet. We choose the latent space to have dimensionality 1024 and use the vanilla DeepONet architecture, i.e. `Dense` layers in both the branch and the trunk net. Both subnets contain two layers and use the GeLU activation function:

```julia
# Create the DeepONet:
# The IC is given on a grid of 1024 points, and we solve for a fixed time t in one
# spatial dimension x, making the branch input of size 1024 and the trunk input of size 1
# We choose GeLU activation for both subnets
model = DeepONet((1024,1024,1024),(1,1024,1024),gelu,gelu) |> device
```
The rest is more or less boilerplate training code for a DNN, *with one exception*: for the loss to compute properly, we need to pass two separate input arrays, one for the branch and one for the trunk net. We employ the ADAM optimizer with a fixed learning rate of 1e-3, use the mean squared error as loss, evaluate the test loss as a callback and train the DeepONet for 500 epochs.

```julia
# We use the ADAM optimizer for training
learning_rate = 0.001
opt = ADAM(learning_rate)

# Specify the model parameters
parameters = params(model)

# The loss function
# We can't use the "vanilla" implementation of the mse here since we have
# two distinct inputs to our DeepONet, so the loss takes the branch input,
# the target and the sensor locations as separate arguments
loss(xtrain,ytrain,sensor) = Flux.Losses.mse(model(xtrain,sensor),ytrain)

# Define a callback function that gives some output during training
evalcb() = @show(loss(xtest,ytest,grid))
# Print the callback only every 5 seconds
throttled_cb = throttle(evalcb, 5)

# Do the training loop
Flux.@epochs 500 train!(loss, parameters, [(xtrain,ytrain,grid)], opt, cb = throttled_cb)
```
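
After training has finished, a quick sanity check is to evaluate the same loss on the held-out test data. This re-uses the `loss`, `xtest`, `ytest` and `grid` defined above; the `test_mse` name is only for illustration, and the actual value depends on your training run.

```julia
# Generalization error of the trained DeepONet on the test set
test_mse = loss(xtest, ytest, grid)
println("Test MSE after training: ", test_mse)
```
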
docs/src/examples/burgers_FNO.md

Lines changed: 125 additions & 0 deletions
# Solving the Burgers Equation with the Fourier Neural Operator

This example mostly replicates the original work by [Li et al.](https://github.com/zongyi-li/fourier_neural_operator/blob/master/fourier_1d.py).

We try to create an operator for the Burgers equation

$$ \partial_t u(x,t) + \partial_x (u^2(x,t)/2) = \nu \partial_{xx} u(x,t) $$

in one dimension on a unit spatial and temporal domain. The operator maps the initial condition $u(x,0) = u_0(x)$ to the flow field at the final time $u(x,1)$.

So overall, we need an approximation function that does the following:

```julia
function foo(u0, x)
    # Do something
    return u1
end
```

We sample from a dataset that contains several instances of the initial condition (`a`) and the final velocity field (`u`).
The data is given on a grid of 8192 points; however, we only want to sample 1024 of them.

```julia
using Flux: length, reshape, train!, throttle, @epochs
using OperatorLearning, Flux, MAT

device = gpu;

# Read the data from the MAT file and store it in a dict
vars = matread("burgers_data_R10.mat") |> device

# For trial purposes, we might want to train with different resolutions
# So we sample only every n-th element
subsample = 2^3;

# Create the x training array, according to our desired grid size
xtrain = vars["a"][1:1000, 1:subsample:end] |> device;
# Create the x test array
xtest = vars["a"][end-99:end, 1:subsample:end] |> device;

# Create the y training array
ytrain = vars["u"][1:1000, 1:subsample:end] |> device;
# Create the y test array
ytest = vars["u"][end-99:end, 1:subsample:end] |> device;
```
For now, we only have one input and one output array. In addition, we need the corresponding x values for a(x) and u(x) as a second input array, which at this point is still missing. The data were sampled on an equispaced grid (otherwise the FFT in our architecture wouldn't work anyway), so creating it manually is fairly straightforward:

```julia
# The dataset does not include the grid, so we create it ourselves
# `collect` converts the `range` object into an array
grid = collect(range(0, 1, length=length(xtrain[1,:]))) |> device

# Merge the created grid with the data
# Output has the dims: batch x grid points x 2 (a(x), x)
# First, reshape the data to a 3D tensor,
# then create a 3D tensor from the synthetic grid data
# and concatenate them along the newly created 3rd dim
xtrain = cat(reshape(xtrain,(1000,1024,1)),
             reshape(repeat(grid,1000),(1000,1024,1));
             dims=3) |> device
ytrain = cat(reshape(ytrain,(1000,1024,1)),
             reshape(repeat(grid,1000),(1000,1024,1));
             dims=3) |> device
# Same treatment for the test data
xtest = cat(reshape(xtest,(100,1024,1)),
            reshape(repeat(grid,100),(100,1024,1));
            dims=3) |> device
ytest = cat(reshape(ytest,(100,1024,1)),
            reshape(repeat(grid,100),(100,1024,1));
            dims=3) |> device
```
72+
73+
Next we need to consider the shape that the `FourierLayer` expects the inputs to be, i.e. `[numInputs, grid, batch]`. But our dataset contains the batching dim as the first one, so we need to do some permuting:
74+
75+
```julia
76+
# Our net wants the input in the form (2,grid,batch), though,
77+
# So we permute
78+
xtrain, xtest = permutedims(xtrain,(3,2,1)), permutedims(xtest,(3,2,1)) |> device
79+
ytrain, ytest = permutedims(ytrain,(3,2,1)), permutedims(ytest,(3,2,1)) |> device
80+
```
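
A quick sanity check of the resulting shapes (assuming the full 1000/100 train/test split and the 1024-point grid used above) can catch permutation mistakes early:

```julia
# Expected layout after permuting: (channels, grid points, batch)
@assert size(xtrain) == (2, 1024, 1000)
@assert size(xtest)  == (2, 1024, 100)
```
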
In order to slice the data into mini-batches, we pass the arrays to the Flux `DataLoader`:

```julia
# Pass the data to the Flux DataLoader and give it a batch size of 20
train_loader = Flux.Data.DataLoader((xtrain, ytrain), batchsize=20, shuffle=true) |> device
test_loader = Flux.Data.DataLoader((xtest, ytest), batchsize=20, shuffle=false) |> device
```
We can now set up the architecture. We lift the inputs to a higher-dimensional space via a simple linear transform using a `Dense` layer. The input dimensionality is 2, which we transform to 128. After that, we set up 4 instances of a Fourier Layer in which we keep only 16 of the `N/2 + 1 = 513` modes that the FFT provides, using the GeLU activation. Finally, we reduce the latent space back to the two output arrays we wish to obtain - `u1(x)` and `x`:

```julia
# Set up the Fourier Layer
# 128 in- and outputs, grid size 1024
# 16 modes to keep, GeLU activation, strict convolution in Fourier space
layer = FourierLayer(128,128,1024,16,gelu,bias_fourier=false) |> device

# The whole architecture
# linear transform into the latent space, 4 Fourier Layers,
# then transform it back
model = Chain(Dense(2,128;bias=false), layer, layer, layer, layer,
              Dense(128,2;bias=false)) |> device
```
The rest is more or less boilerplate training code for a DNN. We employ the ADAM optimizer with a fixed learning rate of 1e-3, use the mean squared error as loss, evaluate the test loss as a callback and train the FNO for 500 epochs.

```julia
# We use the ADAM optimizer for training
learning_rate = 0.001
opt = ADAM(learning_rate)

# Specify the model parameters
parameters = params(model)

# The loss function
loss(x,y) = Flux.Losses.mse(model(x),y)

# Define a callback function that gives some output during training
evalcb() = @show(loss(xtest,ytest))
# Print the callback only every 5 seconds
throttled_cb = throttle(evalcb, 5)

# Do the training loop
Flux.@epochs 500 train!(loss, parameters, train_loader, opt, cb = throttled_cb)
```
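
Once training is done, the generalization error can be checked by evaluating the loss on the full test arrays, re-using the `loss`, `xtest` and `ytest` defined above (the value you see depends on your training run):

```julia
# Test error of the trained FNO on the held-out data
@show loss(xtest, ytest)
```
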

docs/src/faq.md

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@ Currently, you need solved instances of the system you're trying to approximate

 That is, you'll need to gather data (probably using numerical simulations) that include the solution vector, the grid and the parameters of the PDE (system).

+In case you want to train a DeepONet, instantiating a grid is trivial, since the sensor locations (the grid) do not necessarily need to match the discretization of the input function. So you can just create the arrays yourself as you like.
+
 However, future work includes implementing physics-informed operator approximations which have been shown to be able to lighten the amount of training data needed or even alleviate it altogether (see e.g. [[1](https://doi.org/10.1126/sciadv.abi8605)] or [[2](http://arxiv.org/abs/2111.03794)]).

 ## What about hardware and distributed computing?

docs/src/index.md

Lines changed: 29 additions & 5 deletions
@@ -18,7 +18,9 @@ Simply install by running in a REPL:
 pkg> add OperatorLearning
 ```

-## Usage/Examples
+## Usage
+
+### Fourier Neural Operator

 The basic workflow is more or less in line with the layer architectures that `Flux` provides, i.e. you construct individual layers, chain them if desired and pass the inputs as arguments to the layers.

@@ -31,12 +33,34 @@ The syntax for a single Fourier Layer is:
 using OperatorLearning
 using Flux

-# Input = 101, Output = 101, Batch size = 200, Grid points = 100, Fourier modes = 16
+# Input = 101, Output = 101, Grid points = 100, Fourier modes = 16
 # Activation: sigmoid (you need to import Flux in your Script to access the activations)
-model = FourierLayer(101, 101, 200, 100, 16, σ)
+model = FourierLayer(101, 101, 100, 16, σ)

 # Same as above, but perform strict convolution in Fourier Space
-model = FourierLayer(101, 101, 200, 100, 16, σ; bias_fourier=false)
+model = FourierLayer(101, 101, 100, 16, σ; bias_fourier=false)
+```
+
+To see a full implementation, check the corresponding [Burgers equation example](examples/burgers_FNO.md).
+
+### DeepONet
+
+The workflow here is a little different from the Fourier Neural Operator. In this case, you create the entire architecture at once by specifying two tuples that describe the layer widths of the branch and trunk net.
+
+This creates a "vanilla" DeepONet in which branch and trunk net are simply `Chain`s of `Dense` layers. You can, however, use any other architecture in the subnets as well, as long as the outputs of the two match - otherwise the contraction operation won't work due to a dimension mismatch.
+
+```julia
+using OperatorLearning
+using Flux
+
+# Create a DeepONet with branch 32 -> 64 -> 72 and sigmoid activation
+# and trunk 24 -> 64 -> 72 and tanh activation without biases
+model = DeepONet((32,64,72), (24,64,72), σ, tanh; init_branch=Flux.glorot_normal, bias_trunk=false)
+
+# Alternatively, set up your own nets altogether and pass them to DeepONet
+branch = Chain(Dense(2,128),Dense(128,64),Dense(64,72))
+trunk = Chain(Dense(1,24),Dense(24,72))
+model = DeepONet(branch,trunk)
 ```

-To see a full implementation, check the Burgers equation example at `examples/burgers.jl`.
+To see a full implementation, check the corresponding [Burgers equation example](examples/burgers_DeepONet.md).

examples/burgers_DeepONet.jl

Lines changed: 0 additions & 4 deletions
@@ -32,10 +32,6 @@ ytest = vars["u"][end-99:end, 1:subsample:end] |> device;
 # `collect` converts data type `range` into an array
 grid = collect(range(0, 1, length=1024))' |> device

-# Pass the data to the Flux DataLoader and give it a batch of 20
-#train_loader = Flux.Data.DataLoader((xtrain, ytrain), batchsize=20, shuffle=true) |> device
-#test_loader = Flux.Data.DataLoader((xtest, ytest), batchsize=20, shuffle=false) |> device
-
 # Create the DeepONet:
 # IC is given on grid of 1024 points, and we solve for a fixed time t in one
 # spatial dimension x, making the branch input of size 1024 and trunk size 1