
Commit 256bb24

📖 update docs
1 parent e7a14c5 commit 256bb24

6 files changed, +258 -10 lines changed


docs/make.jl

Lines changed: 4 additions & 1 deletion
@@ -15,8 +15,11 @@ makedocs(;
     ),
     pages=[
         "Home" => "index.md",
-        "Module Reference" => "reference.md",
+        "Examples" =>
+            ["Burgers Equation with FNO" => "examples/burgers_FNO.md",
+             "Burgers Equation with DeepONet" => "examples/burgers_DeepONet.md"],
         "Frequently Asked Questions" => "faq.md",
+        "Module Reference" => "reference.md",
     ],
 )

docs/src/examples/burgers_DeepONet.md

Lines changed: 98 additions & 0 deletions
# Solving the Burgers Equation with DeepONet

This example mostly adapts the original work by [Li et al.](https://github.com/zongyi-li/fourier_neural_operator/blob/master/fourier_1d.py) to the DeepONet architecture and is intended to provide an analogue to [the FNO example](burgers_FNO.md).

We try to create an operator for the Burgers equation

$$ \partial_t u(x,t) + \partial_x (u^2(x,t)/2) = \nu \partial_{xx} u(x,t) $$

in one dimension on a unit spatial and temporal domain. The operator maps the initial condition $u(x,0) = u_0(x)$ to the flow field at the final time $u(x,1)$.

So overall, we need an approximation function that does the following:

```julia
function foo(u0, x)
    # Do something
    return u1
end
```

We sample from a dataset that contains several instances of the initial condition (`a`) and the final velocity field (`u`).
The data is given on a grid of 8192 points; however, we only want to sample 1024 of them.

```julia
using Flux: length, reshape, train!, throttle, @epochs
using OperatorLearning, Flux, MAT

device = cpu;

#=
We would like to implement and train a DeepONet that infers the solution
u(x) of the Burgers equation on a grid of 1024 points at time one based
on the initial condition a(x) = u(x,0)
=#

# Read the data from the MAT file and store it in a dict
# key "a" is the IC
# key "u" is the desired solution at time 1
vars = matread("burgers_data_R10.mat") |> device

# For trial purposes, we might want to train with different resolutions
# So we sample only every n-th element
subsample = 2^3;

# Create the x training array, according to our desired grid size
xtrain = vars["a"][1:1000, 1:subsample:end]' |> device;
# Create the x test array
xtest = vars["a"][end-99:end, 1:subsample:end]' |> device;

# Create the y training array
ytrain = vars["u"][1:1000, 1:subsample:end] |> device;
# Create the y test array
ytest = vars["u"][end-99:end, 1:subsample:end] |> device;
```
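
If you are unsure what the MAT file actually contains, a quick optional inspection of its keys and array sizes can help before subsampling. This is plain Julia and not specific to OperatorLearning; the key names `a` and `u` are the ones described above.

```julia
# Optional: list every key in the MAT file together with a summary of its value
# (for this dataset, "a" and "u" should show up with 8192 columns each)
for (key, value) in vars
    println(key, " => ", summary(value))
end
```
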
One particular thing to note here is that we need to permute the array containing the initial condition so that the inner product of DeepONet works. This is because we need to perform the following contraction:

$$ \sum\limits_i t_{ji} b_{ik} = u_{jk} $$
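
Written out with some hypothetical sizes, this contraction is just a matrix product. The sketch below only illustrates the index convention; it is not the library's internal code, and the sizes are made up.

```julia
# t: trunk output,  one row per evaluation point j, one column per latent index i
# b: branch output, one row per latent index i, one column per sample k
t = rand(16, 8)    # 16 evaluation points, latent dimension 8
b = rand(8, 4)     # latent dimension 8, 4 samples
u = t * b          # u[j,k] = Σᵢ t[j,i] * b[i,k], size (16, 4)
```
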
For now, we only have one input and one output array. In addition, we need another input array that provides the probing locations for the operator $u_1(x) = \mathcal{G}(u_0)(x)$. In theory, we could choose those arbitrarily. For the sake of simplicity, though, we simply recreate the equispaced grid that the original data was sampled on, i.e. a 1-D grid of 1024 equispaced points in [0;1]. Again, we need to transpose the array so that the dimension handled by the trunk network comes first - otherwise the inner product would be much more cumbersome to handle.

```julia
# The dataset does not include the grid, so we create it ourselves
# `collect` converts the `range` object into an array
grid = collect(range(0, 1, length=1024))' |> device
```
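
Since the probing locations are in principle arbitrary, nothing stops you from evaluating the operator at, say, randomly drawn points instead. The following is a hypothetical alternative that is not used in the remainder of this example; the name `grid_random` is purely illustrative.

```julia
# Hypothetical alternative: 1024 sorted, randomly drawn sensor locations in [0, 1]
# (kept as a row vector for the same reason as above)
grid_random = sort(rand(1024))' |> device
```
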
We can now set up the DeepONet. We choose the latent space to have dimensionality 1024 and use the vanilla DeepONet architecture, i.e. `Dense` layers in both the branch and the trunk net. Both subnets contain two layers and use the GeLU activation function:

```julia
# Create the DeepONet:
# The IC is given on a grid of 1024 points, and we solve for a fixed time t in one
# spatial dimension x, making the branch input of size 1024 and the trunk input of size 1
# We choose GeLU activation for both subnets
model = DeepONet((1024,1024,1024),(1,1024,1024),gelu,gelu) |> device
```
The rest is more or less boilerplate training code for a DNN, *with one exception*: for the loss to compute properly, we need to pass two separate input arrays, one for the branch and one for the trunk net. We employ the ADAM optimizer with a fixed learning rate of 1e-3, use the mean squared error as loss, evaluate the test loss as a callback and train the DeepONet for 500 epochs.

```julia
# We use the ADAM optimizer for training
learning_rate = 0.001
opt = ADAM(learning_rate)

# Specify the model parameters
parameters = params(model)

# The loss function
# We can't use the "vanilla" implementation of the mse here since we have
# two distinct inputs to our DeepONet, so the loss takes the branch input,
# the target and the sensor locations as separate arguments
loss(xtrain,ytrain,sensor) = Flux.Losses.mse(model(xtrain,sensor),ytrain)

# Define a callback function that gives some output during training
evalcb() = @show(loss(xtest,ytest,grid))
# Print the callback only every 5 seconds
throttled_cb = throttle(evalcb, 5)

# Do the training loop
Flux.@epochs 500 train!(loss, parameters, [(xtrain,ytrain,grid)], opt, cb = throttled_cb)
```
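
After training has finished, a quick sanity check is to evaluate the same loss on the held-out test data. This re-uses the `loss`, `xtest`, `ytest` and `grid` defined above; the `test_mse` name is only for illustration, and the actual value depends on your training run.

```julia
# Generalization error of the trained DeepONet on the test set
test_mse = loss(xtest, ytest, grid)
println("Test MSE after training: ", test_mse)
```
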
docs/src/examples/burgers_FNO.md

Lines changed: 125 additions & 0 deletions
# Solving the Burgers Equation with the Fourier Neural Operator

This example mostly replicates the original work by [Li et al.](https://github.com/zongyi-li/fourier_neural_operator/blob/master/fourier_1d.py).

We try to create an operator for the Burgers equation

$$ \partial_t u(x,t) + \partial_x (u^2(x,t)/2) = \nu \partial_{xx} u(x,t) $$

in one dimension on a unit spatial and temporal domain. The operator maps the initial condition $u(x,0) = u_0(x)$ to the flow field at the final time $u(x,1)$.

So overall, we need an approximation function that does the following:

```julia
function foo(u0, x)
    # Do something
    return u1
end
```

We sample from a dataset that contains several instances of the initial condition (`a`) and the final velocity field (`u`).
The data is given on a grid of 8192 points; however, we only want to sample 1024 of them.

```julia
using Flux: length, reshape, train!, throttle, @epochs
using OperatorLearning, Flux, MAT

device = gpu;

# Read the data from the MAT file and store it in a dict
vars = matread("burgers_data_R10.mat") |> device

# For trial purposes, we might want to train with different resolutions
# So we sample only every n-th element
subsample = 2^3;

# Create the x training array, according to our desired grid size
xtrain = vars["a"][1:1000, 1:subsample:end] |> device;
# Create the x test array
xtest = vars["a"][end-99:end, 1:subsample:end] |> device;

# Create the y training array
ytrain = vars["u"][1:1000, 1:subsample:end] |> device;
# Create the y test array
ytest = vars["u"][end-99:end, 1:subsample:end] |> device;
```
For now, we only have one input and one output array. In addition, we need the corresponding x values for a(x) and u(x) as a second input array, which at this point is still missing. The data were sampled on an equispaced grid (otherwise the FFT in our architecture wouldn't work anyway), so creating it manually is fairly straightforward:

```julia
# The dataset does not include the grid, so we create it ourselves
# `collect` converts the `range` object into an array
grid = collect(range(0, 1, length=length(xtrain[1,:]))) |> device

# Merge the created grid with the data
# Output has the dims: batch x grid points x 2 (a(x), x)
# First, reshape the data to a 3D tensor,
# then create a 3D tensor from the synthetic grid data
# and concatenate them along the newly created 3rd dim
xtrain = cat(reshape(xtrain,(1000,1024,1)),
             reshape(repeat(grid,1000),(1000,1024,1));
             dims=3) |> device
ytrain = cat(reshape(ytrain,(1000,1024,1)),
             reshape(repeat(grid,1000),(1000,1024,1));
             dims=3) |> device
# Same treatment for the test data
xtest = cat(reshape(xtest,(100,1024,1)),
            reshape(repeat(grid,100),(100,1024,1));
            dims=3) |> device
ytest = cat(reshape(ytest,(100,1024,1)),
            reshape(repeat(grid,100),(100,1024,1));
            dims=3) |> device
```
72+
73+
Next we need to consider the shape that the `FourierLayer` expects the inputs to be, i.e. `[numInputs, grid, batch]`. But our dataset contains the batching dim as the first one, so we need to do some permuting:
74+
75+
```julia
76+
# Our net wants the input in the form (2,grid,batch), though,
77+
# So we permute
78+
xtrain, xtest = permutedims(xtrain,(3,2,1)), permutedims(xtest,(3,2,1)) |> device
79+
ytrain, ytest = permutedims(ytrain,(3,2,1)), permutedims(ytest,(3,2,1)) |> device
80+
```
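
A quick sanity check of the resulting shapes (assuming the full 1000/100 train/test split and the 1024-point grid used above) can catch permutation mistakes early:

```julia
# Expected layout after permuting: (channels, grid points, batch)
@assert size(xtrain) == (2, 1024, 1000)
@assert size(xtest)  == (2, 1024, 100)
```
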
In order to slice the data into mini-batches, we pass the arrays to the Flux `DataLoader`:

```julia
# Pass the data to the Flux DataLoader and give it a batch size of 20
train_loader = Flux.Data.DataLoader((xtrain, ytrain), batchsize=20, shuffle=true) |> device
test_loader = Flux.Data.DataLoader((xtest, ytest), batchsize=20, shuffle=false) |> device
```
We can now set up the architecture. We lift the inputs to a higher-dimensional space via a simple linear transform using a `Dense` layer. The input dimensionality is 2, which we transform to 128. After that, we set up 4 instances of a Fourier Layer in which we keep only 16 of the `N/2 + 1 = 513` modes that the FFT provides, using the GeLU activation. Finally, we reduce the latent space back to the two output arrays we wish to obtain - `u1(x)` and `x`:

```julia
# Set up the Fourier Layer
# 128 in- and outputs, grid size 1024
# 16 modes to keep, GeLU activation, strict convolution in Fourier space
layer = FourierLayer(128,128,1024,16,gelu,bias_fourier=false) |> device

# The whole architecture
# linear transform into the latent space, 4 Fourier Layers,
# then transform it back
model = Chain(Dense(2,128;bias=false), layer, layer, layer, layer,
              Dense(128,2;bias=false)) |> device
```
The rest is more or less boilerplate training code for a DNN. We employ the ADAM optimizer with a fixed learning rate of 1e-3, use the mean squared error as loss, evaluate the test loss as a callback and train the FNO for 500 epochs.

```julia
# We use the ADAM optimizer for training
learning_rate = 0.001
opt = ADAM(learning_rate)

# Specify the model parameters
parameters = params(model)

# The loss function
loss(x,y) = Flux.Losses.mse(model(x),y)

# Define a callback function that gives some output during training
evalcb() = @show(loss(xtest,ytest))
# Print the callback only every 5 seconds
throttled_cb = throttle(evalcb, 5)

# Do the training loop
Flux.@epochs 500 train!(loss, parameters, train_loader, opt, cb = throttled_cb)
```
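
Once training is done, the generalization error can be checked by evaluating the loss on the full test arrays, re-using the `loss`, `xtest` and `ytest` defined above (the value you see depends on your training run):

```julia
# Test error of the trained FNO on the held-out data
@show loss(xtest, ytest)
```
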

docs/src/faq.md

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@ Currently, you need solved instances of the system you're trying to approximate

 That is, you'll need to gather data (probably using numerical simulations) that include the solution vector, the grid and the parameters of the PDE (system).

+In case you want to train a DeepONet, instantiating a grid is trivial, since the sensor locations (the grid) do not necessarily need to match the discretization of the input function. So you can just create the arrays yourself as you like.
+
 However, future work includes implementing physics-informed operator approximations which have been shown to be able to lighten the amount of training data needed or even alleviate it altogether (see e.g. [[1](https://doi.org/10.1126/sciadv.abi8605)] or [[2](http://arxiv.org/abs/2111.03794)]).

 ## What about hardware and distributed computing?

docs/src/index.md

Lines changed: 29 additions & 5 deletions
@@ -18,7 +18,9 @@ Simply install by running in a REPL:
 pkg> add OperatorLearning
 ```

-## Usage/Examples
+## Usage
+
+### Fourier Neural Operator

 The basic workflow is more or less in line with the layer architectures that `Flux` provides, i.e. you construct individual layers, chain them if desired and pass the inputs as arguments to the layers.

@@ -31,12 +33,34 @@ The syntax for a single Fourier Layer is:
 using OperatorLearning
 using Flux

-# Input = 101, Output = 101, Batch size = 200, Grid points = 100, Fourier modes = 16
+# Input = 101, Output = 101, Grid points = 100, Fourier modes = 16
 # Activation: sigmoid (you need to import Flux in your Script to access the activations)
-model = FourierLayer(101, 101, 200, 100, 16, σ)
+model = FourierLayer(101, 101, 100, 16, σ)

 # Same as above, but perform strict convolution in Fourier Space
-model = FourierLayer(101, 101, 200, 100, 16, σ; bias_fourier=false)
+model = FourierLayer(101, 101, 100, 16, σ; bias_fourier=false)
+```
+
+To see a full implementation, check the corresponding [Burgers equation example](examples/burgers_FNO.md).
+
+### DeepONet
+
+The workflow here is a little different from the Fourier Neural Operator. In this case, you create the entire architecture at once by specifying two tuples that describe the layer widths of the branch and trunk net.
+
+This creates a "vanilla" DeepONet in which branch and trunk net are simply `Chain`s of `Dense` layers. You can, however, use any other architecture in the subnets as well, as long as the outputs of the two match - otherwise the contraction operation won't work due to a dimension mismatch.
+
+```julia
+using OperatorLearning
+using Flux
+
+# Create a DeepONet with branch 32 -> 64 -> 72 and sigmoid activation
+# and trunk 24 -> 64 -> 72 and tanh activation without biases
+model = DeepONet((32,64,72), (24,64,72), σ, tanh; init_branch=Flux.glorot_normal, bias_trunk=false)
+
+# Alternatively, set up your own nets altogether and pass them to DeepONet
+branch = Chain(Dense(2,128),Dense(128,64),Dense(64,72))
+trunk = Chain(Dense(1,24),Dense(24,72))
+model = DeepONet(branch,trunk)
 ```

-To see a full implementation, check the Burgers equation example at `examples/burgers.jl`.
+To see a full implementation, check the corresponding [Burgers equation example](examples/burgers_DeepONet.md).

examples/burgers_DeepONet.jl

Lines changed: 0 additions & 4 deletions
@@ -32,10 +32,6 @@ ytest = vars["u"][end-99:end, 1:subsample:end] |> device;
 # `collect` converts data type `range` into an array
 grid = collect(range(0, 1, length=1024))' |> device

-# Pass the data to the Flux DataLoader and give it a batch of 20
-#train_loader = Flux.Data.DataLoader((xtrain, ytrain), batchsize=20, shuffle=true) |> device
-#test_loader = Flux.Data.DataLoader((xtest, ytest), batchsize=20, shuffle=false) |> device
-
 # Create the DeepONet:
 # IC is given on grid of 1024 points, and we solve for a fixed time t in one
 # spatial dimension x, making the branch input of size 1024 and trunk size 1