
Commit a8dfcfc

Re-organise "built-in layers" section (#2112)
* re-organise built-in layers section
* fixup
* move create_bias somewhere more logical
* add a warning about other AD breaking automagic train mode
* remove mention of at-layer macro for now
* fix some links

Co-authored-by: Saransh Chopra <[email protected]>
Co-authored-by: Saransh Chopra <[email protected]>
1 parent bf3cf8b commit a8dfcfc

4 files changed: +85 -32 lines

docs/src/models/basics.md (-1)

@@ -233,5 +233,4 @@ Affine(3 => 1, bias=false, init=ones) |> gpu

```@docs
Functors.@functor
-Flux.create_bias
```

docs/src/models/layers.md (+80 -30)

@@ -1,86 +1,136 @@
-# Basic Layers
+# Built-in Layer Types

+If you started at the beginning of the guide, then you have already met the
+basic [`Dense`](@ref) layer, and seen [`Chain`](@ref) for combining layers.
These core layers form the foundation of almost all neural networks.

+The `Dense` layer exemplifies several features:
+
+* It contains an [activation function](@ref man-activations), which is broadcast over the output. Because this broadcast can be fused with other operations, it is more efficient than applying the activation function separately.
+
+* It takes an `init` keyword, which accepts a function acting like `rand`. That is, `init(2,3,4)` should create an array of this size. Flux has [many such functions](@ref man-init-funcs) built-in. All make a CPU array, which can be moved later with [`gpu`](@ref Flux.gpu) if desired.
+
+* The bias vector is always initialised with [`Flux.zeros32`](@ref). The keyword `bias=false` turns this off, i.e. keeps the bias permanently zero.
+
+* It is annotated with [`@functor`](@ref Functors.@functor), which means that [`params`](@ref Flux.params) will see its contents, and [`gpu`](@ref Flux.gpu) will move its arrays to the GPU.
+
+By contrast, `Chain` itself contains no parameters, but connects other layers together.
+The section on [dataflow layers](@ref man-dataflow-layers) introduces others like this.
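For example, the activation function, `init` keyword, and `bias=false` described above can be combined like this (a minimal sketch, assuming Flux's `in => out` constructor syntax):

```julia
using Flux

layer = Dense(3 => 2, relu; init=Flux.glorot_normal, bias=false)

x = rand(Float32, 3, 16)      # a batch of 16 input columns
size(layer(x))                # (2, 16) -- relu is broadcast over the output

Flux.params(layer)            # contains only the weight matrix, since bias=false
```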
+
+## Fully Connected
+
```@docs
-Chain
Dense
+Flux.Bilinear
+Flux.Scale
```

-## Convolution and Pooling Layers
+Perhaps `Scale` isn't quite fully connected, but it may be thought of as `Dense(Diagonal(s.weights), s.bias)`, and LinearAlgebra's `Diagonal` is a matrix which just happens to contain many zeros.
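A small sketch of that equivalence, using the array constructors documented above (the exact numbers are arbitrary):

```julia
using Flux, LinearAlgebra

s = Flux.Scale(Float32[2, 3, 4], false)               # element-wise scale, no bias
d = Dense(Matrix(Diagonal(Float32[2, 3, 4])), false)  # the same weights stored as a full matrix

x = rand(Float32, 3, 8)
s(x) ≈ d(x)    # true -- Scale acts like Dense with a diagonal weight matrix
```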
+
+## Convolution Models

These layers are used to build convolutional neural networks (CNNs).

+They all expect images in what is called WHCN order: a batch of 32 colour images, each 50 x 50 pixels, will have `size(x) == (50, 50, 3, 32)`. A single grayscale image might instead have `size(x) == (28, 28, 1, 1)`.
+
+Besides 2D images, these layers also work with 1D data, where for instance a stereo sound recording with 1000 samples might have `size(x) == (1000, 2, 1)`. They also work with 3D data, `ndims(x) == 5`, where again the last two dimensions are channel and batch.
+
+To understand how strides and padding work, the article by [Dumoulin & Visin](https://arxiv.org/abs/1603.07285) has great illustrations.
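For instance, a minimal sketch of a convolution applied to a WHCN batch (the layer sizes here are chosen arbitrarily):

```julia
using Flux

x = rand(Float32, 50, 50, 3, 32)                  # WHCN: 50×50 RGB images, batch of 32

layer = Conv((5, 5), 3 => 7, relu; pad=SamePad())
size(layer(x))                                    # (50, 50, 7, 32) -- SamePad keeps width & height
```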
+
```@docs
Conv
Conv(weight::AbstractArray)
-AdaptiveMaxPool
-MaxPool
-GlobalMaxPool
-AdaptiveMeanPool
-MeanPool
-GlobalMeanPool
-DepthwiseConv
ConvTranspose
ConvTranspose(weight::AbstractArray)
CrossCor
CrossCor(weight::AbstractArray)
+DepthwiseConv
SamePad
Flux.flatten
```

-## Upsampling Layers
+### Pooling
+
+These layers are commonly used after a convolution layer, and reduce the size of its output. They have no trainable parameters.
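A sketch of how a pooling layer shrinks a convolution's output (sizes are illustrative):

```julia
using Flux

m = Chain(Conv((3, 3), 3 => 7), MaxPool((2, 2)))

size(m(rand(Float32, 28, 28, 3, 1)))   # (13, 13, 7, 1) -- the 26×26 convolution output is halved
```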
+
+```@docs
+AdaptiveMaxPool
+MaxPool
+GlobalMaxPool
+AdaptiveMeanPool
+MeanPool
+GlobalMeanPool
+```
+
+## Upsampling
+
+The opposite of pooling, these layers increase the size of an array. They have no trainable parameters.

```@docs
Upsample
PixelShuffle
```

-## Recurrent Layers
+## Embedding Vectors

-Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).
+These layers accept an index, and return a vector (or several indices, and several vectors). The possible embedding vectors are learned parameters.
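For example (a minimal sketch, assuming the `Embedding(vocabulary_size => dimension)` constructor and integer-index calls of recent Flux versions):

```julia
using Flux

emb = Flux.Embedding(26 => 4)   # 26 possible indices, each mapped to a learned 4-element vector

size(emb(3))           # (4,)   -- one index gives one vector
size(emb([7, 7, 9]))   # (4, 3) -- several indices give one column each
```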

```@docs
-RNN
-LSTM
-GRU
-GRUv3
-Flux.Recur
-Flux.reset!
+Flux.Embedding
+Flux.EmbeddingBag
```

-## Other General Purpose Layers
+## [Dataflow Layers, or Containers](@id man-dataflow-layers)

-These are marginally more obscure than the Basic Layers.
-But in contrast to the layers described in the other sections are not readily grouped around a particular purpose (e.g. CNNs or RNNs).
+The basic `Chain(F, G, H)` applies the layers it contains in sequence, equivalent to `H ∘ G ∘ F`. Flux has some other layers which contain layers, but connect them up in a more complicated way: `SkipConnection` allows ResNet's residual connection.
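A minimal sketch of both ideas (layer sizes arbitrary):

```julia
using Flux

m1 = Chain(Dense(4 => 8, relu), Dense(8 => 2))   # same as x -> Dense(8 => 2)(Dense(4 => 8, relu)(x))

m2 = Chain(Dense(4 => 4, relu), SkipConnection(Dense(4 => 4, relu), +), Dense(4 => 2))
size(m2(rand(Float32, 4, 10)))                   # (2, 10) -- SkipConnection adds its input to its output
```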

```@docs
+Chain
+Flux.activations
Maxout
SkipConnection
Parallel
-Flux.Bilinear
-Flux.Scale
-Flux.Embedding
+PairwiseFusion
+```
+
+## Recurrent Models
+
+Much like the core layers above, but these can be used to process sequence data (as well as other kinds of structured data).
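For instance, a sketch of a stateful recurrent layer processing a sequence step by step (sizes arbitrary, assuming the `in => out` constructor):

```julia
using Flux

m = RNN(2 => 5)                           # wrapped in Flux.Recur, so it carries hidden state between calls

Flux.reset!(m)                            # clear that state before a new sequence
xs = [rand(Float32, 2) for _ in 1:10]     # a sequence of 10 time steps
ys = [m(x) for x in xs]                   # applied step by step; each output has length 5
```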
+
+```@docs
+RNN
+LSTM
+GRU
+GRUv3
+Flux.Recur
+Flux.reset!
```

## Normalisation & Regularisation

-These layers don't affect the structure of the network but may improve training times or reduce overfitting.
+These layers don't affect the structure of the network but may improve training times or reduce overfitting. Some of them contain trainable parameters, while others do not.
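A quick sketch of that difference: `BatchNorm` carries a trainable shift and scale, while `Dropout` has nothing to train.

```julia
using Flux

m = Chain(Dense(4 => 8, relu), BatchNorm(8), Dropout(0.3), Dense(8 => 1))

length(Flux.params(BatchNorm(8)))   # 2 -- a trainable shift β and scale γ
length(Flux.params(Dropout(0.3)))   # 0 -- nothing to train
```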

```@docs
-Flux.normalise
BatchNorm
Dropout
-Flux.dropout
AlphaDropout
LayerNorm
InstanceNorm
GroupNorm
+Flux.normalise
+Flux.dropout
```

-### Testmode
+### Test vs. Train
+
+Several normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference.
+
+!!! warning
+    This automatic train/test detection works best with Zygote, the default
+    automatic differentiation package. It may not work with other packages
+    such as Tracker, Yota, or ForwardDiff.

-Many normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference. Still, depending on your use case, it may be helpful to manually specify when these layers should be treated as being trained or not. For this, Flux provides `Flux.testmode!`. When called on a model (e.g. a layer or chain of layers), this function will place the model into the mode specified.
+The functions `Flux.trainmode!` and `Flux.testmode!` let you manually specify which behaviour you want. When called on a model, they will place all layers within the model into the specified mode.
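For example, a brief sketch of forcing the mode by hand:

```julia
using Flux

m = Chain(Dense(2 => 3, relu), Dropout(0.4), Dense(3 => 1))

Flux.testmode!(m)       # Dropout becomes a no-op: repeated calls give identical output
y = m(rand(Float32, 2))

Flux.trainmode!(m)      # dropout is active again, even outside of gradient computation
```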

```@docs
Flux.testmode!

docs/src/utilities.md (+2 -1)

@@ -1,4 +1,4 @@
-# Random Weight Initialisation
+# [Random Weight Initialisation](@id man-init-funcs)

Flux initialises convolutional layers and recurrent cells with `glorot_uniform` by default.
Most layers accept a function as an `init` keyword, which replaces this default. For example:

@@ -42,6 +42,7 @@ Flux.ones32
Flux.zeros32
Flux.rand32
Flux.randn32
+Flux.create_bias
```

These functions call:

src/layers/basic.jl (+3)

@@ -182,6 +182,9 @@ function Base.show(io::IO, l::Dense)
  print(io, ")")
end

+Dense(W::LinearAlgebra.Diagonal, bias = true, σ = identity) =
+  Scale(W.diag, bias, σ)
+
"""
    Scale(size::Integer..., σ=identity; bias=true, init=ones32)
    Scale(scale::AbstractArray, [bias, σ])
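The new method above turns a diagonal weight matrix into a `Scale` layer. A quick sketch of how it behaves once this commit is in place:

```julia
using Flux, LinearAlgebra

layer = Dense(Diagonal(Float32[1, 2, 3]))   # dispatches to the new method, so this is really a Scale
layer isa Flux.Scale                        # true
layer(Float32[10, 10, 10])                  # Float32[10, 20, 30], plus the zero-initialised bias
```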
