
Commit 5886dc4

re-organise built-in layers section

1 parent 4662c4c commit 5886dc4

2 files changed: +78 −31 lines changed

docs/src/models/layers.md (+75 −31)
@@ -1,86 +1,130 @@
-# Basic Layers
+# Built-in Layer Types
 
-These core layers form the foundation of almost all neural networks.
+If you started at the beginning, then you have already met the basic [`Dense`](@ref) layer, and seen [`Chain`](@ref) for combining layers. These core layers form the foundation of almost all neural networks.
+
+The `Dense` layer exemplifies several features:
+
+* Weight matrices are created ... Many layers take an `init` keyword, which accepts a function acting like `rand`. That is, `init(2,3,4)` creates an array of this size. ... always on the CPU.
+
+* An activation function, which is broadcast over the output: `Flux.Scale(3, tanh)([1,2,3]) ≈ tanh.(1:3)`
+
+* The bias vector is always initialised `Flux.zeros32`. The keyword `bias=false` will turn this off.
+
+* All layers are annotated with `@layer`, which means that `params` will see the contents, and `gpu` will move their arrays to the GPU.
+
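For instance, a minimal sketch of these points (the sizes here are arbitrary):

```julia
using Flux

# Weight matrix made by `init`, bias turned off, tanh broadcast over the output:
d = Dense(2 => 3, tanh; init=Flux.glorot_uniform, bias=false)

size(d(rand(Float32, 2, 8)))  # (3, 8): one column of output per sample
```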
+## Fully Connected
 
 ```@docs
-Chain
 Dense
+Flux.Bilinear
+Flux.Scale
 ```
 
-## Convolution and Pooling Layers
+Perhaps `Scale` isn't quite fully connected, but it may be thought of as `Dense(Diagonal(s.weights), s.bias)`, and LinearAlgebra's `Diagonal` is a matrix which just happens to contain many zeros.
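That equivalence can be checked directly; a quick sketch with arbitrary numbers:

```julia
using Flux, LinearAlgebra

s = Flux.Scale([1.0, 2.0, 3.0], [0.1, 0.2, 0.3])
d = Dense(Diagonal([1.0, 2.0, 3.0]), [0.1, 0.2, 0.3])

s([1, 1, 1]) ≈ d([1, 1, 1])  # true: an elementwise scale-and-shift either way
```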
+
+## Convolution Models
 
 These layers are used to build convolutional neural networks (CNNs).
 
+They all expect images in what is called WHCN order: a batch of 32 colour images, each 50 x 50 pixels, will have `size(x) == (50, 50, 3, 32)`. A single grayscale image might instead have `size(x) == (28, 28, 1, 1)`.
+
+Besides images (2D data), they also work with 1D data, where for instance a stereo sound recording with 1000 samples might have `size(x) == (1000, 2, 1)`. They will also work with 3D data, for which `ndims(x) == 5`, where again the last two dimensions are channel and batch.
+
+To understand how `stride` and `pad` work, there's a cute [article by Dumoulin & Visin](https://arxiv.org/abs/1603.07285) with great illustrations.
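To make the WHCN layout concrete, here is a small sketch (filter size and channel counts are arbitrary):

```julia
using Flux

x = rand(Float32, 50, 50, 3, 32)    # WHCN: 32 colour images, each 50 x 50
layer = Conv((5, 5), 3 => 7, relu)  # 5×5 filters taking 3 channels to 7

size(layer(x))  # (46, 46, 7, 32): spatial size shrinks with no padding
```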
+
 ```@docs
 Conv
 Conv(weight::AbstractArray)
-AdaptiveMaxPool
-MaxPool
-GlobalMaxPool
-AdaptiveMeanPool
-MeanPool
-GlobalMeanPool
-DepthwiseConv
 ConvTranspose
 ConvTranspose(weight::AbstractArray)
 CrossCor
 CrossCor(weight::AbstractArray)
+DepthwiseConv
 SamePad
 Flux.flatten
 ```
 
-## Upsampling Layers
+### Pooling
+
+These layers are commonly used after a convolution layer, and reduce the size of its output. They have no trainable parameters.
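For example, a sketch with arbitrary window sizes:

```julia
using Flux

x = rand(Float32, 28, 28, 1, 1)  # one grayscale image

size(MaxPool((2, 2))(x))   # (14, 14, 1, 1): halves each spatial dimension
size(GlobalMeanPool()(x))  # (1, 1, 1, 1): one number per channel and batch
```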
+
+```@docs
+AdaptiveMaxPool
+MaxPool
+GlobalMaxPool
+AdaptiveMeanPool
+MeanPool
+GlobalMeanPool
+```
+
+## Upsampling
+
+The opposite of pooling, these layers increase the size of an array. They have no trainable parameters.
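For instance (a sketch; the scale factors and sizes are arbitrary):

```julia
using Flux

size(Upsample(scale = (2, 2))(rand(Float32, 4, 4, 3, 1)))  # (8, 8, 3, 1)
size(PixelShuffle(2)(rand(Float32, 4, 4, 8, 1)))  # (8, 8, 2, 1): trades channels for pixels
```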
 
 ```@docs
 Upsample
 PixelShuffle
 ```
 
-## Recurrent Layers
+## Embedding Vectors
 
-Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).
+These layers accept an index, and return a vector (or several indices, and several vectors). The possible embedding vectors are learned parameters.
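A minimal sketch (vocabulary and vector sizes are arbitrary):

```julia
using Flux

emb = Flux.Embedding(26 => 4)  # 26 possible indices, a 4-dimensional vector for each

size(emb(3))          # (4,): the vector for index 3
size(emb([7, 7, 9]))  # (4, 3): one column per index given
```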
 
 ```@docs
-RNN
-LSTM
-GRU
-GRUv3
-Flux.Recur
-Flux.reset!
+Flux.Embedding
+Flux.EmbeddingBag
 ```
 
-## Other General Purpose Layers
+## Dataflow Layers, or Containers
 
-These are marginally more obscure than the Basic Layers.
-But in contrast to the layers described in the other sections are not readily grouped around a particular purpose (e.g. CNNs or RNNs).
+The basic `Chain(F, G, H)` applies the layers it contains in sequence, equivalent to `H ∘ G ∘ F`. Flux has some other layers which contain layers, but connect them up in a more complicated way: `SkipConnection` allows ResNet's residual connection.
+
+These are all defined with [`@layer`](@ref)` :expand TypeName`, which tells the pretty-printing code that they contain other layers.
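A quick sketch of both behaviours (layer sizes are arbitrary):

```julia
using Flux

m = Chain(Dense(2 => 5, relu), Dense(5 => 1))
x = rand(Float32, 2)
m(x) == m[2](m[1](x))  # true: the layers are applied in sequence

sk = SkipConnection(Dense(3 => 3, relu), +)  # residual-style connection
sk(ones(Float32, 3))  # the Dense layer's output .+ the original input
```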
 
 ```@docs
+Chain
+Flux.activations
 Maxout
 SkipConnection
 Parallel
-Flux.Bilinear
-Flux.Scale
-Flux.Embedding
+PairwiseFusion
+```
+
+## Recurrent Models
+
+Much like the core layers above, these can be used to process sequence data (as well as other kinds of structured data).
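For example, with the `Recur`-based interface documented here (sizes are arbitrary):

```julia
using Flux

r = RNN(2 => 5)  # a recurrent cell wrapped in Flux.Recur, which stores the hidden state

seq = [rand(Float32, 2) for _ in 1:4]
outputs = [r(x) for x in seq]  # one 5-element output per timestep; state carries over

Flux.reset!(r)  # restore the initial hidden state before the next sequence
```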
+
+```@docs
+RNN
+LSTM
+GRU
+GRUv3
+Flux.Recur
+Flux.reset!
 ```
 
 ## Normalisation & Regularisation
 
-These layers don't affect the structure of the network but may improve training times or reduce overfitting.
+These layers don't affect the structure of the network but may improve training times or reduce overfitting. Some of them contain trainable parameters, while others do not.
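For instance (a sketch; the sizes are arbitrary):

```julia
using Flux

bn = BatchNorm(3)  # has a trainable shift β and scale γ

size(bn(rand(Float32, 3, 10)))  # (3, 10): the array's shape is unchanged

drop = Dropout(0.5)  # by contrast, this has no trainable parameters
```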
 
 ```@docs
-Flux.normalise
 BatchNorm
 Dropout
-Flux.dropout
 AlphaDropout
 LayerNorm
 InstanceNorm
 GroupNorm
+Flux.normalise
+Flux.dropout
 ```
 
-### Testmode
+### Test vs. Train
+
+Several normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference.
 
-Many normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference. Still, depending on your use case, it may be helpful to manually specify when these layers should be treated as being trained or not. For this, Flux provides `Flux.testmode!`. When called on a model (e.g. a layer or chain of layers), this function will place the model into the mode specified.
+The functions `Flux.trainmode!` and `Flux.testmode!` let you manually specify which behaviour you want. When called on a model, they will place all layers within the model into the specified mode.
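A minimal sketch:

```julia
using Flux

m = Chain(Dense(10 => 10), Dropout(0.5))

Flux.testmode!(m)   # the Dropout layer now does nothing; outputs are deterministic
Flux.trainmode!(m)  # dropout is active again, even outside of gradient calls
```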
 
 ```@docs
 Flux.testmode!

src/layers/basic.jl (+3)
@@ -182,6 +182,9 @@ function Base.show(io::IO, l::Dense)
     print(io, ")")
 end
 
+Dense(W::LinearAlgebra.Diagonal, bias = true, σ = identity) =
+  Scale(W.diag, bias, σ)
+
 """
     Scale(size::Integer..., σ=identity; bias=true, init=ones32)
     Scale(scale::AbstractArray, [bias, σ])
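With this commit applied, building a `Dense` layer from a `Diagonal` matrix yields the equivalent (cheaper, elementwise) `Scale` layer; a quick sketch:

```julia
using Flux, LinearAlgebra

# Dispatches to the method added above, returning Scale([1, 2, 3], ...):
d = Dense(Diagonal([1f0, 2f0, 3f0]))

d isa Flux.Scale  # true
```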
