
Combination of activation functions

The aim of this project is to study a new activation function based on the combination of already-known activation functions. The following sections briefly explain the different approaches. The code can be found in the mixed_activations.py file.

1. Linear combinator

The activation function is defined as a linear combination of the base activation functions (a sketch in symbols follows this list); its ingredients are:

  • the parameters to be learned
  • the base activation functions (e.g. relu, sigmoid, etc.)
  • the input
  • the number of neurons of the layer
  • the number of base activation functions
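
In symbols, a plausible sketch (the indexing and the symbol names α, g, x, n, k are assumptions introduced here to match the ingredients above) is, for neuron i of the layer:

f_i(x_i) = \sum_{j=1}^{k} \alpha_{i,j} \, g_j(x_i), \qquad i = 1, \dots, n

where the \alpha_{i,j} are the parameters to be learned, g_1, \dots, g_k are the base activation functions, x_i is the input to neuron i, n is the number of neurons of the layer and k is the number of base activation functions. The \alpha can optionally be normalized (sigmoid or softmax; see the normalize option of run_config.json).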

2. Non-linear combinator

The activation function is now computed by a Multi-Layer Perceptron (MLP) that takes as input the outputs of the base activations evaluated on the input.
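
In pseudo-formula, a sketch of this (assuming, as an illustration, that the MLP consumes the concatenated outputs of the k base activations g_1, \dots, g_k) is:

f(x_i) = \mathrm{MLP}\big(\,[\, g_1(x_i), \, g_2(x_i), \, \dots, \, g_k(x_i) \,]\,\big)

where [\cdot] denotes concatenation.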

3. Attention-based combinator

Here, as in the first case, the activation function is a linear combination of the base activation functions. However, the parameters (i.e. the weights of the combination) are now obtained as the output of an MLP.
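
In pseudo-formula, a sketch (reusing the symbols of the previous sections; the softmax normalization of the attention weights is an assumption, suggested by the attention-style MLP_ATT combinator) is:

f_i(x_i) = \sum_{j=1}^{k} \alpha_{i,j}(x_i) \, g_j(x_i), \qquad \text{with} \qquad \alpha_i(x_i) = \mathrm{softmax}\big(\mathrm{MLP}(x_i)\big)

so the combination weights are no longer free parameters but are produced, attention-style, by a small network that looks at the input itself.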

Examples of mixed activation functions with antirelu, identity, sigmoid and tanh as base functions are shown in the repository's plots for the Linear, MLP and ATT combinators.

Train and test

python feedforward.py -config config_name.json

config_name = {basic, linear, non_linear}
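
For example, assuming the configuration files are named after these options:

python feedforward.py -config linear.json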

Plot

python plot.py -args -dataset

args = {activations_, accuracy, table, table_max}

dataset = {MNIST, CIFAR10}
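
For example, assuming the listed values are passed directly to the two flags, the accuracy plots for MNIST would be produced with:

python plot.py -args accuracy -dataset MNIST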

run_config.json

The available parameters are listed below; an example configuration follows the table.

parameter | type | value | description
--- | --- | --- | ---
network_type | integer | [1, 2] | 1: the number of neurons is halved at each new layer; 2: each new layer has 300 neurons
nn_layers | integer | [2, inf) | number of linear layers of the network
act_fn | list of lists of strings | ["antirelu", "identity", "relu", "sigmoid", "tanh"] | base activation functions to combine
lambda_l1 | list of floats | (0.0, 0.00000005) | L1 regularization scaling factor
normalize | list of strings | ["None", "Sigmoid", "Softmax"] | alpha normalization (only for the Linear combinator)
init | list of strings | ["None", "random", "uniform", "normal"] | alpha initialization (only for the Linear combinator)
dataset | list of strings | ["MNIST", "CIFAR10"] | available datasets
subset | float | (0, 1) | portion of the dataset used
epochs | integer | (0, inf) | number of epochs for training and test
random_seed | integer | (0, inf) | allows reproducibility
combinator | list of strings | ["None", "Linear", "MLP1", "MLP2", "MLP_ATT", "MLP_ATT_neg"] | available combinators
batch_size | integer | (0, inf) | batch size for training/test
alpha_dropout | list of floats | (0, 1) | alpha dropout for the MLP_ATT combinator
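
A hypothetical run_config.json, using only the fields from the table (the concrete values are illustrative choices, not the repository's defaults):

{
  "network_type": 1,
  "nn_layers": 3,
  "act_fn": [["antirelu", "identity", "sigmoid", "tanh"]],
  "lambda_l1": [0.00000001],
  "normalize": ["Softmax"],
  "init": ["uniform"],
  "dataset": ["MNIST"],
  "subset": 0.5,
  "epochs": 20,
  "random_seed": 42,
  "combinator": ["MLP_ATT"],
  "batch_size": 128,
  "alpha_dropout": [0.1]
}

Since several fields are lists, generate_configs (described below) presumably expands them into one run per combination of values.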

Brief code description

  • feedforward.py is the entry point; it contains the main, train and test functions
  • utils.py contains all the auxiliary functions, both for computation and plotting. The most relevant one for computation is:
    • generate_configs: based on run_config.json, creates the list of configurations to be run
  • mixed_activations.py contains the MIX module (i.e. the core of the project); a jit version is also implemented. A minimal sketch of the module follows this list
  • modules.py contains auxiliary modules, such as:
    • Network: the neural network used for the experiments; a jit version is also implemented
    • MLP1, MLP2, MLP_ATT, ...: the small networks needed by the MIX module
  • plot.py contains all the plotting functions. Jit-compiled models are not plottable.
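
To make the structure concrete, here is a minimal, self-contained PyTorch sketch of the Linear combinator described in section 1 (class and variable names, and the antirelu definition, are assumptions for illustration; the actual MIX module in mixed_activations.py is more general):

import torch
import torch.nn as nn

# Base activations listed in run_config.json; antirelu is assumed here to be x -> min(x, 0).
ACT = {
    "relu": torch.relu,
    "sigmoid": torch.sigmoid,
    "tanh": torch.tanh,
    "identity": lambda x: x,
    "antirelu": lambda x: torch.clamp(x, max=0.0),
}

class LinearMix(nn.Module):
    """Sketch of the Linear combinator: one learned weight per (neuron, base activation)."""

    def __init__(self, act_names, n_neurons, normalize="Softmax"):
        super().__init__()
        self.acts = [ACT[name] for name in act_names]
        # alpha[i, j]: weight of base activation j for neuron i.
        self.alpha = nn.Parameter(torch.randn(n_neurons, len(self.acts)))
        self.normalize = normalize

    def forward(self, x):  # x: (batch, n_neurons), the pre-activations of a layer
        stacked = torch.stack([g(x) for g in self.acts], dim=-1)  # (batch, n_neurons, k)
        alpha = self.alpha
        if self.normalize == "Softmax":
            alpha = torch.softmax(alpha, dim=-1)
        elif self.normalize == "Sigmoid":
            alpha = torch.sigmoid(alpha)
        return (stacked * alpha).sum(dim=-1)  # (batch, n_neurons)

# Example usage:
# mix = LinearMix(["antirelu", "identity", "sigmoid", "tanh"], n_neurons=300)
# y = mix(torch.randn(8, 300))

The MLP_ATT variants would replace the fixed alpha with the output of a small MLP applied to x, as sketched in section 3.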
