Use TCN and Transformer models for forecasting on the "Hourly Energy Consumption" dataset
- Advantages of TCN over RNN
- Parallelism
A TCN can process the whole input sequence in parallel, instead of step by step as an RNN must.
- Flexible receptive field
The receptive field of a TCN is determined by the number of layers, the kernel size, and the dilation factor, so it can be tailored to the characteristics of different tasks (see the sketch after this list).
- Stable gradients
RNNs often suffer from vanishing and exploding gradients, largely because the same parameters are shared across all time steps. Like ordinary convolutional networks, TCNs are much less prone to these problems.
- Lower memory usage
An RNN has to keep information from every step, which consumes a lot of memory. A TCN shares its convolution kernels within each layer, so its memory footprint is lower.
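As a concrete illustration of the receptive-field bullet above, here is a minimal sketch of how the receptive field grows with the number of layers, the kernel size and the dilation base. The function name `tcn_receptive_field` and the `convs_per_block` parameter are illustrative assumptions, not code from this project.

```python
def tcn_receptive_field(kernel_size: int, dilation_base: int, num_layers: int,
                        convs_per_block: int = 1) -> int:
    """Receptive field of a stack of dilated causal convolutions.

    Assumes layer i uses dilation = dilation_base ** i; for residual blocks
    that contain two convolutions per dilation, set convs_per_block=2.
    """
    field = 1
    for i in range(num_layers):
        field += convs_per_block * (kernel_size - 1) * dilation_base ** i
    return field


# Example matching the settings used below: kernel_size=2, dilations=[1, 2, 4, 8]
print(tcn_receptive_field(kernel_size=2, dilation_base=2, num_layers=4))  # -> 16
```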
- Dilated convolutions
ex: kernel_size=2, dilations=[1, 2, 4, 8]
Exponentially increasing dilations let the receptive field grow quickly, so the network can deal effectively with long historical information.
- Residual connections
They let the model depth be adapted to the task while alleviating vanishing and exploding gradients.
- Dropout
Dropout prevents units from co-adapting too much. At inference time all neurons are kept active, which approximates averaging the predictions of the many thinned sub-networks, so dropout greatly reduces the risk of overfitting.

Given input_length, kernel_size and dilation_base, we can compute the minimum number of layers required for full history coverage, and the basic TCN network would look something like this:
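(A minimal sketch, assuming PyTorch; the helper min_layers_for_full_coverage, the layer-count formula in its comment, and the module names CausalConv1d / TCNBlock are illustrative assumptions, not this project's actual code.)

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def min_layers_for_full_coverage(input_length: int, kernel_size: int, dilation_base: int) -> int:
    # Smallest n such that 1 + (kernel_size - 1) * (b**n - 1) / (b - 1) >= input_length,
    # assuming one causal convolution per layer with dilation b**i at layer i.
    b, k, l = dilation_base, kernel_size, input_length
    return math.ceil(math.log((l - 1) * (b - 1) / (k - 1) + 1, b))


class CausalConv1d(nn.Module):
    """1D convolution that only looks at the past (left padding keeps it causal)."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                          # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))  # pad on the left only


class TCNBlock(nn.Module):
    """Dilated causal conv + ReLU + dropout, wrapped in a residual connection."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation, dropout=0.2):
        super().__init__()
        self.conv = CausalConv1d(in_ch, out_ch, kernel_size, dilation)
        self.relu = nn.ReLU()
        self.drop = nn.Dropout(dropout)
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return self.drop(self.relu(self.conv(x))) + self.skip(x)


# Example: kernel_size=2, dilation_base=2, one week (168 hours) of hourly input.
n_layers = min_layers_for_full_coverage(input_length=168, kernel_size=2, dilation_base=2)
blocks = [TCNBlock(1 if i == 0 else 16, 16, kernel_size=2, dilation=2 ** i)
          for i in range(n_layers)]
tcn = nn.Sequential(*blocks, nn.Conv1d(16, 1, 1))    # 1-step-ahead forecast per time step
print(n_layers, tcn(torch.randn(8, 1, 168)).shape)   # 8 torch.Size([8, 1, 168])
```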
- The motivation of Transformer is to solve the following problems of RNNs:
- Sequential computation is difficult to parallelize.
- The attention mechanism solves RNNs' difficulty with long-range dependencies: the farther apart two tokens are, the harder it is for them to access each other's information.
- An overly deep RNN architecture may also cause gradients to vanish.
A Transformer consists of an Encoder and a Decoder, which play very different roles and should be understood separately. The Encoder compresses the information in the input sequence; the Decoder converts (decompresses) the information extracted by the Encoder into the output required by the task.
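A minimal usage sketch of this encoder/decoder split, assuming PyTorch's built-in nn.Transformer; the shapes (168 past hours in, 24 future steps out) and hyper-parameters are illustrative assumptions, not this project's configuration.

```python
import torch
import torch.nn as nn

d_model = 64
src = torch.randn(8, 168, d_model)   # encoder input: past window, already embedded
tgt = torch.randn(8, 24, d_model)    # decoder input: future positions to fill in

model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)

out = model(src, tgt)                # decoder attends to the encoder's compressed memory
print(out.shape)                     # torch.Size([8, 24, 64])
```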
The role of Query and Key is to determine the weights applied to the Values; each Key/Value pair corresponds to a token. When a Query and a Key are strongly related for the task, the Value corresponding to that Key is amplified.
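In formula form this is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A small sketch (the tensor shapes are illustrative):

```python
import math
import torch


def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # query-key similarity
    weights = scores.softmax(dim=-1)                          # weight put on each value
    return weights @ v


q = torch.randn(8, 24, 64)    # queries
k = torch.randn(8, 168, 64)   # keys   (one key/value pair per input token)
v = torch.randn(8, 168, 64)   # values
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([8, 24, 64])
```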
The advantage of multi-head attention is that each head (with its own q, k, v projections) can focus on different information: some heads attend to local patterns, others to global context, and so on.
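A quick sketch with PyTorch's nn.MultiheadAttention (the embedding size and head count are illustrative):

```python
import torch
import torch.nn as nn

# 8 heads, each attending over a 64 / 8 = 8-dimensional subspace of the embedding,
# so different heads are free to specialize in local vs. global patterns.
mha = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)
x = torch.randn(8, 168, 64)              # (batch, seq_len, embed_dim)
out, attn_weights = mha(x, x, x)         # self-attention: query = key = value = x
print(out.shape, attn_weights.shape)     # torch.Size([8, 168, 64]) torch.Size([8, 168, 168])
```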
Since the input and output of each sub-layer have the same shape, we can use a residual connection to add the output back to the input, and then apply LayerNorm, which normalizes along the feature dimension of each instance.
Layer Normalization (LN) is often used with RNNs. The idea is similar to Batch Normalization (BN); the difference is that LN normalizes each sample individually instead of normalizing across the batch.
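A sketch of this "Add & Norm" step around a feed-forward sub-layer, assuming PyTorch; the dimensions are illustrative:

```python
import torch
import torch.nn as nn

d_model = 64
ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                    nn.Linear(4 * d_model, d_model))   # same input and output shape
norm = nn.LayerNorm(d_model)                           # normalizes each position of each sample

x = torch.randn(8, 168, d_model)
x = norm(x + ffn(x))                                   # residual connection, then LayerNorm
print(x.shape)                                         # torch.Size([8, 168, 64])
```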
The Decoder is not allowed to see future information, so when predicting position i we mask out the vectors from position i+1 onward.
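A minimal sketch of such a causal mask in PyTorch (the sequence length is illustrative):

```python
import torch

# Boolean mask: True above the diagonal means "not allowed to attend",
# so position i can only see positions 0..i. Pass it as attn_mask / tgt_mask.
tgt_len = 5
mask = torch.triu(torch.ones(tgt_len, tgt_len, dtype=torch.bool), diagonal=1)
print(mask)
# (nn.Transformer also provides generate_square_subsequent_mask for the
#  equivalent float mask with -inf above the diagonal and 0 elsewhere.)
```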
- References
https://unit8.com/resources/temporal-convolutional-networks-and-forecasting/
https://www.twblogs.net/a/5d6dc709bd9eee541c33c0b2
https://meetonfriday.com/posts/7c0020de/
https://medium.com/@a5560648/dropout-5fb2105dbf7c
https://zhuanlan.zhihu.com/p/52477665
https://ithelp.ithome.com.tw/articles/10206317
https://medium.com/%E5%B7%A5%E4%BA%BA%E6%99%BA%E6%85%A7/review-attention-is-all-you-need-62a1c93c48a5
https://medium.com/ching-i/transformer-attention-is-all-you-need-c7967f38af14