The original Transformer, proposed in "Attention Is All You Need", solves the biggest problems of RNNs (Recurrent Neural Networks), such as vanishing gradients, but it requires a lot of time and computation to train and evaluate.
Linformer was proposed as a self-attention mechanism with linear complexity.
Standard self-attention has time and memory cost that grows quadratically with the sequence length; Linformer projects the keys and values along the sequence dimension down to a fixed size, so the cost grows only linearly with the sequence length. Linformer also supports key-value projection sharing and headwise sharing, which further reduce computation time and memory.
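As a rough illustration (not the code in this repository), the sketch below shows the core Linformer idea for a single head: the keys and values are projected along the sequence dimension from length `n` down to a fixed `k`, so the attention matrix is `n x k` instead of `n x n`. The class name, shapes, and the `share_kv` flag are illustrative assumptions.

```python
# Minimal single-head sketch of Linformer-style attention (illustrative, not this repo's code).
import torch
import torch.nn as nn


class LinformerSelfAttention(nn.Module):
    def __init__(self, d_model: int, seq_len: int, k: int = 64, share_kv: bool = False):
        super().__init__()
        self.d_model = d_model
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # E compresses keys from length seq_len to k; F does the same for values.
        # With share_kv=True the same projection is reused (key-value sharing).
        self.E = nn.Parameter(torch.randn(seq_len, k) / k ** 0.5)
        self.F = self.E if share_kv else nn.Parameter(torch.randn(seq_len, k) / k ** 0.5)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq_len, d_model)
        Q = self.q_proj(x)
        K = torch.einsum("bnd,nk->bkd", self.k_proj(x), self.E)  # (batch, k, d_model)
        V = torch.einsum("bnd,nk->bkd", self.v_proj(x), self.F)  # (batch, k, d_model)
        scores = Q @ K.transpose(-2, -1) / self.d_model ** 0.5   # (batch, seq_len, k)
        attn = scores.softmax(dim=-1)                            # attention is linear in seq_len
        return self.out(attn @ V)                                # (batch, seq_len, d_model)
```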
pip install -r requirements.txt
linformer/
├── model/ # Linformer implementation
│ ├── __init__.py
│ ├── Linformer.py # Core Linformer model
│ ├── embeddings.py # Embeddings implementation
│ └── attention.py # Attention implementation
├── requirements.txt # Python dependencies
├── train.py # Training script
├── config.py # Set configurations for model training
├── dataset.py # Dataset and DataLoader definitions
├── utils.py # Utility helper functions
└── README.md # Project documentation
To run the training script:
python train.py --epoch 10
Note: Arguments
--epoch 10 sets the number of epochs to 10.
--workers 2 sets the number of DataLoader workers to 2.
--datalen 500 sets the dataset size to 500.
--srclang "en" sets the source language to "en".
--tgtlang "it" sets the target language to "it".
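For example, the flags above can be combined in a single run (the values here are just a sample):
python train.py --epoch 10 --workers 2 --datalen 500 --srclang "en" --tgtlang "it"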
- Linformer Paper: Linformer: Self-Attention with Linear Complexity
- Special thanks to @hkproj