This repository contains the code for running the character-level Brainformers from our paper.
Dependencies:
- PyTorch
- fastmoe: https://github.com/laekov/fastmoe
- transformers: https://github.com/huggingface/transformers

You need CUDA 11 and PyTorch 1.10.0 or above to run this code. See this page for installation instructions. To replicate our experimental conditions, one A100 GPU is needed.
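As a rough guide, a minimal environment could be set up along the following lines. This is only a sketch: the exact PyTorch/CUDA install command depends on your system, fastmoe is built from source per its own README, and the versions shown are assumptions rather than pins from this repository.

```bash
# Minimal sketch of an environment setup (versions and commands are assumptions,
# not an official install script for this repo).
pip install torch==1.10.0 transformers

# fastmoe is installed from source; see https://github.com/laekov/fastmoe for details.
git clone https://github.com/laekov/fastmoe
cd fastmoe
python setup.py install
```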
To download the enwik8 and text8 datasets, run:
```bash
bash get_data.sh
```
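For reference, a download script like this typically just fetches the standard enwik8 and text8 archives from Matt Mahoney's site and unpacks them into a data directory. The sketch below is an assumption about that behavior, not the script shipped in this repo; paths and any train/valid/test preprocessing may differ.

```bash
# Hypothetical sketch of the dataset download step (not the repo's actual get_data.sh).
mkdir -p data/enwik8 data/text8
wget -nc http://mattmahoney.net/dc/enwik8.zip -P data/enwik8
wget -nc http://mattmahoney.net/dc/text8.zip  -P data/text8
unzip -o data/enwik8/enwik8.zip -d data/enwik8
unzip -o data/text8/text8.zip   -d data/text8
```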
The scripts for training the character-level models from the paper are located in the ./experiments/
directory. For example, to train the enwik8 model, run:
```bash
bash train.sh
```
We used eight V100 GPUs, but if you'd like to run this model on GPUs with less memory, you can increase the --batch-split value (it splits batches into smaller pieces without changing the final result). An example is shown below.
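For example, passing the flag through the training wrapper could look like this. This assumes train.sh forwards extra command-line arguments to the underlying trainer, which is an assumption about this repo's scripts rather than documented behavior.

```bash
# Hypothetical: split each batch into 4 micro-batches to fit on smaller GPUs.
# Assumes train.sh forwards extra arguments to the underlying training script.
bash train.sh --batch-split 4
```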
A standard transformer with 3 layers (so 6 self-attention and feedforward sublayers) would be trained using --architecture sfsfsf. That 6-sublayer model with a sandwiching coefficient of 1 would be --architecture ssfsff, and with a sandwiching coefficient of 2 it would be --architecture sssfff. Make sure to also set the --nlayers parameter to the length of the architecture string divided by 2.
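Putting this together, a direct call to the trainer could look roughly like the following. The entry-point name main.py and the flags other than --architecture and --nlayers are assumptions for illustration, not the exact command used by the experiment scripts.

```bash
# Hypothetical invocation: 6 sublayers (3 layers), sandwiching coefficient 1,
# so the architecture string has length 6 and --nlayers is 6 / 2 = 3.
python main.py --architecture ssfsff --nlayers 3 \
    --data data/enwik8 --batch-split 2
```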
The code is licensed under the CC-BY-NC license. See the LICENSE file for more details.
This code is based on the code of the Sandwich Transformer and Adaptive Span models. We recommend reading the Adaptive Span README for further information on this codebase.