For more information about training deep learning models on Gaudi, visit developer.habana.ai.
Please visit this page for performance information.
- Model Overview
- Setup
- ALBERT Fine-Tuning
- Downloading the datasets
- Training the Model
- Examples
- Advanced
- Changelog
ALBERT is "A Lite" version of BERT, a popular unsupervised language representation learning algorithm by Google. ALBERT uses parameter-reduction techniques that allow for large-scale configurations, overcome previous memory limitations, and achieve better behavior with respect to model degradation.
This release supports ALBERT fine-tuning on 1 and 8 cards.
Our implementation is a fork of Google Research ALBERT.
Please follow the instructions given in the following link for setting up the
environment including the $PYTHON environment variable: Gaudi Installation
Guide.
This guide will walk you through the process of setting up your system to run
the model on Gaudi.
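Once the setup is complete, a quick sanity check that the $PYTHON environment variable is set and points at a working interpreter can be helpful (a minimal sketch, not an official setup step):

```bash
# Verify that $PYTHON is defined and resolves to a working Python interpreter.
echo "PYTHON=${PYTHON:?PYTHON is not set}"
$PYTHON --version
```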
- Suited for tasks:
  - mrpc: Microsoft Research Paraphrase Corpus (MRPC) is a paraphrase identification dataset, where systems aim to identify if two sentences are paraphrases of each other.
  - squad: Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
- Default hyperparameters:
- dataset: squad
- predict_batch_size: 8
- max_seq_length: 384
- doc_stride: 128
- max_query_length: 64
- learning_rate: 5e-5
- num_train_epochs: 2.0
- warmup_proportion: 0.1
- save_checkpoints_steps: 5000
- do_lower_case: true
- do_train: true
- do_predict: true
- use_einsum: false
- n_best_size: 20
- max_answer_length: 30
- The output will be saved in $HOME/tmp by default.
The fine-tuning task uses the same datasets as BERT. Please follow the steps in the "Download and preprocess the datasets for Pretraining and Finetuning" section of Model-References/TensorFlow/nlp/bert/README.md.
In the docker container, clone this repository and switch to the branch that matches your SynapseAI version. (Run the `hl-smi` utility to determine the SynapseAI version.)
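For example, one quick way to check the installed release before cloning (a sketch; the exact `hl-smi` output format may vary between releases):

```bash
# Print the version lines reported by the Habana System Management Interface tool.
hl-smi | grep -i version
```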
```bash
git clone -b [SynapseAI version] https://github.com/HabanaAI/Model-References
```
Go to the ALBERT directory:
```bash
cd Model-References/TensorFlow/nlp/albert
```
Install the required Python packages:
```bash
pip install -r requirements.txt
```
If the Model-References repository path is not in the PYTHONPATH, make sure you update it:
```bash
export PYTHONPATH=$PYTHONPATH:/path/to/Model-References
```
Download the pretrained model to the current folder:
For ALBERT Base:
```bash
$PYTHON download/download_pretrained_model.py https://storage.googleapis.com/albert_models/ albert_base_v1
```
For ALBERT Large:
```bash
$PYTHON download/download_pretrained_model.py https://storage.googleapis.com/albert_models/ albert_large_v1
```
The training can be run with run_pretraining.py for pretraining, run_classifier.py for fine-tuning with the MRPC dataset, and run_squad_v1.py with the SQuAD dataset:
The following examples assume that the datasets are located in the directory /data/tensorflow/:
- Single Gaudi card fine-tuning of ALBERT Large using the SQuAD dataset in bfloat16 precision:

  ```bash
  TF_BF16_CONVERSION=/path/to/Model-References/TensorFlow/nlp/albert/bf16_config/albert.json \
  $PYTHON run_squad_v1.py \
      --train_feature_file=/root/tmp/albert_large/train_feature_file.tf_record \
      --predict_feature_file=/root/tmp/albert_large/predict_feature_file.tf_record \
      --predict_feature_left_file=/root/tmp/albert_large/predict_feature_left_file.tf_record \
      --spm_model_file=albert_large_v1/30k-clean.model \
      --vocab_file=albert_large_v1/30k-clean.vocab \
      --albert_config_file=albert_large_v1/albert_config.json \
      --init_checkpoint=albert_large_v1/model.ckpt-best \
      --do_train=True \
      --train_file=/data/tensorflow/squad/train-v1.1.json \
      --do_predict=True \
      --predict_file=/data/tensorflow/squad/dev-v1.1.json \
      --train_batch_size=32 \
      --learning_rate=3e-05 \
      --num_train_epochs=2 \
      --max_seq_length=384 \
      --doc_stride=128 \
      --output_dir=/root/tmp/albert_large \
      --use_horovod=false \
      --enable_scoped_allocator=False \
      --save_checkpoints_steps=5000
  ```
- NOTE: The mpirun map-by PE attribute value may vary on your setup. Please refer to the instructions on mpirun Configuration for the calculation; one rough way to estimate it is sketched after these examples.
- 8 Gaudi cards finetuning of ALBERT Large in bfloat16 precision using SQuAD dataset on a single box (8 cards):

  ```bash
  cd /path/to/Model-References/TensorFlow/nlp/albert/
  mpirun --allow-run-as-root \
      -x TF_BF16_CONVERSION=/path/to/Model-References/TensorFlow/nlp/albert/bf16_config/albert.json \
      --tag-output \
      --merge-stderr-to-stdout \
      --output-filename /root/tmp/albert_log \
      --bind-to core \
      --map-by socket:PE=7 \
      -np 8 \
      $PYTHON run_squad_v1.py \
      --train_feature_file=/root/tmp/albert_large/train_feature_file.tf_record \
      --predict_feature_file=/root/tmp/albert_large/predict_feature_file.tf_record \
      --predict_feature_left_file=/root/tmp/albert_large/predict_feature_left_file.tf_record \
      --spm_model_file=albert_large_v1/30k-clean.model \
      --vocab_file=albert_large_v1/30k-clean.vocab \
      --albert_config_file=albert_large_v1/albert_config.json \
      --init_checkpoint=albert_large_v1/model.ckpt-best \
      --do_train=True \
      --train_file=/data/tensorflow/squad/train-v1.1.json \
      --do_predict=True \
      --predict_file=/data/tensorflow/squad/dev-v1.1.json \
      --train_batch_size=32 \
      --learning_rate=3e-05 \
      --num_train_epochs=2 \
      --max_seq_length=384 \
      --doc_stride=128 \
      --output_dir=/root/tmp/albert_large \
      --use_horovod=true \
      --enable_scoped_allocator=False \
      --save_checkpoints_steps=5000 \
      2>&1 | tee ~/hlogs/albert_large_finetuning_bf16_squad_8_cards.txt
  ```
- 8 Gaudi cards finetuning of ALBERT Large in bfloat16 precision using SQuAD dataset on a K8s single box (8 cards):

  Make sure to download the pretrained model for ALBERT Large or Base to a location that can be accessed by all the workers, such as the dataset folder.

  Make sure the Python packages in requirements.txt are installed on all the workers.

  ```bash
  mpirun --allow-run-as-root \
      --bind-to core \
      --map-by socket:PE=6 \
      -np 8 \
      --tag-output \
      --merge-stderr-to-stdout \
      bash -c "cd /root/Model-References/TensorFlow/nlp/albert;\
      $PYTHON run_squad_v1.py \
      --train_feature_file=/root/tmp/albert_large/train_feature_file.tf_record \
      --predict_feature_file=/root/tmp/albert_large/predict_feature_file.tf_record \
      --predict_feature_left_file=/root/tmp/albert_large/predict_feature_left_file.tf_record \
      --spm_model_file=/data/tensorflow/albert_large_v1/30k-clean.model \
      --vocab_file=/data/tensorflow/albert_large_v1/30k-clean.vocab \
      --albert_config_file=/data/tensorflow/albert_large_v1/albert_config.json \
      --init_checkpoint=/data/tensorflow/albert_large_v1/model.ckpt-best \
      --do_train=True \
      --train_file=/data/tensorflow/squad/train-v1.1.json \
      --do_predict=True \
      --predict_file=/data/tensorflow/squad/dev-v1.1.json \
      --train_batch_size=32 \
      --learning_rate=3e-05 \
      --num_train_epochs=2 \
      --max_seq_length=384 \
      --doc_stride=128 \
      --output_dir=/root/tmp/albert_large \
      --use_horovod=true \
      --enable_scoped_allocator=False \
      --save_checkpoints_steps=5000" \
      2>&1 | tee ~/hlogs/albert_large_ft_squad_8cards.txt
  ```
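As referenced in the NOTE above, the map-by PE value depends on the host CPU topology. The sketch below is one rough way to estimate it, assuming ranks are spread evenly across sockets and PE is approximately the physical cores per socket divided by the ranks per socket; the authoritative procedure is in the mpirun Configuration instructions.

```bash
# Rough estimate of the mpirun --map-by socket:PE value for an 8-rank run.
# Assumption: ranks are distributed evenly across sockets.
NP=8
SOCKETS=$(lscpu | awk -F: '/^Socket\(s\)/{print $2}' | tr -d ' ')
CORES_PER_SOCKET=$(lscpu | awk -F: '/^Core\(s\) per socket/{print $2}' | tr -d ' ')
RANKS_PER_SOCKET=$(( NP / SOCKETS ))
echo "Suggested PE: $(( CORES_PER_SOCKET / RANKS_PER_SOCKET ))"
```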
- run_classifier.py: Script implementing the MRPC task.
- run_squad_v1.py: Script implementing the SQuAD task.
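The examples above use run_squad_v1.py; for the MRPC task, an invocation of run_classifier.py would look roughly like the sketch below. This is a hypothetical illustration only: the flag names mirror the SQuAD example above and the upstream Google Research ALBERT classifier script, and the dataset/model paths are placeholders, so verify them against `$PYTHON run_classifier.py --help` before use.

```bash
# Hypothetical MRPC fine-tuning sketch -- flag names and paths are assumptions,
# check run_classifier.py --help for the exact interface of this fork.
$PYTHON run_classifier.py \
    --task_name=MRPC \
    --data_dir=/data/tensorflow/glue/MRPC \
    --vocab_file=albert_large_v1/30k-clean.vocab \
    --spm_model_file=albert_large_v1/30k-clean.model \
    --albert_config_file=albert_large_v1/albert_config.json \
    --init_checkpoint=albert_large_v1/model.ckpt-best \
    --do_train=True \
    --do_eval=True \
    --max_seq_length=128 \
    --train_batch_size=32 \
    --learning_rate=2e-5 \
    --output_dir=/root/tmp/albert_large_mrpc \
    --save_checkpoints_steps=5000
```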
| Model | xHPU | Device | SynapseAI Version | TensorFlow Version(s) |
|---|---|---|---|---|
| Albert-Large FT (SQUAD) | 1 HPU | Gaudi | 1.3.0 | 2.8.0, 2.7.1 |
| Albert-Large FT (SQUAD) | 8 HPU | Gaudi | 1.3.0 | 2.8.0, 2.7.1 |
- Imported the horovod-fork package directly instead of using Model-References' TensorFlow.common.horovod_helpers; wrapped the horovod import in a try-except block so that the user is not required to install this library when the model is run on a single card.
- Replaced references to the custom demo script with community entry points in the README.
- Switched from the deprecated flag TF_ENABLE_BF16_CONVERSION to TF_BF16_CONVERSION.
- Added handling of the save_checkpoints_steps parameter and changed its default to 5000.
- Removed the obsolete demo_albert bash script.
- Moved the BF16 config JSON file from TensorFlow/common/ to the model's directory.
- Updated requirements.txt.
- Removed redundant imports.
- Changed `python` or `python3` to `$PYTHON` to execute the correct version based on the environment setup.
- Cleaned up the script by removing the deprecated habana_model_runner.