Understanding-Transformer-Attention-Weights

This repository contains code to compute and visualise the attention weights of a Transformer model across its layers and attention heads.
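To make "attention weights across layers and heads" concrete, here is a minimal NumPy sketch that computes scaled dot-product attention weights, softmax(QK^T / sqrt(d_k)), for every head of several layers. It is not this repository's code. The random projection matrices, the dimensions, and the helper names (`softmax`, `multi_head_attention_weights`) are hypothetical. For brevity, the same input `x` is reused for every layer, whereas in a real Transformer each layer receives the previous layer's output.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention_weights(x, w_q, w_k, num_heads):
    """Return attention weights of shape (num_heads, seq_len, seq_len).

    x:        (seq_len, d_model) token representations fed into one layer.
    w_q, w_k: (d_model, d_model) query/key projections (hypothetical random weights here).
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_k = d_model // num_heads
    # Project, then split the feature dimension into heads: (num_heads, seq_len, d_k)
    q = (x @ w_q).reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)
    # Scaled dot-product scores per head: (num_heads, seq_len, seq_len)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)
    # Softmax over keys: each row is a distribution over the tokens being attended to
    return softmax(scores, axis=-1)


rng = np.random.default_rng(0)
seq_len, d_model, num_heads, num_layers = 5, 16, 4, 3
x = rng.standard_normal((seq_len, d_model))

all_weights = []
for _ in range(num_layers):
    # Hypothetical per-layer projections; a trained model would load learned weights
    w_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    w_k = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    # Simplification: the same x is reused for every layer
    all_weights.append(multi_head_attention_weights(x, w_q, w_k, num_heads))

# Stack into (num_layers, num_heads, seq_len, seq_len)
attn = np.stack(all_weights)
print(attn.shape)  # (3, 4, 5, 5)
```

Each slice `attn[layer, head]` is a `seq_len × seq_len` matrix whose rows sum to 1, so it can be drawn as a heatmap (for example with matplotlib's `imshow`) to visualise which tokens each token attends to in that layer and head.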