add doc for megatron-gpt
yukirora committed Dec 5, 2023
1 parent ca49e7d commit 55bdffa
Showing 1 changed file with 21 additions and 0 deletions.
21 changes: 21 additions & 0 deletions docs/user-tutorial/benchmarks/model-benchmarks.md
@@ -42,3 +42,24 @@ For inference, supported percentiles include
| model-benchmarks/pytorch-${model_name}/${precision}_inference_throughput | throughput (samples/s) | The average inference throughput with fp32/fp16 precision. |
| model-benchmarks/pytorch-${model_name}/${precision}_inference_step_time\_${percentile} | time (ms) | The n<sup>th</sup> percentile inference step time with fp32/fp16 precision. |
| model-benchmarks/pytorch-${model_name}/${precision}_inference_throughput\_${percentile} | throughput (samples/s) | The n<sup>th</sup> percentile inference throughput with fp32/fp16 precision. |


## Megatron Model benchmarks

### `megatron-gpt`

#### Introduction

Runs GPT pretraining tasks in float32, float16, or bfloat16 precision using [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) or [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed).

Tip: `batch_size` in this benchmark is the global batch size; the batch size on each GPU instance is `micro_batch_size`.
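The relationship between the two batch sizes can be illustrated with a short sketch. It assumes the usual Megatron-LM convention that the global batch is split across data-parallel replicas and then into micro-batches that are accumulated before each optimizer step. The function name and the `data_parallel_size` argument are hypothetical and are not part of this benchmark's interface.

```python
def num_micro_batches(global_batch_size: int,
                      micro_batch_size: int,
                      data_parallel_size: int) -> int:
    """Return how many micro-batches each GPU processes per global batch.

    Assumption: global_batch_size must be divisible by
    micro_batch_size * data_parallel_size, as is conventional in Megatron-LM.
    """
    samples_per_pass = micro_batch_size * data_parallel_size
    if global_batch_size % samples_per_pass != 0:
        raise ValueError(
            "global_batch_size must be a multiple of "
            "micro_batch_size * data_parallel_size"
        )
    return global_batch_size // samples_per_pass


# Hypothetical values: 2048 samples per step, 2 samples per GPU pass, 64 replicas.
steps = num_micro_batches(global_batch_size=2048,
                          micro_batch_size=2,
                          data_parallel_size=64)
print(steps)
```

With these example values each GPU accumulates gradients over 16 micro-batches before one optimizer step.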

#### Metrics

| Name                                              | Unit                   | Description                                                                |
|---------------------------------------------------|------------------------|----------------------------------------------------------------------------|
| megatron-gpt/${precision}_train_step_time         | time (ms)              | The average training step time per global batch.                           |
| megatron-gpt/${precision}_train_throughput        | throughput (samples/s) | The average training throughput per global batch.                          |
| megatron-gpt/${precision}_train_tflops            | TFLOPS                 | The average training TFLOPS per global batch.                              |
| megatron-gpt/${precision}_train_mem_allocated     | GB                     | The average GPU memory allocated per global batch.                         |
| megatron-gpt/${precision}_train_max_mem_allocated | GB                     | The average of the maximum GPU memory allocated per global batch.          |
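
As a rough guide to how the step time and throughput metrics relate, the sketch below converts a step time in milliseconds into samples per second. It assumes throughput is simply the global batch size divided by the step time. The source does not state how the benchmark computes the value, so treat this as an illustration only; the function name is hypothetical.

```python
def throughput_samples_per_s(global_batch_size: int, step_time_ms: float) -> float:
    """Estimate training throughput from one global-batch step time.

    Assumption: one step consumes exactly one global batch.
    """
    if step_time_ms <= 0:
        raise ValueError("step_time_ms must be positive")
    # Convert milliseconds to seconds before dividing.
    return global_batch_size / (step_time_ms / 1000.0)


# Hypothetical values: a 2048-sample global batch that takes 4000 ms per step.
print(throughput_samples_per_s(2048, 4000.0))
```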
