Adding a New Megatron Model Config

This guide walks through how to add a new Megatron model config to Primus for a real small model on Hugging Face that Primus does not configure yet.

As a concrete example, we will use:

Hugging Face repo: TinyLlama/TinyLlama-1.1B-Chat-v1.0
New Primus model config: tinyllama_1.1B.yaml

The goal is that you can follow this doc step by step and end up with a config that works end‑to‑end with primus-cli direct.

1. How Megatron model configs are wired

For Megatron, there are three layers of YAML config involved:

Experiment config (entry point):

# examples/megatron/configs/MI300X/llama3.1_8B-BF16-pretrain.yaml
modules:
  pre_trainer:
    framework: megatron
    config: pre_trainer.yaml           # module-level trainer config

    # model to run
    model: llama3.1_8B.yaml            # model config name

Module config (trainer-level defaults):

# primus/configs/modules/megatron/pre_trainer.yaml
extends:
  - trainer_base.yaml
# ...

Model config (architecture + tokenizer):

# primus/configs/models/megatron/llama3.1_8B.yaml
extends:
  - llama3_8B.yaml

tokenizer_type: HuggingFaceTokenizer
tokenizer_model: meta-llama/Llama-3.1-8B

max_position_embeddings: 131072

At runtime:

modules.pre_trainer.model: llama3.1_8B.yaml is resolved as primus/configs/models/megatron/llama3.1_8B.yaml.
extends: - llama3_8B.yaml further chains into primus/configs/models/megatron/llama3_8B.yaml, which in turn extends llama3_base.yaml and so on.

Your new model config will fit into the same resolution chain.

2. Decide the architecture for your new model

Because TinyLlama/TinyLlama-1.1B-Chat-v1.0 is not shipped in Primus, there is no existing Megatron model YAML to extend from. You have two options:

Option A (recommended): extend from a generic base like language_model.yaml and explicitly set all TinyLlama architecture fields.
Option B: extend from the closest existing model and override fields.

For clarity, this guide uses Option A.

You will need to look up the following values from the TinyLlama config on Hugging Face (usually in config.json or the model card):

hidden_size
ffn_hidden_size
num_attention_heads
num_layers
num_query_groups (if applicable)
max_position_embeddings

You will copy those values into the new Megatron YAML.

3. Create `tinyllama_1.1B.yaml` under Megatron models

Create a new file (locally) under:

primus/configs/models/megatron/tinyllama_1.1B.yaml

with the following template:

extends:
  - language_model.yaml        # generic Megatron language model base

tokenizer_type: HuggingFaceTokenizer
tokenizer_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

hidden_size: 2048
ffn_hidden_size: 5632                # intermediate_size in HF config.json
num_attention_heads: 32
num_layers: 22                       # num_hidden_layers in HF config.json
num_query_groups: 8                  # num_attention_heads / num_key_value_heads = 32 / 4

max_position_embeddings: 2048
position_embedding_type: rope

Field meanings:

extends:
- Re-uses generic language-model defaults.
- If you prefer, you can instead extend from the closest existing model (e.g. llama3_8B.yaml) and then override all differing fields.
tokenizer_type:
- Case where you want to use a Hugging Face tokenizer implementation.
tokenizer_model:
- Hugging Face model repository to load tokenizer from.
- Example: TinyLlama/TinyLlama-1.1B-Chat-v1.0 or your own fine-tuned fork.

Note: This file is not added to the repo in this PR. It is meant as a local example you can create in your workspace.

4. Point an experiment config at your new model

Pick an existing Megatron experiment config (or create one) under examples/megatron/configs/... and change only the model field.

For example, starting from examples/megatron/configs/MI300X/llama3.1_8B-BF16-pretrain.yaml, you can create a new experiment config like this:

work_group: ${PRIMUS_TEAM:amd}
user_name: ${PRIMUS_USER:root}
exp_name: ${PRIMUS_EXP_NAME:tinyllama_1.1B-pretrain}
workspace: ${PRIMUS_WORKSPACE:./output}

modules:
  pre_trainer:
    framework: megatron
    config: pre_trainer.yaml

    # model to run
    model: tinyllama_1.1B.yaml
    overrides:
      save: null
      disable_last_saving: true
      stderr_sink_level: DEBUG

      mock_data: true
      train_iters: 50
      micro_batch_size: 2
      global_batch_size: 128

      seq_length: 2048

All other overrides (batch size, seq length, LR, etc.) can stay the same initially; you can tune them later for your new model.

5. Run a quick verification run with primus-cli

Use primus-cli direct to validate that the config chain resolves correctly and the training loop can start.

./primus-cli direct -- \
  train pretrain \
  --config examples/megatron/configs/MI300X/tinyllama_1.1B-pretrain.yaml

You should see in the printed configuration and logs that:

The framework is megatron.
The model config being used is tinyllama_1.1B.yaml.
The tokenizer matches your new file.

6. Checklist for adding a Megatron model

Choose an appropriate base model under primus/configs/models/megatron/ (e.g. language_model.yaml, or a close existing LLaMA-style model).
Create primus/configs/models/megatron/tinyllama_1.1B.yaml: - extends: - <base_model>.yaml - set tokenizer_type, tokenizer_model - copy TinyLlama architecture fields (hidden_size, num_layers, etc.)
Update / create an experiment YAML under examples/megatron/configs/... to reference model: tinyllama_1.1B.yaml.
Run ./primus-cli direct -- train pretrain --config ... to validate the config chain and run a short training job.

Once these steps are done, your new tinyllama_1.1B config behaves like any other Megatron model in Primus and can be used in experiments, sweeps, and CI.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding a New Megatron Model Config

1. How Megatron model configs are wired

2. Decide the architecture for your new model

3. Create `tinyllama_1.1B.yaml` under Megatron models

4. Point an experiment config at your new model

5. Run a quick verification run with primus-cli

6. Checklist for adding a Megatron model

FilesExpand file tree

adding-megatron-models.md

Latest commit

History

adding-megatron-models.md

File metadata and controls

Adding a New Megatron Model Config

1. How Megatron model configs are wired

2. Decide the architecture for your new model

3. Create tinyllama_1.1B.yaml under Megatron models

4. Point an experiment config at your new model

5. Run a quick verification run with primus-cli

6. Checklist for adding a Megatron model

3. Create `tinyllama_1.1B.yaml` under Megatron models