Commit 37050b8

Fix lm_eval_harness for GPT models (bigscience-workshop#292)
1 parent 155ce98 commit 37050b8

7 files changed: +26 -17 lines changed

Diff for: examples_deepspeed/MoE/ds_evalharness.sh

+2 -1
@@ -28,7 +28,7 @@ TASKS="lambada"
 VOCAB_FILE=/data/Megatron-LM/data/gpt2-vocab.json
 MERGE_FILE=/data/Megatron-LM/data/gpt2-merges.txt
 
-export HF_DATASETS_OFFLINE=1
+# export HF_DATASETS_OFFLINE=1
 
 # Dummy arguments to make megatron happy. No need to configure them.
 # The reason we don't need to configure them and many other arguments is
@@ -53,6 +53,7 @@ CMD="../../tasks/eval_harness/evaluate.py \
     --no-load-rng \
     --inference \
     --disable-moe-token-dropping \
+    --tokenizer-type GPT2BPETokenizer \
     --adaptive_seq_len\
     --eval_fp32\
     --task_list $TASKS\

Diff for: examples_deepspeed/MoE/readme_evalharness.md

+4 -4
@@ -11,11 +11,10 @@ This particular setup uses the normal deepspeed checkpoint and requires no conve
 On login console with external network
 
 Get lm-eval harness (https://github.com/EleutherAI/lm-evaluation-harness) and `best-download==0.0.7` needed to download some tasks.
+Below package version numbers are what we tested that work.
 ```
 (maybe need pip install --upgrade pip)
-pip install best-download==0.0.7
-pip install lm-eval
-(previously we used "pip install git+https://github.com/EleutherAI/lm-evaluation-harness" to install, but later found the command above has less dependency issues)
+pip install best-download==0.0.7 lm-eval==0.2.0 datasets==1.15.1 transformers==4.20.1 huggingface-hub==0.8.1
 ```
 
 2. Pre-download needed datasets
@@ -33,7 +32,8 @@ Then install datasets for the tasks:
 ```
 python ../../tasks/eval_harness/download.py --task_list hellaswag,lambada,triviaqa,webqs,winogrande,piqa,arc_challenge,arc_easy,openbookqa,race,boolq,cb,copa,rte,wic,wsc,multirc,record,anli_r1,anli_r2,anli_r3,wikitext,logiqa,mathqa,mc_taco,mrpc,prost,pubmedqa,qnli,qqp,sciq,sst,wnli
 ```
-and make sure that `export HF_DATASETS_OFFLINE=1`
+
+Previously we set `export HF_DATASETS_OFFLINE=1` to make the dataset offline after the above manual download. But somehow now this could trigger error on some kind of online verification for some of the datasets, so it's recommended to only set offline mode when necessary.
 
 <!-- If there are things like custom tokenizers, pre-download those too, e.g.:
 
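The README change above pins five package versions that the authors report as tested. A minimal sketch of how one might check a local environment against those pins before running the harness follows. The helper names `installed_version` and `find_mismatches` are hypothetical and not part of the repository; only the pin values come from the diff.

```python
from importlib import metadata

# Pins copied from the updated readme_evalharness.md in this commit.
TESTED_PINS = {
    "best-download": "0.0.7",
    "lm-eval": "0.2.0",
    "datasets": "1.15.1",
    "transformers": "4.20.1",
    "huggingface-hub": "0.8.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to a (pinned, installed) pair. `lookup` is injectable for testing."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, found) in find_mismatches(TESTED_PINS).items():
        print(f"{name}: tested with {pinned}, found {found or 'not installed'}")
```

An empty result means the environment matches the tested combination. A mismatch is only a hint, since other version combinations may also work.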

Diff for: examples_deepspeed/compression/ds_evalharness.sh

+3 -2
@@ -1,4 +1,4 @@
-# This is an example zero-shot eval script. Please first read the readme_evalharness.md under the same directory.
+# This is an example zero-shot eval script. Please first read the readme_evalharness.md under the ../MoE directory.
 
 # CHECKPOINT_PATH=/blob/users/minjiaz/compression_library/checkpoint/125M10L_Compression_Test_INT8_64gpu_lr6e-5_tokens5.25B_nocl_alpha-no_pp/global_step2000/
 # CHECKPOINT_PATH=/blob/users/conglli/project/gpt3_with_pile/checkpoint/gpt3-with-pile-0.125B-lr-2.4e-3-minlr-6.0e-5-bs-2048-gpus-64-zero-0-mp-1-pp-1-no_pp-cl-startseqlen-72-step-27638-token-60B/global_step71000/
@@ -31,7 +31,7 @@ TASKS="lambada,wikitext"
 VOCAB_FILE=/blob/data/the_pile_public_merged_nopreprocessing/gpt2-vocab.json
 MERGE_FILE=/blob/data/the_pile_public_merged_nopreprocessing/gpt2-merges.txt
 
-export HF_DATASETS_OFFLINE=1
+# export HF_DATASETS_OFFLINE=1
 
 # Dummy arguments to make megatron happy. No need to configure them.
 # The reason we don't need to configure them and many other arguments is
@@ -56,6 +56,7 @@ CMD="../../tasks/eval_harness/evaluate.py \
     --no-load-rng \
     --inference \
     --disable-moe-token-dropping \
+    --tokenizer-type GPT2BPETokenizer \
     --adaptive_seq_len\
     --eval_fp32\
     --task_list $TASKS\

Diff for: examples_deepspeed/data_efficiency/gpt/eval/ds_evalharness_1gpu.sh

+2 -1
@@ -27,7 +27,7 @@ if [ ! -f "$merge_file" ]; then
     wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt
 fi
 
-export HF_DATASETS_OFFLINE=1
+# export HF_DATASETS_OFFLINE=1
 
 dir2=$(dirname "$checkpoint_path")
 dirname=$(basename "$dir2")/$(basename "$checkpoint_path")
@@ -58,6 +58,7 @@ command="../../../../tasks/eval_harness/evaluate.py \
     --no-load-rng \
     --inference \
     --disable-moe-token-dropping \
+    --tokenizer-type GPT2BPETokenizer \
     --adaptive_seq_len \
     --eval_fp32 \
     --num_fewshot ${num_fewshot} \

Diff for: examples_deepspeed/data_efficiency/gpt/eval/ds_evalharness_parallel_run.sh

+1 -0
@@ -48,6 +48,7 @@ num_fewshot=0
 num_gpus=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
 cuda_id=-1
 total_mem=$(nvidia-smi --query-gpu=memory.total --format=csv -i 0 | grep -Eo [0-9]+)
+total_mem=$(( ${total_mem}*99/100 )) # somehow there could exist tiny (4MB or so) gpu memory leak
 
 ## Code below only works when you run each evalharness task on a single GPU.
 ## For multi-GPU evalharness, check Megatron-DeepSpeed/blob/main/examples_deepspeed/MoE/ds_evalharness.sh

Diff for: examples_deepspeed/data_efficiency/gpt/eval/ds_evalharness_parallel_run_10shot.sh

+1 -0
@@ -43,6 +43,7 @@ batch_size=16
 num_gpus=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
 cuda_id=-1
 total_mem=$(nvidia-smi --query-gpu=memory.total --format=csv -i 0 | grep -Eo [0-9]+)
+total_mem=$(( ${total_mem}*99/100 )) # somehow there could exist tiny (4MB or so) gpu memory leak
 
 ## Code below only works when you run each evalharness task on a single GPU.
 ## For multi-GPU evalharness, check Megatron-DeepSpeed/blob/main/examples_deepspeed/MoE/ds_evalharness.sh
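Both parallel-run scripts now scale `total_mem` by 99/100 using shell integer arithmetic. The same calculation can be sketched in Python to show how much headroom the change reserves. The sample `nvidia-smi` text below is an assumed illustration, not captured output, and the function names are hypothetical.

```python
import re


def parse_total_mem_mib(nvidia_smi_csv):
    """Return the first integer found in the CSV text. For a single-GPU
    query (`-i 0`) this is the one value `grep -Eo [0-9]+` would print."""
    match = re.search(r"[0-9]+", nvidia_smi_csv)
    if match is None:
        raise ValueError("no memory figure found in nvidia-smi output")
    return int(match.group())


def usable_mem_mib(total_mib, percent=99):
    """Integer arithmetic matching the shell expression total*99/100."""
    return total_mib * percent // 100


# Assumed sample output; the real header and value text may differ.
sample = "memory.total [MiB]\n40960 MiB\n"
total = parse_total_mem_mib(sample)
print(total, usable_mem_mib(total))
```

With the assumed 40960 MiB card, the threshold becomes 40550 MiB. That leaves about 410 MiB of slack, comfortably more than the roughly 4 MB discrepancy mentioned in the script comment.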

Diff for: tasks/eval_harness/evaluate.py

+13 -9
@@ -23,9 +23,10 @@
 from megatron import get_args
 from megatron import print_rank_0
 from megatron import get_tokenizer
+from megatron.core.enums import ModelType
 from megatron.core import mpu
 from megatron.training import setup_model_and_optimizer, get_model
-from megatron.mpu.mappings import gather_from_tensor_model_parallel_region
+from megatron.core.tensor_parallel.mappings import gather_from_tensor_model_parallel_region
 
 from megatron.utils import get_ltor_masks_and_position_ids, unwrap_model
 from megatron.p2p_communication import recv_forward, send_forward
@@ -222,8 +223,7 @@ def _model_call(self, inps):
 a_output, *other_losses = self.model(tokens,
     position_ids,
     attention_mask,
-    tokentype_ids=None,
-    forward_method_parallel_output=False)
+    tokentype_ids=None)
 output.append(a_output)
 
 if output is not None:
@@ -320,7 +320,7 @@ def load_ds_checkpoint_and_setup_megatron(extra_args_provider):
     # avoid printing the arguments, since they will later be overridden.
     _print_args = megatron.arguments._print_args
     megatron.arguments._print_args = lambda *_args, **kwarg: None
-    args = _parse_args(extra_args_provider)
+    args = parse_args(extra_args_provider=extra_args_provider)
 
     ds_checkpoint = DeepSpeedCheckpoint(args.load,
         tp_degree=args.tensor_model_parallel_size,
@@ -340,20 +340,24 @@ def load_ds_checkpoint_and_setup_megatron(extra_args_provider):
         cp_args.bf16 = False
         cp_args.params_dtype = torch.float32
 
+    cp_args.tokenizer_type = 'GPT2BPETokenizer'
+
     override_args(args, cp_args, skip_keys, skip_if_specified)
 
     # stop megatron from reparsing the arguments.
-    megatron.global_vars._parse_args = lambda *_args, **kwarg: args
+    megatron.arguments.parse_args = lambda *_args, **kwarg: args
+    megatron.global_vars._ensure_var_is_not_initialized = lambda *_args, **kwarg: None
     megatron.global_vars._GLOBAL_ARGS = args
 
-    initialize_megatron()
+    initialize_megatron(extra_args_provider=extra_args_provider)
+    megatron.global_vars._GLOBAL_ARGS = args
     torch.distributed.barrier()
 
     # Initializing megatron will update eg. tokenizer size. Override again.
     override_args(args, cp_args, skip_keys, skip_if_specified)
 
     # print final arguments.
-    _print_args(args)
+    _print_args("eval_harness arguments", args)
     if args.deepspeed:
 
         # Hack #3:
@@ -369,7 +373,7 @@ def load_ds_checkpoint_and_setup_megatron(extra_args_provider):
 
         cp_path = args.load
         args.load = None
-        model, _, _ = setup_model_and_optimizer(model_provider)
+        model, _, _ = setup_model_and_optimizer(model_provider, ModelType.encoder_or_decoder)
         model = model[0]
         zero_enabled = model._config.zero_enabled
         model._config.zero_enabled = False
@@ -399,7 +403,7 @@ def tasks_args(parser):
     group.add_argument('--eval_fp32', default = False, action='store_true', help='Should the evaluation run in fp32')
     return parser
 
-from megatron.global_vars import _parse_args
+from megatron.arguments import parse_args
 
 def main():
     start = time.time()
