Skip to content

SNU-PI/akallama-eval

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

Akallama-eval

Eval task

task description link
csatqa Tasks related to SAT and other standardized testing questions for academic assessment.
haerae Tasks focused on assessing detailed factual and historical knowledge. https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/haerae/README.md
kobest A collection of tasks designed to evaluate understanding in Korean language. https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/kobest/README.md
LogicKor 한국어 언어모델 다분야 사고력 벤치마크 https://github.com/instructkr/LogicKor

Result

doc link

How to

  1. lm-evaluation-harness
  • Install
git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
pip install lm_eval[wandb]
pip install wandb==0.18.0
  • Evaluation
    • Model : checkpoint 24, 48, 72, meta-llama/Llama-3.2-3B-Instruct, Qwen/Qwen2.5-3B-Instruct, LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct
    • csatqa, haerae, kobest (dropped - kmmlu : too big, kbl : legal, kormedmcqa : medical)
cd script
bash harness_all_eval.sh
  1. LogicKor
  • Install
git clone https://github.com/instructkr/LogicKor.git
pip install vllm openai
  • Evaluation
    • Model : checkpoint 24, 48, 72, meta-llama/Llama-3.2-3B-Instruct, Qwen/Qwen2.5-3B-Instruct, LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct
    • Before run, modify evaluator.py
    line 146 : output_file = output_dir / file_path.relative_to(Path(args.model_output_dir).parent.parent)
    line 173 : output_file_path = output_dir / file_path.relative_to(input_dir.parent.parent)
    
cd script
bash logic_all_eval.sh

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages