Utility scripts. Run once to set up the environment.
Downloads calibration and evaluation data from HuggingFace and saves them as JSONL files
in data/calibration/ and data/eval/.
Run once after cloning:
python scripts/prepare_data.py
Download specific datasets only:
python scripts/prepare_data.py --dataset gsm8k alpaca humaneval
Available dataset names:
wikitext2, alpaca, gsm8k, humaneval, qa, sharegpt, sum
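The `--dataset` option takes one or more of the names listed above. A minimal sketch of how such an option could be parsed with `argparse` follows. The `DATASETS` constant, the `parse_args` helper, and the choice to default to all datasets are assumptions for illustration, not the script's actual code.

```python
import argparse

# Dataset names taken from the list above.
DATASETS = ["wikitext2", "alpaca", "gsm8k", "humaneval", "qa", "sharegpt", "sum"]


def parse_args(argv=None):
    """Parse command-line arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Download calibration and evaluation data."
    )
    parser.add_argument(
        "--dataset",
        nargs="+",          # accept one or more space-separated names
        choices=DATASETS,   # reject names that are not in the list
        default=DATASETS,   # assumption: no flag means "all datasets"
        help="Datasets to download (default: all).",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--dataset", "gsm8k", "alpaca", "humaneval"])
    print(args.dataset)
```

With `nargs="+"` and `choices`, an unknown name makes `argparse` exit with a usage error instead of starting a download.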
Output files (per dataset):
data/calibration/{name}_128.jsonl: 128 samples for AWQ/GPTQ calibration
data/eval/{name}_eval.jsonl: full eval split for PPL measurement
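For downstream code that consumes these files, the naming scheme above can be sketched as follows. The helpers `output_paths` and `read_jsonl` are hypothetical names introduced here. The sketch makes no assumption about the fields inside each JSONL record, only that each non-empty line is one JSON object.

```python
import json
from pathlib import Path


def output_paths(name, root="data"):
    """Return (calibration_path, eval_path) for a dataset name.

    Follows the layout described above; `root` is an assumed parameter.
    """
    root = Path(root)
    calibration = root / "calibration" / f"{name}_128.jsonl"
    evaluation = root / "eval" / f"{name}_eval.jsonl"
    return calibration, evaluation


def read_jsonl(path):
    """Load a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    calib_path, eval_path = output_paths("gsm8k")
    print(calib_path.as_posix())
    print(eval_path.as_posix())
```

Once the files exist, `read_jsonl(calib_path)` would return the calibration samples as a list of dictionaries.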
Requirements: Internet connection (one-time). The downloaded files are committed to the repository, so collaborators and CI don't need to run this again.