[PZ COMPETITION] yuanboyang final #167
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,151 @@ | ||
| # Final Competition Code (a fully runnable codebase) | ||
|
|
||
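| ## File structure | ||
| project_root/ | ||
| ├── README.md # Usage instructions (this file) | ||
| ├── requirementsverl.txt # Dependencies for the verl training environment | ||
| ├── requirementstest.txt # Dependencies for the test/evaluation environment | ||
| ├── download.py # Downloads and preprocesses the training set | ||
| ├── verl/ # Modified verl source code | ||
| │ ├── verl/utils/reward_score/geo3k.py # Modified reward function | ||
| │ └── verl/examples/data_preprocess/gsm8k.py # Downloads and preprocesses the validation set | ||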
|
|
||
| ## 1. Data download and preprocessing | ||
| - Training set download and preprocessing: `download.py` | ||
| - Validation set download and preprocessing: `verl/examples/data_preprocess/gsm8k.py` | ||
|
|
||
| ## 2. Code changes | ||
| ### Modifications to the [verl](https://github.com/volcengine/verl) source | ||
| - Main changes: | ||
| - Changes to the data source and prompt: | ||
| - examples/data_preprocess/gsm8k.py: | ||
| - Change | ||
| ```python | ||
| import datasets | ||
| ... | ||
| data_source = "openai/gsm8k" | ||
| dataset = datasets.load_dataset(data_source, "main") | ||
| train_dataset = dataset["train"] | ||
| test_dataset = dataset["test"] | ||
| ``` | ||
| - to | ||
| ```python | ||
| from modelscope.msdatasets import MsDataset | ||
| ... | ||
| data_source = "hiyouga/geometry3k" # Note: this source name may be a typo, but the loading code itself targets modelscope/gsm8k | ||
| train_dataset = MsDataset.load('modelscope/gsm8k', subset_name='main', split='train', trust_remote_code=True) | ||
| test_dataset = MsDataset.load('modelscope/gsm8k', subset_name='main', split='test', trust_remote_code=True) | ||
| ``` | ||
| - Change | ||
| ```python | ||
| instruction_following = 'Let\'s think step by step and output the final answer after "####".' | ||
| question = question_raw + " " + instruction_following | ||
| ``` | ||
| - to | ||
| ```python | ||
| instruction_following = instruction = r'Please reason step by step,and must put your final answer within \boxed{}.Question:' | ||
| question = instruction + " " + question_raw | ||
| ``` | ||
| - Changes that force `trust_remote_code=True`: | ||
| - verl/model_merger/base_model_merger.py: | ||
| - Change | ||
| ```python | ||
| with init_empty_weights(): | ||
|     model = auto_model_class.from_config( | ||
|         self.model_config, torch_dtype=torch.bfloat16, trust_remote_code=self.config.trust_remote_code | ||
|     ) | ||
| ``` | ||
| - to | ||
| ```python | ||
| with init_empty_weights(): | ||
|     model = auto_model_class.from_config( | ||
|         self.model_config, torch_dtype=torch.bfloat16, trust_remote_code=True | ||
|     ) | ||
| ``` | ||
| - verl/trainer/main_ppo.py: | ||
| - Change | ||
| ```python | ||
| trust_remote_code = config.data.get("trust_remote_code", False) | ||
| ``` | ||
| - to | ||
| ```python | ||
| trust_remote_code = True | ||
| ``` | ||
| - verl/workers/fsdp_workers.py: | ||
| - Change | ||
| ```python | ||
| trust_remote_code=trust_remote_code | ||
| ``` | ||
| - to | ||
| ```python | ||
| trust_remote_code=True | ||
| ``` | ||
|
|
||
| - Modified the reward function in `verl/utils/reward_score/geo3k.py`: | ||
| - verl/utils/reward_score/geo3k.py: | ||
| - Change | ||
| ```python | ||
| pattern = re.compile(r"<think>.*</think>.*\\boxed\{.*\}.*", re.DOTALL) | ||
| ``` | ||
| - to | ||
| ```python | ||
| pattern = re.compile(r".*\\boxed\{.*\}.*", re.DOTALL) | ||
| ``` | ||
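The effect of relaxing the format pattern can be checked in isolation. The sketch below assumes the pattern is applied as a full match against the whole response, which is how upstream verl is believed to use it; the helper name `format_reward` and the sample responses are illustrative assumptions, not code taken from this PR.

```python
import re

# Original pattern: a <think>...</think> block must precede \boxed{...}.
STRICT_PATTERN = re.compile(r"<think>.*</think>.*\\boxed\{.*\}.*", re.DOTALL)
# Pattern after the change: only \boxed{...} is required.
RELAXED_PATTERN = re.compile(r".*\\boxed\{.*\}.*", re.DOTALL)


def format_reward(pattern: re.Pattern, response: str) -> float:
    """Return 1.0 when the whole response matches the pattern, else 0.0."""
    return 1.0 if pattern.fullmatch(response) else 0.0


# Made-up responses used only to exercise both patterns.
with_think = "<think>6 * 7 = 42</think> The answer is \\boxed{42}."
without_think = "6 * 7 = 42, so the answer is \\boxed{42}."
no_box = "The answer is 42."

for name, text in [("with_think", with_think), ("without_think", without_think), ("no_box", no_box)]:
    print(name, format_reward(STRICT_PATTERN, text), format_reward(RELAXED_PATTERN, text))
```

Under these assumptions, a model that never emits `<think>` tags can still earn the format reward after the change, as long as it boxes its final answer.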
|
|
||
| ### Modifications to the [transformers](https://github.com/huggingface/transformers) source | ||
| - Modified file: | ||
| - `/root/miniconda3/envs/verl/lib/python3.10/site-packages/transformers/configuration_utils.py` | ||
| - Change: | ||
| - Change line 917 to: | ||
| ```python | ||
| json.dumps(config_dict, indent=2, sort_keys=False) + "\n" | ||
| ``` | ||
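The README shows only the replacement line. Assuming the edit switches `sort_keys` from `True` to `False`, the visible effect is that the serialized config keeps its insertion order instead of being alphabetized. A minimal sketch with a made-up config fragment (the keys and values are hypothetical):

```python
import json

# Hypothetical config fragment; keys and values are illustrative only.
config_dict = {
    "model_type": "example",
    "hidden_size": 1024,
    "architectures": ["ExampleForCausalLM"],
}

# Keys are emitted alphabetically.
sorted_text = json.dumps(config_dict, indent=2, sort_keys=True) + "\n"
# Keys are emitted in insertion order, as in the modified line.
unsorted_text = json.dumps(config_dict, indent=2, sort_keys=False) + "\n"


def key_order(text: str) -> list:
    """Return the top-level keys in the order they appear in the JSON text."""
    return list(json.loads(text).keys())


print(key_order(sorted_text))
print(key_order(unsorted_text))
```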
|
|
||
| ## 3. Environment dependencies | ||
| ```bash | ||
| # verl environment | ||
| pip install -r requirementsverl.txt | ||
|
|
||
| # Test environment | ||
| pip install -r requirementstest.txt | ||
| ``` | ||
| ## 4. Run command | ||
| ```bash | ||
| # Adjust data.train_files and actor_rollout_ref.model.path to your own locations. | ||
| # (A comment placed after a trailing backslash would end the command early, so the notes live here.) | ||
| nohup env PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ | ||
| algorithm.adv_estimator=grpo \ | ||
| data.train_files=/usr/train3.parquet \ | ||
| data.train_batch_size=264 \ | ||
| data.max_prompt_length=2048 \ | ||
| data.max_response_length=512 \ | ||
| actor_rollout_ref.model.path=/root/.cache/modelscope/hub/models/BAAI/OpenSeek-Small-v1-SFT \ | ||
| actor_rollout_ref.actor.optim.lr=1e-5 \ | ||
| actor_rollout_ref.actor.ppo_mini_batch_size=72 \ | ||
| actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ | ||
| actor_rollout_ref.rollout.name=vllm \ | ||
| +actor_rollout_ref.actor.fsdp_config.model_dtype=bf16 \ | ||
| actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \ | ||
| actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ | ||
| actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \ | ||
| actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \ | ||
| trainer.logger=tensorboard \ | ||
| trainer.val_before_train=True \ | ||
| trainer.n_gpus_per_node=6 \ | ||
| trainer.nnodes=1 \ | ||
| trainer.save_freq=200 \ | ||
| trainer.test_freq=10 \ | ||
| trainer.total_epochs=15 \ | ||
| data.val_files=$HOME/data/gsm8k/test.parquet \ | ||
| actor_rollout_ref.rollout.n=6 \ | ||
| > train.log 2>&1 & | ||
|
Comment on lines +114 to +139
The run command contains several hardcoded paths. For example, they can be replaced with environment variables:
# Set the environment variables in your script or environment
export TRAIN_FILES=/path/to/your/train3.parquet
export MODEL_PATH=/path/to/your/model
# Then use them in the command
nohup env PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
data.train_files=$TRAIN_FILES \
actor_rollout_ref.model.path=$MODEL_PATH \
... |
||
| ``` | ||
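The same idea from the review comment can be sketched in Python, which makes a missing variable fail loudly before training starts. This is only an illustration: the helper `build_train_command` is hypothetical, the variable names `TRAIN_FILES` and `MODEL_PATH` are taken from the review comment, and only a few of the overrides from the README command are reproduced.

```python
import os
import shlex


def build_train_command(env=None) -> list:
    """Build the training command, reading machine-specific paths from the environment."""
    env = os.environ if env is None else env
    # A missing variable raises KeyError instead of silently using a stale path.
    train_files = env["TRAIN_FILES"]
    model_path = env["MODEL_PATH"]
    return [
        "python3", "-m", "verl.trainer.main_ppo",
        "algorithm.adv_estimator=grpo",
        f"data.train_files={train_files}",
        f"actor_rollout_ref.model.path={model_path}",
        # The remaining overrides from the README command would be appended here.
    ]


# Placeholder paths, matching the ones used in the review comment.
cmd = build_train_command({
    "TRAIN_FILES": "/path/to/your/train3.parquet",
    "MODEL_PATH": "/path/to/your/model",
})
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` directly, which avoids shell quoting issues.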
| ## 5. Model merging and evaluation | ||
| ### Model merging | ||
| ```bash | ||
| python3 -m verl.model_merger merge \ | ||
| --backend fsdp \ | ||
| --local_dir /usr/checkpoints/verl_examples/gsm8k/global_step_8000/actor \ | ||
| --target_dir /usr/checkpoints/verl_examples/gsm8k/global_step_8000/actor/huggingface | ||
| ``` | ||
| ### Evaluation | ||
| - Use the official script '/OpenSeek/evaluation/qwen_eval/sh/run_evaluate.sh' | ||
| - The model paths in all of the commands above must be adjusted to your own setup | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| import argparse | ||
| import os | ||
| from modelscope.msdatasets import MsDataset | ||
|
|
||
| def main(): | ||
|     """ | ||
|     Main function: load the dataset from ModelScope, process it, and save it as a Parquet file. | ||
|     """ | ||
|     parser = argparse.ArgumentParser(description="Convert Big-Math dataset from ModelScope to a verl-compatible PARQUET format.") | ||
|     # Keep the output_file argument so that the output path can be specified | ||
|     parser.add_argument("--output_file", type=str, required=True, help="Path for the output PARQUET file (e.g., train.parquet).") | ||
|     args = parser.parse_args() | ||
|
|
||
|     # Dataset information | ||
|     dataset_name = 'open-r1/Big-Math-RL-Verified-Processed' | ||
|     subset_name = 'all' | ||
|     split = 'train' | ||
|     data_source_name = "Big-Math" # Intended to tag the source in the data (not used below) | ||
|
|
||
|
|
||
|     print(f"Loading dataset '{dataset_name}' from ModelScope...") | ||
|
|
||
|     # 1. Load the dataset directly with MsDataset.load | ||
|     # This step already yields a structured dataset object | ||
|     dataset = MsDataset.load(dataset_name, subset_name=subset_name, split=split) | ||
|
|
||
|     print(f"Loaded {len(dataset)} records. Starting preprocessing...") | ||
|
|
||
|     # 2. Define the processing function that maps the raw format to the target format | ||
|     # .map() applies this function to every record | ||
|     def process_fn(example, idx): | ||
|         # Extract the required fields from the raw record | ||
|         # Note: the key names ('prompt', 'solution', etc.) depend on the dataset's actual column names | ||
|         # Adjust them to match the 'open-r1/Big-Math-RL-Verified-Processed' dataset | ||
|         problem_raw = example.get("prompt", "") | ||
|         answer_clean = example.get("solution", "") | ||
|         domain = example.get("domain", []) | ||
|         solve_rate = example.get("llama8b_solve_rate", None) | ||
|
|
||
|         # Build the prompt content | ||
|         instruction = r'Please reason step by step,and must put your final answer within \boxed{}.Question:' | ||
|         prompt_content = instruction + " " + problem_raw | ||
|
|
||
|         # Build the reward_model field | ||
|         reward_model_data = { | ||
|             "style": "rule", | ||
|             "ground_truth": str(answer_clean) # Make sure it is a string | ||
|         } | ||
|
|
||
|         # Assemble the final record | ||
|         processed_data = { | ||
|             # Presumably tagged as geometry3k so that verl scores samples with the modified geo3k reward function | ||
|             "data_source": 'hiyouga/geometry3k', | ||
|
|
||
|             "prompt": [ | ||
|                 { | ||
|                     "role": "user", | ||
|                     "content": prompt_content, | ||
|                 } | ||
|             ], | ||
|             "ability": "math", | ||
|             "reward_model": reward_model_data, | ||
|             "extra_info": { | ||
|                 "index": idx, | ||
|                 "original_problem": problem_raw, | ||
|                 "domain": domain, | ||
|                 "llama8b_solve_rate": solve_rate, | ||
|             }, | ||
|         } | ||
|         return processed_data | ||
|
|
||
|     # 3. Apply the processing function with .map() | ||
|     # MsDataset's .map() implementation is generally quite robust | ||
|     processed_dataset = dataset.map(function=process_fn, with_indices=True) | ||
|
|
||
|     print("Preprocessing complete.") | ||
|
|
||
|     # Make sure the output directory exists | ||
|     output_dir = os.path.dirname(args.output_file) | ||
|     if output_dir: | ||
|         os.makedirs(output_dir, exist_ok=True) | ||
|
|
||
|     # 4. Save the processed dataset directly as a Parquet file | ||
|     print(f"Saving output to '{args.output_file}'...") | ||
|     processed_dataset.to_parquet(args.output_file) | ||
|     # processed_dataset.to_json(args.output_file, lines=True, force_ascii=False) | ||
|
|
||
|     print("Conversion finished successfully!") | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
|     main() | ||
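To see what a single converted record looks like without downloading the dataset, the mapping in `process_fn` can be exercised on one hand-written example. The sketch below re-implements that mapping as a standalone function; the name `to_verl_record` and the sample problem are assumptions made for illustration, and the field layout simply mirrors the script above.

```python
def to_verl_record(example: dict, idx: int) -> dict:
    """Map one raw record to the layout built by process_fn in download.py."""
    instruction = r"Please reason step by step,and must put your final answer within \boxed{}.Question:"
    problem = example.get("prompt", "")
    return {
        "data_source": "hiyouga/geometry3k",
        "prompt": [{"role": "user", "content": instruction + " " + problem}],
        "ability": "math",
        # The ground truth is stored as a string, whatever its original type.
        "reward_model": {"style": "rule", "ground_truth": str(example.get("solution", ""))},
        "extra_info": {
            "index": idx,
            "original_problem": problem,
            "domain": example.get("domain", []),
            "llama8b_solve_rate": example.get("llama8b_solve_rate", None),
        },
    }


# Made-up record; real column names should be checked against the dataset.
sample = {"prompt": "What is 6 times 7?", "solution": 42}
record = to_verl_record(sample, 0)
print(record["prompt"][0]["content"])
print(record["reward_model"])
```

Missing columns fall back to the same defaults as in the script, so a record with only `prompt` and `solution` still converts.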
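Review comment: Modifying library files directly in `site-packages` is bad practice. It makes the environment fragile and hard to reproduce. Anyone else who sets up this project may forget to apply the patch by hand, which leads to inconsistent behavior or errors. A better approach is to fork the `transformers` repository, apply your changes, and install from your fork, or to ship a `.patch` file together with a script that applies it.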