test: refresh qwen3 decode pypto cases #683
|
Note: The number of changes in this pull request is too large for Gemini Code Assist to generate a review. |
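Codex Review: This comment is automatically updated by the review bot.
Summary: The Qwen3 decode goldens refreshed in PR #683 do not match the scalar/launch contract of the existing validation harness, so several samples fail during golden generation or on-board validation.
Findings
Several refreshed builders still read from |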
|
/run a3 down_proj_residual,out_proj_residual,post_rmsnorm,qwen3_decode_incore_0,qwen3_decode_incore_1,qwen3_decode_incore_2,qwen3_decode_incore_3,qwen3_decode_incore_4,qwen3_decode_incore_5,qwen3_decode_incore_6,qwen3_decode_incore_7,qwen3_decode_incore_8,qwen3_decode_incore_9,qwen3_decode_incore_10,qwen3_decode_incore_11,qwen3_decode_incore_12,qwen3_decode_incore_13,qwen3_decode_incore_14,qwen3_decode_incore_15,qwen3_decode_incore_16,rmsnorm,rope_kv_cache --pto-level=level3 |
|
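Received
The page refreshes automatically, so you can check the current stage, queue status, and latest results directly. |
A3 board test passed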
|
|
/run a5 down_proj_residual,out_proj_residual,post_rmsnorm,qwen3_decode_incore_0,qwen3_decode_incore_1,qwen3_decode_incore_2,qwen3_decode_incore_3,qwen3_decode_incore_4,qwen3_decode_incore_5,qwen3_decode_incore_6,qwen3_decode_incore_7,qwen3_decode_incore_8,qwen3_decode_incore_9,qwen3_decode_incore_10,qwen3_decode_incore_11,qwen3_decode_incore_12,qwen3_decode_incore_13,qwen3_decode_incore_14,qwen3_decode_incore_15,qwen3_decode_incore_16,rmsnorm,rope_kv_cache --pto-level=level3 |
|
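Received
The page refreshes automatically, so you can check the current stage, queue status, and latest results directly. |
A5 board test passed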
|
|
/run a3 down_proj_residual,out_proj_residual,post_rmsnorm,qwen3_decode_incore_0,qwen3_decode_incore_1,qwen3_decode_incore_2,qwen3_decode_incore_3,qwen3_decode_incore_4,qwen3_decode_incore_5,qwen3_decode_incore_6,qwen3_decode_incore_7,qwen3_decode_incore_8,qwen3_decode_incore_9,qwen3_decode_incore_10,qwen3_decode_incore_11,qwen3_decode_incore_12,qwen3_decode_incore_13,qwen3_decode_incore_14,qwen3_decode_incore_15,qwen3_decode_incore_16,rmsnorm,rope_kv_cache --pto-level=level3 |
|
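Received
The page refreshes automatically, so you can check the current stage, queue status, and latest results directly. |
A3 board test passed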
|
|
/run a5 down_proj_residual,out_proj_residual,post_rmsnorm,qwen3_decode_incore_0,qwen3_decode_incore_1,qwen3_decode_incore_2,qwen3_decode_incore_3,qwen3_decode_incore_4,qwen3_decode_incore_5,qwen3_decode_incore_6,qwen3_decode_incore_7,qwen3_decode_incore_8,qwen3_decode_incore_9,qwen3_decode_incore_10,qwen3_decode_incore_11,qwen3_decode_incore_12,qwen3_decode_incore_13,qwen3_decode_incore_14,qwen3_decode_incore_15,qwen3_decode_incore_16,rmsnorm,rope_kv_cache --pto-level=level3 |
|
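Received
The page refreshes automatically, so you can check the current stage, queue status, and latest results directly. |
A5 board test passed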
|
|
/review |
Manual Codex Review: this comment is by
Summary: Received.
Findings: Review in progress. |
Manual Codex Review: this comment is by
Summary: Qwen3DecodeA5 residual golden helpers still assume an A3-style fifth GM buffer, so both refreshed A5 residual validation cases fail at runtime.
Findings
|
|
/review |
Manual Codex Review: this comment is by
Summary: Received.
Findings: Review in progress. |
Manual Codex Review: this comment is by
Summary: AIV goldens miss the second subblock in the new residual cases.
Findings
|
|
/review |
Manual Codex Review: this comment is by
Summary: Received.
Findings: Review in progress. |
Manual Codex Review: this comment is by
Summary: Found 2 P1 issues: A3/A5's
Findings
|
|
/review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3fd62307f6
    exp_scores = np.exp(scores - cur_mi)
    exp_scores_bf16 = float32_to_bf16(exp_scores)
    cur_li = np.sum(bf16_to_float32(exp_scores_bf16), axis=1, keepdims=True)
    out_exp = _store_group_scores(out_exp, exp_scores_bf16, group_index=gi, sb=sb, rows_per_group=Q_HEAD_BATCH, cols=SEQ_TILE)
Use padded row stride when storing exp scores
build_softmax stores exp_scores_bf16 into out_exp with rows_per_group=Q_HEAD_BATCH (8), but the kernel layout for all_exp_padded is padded to 16 rows per group (Q_HEAD_PAD) and computes offsets with sb * 16 for each context block. This packs scores too tightly, so values for the second head group (and all later blocks) land at the wrong rows, producing incorrect golden data for qwen3_decode_incore_5 (same issue is mirrored in the A5 golden lib).
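The stride mismatch described above can be illustrated with a small NumPy sketch. It is not the PR's code: `store_block`, `build`, `kernel_view`, `NUM_SLOTS`, and the tiny `SEQ_TILE` value are hypothetical names and sizes chosen for illustration, and the slot-to-row mapping (`slot * row_stride`) is an assumption based on the `sb * 16` offset mentioned in the comment. Only the 8-row batch and 16-row padding come from the review.

```python
import numpy as np

Q_HEAD_BATCH = 8   # real rows per block (from the review comment)
Q_HEAD_PAD = 16    # padded rows per block in the kernel buffer (from the review comment)
SEQ_TILE = 4       # hypothetical small tile width, for illustration only
NUM_SLOTS = 2      # hypothetical number of blocks stored in the buffer


def store_block(out, block, slot, row_stride):
    """Hypothetical stand-in for a store helper: write `block` at slot * row_stride."""
    start = slot * row_stride
    out[start:start + block.shape[0], :] = block
    return out


def build(row_stride):
    """Fill a padded buffer, writing block `slot` with the constant value slot + 1."""
    out = np.zeros((NUM_SLOTS * Q_HEAD_PAD, SEQ_TILE), dtype=np.float32)
    for slot in range(NUM_SLOTS):
        block = np.full((Q_HEAD_BATCH, SEQ_TILE), slot + 1, dtype=np.float32)
        out = store_block(out, block, slot, row_stride)
    return out


def kernel_view(buf, slot):
    """Rows a reader using the padded stride would treat as block `slot` (assumed layout)."""
    start = slot * Q_HEAD_PAD
    return buf[start:start + Q_HEAD_BATCH]


tight = build(Q_HEAD_BATCH)   # packs blocks 8 rows apart
padded = build(Q_HEAD_PAD)    # packs blocks 16 rows apart

# With the tight stride, the second block sits in the first slot's padding rows,
# so a padded-stride reader sees zeros where the second block should be.
print(kernel_view(tight, 1)[0])
print(kernel_view(padded, 1)[0])
```

Under these assumptions, only the buffer built with the padded stride places the second block where a padded-stride reader looks for it, which is the behavior the comment asks the golden builder to match.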
Manual Codex Review: this comment is by
Summary: Received.
Findings: Review in progress. |
Manual Codex Review: this comment is by
Summary: PR #683 has a P1 golden-layout bug in the refreshed Qwen3 softmax case.
Findings
|
|
/review |
Manual Codex Review: this comment is by
Summary: Received.
Findings: Review in progress. |
Manual Codex Review: this comment is by
Summary: Review failed at stage
Findings: No structured findings were generated because the review process failed early.
Log Tail |
|
/review |
Manual Codex Review: this comment is by
Summary: Received.
Findings: Review in progress. |
|
/run a5 down_proj_residual out_proj_residual post_rmsnorm qwen3_decode_incore_0 qwen3_decode_incore_1 qwen3_decode_incore_2 qwen3_decode_incore_3 qwen3_decode_incore_4 qwen3_decode_incore_5 qwen3_decode_incore_6 qwen3_decode_incore_7 qwen3_decode_incore_8 qwen3_decode_incore_9 qwen3_decode_incore_10 qwen3_decode_incore_11 qwen3_decode_incore_12 qwen3_decode_incore_13 qwen3_decode_incore_14 qwen3_decode_incore_15 qwen3_decode_incore_16 rmsnorm rope_kv_cache --pto-level=level3 |
|
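Received
The page refreshes automatically, so you can check the current stage, queue status, and latest results directly. |
A5 board test passed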
|
Manual Codex Review: this comment is by
Summary: No issues were found in PR #683.
Findings: No issues found. |
39cd66e to e77168a
A5 board test passed
|
A3 board test failed
Failed cases
|
A3 board test failure details: PR #683
rope_kv_cache
down_proj_residual
out_proj_residual
qwen3_decode_incore_5
|
Summary
- test/samples/Qwen3DecodeA3 and test/samples/Qwen3DecodeA5 from the latest pypto-lib/models/qwen3/32b/qwen3_32b_decode.py output (05127d2)
- qwen3_decode_incore_* layout with the current 14-fragment raw PTO topology (rmsnorm, rope_kv_cache, out_proj_residual, post_rmsnorm, down_proj_residual, plus the remaining qwen3_decode_incore_* kernels)
- generate_testcase.py
Validation
- PTOAS_BIN=/Users/laoda/pto/PTOAS-qwen3-refresh/build/tools/ptoas/ptoas PTOAS_OUT_DIR=/private/tmp/qwen3_refresh_runop_a3 bash test/samples/runop.sh -t Qwen3DecodeA3
- PTOAS_BIN=/Users/laoda/pto/PTOAS-qwen3-refresh/build/tools/ptoas/ptoas PTOAS_OUT_DIR=/private/tmp/qwen3_refresh_runop_a5 bash test/samples/runop.sh -t Qwen3DecodeA5
Notes