Explicitly mark PTO entry #785
Codex Review: this comment is updated automatically by the review bot.
Summary: Review failed at stage Findings. No structured findings were generated because the review process failed early.
A3 board test failed
Failed cases
A3 board test failure details: PR #785
orchestration_example_kernel_add
vector_example_dag_kernel_add_scalar
paged_attention_example_kernel_pv_matmul
paged_attention_example_kernel_init_inplace
vector_example_dag_kernel_add
paged_attention_example_kernel_online_update
paged_attention_example_kernel_softmax_prepare
orchestration_example_kernel_add_scalar
paged_attention_example_kernel_qk_matmul
orchestration_example_kernel_mul
vector_example_dag_kernel_mul
prelu
plan_memory_bind_tile_alias_liveness
plan_memory_peak_exact_capacity
plan_memory_loop_no_reuse_outer_live
plan_memory_if_yield
plan_memory_loop_in_if
plan_memory_peak_8_overlapping
plan_memory_if_in_loop
plan_memory_fragmentation_hole_fit
plan_memory_for_iter_args_yield
plan_memory_no_reuse_overlap
plan_memory_reuse_sequential
plan_memory_nested_loops
plan_memory_fragmentation_two_holes
rems
partition_view_verify_rank_mismatch_valid
partition_view_verify_valid
partition5d_dynamic
partition5d
sparse_attn_test_incore_7
decode_hca_test_incore_54
attention_swa_test_incore_40
decode_swa_test_incore_40
decode_csa_test_incore_81
attention_hca_test_incore_54
attention_csa_test_refresh_incore_81
tensor_view_layout_dn
rope_kv_cache
qwen3_decode_incore_4
post_rmsnorm
qwen3_decode_incore_1
qwen3_decode_incore_10
qwen3_decode_incore_11
rmsnorm
qwen3_decode_incore_6
qwen3_decode_incore_2
qwen3_decode_incore_7
qwen3_decode_incore_5
qwen3_decode_incore_12
test_barrier_sync
matmul
add_double_dynamic
nested_loop_confliect
rar_optimization_test
test_dynamic_valid_shape
test_auto_sync_tail_hint
compensation_test
rem
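Background
Previously, the EmitC entry check inferred the entry from signals such as a single-function module or pto.kernel_kind. As a result, inputs like the one in issue 780 produced extern "C" __global__ even when the user had not explicitly declared an entry, which made the IR shape and the entry selection uncontrollable.
After the semantics were confirmed, this PR also unifies the historical EmitC/VPTO naming: pto.entry, hacc.entry, pto.kernel, and pto.aicore all denote an explicit PTO kernel entry; pto.kernel_kind is only vector/cube kind metadata and must not imply an entry.
Changes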
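isPTOEntryFunction recognizes only explicit entry aliases and no longer infers the entry from a single-function module or pto.kernel_kind.
isPTOKernelFunction/hasPTOKernelAttr public helpers: VPTO, the driver, and the host stub all use the entry helper, so the kernel and entry interfaces stop diverging.
annotatePTOEntryFunctions no longer materializes an implicit entry; it only cleans up internal markers.
pto.aicore: pto.entry is no longer written redundantly.
Launchable kernels generated from func.FuncOp get an explicit entry, preserving the old sample/runop behavior.
Validation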
cmake --build build-simt-lowlevel --target ptoas -j 8
/home/mouliangyu/projects/github.com/llvm/llvm-project/build/bin/llvm-lit -v build-simt-lowlevel/test/lit/pto/pto_entry_multifunc.pto build-simt-lowlevel/test/lit/pto/issue780_kernel_kind_not_entry.pto build-simt-lowlevel/test/lit/pto/pto_entry_single_func_default.pto: 3 passed
/home/mouliangyu/projects/github.com/llvm/llvm-project/build/bin/llvm-lit -q build-simt-lowlevel/test/lit: 548 passed
PYTHONPATH="$PWD/tilelang-dsl/python:$PYTHONPATH" python3 -m unittest discover -s tilelang-dsl/tests -p 'test_*.py': 339 passed
WORK_SPACE="$PWD/.work/vpto-smoke" PTOAS_BIN="$PWD/build-simt-lowlevel/tools/ptoas/ptoas" CASE_NAME="micro-op/binary-vector/vadd" DEVICE=SIM test/vpto/scripts/run_host_vpto_validation.sh: passed, compare passed
PTOAS_BIN="$PWD/build-simt-lowlevel/tools/ptoas/ptoas" python3 test/tilelang_st/script/run_all_st.py -r sim -v a5 -t tadd --smoke -j 1 --without-build: passed, passed=1 failed=0
PTOAS_BIN="$PWD/build-simt-lowlevel/tools/ptoas/ptoas" test/samples/runop.sh -t Abs + test/npu_validation/scripts/generate_testcase.py + dav_2201 camodel compare: passed
git diff --check: passed
Known unrelated failures
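The existing full run of ctest --test-dir build-simt-lowlevel --output-on-failure still shows pre-existing ptobc failures: ptobc_stage9_e2e, ptobc_recent_ops_v0_encode, and ptobc_tscatter_maskpattern_v0_encode. These failure types are unrelated to the entry/EmitC/VPTO alias changes in this PR.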
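The entry rule this PR describes can be illustrated with a minimal Python sketch. It is not the project's C++ helper: it models a function simply as a set of attribute names, and EXPLICIT_ENTRY_ALIASES, is_pto_entry_function, and entry_functions are hypothetical names introduced only for illustration. The four alias strings and pto.kernel_kind are taken from the PR description.

```python
# Hypothetical model: each function is just the set of attribute names on it.
# This is an illustration, not the real isPTOEntryFunction implementation.
EXPLICIT_ENTRY_ALIASES = frozenset(
    {"pto.entry", "hacc.entry", "pto.kernel", "pto.aicore"}
)


def is_pto_entry_function(attrs):
    """Return True only if an explicit entry alias is present.

    pto.kernel_kind is deliberately ignored: it is vector/cube kind
    metadata and must not imply an entry.
    """
    return any(name in EXPLICIT_ENTRY_ALIASES for name in attrs)


def entry_functions(module):
    """Return the names of entry functions in a module.

    `module` maps function name -> iterable of attribute names. The number
    of functions in the module plays no role in the decision.
    """
    return [name for name, attrs in module.items() if is_pto_entry_function(attrs)]


# Single-function module with only pto.kernel_kind: no entry is inferred.
single = {"f": {"pto.kernel_kind"}}

# Multi-function module: only the explicitly marked function is an entry.
multi = {
    "helper": {"pto.kernel_kind"},
    "main": {"pto.entry", "pto.kernel_kind"},
}

print(entry_functions(single))
print(entry_functions(multi))
```

Under this model, a single-function module that carries only pto.kernel_kind yields no entry, so nothing would be emitted as extern "C" __global__ unless the user marks it explicitly.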