msprof simulator on custom A5 kernel + torch_npu wrapper launch #169

Open
learning-chip wants to merge 1 commit into 166-kernel-abs-run-using-cannsim from a5_msprof_sim

@learning-chip
Collaborator

Reproduced in Docker image learning-chip/agent_docker_npu#8

The profiler trace correctly shows VABS instruction for this custom abs kernel.

image

@learning-chip learning-chip changed the base branch from main to 166-kernel-abs-run-using-cannsim May 18, 2026 21:15
Comment on lines +39 to +42
# msprof CA simulator models pipeline timing only, not numeric results.
print("Input X (first 16):", x.flatten()[:16].cpu())
print("Output Z (first 16):", z.flatten()[:16].cpu())
print("vabs_fp16 kernel launch completed.")
Collaborator Author

@learning-chip learning-chip May 18, 2026

Actually, the accuracy test also passes in msprof simulator mode, unlike cannsim (biprof), which only simulates cycles.
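
Since numeric results are available in this mode, the printed tensors could also be checked programmatically. Below is a minimal sketch of such a check in plain Python; the helper name `find_abs_mismatches` and the `tol` parameter are hypothetical and not part of this PR, and the check assumes the kernel is expected to match an element-wise absolute value exactly.

```python
import math


def find_abs_mismatches(inputs, outputs, tol=0.0):
    """Return the indices where outputs[i] differs from abs(inputs[i]).

    Hypothetical helper, not part of this PR. `inputs` and `outputs` are
    plain sequences of Python floats of equal length.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    mismatches = []
    for i, (x, z) in enumerate(zip(inputs, outputs)):
        expected = abs(x)
        if math.isnan(expected):
            # abs(NaN) is NaN, so the output must be NaN as well.
            ok = math.isnan(z)
        else:
            # The equality test also covers infinities, where z - expected is NaN.
            ok = z == expected or abs(z - expected) <= tol
        if not ok:
            mismatches.append(i)
    return mismatches
```

In `run_abs.py` this could presumably be called as `find_abs_mismatches(x.flatten().cpu().tolist(), z.flatten().cpu().tolist())`, with an empty list meaning that every element matched.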

Comment on lines +17 to +19
exec msprof op simulator --soc-version=Ascend950PR_9599 \
--output="msprof_res" --kernel-name="vabs_fp16_mix_aic" --launch-count=10 \
python ./run_abs.py
Collaborator Author

Run once without --kernel-name= and with a large --launch-count; the output will then show the full names of all kernels.
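
The two-pass workflow can be sketched as a small Python helper that only assembles the command lines and does not run anything. The helper name `build_msprof_cmd` is hypothetical, and the discovery launch count of 100 is an arbitrary stand-in for "large"; the flags and values are the ones used in this PR's script.

```python
import shlex


def build_msprof_cmd(soc_version, output_dir, script, launch_count, kernel_name=None):
    """Assemble an `msprof op simulator` argv list (hypothetical helper).

    When kernel_name is None the --kernel-name flag is omitted, so no
    kernel-name filter is applied.
    """
    cmd = [
        "msprof", "op", "simulator",
        f"--soc-version={soc_version}",
        f"--output={output_dir}",
    ]
    if kernel_name is not None:
        cmd.append(f"--kernel-name={kernel_name}")
    cmd.append(f"--launch-count={launch_count}")
    cmd += ["python", script]
    return cmd


# Pass 1: no kernel filter, large launch count (100 is an assumed value).
discover = build_msprof_cmd("Ascend950PR_9599", "msprof_res", "./run_abs.py",
                            launch_count=100)
# Pass 2: filter on the full kernel name found in pass 1.
profile = build_msprof_cmd("Ascend950PR_9599", "msprof_res", "./run_abs.py",
                           launch_count=10, kernel_name="vabs_fp16_mix_aic")

print(shlex.join(discover))
print(shlex.join(profile))
```

Either list could then be handed to `subprocess.run` on a machine where msprof is installed.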

@learning-chip learning-chip changed the title from "msprof simulator on A5 kernel + torch_npu wrapper" to "msprof simulator on custom A5 kernel + torch_npu wrapper launch" May 18, 2026