
Feat/low bit ep #1336

Draft
JiaoliangYu wants to merge 13 commits into ROCm:jly/low-bit-ep from JiaoliangYu:feat/mori-fp8-dispatch-clean
Conversation

@JiaoliangYu

Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

JiaoliangYu and others added 12 commits June 17, 2026 15:33
Pin each worker to a contiguous core range keyed on its rank at the very
top of AsyncIOProc.__init__, before any large allocation, so Linux
first-touch also places memory on the local NUMA node. Gated by
ATOM_CPU_AFFINITY so baseline vs pinned A/B needs no code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
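
A minimal sketch of the rank-keyed pinning described in the commit message above, not the code from this PR. The helper names `core_range_for_rank` and `pin_worker`, the `cores_per_worker` parameter, and the convention that `ATOM_CPU_AFFINITY=1` enables pinning are all assumptions for illustration; the source only states that the variable gates the behavior. `os.sched_setaffinity` is Linux-only, so the sketch skips pinning where it is unavailable.

```python
import os


def core_range_for_rank(rank, cores_per_worker, available):
    """Return the contiguous slice of `available` cores owned by `rank`.

    `available` is a sorted list of core IDs (hypothetical input; the
    real PR may derive the range differently).
    """
    start = rank * cores_per_worker
    end = start + cores_per_worker
    if rank < 0 or end > len(available):
        raise ValueError(f"rank {rank} does not fit in {len(available)} cores")
    return set(available[start:end])


def pin_worker(rank, cores_per_worker):
    """Pin the calling process to its core range; return the set, or None.

    Call this before any large allocation so that first-touch page
    placement happens on the NUMA node local to the pinned cores.
    """
    # Assumed convention: the value "1" turns pinning on.
    if os.environ.get("ATOM_CPU_AFFINITY", "0") != "1":
        return None
    # sched_setaffinity only exists on Linux builds of Python.
    if not hasattr(os, "sched_setaffinity"):
        return None
    available = sorted(os.sched_getaffinity(0))
    cores = core_range_for_rank(rank, cores_per_worker, available)
    os.sched_setaffinity(0, cores)
    return cores
```

With the variable unset, `pin_worker` is a no-op, which is what lets a baseline run and a pinned run share the same code.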
Add amd-smi performance-determinism lock before the benchmark/server runs
and an always() unlock to AUTO before container teardown, in both the main
benchmark job and the regression-rerun job. The lock is driver-level and
persists across jobs on the bare-metal runner, so the unlock must run even
on failure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
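
The lock-then-always-unlock pattern from the commit message above can be sketched outside CI as a context manager. This is an illustration under stated assumptions, not the workflow change in this PR: `locked_clocks` is a hypothetical helper, and the lock and unlock commands are passed in by the caller rather than spelled out, because the exact `amd-smi` invocations are not shown in the source.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def locked_clocks(lock_cmd, unlock_cmd, run=subprocess.run):
    """Run `lock_cmd`, yield to the benchmark, then always run `unlock_cmd`.

    `lock_cmd` and `unlock_cmd` are argument lists supplied by the caller
    (hypothetical; fill in the real amd-smi commands for your setup).
    """
    try:
        # The lock sits inside the try so the unlock still runs if the
        # lock itself fails, mirroring an always() cleanup step.
        run(lock_cmd, check=True)
        yield
    finally:
        # check=False: a failed unlock should not mask the original error.
        run(unlock_cmd, check=False)
```

Because the lock is driver-level and outlives the job on a bare-metal runner, the `finally` branch plays the role of the `always()` step: the unlock command runs whether the benchmark body succeeds or raises.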
JiaoliangYu force-pushed the feat/mori-fp8-dispatch-clean branch from e4c971c to 2a57892 on June 24, 2026 06:35
JiaoliangYu force-pushed the feat/mori-fp8-dispatch-clean branch from 2a57892 to 11a717b on June 24, 2026 06:37
