micro_perf: after manually modifying the code to enable the profiler, latency results are computed incorrectly in some scenarios #189

Description

@testman0001

Steps to reproduce: run the micro_perf test for moe_scatter_dynamic_quant
Relevant code: https://github.com/bytedance/xpu-perf/blob/main/micro_perf/backends/GPU/backend_gpu.py#L178

                take_iters = prefer_iterations // 2
                iters_offset = prefer_iterations - take_iters

                removed_keys = []
                for kernel in kernel_latency_list:
                    if len(kernel_latency_list[kernel]) != prefer_iterations:
                        removed_keys.append(kernel)
                    average_latency += sum(kernel_latency_list[kernel][iters_offset:])
                for kernel in removed_keys:
                    kernel_latency_list.pop(kernel)

                average_latency /= take_iters

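Problem 1:
The intent here is probably to skip part of the samples, but when a single iteration launches the same kernel multiple times, the latency is computed incorrectly. Such a kernel records more than prefer_iterations samples, yet the slice skips only the first iters_offset of them, whereas by the code's own logic it should skip half. The sum is then divided by take_iters, so the result is too large. I am also unsure whether, even with a correct implementation, the decision to discard samples should be based on warmup instead.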
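To make the miscount described above concrete, here is a minimal sketch on synthetic data. The function names `average_latency_as_quoted` and `average_latency_scaled` are hypothetical and not part of xpu-perf. The second function is only one possible correction, and it assumes that a kernel launched k times per iteration produces exactly k * prefer_iterations samples in chronological order.

```python
def average_latency_as_quoted(kernel_latency_list, prefer_iterations):
    """Mirror the arithmetic of the quoted snippet (key removal omitted)."""
    take_iters = prefer_iterations // 2
    iters_offset = prefer_iterations - take_iters
    total = 0.0
    for samples in kernel_latency_list.values():
        # Skips a fixed number of samples regardless of calls per iteration.
        total += sum(samples[iters_offset:])
    return total / take_iters


def average_latency_scaled(kernel_latency_list, prefer_iterations):
    """Hypothetical variant: scale the skip offset by calls per iteration."""
    take_iters = prefer_iterations // 2
    iters_offset = prefer_iterations - take_iters
    total = 0.0
    for samples in kernel_latency_list.values():
        # Assumption: len(samples) is a multiple of prefer_iterations.
        calls_per_iter = len(samples) // prefer_iterations
        total += sum(samples[iters_offset * calls_per_iter:])
    return total / take_iters


# Synthetic case: 4 iterations, one kernel launched twice per iteration,
# every launch taking 1.0 time units, so the true per-iteration latency is 2.0.
synthetic = {"kernel_a": [1.0] * 8}
print(average_latency_as_quoted(synthetic, 4))  # 3.0
print(average_latency_scaled(synthetic, 4))     # 2.0
```

With two launches per iteration, the quoted arithmetic keeps six samples but divides by two iterations, which gives 3.0 instead of 2.0.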

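Problem 2:

                    if len(kernel_latency_list[kernel]) != prefer_iterations:
                        removed_keys.append(kernel)

When a single iteration launches the same kernel multiple times, should that kernel really be added to removed_keys? It seems this should be treated as normal behavior. Otherwise every such kernel is removed and the kernels printout is empty.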
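One way the check above could be relaxed is sketched below. The helper `split_kernels` is hypothetical and not part of xpu-perf. It assumes that a kernel is valid whenever its sample count is a positive multiple of prefer_iterations, which may not hold if the number of launches varies between iterations.

```python
def split_kernels(kernel_latency_list, prefer_iterations):
    """Return (kept, removed) under the multiple-of-iterations assumption."""
    kept = {}
    removed = []
    for kernel, samples in kernel_latency_list.items():
        # Keep kernels launched 1, 2, 3, ... times in every iteration.
        if samples and len(samples) % prefer_iterations == 0:
            kept[kernel] = samples
        else:
            removed.append(kernel)
    return kept, removed


# Synthetic data with prefer_iterations = 4.
synthetic = {
    "twice_per_iter": [1.0] * 8,  # kept under the relaxed check
    "once_per_iter": [1.0] * 4,   # kept, same as the original check
    "irregular": [1.0] * 3,       # still removed
}
kept, removed = split_kernels(synthetic, 4)
print(sorted(kept))  # ['once_per_iter', 'twice_per_iter']
print(removed)       # ['irregular']
```

Unlike the quoted loop, this sketch does not add the samples of removed kernels to any running total. Whether removed kernels should contribute to average_latency is a separate question for the maintainers.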
