create_perf_json: specify cpu type in TSC metric #323
Replace "TSC" with "msr/tsc" or "msr/tsc,cpu=<cpu type>" to specify global or cpu type specific TSC metric.
for m in metrics:
    form = m['MetricExpr']
    if "TSC" in form:
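The hunk above stops at the `if`, so the substitution itself is not visible in this thread. A minimal Python sketch of what such a rewrite could look like follows. The helper names `qualify_tsc` and `fix_metrics`, the regular expression, and the trailing `/` on the event (taken from how perf stat prints `msr/tsc,cpu=cpu_core/` later in this conversation) are assumptions for illustration, not the code in this PR.

```python
import re

# Match a standalone TSC token only, so event names that merely contain the
# letters (e.g. CPU_CLK_UNHALTED.REF_TSC) are left untouched.
_TSC_TOKEN = re.compile(r"(?<![\w.])TSC(?![\w.])")


def qualify_tsc(expr, cpu_type=None):
    """Rewrite bare TSC in a metric expression (hypothetical helper).

    Without cpu_type the global msr event is used; with cpu_type the event
    is restricted to that CPU type, e.g. cpu_core.
    """
    event = f"msr/tsc,cpu={cpu_type}/" if cpu_type else "msr/tsc/"
    return _TSC_TOKEN.sub(lambda _match: event, expr)


def fix_metrics(metrics, cpu_type=None):
    """Apply qualify_tsc to every metric whose expression mentions TSC."""
    for m in metrics:
        form = m['MetricExpr']
        if "TSC" in form:
            m['MetricExpr'] = qualify_tsc(form, cpu_type)
    return metrics
```

Matching on a token boundary rather than a plain substring avoids rewriting `REF_TSC`, which a bare `str.replace("TSC", ...)` would corrupt.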
This looks good to me but we've normally done similar fixing up here:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py#L1405
With the patch series:
https://lore.kernel.org/lkml/[email protected]/
has this resolved the metric validation test issues on hybrid?
So I tested the generated alderlake json with this change and:
https://lore.kernel.org/lkml/[email protected]/
and:
https://lore.kernel.org/lkml/[email protected]/
plus this fix:
--- a/tools/perf/arch/x86/util/topdown.c
+++ b/tools/perf/arch/x86/util/topdown.c
@@ -92,7 +92,7 @@ int topdown_insert_slots_event(struct list_head *list, int idx, struct evsel *me
 	evsel->core.attr.config = TOPDOWN_SLOTS;
 	evsel->core.cpus = perf_cpu_map__get(metric_event->core.cpus);
-	evsel->core.own_cpus = perf_cpu_map__get(metric_event->core.own_cpus);
+	evsel->core.pmu_cpus = perf_cpu_map__get(metric_event->core.pmu_cpus);
 	evsel->core.is_pmu_core = true;
 	evsel->pmu = metric_event->pmu;
 	evsel->name = strdup("slots");
and the tma_l2_hit_latency metric is fixed if I add --no-scale:
$ sudo /tmp/perf/perf stat --no-scale -M tma_l2_hit_latency -- /tmp/perf/perf bench futex hash -r 2 -s
# Running 'futex/hash' benchmark:
Run summary [PID 1599693]: 28 threads, each operating on 1024 [private] futexes for 2 secs.
Averaged 2247423 operations/sec (+- 0.79%), total secs = 2
Futex hashing: global hash
Performance counter stats for '/tmp/perf/perf bench futex hash -r 2 -s':
7,505,544,206 cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/ # 41.4 % tma_l2_hit_latency (42.83%)
331,067,557,224 cpu_core/slots/ (49.94%)
177,936,737,645 cpu_core/topdown-retiring/ (49.94%)
15,671,596,981 cpu_core/topdown-mem-bound/ (49.94%)
5,360,676,349 cpu_core/topdown-bad-spec/ (49.94%)
118,705,368,744 cpu_core/topdown-fe-bound/ (49.94%)
28,820,278,709 cpu_core/topdown-be-bound/ (49.94%)
8,073,335,211 cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/ (49.88%)
67,113,967,663 msr/tsc,cpu=cpu_core/ (3.57%)
154,135,358 cpu_core/MEM_LOAD_RETIRED.L1_MISS/ (42.79%)
110,196,035,974 cpu_core/CPU_CLK_UNHALTED.THREAD/ (49.92%)
177,643,036 cpu_core/MEM_LOAD_RETIRED.L2_HIT/ (49.94%)
116,204,655 cpu_core/MEM_LOAD_RETIRED.FB_HIT/ (42.82%)
58,612,990,535 cpu_core/CPU_CLK_UNHALTED.REF_TSC/ (49.93%)
2,031,139,007 duration_time
2.030949462 seconds time elapsed
10.485634000 seconds user
45.198350000 seconds sys
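For context on why --no-scale matters here: by default perf stat extrapolates a counter that was only scheduled part of the time by the ratio of enabled time to running time, and the trailing percentage is running/enabled. The sketch below is only that arithmetic in Python, with hypothetical helper names; it is not perf's code. Applied to the msr/tsc line above (3.57%), default scaling would multiply the raw count by roughly 28, which may be why the metric only looks sane with --no-scale. The thread does not confirm that cause.

```python
def scale_count(raw, time_enabled, time_running):
    """Extrapolate a multiplexed count to the full enabled window."""
    if time_running == 0:
        return None  # the event never ran, so there is nothing to extrapolate
    return raw * time_enabled / time_running


def factor_from_percent(running_pct):
    """Scale factor implied by the '(NN.NN%)' column, i.e. enabled/running."""
    return 100.0 / running_pct


# Factor implied by the msr/tsc line in the output above.
print(round(factor_from_percent(3.57), 1))
```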
It is also possible to constrain the run to just the P-cores, for example:
$ sudo /tmp/perf/perf stat --no-scale -M tma_l2_hit_latency -- taskset -c `cat /sys/bus/event_source/devices/cpu_core/cpus` /tmp/perf/perf bench futex hash -r 2 -s
# Running 'futex/hash' benchmark:
Run summary [PID 1600141]: 28 threads, each operating on 1024 [private] futexes for 2 secs.
Averaged 2253275 operations/sec (+- 0.83%), total secs = 2
Futex hashing: global hash
Performance counter stats for 'taskset -c 0-15 /tmp/perf/perf bench futex hash -r 2 -s':
7,548,932,450 cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/ # 45.7 % tma_l2_hit_latency (42.80%)
331,704,741,522 cpu_core/slots/ (49.96%)
176,789,039,556 cpu_core/topdown-retiring/ (49.96%)
16,859,212,234 cpu_core/topdown-mem-bound/ (49.96%)
6,041,092,894 cpu_core/topdown-bad-spec/ (49.96%)
117,700,385,076 cpu_core/topdown-fe-bound/ (49.96%)
30,614,367,237 cpu_core/topdown-be-bound/ (49.96%)
8,090,931,352 cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/ (49.98%)
67,257,502,068 msr/tsc,cpu=cpu_core/ (3.57%)
172,221,109 cpu_core/MEM_LOAD_RETIRED.L1_MISS/ (42.90%)
110,525,548,606 cpu_core/CPU_CLK_UNHALTED.THREAD/ (49.99%)
198,217,890 cpu_core/MEM_LOAD_RETIRED.L2_HIT/ (49.97%)
124,717,379 cpu_core/MEM_LOAD_RETIRED.FB_HIT/ (42.75%)
58,695,684,575 cpu_core/CPU_CLK_UNHALTED.REF_TSC/ (49.91%)
2,030,443,163 duration_time
2.030268760 seconds time elapsed
10.501135000 seconds user
45.270940000 seconds sys
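The taskset invocation above depends on the cpu_core PMU's sysfs cpus file, which expanded to 0-15 on this machine. As a rough sketch, the same list can be read and expanded in Python. The helper names `parse_cpu_list` and `pmu_cpus` are hypothetical, and the parser assumes the usual comma-separated ranges format.

```python
def parse_cpu_list(text):
    """Expand a kernel-style CPU list such as '0-3,8,10-11' into ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or a stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def pmu_cpus(pmu="cpu_core", sysfs_root="/sys/bus/event_source/devices"):
    """Read the CPUs covered by a PMU (path as used in the command above)."""
    with open(f"{sysfs_root}/{pmu}/cpus") as f:
        return parse_cpu_list(f.read())
```

On a non-hybrid machine the cpu_core directory does not exist, so `pmu_cpus()` would raise `FileNotFoundError` there.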