51 changes: 48 additions & 3 deletions wa/instruments/perf.py
@@ -77,6 +77,17 @@ class PerfInstrument(Instrument):
description="""Provides labels for pref output. If specified, the number of
labels must match the number of ``optionstring``\ s.
"""),
Parameter('perf_mode', kind=str, default='stat', allowed_values=['stat', 'record'],
Contributor:
The parameter can be renamed to just "mode"; given that it is for the perf instrument, calling it perf_mode is redundant.

Contributor:
The parameter can be renamed to just "mode"

... or (sub)command, given that that's what perf itself calls these:

$ perf help

 usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

 The most commonly used perf commands are:
   [...]

 See 'perf help COMMAND' for more information on a specific command.

Author:
Thanks, changed in latest version.

+                  global_alias='perf_mode',
+                  description="""
+                  Choose between 'perf stat' and 'perf record'. If 'perf record' is selected,
+                  'perf report' will also be run. 'report_optionstring' can be used to generate
+                  a custom report from the trace.
+                  """),
+        Parameter('report_optionstring', kind=list_or_string, default='',
+                  global_alias='report_perf_options',
+                  description="""Specifies options to be used for the 'perf report' command used
Contributor:
Please format consistently with the other parameters.

Author:
Thanks, changed in latest version.

+                  with 'perf_mode=record'. """),
         Parameter('force_install', kind=bool, default=False,
                   description="""
                   always install perf binary even if perf is already present on the device.
@@ -92,6 +103,8 @@ def initialize(self, context):
             self.events,
             self.optionstring,
             self.labels,
+            self.perf_mode,
+            self.report_optionstring,
             self.force_install)

     def setup(self, context):

@@ -103,11 +116,10 @@ def start(self, context):
     def stop(self, context):
         self.collector.stop()
 
-    def update_output(self, context):
-        self.logger.info('Extracting reports from target...')
+    def _update_output_stat(self, context):
         outdir = os.path.join(context.output_directory, 'perf')
         self.collector.get_trace(outdir)
-
+        self.logger.info('Processing perf stat reports.')
         for host_file in os.listdir(outdir):
             label = host_file.split('.out')[0]
             host_file_path = os.path.join(outdir, host_file)
@@ -134,5 +146,38 @@ def update_output(self, context):
                 metric = '{}_{}'.format(label, match.group(3))
                 context.add_metric(metric, count, classifiers=classifiers)
 
+    def _update_output_record(self, context):
+        outdir = os.path.join(context.output_directory, 'perf')
+        self.collector.get_trace(outdir)
Contributor:
Given that the above two lines happen in both the stat and record cases, they should be moved out into update_output.

Author:
Thanks, changed in latest version.
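For reference, a minimal sketch of that hoisting (assuming the two helpers are changed to take the extracted directory as an argument; the method names come from this diff):

    def update_output(self, context):
        # Pull the trace from the target once, then dispatch on the mode.
        outdir = os.path.join(context.output_directory, 'perf')
        self.collector.get_trace(outdir)
        if self.perf_mode == 'stat':
            self._update_output_stat(context, outdir)
        elif self.perf_mode == 'record':
            self._update_output_record(context, outdir)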


+        self.logger.info('Processing perf report reports.')
+        for host_file in os.listdir(outdir):
Contributor:
This is currently attempting to process every file in the perf directory; since we are now pulling the raw trace data into this directory, this fails when trying to parse binary data. Should this be updated to only process .rpt files?

Author:
Thanks, changed in latest version.
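For illustration, the kind of guard that would address this (a sketch; '.rpt' is the extension used in this diff):

    for host_file in os.listdir(outdir):
        if not host_file.endswith('.rpt'):
            continue  # skip the raw binary trace data pulled into the same directory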

+            label = host_file.split('.rpt')[0]
Contributor:
use os.path.splitext instead.

Author:
Thanks, changed in latest version.
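For illustration:

    label, ext = os.path.splitext(host_file)  # 'results.rpt' -> ('results', '.rpt')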

+            host_file_path = os.path.join(outdir, host_file)
+            context.add_artifact(label, host_file_path, 'raw')
+            with open(host_file_path) as fh:
Contributor:
This parser is extremely specific. I understand we don't want to spend the effort writing a perfectly generic parser for perf report that works in all cases, but this implementation is definitely too narrow. As an example, running on my host machine:

  • Following sudo perf record -a sleep 1:

    $ sudo perf report 2>/dev/null | head -n 20
    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 1K of event 'cycles:ppp'
    # Event count (approx.): 218123588
    #
    # Overhead  Command          Shared Object               Symbol                                           
    # ........  ...............  ..........................  ............................
    #
        41.86%  swapper          [kernel.kallsyms]           [k] intel_idle
         2.06%  perf             [kernel.kallsyms]           [k] smp_call_function_single
         0.92%  Chrome_ChildIOT  libpthread-2.27.so          [.] __pthread_mutex_lock
         0.69%  swapper          [kernel.kallsyms]           [k] calc_global_load
         0.52%  swapper          [kernel.kallsyms]           [k] ioread32
         0.49%  CompositorTileW  libc-2.27.so                [.] __memset_avx2_erms
         0.41%  IPDL Background  [kernel.kallsyms]           [k] cpuacct_charge
         0.41%  IPDL Background  [vdso]                      [.] __vdso_clock_gettime
         0.41%  IPDL Background  libxul.so                   [.] 0x0000000000879a17
    
    $ sudo perf report --sort=dso 2>/dev/null | head -n 20
    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 1K of event 'cycles:ppp'
    # Event count (approx.): 218123588
    #
    # Overhead  Shared Object
    # ........  ..........................
    #
        67.43%  [kernel.kallsyms]
        10.79%  libxul.so
        10.40%  slack
         2.29%  libpthread-2.27.so
         1.67%  libc-2.27.so
         1.47%  firefox
         1.15%  libglib-2.0.so.0.5600.3
         0.84%  nouveau_dri.so
         0.60%  [vdso]
    
    $ sudo perf report --sort=cpu 2>/dev/null | head -n 20
    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 1K of event 'cycles:ppp'
    # Event count (approx.): 218123588
    #
    # Overhead  CPU
    # ........  ...
    #
        15.64%  006
         9.37%  003
         8.75%  011
         8.75%  000
         7.14%  009
         7.07%  007
         5.44%  004
         5.29%  015
         5.02%  012
    
  • Following sudo perf record -g sleep 1:

    $ sudo perf report -g 2>/dev/null | head -n 30
    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 14  of event 'cycles:ppp'
    # Event count (approx.): 1311183
    #
    # Children      Self  Command  Shared Object      Symbol
    # ........  ........  .......  .................  ..................................
    #
        32.80%     0.00%  sleep    [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
                |
                ---entry_SYSCALL_64_after_hwframe
                   do_syscall_64
                   |
                   |--22.57%--sys_exit_group
                   |          do_group_exit
                   |          do_exit
                   |          mmput
                   |          exit_mmap
                   |          tlb_finish_mmu
                   |          arch_tlb_finish_mmu
                   |          tlb_flush_mmu_free
                   |          free_pages_and_swap_cache
                   |          release_pages
                   |
                    --10.23%--sys_execve
                              do_execveat_common.isra.34
                              search_binary_handler
    

Obviously, I don't expect this parser to handle all outputs of perf report (e.g. --call-graph, --children, --header, ...), but handling the extremely useful --sort would be a nice start; I believe that parsing the table(s) of samples into pandas.DataFrame(s) would make the following logic easier to work with. I have actually implemented such a parser (currently internal) which I could integrate here.

Author:
This is just a first implementation to parse the basic output. As you say, handling all the output options is difficult. If I update this to handle new report formats I'll likely see about committing them, but I am not planning to write a more generic/complete parser. If you have one that could be integrated here, that would be very good.
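For what it's worth, a minimal sketch of such a table parser, assuming the default four-column layout shown above (the function name and column labels are illustrative, not perf's or WA's own):

    import pandas as pd

    def parse_report_samples(text):
        # Parse the sample table of a basic 'perf report' output.
        # Assumes the default columns: Overhead, Command, Shared Object, Symbol.
        rows = []
        for line in text.splitlines():
            stripped = line.strip()
            if not stripped or stripped.startswith('#'):
                continue  # skip blank lines and the commented header block
            fields = stripped.split(None, 3)
            if len(fields) == 4 and fields[0].endswith('%'):
                fields[0] = float(fields[0].rstrip('%'))
                rows.append(fields)
        return pd.DataFrame(rows, columns=['overhead', 'command', 'dso', 'symbol'])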

+                for line in fh:
+                    words = line.split()
+                    if len(words) >= 1:
+                        if words[0] == '#' and len(words) == 6:
+                            if words[4] == 'event':
+                                event_type = words[5]
+                                event_type = event_type.strip("'")
+                        if words[0] != '#' and '%' in words[0] and len(words) == 5:
+                            metric = 'perf/{}/{}/{}'.format(event_type, words[1], words[2].strip("[]"))
Contributor:
Consider using classifiers rather than packing everything into the metric name.

Author:
I've added the classifiers but also left it packed in the metric name. It seems that some of the output processors, e.g. csv, default to classifier output off, and that will result in a lot of meaningless/duplicate entries in the results file with default options, which doesn't seem like friendly behavior.

I am very open to people's thoughts on what the right behavior is here, though.

Author:
I had a thought today: change the metric to /, which should be a unique key carrying the most critical info, then use classifiers for the other entries.
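A sketch of the shape this could take (the exact metric key is left open above; 'perf/<event>' and the classifier keys here are illustrative only):

    # Illustrative classifier keys; 'count' and 'event_type' come from the
    # surrounding parsing code in this diff.
    classifiers = {'event': event_type,
                   'command': words[1],
                   'dso': words[2].strip('[]')}
    context.add_metric('perf/{}'.format(event_type), count, '%',
                       classifiers=classifiers)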

+                            temp = words[0]
+                            count = temp.strip('%')
Contributor:
temp is not necessary here. Also, a metric value should be a number, not a string:

    count = numeric(words[0].strip('%'))

(Note: numeric must be imported from wa.utils.types.)

Author:
Thanks, changed in latest version. (I initially thought about doing this, but expected it to be written out as a string again, so it didn't seem necessary.)
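Concretely, with the import the reviewer notes (a sketch):

    from wa.utils.types import numeric

    count = numeric(words[0].strip('%'))  # e.g. '41.86%' -> 41.86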

+                            context.add_metric(metric, count, '%')
+
+    def update_output(self, context):
Contributor:
This should go above the internal methods.

Author:
Thanks, changed in latest version.

+        self.logger.info('Extracting reports from target...')
Contributor:
Given the logging inside the _stat and _record methods, this is redundant. Please keep info-level logging to a minimum; this should be converted to debug.

Author:
Thanks, changed in latest version.


+        # Extract data for 'perf stat'
+        if self.perf_mode == 'stat':
+            self._update_output_stat(context)
+        # Extract data for 'perf record'
+        elif self.perf_mode == 'record':
+            self._update_output_record(context)

Contributor:
Suggested change:

    # else: fail silently

You could use a dictionary of functions (a method-local dictionary of instance methods, a class-attribute dictionary of class methods, ... your call), accessed with self.perf_mode as a key: if it's not present, a KeyError (IIRC) is raised. Furthermore, update_output becomes "subcommand-agnostic". This would even (depending on the implementation) allow assert-based unit testing by comparing the allowed_values of perf_mode with the keys() of this dictionary, programmatically expressing the link between the supported perf subcommands and the need for an implemented _update_output_<...> method, which is currently implicit. Finally, it's more pythonic than this switch-like use of if-elif-elif-... .

(I know that self.perf_mode has been sanitized, but I still believe there's value in this change.)
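A sketch of the dispatch-dictionary approach described above (method names taken from this diff; the KeyError behaviour replaces the silent fall-through):

    def update_output(self, context):
        handlers = {'stat': self._update_output_stat,
                    'record': self._update_output_record}
        # An unhandled mode raises KeyError instead of failing silently.
        handlers[self.perf_mode](context)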

     def teardown(self, context):
         self.collector.reset()