Placement of functions in application binary significantly affects performance #324

@j-piecuch

We found that the placement of functions in the binary can significantly affect performance because of instruction cache misses.
We used the Arty board for testing, with the slim+cfu CPU variant (2KiB direct-mapped icache, line size 32B).

As an example, in the hps_accel project, we can force instruction cache misses in the ConvPerChannel4x4() function by placing the LoadInput() function at an address that maps to the same cache set as its call site in ConvPerChannel4x4(). This can be accomplished by modifying the common/ld/linker.ld linker script as follows:

    .text :
    {
        _ftext = .;
        *(.text.start)
        *(.text.*ConvPerChannel4x4*)
        . = ALIGN(2048);
        . = . + 0x220;
        *(.text.*LoadInput*)
        *(.text .stub .text.* .gnu.linkonce.t.*)
        _etext = .;
    } > main_ram

With this change, running the person detection model on cat input takes 203M cycles.
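
To make the `ALIGN(2048)` and `0x220` values easier to follow, here is a minimal sketch of how a 2KiB direct-mapped cache with 32B lines maps addresses to sets. The addresses used are hypothetical (the real call-site address and the main_ram base are not stated here); the only assumption that matters is that the region base is a multiple of 2048, so an offset of `0x220` past a 2048-byte boundary lands in set 17.

```python
CACHE_SIZE = 2048                    # 2KiB direct-mapped icache
LINE_SIZE = 32                       # bytes per cache line
NUM_SETS = CACHE_SIZE // LINE_SIZE   # one line per set in a direct-mapped cache


def cache_set(addr: int) -> int:
    """Return the cache set that the line containing addr maps to."""
    return (addr // LINE_SIZE) % NUM_SETS


def conflicts(addr_a: int, addr_b: int) -> bool:
    """True if the two addresses are on different lines that share a set."""
    different_lines = addr_a // LINE_SIZE != addr_b // LINE_SIZE
    return different_lines and cache_set(addr_a) == cache_set(addr_b)


# Hypothetical addresses, for illustration only.
BASE = 0x40000000                    # assumed region base, a multiple of 2048
call_site = BASE + 0x230             # assumed call to LoadInput()
load_input = BASE + 0x800 + 0x220    # ALIGN(2048), then . = . + 0x220

print(cache_set(call_site), cache_set(load_input))
print(conflicts(call_site, load_input))
```

Under these assumptions both addresses fall in the same set, so each call evicts the caller's line and each return evicts the callee's line.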

On the other hand, we can prevent cache misses by putting the two functions right next to each other, like so:

    .text :
    {
        _ftext = .;
        *(.text.start)
        *(.text.*ConvPerChannel4x4*)
        *(.text.*LoadInput*)
        *(.text .stub .text.* .gnu.linkonce.t.*)
        _etext = .;
    } > main_ram

With this change, the same model on the same input takes 188M cycles, about 7% fewer than the 203M cycles of the conflicting layout.
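
The reason adjacent placement helps can be sketched by looking at whole function extents rather than single addresses: two functions laid out back to back cannot map to the same set as long as their combined size fits within the 2KiB cache. The start addresses and sizes below are hypothetical, not measured from the hps_accel binary.

```python
CACHE_SIZE = 2048
LINE_SIZE = 32
NUM_SETS = CACHE_SIZE // LINE_SIZE


def lines_covered(start: int, size: int) -> range:
    """Cache-line numbers touched by a function at [start, start + size)."""
    return range(start // LINE_SIZE, (start + size - 1) // LINE_SIZE + 1)


def conflicting_sets(func_a, func_b) -> set:
    """Sets in which the two functions occupy different lines."""
    found = set()
    for line_a in lines_covered(*func_a):
        for line_b in lines_covered(*func_b):
            if line_a != line_b and line_a % NUM_SETS == line_b % NUM_SETS:
                found.add(line_a % NUM_SETS)
    return found


# Hypothetical (start, size) pairs, for illustration only.
conv = (0x40000100, 0x300)             # stands in for ConvPerChannel4x4()
load_adjacent = (0x40000400, 0x80)     # LoadInput() placed right after it
load_forced = (0x40000A20, 0x80)       # LoadInput() after ALIGN(2048) + 0x220

print(sorted(conflicting_sets(conv, load_adjacent)))
print(sorted(conflicting_sets(conv, load_forced)))
```

With these made-up sizes the adjacent layout shares no sets, while the forced layout overlaps in four of them.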

Note that on the Arty, the application code is in RAM, so fetching a cache line is cheaper than fetching it from flash. On HPS hardware, the code is in flash, so the performance difference would likely be significantly larger.

The obvious way to prevent the linker from causing cache conflicts is to modify the linker script, as shown by the second example. This modification should probably be project-specific, which can be done by making the LDSCRIPT variable overridable and overriding it in the projects that need it. PR #323 does this for the hps_accel project.
