Make GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES true by default #1228

bradgrantham-lunarg · 2023-08-16T22:40:11Z

If the option GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES is not set, is it really possible to get corruption? Or has PageGuard tracking improved so that we don't have missing updates for overlapping allocations? I don't know enough about the implementation to say. If so, shouldn't we make GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES the default?

The text was updated successfully, but these errors were encountered:

panos-lunarg · 2023-08-21T16:08:19Z

is it really possible to get corruption?

Yes.
Imagine you have two buffers, let's say one is an mvp matrix that you update each frame from the host.
The other one is a single u32 value that the get updated by the gpu and the hosts reads it every frame.
Lets say the application binds those two buffer next to each other in the same device memory. One is 64 bytes and the other one is 4 bytes. Since GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES is false, depending on the driver, the buffers can actually be bound right next to each other, without any space between them. First you update the mvp and then you read the u32. That read will actually memcpy a whole page from the mapped memory overwriting the mvp.

Setting GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES to true will "fool" the application that each buffer needs a whole page of memory so the two buffers will have enough space between them.

Or has PageGuard tracking improved so that we don't have missing updates for overlapping allocations?

Page guard sees mappings and is not aware what's inside those mappings. So I don't think it's going to be some straightforward change to make page guard aware of that.

I guess the logic behind having GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES false by default is so that the application's commands are recorded with as little modification as possible.

I understand the default false is a source of problems with the CI which leads to a lot of wasted time. I agree with setting this to true by default.

davidlunarg · 2023-08-21T16:44:11Z

If an app creates a lot of small buffers, having a default value of true for GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES would result in more buffer memory usage. If an app creates lots of small buffers, each one would be a full page in size, whereas if it were set to false the buffers would all be compacted together in adjacent memory with many on the same page. Setting it to false results in less buffer memory usage. But setting it to true results in correct captures. I think correct is more important than efficient.

Is it the case that a more descriptive name for this variable might be GFXRECON_PAGE_GUARD_ONE_BUFFER_PER PAGE? (Not that I'm proposing a name change, just trying to clarify in my mind how the variable affects behavior.)

panos-lunarg · 2023-08-22T07:17:31Z

No I think GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES is the right name. When true the adjusting is done when a vkGet(Device)[Buffer|Image]MemoryRequirements(2) is called and it adjusts the size and alignment reported by these functions by rounding them up to the next multiple of the system's page size.

For example:

void PostProcess_vkGetBufferMemoryRequirements(VkDevice              device,
                                                   VkBuffer              buffer,
                                                   VkMemoryRequirements* pMemoryRequirements)
    {
        GFXRECON_UNREFERENCED_PARAMETER(device);
        GFXRECON_UNREFERENCED_PARAMETER(buffer);

        if ((GetMemoryTrackingMode() == CaptureSettings::MemoryTrackingMode::kPageGuard) &&
            GetPageGuardAlignBufferSizes() && (pMemoryRequirements != nullptr))
        {
            util::PageGuardManager* manager = util::PageGuardManager::Get();
            assert(manager != nullptr);

            GFXRECON_CHECK_CONVERSION_DATA_LOSS(size_t, pMemoryRequirements->size);
            GFXRECON_CHECK_CONVERSION_DATA_LOSS(size_t, pMemoryRequirements->alignment);

            pMemoryRequirements->size = manager->GetAlignedSize(static_cast<size_t>(pMemoryRequirements->size));
            pMemoryRequirements->alignment =
                manager->GetAlignedSize(static_cast<size_t>(pMemoryRequirements->alignment));
        }
    }

per-mathisen-arm · 2023-08-22T09:14:46Z

FWIW, I support having this on by default. I would propose naming it GFXRECON_PAGE_GUARD_OBJECT_PAGE_ALIGNMENT though, since it isn't just about buffers and the point is the page alignment, not the padding in itself. And do you really need to modify the size parameter as well as the alignment parameter?

bradgrantham-lunarg · 2023-08-24T16:48:18Z

After thinking about it, I'm a little concerned that making ALIGN_BUFFER_SIZES the default will hide the application's original intent, which was to pack buffers tightly. Being able to capture that in the file encodes the application's intent, and so if the application's behavior precipitates a driver or hardware bug then the capture file won't be able to reproduce it and someone analyzing the capture file won't see it.

That read will actually memcpy a whole page from the mapped memory overwriting the mvp.

@panos-lunarg Do you think we could work around that somehow? Like mark buffers that share pages so that we repair the other buffers within the page after reads or some such?

panos-lunarg · 2023-08-25T11:00:54Z

The logic behind page_guard_manager (whether it uses mprotect or uffd) does not know about what is bound to a memory. It only cares about which portion of that memory is mapped.

I think it can be modified to handle that but it's not going to be a straightforward/simple change.

bradgrantham-lunarg · 2023-08-25T22:00:06Z

If we don't set it to be the default, we should potentially set it in CI more frequently than we do.

andrew-lunarg · 2023-11-10T00:38:05Z

I would say we should make it the default and perhaps add a little documentation section on tweaking for fidelity for advanced users. I keep coming back to the first experience of a new user. As developers of the tool its hard to appreciate the user experience of trying us out and having a bunch of arcane environment variables and flags to twiddle to get going. For that person, they have no idea whether we are just completely broken: they won't have the confidence that one of these tweaks is going to work. Let's get them up and running and then they can dial in settings for specific needs like ultimate fidelity later.

That docs section:

Guide to Maximum Fidelity in Capture and Replay

Warning, this section covers features for advanced users with very specific needs. These features may make it much harder to achieve a working capture and may require considerable tweaking and you may ultimately find that a correct capture is impossible with them due to the usage of the target API by the application to be captured. ... explanation ensues

bradgrantham-lunarg · 2024-03-06T17:45:02Z

Fixed by #1431

bradgrantham-lunarg · 2024-08-09T16:07:57Z

A note. When an application uses VkMemoryDedicatedAllocateInfo and the developer captures with GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES, replay with --validate will trigger VUID-VkMemoryDedicatedAllocateInfo-image-02964.

During capture with GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES , we pad the size returned from vkGetMemoryRequirements to a page, so the application allocates this altered size. If VVL is run in capture "upchain" of GFXR (before GFXR in the layer list), then as far as VVL can tell, the implementation returned a size and the application allocated that size, so it honored 02964.

But then in replay with --validate, VVL will see the implementation's original size and the captured altered allocation size and trigger 02964.

However, because the application did not allocate the implementation's stated size the app technically used Vulkan in an invalid way in capture but only because GFXR capture told it to.

If the application could be captured successfully with no rendering errors or crash, then this VVL message is probably harmless.

bradgrantham-lunarg added the P1 Prevents an important capture from being replayed label Oct 31, 2023

bradgrantham-lunarg changed the title ~~Should GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES be the default?~~ Make GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES true by default Feb 29, 2024

bradgrantham-lunarg assigned nickdriscoll-lunarg Feb 29, 2024

nickdriscoll-lunarg mentioned this issue Feb 29, 2024

Make GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES true by default #1431

Merged

bradgrantham-lunarg closed this as completed Mar 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES true by default #1228

Make GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES true by default #1228

bradgrantham-lunarg commented Aug 16, 2023

panos-lunarg commented Aug 21, 2023

davidlunarg commented Aug 21, 2023

panos-lunarg commented Aug 22, 2023

per-mathisen-arm commented Aug 22, 2023

bradgrantham-lunarg commented Aug 24, 2023

panos-lunarg commented Aug 25, 2023

bradgrantham-lunarg commented Aug 25, 2023

andrew-lunarg commented Nov 10, 2023

bradgrantham-lunarg commented Mar 6, 2024

bradgrantham-lunarg commented Aug 9, 2024

Make GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES true by default #1228

Make GFXRECON_PAGE_GUARD_ALIGN_BUFFER_SIZES true by default #1228

Comments

bradgrantham-lunarg commented Aug 16, 2023

panos-lunarg commented Aug 21, 2023

davidlunarg commented Aug 21, 2023

panos-lunarg commented Aug 22, 2023

per-mathisen-arm commented Aug 22, 2023

bradgrantham-lunarg commented Aug 24, 2023

panos-lunarg commented Aug 25, 2023

bradgrantham-lunarg commented Aug 25, 2023

andrew-lunarg commented Nov 10, 2023

Guide to Maximum Fidelity in Capture and Replay

bradgrantham-lunarg commented Mar 6, 2024

bradgrantham-lunarg commented Aug 9, 2024