Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pipeline cache data handling issue #1414

Merged
merged 8 commits into from
Mar 15, 2024

Conversation

mizhen
Copy link
Contributor

@mizhen mizhen commented Jan 29, 2024

The problem

Some title proceeds pipeline creation call with pipeline cache object. Foe such title, playback may experience performance hit or failure if the pipeline cache object state is different between capture and playback.

For example, during capture time, pipeline creation command PCC-a was called with a pipeline cache object PCO-a, therefore a compile operation CO-a is not needed because the pipeline cache data can be used for it. During playback time, if the state of the pipeline cache object PCO-a was fully restored, the compile operation CO-a should be also not needed. But there are some cases which make the playback time state of the pipeline cache object PCO-a is different with capture time due to pipeline cache object missing related data, such difference may trigger compile operation CO-a during playback time.

For all following cases, playback may experience performance hit due to extra compile operation in pipeline creation or fail if some pipeline creation command has VK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT in create info.

  1. Bug with --opcd option (even with same driver and platform)

    If user start playback with --opcd option, all pipeline cache data will be omitted, so the pipeline cache state will be different between capture and playback.

  2. Bug with trim handling for pipeline cache data (even with same driver and platform)

    In current trim handling, the original initial cache data is used to restore the pipeline cache object. Pipeline cache contents can be modified by some commands such as pipeline creation or pipeline cache merging, so the state of pipeline cache object is not same with the state when it was initially created. So, the handling may cause restored pipeline cache object state is not same with the state at trim starting point.

  3. Unsupported case for driver or platform change

    Current vkCreatePipelineCache handling directly use capture time pipeline cache data during playback time. Capture time pipeline cache data might be incompatible with replay time because of driver version or platform change, for such case, driver will ignore the incompatible cache data and it make the pipeline cache object state is not same.

The solution

The pull request tracks the pipeline cache data queried from vkGetPipelineCacheData and try to find the corresponding replay-time cache data, use it in vkCreatePipelineCache or clear VK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT if it cannot find corresponding replay time data. The pull request also clears the VK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT If user start playback with --opcd option.

Result
Tested target title full trace and trim trace capture/playback with the build of the pull request, also tested playback user provided previously failed trace files, all tests passed, the commit fixed tVK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT if it cannot find corresponding replay time data. The pull request also clears the VK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT If user start playback with --opcd option.

Result
Tested target title full trace and trim trace capture/playback with the build of the pull request, also tested playback user provided previously failed trace files, all tests passed, the commit fixed the above issue.
he above issue.

If pipeline cache data is omitted, it might trigger compile operation
during replay time pipeline creation which may not exist during capture
time. For such case, if VK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT
is set in pipeline creation, playback will fail. This commit fixed the
issue.
Capture time pipeline cache data might be incompatible with replay time
because of driver version or platform change, so directly using capture
time pipeline cache data in replay time vkCreatePipelineCache call may
result in the data be ignored by driver. For such case, pipeline creation
may fail if the createinfo has the flag of
VK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT. The commit fixed
the issue.
Pipeline cache contents can be modified by some commands such as pipeline
creation and merging pipeline cache, so restoring pipeline cache state of
trim starting point is needed for trim handling, otherwise playback may
fail if some pipeline creation command has
VK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT in its create
info and the pipeline creation trigger compile during playback.
Made some code optimization to address code review comment.
Make the changes as auto cache_data_size to address code review comment.
Add some code optimization change to address code review comment.
@ci-tester-lunarg
Copy link

CI gfxreconstruct build queued with queue ID 124386.

@ci-tester-lunarg
Copy link

CI gfxreconstruct build # 3708 running.

@ci-tester-lunarg
Copy link

CI gfxreconstruct build # 3708 passed.

Comment on lines 5024 to 5029
if (iterator != pipeline_cache_info->pipeline_cache_data.end())
{
// We found pipeline cache data vector which has same hash value, will continue to check if it has
// same capture time pipeline cache data.
auto& cache_data = const_cast<PipelineCacheInfo*>(pipeline_cache_info)
->pipeline_cache_data[capture_pipeline_cache_data_hash_];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a found iterator. I think we could use iterator->second, instead of pipeline_cache_data[capture_pipeline_cache_data_hash].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 4928 to 4935
if (iterator != pipeline_cache_info->pipeline_cache_data.end())
{
// We found some existing pipeline cache data has same hash, so we continue to check if it's same with
// current pipeline cache data

const std::vector<PipelineCacheData>& cache_data =
const_cast<PipelineCacheInfo*>(pipeline_cache_info)
->pipeline_cache_data[capture_pipeline_cache_data_hash];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a found iterator. I think we could use iterator->second, instead of pipeline_cache_data[capture_pipeline_cache_data_hash].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +1229 to +1238
bool omitted_pipeline_cache_data_;

// Temporary data used by pipeline cache data handling
// The following capture time data used for calling VisitPipelineCacheInfo as input parameters
// , replay time data used as output result.
uint32_t capture_pipeline_cache_data_hash_ = 0;
uint32_t capture_pipeline_cache_data_size_ = 0;
void* capture_pipeline_cache_data_;
bool matched_replay_cache_data_exist_ = false;
std::vector<uint8_t> matched_replay_cache_data_;
Copy link
Contributor

@locke-lunarg locke-lunarg Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the program has plural PipelineCaches, does it happen an override issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the data is used for one pipeline cache process per time, therefore no override issue.

Change to use iterator::second to address code review comment.
@ci-tester-lunarg
Copy link

CI gfxreconstruct build queued with queue ID 144464.

@ci-tester-lunarg
Copy link

CI gfxreconstruct build # 3811 running.

@ci-tester-lunarg
Copy link

CI gfxreconstruct build # 3811 passed.

// 2. Replay without command line option --opcd or --omit-pipeline-cache-data and there is
// at least one vkCreatePipelineCache call with valid initial pipeline cache data and
// the initial cache data has no corresponding replay time cache data.
bool omitted_pipeline_cache_data_;
Copy link
Contributor

@locke-lunarg locke-lunarg Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omitted_pipeline_cache_data_ set true when replay_create_info->initialDataSize != 0 in OverrideCreatePipelineCache. If it has plural CreatePipelineCache, and some are replay_create_info->initialDataSize != 0, but some are replay_create_info->initialDataSize = 0, they should have different operation for AllowCompileDuringPipelineCreation. Or maybe we don't need to care it since it doesn't hurt?

Copy link
Contributor Author

@mizhen mizhen Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are the following cases in OverrideCreatePipelineCache:

User start replay with omit_pipeline_cache_data option

    A> options_.omit_pipeline_cache_data and pipeline cache created with initialDataSize==0

    B> options_.omit_pipeline_cache_data and pipeline cache created with initialDataSize!=0, omitted_pipeline_cache_data_ set true 

User start replay without omit_pipeline_cache_data option

    C> !options_.omit_pipeline_cache_data and pipeline cache created with initialDataSize==0

    D> !options_.omit_pipeline_cache_data and pipeline cache created with initialDataSize!=0 and pInitialData can be mapped by this PR

    E> !options_.omit_pipeline_cache_data and pipeline cache created with initialDataSize!=0 and pInitialData can not be mapped by this PR, omitted_pipeline_cache_data_ set true 

In the handling of this PR, once any vkCreatePipelineCache command of B> or E> show up, for all following pipeline creation, the handling of AllowCompileDuringPipelineCreation will be added.

Compared with every pipeline cache has different operation for AllowCompileDuringPipelineCreation, we saved the cache usage tracking handling (such tracking need consider merging cache), and the running difference is:

AllowCompileDuringPipelineCreation was added to some pipeline creation call which is not really needed, this includes those pipeline cache which is successfully mapped.

I think we don't need to care of the difference. The reason being:

Suppose a pipeline creation call X, its pipeline cache is successfully mapped (we don't need to consider map failed, because for the case we must add AllowCompileDuringPipelineCreation, otherwise playback will fail), we can see creation compile operation is same if compare the pipeline creation with or without AllowCompileDuringPipelineCreation:

1> Pipeline creation with handling of AllowCompileDuringPipelineCreation, the command will proceed the pipeline creation without re-compile.
2> Pipeline creation without handling of AllowCompileDuringPipelineCreation, the command will proceed the pipeline creation without re-compile.

@locke-lunarg locke-lunarg self-requested a review March 15, 2024 19:30
@locke-lunarg
Copy link
Contributor

Thank you for fixing.

@locke-lunarg locke-lunarg merged commit 3097563 into LunarG:dev Mar 15, 2024
8 checks passed
else
{
GFXRECON_LOG_WARNING(
"There's initial pipeline cache data in VkPipelineCacheCreateInfo, but no corresponding "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mizhen @locke-lunarg Is this really an exceptional condition? If the application calls vkCreatePipelineCache with initial data passed in (e.g. from a file) before creating the Pipeline and getting the cache data, wouldn't this happen to any app that loads cache data from a file and passes into the pipeline cache? This feels like normal operation. I wrote a quick test into cube.c passing in some junk as pInitialData and replay printed this message. Can we remove this warning? (also @lunarpapillo FYI)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is normal operation for any app that loads a previous cache from disk, but here's a PR to make it a DEBUG message. #1527

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, once the warning shows up, it means if playback on different hardware/driver, playback performance may have difference with capture time because of extra compile in pipeline creation.
Because if an application load cache data from file, that mean these cache data come from the title previous run, we cannot get its corresponding replay time cache data. When utilizing the cache data, two scenarios typically arise:

  1. The cache data is compatible with current hardware/driver, the cache data will be used.
  2. The cache data is not compatible with current hardware/driver, the cache data will be ignored by driver, the following pipeline creations with the cache object may trigger compile.
    As for the second case, replay performance will have some difference with capture time because of the extra compile. of course, for any case, it doesn't cause playback failure, I agree change to GFXRECON_LOG_DEBUG or GFXRECON_LOG_WARNING_ONCE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's going to behave as if the user changed the GPU in the machine and ran the original app?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants