Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ArchGetFileName for windows to return full path #3361

Merged

Conversation

roggiezhang-nv
Copy link
Contributor

@roggiezhang-nv roggiezhang-nv commented Oct 11, 2024

Description of Change(s)

In a fix to improve Alembic plugin to read assets with ArResolver to support other sources except local files (#3302), Alembic library provides 4 APIs: one of them supports to read from local paths and the other 3 are using istream interfaces which has flaws to support layered ABC files. Therefore, we can only use the API that supports to read from local paths. While it involves to get the mirroed local path from ArAsset, and that's ArchGetFileName supposed to work for.

However, ArchGetFileName only returns paths without drive letter, which is relative path related to the current drive the application is running from. This PR is to improve ArchGetFileName to return full path from FILE*. Here is an example before and after this PR:

Assuming the file path behind the FILE* handle is C:/path/includes/drive/letter.ext, and the application is running from D drive.
// Before --
ArchGetFileName(file_handle) => "/path/includes/drive/letter.ext"
// After --
ArchGetFileName(file_handle) => "C:/path/includes/drive/letter.ext"

The path returned from old API is /path/includes/drive/letter.ext, which is corresponding to D:/path/includes/drive/letter.ext in full path, which is wrong.

Fixes Issue(s)

  • I have verified that all unit tests pass with the proposed changes
  • I have submitted a signed Contributor License Agreement

@roggiezhang-nv roggiezhang-nv changed the base branch from release to dev October 11, 2024 05:10
@roggiezhang-nv
Copy link
Contributor Author

@nvmkuruc @nvidia-jomiller for vis.

// Open a file, check that the file path from FILE* handle is matched.
ARCH_AXIOM((firstFile = ArchOpenFile(firstName.c_str(), "rb")) != NULL);
std::string filePath = ArchGetFileName(firstFile);
ARCH_AXIOM(_DosDevicePathFilter(filePath) == _DosDevicePathFilter(firstName));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would comparing ArchAbsPath be sufficient instead of _DosDevicePathFilter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't work to me.

@jesschimein
Copy link
Collaborator

Filed as internal issue #USD-10302

@jesschimein
Copy link
Collaborator

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jesschimein
Copy link
Collaborator

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@anwang2009
Copy link
Contributor

Thanks for the PR @roggiezhang-nv !

testFileSystem with these changes is failing on both macos and windows - could you address this? Here's the offending failed test and context on windows.:

72: first name C:\Windows\TEMP/archFS.12176
72: filePath C:\Windows\Temp\archFS.12176
72: ArchNormPath(first name) C:/Windows/TEMP/archFS.12176
72: ArchNormPath(filePath) C:/Windows/Temp/archFS.12176
72:  ArchError: [ArchNormPath(filePath) == ArchNormPath(firstName)] axiom failed

@nvmkuruc
Copy link
Collaborator

nvmkuruc commented Nov 6, 2024

@roggiezhang-nv Maybe the test should compare ArchAbsPath to get rid of any symlink?

https://openusd.org/dev/api/group__group__arch___system_functions.html#gab146e5fb7a6c2f805936960bc58e3911 suggests it should be returning the canonical path.

@roggiezhang-nv
Copy link
Contributor Author

Seems so. I'll fix it.

@roggiezhang-nv
Copy link
Contributor Author

Replaced. Let's see how it goes.

@jesschimein
Copy link
Collaborator

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@anwang2009
Copy link
Contributor

By the way @roggiezhang-nv - the test is still failing on Windows with similar output as I've commented above. Is it working on your Windows build?

@roggiezhang-nv
Copy link
Contributor Author

By the way @roggiezhang-nv - the test is still failing on Windows with similar output as I've commented above. Is it working on your Windows build?

@anwang2009 Yes, it's working for my machine. The issue is that we don't have a way to return a uniform path on windows to check the equality of paths. Is it ok to use the uppercase or lowercase of paths on windows to compare?

@roggiezhang-nv
Copy link
Contributor Author

Btw, @anwang2009, @nvmkuruc wants me to compare the perf between GetFinalPathNameByHandle (new) and GetFileInformationByHandleEx (old) to give more confidence.

I called each of them 100000 times, and GetFinalPathNameByHandle is indeed twice slower than GetFileInformationByHandleEx. However, it costs only 6us for GetFinalPathNameByHandle compared to 3us for GetFileInformationByHandleEx in each call, so the perf influence can be ignored.

Let me if I can simply the uppercase or lowercase to compare the paths for windows. Or you have any new suggestions.

@roggiezhang-nv
Copy link
Contributor Author

Btw, I fixed the path compare issue with std::filesystem::equivalent, which can handle cases.

@roggiezhang-nv
Copy link
Contributor Author

roggiezhang-nv commented Nov 20, 2024

@anwang2009 I also need to correct my number as I passed a wrong param which is not similar to the real implementation to GetFinalPathNameByHandle. In reality, it's 5-6 times slower than old implementation (windows only, but not Linux and Mac) because of GetFinalPathNameByHandle. So it's 15us per call for ArchGetFileName this time on my machine. Not sure this is an acceptable cost.

@roggiezhang-nv roggiezhang-nv force-pushed the improve_arch_get_file_name branch from 4c6132d to 3b41ea1 Compare November 20, 2024 14:22
@meshula
Copy link
Member

meshula commented Nov 20, 2024

I haven't spotted it being discussed elsewhere, so I'll note here, forgive the dry tone, and ignore as you wish :)

GetFinalPathNameByHandle resolves the final path of a file, including symbolic links, mount points, or junctions in addition to the work GetFileInformationByHandleEx might perform.

Given the reported benchmark result of resolving in microseconds, I have a suspicion the benchmark is of cached performance not unwarmed performance. Also, I think there might be a more "interesting" difference (ie, tens of milliseconds or more) if a the path needs resolving. but, it's an unavoidable penalty if we want correct operation.

A significant call point is in Crate, where it's going to be called once per asset so there'll be a general linear overhead. If it's really 16us, and you have a thousand files, it'll be 16ms, time lost in the noise.

If it's resolving across a network mount, right now it'll fail, and after a fix, the overhead will add up. But as I said, that's unavoidable.

Where the overhead would likely be significant is if someone wrote a loop to get info on all the files in a massive directory but that doesn't happen in the USD codebase itself.

@jesschimein
Copy link
Collaborator

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@roggiezhang-nv
Copy link
Contributor Author

roggiezhang-nv commented Nov 21, 2024

I haven't spotted it being discussed elsewhere, so I'll note here, forgive the dry tone, and ignore as you wish :)

GetFinalPathNameByHandle resolves the final path of a file, including symbolic links, mount points, or junctions in addition to the work GetFileInformationByHandleEx might perform.

Given the reported benchmark result of resolving in microseconds, I have a suspicion the benchmark is of cached performance not unwarmed performance. Also, I think there might be a more "interesting" difference (ie, tens of milliseconds or more) if a the path needs resolving. but, it's an unavoidable penalty if we want correct operation.

A significant call point is in Crate, where it's going to be called once per asset so there'll be a general linear overhead. If it's really 16us, and you have a thousand files, it'll be 16ms, time lost in the noise.

If it's resolving across a network mount, right now it'll fail, and after a fix, the overhead will add up. But as I said, that's unavoidable.

Where the overhead would likely be significant is if someone wrote a loop to get info on all the files in a massive directory but that doesn't happen in the USD codebase itself.

@meshula Thanks for your insights. Yes, it's cached result. For non-cached result, that's 75us versus 20us for both functions on my machine. And I agree it should also depend on whether it's symbolic links, etc.

A significant call point is in Crate, where it's going to be called once per asset so there'll be a general linear overhead. If it's really 16us, and you have a thousand files, it'll be 16ms, time lost in the noise.

That's one of the issue I saw from Crate implementation as the filename returned by the call is not used anywhere, which means we can reduce that cost from USD codebase. @nvmkuruc for this issue.

Copy link
Contributor

@anwang2009 anwang2009 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @roggiezhang-nv , the tests pass now. Here are a few notes:

string result;
if (GetFileInformationByHandleEx(
hfile, FileNameInfo, static_cast<void *>(fileNameInfo), bufSize)) {
WCHAR filePath[MAX_PATH];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs aren't super clear to me, but I think using MAX_PATH for the buffer size seems like it could be insufficient for Windows configurations where long paths are enabled.

GetFilePathNameByHandleW is supposed to return the required buffer size in the case where the filename would overflow the given buffer – it seems like we should be handling that somehow.

from @sunyab

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved.

CP_UTF8, 0, fileNameInfo->FileName,
fileNameInfo->FileNameLength/sizeof(WCHAR),
CP_UTF8, 0, filePath,
wcslen(filePath),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid calling wcslen by using the return value of GetFilePathNameByHandleW?

from @sunyab

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

// the path returned is DOS device path, and the
// syntax is one of:
// \\.\C:\Test\Foo.txt
// \\?\C:\Test\Foo.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment might be easier to understand as:

// Strip prefix from paths returned by GetFinalPathNameByHandleW,
// which is one of:
//   \\.\C:\Test\Foo.txt
//   \\?\C:\Test\Foo.txt

from @sunyab

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

result.compare(0, 4, "\\\\.\\") == 0)
{
result.erase(0, 4);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While doing some research, I saw that network paths on Windows will look like "\?\UNC...". I think we've run into issues with network file paths in the past, so we need to figure out whether we need to deal with this specially.

from @sunyab

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I treated it specially without stripping the prefix if it's UNC path.

@jesschimein
Copy link
Collaborator

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines +558 to +563

// Strip path prefix if necessary.
// See https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats
// for format of DOS device paths.
auto canonicalPath = std::filesystem::canonical(result);
result = canonicalPath.string();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should actually consider removing this entire block. According to this post from three years ago std::filesystem::canonical is identical to GetFinalPathNameByHandleW. Have you found differently? If not, we should just be able to return result directly instead of trying to canonicalize it again.

One thing to also note is that GetFinalPathNameByHandleW by default (with VOLUME_NAME_DOS) includes prefixes by default, and therefore so does std::filesystem::canonical, if the above post is correct. So the prefixes are not stripped in this current implementation, which seems fine as long as downstream use cases are handled correctly. Why did you want prefixes stripped originally?

Copy link
Contributor Author

@roggiezhang-nv roggiezhang-nv Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anwang2009 We just thought we should return a canonical path.

Not sure what you mean that std::filesystem::canonical is identical to GetFinalPathNameByHandleW. std::filesystem::canonical will strip prefixes if necessary to return a canonical path. For linux paths, they are the same, but not for windows paths.

\?\C:\test_files\main.usd <---- before std::filesystem::canonical is called.
C:\test_files\main.usd <---- after std::filesystem::canonical is called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the post must be out of date. Thanks @roggiezhang-nv !

@pixar-oss pixar-oss merged commit 3defe99 into PixarAnimationStudios:dev Dec 12, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants