-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve ArchGetFileName
for windows to return full path
#3361
Improve ArchGetFileName
for windows to return full path
#3361
Conversation
@nvmkuruc @nvidia-jomiller for vis. |
// Open a file, check that the file path from FILE* handle is matched. | ||
ARCH_AXIOM((firstFile = ArchOpenFile(firstName.c_str(), "rb")) != NULL); | ||
std::string filePath = ArchGetFileName(firstFile); | ||
ARCH_AXIOM(_DosDevicePathFilter(filePath) == _DosDevicePathFilter(firstName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would comparing ArchAbsPath
be sufficient instead of _DosDevicePathFilter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't work to me.
Filed as internal issue #USD-10302 |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
Thanks for the PR @roggiezhang-nv !
|
@roggiezhang-nv Maybe the test should compare https://openusd.org/dev/api/group__group__arch___system_functions.html#gab146e5fb7a6c2f805936960bc58e3911 suggests it should be returning the canonical path. |
Seems so. I'll fix it. |
Replaced. Let's see how it goes. |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
By the way @roggiezhang-nv - the test is still failing on Windows with similar output as I've commented above. Is it working on your Windows build? |
@anwang2009 Yes, it's working for my machine. The issue is that we don't have a way to return a uniform path on windows to check the equality of paths. Is it ok to use the uppercase or lowercase of paths on windows to compare? |
Btw, @anwang2009, @nvmkuruc wants me to compare the perf between I called each of them 100000 times, and Let me if I can simply the uppercase or lowercase to compare the paths for windows. Or you have any new suggestions. |
Btw, I fixed the path compare issue with |
@anwang2009 I also need to correct my number as I passed a wrong param which is not similar to the real implementation to GetFinalPathNameByHandle. In reality, it's 5-6 times slower than old implementation (windows only, but not Linux and Mac) because of GetFinalPathNameByHandle. So it's 15us per call for ArchGetFileName this time on my machine. Not sure this is an acceptable cost. |
4c6132d
to
3b41ea1
Compare
I haven't spotted it being discussed elsewhere, so I'll note here, forgive the dry tone, and ignore as you wish :) GetFinalPathNameByHandle resolves the final path of a file, including symbolic links, mount points, or junctions in addition to the work GetFileInformationByHandleEx might perform. Given the reported benchmark result of resolving in microseconds, I have a suspicion the benchmark is of cached performance not unwarmed performance. Also, I think there might be a more "interesting" difference (ie, tens of milliseconds or more) if a the path needs resolving. but, it's an unavoidable penalty if we want correct operation. A significant call point is in Crate, where it's going to be called once per asset so there'll be a general linear overhead. If it's really 16us, and you have a thousand files, it'll be 16ms, time lost in the noise. If it's resolving across a network mount, right now it'll fail, and after a fix, the overhead will add up. But as I said, that's unavoidable. Where the overhead would likely be significant is if someone wrote a loop to get info on all the files in a massive directory but that doesn't happen in the USD codebase itself. |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
@meshula Thanks for your insights. Yes, it's cached result. For non-cached result, that's 75us versus 20us for both functions on my machine. And I agree it should also depend on whether it's symbolic links, etc.
That's one of the issue I saw from Crate implementation as the filename returned by the call is not used anywhere, which means we can reduce that cost from USD codebase. @nvmkuruc for this issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @roggiezhang-nv , the tests pass now. Here are a few notes:
pxr/base/arch/fileSystem.cpp
Outdated
string result; | ||
if (GetFileInformationByHandleEx( | ||
hfile, FileNameInfo, static_cast<void *>(fileNameInfo), bufSize)) { | ||
WCHAR filePath[MAX_PATH]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs aren't super clear to me, but I think using MAX_PATH for the buffer size seems like it could be insufficient for Windows configurations where long paths are enabled.
GetFilePathNameByHandleW is supposed to return the required buffer size in the case where the filename would overflow the given buffer – it seems like we should be handling that somehow.
from @sunyab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improved.
pxr/base/arch/fileSystem.cpp
Outdated
CP_UTF8, 0, fileNameInfo->FileName, | ||
fileNameInfo->FileNameLength/sizeof(WCHAR), | ||
CP_UTF8, 0, filePath, | ||
wcslen(filePath), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid calling wcslen by using the return value of GetFilePathNameByHandleW?
from @sunyab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
pxr/base/arch/fileSystem.cpp
Outdated
// the path returned is DOS device path, and the | ||
// syntax is one of: | ||
// \\.\C:\Test\Foo.txt | ||
// \\?\C:\Test\Foo.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment might be easier to understand as:
// Strip prefix from paths returned by GetFinalPathNameByHandleW,
// which is one of:
// \\.\C:\Test\Foo.txt
// \\?\C:\Test\Foo.txt
from @sunyab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
pxr/base/arch/fileSystem.cpp
Outdated
result.compare(0, 4, "\\\\.\\") == 0) | ||
{ | ||
result.erase(0, 4); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While doing some research, I saw that network paths on Windows will look like "\?\UNC...". I think we've run into issues with network file paths in the past, so we need to figure out whether we need to deal with this specially.
from @sunyab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I treated it specially without stripping the prefix if it's UNC path.
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
|
||
// Strip path prefix if necessary. | ||
// See https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats | ||
// for format of DOS device paths. | ||
auto canonicalPath = std::filesystem::canonical(result); | ||
result = canonicalPath.string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should actually consider removing this entire block. According to this post from three years ago std::filesystem::canonical
is identical to GetFinalPathNameByHandleW
. Have you found differently? If not, we should just be able to return result
directly instead of trying to canonicalize it again.
One thing to also note is that GetFinalPathNameByHandleW
by default (with VOLUME_NAME_DOS
) includes prefixes by default, and therefore so does std::filesystem::canonical
, if the above post is correct. So the prefixes are not stripped in this current implementation, which seems fine as long as downstream use cases are handled correctly. Why did you want prefixes stripped originally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anwang2009 We just thought we should return a canonical path.
Not sure what you mean that std::filesystem::canonical
is identical to GetFinalPathNameByHandleW
. std::filesystem::canonical
will strip prefixes if necessary to return a canonical path. For linux paths, they are the same, but not for windows paths.
\?\C:\test_files\main.usd <---- before std::filesystem::canonical
is called.
C:\test_files\main.usd <---- after std::filesystem::canonical
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, the post must be out of date. Thanks @roggiezhang-nv !
Description of Change(s)
In a fix to improve Alembic plugin to read assets with ArResolver to support other sources except local files (#3302), Alembic library provides 4 APIs: one of them supports to read from local paths and the other 3 are using istream interfaces which has flaws to support layered ABC files. Therefore, we can only use the API that supports to read from local paths. While it involves to get the mirroed local path from ArAsset, and that's
ArchGetFileName
supposed to work for.However,
ArchGetFileName
only returns paths without drive letter, which is relative path related to the current drive the application is running from. This PR is to improveArchGetFileName
to return full path fromFILE*
. Here is an example before and after this PR:Assuming the file path behind the FILE* handle is
C:/path/includes/drive/letter.ext
, and the application is running fromD
drive.// Before --
ArchGetFileName(file_handle) => "/path/includes/drive/letter.ext"
// After --
ArchGetFileName(file_handle) => "C:/path/includes/drive/letter.ext"
The path returned from old API is
/path/includes/drive/letter.ext
, which is corresponding toD:/path/includes/drive/letter.ext
in full path, which is wrong.Fixes Issue(s)