Add -EntryEncoding as a parameter on Expand-Archive cmdlet #88
Comments
Thanks for opening this issue; a fix is long overdue. As for:
While an
In fact, this logic seems to already be built into the (Strictly speaking, any UTF-8-encoded string is also a technically valid OEM-encoded string, so there is a hypothetical ambiguity; in practice, however, a valid UTF-8 byte sequence that yields a human-readable, intentional file name when interpreted as OEM-encoded is unlikely, and the .NET designers were apparently comfortable quietly resolving this ambiguity in favor of UTF-8.)

Note: I'm assuming that it is the active OEM code page that is to be used, not fixed code page

For instance, thanks to the try-UTF-8-first approach, the following sample command is capable of properly processing an archive

```powershell
# Windows PowerShell 5.1 may need the ZipFile type loaded explicitly.
Add-Type -AssemblyName System.IO.Compression.FileSystem
[System.IO.Compression.ZipFile]::ExtractToDirectory(
  "$pwd/test.zip",
  "$pwd",
  [System.Text.Encoding]::GetEncoding((Get-Culture).TextInfo.OEMCodePage)
)
```
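Whether an entry is subject to this ambiguity at all can be read from the archive bytes: PKWARE's APPNOTE.TXT defines general-purpose bit 11 (the "language encoding flag") to mark an entry name as UTF-8, and an entry without it is nominally in IBM code page 437, which is where the OEM-code-page assumption above comes in. The sketch below is a hypothetical helper (`Test-ZipFirstEntryUtf8Flag` is an invented name, not part of any module) that only inspects the first local file header, so it is a diagnostic aid rather than a validator.

```powershell
# Hypothetical helper, NOT part of Microsoft.PowerShell.Archive.
# Returns $true if the FIRST entry of a ZIP archive has general-purpose bit 11
# (UTF-8 entry name) set. Later entries are not inspected.
function Test-ZipFirstEntryUtf8Flag {
    param([Parameter(Mandatory)][string] $Path)

    # .NET resolves relative paths against the process working directory,
    # which can differ from PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $reader = [System.IO.BinaryReader]::new($stream)
        # Local file header layout (little-endian):
        #   signature (4 bytes) | version needed (2) | general-purpose flags (2) | ...
        if ($reader.ReadUInt32() -ne 0x04034b50) {
            throw "'$Path' does not start with a ZIP local file header."
        }
        $null = $reader.ReadUInt16()          # version needed to extract
        $flags = $reader.ReadUInt16()         # general-purpose bit flag
        return ($flags -band 0x0800) -ne 0    # bit 11 set: entry name is UTF-8
    }
    finally {
        $stream.Dispose()
    }
}
```

If this returns `$false` for an archive with non-ASCII names, the names are in some legacy code page, and only the consumer can know which one.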
I agree that if at all possible to improve on the default behavior, i.e. when the proposed
So, judging from the inactivity on this issue, my take as a user affected by this defect is that the current recommendation from Microsoft is: this is de facto accepted behavior of Expand-Archive, and you should use another tool for uncompressing files.
Summary of the new feature/enhancement
The `Expand-Archive` cmdlet does not currently allow defining the expected encoding of the file names in the archive to be expanded. This means the cmdlet cannot predictably be used to expand ZIP files created with tools other than PowerShell itself (meaning `Compress-Archive`).

For example, the `Expand-Archive` cmdlet cannot predictably unpack an archive created from Windows File Explorer (aka the Compressed Folders feature) unless that archive uses only 7-bit ASCII characters (codes 0-127) for its entry names. This is Windows not being compatible with Windows, and it should be fixed.

Proposed technical implementation details
Unfortunately, the ZIP spec does not define enough information for a consumer of an archive to reliably tell which encoding was used for the names of the entries in the archive. Therefore, the only possible solution is to ask the consumer what it should be.
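Asking the consumer could look like the following sketch of the `-EntryEncoding` parameter named in this issue's title. `Expand-ArchiveWithEncoding` is a hypothetical wrapper, not part of any module; it forwards the encoding to the existing .NET overload `[System.IO.Compression.ZipFile]::ExtractToDirectory(string, string, Encoding)`. As we read the .NET documentation, that encoding is applied only to entries whose UTF-8 flag is clear, while flagged entries are still decoded as UTF-8. Defaulting to the active OEM code page follows the assumption made in the discussion above, not documented Windows behavior.

```powershell
# Hypothetical sketch of the proposed -EntryEncoding parameter; this function is
# NOT part of the Microsoft.PowerShell.Archive module.
function Expand-ArchiveWithEncoding {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string] $Path,
        [Parameter(Mandatory)][string] $DestinationPath,
        # Encoding for entry names whose UTF-8 flag (bit 11) is clear.
        # Assumed default: the active OEM code page.
        [System.Text.Encoding] $EntryEncoding = [System.Text.Encoding]::GetEncoding((Get-Culture).TextInfo.OEMCodePage)
    )

    # Needed in Windows PowerShell 5.1; harmless if the assembly is already loaded.
    Add-Type -AssemblyName System.IO.Compression.FileSystem

    # .NET resolves relative paths against the process working directory, not
    # PowerShell's current location, so pass fully qualified paths.
    $source = (Resolve-Path -LiteralPath $Path).ProviderPath
    $null = New-Item -ItemType Directory -Path $DestinationPath -Force
    $target = (Resolve-Path -LiteralPath $DestinationPath).ProviderPath

    # Throws an IOException if an extracted file would overwrite an existing one.
    [System.IO.Compression.ZipFile]::ExtractToDirectory($source, $target, $EntryEncoding)
}
```

For an archive whose names are known to be in code page 850, a call might read `Expand-ArchiveWithEncoding -Path .\test.zip -DestinationPath .\out -EntryEncoding ([System.Text.Encoding]::GetEncoding(850))`.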
The proposed solution is to accept the encoding as a parameter of the `Expand-Archive` cmdlet, and to document what happens if that parameter is not specified (which is the current behavior). I suggest naming the parameter `EntryEncoding` to make it clear that it is about how the ZIP entry names are encoded, not about the encoding of the file contents or of the archive name itself.

Note: Overall I like what `Compress-Archive` is doing, consistently using UTF-8 for the file names, but the truth of the matter is that most PowerShell users will expect to also be able to use the `Expand-Archive` cmdlet to expand archives that were not created by PowerShell itself.

Test case
1. (simply using some examples from the Western European charset here)
2. Create a ZIP archive of these files using the Windows Compressed Folders feature (or 7-Zip, or any other ZIP tool for Windows, anything except PowerShell itself).
3. Attempt to unpack the archive from step 2 using the `Expand-Archive` cmdlet. The result should be that the file names from step 1 are preserved.
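The test case above can be approximated in a script. This is a sketch under two assumptions: that an archive written with code page 850 entry names and the UTF-8 flag clear behaves like one produced by Compressed Folders on a Western European system, and that the sample file names used here (hypothetical, since the original list from step 1 was not preserved) are representative. The .NET `ZipFile.CreateFromDirectory` overload that takes an `entryNameEncoding` is used to produce such an archive.

```powershell
# Sketch: reproduce the test case without Windows Explorer.
# Assumption: code page 850 names without the UTF-8 flag mimic Compressed Folders
# on a Western European system.
Add-Type -AssemblyName System.IO.Compression.FileSystem

$root = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$src  = Join-Path $root 'src'
$dst  = Join-Path $root 'dst'
$zip  = Join-Path $root 'test.zip'
$null = New-Item -ItemType Directory -Path $src, $dst

# Step 1: hypothetical sample names with Western European characters, built
# from code points so that the script file's own encoding does not matter.
$names = "M$([char]0xFC)ller.txt", "Caf$([char]0xE9).txt"   # Müller.txt, Café.txt
foreach ($name in $names) {
    Set-Content -LiteralPath (Join-Path $src $name) -Value $name
}

# Step 2: archive them with code page 850 entry names (UTF-8 flag clear).
$cp850 = [System.Text.Encoding]::GetEncoding(850)
[System.IO.Compression.ZipFile]::CreateFromDirectory(
    $src, $zip, [System.IO.Compression.CompressionLevel]::Optimal, $false, $cp850)

# Step 3: expand with the built-in cmdlet and compare the names.
# Compare-Object emits nothing when every name was preserved.
Expand-Archive -LiteralPath $zip -DestinationPath $dst
$actual = (Get-ChildItem -LiteralPath $dst).Name
Compare-Object -ReferenceObject $names -DifferenceObject $actual
```

Any output from the last command lists the names that did not survive the round trip.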