Description
Summary of the new feature/enhancement
The Expand-Archive cmdlet currently offers no way to specify the expected encoding of file names in the archive being expanded. This means the cmdlet cannot reliably be used to expand ZIP files created with tools other than PowerShell itself (that is, Compress-Archive).
For example, the Expand-Archive cmdlet cannot predictably unpack an archive created from Windows File Explorer (the Compressed Folders feature) unless the archive uses only 7-bit ASCII characters in its entry names. This is Windows not being compatible with Windows, and it should be fixed.
Proposed technical implementation details
Unfortunately, the ZIP spec does not give the consumer of an archive enough information to reliably tell which encoding was used for the names of the entries in the archive. Therefore, the only workable solution is to ask the consumer which encoding to use.
The proposed solution is to add an encoding parameter to the Expand-Archive cmdlet and to document what happens when the parameter is not specified, which would remain the current behavior. I suggest naming the parameter EntryEncoding to make it clear that it concerns how the names of the ZIP entries are encoded, not the encoding of the file content or of the archive name itself.
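To illustrate the kind of behavior such a parameter could expose, here is a minimal sketch built on the .NET `ZipFile.ExtractToDirectory` overload that accepts an entry-name encoding. The function name `Expand-ArchiveWithEntryEncoding` is hypothetical and exists only for this illustration; it is not how the cmdlet is implemented, and the real parameter design may differ.

```powershell
# Load the .NET ZIP types (needed on Windows PowerShell; harmless on newer versions).
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper, for illustration only: expands a ZIP archive while
# decoding entry names with the encoding chosen by the caller.
function Expand-ArchiveWithEntryEncoding {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $DestinationPath,
        [Parameter(Mandatory)] [System.Text.Encoding] $EntryEncoding
    )

    # .NET methods need file system paths, not PowerShell-relative ones.
    $source = (Resolve-Path -LiteralPath $Path).ProviderPath
    $target = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($DestinationPath)

    # The third argument tells .NET how to decode entry names that are
    # not flagged as UTF-8 inside the archive.
    [System.IO.Compression.ZipFile]::ExtractToDirectory($source, $target, $EntryEncoding)
}

# Example call (code page 850 is an assumption; use whatever the archive's creator used):
# Expand-ArchiveWithEntryEncoding -Path .\files.zip -DestinationPath .\out `
#     -EntryEncoding ([System.Text.Encoding]::GetEncoding(850))
```

Taking a `System.Text.Encoding` object rather than a fixed set of names keeps the sketch short; an actual cmdlet parameter might prefer to accept encoding names or code page numbers instead.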
Note: Overall I like what Compress-Archive does, consistently using UTF-8 for file names, but the reality is that most PowerShell users will expect to be able to use the Expand-Archive cmdlet to also expand archives that were not created by PowerShell itself.
Test case
1. Create some empty files with names such as Père-Noël.txt, Plankalkül.txt, and Ærø-Å.txt (just a few examples from the Western European character set).
2. Create a ZIP archive of these files using the Windows Compressed Folders feature (or 7-Zip, or any other ZIP tool for Windows; anything except PowerShell itself).
3. Attempt to unpack the archive from step 2 using the Expand-Archive cmdlet. The expected result is that the file names from step 1 are preserved.
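When a third-party ZIP tool is not at hand, the test case can be approximated in script. The sketch below writes an archive whose entry names are stored in a legacy code page, then reads the names back with the matching encoding. Code page 850 is an assumption chosen because it contains all the characters used above; real tools may use a different code page, so this is a stand-in for step 2 rather than a replacement for it.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Assumption: code page 850 stands in for whatever encoding the ZIP tool used.
$legacy  = [System.Text.Encoding]::GetEncoding(850)
$names   = 'Père-Noël.txt', 'Plankalkül.txt', 'Ærø-Å.txt'

# Work in a fresh temporary directory.
$workDir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$zipPath = Join-Path $workDir 'legacy.zip'
$null = New-Item -ItemType Directory -Path $workDir

# Create empty entries whose names are written using the legacy code page.
$zip = [System.IO.Compression.ZipFile]::Open($zipPath, 'Create', $legacy)
try {
    foreach ($name in $names) { $null = $zip.CreateEntry($name) }
}
finally {
    $zip.Dispose()
}

# Read the entry names back, decoding them with the same code page.
$zip = [System.IO.Compression.ZipFile]::Open($zipPath, 'Read', $legacy)
try {
    $roundTripped = @($zip.Entries | ForEach-Object { $_.FullName })
}
finally {
    $zip.Dispose()
}

# Lists any name that differs between the original and the round-tripped set.
Compare-Object $names $roundTripped
```

Running Expand-Archive against the resulting legacy.zip then exercises step 3 of the test case.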