Rescuing #83 (comment)
If export mode should continue to be supported (#230), this is something to consider in order to deliver any meaningful outcome for filenames with non-ASCII characters.
Unicode handling might even be needed outside export mode, if URL keys can contain such characters.
Confirmed:
The name field at the end has a format dependent on the backend. It is always the last field, and is prefixed with "--". Unlike other fields, it may contain "-" in its content. It should not contain newline characters or "/"; otherwise nearly anything goes. The "E" variants of hash keys include a filename extension after the hash.
Unicode handling is needed uniformly.
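To illustrate the quoted key format, here is a minimal sketch of pulling the name field out of a key. The helper name `split_key` is hypothetical, and the sketch assumes that the first "--" in a key always marks the start of the name field (the leading fields are taken to be a backend name plus numeric fields, so they should not contain "--" themselves).

```python
def split_key(key: str) -> tuple[str, str]:
    """Split a key into (leading fields, name field).

    Assumption: the first "--" starts the name field. The name itself
    may contain "-" (and even "--"), so we only split once.
    """
    fields, sep, name = key.partition("--")
    if not sep or not name:
        raise ValueError(f"no name field in key: {key!r}")
    return fields, name


# Made-up example key whose name contains "-" and a non-ASCII character
fields, name = split_key("WORM-s3-m1700000000--fi--le-ü.txt")
needs_unicode_handling = not name.isascii()
```

Any key whose name field is not pure ASCII would then go through the same unicode handling as a filename in export mode.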
Given that the mangle/unmangle_path() function pair aims to provide a reversible mapping, and unicode->ascii cannot possibly be that, we need a solution on top.
In principle this should be possible, because we never actually unmangle a path, but only use forward-mangling to match against a state reported by dataverse (code confirms no usage of unmangle_path() outside tests).
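A rough sketch of what forward-only matching could look like. Both `forward_mangle()` and `match_remote()` are hypothetical stand-ins, not the existing mangle_path() implementation, and the escape scheme (UTF-8 bytes outside a small safe set become `_XX`) is only an assumption picked to keep the example deterministic and collision-free. Whether dataverse accepts the resulting names is not verified here.

```python
import string

# Assumed safe set: ASCII letters, digits and "." pass through unchanged.
# "_" is deliberately excluded so that it can serve as the escape marker.
_SAFE = frozenset((string.ascii_letters + string.digits + ".").encode("ascii"))


def forward_mangle(path: str) -> str:
    """Map an arbitrary unicode path to an ASCII-only name."""
    return "".join(
        chr(b) if b in _SAFE else f"_{b:02X}"
        for b in path.encode("utf-8")
    )


def match_remote(local_paths, remote_names):
    """Match remote names to local paths by mangling forward only.

    Returns {remote_name: local_path} for every remote name that equals
    the mangled form of a local path. Nothing is ever unmangled.
    """
    mangled = {forward_mangle(p): p for p in local_paths}
    return {name: mangled[name] for name in remote_names if name in mangled}


matches = match_remote(["ü.txt", "a_b"], ["_C3_BC.txt", "unrelated"])
```

The point of the sketch is that only the local side is transformed: as long as mangling is deterministic and two different paths never yield the same name, the state reported by dataverse can be matched without an inverse function.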