Employ Unidecode for path mangling #232

@mih

Rescuing #83 (comment)

If export mode continues to be supported (#230), this needs to be considered in order to deliver any meaningful outcome for filenames with non-Latin (non-ASCII) characters.

Unicode handling might even be needed outside export mode, if URL keys can contain such characters.

Confirmed:

The name field at the end has a format dependent on the backend. It is always the last field, and is prefixed with "--". Unlike other fields, it may contain "-" in its content. It should not contain newline characters or "/"; otherwise nearly anything goes. The "E" variants of hash keys include a filename extension after the hash.
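
To make the quoted key layout concrete, here is a minimal sketch of pulling the name field out of a key and checking whether it would need unicode handling. The helper names `split_key` and `name_needs_unicode_handling` and the sample keys are made up for illustration; they are not taken from the project code.

```python
from __future__ import annotations


def split_key(key: str) -> tuple[str, str]:
    """Split a key into (leading fields, name field).

    The name is the last field and is introduced by "--". Other fields
    contain no "-", so the first "--" marks where the name starts, and
    any later "-" belongs to the name itself.
    """
    head, sep, name = key.partition("--")
    if not sep:
        raise ValueError(f"no '--' name separator in key: {key!r}")
    return head, name


def name_needs_unicode_handling(key: str) -> bool:
    """Report whether the name field contains any non-ASCII character."""
    _, name = split_key(key)
    return not name.isascii()


# Illustrative keys only (assumed shapes, not real annex keys).
hash_key = "SHA256E-s10--abc-def.txt"
url_like_key = "URL--example-füße.dat"
```

With these samples, `split_key(hash_key)` yields the name `abc-def.txt`, and only `url_like_key` is flagged as needing unicode handling.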

Unicode handling is needed uniformly.

Given that the mangle_path()/unmangle_path() function pair aims to provide a reversible mapping, and a unicode-to-ASCII conversion cannot possibly be reversible, we need a solution on top.
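
A small sketch of why the transliteration cannot be reversed: two distinct names can end up as the same ASCII string. It uses `unidecode()` from the Unidecode package when installed; the fallback based on `unicodedata` is only a rough stand-in (an assumption for this sketch) and gives different results for many scripts.

```python
import unicodedata

try:
    # Third-party package named in the issue title.
    from unidecode import unidecode
except ImportError:
    def unidecode(text: str) -> str:
        """Rough stdlib stand-in: strip diacritics, drop what remains non-ASCII."""
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")


# Two different filenames ...
names = ["café.txt", "cafe.txt"]

# ... that map to the same ASCII representation.
ascii_names = [unidecode(n) for n in names]

# True when at least two distinct inputs share one output, so there is
# no way to recover the original name from the ASCII form alone.
collides = len(set(ascii_names)) < len(set(names))
```

Any reversible scheme would therefore have to carry extra information alongside the ASCII form, which is exactly what a plain transliteration does not do.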

In principle this should be possible, because we never actually unmangle a path, but only use forward-mangling to match against a state reported by dataverse (code confirms no usage of unmangle_path() outside tests).
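
The forward-only matching could look roughly like this. `forward_mangle` and `match_remote` are hypothetical names for this sketch and do not reflect the existing mangle_path() implementation; the `unicodedata` fallback is again only a rough stand-in for Unidecode.

```python
import unicodedata

try:
    from unidecode import unidecode
except ImportError:
    def unidecode(text: str) -> str:
        """Rough stdlib stand-in: strip diacritics, drop what remains non-ASCII."""
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")


def forward_mangle(path: str) -> str:
    """Hypothetical one-way mapping: transliterate the path to ASCII."""
    return unidecode(path)


def match_remote(local_paths, remote_names):
    """Map each local path to its remote name, if the remote reports it.

    Only the forward direction is used: local paths are mangled and looked
    up in what the remote reports. Nothing is ever unmangled.
    """
    remote = set(remote_names)
    matches = {}
    for path in local_paths:
        mangled = forward_mangle(path)
        if mangled in remote:
            matches[path] = mangled
    return matches


# Example: one local file is known to the remote under its ASCII name.
found = match_remote(["data/café.txt", "notes.md"], ["data/cafe.txt"])
```

Here `found` maps `data/café.txt` to `data/cafe.txt`, while `notes.md` is absent because the remote does not report it. Two local paths that mangle to the same name would both match the same remote entry, so collisions still need separate treatment.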
