fix(stdlib)!: parse_logfmt performs escaping (#777) #924
So that parse_logfmt(encode_logfmt(X)) == X, parse_logfmt now processes the escape sequences currently added by encode_logfmt: \n, \" and \\.
This is a breaking change, since the output of the parse_logfmt function is now different for certain inputs (that contain the above escape sequences).
A possible alternative is to avoid escaping while encoding, but that is also a breaking change and makes less sense to me at least.
Another alternative is to do nothing and leave round-trip encoding and decoding of logfmt "broken". However, this is surprising for users, and I don't think it was originally intended to work this way.
I do think it's worth considering other parse/encode pairs and whether they can and should roundtrip, but I haven't yet done this.
As for the implementation, parsers now return a Cow<'a, str> instead of a &'a str. They may need to remove escape characters from the input, in which case the result can no longer be represented as a slice of the input string.
Performance
Returning Cow rather than a plain String avoids allocation in the "happy path" where no escapes are present. There is still always a performance cost, however: each str is checked upfront for the presence of any escape characters to decide whether to allocate a Cow::Owned, regardless of whether any of the escape characters are actually removed during processing.
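To illustrate the trade-off described above, here is a minimal sketch of the borrow-or-allocate pattern. It is not the code from this PR: the function name unescape is hypothetical, and it assumes the upfront check is a simple scan for a backslash.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not the PR's actual function): resolves the
/// sequences `\n`, `\"` and `\\`, borrowing the input when possible.
fn unescape(input: &str) -> Cow<'_, str> {
    // Upfront check: without a backslash there is nothing to unescape,
    // so the input is returned as a borrowed slice with no allocation.
    if !input.contains('\\') {
        return Cow::Borrowed(input);
    }

    // A backslash is present, so allocate, even if it turns out that
    // no recognized escape sequence follows it.
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.peek().copied() {
            Some('n') => {
                out.push('\n');
                chars.next();
            }
            Some('"') => {
                out.push('"');
                chars.next();
            }
            Some('\\') => {
                out.push('\\');
                chars.next();
            }
            // Unrecognized or trailing escape: keep the backslash as-is.
            _ => out.push('\\'),
        }
    }
    Cow::Owned(out)
}
```

In this sketch, an input such as `plain` comes back as Cow::Borrowed, while an input such as `a\zb` comes back as Cow::Owned with identical contents, which is the "allocate even though nothing was removed" case mentioned above.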
See parse_key_value::escape_str for the implementation.
"invalid" escape sequences
I've not (yet?) added an error for trailing escape characters. At the moment they are accepted and kept as-is. The same goes for other escape sequences such as \r, \t and \u{0123}, and for "invalid sequences" such as \z. Anything that is not one of the escape sequences encoded by encode_logfmt is ignored and left in place.
See parse_key_value::escape_char for the implementation.
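The lenient behaviour described above could look roughly like the following sketch. The names unescape_char and unescape_lenient are hypothetical and the real escape_char in this PR may be structured differently. For brevity the sketch always returns a String rather than a Cow.

```rust
/// Hypothetical helper: maps the character following a backslash to its
/// unescaped form, or `None` if the sequence is not a recognized one.
fn unescape_char(c: char) -> Option<char> {
    match c {
        'n' => Some('\n'),
        '"' => Some('"'),
        '\\' => Some('\\'),
        _ => None,
    }
}

/// Hypothetical helper: unescapes recognized sequences and passes
/// everything else through unchanged, without reporting an error.
fn unescape_lenient(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some(next) => match unescape_char(next) {
                // Recognized escape: emit the unescaped character.
                Some(unescaped) => out.push(unescaped),
                // Unrecognized escape (e.g. `\t`, `\z`): keep both characters.
                None => {
                    out.push('\\');
                    out.push(next);
                }
            },
            // Trailing backslash at the end of the input: keep it as-is.
            None => out.push('\\'),
        }
    }
    out
}
```

Under these assumptions, inputs such as `tab\there`, `\u{0123}` and a string ending in a single backslash are returned unchanged, while `a\nb` has its escape replaced by a real newline.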