fix(stdlib)!: parse_logfmt performs escaping (#777) #924

drmason13 · 2024-07-07T21:55:04Z

So that parse_logmt(encode_logfmt(X)) == X, parse_logfmt now processes the escape sequences currently added by encode_logfmt:

\n
\"
\\

This is a breaking change, since the output of the parse_logfmt function is now different for certain inputs (that contain the above escape sequences).

A possible alternative is to avoid escaping while encoding, but that is also a breaking change and makes less sense to me at least.

Another alternative is to simply do nothing. Leave round trip encoding-decoding of logfmt "broken", however this is surprising for users and I don't think it was originally intended to work this way.

I do think it's worth considering other parse/encode pairs and whether they can and should roundtrip, but I haven't yet done this.

As for the implementation, parsers now return a Cow<'a, str> instead of just a &'a str because they may need to remove escape characters from the input, and thus it is no longer possible to represent as a string slice.

Performance

Returning Cow as opposed to plain String avoids allocation in the "happy path" of no escapes being present, however there is always a performance cost as each str is checked upfront for the presence of any escape characters to determine whether to allocate a Cow::Owned (regardless of whether any of the escape characters are actually removed during processing).
See parse_key_value::escape_str for the implementation.

"invalid" escape sequences

I've not (yet?) added an error for trailing escape characters. At the moment they would be accepted and kept as-is, the same goes for other escape sequences such as \r, \t, \u{0123} and "invalid sequences" such as say \z. If they're not one of the escape sequences encoded by encode_logfmt then they are ignored.
See parse_key_value::escape_char for the implementation.

bruceg

The code makes sense to me, as does introducing a breaking change for it. The match in escape_char is a little awkward but I can't think of a cleaner way of doing it myself. I think the changelog filename needs to change, though.

bruceg · 2024-07-15T21:48:42Z

changelog.d/777.fix.md

Since this is a breaking change, the filename for the changelog fragment needs to be X.breaking.md.

drmason13 · 2024-07-16T14:37:52Z

The code makes sense to me, as does introducing a breaking change for it. The match in escape_char is a little awkward but I can't think of a cleaner way of doing it myself. I think the changelog filename needs to change, though.

I've looked at it again, and don't want to swap out the match for some sort of and_then or chain.

Some(_) => c

This branch is the most confusing. Might be worth a comment to highlight that we are ignoring the escape sequence entirely in that case by returning c (the backslash), and that this is because we only concern ourselves with escapes that are added by encode_key_value

bruceg · 2024-07-23T15:11:20Z

I've looked at it again, and don't want to swap out the match for some sort of and_then or chain.

No, that would definitely be a step backwards. I was more thinking of how to deduplicate the repeated _ = rest.next(); but I couldn't see a good way, so my comment was not a complaint.

fix(stdlib): parse_logfmt performs escaping (vectordotdev#777)

c7b2f2b

drmason13 force-pushed the parse-encode-inverse-fix branch from b6be400 to c7b2f2b Compare July 7, 2024 22:00

jszwedko requested a review from pront July 8, 2024 19:57

bruceg requested changes Jul 15, 2024

View reviewed changes

changelog.d/777.fix.md Outdated

Copy link

Member

bruceg Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a breaking change, the filename for the changelog fragment needs to be X.breaking.md.

bruceg changed the title ~~fix(stdlib): parse_logfmt performs escaping (#777)~~ fix(stdlib)!: parse_logfmt performs escaping (#777) Jul 15, 2024

label changelog fragment as breaking

1ba29f7

explaining comment for Some(_) -> c branch

7f138e1

jszwedko requested a review from bruceg July 23, 2024 15:01

bruceg approved these changes Jul 23, 2024

View reviewed changes

bruceg added this pull request to the merge queue Jul 23, 2024

Merged via the queue into vectordotdev:main with commit 8691870 Jul 23, 2024
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(stdlib)!: parse_logfmt performs escaping (#777) #924

fix(stdlib)!: parse_logfmt performs escaping (#777) #924

drmason13 commented Jul 7, 2024

bruceg left a comment

bruceg Jul 15, 2024

drmason13 commented Jul 16, 2024

bruceg commented Jul 23, 2024

fix(stdlib)!: parse_logfmt performs escaping (#777) #924

fix(stdlib)!: parse_logfmt performs escaping (#777) #924

Conversation

drmason13 commented Jul 7, 2024

Performance

"invalid" escape sequences

bruceg left a comment

Choose a reason for hiding this comment

bruceg Jul 15, 2024

Choose a reason for hiding this comment

drmason13 commented Jul 16, 2024

bruceg commented Jul 23, 2024