format_char_str: Avoid costly decoding if possible #32018

Merged
merged 1 commit into from
Mar 26, 2025

Conversation

antiguru
Member

The format_char_str function formats a string as a fixed-length string, potentially cutting off overhanging white space. To do so, it decodes the string to find the byte position of the character at index length, which is a slow operation.

This change introduces an optimization: if the string's byte length is no greater than the target length in characters, the string certainly isn't too long, so we bail out early without decoding. The reasoning is simple: a string of $n$ bytes contains at most $n$ characters, so the byte length serves as an upper bound for the number of characters.
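
A minimal sketch of the idea, not the actual code in this PR: `truncate_chars` is a hypothetical helper that returns the prefix of a string holding at most `length` characters. It assumes UTF-8 input (as with Rust's `&str`), where every character occupies at least one byte, and it leaves out the white space and error handling of the real function.

```rust
/// Hypothetical helper (not the real `format_char_str`): returns the prefix
/// of `s` that contains at most `length` characters.
fn truncate_chars(s: &str, length: usize) -> &str {
    // Fast path: `str::len` is the byte length and runs in constant time.
    // Each UTF-8 character takes at least one byte, so a string of at most
    // `length` bytes has at most `length` characters.
    if s.len() <= length {
        return s;
    }
    // Slow path: decode the string to find the byte offset of the character
    // at index `length`, if there is one.
    match s.char_indices().nth(length) {
        // Fewer than `length + 1` characters: nothing to cut off.
        None => s,
        // `idx` is a character boundary, so slicing here cannot panic.
        Some((idx, _)) => &s[..idx],
    }
}
```

With this sketch, `truncate_chars("abc", 5)` takes the fast path, while `truncate_chars("héllo", 5)` falls through to decoding because the string occupies six bytes even though it has only five characters. The fast path never changes the result; it only skips the decoding when the byte length already proves the string fits.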

Related: MaterializeInc/database-issues#9125

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

Signed-off-by: Moritz Hoffmann <[email protected]>
@antiguru antiguru requested a review from a team as a code owner March 26, 2025 13:06
@antiguru antiguru requested a review from ggevay March 26, 2025 13:07
@antiguru antiguru enabled auto-merge (squash) March 26, 2025 13:30
@antiguru antiguru merged commit ca607f5 into MaterializeInc:main Mar 26, 2025
82 checks passed