-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new lint: char_indices_as_byte_indices
#13435
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes for reviewers
// can't use #[expect] here because the .fixed file will still have the attribute and create an | ||
// unfulfilled expectation, but make sure lint level attributes work on the use expression: | ||
#[allow(clippy::chars_enumerate_for_byte_indices)] | ||
let _ = prim[..idx]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a fun one, I wonder if that's something that could be fixed in uitest, like removing #[expect]
attributes in the .fixed file 🤔
/// ``` | ||
#[clippy::version = "1.83.0"] | ||
pub CHARS_ENUMERATE_FOR_BYTE_INDICES, | ||
correctness, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern is technically fine if you know what your strings are (like the description mentions) so it's not always 'outright wrong' like the usual correctness lints, but the fix is also really simple and always applicable so 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said on Zulip, we could say that bytes
should always be used as a replacement if it expects ASCII. I don't think there are any downsides to that. I think blocking compilation on those cases is better than not doing so for where it could actually be a problem.
Thank you for working on this. The linter we are working on (oxlint) has encountered dozens of crashes because of this, and there were no ways to forbid such usages. |
97c4bfb
to
bf38bb1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks fine to me. I'll open the FCP in a moment
☔ The latest upstream changes (presumably 1f966e9) made this pull request unmergeable. Please resolve the merge conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the FCP: We could mention .bytes()
in the description, but no issues
/// ``` | ||
#[clippy::version = "1.83.0"] | ||
pub CHARS_ENUMERATE_FOR_BYTE_INDICES, | ||
correctness, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said on Zulip, we could say that bytes
should always be used as a replacement if it expects ASCII. I don't think there are any downsides to that. I think blocking compilation on those cases is better than not doing so for where it could actually be a problem.
bf38bb1
to
4ac1ee8
Compare
Added a note for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me :)
"passing a character position to a method that expects a byte index" | ||
}, | ||
ExprKind::Index(target, ..) | ||
if is_string_like(cx.typeck_results().expr_ty_adjusted(target).peel_refs()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the reason this isn't used in both arms is a bit non-obvious at first, especially if the comment on BYTE_INDEX_METHODS
isn't seen. It could perhaps just point to what's already written there. That's about all.
Hey @y21, can you take a look at this again? Is there anything from my review that doesn't make sense/you're stuck on? |
Oops, I got busy and forgot about this. I'll look at this again over the weekend! |
4ac1ee8
to
56124b8
Compare
56124b8
to
c739027
Compare
chars_enumerate_for_byte_indices
char_indices_as_byte_indices
Closes #10202.
This adds a new lint that checks for uses of the
.chars().enumerate()
position in a context where a byte index is required and suggests changing it to use.char_indices()
instead.I'm planning to extend this lint to also detect uses of the position in iterator chains, e.g.
s.chars().enumerate().for_each(|(i, _)| s.split_at(i));
, but that's for another timechangelog: new lint:
chars_enumerate_for_byte_indices