
new lint: char_indices_as_byte_indices #13435

Merged
merged 4 commits into from
Mar 30, 2025


@y21 (Member) commented Sep 21, 2024

Closes #10202.

This adds a new lint that checks for uses of the position yielded by .chars().enumerate() in a context where a byte index is required (for example, slicing a str), and suggests using .char_indices() instead.
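To make the problem concrete, here is a minimal sketch of the pattern the lint targets and the suggested fix. The helper names prefix_before_buggy and prefix_before are hypothetical, not code from the PR. The position from .chars().enumerate() counts chars, while &s[..i] expects a byte offset. The two only agree while every preceding char is a single byte (ASCII).

```rust
/// Hypothetical helper (not from the PR): the prefix of `s` before the first `target`.
/// Buggy: uses the *char* position from `.chars().enumerate()` as a *byte* index.
fn prefix_before_buggy(s: &str, target: char) -> Option<&str> {
    for (i, c) in s.chars().enumerate() {
        if c == target {
            // `i` counts chars, but `str` slicing expects a byte offset.
            return Some(&s[..i]);
        }
    }
    None
}

/// Fixed: `.char_indices()` yields the byte offset of each char,
/// which is always a valid char boundary for slicing.
fn prefix_before(s: &str, target: char) -> Option<&str> {
    for (i, c) in s.char_indices() {
        if c == target {
            return Some(&s[..i]);
        }
    }
    None
}
```

With ASCII input both versions agree. With non-ASCII input the buggy version either returns the wrong prefix ("straß" instead of "straße" for "straße:street") or panics when the char position lands inside a multi-byte char (as with "ß:").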

I'm planning to extend this lint to also detect uses of the position in iterator chains, e.g. s.chars().enumerate().for_each(|(i, _)| { let _ = s.split_at(i); }), but that's for another time.
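The iterator-chain case mentioned in the description (not detected by this PR) has the same fix. A sketch with hypothetical helper names, using map/collect instead of for_each so the results can be checked:

```rust
/// Hypothetical helper (not from the PR): every split of `s` at the start of a char.
/// Buggy: `i` is a char position, but `str::split_at` expects a byte index.
fn splits_buggy(s: &str) -> Vec<(&str, &str)> {
    s.chars().enumerate().map(|(i, _)| s.split_at(i)).collect()
}

/// Fixed: `char_indices` yields byte offsets, which are always char boundaries.
fn splits(s: &str) -> Vec<(&str, &str)> {
    s.char_indices().map(|(i, _)| s.split_at(i)).collect()
}
```

For "ßa", splits_buggy calls s.split_at(1), which is inside the two-byte "ß" and panics, while splits uses offsets 0 and 2.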


changelog: new lint: chars_enumerate_for_byte_indices

@rustbot (Collaborator) commented Sep 21, 2024

r? @Centri3

rustbot has assigned @Centri3.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties) label Sep 21, 2024

@y21 (Member Author) left a comment


Some notes for reviewers

Comment on lines 22 to 25
```rust
// can't use #[expect] here because the .fixed file will still have the attribute and create an
// unfulfilled expectation, but make sure lint level attributes work on the use expression:
#[allow(clippy::chars_enumerate_for_byte_indices)]
let _ = prim[..idx];
```
@y21 (Member Author):

This is a fun one. I wonder if that's something that could be fixed in uitest, e.g. by removing #[expect] attributes in the .fixed file 🤔

```rust
/// ```
#[clippy::version = "1.83.0"]
pub CHARS_ENUMERATE_FOR_BYTE_INDICES,
correctness,
```
@y21 (Member Author) commented Sep 21, 2024

The pattern is technically fine if you know what your strings contain (as the description mentions), so it's not always "outright wrong" like the usual correctness lints. But the fix is also really simple and always applicable, so 🤷‍♂️

@Centri3 (Member) replied:

As said on Zulip, we could say that .bytes() should always be used as the replacement when the code expects ASCII. I don't think there are any downsides to that. I think failing compilation in those cases (correctness lints are deny-by-default) is better than not doing so in the cases where it could actually be a problem.
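A sketch of the .bytes() alternative for code that only cares about ASCII delimiters (split_key_value is a hypothetical helper, not from the PR). Because .bytes().enumerate() yields byte offsets, and an ASCII byte never occurs inside a multi-byte UTF-8 sequence, slicing at the delimiter's position stays valid even if the rest of the input turns out not to be ASCII.

```rust
/// Hypothetical helper (not from the PR): split `s` at the first `=`.
/// `.bytes().enumerate()` yields byte offsets, so `i` can be used to slice `s`.
fn split_key_value(s: &str) -> Option<(&str, &str)> {
    for (i, b) in s.bytes().enumerate() {
        if b == b'=' {
            // `=` is a single ASCII byte, so both `i` and `i + 1` are char boundaries.
            return Some((&s[..i], &s[i + 1..]));
        }
    }
    None
}
```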

@Boshen commented Sep 26, 2024

Thank you for working on this.

The linter we are working on (oxlint) has encountered dozens of crashes because of this, and there was no way to forbid such usages.

@Centri3 (Member) left a comment

This all looks fine to me. I'll open the FCP (final comment period) in a moment.

@bors (Contributor) commented Dec 1, 2024

☔ The latest upstream changes (presumably 1f966e9) made this pull request unmergeable. Please resolve the merge conflicts.

@Centri3 (Member) left a comment

From the FCP: we could mention .bytes() in the lint description, but otherwise no issues.


@y21 force-pushed the chars_enumerate_for_byte_index branch from bf38bb1 to 4ac1ee8 on December 3, 2024 03:50

@y21 (Member Author) commented Dec 3, 2024

Added a note about str::bytes and renamed the lint to be slightly more general, as Jarcho suggested on Zulip, in case there are other sources of char indices in the future. I left these as separate commits for now to make review easier, but I'll squash them once you think it's OK.

@Centri3 (Member) left a comment

Looks good to me :)

```rust
"passing a character position to a method that expects a byte index"
},
ExprKind::Index(target, ..)
    if is_string_like(cx.typeck_results().expr_ty_adjusted(target).peel_refs())
```
@Centri3 (Member):

nit: I think the reason this check isn't used in both match arms is a bit non-obvious at first, especially if you haven't seen the comment on BYTE_INDEX_METHODS. A comment here could just point to what's already written there. That's about all.
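For context on why the Index arm checks the indexed type: a char position is only wrong when it indexes a string. My reading (an assumption, not stated in the thread) is that the is_string_like guard keeps the lint from flagging code like the hypothetical sketch below, where the position from .chars().enumerate() indexes a Vec<char>. That vector holds one element per char, so the char position is exactly the right index.

```rust
/// Hypothetical helper (not from the PR): char positions where a char is immediately repeated.
fn repeated_char_positions(s: &str) -> Vec<usize> {
    // One element per char, so char positions are valid indices into `chars`.
    let chars: Vec<char> = s.chars().collect();
    let mut positions = Vec::new();
    for (i, c) in s.chars().enumerate() {
        if i + 1 < chars.len() && chars[i + 1] == c {
            positions.push(i);
        }
    }
    positions
}
```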

@Centri3 (Member) commented Feb 14, 2025

Hey @y21, can you take a look at this again? Is there anything from my review that doesn't make sense/you're stuck on?

@y21 (Member Author) commented Feb 14, 2025

Oops, I got busy and forgot about this. I'll look at this again over the weekend!

@y21 force-pushed the chars_enumerate_for_byte_index branch from 4ac1ee8 to 56124b8 on March 30, 2025 11:54
@y21 force-pushed the chars_enumerate_for_byte_index branch from 56124b8 to c739027 on March 30, 2025 12:05
@y21 changed the title from "new lint: chars_enumerate_for_byte_indices" to "new lint: char_indices_as_byte_indices" on Mar 30, 2025
@Centri3 added this pull request to the merge queue on Mar 30, 2025
Merged via the queue into rust-lang:master with commit 4f58673 Mar 30, 2025
13 checks passed
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
5 participants