Regex \W inconsistency #43928

Open
miloush opened this issue Dec 10, 2024 · 2 comments
Labels
dotnet-fundamentals/svc, waiting-on-feedback, ⌚ Not Triaged

Comments

miloush commented Dec 10, 2024

Type of issue

Other (describe below)

Description

The \W language element is equivalent to the following character class [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}]

In other words, it matches any character except for those in the Unicode categories listed in the following table.

The table lists \p{Mn}, but the equivalent character class does not, so the two descriptions contradict each other.
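
One way to see which of the two descriptions matches the runtime is to test a single Mn character against each candidate. The sketch below is an illustration rather than part of the article; it assumes default regex options (no RegexOptions.ECMAScript), and the variable names are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// U+0300 COMBINING GRAVE ACCENT belongs to Unicode category Mn (Mark, Nonspacing).
string graveAccent = "\u0300";

// Character class as written in the article's "equivalent to" sentence (no \p{Mn}).
string documentedClass = @"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}]";

// The same class with \p{Mn} added, following the article's category table.
string tableClass = @"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}\p{Mn}]";

bool matchesNonWord = Regex.IsMatch(graveAccent, @"\W");
bool matchesDocumented = Regex.IsMatch(graveAccent, documentedClass);
bool matchesTable = Regex.IsMatch(graveAccent, tableClass);

Console.WriteLine("\\W:                  " + matchesNonWord);    // False
Console.WriteLine("class without Mn:    " + matchesDocumented); // True
Console.WriteLine("class with Mn added: " + matchesTable);      // False
```

If \W and the class without \p{Mn} disagree on this character, the "equivalent to" sentence cannot be accurate as written.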

Page URL

https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-classes-in-regular-expressions

Content source URL

https://github.com/dotnet/docs/blob/main/docs/standard/base-types/character-classes-in-regular-expressions.md

Document Version Independent Id

95abea42-fa7f-3feb-ae38-049719ab938f

Article author

@adegeo

Metadata

  • ID: a1958bf6-a17d-1256-d659-31066aa02604
  • Service: dotnet-fundamentals

Related Issues

Contributor

adegeo commented Dec 12, 2024

I'm not sure which is true. I can't seem to replicate it. Can you? I was trying the following to visualize what the pattern matches, but I can't get the grave accent to match.

using System;
using System.Text.RegularExpressions;

// Negated class from the article, with \p{Mn} added to match the category table.
string pattern = @"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}\p{Mn}]";
// "\u0300" is the combining grave accent (category Mn) placed before "pasture".
string source = "The old, grey mare slowly walked across the narrow, green \u0300pasture.";
// One marker slot per UTF-16 code unit in source.
char[] chars = new string(' ', source.Length).ToCharArray();

Console.WriteLine(source);

// Match against source itself so that the match indexes line up with chars.
foreach (Match match in Regex.Matches(source, pattern))
{
    for (int i = 0; i < match.Length; i++)
        chars[match.Index + i] = 'X';
}

Console.WriteLine(new string(chars));

adegeo added the waiting-on-feedback and needs-more-info labels Dec 12, 2024
dotnetrepoman bot removed the ⌚ Not Triaged label Dec 12, 2024
Author

miloush commented Dec 12, 2024

Well, Regex.IsMatch("\u0300", "\\W") returns false, so the table is correct. The equivalent character class is missing \p{Mn}.
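
To check whether \p{Mn} is the only category the class is missing, the single-character test can be widened to every UTF-16 code unit. This is a rough sketch under the same assumption of default regex options; the set names are hypothetical, and it records the Unicode category of each character on which a candidate class disagrees with \W.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

var nonWord = new Regex(@"\W");
var documentedClass = new Regex(@"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}]");
var tableClass = new Regex(@"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}\p{Mn}]");

// Categories of the characters on which each class disagrees with \W.
var documentedMismatches = new SortedSet<UnicodeCategory>();
var tableMismatches = new SortedSet<UnicodeCategory>();

// Loop over an int so the counter does not wrap around at char.MaxValue.
for (int codeUnit = char.MinValue; codeUnit <= char.MaxValue; codeUnit++)
{
    char c = (char)codeUnit;
    string s = c.ToString();
    bool expected = nonWord.IsMatch(s);

    if (documentedClass.IsMatch(s) != expected)
        documentedMismatches.Add(char.GetUnicodeCategory(c));
    if (tableClass.IsMatch(s) != expected)
        tableMismatches.Add(char.GetUnicodeCategory(c));
}

Console.WriteLine("Without \\p{Mn}: " + string.Join(", ", documentedMismatches));
Console.WriteLine("With \\p{Mn}:    " + string.Join(", ", tableMismatches));
```

If the table is right, the first line should list only NonSpacingMark and the second should list nothing.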

dotnet-policy-service bot removed the needs-more-info label Dec 12, 2024
dotnetrepoman bot added the ⌚ Not Triaged label Dec 12, 2024
BillWagner removed the Pri1 label Jan 22, 2025