Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: New overrides for string.Replace() #110912

Open
Bond-Addict opened this issue Dec 23, 2024 · 8 comments
Open

[API Proposal]: New overrides for string.Replace() #110912

Bond-Addict opened this issue Dec 23, 2024 · 8 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime untriaged New issue has not been triaged by the area owner

Comments

@Bond-Addict
Copy link

Background and motivation

Currently, when we need to perform replacements in strings, we rely on .Replace(), which requires either a specific string value (string.Replace()) or a regular expression (new Regex.Replace()). While this works, it feels like there would be a benefit to having additional overrides for string.Remove() to handle these scenarios, rather than just the current option that requires specifying a starting and ending index. While the existing approach has its use cases, removing parts of strings—especially with patterns or substrings—feels like a more common need and would provide more flexibility.

API Proposal

Below is an extension method class to server as a working example, but of course it's up to the team's interpretation as to how would be the best to implement it.

using System.Text.RegularExpressions;

namespace TestConsoleApp.Extensions
{
    public static class CommonExtensions
    {
        // Remove string or regex pattern from the string, with an optional comparison rule and regex flag
        public static string Remove(this string sourceString, string stringToRemove, bool isRegex = false, StringComparison comparisonRule = StringComparison.Ordinal)
        {
            if (isRegex)
            {
                return sourceString.Remove(new Regex(stringToRemove));
            }
            return sourceString.Replace(stringToRemove, "", comparisonRule);
        }

        // Remove a collection of strings or regex patterns
        public static string Remove(this string sourceString, string stringToRemove, char delimitedCharacter, bool isRegex = false, StringComparison comparisonRule = StringComparison.Ordinal)
        {
            foreach (var partToRemove in stringToRemove.Split(delimitedCharacter))
            {
                sourceString = sourceString.Remove(partToRemove, isRegex, comparisonRule);
            }
            return sourceString;
        }

        public static string Remove(this string sourceString, IEnumerable<string> stringsToRemove, bool isRegex = false, StringComparison comparisonRule = StringComparison.Ordinal)
        {
            foreach (var partToRemove in stringsToRemove)
            {
                sourceString = sourceString.Remove(partToRemove, isRegex, comparisonRule);
            }
            return sourceString;
        }

        // Remove a single regex pattern
        public static string Remove(this string sourceString, Regex removalRegex) => removalRegex.Replace(sourceString, "");

        // Remove a single regex pattern with custom options
        public static string Remove(this string sourceString, Regex removalRegex, Func<RegexOptions> options) =>
            new Regex(removalRegex.ToString(), options()).Replace(sourceString, "");

        // Remove multiple regex patterns
        public static string Remove(this string sourceString, IEnumerable<Regex> regexPatternsToRemove)
        {
            foreach (Regex regexPattern in regexPatternsToRemove)
            {
                sourceString = sourceString.Remove(regexPattern);
            }
            return sourceString;
        }

        // Remove multiple regex patterns with custom options
        public static string Remove(this string sourceString, IEnumerable<Regex> regexPatternsToRemove, Func<RegexOptions> options)
        {
            foreach (Regex regexPattern in regexPatternsToRemove)
            {
                sourceString = sourceString.Remove(regexPattern, options);
            }
            return sourceString;
        }
    }
}

API Usage

There are many situations where we need to sanitize a string for various reasons. Instead of writing and maintaining minimal but still duplicated code for sanitization, it would be far more efficient to simply "Remove" whatever values we don't want. The ability to provide either a single Regex or string, or a collection of Regex or string values to remove from the source string, would simplify this process. This feature would be particularly useful for quick and easy sanitization of usernames, passphrases, or any other user input.

For example, Ally Financial provides an API endpoint (https://secure.ally.com/assets/json/invalid-strings.json) that returns a list of words or phrases they do not allow in usernames or passphrases. Note: This list includes terms that are explicitly filtered for user-provided content and may contain offensive or vulgar language. This API is called after a user successfully signs in and is taken to their dashboard. The returned list is then used to validate or sanitize user-provided input (for very good reason). Instead of manually implementing and maintaining duplicate logic to filter out these words, with the inclusion of these additional overrides, they could simply pass in their list of restricted terms and proceed with their process. This approach also makes it easier to remove characters that could potentially expose the system to injection vulnerabilities.

For instance, if Ally didn’t want users to manually clean up their passphrases, they could sanitize a phrase like:

"ThisBadWordIsOtherBadWordMyLastBadWordPassword!"

They could then call:

a.Remove(["FirstBadWord", "OtherBadWord", "LastBadWord"])

Or, for case-insensitive matching:

a.Remove(["FirstBadWord", "OtherBadWord", "LastBadWord"], StringComparison.OrdinalIgnoreCase)

The result would be the sanitized password: ThisIsMyPassword!, which they could store securely. Then, during login, they could call the same function to sanitize the entered password (which may contain vulgar or restricted terms), and compare it to the sanitized stored password. This would allow the user to log in with their original password, while the system ensures it matches the sanitized version stored securely, keeping the process transparent to the customer. This approach simplifies string sanitization without duplicating logic. Note: I recognize that this example wouldn’t be a best practice, and I wouldn't recommend or use it myself. It simply serves to demonstrate one way they could utilize these new overrides.

Overall, this would be a valuable quality-of-life improvement and would better align the semantic meanings of "Replace" vs "Remove". Currently, "Replace" is used in cases where the intention is not to replace, but rather to remove content. Clarifying this distinction would make the API more intuitive and consistent with its intended behavior.

Alternative Designs

No response

Risks

Since these would be new overrides there shouldn't be any risk as the current

string.Remove(startIndex, endIndex) 

would still exist and be untouched.

@Bond-Addict Bond-Addict added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 23, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Dec 23, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@Bond-Addict Bond-Addict changed the title [API Proposal]: [API Proposal]: New overrides for string.Replace() Dec 23, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@colejohnson66
Copy link

If the idea is to restrict certain words, wouldn't SearchValues<string> be a better solution? I, personally, do not like the idea of a website silently changing my input (there's horror stories of banks silently truncating passwords on the recovery form, but not the login side); If something is disallowed, just tell me.

@huoyaoyuan
Copy link
Member

Regex is very complex and slow at the context of string. Moreover, it lives several layers top. Methods involving regex have to be provided by regex. You can define your own extension methods for string.

@Bond-Addict
Copy link
Author

Bond-Addict commented Dec 23, 2024

If the idea is to restrict certain words, wouldn't SearchValues<string> be a better solution? I, personally, do not like the idea of a website silently changing my input (there's horror stories of banks silently truncating passwords on the recovery form, but not the login side); If something is disallowed, just tell me.

As I specifically addressed, it's a way that I would not recommend to use and I would not personally use that way, but its just a way they could use it if that's what they wanted. There are lots of other use-cases for the overrides.

@Bond-Addict
Copy link
Author

Regex is very complex and slow at the context of string. Moreover, it lives several layers top. Methods involving regex have to be provided by regex. You can define your own extension methods for string.

I realize that you can create extension methods. The provided code contain only extension methods. Those code examples are to serve as an example of how it could be accomplished as I've tried to thoroughly test each of them. I will be implementing
BenchmarkDotNet to measure the performance of the overrides later tonight.

@Bond-Addict
Copy link
Author

Bond-Addict commented Dec 26, 2024

Using BenchmarkDotNet:

Method Mean (ns) Error (ns) StdDev (ns)
TestCurrentSingleStringReplacementMethod 20.10 0.377 0.334
TestCurrentMultiStringReplacementMethod 10.53 0.094 0.083
TestCurrentRegexStringReplacementMethod 226.35 0.942 0.881
TestCurrentMultiRegexStringReplacementMethod 94.24 0.413 0.386
TestNewSingleStringReplacementMethod 39.80 0.451 0.400
TestNewDelimitedStringReplacementMethod 104.07 0.863 0.808
TestNewMultiStringReplacementMethod 79.81 0.245 0.191
TestNewRegexStringReplacementMethod 233.11 0.953 0.845
TestNewMultiRegexStringReplacementMethod 300.28 1.212 1.133

It does look to add a little bit more overhead, mostly for the regex part which tbh was expected.
I would imagine that direct api implementations (would be different code than I've provided, just accomplish the same result) rather than a extension method would perform far better though.

@colejohnson66
Copy link

Creating a Regex on the fly in this API will always perform worse than leveraging the built-in regex source generator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

4 participants