[API Proposal]: New overrides for string.Replace() #110912

Bond-Addict · 2024-12-23T12:26:12Z

Background and motivation

Currently, when we need to perform replacements in strings, we rely on .Replace(), which requires either a specific string value (string.Replace()) or a regular expression (new Regex.Replace()). While this works, it feels like there would be a benefit to having additional overrides for string.Remove() to handle these scenarios, rather than just the current option that requires specifying a starting and ending index. While the existing approach has its use cases, removing parts of strings—especially with patterns or substrings—feels like a more common need and would provide more flexibility.

API Proposal

Below is an extension method class to server as a working example, but of course it's up to the team's interpretation as to how would be the best to implement it.

using System.Text.RegularExpressions;

namespace TestConsoleApp.Extensions
{
    public static class CommonExtensions
    {
        // Remove string or regex pattern from the string, with an optional comparison rule and regex flag
        public static string Remove(this string sourceString, string stringToRemove, bool isRegex = false, StringComparison comparisonRule = StringComparison.Ordinal)
        {
            if (isRegex)
            {
                return sourceString.Remove(new Regex(stringToRemove));
            }
            return sourceString.Replace(stringToRemove, "", comparisonRule);
        }

        // Remove a collection of strings or regex patterns
        public static string Remove(this string sourceString, string stringToRemove, char delimitedCharacter, bool isRegex = false, StringComparison comparisonRule = StringComparison.Ordinal)
        {
            foreach (var partToRemove in stringToRemove.Split(delimitedCharacter))
            {
                sourceString = sourceString.Remove(partToRemove, isRegex, comparisonRule);
            }
            return sourceString;
        }

        public static string Remove(this string sourceString, IEnumerable<string> stringsToRemove, bool isRegex = false, StringComparison comparisonRule = StringComparison.Ordinal)
        {
            foreach (var partToRemove in stringsToRemove)
            {
                sourceString = sourceString.Remove(partToRemove, isRegex, comparisonRule);
            }
            return sourceString;
        }

        // Remove a single regex pattern
        public static string Remove(this string sourceString, Regex removalRegex) => removalRegex.Replace(sourceString, "");

        // Remove a single regex pattern with custom options
        public static string Remove(this string sourceString, Regex removalRegex, Func<RegexOptions> options) =>
            new Regex(removalRegex.ToString(), options()).Replace(sourceString, "");

        // Remove multiple regex patterns
        public static string Remove(this string sourceString, IEnumerable<Regex> regexPatternsToRemove)
        {
            foreach (Regex regexPattern in regexPatternsToRemove)
            {
                sourceString = sourceString.Remove(regexPattern);
            }
            return sourceString;
        }

        // Remove multiple regex patterns with custom options
        public static string Remove(this string sourceString, IEnumerable<Regex> regexPatternsToRemove, Func<RegexOptions> options)
        {
            foreach (Regex regexPattern in regexPatternsToRemove)
            {
                sourceString = sourceString.Remove(regexPattern, options);
            }
            return sourceString;
        }
    }
}

API Usage

There are many situations where we need to sanitize a string for various reasons. Instead of writing and maintaining minimal but still duplicated code for sanitization, it would be far more efficient to simply "Remove" whatever values we don't want. The ability to provide either a single Regex or string, or a collection of Regex or string values to remove from the source string, would simplify this process. This feature would be particularly useful for quick and easy sanitization of usernames, passphrases, or any other user input.

For example, Ally Financial provides an API endpoint (https://secure.ally.com/assets/json/invalid-strings.json) that returns a list of words or phrases they do not allow in usernames or passphrases. Note: This list includes terms that are explicitly filtered for user-provided content and may contain offensive or vulgar language. This API is called after a user successfully signs in and is taken to their dashboard. The returned list is then used to validate or sanitize user-provided input (for very good reason). Instead of manually implementing and maintaining duplicate logic to filter out these words, with the inclusion of these additional overrides, they could simply pass in their list of restricted terms and proceed with their process. This approach also makes it easier to remove characters that could potentially expose the system to injection vulnerabilities.

For instance, if Ally didn’t want users to manually clean up their passphrases, they could sanitize a phrase like:

"ThisBadWordIsOtherBadWordMyLastBadWordPassword!"

They could then call:

a.Remove(["FirstBadWord", "OtherBadWord", "LastBadWord"])

Or, for case-insensitive matching:

a.Remove(["FirstBadWord", "OtherBadWord", "LastBadWord"], StringComparison.OrdinalIgnoreCase)

The result would be the sanitized password: ThisIsMyPassword!, which they could store securely. Then, during login, they could call the same function to sanitize the entered password (which may contain vulgar or restricted terms), and compare it to the sanitized stored password. This would allow the user to log in with their original password, while the system ensures it matches the sanitized version stored securely, keeping the process transparent to the customer. This approach simplifies string sanitization without duplicating logic. Note: I recognize that this example wouldn’t be a best practice, and I wouldn't recommend or use it myself. It simply serves to demonstrate one way they could utilize these new overrides.

Overall, this would be a valuable quality-of-life improvement and would better align the semantic meanings of "Replace" vs "Remove". Currently, "Replace" is used in cases where the intention is not to replace, but rather to remove content. Clarifying this distinction would make the API more intuitive and consistent with its intended behavior.

Alternative Designs

No response

Risks

Since these would be new overrides there shouldn't be any risk as the current

string.Remove(startIndex, endIndex)

would still exist and be untouched.

The text was updated successfully, but these errors were encountered:

dotnet-policy-service · 2024-12-23T12:26:38Z

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

dotnet-policy-service · 2024-12-23T13:34:11Z

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

colejohnson66 · 2024-12-23T14:55:42Z

If the idea is to restrict certain words, wouldn't SearchValues<string> be a better solution? I, personally, do not like the idea of a website silently changing my input (there's horror stories of banks silently truncating passwords on the recovery form, but not the login side); If something is disallowed, just tell me.

huoyaoyuan · 2024-12-23T15:25:45Z

Regex is very complex and slow at the context of string. Moreover, it lives several layers top. Methods involving regex have to be provided by regex. You can define your own extension methods for string.

Bond-Addict · 2024-12-23T19:18:21Z

If the idea is to restrict certain words, wouldn't SearchValues<string> be a better solution? I, personally, do not like the idea of a website silently changing my input (there's horror stories of banks silently truncating passwords on the recovery form, but not the login side); If something is disallowed, just tell me.

As I specifically addressed, it's a way that I would not recommend to use and I would not personally use that way, but its just a way they could use it if that's what they wanted. There are lots of other use-cases for the overrides.

Bond-Addict · 2024-12-23T19:23:07Z

Regex is very complex and slow at the context of string. Moreover, it lives several layers top. Methods involving regex have to be provided by regex. You can define your own extension methods for string.

I realize that you can create extension methods. The provided code contain only extension methods. Those code examples are to serve as an example of how it could be accomplished as I've tried to thoroughly test each of them. I will be implementing
BenchmarkDotNet to measure the performance of the overrides later tonight.

Bond-Addict · 2024-12-26T04:34:52Z

Using BenchmarkDotNet:

Method	Mean (ns)	Error (ns)	StdDev (ns)
TestCurrentSingleStringReplacementMethod	20.10	0.377	0.334
TestCurrentMultiStringReplacementMethod	10.53	0.094	0.083
TestCurrentRegexStringReplacementMethod	226.35	0.942	0.881
TestCurrentMultiRegexStringReplacementMethod	94.24	0.413	0.386
TestNewSingleStringReplacementMethod	39.80	0.451	0.400
TestNewDelimitedStringReplacementMethod	104.07	0.863	0.808
TestNewMultiStringReplacementMethod	79.81	0.245	0.191
TestNewRegexStringReplacementMethod	233.11	0.953	0.845
TestNewMultiRegexStringReplacementMethod	300.28	1.212	1.133

It does look to add a little bit more overhead, mostly for the regex part which tbh was expected.
I would imagine that direct api implementations (would be different code than I've provided, just accomplish the same result) rather than a extension method would perform far better though.

colejohnson66 · 2024-12-26T12:35:50Z

Creating a Regex on the fly in this API will always perform worse than leveraging the built-in regex source generator.

Bond-Addict added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 23, 2024

dotnet-issue-labeler bot added the area-System.Text.RegularExpressions label Dec 23, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Dec 23, 2024

Bond-Addict changed the title ~~[API Proposal]:~~ [API Proposal]: New overrides for string.Replace() Dec 23, 2024

Bond-Addict mentioned this issue Dec 23, 2024

Overloads for string.Remove() dotnet/aspnetcore#59607

Closed

1 task

martincostello added area-System.Runtime and removed area-System.Text.RegularExpressions labels Dec 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[API Proposal]: New overrides for string.Replace() #110912

[API Proposal]: New overrides for string.Replace() #110912

Bond-Addict commented Dec 23, 2024

dotnet-policy-service bot commented Dec 23, 2024

dotnet-policy-service bot commented Dec 23, 2024

colejohnson66 commented Dec 23, 2024

huoyaoyuan commented Dec 23, 2024

Bond-Addict commented Dec 23, 2024 •

edited

Loading

Bond-Addict commented Dec 23, 2024

Bond-Addict commented Dec 26, 2024 •

edited

Loading

colejohnson66 commented Dec 26, 2024

[API Proposal]: New overrides for string.Replace() #110912

[API Proposal]: New overrides for string.Replace() #110912

Comments

Bond-Addict commented Dec 23, 2024

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

dotnet-policy-service bot commented Dec 23, 2024

dotnet-policy-service bot commented Dec 23, 2024

colejohnson66 commented Dec 23, 2024

huoyaoyuan commented Dec 23, 2024

Bond-Addict commented Dec 23, 2024 • edited Loading

Bond-Addict commented Dec 23, 2024

Bond-Addict commented Dec 26, 2024 • edited Loading

colejohnson66 commented Dec 26, 2024

Bond-Addict commented Dec 23, 2024 •

edited

Loading

Bond-Addict commented Dec 26, 2024 •

edited

Loading