
Commit a73295d

Authored by Josh-Cena, hamishwillee, and bakkot
Reference for stage 3 regex-escaping (#36928)
* Reference for stage 3 regex-escaping
* Update files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md
* Apply suggestions from code review
  Co-authored-by: Hamish Willee
* Update files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md
* Update files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md
  Co-authored-by: Kevin Gibbons
* Update index.md
* Update files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md
* Fix wording
* Update files/en-us/web/javascript/reference/global_objects/string/replaceall/index.md
  Co-authored-by: Hamish Willee
* Update files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md
  Co-authored-by: Hamish Willee
* Update files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md
  Co-authored-by: Kevin Gibbons

---------

Co-authored-by: Hamish Willee
Co-authored-by: Kevin Gibbons
1 parent e454e36 commit a73295d


5 files changed, +116 -14 lines changed


files/en-us/web/javascript/guide/regular_expressions/index.md

+1 -12
@@ -157,18 +157,7 @@ For instance, to match the string "C:\\" where "C" can be any letter, you'd use
 If using the `RegExp` constructor with a string literal, remember that the backslash is an escape in string literals, so to use it in the regular expression, you need to escape it at the string literal level.
 `/a\*b/` and `new RegExp("a\\*b")` create the same expression, which searches for "a" followed by a literal "\*" followed by "b".
 
-If escape strings are not already part of your pattern you can add them using {{jsxref("String.prototype.replace()")}}:
-
-```js
-function escapeRegExp(string) {
-  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string
-}
-```
-
-The "g" after the regular expression is an option or flag that performs a global search, looking in the whole string and returning all matches.
-It is explained in detail below in [Advanced Searching With Flags](#advanced_searching_with_flags).
-
-_Why isn't this built into JavaScript?_ There is a [proposal](https://github.com/tc39/proposal-regex-escaping) to add such a function to RegExp.
+The {{jsxref("RegExp.escape()")}} function returns a new string where all special characters in regex syntax are escaped. This allows you to do `new RegExp(RegExp.escape("a*b"))` to create a regular expression that matches only the string `"a*b"`.
 
 ### Using parentheses
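
The sentence added to the guide can be illustrated with a short sketch. Because `RegExp.escape()` is a recent addition, the sketch assumes it may be missing from the runtime and guards the escaped part with a feature check.

```js
// Unescaped, "*" is a quantifier: the pattern means "zero or more 'a', then 'b'".
const unescaped = new RegExp("a*b");
console.log(unescaped.test("b")); // true
console.log(unescaped.test("aaab")); // true

// RegExp.escape() is new, so check that the engine provides it first.
if (typeof RegExp.escape === "function") {
  // The escaped pattern matches only the literal text "a*b".
  const literal = new RegExp(RegExp.escape("a*b"));
  console.log(literal.test("b")); // false
  console.log(literal.test("a*b")); // true
}
```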

files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md

+108
@@ -0,0 +1,108 @@
---
title: RegExp.escape()
slug: Web/JavaScript/Reference/Global_Objects/RegExp/escape
page-type: javascript-static-method
browser-compat: javascript.builtins.RegExp.escape
---

{{JSRef}}

The **`RegExp.escape()`** static method [escapes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions#escape_sequences) any potential regex syntax characters in a string, and returns a new string that can be safely used as a [literal](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character) pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor.

When dynamically creating a {{jsxref("RegExp")}} with user-provided content, consider using this function to sanitize the input (unless the input is actually intended to contain regex syntax). In addition, don't try to re-implement its functionality by, for example, using {{jsxref("String.prototype.replaceAll()")}} to insert a `\` before all syntax characters. `RegExp.escape()` is designed to use escape sequences that work in many more edge cases and contexts than hand-crafted code is likely to achieve.

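As a minimal sketch of why hand-crafted escaping falls short, the block below uses the `escapeRegExp()` helper that the regular expressions guide previously suggested, and a hypothetical scenario where user-supplied characters are placed inside a character class. The helper does not escape `-`, so the user's text turns into a range. The escaped string `"\\x61\\x2dz"` is what the rules under "Return value" below produce for `"a-z"`; it is written out by hand so the sketch also runs in engines without `RegExp.escape()`.

```js
// Hand-written helper of the kind discouraged above
// (the escapeRegExp() function the guide used to suggest).
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& is the whole match
}

// Hypothetical scenario: build a character class from user-supplied characters.
const userChars = "a-z";

// The helper leaves "-" untouched, so the class becomes the range [a-z].
const handEscaped = new RegExp(`[${escapeRegExp(userChars)}]`);
console.log(handEscaped.test("m")); // true, although "m" is not in "a-z"

// "\x2d" is an escaped "-", which cannot form a range,
// so this class contains only "a", "-", and "z".
const escaped = new RegExp("[\\x61\\x2dz]");
console.log(escaped.test("m")); // false
console.log(escaped.test("-")); // true
```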
## Syntax

```js-nolint
RegExp.escape(string)
```

### Parameters

- `string`
  - : The string to escape.

### Return value

A new string that can be safely used as a literal pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor. Specifically, the following characters in the input string are replaced:

- The first character of the string, if it's either a decimal digit (0–9) or ASCII letter (a–z, A–Z), is escaped using the `\x` [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) syntax. For example, `RegExp.escape("foo")` returns `"\\x66oo"` (here and below, the two backslashes in a string literal denote a single backslash character). This step ensures that if this escaped string is embedded into a bigger pattern where it's immediately preceded by `\1`, `\x0`, `\u000`, etc., the leading character doesn't get interpreted as part of the escape sequence.
- Regex [syntax characters](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character#description), including `^`, `$`, `\`, `.`, `*`, `+`, `?`, `(`, `)`, `[`, `]`, `{`, `}`, and `|`, as well as the `/` delimiter, are escaped by inserting a `\` character before them. For example, `RegExp.escape("foo.bar")` returns `"\\x66oo\\.bar"`, and `RegExp.escape("(foo)")` returns `"\\(foo\\)"`.
- Other punctuators, including `,`, `-`, `=`, `<`, `>`, `#`, `&`, `!`, `%`, `:`, `;`, `@`, `~`, `'`, `` ` ``, and `"`, are escaped using the `\x` syntax. For example, `RegExp.escape("foo-bar")` returns `"\\x66oo\\x2dbar"`. These characters cannot be escaped by prefixing with `\` because, for example, `/foo\-bar/u` is a syntax error.
- The characters with their own [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) sequences: `\f` (U+000C FORM FEED), `\n` (U+000A LINE FEED), `\r` (U+000D CARRIAGE RETURN), `\t` (U+0009 CHARACTER TABULATION), and `\v` (U+000B LINE TABULATION), are replaced with their escape sequences. For example, `RegExp.escape("foo\nbar")` returns `"\\x66oo\\nbar"`.
- The space character is escaped as `"\\x20"`.
- Other non-ASCII [line break and white space characters](/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#white_space) are replaced with one or two `\uXXXX` escape sequences representing their UTF-16 code units. For example, `RegExp.escape("foo\u2028bar")` returns `"\\x66oo\\u2028bar"`.
- [Lone surrogates](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters) are replaced with their `\uXXXX` escape sequences. For example, `RegExp.escape("foo\uD800bar")` returns `"\\x66oo\\ud800bar"`.

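The reasons for the first and third rules can be seen by concatenating pattern strings by hand. This sketch does not call `RegExp.escape()`; it writes out the escaped forms the rules above produce (`"0"` becomes `"\\x30"`, `"-"` becomes `"\\x2d"`), so it also runs in engines without the method. The `(a)\1` prefix is a hypothetical surrounding pattern.

```js
// Hypothetical surrounding pattern: capture "a", then backreference it with \1.
const prefix = "(a)\\1";

// Appending a raw "0" turns "\1" into "\10". With only one capturing group,
// non-Unicode mode reads "\10" as a legacy octal escape (U+0008), not "\1" then "0".
const naive = new RegExp(prefix + "0");
console.log(naive.test("aa0")); // false

// "\x30" is an escaped "0", so it cannot extend the preceding "\1".
const escaped = new RegExp(prefix + "\\x30");
console.log(escaped.test("aa0")); // true

// Prefixing "-" with "\" is rejected in Unicode-aware mode, but "\x2d" is accepted.
try {
  new RegExp("foo\\-bar", "u");
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
console.log(new RegExp("foo\\x2dbar", "u").test("foo-bar")); // true
```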
### Exceptions

- {{jsxref("TypeError")}}
  - : Thrown if `string` is not a string.

## Examples

### Using RegExp.escape()

The following examples demonstrate various inputs and outputs for the `RegExp.escape()` method.

```js
RegExp.escape("Buy it. use it. break it. fix it.");
// "\\x42uy\\x20it\\.\\x20use\\x20it\\.\\x20break\\x20it\\.\\x20fix\\x20it\\."
RegExp.escape("foo.bar"); // "\\x66oo\\.bar"
RegExp.escape("foo-bar"); // "\\x66oo\\x2dbar"
RegExp.escape("foo\nbar"); // "\\x66oo\\nbar"
RegExp.escape("foo\uD800bar"); // "\\x66oo\\ud800bar"
RegExp.escape("foo\u2028bar"); // "\\x66oo\\u2028bar"
```

### Using RegExp.escape() with the RegExp constructor

The primary use case of `RegExp.escape()` is embedding a string into a bigger regex pattern while ensuring that the string is treated as a literal pattern, not as regex syntax. Consider the following naïve example that removes a domain from URLs:

```js
function removeDomain(text, domain) {
  return text.replace(new RegExp(`https?://${domain}(?=/)`, "g"), "");
}

const input =
  "Consider using [RegExp.escape()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string.";
const domain = "developer.mozilla.org";
console.log(removeDomain(input, domain));
// Consider using [RegExp.escape()](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string.
```

Inserting the `domain` above results in the regular expression pattern `https?://developer.mozilla.org(?=/)`, where each "." character is a regex [wildcard](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Wildcard) character. This means the pattern matches the domain with any character in place of each ".", such as `developer-mozilla-org`. Therefore, it would also incorrectly change the following text:

```js
const input =
  "This is not an MDN link: https://developer-mozilla.org/, be careful!";
const domain = "developer.mozilla.org";
console.log(removeDomain(input, domain));
// This is not an MDN link: /, be careful!
```

To fix this, we can use `RegExp.escape()` to ensure that any user input is treated as a literal pattern:

```js
function removeDomain(text, domain) {
  return text.replace(
    new RegExp(`https?://${RegExp.escape(domain)}(?=/)`, "g"),
    "",
  );
}
```

Now this function does exactly what we intend and no longer transforms `developer-mozilla.org` URLs.

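A quick way to check the fixed function against both kinds of input from this section. The block repeats the fixed `removeDomain()` so it is self-contained and, assuming some runtimes do not implement `RegExp.escape()` yet, only makes the calls when the method exists.

```js
function removeDomain(text, domain) {
  return text.replace(
    new RegExp(`https?://${RegExp.escape(domain)}(?=/)`, "g"),
    "",
  );
}

const domain = "developer.mozilla.org";

// Only run where RegExp.escape() is implemented.
if (typeof RegExp.escape === "function") {
  // The real MDN origin is still removed...
  console.log(removeDomain("Read https://developer.mozilla.org/en-US/docs/Web first.", domain));
  // "Read /en-US/docs/Web first."

  // ...but a look-alike domain is left alone, because "." now matches only a literal dot.
  console.log(removeDomain("This is not an MDN link: https://developer-mozilla.org/, be careful!", domain));
  // "This is not an MDN link: https://developer-mozilla.org/, be careful!"
}
```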
## Specifications

{{Specifications}}

## Browser compatibility

{{Compat}}

## See also

- [Polyfill of `RegExp.escape` in `core-js`](https://github.com/zloirock/core-js#regexp-escaping)
- {{jsxref("RegExp")}}

files/en-us/web/javascript/reference/global_objects/regexp/index.md

+5
@@ -115,6 +115,11 @@ Note that several of the `RegExp` properties have both long and short (Perl-like
 - [`RegExp[Symbol.species]`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/Symbol.species)
   - : The constructor function that is used to create derived objects.
 
+## Static methods
+
+- {{jsxref("RegExp.escape()")}}
+  - : [Escapes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions#escape_sequences) any potential regex syntax characters in a string, and returns a new string that can be safely used as a [literal](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character) pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor.
+
 ## Instance properties
 
 These properties are defined on `RegExp.prototype` and shared by all `RegExp` instances.

files/en-us/web/javascript/reference/global_objects/string/replaceall/index.md

+1 -1
@@ -41,7 +41,7 @@ A new string, with all matches of a pattern replaced by a replacement.
 
 This method does not mutate the string value it's called on. It returns a new string.
 
-Unlike [`replace()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace), this method would replace all occurrences of a string, not just the first one. This is especially useful if the string is not statically known, as calling the [`RegExp()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/RegExp) constructor without escaping special characters may unintentionally change its semantics.
+Unlike [`replace()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace), this method replaces all occurrences of a string, not just the first one. It is also possible to replace all instances of a string by calling `replace()` with a global regex built dynamically with [`RegExp()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/RegExp), but this can have unintended consequences if the string contains characters that have special meaning in regular expressions (which might happen if the search string comes from user input). You can mitigate this with {{jsxref("RegExp.escape()")}}, which turns the string into a literal pattern, but it is simpler to call `replaceAll()` and pass the string without converting it to a regex.
 
 ```js
 function unsafeRedactName(text, name) {
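
A minimal sketch of the pitfall described in the updated paragraph. The search strings `"(approx.)"` and `"C++"` are hypothetical stand-ins for user input.

```js
const text = "5 (approx.) and 7 (approx.)";
const search = "(approx.)"; // hypothetical user-supplied search string

// Passing the string to RegExp() changes its meaning: "(" and ")" form a group
// and "." matches any character, so the parentheses themselves are left behind.
console.log(text.replace(new RegExp(search, "g"), "~"));
// "5 (~) and 7 (~)"

// replaceAll() with a string argument matches the text literally.
console.log(text.replaceAll(search, "~"));
// "5 ~ and 7 ~"

// Some strings are not valid patterns at all: "C++" is a regex syntax error.
try {
  new RegExp("C++", "g");
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
console.log("C++ and C++".replaceAll("C++", "C plus plus"));
// "C plus plus and C plus plus"
```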

files/en-us/web/javascript/reference/regular_expressions/index.md

+1 -1
@@ -147,7 +147,7 @@ _Escape sequences_ in regexes refer to any kind of syntax formed by `\` followed
 [VCC]: /en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class#v-mode_character_class
 [WBA]: /en-US/docs/Web/JavaScript/Reference/Regular_expressions/Word_boundary_assertion
 
-`\` followed by any other digit character becomes a [legacy octal escape sequence](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#escape_sequences), which is forbidden in [Unicode-aware mode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode#unicode-aware_mode).
+`\` followed by `0` and another digit becomes a [legacy octal escape sequence](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#escape_sequences), which is forbidden in [Unicode-aware mode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode#unicode-aware_mode). `\` followed by any other digit sequence becomes a [backreference](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Backreference).
 
 In addition, `\` can be followed by some non-letter-or-digit characters, in which case the escape sequence is always a [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) representing the escaped character itself:
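
A minimal sketch of the two cases the revised sentence distinguishes: a digit sequence used as a backreference, and `\0` followed by another digit used as a legacy octal escape, which Unicode-aware mode rejects.

```js
// "\1" after one capturing group is a backreference: it repeats what group 1 matched.
console.log(/(a)\1/.test("aa")); // true
console.log(/(a)\1/.test("ab")); // false

// "\" followed by "0" and another digit is a legacy octal escape: "\01" is U+0001.
console.log(/\01/.test("\x01")); // true

// Legacy octal escapes are a syntax error in Unicode-aware mode.
try {
  new RegExp("\\01", "u");
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```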
