|
6 | 6 | use WP_HTML_Text_Replacement; |
7 | 7 |
|
8 | 8 | /** |
9 | | - * Finds string fragments that look like URLs and allow replacing them. |
10 | | - * This is the first, "thick" sieve that yields "URL candidates" that must be |
11 | | - * validated with a WHATWG-compliant parser. Some of the candidates will be |
12 | | - * false positives. |
| 9 | + * Finds string fragments that look like URLs and allows replacing them. |
13 | 10 | * |
14 | | - * This is a "thick sieve" that matches too much instead of too little. It |
15 | | - * will yield false positives, but will not miss a URL |
| 11 | + * This class implements two stages of detection: |
16 | 12 | * |
17 | | - * Looks for URLs: |
| 13 | + * 1. **A "thick" sieve** |
| 14 | + * 2. **A "fine" sieve** |
18 | 15 | * |
19 | | - * * Starting with http:// or https:// |
20 | | - * * Starting with // |
21 | | - * * Domain-only, e.g. www.example.com |
22 | | - * * Domain + path, e.g. www.example.com/path |
| 16 | + * The thick sieve uses a regular expression to match URL-like substrings. It matches too |
| 17 | + * much and may yield false positives. |
| 18 | + * |
| 19 | + * The fine sieve filters out invalid candidates using a WHATWG-compliant parser so only |
| 20 | + * real URLs are returned. |
| 21 | + * |
| 22 | + * ## URL Detection |
| 23 | + * |
| 24 | + * The thick sieve looks for URLs: |
| 25 | + * |
| 26 | + * * Starting with http://, https://, or //, e.g. //wp.org. |
| 27 | + * * With no protocol, e.g. www.wp.org or wp.org/path.
| 28 | + * |
| 29 | + * Here's a list of matching-related rules, limitations, and assumptions: |
23 | 30 | * |
24 | 31 | * ### Protocols |
25 | 32 | * |
26 | | - * As a migration-oriented tool, this processor will only consider http and https protocols. |
| 33 | + * As a site migration tool, this processor only considers URLs that use the HTTP
| 34 | + * or HTTPS protocol.
27 | 35 | * |
28 | 36 | * ### Domain names |
29 | 37 | * |
|
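The two-stage detection described in the new docblock can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names `thick_sieve` and `fine_sieve` are hypothetical, the regular expression is a simplified guess at what a "thick" pattern might look like, and a strict hostname check on top of PHP's `parse_url()` stands in for the WHATWG-compliant parser (`parse_url()` itself is not WHATWG-compliant).

```php
<?php
// Illustrative only: these function names are hypothetical and not part of the class in this diff.

/**
 * Thick sieve: over-matches URL-like substrings on purpose.
 *
 * @return string[] URL candidates, possibly including false positives.
 */
function thick_sieve( string $text ): array {
	$pattern = '~(?:https?:)?//[^\s"\'<>]+'                       // http://, https://, or //
		. '|\b(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/[^\s"\'<>]*)?~i'; // bare domain with optional path
	preg_match_all( $pattern, $text, $matches );
	return $matches[0];
}

/**
 * Fine sieve: keeps only the candidates that pass a stricter check.
 *
 * @param string[] $candidates Output of thick_sieve().
 * @return string[] Candidates that look like real http(s) URLs.
 */
function fine_sieve( array $candidates ): array {
	$valid = array();
	foreach ( $candidates as $candidate ) {
		// Give protocol-relative and protocol-less candidates a scheme before parsing.
		if ( preg_match( '~^https?://~i', $candidate ) ) {
			$absolute = $candidate;
		} elseif ( 0 === strncmp( $candidate, '//', 2 ) ) {
			$absolute = 'https:' . $candidate;
		} else {
			$absolute = 'https://' . $candidate;
		}

		$host = parse_url( $absolute, PHP_URL_HOST );

		// Assumption: a strict hostname pattern stands in for full WHATWG validation.
		if ( is_string( $host ) && preg_match( '~^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$~i', $host ) ) {
			$valid[] = $candidate;
		}
	}
	return $valid;
}

$text = 'Visit https://wp.org/about or www.wp.org/path but not //-bad_host! here';
$urls = fine_sieve( thick_sieve( $text ) );
// $urls now holds 'https://wp.org/about' and 'www.wp.org/path'.
```

In this sketch the thick sieve yields three candidates, including `//-bad_host!`, and the fine sieve discards that one because its host is not a valid hostname. Keeping the first pass permissive and the second pass strict means a missed URL can only come from the regular expression, which is easier to audit than a single combined pattern.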