Commit e434a3a

Use wp_is_valid_utf8() and wp_scrub_utf8() from the new utf8.php decoder (#200)
Replaces two instances of the old UTF-8 decoding utilities with the new utf8.php toolkit by @dmsnell:

* https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-includes/compat-utf8.php
* https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-includes/utf8.php

This PR only touches two tactical usages of the old tools:

* Blueprint validation now uses `wp_is_valid_utf8`
* CSSProcessor now uses `wp_scrub_utf8` instead of `_wp_scrub_utf8_fallback`

More refactoring is coming once there's a faster alternative to `_wp_scan_utf8`, see https://core.trac.wordpress.org/ticket/63863#comment:51

Related to #196. Follows up on #199 and #197.

## Testing instructions

If the CI passes, we're good. Unicode-related scenarios are covered by tests.

1 parent 31317cc commit e434a3a

8 files changed: +1049 -1003 lines changed


components/Blueprints/class-runner.php

Lines changed: 2 additions & 9 deletions

@@ -55,7 +55,7 @@
 use WordPress\HttpClient\Client;
 use WordPress\Zip\ZipFilesystem;

-use function WordPress\Encoding\_wp_has_noncharacters_fallback;
+use function WordPress\Encoding\wp_is_valid_utf8;
 use function WordPress\Filesystem\wp_unix_sys_get_temp_dir;
 use function WordPress\Zip\is_zip_file_stream;

@@ -379,14 +379,7 @@ private function load_blueprint() {
 // Validate the Blueprint string we've just loaded.

 // **UTF-8 Encoding:** Assert the Blueprint input is UTF-8 encoded.
-$is_valid_utf8 = false;
-if ( function_exists( 'mb_check_encoding' ) ) {
-	$is_valid_utf8 = mb_check_encoding( $blueprint_string, 'UTF-8' );
-} else {
-	$is_valid_utf8 = ! _wp_has_noncharacters_fallback( $blueprint_string );
-}
-
-if ( ! $is_valid_utf8 ) {
+if ( ! wp_is_valid_utf8( $blueprint_string ) ) {
 	throw new BlueprintExecutionException( 'Blueprint must be encoded as UTF-8.' );
 }

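The validate-then-throw pattern in the hunk above can be tried outside the repository with a rough stand-in. The sketch below is not the toolkit's `wp_is_valid_utf8()`: `example_is_valid_utf8()` and `example_assert_blueprint_encoding()` are hypothetical names introduced only for illustration, and the check relies on PCRE's `/u` modifier, assuming the PCRE extension is available (it is in standard PHP builds).

```php
<?php
/**
 * Hypothetical stand-in for illustration; NOT the library's wp_is_valid_utf8().
 *
 * With the /u modifier, preg_match() treats the subject as UTF-8 and
 * returns false when it contains malformed byte sequences.
 */
function example_is_valid_utf8( string $bytes ): bool {
	// The empty pattern matches any well-formed subject, so 1 means "valid".
	return 1 === preg_match( '//u', $bytes );
}

/**
 * Hypothetical helper mirroring the shape of the Blueprint check:
 * reject the input early if it is not valid UTF-8.
 */
function example_assert_blueprint_encoding( string $blueprint_string ): void {
	if ( ! example_is_valid_utf8( $blueprint_string ) ) {
		throw new InvalidArgumentException( 'Blueprint must be encoded as UTF-8.' );
	}
}
```

A single boolean check like this keeps the call site to one condition, which is the simplification the diff makes by dropping the `mb_check_encoding` branch and its fallback.
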
components/DataLiberation/URL/class-cssprocessor.php

Lines changed: 3 additions & 3 deletions

@@ -2,10 +2,10 @@

 namespace WordPress\DataLiberation\URL;

-use function WordPress\Encoding\_wp_scan_utf8;
-use function WordPress\Encoding\_wp_scrub_utf8_fallback;
 use function WordPress\Encoding\utf8_codepoint_at;
 use function WordPress\Encoding\codepoint_to_utf8_bytes;
+use function WordPress\Encoding\compat\_wp_scan_utf8;
+use function WordPress\Encoding\wp_scrub_utf8;

 /**
  * Tokenizes CSS according to the CSS Syntax Level 3 specification.
@@ -1528,7 +1528,7 @@ private function consume_ident_start_codepoint( $at ): int {
  */
 private function decode_string_or_url( int $start, int $length ): string {
 	// Fast path: check if any processing is needed.
-	$slice = _wp_scrub_utf8_fallback( substr( $this->css, $start, $length ) );
+	$slice = wp_scrub_utf8( substr( $this->css, $start, $length ) );
 	$special_chars = "\\\r\f\x00";
 	if ( false === strpbrk( $slice, $special_chars ) ) {
 		// No special chars - return raw substring (almost zero allocations).
