Migrate to ICU4X #436
Conversation
- condense all byte indexes to char indexes in a single loop - track a minimal set of LineSegmenters (per LineBreakWordOption), and create as needed
- clean up tests - add tests for multi-character graphemes
- group all word boundary logic together
- fix incorrect start truncation for multi-style strings which aren't multi-wb style + test for this - test naming/grouping
- compute `force_normalize`
- simplify ClusterInfo to just `is_emoji` - more clean-up
- remove unused conversion methods
- merge main (conflicts resolved in: Cargo.lock, parley/src/context.rs, parley/src/layout/data.rs, parley/src/shape/mod.rs, parley/src/swash_convert.rs)
- remove LayoutContext.has_bidi
This isn't a full review (it doesn't really cover the analysis logic, which I'm probably not qualified to review), but I did a first pass on "rust-level things".
Some general notes:
- This currently increases the binary size of a release build of the `vello_editor` example by 2-3mb. That seems very unfortunate, but it also looks like there are a few easy wins to reduce this, and we should probably re-evaluate the impact once those are fixed.
- I think depending on the top-level `icu` crate is wrong. We should be depending on sub-crates like `icu_segmenter`, `icu_normalizer`, etc. directly.
- We should also be disabling default features of those crates where appropriate. Currently I'm pretty sure we're compiling in the default data providers (pulled in by default features) AND our custom subset (generated by the build.rs).
- We should check how much we're saving by using the custom baked data, because any application which depends on icu4x for other purposes may well end up pulling in the default data sets anyway, which would also lead to duplicated data. It may or may not be worth the saving, and we may also wish to offer both options.
- We should fully eliminate the Swash dependency (only a couple of trivial uses remaining).
- Benchmarks are currently broken due to `LayoutContext` no longer being `Sync`. We should work out whether we want to go down the route of making `LayoutContext`s always thread-local or whether we want to find a way to make it `Sync` again.
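Of those two routes, the always-thread-local option could be sketched with std's `thread_local!`. This is an illustration only: `LayoutContext` is stubbed here with a single field, and `with_layout_context` is a hypothetical helper, not Parley's actual API.

```rust
// A hedged sketch of "always thread-local" LayoutContexts. Because each
// thread owns its own context behind a RefCell, no `Sync` bound is needed.
use std::cell::RefCell;

struct LayoutContext {
    scratch: String, // stand-in for the real context state
}

thread_local! {
    static LAYOUT_CONTEXT: RefCell<LayoutContext> =
        RefCell::new(LayoutContext { scratch: String::new() });
}

// Callers borrow the per-thread context for the duration of a closure.
fn with_layout_context<R>(f: impl FnOnce(&mut LayoutContext) -> R) -> R {
    LAYOUT_CONTEXT.with(|ctx| f(&mut ctx.borrow_mut()))
}
```

The trade-off is that each thread pays for its own context's allocations, but all `Sync`/`Send` questions disappear.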
parley/Cargo.toml (Outdated)

```toml
accesskit = { workspace = true, optional = true }
hashbrown = { workspace = true }
harfrust = { workspace = true }
icu = { version = "2.0.0" }
```
Depending on the icu crate with default features enabled doesn't seem right to me. That's going to pull in a lot of stuff which we surely aren't using?
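A hedged sketch of what the alternative could look like, using the sub-crates named elsewhere in this review. The versions and the assumption that default features can simply be disabled here are things to verify against the icu4x 2.0 docs:

```toml
# Sketch only: depend on sub-crates directly, with default features off so
# the default compiled-data providers are not pulled in alongside the
# custom baked data generated by build.rs.
icu_segmenter = { version = "2.0", default-features = false }
icu_normalizer = { version = "2.0", default-features = false }
icu_properties = { version = "2.0", default-features = false }
```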
```rust
}

#[rustfmt::skip]
const FONTIQUE_SCRIPT_TAGS: [[u8; 4]; 193] = [
```
I believe this can be replaced with the `short_name` method on icu's `Script` type, but we should check that it's completely equivalent. There may be a couple of special cases.
```rust
loop {
    if !parser.next(&mut cluster) {
        // End of current item - process final segment
        break;
    }
    cluster = match clusters_iter.next() {
        Some(c) => c,
        None => break, // End of current item - process final segment
    };
```
This looks like it could become a `while let` loop.
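For illustration, the suggested shape on a stand-in iterator (not the real cluster parser; `last_cluster` and its body are made up for this sketch):

```rust
// The loop { match iter.next() { Some(c) => c, None => break } } pattern
// collapses into a single `while let` head.
fn last_cluster(clusters: &[&str]) -> Option<String> {
    let mut clusters_iter = clusters.iter();
    let mut last = None;
    while let Some(cluster) = clusters_iter.next() {
        // ... per-cluster segment logic would go here ...
        last = Some(cluster.to_string());
    }
    last
}
```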
```rust
if has_emoji && has_zwj {
}
```
This if was probably meant to be removed?
parley/src/lib.rs (Outdated)

```rust
extern crate alloc;

pub use fontique;
pub use swash;
```
This should go if we're not using swash anymore
There seems to be one more use of swash, for the `swash::Setting` type in parley/src/style/font.rs. I would suggest replacing this with a custom struct in Parley. This type does also exist in font-types, but I think we want to avoid putting font-types in the public API at least until it hits 1.0.
```rust
    });
}

#[cfg(test)]
```
Somewhat relieved to see how much of this file is tests! I was dreading reviewing 1000 lines. We may wish to consider moving these to separate files in the tests directory, although that's subjective and others (including you) may have other opinions (I personally find it harder to navigate code files with a mix of lots of tests and lots of code).
```rust
#[derive(Debug)]
pub(crate) struct CharCluster {
    pub chars: Vec<Char>,
```
We should back this with benchmark/profile data, but we should consider using either an `ArrayVec` with a capacity of `MAX_CLUSTER_SIZE`, or a `SmallVec` with a capacity tuned to "big enough for most clusters" here.
(Or the same array + len setup used for Form)
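For illustration, the array + len alternative could look roughly like this. Everything here is a sketch: `MAX_CLUSTER_SIZE`, the type name, and using plain `char` in place of Parley's `Char` are all assumptions.

```rust
// Hedged sketch of an inline "array + len" cluster buffer: fixed storage,
// no heap allocation, a u8 length tracking how much of the array is live.
const MAX_CLUSTER_SIZE: usize = 16; // illustrative value

#[derive(Debug)]
struct ClusterBuf {
    chars: [char; MAX_CLUSTER_SIZE],
    len: u8,
}

impl ClusterBuf {
    fn new() -> Self {
        Self { chars: ['\0'; MAX_CLUSTER_SIZE], len: 0 }
    }

    /// Returns false (dropping the char) when the buffer is full.
    fn push(&mut self, c: char) -> bool {
        if (self.len as usize) < MAX_CLUSTER_SIZE {
            self.chars[self.len as usize] = c;
            self.len += 1;
            true
        } else {
            false
        }
    }

    fn as_slice(&self) -> &[char] {
        &self.chars[..self.len as usize]
    }
}
```

Unlike `SmallVec`, this never spills to the heap, so the real code would need a policy for clusters longer than the cap.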
Simply using a SmallVec of capacity 1 saw these results - seems like a great suggestion. I didn't play around too much with different sizes cc @conor-93
I believe we can have a capacity of 4 without using any more memory (although there are other similar crates that could get us a size reduction for smaller capacities).
I only managed to review cluster.rs today. I'll continue with more tomorrow.
As I review, I'm experimenting and learning with this branch (which has a lot of the changes suggested).
This is where that branch is compared to this branch currently. Huge props to @nicoburns for that suggestion re using SmallVec. I think using SmallVec for Form had a huge effect with everything else being mostly marginal gains.
```toml
icu_locale = { workspace = true }
icu_normalizer = { workspace = true }
icu_properties = { workspace = true, features = ["unicode_bidi"] }
icu_provider = { workspace = true }
```
```toml
icu_provider = { workspace = true }
```

Does this need to be a dependency (both here and in the workspace)?
Its only usages are `use icu_provider::prelude::icu_locale_core::LanguageIdentifier;`, but they can be replaced with `use icu_locale::LanguageIdentifier;`.
`icu_locale_core::LanguageIdentifier` would be even better.
```rust
pub struct BakedProvider;
impl_data_provider!(BakedProvider);

pub(crate) static PROVIDER: BakedProvider = BakedProvider;
```
So is this the provider that we would need to pass from the consumer if we didn't want to bake the data?
```rust
pub(crate) fn analyze_text<B: Brush>(lcx: &mut LayoutContext<B>, text: &str) {
    // See: https://github.com/unicode-org/icu4x/blob/ee5399a77a6b94efb5d4b60678bb458c5eedb25d/components/segmenter/src/line.rs#L338-L351
    fn is_mandatory_line_break(line_break: LineBreak) -> bool {
```
Nit: Could you please define this closer to where it's used?
```rust
        continue;
    }
    all_boundaries_byte_indexed[wb] = Boundary::Word;
}
```
We may not need to pay the cost of the `all_boundaries_byte_indexed` allocation: make this loop an iterator, and instead push line boundary positions to a vector. There's probably a way to make the line boundary positions an iterator too, to avoid the vec we push to, but I think it's fine to leave that as a TODO.
```diff
diff --git a/parley/src/analysis/mod.rs b/parley/src/analysis/mod.rs
index f903b20..6ddbee2 100644
--- a/parley/src/analysis/mod.rs
+++ b/parley/src/analysis/mod.rs
@@ -273,16 +273,15 @@ pub(crate) fn analyze_text<B: Brush>(lcx: &mut LayoutContext<B>, text: &str) {
     let mut line_segmenters = core::mem::take(&mut lcx.analysis_data_sources.line_segmenters);

-    let mut all_boundaries_byte_indexed = vec![Boundary::None; text.len()];
-
-    // Word boundaries:
-    for wb in lcx.analysis_data_sources.word_segmenter().segment_str(text) {
+    // Collect boundary byte positions compactly
+    let mut wb_iter = lcx.analysis_data_sources.word_segmenter().segment_str(text).filter_map(|wb| {
         // icu produces a word boundary trailing the string, which we don't use.
         if wb == text.len() {
-            continue;
+            None
+        } else {
+            Some(wb)
         }
-        all_boundaries_byte_indexed[wb] = Boundary::Word;
-    }
+    }).peekable();

     // Line boundaries (word break naming refers to the line boundary determination config).
     //
@@ -298,27 +297,40 @@ pub(crate) fn analyze_text<B: Brush>(lcx: &mut LayoutContext<B>, text: &str) {
         &first_style
     );
     let mut global_offset = 0;
+    let mut line_boundary_positions: Vec<usize> = Vec::new();
+    // LINE BOUNDARIES COLLECTION
     for (substring_index, (substring, word_break_strength, last)) in contiguous_word_break_substrings.enumerate() {
-        let line_boundaries: Vec<usize> = lcx.analysis_data_sources
-            .line_segmenter(word_break_strength)
-            .segment_str(substring)
-            .collect();
         // Fast path for text with a single word-break option.
         if substring_index == 0 && last {
-            // icu adds leading and trailing line boundaries, which we don't use.
-            let Some((_first, rest)) = line_boundaries.split_first() else {
+            let mut lb_iter = line_segmenters.get(word_break_strength).segment_str(substring);
+
+            let _first = lb_iter.next();
+            let second = lb_iter.next();
+
+            if second.is_none() {
                 continue;
-            };
-            let Some((_last, middle)) = rest.split_last() else {
+            }
+
+            let third = lb_iter.next();
+
+            if third.is_none() {
                 continue;
-            };
-            for &b in middle {
-                all_boundaries_byte_indexed[b] = Boundary::Line;
+            }
+
+            let iter = [second.unwrap(), third.unwrap()].into_iter().chain(lb_iter);
+            line_boundary_positions.extend(iter);
+            line_boundary_positions.pop();
-            }
             break;
         }
+        let line_boundaries_iter = line_segmenters.get(word_break_strength).segment_str(substring);
+
         let mut substring_chars = substring.chars();
         if substring_index != 0 {
             global_offset -= substring_chars.next().unwrap().len_utf8();
@@ -328,9 +340,9 @@ pub(crate) fn analyze_text<B: Brush>(lcx: &mut LayoutContext<B>, text: &str) {
         let last_len = substring_chars.next_back().unwrap().len_utf8();

         // Mark line boundaries (overriding word boundaries where present).
-        for (index, &pos) in line_boundaries.iter().enumerate() {
+        for (index, pos) in line_boundaries_iter.enumerate() {
             // icu adds leading and trailing line boundaries, which we don't use.
-            if index == 0 || index == line_boundaries.len() - 1 {
+            if index == 0 || pos == substring.len() {
                 continue;
             }
@@ -340,7 +352,7 @@ pub(crate) fn analyze_text<B: Brush>(lcx: &mut LayoutContext<B>, text: &str) {
             if !last && pos == substring.len() - last_len {
                 continue;
             }
-            all_boundaries_byte_indexed[pos + global_offset] = Boundary::Line;
+            line_boundary_positions.push(pos + global_offset);
         }

         if !last {
@@ -351,11 +363,35 @@ pub(crate) fn analyze_text<B: Brush>(lcx: &mut LayoutContext<B>, text: &str) {

     // BiDi embedding levels:
     let bidi_embedding_levels = unicode_bidi::BidiInfo::new_with_data_source(&lcx.analysis_data_sources.bidi_class(), text, None).levels;

+    // Merge boundaries - line takes precedence over word
+    let mut lb_iter = line_boundary_positions.iter().peekable();
     let boundaries_and_levels_iter = text.char_indices()
-        .map(|(byte_pos, _)| (
-            all_boundaries_byte_indexed.get(byte_pos).unwrap(),
-            bidi_embedding_levels.get(byte_pos).unwrap()
-        ));
+        .map(|(byte_pos, _)| {
+            // advance any stale word boundary positions
+            while let Some(&w) = wb_iter.peek() {
+                if w < byte_pos { _ = wb_iter.next(); } else { break; }
+            }
+            // advance any stale line boundary positions
+            while let Some(&l) = lb_iter.peek() {
+                if *l < byte_pos { _ = lb_iter.next(); } else { break; }
+            }
+
+            let mut boundary = Boundary::None;
+            if let Some(&w) = wb_iter.peek() {
+                if w == byte_pos {
+                    boundary = Boundary::Word;
+                    _ = wb_iter.next();
+                }
+            }
+            if let Some(&l) = lb_iter.peek() {
+                if *l == byte_pos {
+                    boundary = Boundary::Line;
+                    _ = lb_iter.next();
+                }
+            }
+
+            (boundary, bidi_embedding_levels.get(byte_pos).unwrap())
+        });

     fn unicode_data_iterator<'a, T: TrieValue>(
         text: &'a str,
```

```rust
let line_boundaries: Vec<usize> = lcx.analysis_data_sources
    .line_segmenter(word_break_strength)
    .segment_str(substring)
    .collect();
```
We can probably skip this allocation entirely using something like:
```rust
for (substring_index, (substring, word_break_strength, last)) in contiguous_word_break_substrings.enumerate() {
    // Fast path for text with a single word-break option.
    if substring_index == 0 && last {
        let mut lb_iter = line_segmenters.get(word_break_strength).segment_str(substring);
        // CAUTION: bad names, draft code ahead
        let _first = lb_iter.next();
        let second = lb_iter.next();
        if second.is_none() {
            continue;
        }
        let third = lb_iter.next();
        if third.is_none() {
            continue;
        }
        let iter = [second.unwrap(), third.unwrap()].into_iter().chain(lb_iter);
        for b in iter {
            if b == substring.len() {
                continue;
            }
            line_boundary_positions.push(b);
        }
        break;
    }
    let line_boundaries_iter = line_segmenters.get(word_break_strength).segment_str(substring);
    let mut substring_chars = substring.chars();
    if substring_index != 0 {
        global_offset -= substring_chars.next().unwrap().len_utf8();
    }
    // There will always be at least two characters if we are not taking the fast path for
    // a single word break style substring.
    let last_len = substring_chars.next_back().unwrap().len_utf8();
    // Mark line boundaries (overriding word boundaries where present).
    for (index, pos) in line_boundaries_iter.enumerate() {
        // icu adds leading and trailing line boundaries, which we don't use.
        if index == 0 || pos == substring.len() {
            continue;
        }
        // For all but the last substring, we ignore line boundaries caused by the last
        // character, as this character is carried back from the next substring, and will be
        // accounted for there.
        if !last && pos == substring.len() - last_len {
            continue;
        }
        line_boundary_positions.push(pos + global_offset);
    }
    if !last {
        global_offset += substring.len() - last_len;
    }
}
```

```rust
let Some((first_style, rest)) = lcx.styles.split_first() else {
    panic!("No style info");
};
let contiguous_word_break_substrings = WordBreakSegmentIter::new(
```
If I understand WordBreakSegmentIter correctly, then we still iterate through the string even when there's no change in word break style. Is there a way to avoid that by doing an iteration through styles first before proceeding with a per character iteration?
```rust
if lcx.styles.iter().any(|s| s.style.word_break != LineBreakWordOption::Normal) {
    // Evaluate subranges
} else {
    // Fast path
}
```

Or simply update the iterator to first check whether it needs to yield subranges (by performing this check internally).
One thing that may be worth looking at is the difference between internal and external iteration (it's possible that Iterator::fold (or Iterator::try_fold) might be faster than Iterator::next() / Iterator::for_each())
```rust
/// Whether the character
pub is_control_character: bool,
/// True if the character should be considered when mapping glyphs.
pub contributes_to_shaping: bool,
```
Let's use bit flags for these, either via the `bitflags` crate or manually.
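A minimal sketch of the manual option, packing the two bools above into one byte. The type and flag names are illustrative, not Parley's:

```rust
// Two bool fields become two bits of a u8; adding more flags later costs
// nothing until the ninth bit.
#[derive(Clone, Copy, Default)]
struct CharFlags(u8);

impl CharFlags {
    const CONTROL_CHARACTER: u8 = 1 << 0;
    const CONTRIBUTES_TO_SHAPING: u8 = 1 << 1;

    fn set(&mut self, flag: u8, value: bool) {
        if value {
            self.0 |= flag;
        } else {
            self.0 &= !flag;
        }
    }

    fn is_control_character(self) -> bool {
        self.0 & Self::CONTROL_CHARACTER != 0
    }

    fn contributes_to_shaping(self) -> bool {
        self.0 & Self::CONTRIBUTES_TO_SHAPING != 0
    }
}
```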
I spent some time to create a custom provider in build.rs from existing ICU data sources that allows for 1 lookup on the character (instead of 1 lookup per property per character) and saw very good performance improvements.
As per our conversation yesterday, I'll finish this review and prepare that work to be incorporated here
```rust
self.decomp.state = FormState::Invalid;

// Create a string from the original characters to normalize
let mut orig_str = String::with_capacity(self.len as usize * 4);
```
Instead of allocating these strings in `composed` and `decomposed`, could we instead pass a `scratch_string` (or similar) to this and `composed`, so that we can reuse one allocation?
Something like 352225f?
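The scratch-buffer pattern being suggested is roughly this. `normalize_into` is an illustrative stand-in, not the actual `composed`/`decomposed` signature:

```rust
// The caller owns one String and lends it out on every call; clear() empties
// it but keeps the capacity, so after warm-up no further allocation occurs.
fn normalize_into(scratch: &mut String, chars: &[char]) -> usize {
    scratch.clear(); // retains previously grown capacity
    for &c in chars {
        // ... the real code would push the normalized form of `c` here ...
        scratch.push(c);
    }
    scratch.len()
}
```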
```rust
}

// BiDi embedding levels:
let bidi_embedding_levels = unicode_bidi::BidiInfo::new_with_data_source(&lcx.analysis_data_sources.bidi_class(), text, None).levels;
```
Is it true that we might only need to run this iff one of the characters is from a `BidiClass` that requires resolution? If so, when we have the composite property trie, we could add a fast path for this.
```rust
    }
}

fn is_emoji_grapheme(analysis_data_sources: &AnalysisDataSources, grapheme: &str) -> bool {
```
This function shows up significantly in profiles, but I'm wondering whether we can address that with the composite property provider. I.e., we store whether some character is an emoji and fast path out of this with something like:
```rust
let mut is_emoji_or_pictograph = false;
let chars = segment_text.char_indices().zip(item_infos_iter.by_ref()).map(|((_, ch), (info, style_index))| {
    // ...
    is_emoji_or_pictograph |= info.is_emoji_or_pictograph;
    // ...
});
// ...
let cluster = CharCluster::new(
    // ...
    is_emoji_or_pictograph
        || (segment_text.len() > 1 && is_emoji_grapheme(analysis_data_sources, segment_text)),
    // ...
);
```

where `is_emoji_or_pictograph` is a new field on item infos.
```rust
let cluster = CharCluster::new(
    chars,
    is_emoji_grapheme(analysis_data_sources, segment_text),
    len,
```
Do we need `len` if it can be obtained from `chars.len()`?
```rust
}

// BiDi embedding levels:
let bidi_embedding_levels = unicode_bidi::BidiInfo::new_with_data_source(&lcx.analysis_data_sources.bidi_class(), text, None).levels;
```
This bidi impl doesn't enable passing in allocations (it allocates within its impl). I created an issue in the upstream repo asking whether it's possible to pass allocations in: servo/unicode-bidi#146
Migration of text analysis from Swash → ICU4X
Overview
ICU4X enables text analysis and internationalisation. For Parley, this includes locale and language recognition,
bidirectional text evaluation, text segmentation, emoji recognition, NFC/NFD normalisation and other Unicode character information.
ICU4X is developed and maintained by a trusted authority in the space of text internationalisation: the ICU4X Technical Committee (ICU4X-TC) in the Unicode Consortium. It is targeted at resource-constrained environments, which matters for Parley.
Notable changes
- `select_font` emoji detection improvements: flag emoji ("🇺🇸") and keycap sequences (e.g. 0️⃣ through 9️⃣) are now supported in cluster detection; Swash did not support these.

Performance/binary size
- `vello_editor` is ~22kB larger (9642kB vs 9620kB).

Other details
- `Language` parsing is more tolerant, e.g. it permits extra, invalid subtags (like in "en-Latn-US-a-b-c-d").
- (`vello_editor` compilation testing.) In order to potentially support correct word breaking across all languages, without seeing a huge compilation size increase, we would need a way for users to attach only the locale data they need at runtime. This locale data could be generated (with `icu4x-datagen`) and attached (using `DataProvider`s) at runtime in the future.