Conversation

@conor-93 conor-93 commented Oct 20, 2025

Migration of text analysis from Swash → ICU4X

Overview

ICU4X enables text analysis and internationalisation. For Parley, this covers locale and language recognition, bidirectional text evaluation, text segmentation, emoji recognition, NFC/NFD normalisation, and access to other Unicode character information.

ICU4X is developed and maintained by a trusted authority in the space of text internationalisation: the ICU4X Technical Committee (ICU4X-TC) in the Unicode Consortium. It is targeted at resource-constrained environments. For Parley, this means:

  • The potential for full locale support for complex line breaking cases (not supported by Swash).
  • Reliable and up-to-date Unicode data.
  • Reasonable performance and memory footprint (with the possibility of future improvements).
  • Full decoupling from Swash (following decoupling for shaping behaviour earlier this year); a significant offloading of maintenance effort.

Notable changes

  • Removal of first-party bidi embed level resolution logic.
  • select_font emoji detection improvements: flag emoji (e.g. "🇺🇸") and keycap sequences (e.g. 0️⃣ through 9️⃣) are now supported in cluster detection; Swash did not support these.
  • Slightly more up-to-date Unicode data than Swash (e.g. a few more Scripts).

Performance/binary size

  • Binary size for vello_editor is ~22kB larger (9642kB vs 9620kB).
    • A little unfortunate, given that Swash stores more Unicode character information while ICU4X lets us bake in only the tables we need.
  • Performance sees quite a significant regression, unfortunately (though I might have missed some performance wins that could mitigate this 🤞). The difference is roughly +55% latency for total layout runs: more for Latin, less for other languages, though this could just be fixed overhead impacting lower-latency tests proportionally more. To put this into perspective, this regresses back to roughly where Parley landed after shaping-artifact caching, and prior to charmap caching.
Default Style - arabic 20 characters               [   9.9 us ...  14.5 us ]     +46.34%*
Default Style - latin 20 characters                [   4.3 us ...   8.1 us ]     +86.25%*
Default Style - japanese 20 characters             [   8.3 us ...  13.6 us ]     +63.34%*
Default Style - arabic 1 paragraph                 [  56.3 us ...  75.0 us ]     +33.20%*
Default Style - latin 1 paragraph                  [  18.0 us ...  33.4 us ]     +85.40%*
Default Style - japanese 1 paragraph               [  72.3 us ... 113.9 us ]     +57.51%*
Default Style - arabic 4 paragraph                 [ 240.7 us ... 310.1 us ]     +28.81%*
Default Style - latin 4 paragraph                  [  79.4 us ... 137.7 us ]     +73.42%*
Default Style - japanese 4 paragraph               [ 101.7 us ... 160.1 us ]     +57.38%*
Styled - arabic 20 characters                      [  10.8 us ...  15.0 us ]     +38.50%*
Styled - latin 20 characters                       [   5.5 us ...   9.3 us ]     +69.94%*
Styled - japanese 20 characters                    [   9.0 us ...  14.0 us ]     +56.46%*
Styled - arabic 1 paragraph                        [  58.0 us ...  76.6 us ]     +32.19%*
Styled - latin 1 paragraph                         [  21.4 us ...  37.2 us ]     +73.48%*
Styled - japanese 1 paragraph                      [  79.3 us ... 121.4 us ]     +53.20%*
Styled - arabic 4 paragraph                        [ 257.0 us ... 326.8 us ]     +27.18%*
Styled - latin 4 paragraph                         [  83.3 us ... 142.2 us ]     +70.71%*
Styled - japanese 4 paragraph                      [ 111.8 us ... 172.1 us ]     +53.97%*

Other details

  • Swash's Language parsing is more tolerant, e.g. it permits extra, invalid subtags (like in "en-Latn-US-a-b-c-d").
  • Segmenters (line, word, grapheme) are currently content-aware and can be used without specifying a locale. However, if we plug locale data in at runtime, we can construct segmenters that target a specific locale rather than inferring it from content (the most correct approach when targeting that locale).
    • The full set of locale data (even with ICU4X's deduplication) is heavy, totalling ~2.5MB (in vello_editor compilation testing). To support correct word breaking across all languages without a huge compilation size increase, we would need a way for users to attach only the locale data they need at runtime. This locale data could be generated (with icu4x-datagen) and attached (using DataProviders) at runtime in the future; see the sketch after this list.
    • Without full locale support, line and word breaking use Unicode rule-based approaches UAX #14 and #29 respectively (at parity with Swash).
  • We could also support bring-your-own-data for Unicode character information too, for users only interested in narrow character sets (e.g. basic Latin), for a small compilation size improvement (not sure how much exactly).
  • Swash's support for alternating word-break strength is maintained by breaking text into windows (which look back/forward an extra character for context) and performing segmentation on each window separately, as ICU4X doesn't natively support variable word-break strength when segmenting.
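For illustration, here is a hedged sketch of what runtime data attachment could look like, assuming the icu_provider_blob crate and a blob produced by icu4x-datagen (the file name and locale are hypothetical; exact constructor names should be checked against the ICU4X 2.x docs):

    use icu_provider_blob::BlobDataProvider;

    // Blob generated offline, e.g.:
    //   icu4x-datagen --locales ja --format blob -o ja_data.postcard
    static JA_DATA: &[u8] = include_bytes!("ja_data.postcard");

    fn japanese_data_provider() -> BlobDataProvider {
        BlobDataProvider::try_new_from_static_blob(JA_DATA)
            .expect("datagen blob should be valid")
    }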

- condense all byte indexes to char indexes in a single loop
- track a minimal set of LineSegmenters (per LineBreakWordOption), and create as needed
- clean up tests
- add tests for multi-character graphemes
- fix incorrect start truncation for multi-style strings which aren't multi-wb style + test for this
- test naming/grouping
- compute `force_normalize`
- simplify ClusterInfo to just `is_emoji`
- more clean-up
Contributor

@nicoburns nicoburns left a comment

This isn't a full review (it doesn't really cover the analysis logic, which I'm probably not qualified to review), but I did a first pass on "rust-level things":

Some general notes:

  • This currently increases the binary size of a release build of the vello_editor example by 2-3 MB. That seems very unfortunate, but it also looks like there are a few easy wins to reduce this, and we should probably re-evaluate the impact once those are fixed.
  • I think depending on the top-level icu crate is wrong. We should be depending on sub-crates like icu_segmenter, icu_normalizer, etc directly.
  • We should also be disabling default features of those crates where appropriate. Currently I'm pretty sure we're compiling in the default data providers (pulled in by default features) AND our custom subset (generated by the build.rs)
  • We should check how much we're saving by using the custom baked data, because any application which depends on icu4x for other purposes may well end up pulling in the default data sets anyway, which would also lead to duplicated data. May or may not be worth the saving, and we may also wish to offer both options.
  • We should fully eliminate the Swash dependency (only a couple of trivial uses remaining).
  • Benchmarks are currently broken due to LayoutContext no longer being Sync. We should work out whether we want to go down the route of making LayoutContexts always thread-local or whether we want to find a way to make it Sync again (a sketch of the thread-local option follows).
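For the thread-local route, a minimal sketch (requires std; ColorBrush and the with_layout_cx helper are hypothetical stand-ins for whatever the real API exposes):

    use core::cell::RefCell;

    thread_local! {
        // One LayoutContext per thread; avoids needing Sync at all.
        static LAYOUT_CX: RefCell<LayoutContext<ColorBrush>> =
            RefCell::new(LayoutContext::new());
    }

    fn with_layout_cx<R>(f: impl FnOnce(&mut LayoutContext<ColorBrush>) -> R) -> R {
        LAYOUT_CX.with(|cx| f(&mut cx.borrow_mut()))
    }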

accesskit = { workspace = true, optional = true }
hashbrown = { workspace = true }
harfrust = { workspace = true }
icu = { version = "2.0.0"}
Contributor

Depending on the icu crate with default features enabled doesn't seem right to me. That's going to pull in a lot of stuff which we surely aren't using?
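Something like this shape, perhaps (a sketch; which default features pull in the baked data providers, e.g. compiled_data, should be verified per crate):

    icu_segmenter = { version = "2.0", default-features = false }
    icu_normalizer = { version = "2.0", default-features = false }
    icu_properties = { version = "2.0", default-features = false }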

}

#[rustfmt::skip]
const FONTIQUE_SCRIPT_TAGS: [[u8; 4]; 193] = [
Contributor

I believe this can be replaced with the short_name method on icu's Script type. But we should check that it's completely equivalent; there may be a couple of special cases.

Comment on lines 346 to +350
loop {
if !parser.next(&mut cluster) {
// End of current item - process final segment
break;
}
cluster = match clusters_iter.next() {
Some(c) => c,
None => break, // End of current item - process final segment
};
Contributor

This looks like it could become a while let loop
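Perhaps something like this (a sketch against the quoted snippet, with the loop body elided):

    while let Some(c) = clusters_iter.next() {
        cluster = c;
        // ... process `cluster` ...
    }
    // End of current item - process final segment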

Comment on lines +266 to +267
if has_emoji && has_zwj {
}
Contributor

This if was probably meant to be removed?

extern crate alloc;

pub use fontique;
pub use swash;
Contributor

This should go if we're not using swash anymore

Contributor

There seems to be one more use of swash for the swash::Setting type in parley/src/style/font.rs. I would suggest replacing this with a custom struct in Parley. This type does also exist in font-types, but I think we want to avoid putting font-types in the public API at least until it hits 1.0.
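A sketch of what that Parley-side struct could look like (swash::Setting pairs an OpenType tag with a value; the names here are illustrative, not a settled design):

    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct Setting<T> {
        /// Four-byte OpenType tag, e.g. *b"wght".
        pub tag: [u8; 4],
        /// The feature or variation value.
        pub value: T,
    }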

});
}

#[cfg(test)]
Contributor

Somewhat relieved to see how much of this file is tests! I was dreading reviewing 1000 lines. We may wish to consider moving these to separate files in the tests directory, although that's subjective and others (including you) may have other opinions (I personally find it harder to navigate code files with a mix of lots of tests and lots of code).


#[derive(Debug)]
pub(crate) struct CharCluster {
pub chars: Vec<Char>,
Contributor

We should back this with benchmark/profile data, but we should consider using either an ArrayVec with a capacity of MAX_CLUSTER_SIZE or a SmallVec with a capacity tuned to "big enough for most clusters" here.

Contributor

(Or the same array + len setup used for Form)

Contributor

Simply using a SmallVec of capacity 1 saw these results - seems like a great suggestion. I didn't play around too much with different sizes cc @conor-93

[benchmark results screenshot]

Contributor

Woah - there's even greater improvement if we change Form's chars type to be a smallvec of 1 capacity

[benchmark results screenshot]

Contributor

I believe we can have a capacity of 4 without using any more memory (although there are other similar crates that could get us a size reduction for smaller capacities).
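For reference, the shape being discussed (a sketch assuming the smallvec crate; the inline capacity of 4 is the guess from the comment above, to be tuned against the benchmarks):

    use smallvec::SmallVec;

    #[derive(Debug)]
    pub(crate) struct CharCluster {
        // Up to 4 chars stored inline before spilling to the heap.
        pub chars: SmallVec<[Char; 4]>,
    }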

@taj-p taj-p self-requested a review October 21, 2025 22:32
Contributor

@taj-p taj-p left a comment

I only managed to review cluster.rs today. I'll continue with more tomorrow.

As I review, I'm experimenting and learning with this branch (which has a lot of the changes suggested).

Here is how that branch compares to this branch currently. Huge props to @nicoburns for the SmallVec suggestion. I think using SmallVec for Form had a huge effect, with everything else being mostly marginal gains.

[benchmark comparison screenshot]

icu_locale = { workspace = true }
icu_normalizer = { workspace = true }
icu_properties = { workspace = true, features = ["unicode_bidi"] }
icu_provider = { workspace = true }
Contributor

Suggested change
icu_provider = { workspace = true }

Does this need to be a dependency (both here and by the workspace)?

Its only usages are `use icu_provider::prelude::icu_locale_core::LanguageIdentifier;`, which can be replaced with `use icu_locale::LanguageIdentifier;`

Contributor

icu_locale_core::LanguageIdentifier would be even better

Contributor

@taj-p taj-p Oct 21, 2025

Taj's ongoing performance investigation...

All profiles are from rendering a lot of Latin-only text.

Analyse Text

Appears 2.5x as slow as Swash

[profile screenshots: Swash vs ICU]

Shaping

The other standout in the perf profiles is iterating over the clusters in shape/mod.rs

pub struct BakedProvider;
impl_data_provider!(BakedProvider);

pub(crate) static PROVIDER: BakedProvider = BakedProvider;
Contributor

So is this the provider that we would need to pass from the consumer if we didn't want to bake the data?


pub(crate) fn analyze_text<B: Brush>(lcx: &mut LayoutContext<B>, text: &str) {
// See: https://github.com/unicode-org/icu4x/blob/ee5399a77a6b94efb5d4b60678bb458c5eedb25d/components/segmenter/src/line.rs#L338-L351
fn is_mandatory_line_break(line_break: LineBreak) -> bool {
Contributor

Nit: Could you please define this closer to where it's used?


continue;
}
all_boundaries_byte_indexed[wb] = Boundary::Word;
}
Contributor

@taj-p taj-p Oct 22, 2025

We may not need to pay the cost of the all_boundaries_byte_indexed allocation by making this loop an iterator and instead pushing line boundary positions to a vector. There's probably a way to make line boundary positions an iterator too, in order to avoid the vec we push to, but I think it's fine to leave that as a TODO.

diff --git a/parley/src/analysis/mod.rs b/parley/src/analysis/mod.rs
index f903b20..6ddbee2 100644
--- a/parley/src/analysis/mod.rs
+++ b/parley/src/analysis/mod.rs
@@ -273,16 +273,15 @@ pub(crate) fn analyze_text<B: Brush>(lcx: &mut LayoutContext<B>, text: &str) {
     
     let mut line_segmenters = core::mem::take(&mut lcx.analysis_data_sources.line_segmenters);
 
-    let mut all_boundaries_byte_indexed = vec![Boundary::None; text.len()];
-
-    // Word boundaries:
-    for wb in lcx.analysis_data_sources.word_segmenter().segment_str(text) {
+    // Collect boundary byte positions compactly
+    let mut wb_iter =  lcx.analysis_data_sources.word_segmenter().segment_str(text).filter_map(|wb| {
         // icu produces a word boundary trailing the string, which we don't use.
         if wb == text.len() {
-            continue;
+            None
+        } else {
+            Some(wb)
         }
-        all_boundaries_byte_indexed[wb] = Boundary::Word;
-    }
+    }).peekable();
 
     // Line boundaries (word break naming refers to the line boundary determination config).
     //
@@ -298,27 +297,40 @@ pub(crate) fn analyze_text<B: Brush>(lcx: &mut LayoutContext<B>, text: &str) {
         &first_style
     );
     let mut global_offset = 0;
+    let mut line_boundary_positions: Vec<usize> = Vec::new();
+    // LINE BOUNDARIES COLLECTION
     for (substring_index, (substring, word_break_strength, last)) in contiguous_word_break_substrings.enumerate() {
-        let line_boundaries: Vec<usize> = lcx.analysis_data_sources
-            .line_segmenter(word_break_strength)
-            .segment_str(substring)
-            .collect();
 
         // Fast path for text with a single word-break option.
         if substring_index == 0 && last {
-            // icu adds leading and trailing line boundaries, which we don't use.
-            let Some((_first, rest)) = line_boundaries.split_first() else {
+            let mut lb_iter = line_segmenters.get(word_break_strength).segment_str(substring);
+
+            let _first = lb_iter.next();
+            let second = lb_iter.next();
+
+            if second.is_none() {
                 continue;
-            };
-            let Some((_last, middle)) = rest.split_last() else {
+            }
+
+            let third = lb_iter.next();
+
+            if third.is_none() {
                 continue;
-            };
-            for &b in middle {
-                all_boundaries_byte_indexed[b] = Boundary::Line;
+            }
+
+            let iter = [second.unwrap(), third.unwrap()].into_iter().chain(lb_iter);
+            line_boundary_positions.extend(iter);
+            line_boundary_positions.pop();
-             }
             break;
         }
 
+        let line_boundaries_iter = line_segmenters.get(word_break_strength).segment_str(substring);
+
         let mut substring_chars = substring.chars();
         if substring_index != 0 {
             global_offset -= substring_chars.next().unwrap().len_utf8();
@@ -328,9 +340,9 @@ pub(crate) fn analyze_text<B: Brush>(lcx: &mut LayoutContext<B>, text: &str) {
         let last_len = substring_chars.next_back().unwrap().len_utf8();
 
         // Mark line boundaries (overriding word boundaries where present).
-        for (index, &pos) in line_boundaries.iter().enumerate() {
+        for (index, pos) in line_boundaries_iter.enumerate() {
             // icu adds leading and trailing line boundaries, which we don't use.
-            if index == 0 || index == line_boundaries.len() - 1 {
+            if index == 0 || pos == substring.len() {
                 continue;
             }
 
@@ -340,7 +352,7 @@ pub(crate) fn analyze_text<B: Brush>(lcx: &mut LayoutContext<B>, text: &str) {
             if !last && pos == substring.len() - last_len {
                 continue;
             }
-            all_boundaries_byte_indexed[pos + global_offset] = Boundary::Line;
+            line_boundary_positions.push(pos + global_offset);
         }
 
         if !last {
@@ -351,11 +363,35 @@ pub(crate) fn analyze_text<B: Brush>(lcx: &mut LayoutContext<B>, text: &str) {
     // BiDi embedding levels:
     let bidi_embedding_levels = unicode_bidi::BidiInfo::new_with_data_source(&lcx.analysis_data_sources.bidi_class(), text, None).levels;
 
+    // Merge boundaries - line takes precedence over word
+    let mut lb_iter = line_boundary_positions.iter().peekable();
     let boundaries_and_levels_iter = text.char_indices()
-        .map(|(byte_pos, _)| (
-            all_boundaries_byte_indexed.get(byte_pos).unwrap(),
-            bidi_embedding_levels.get(byte_pos).unwrap()
-        ));
+        .map(|(byte_pos, _)| {
+            // advance any stale word boundary positions
+            while let Some(&w) = wb_iter.peek() {
+                if w < byte_pos { _ = wb_iter.next(); } else { break; }
+            }
+            // advance any stale line boundary positions
+            while let Some(&l) = lb_iter.peek() {
+                if *l < byte_pos { _ = lb_iter.next(); } else { break; }
+            }
+
+            let mut boundary = Boundary::None;
+            if let Some(&w) = wb_iter.peek() {
+                if w == byte_pos {
+                    boundary = Boundary::Word;
+                    _ = wb_iter.next();
+                }
+            }
+            if let Some(&l) = lb_iter.peek() {
+                if *l == byte_pos {
+                    boundary = Boundary::Line;
+                    _ = lb_iter.next();
+                }
+            }
+
+            (boundary, bidi_embedding_levels.get(byte_pos).unwrap())
+        });
 
     fn unicode_data_iterator<'a, T: TrieValue>(
         text: &'a str,

let line_boundaries: Vec<usize> = lcx.analysis_data_sources
.line_segmenter(word_break_strength)
.segment_str(substring)
.collect();
Contributor

We can probably skip this allocation entirely using something like:

    for (substring_index, (substring, word_break_strength, last)) in contiguous_word_break_substrings.enumerate() {
        // Fast path for text with a single word-break option.
        if substring_index == 0 && last {
            let mut lb_iter = line_segmenters.get(word_break_strength).segment_str(substring);
            // CAUTION: bad names, draft code ahead

            let _first = lb_iter.next();
            let second = lb_iter.next();

            if second.is_none() {
                continue;
            }

            let third = lb_iter.next();

            if third.is_none() {
                continue;
            }

            let iter = [second.unwrap(), third.unwrap()].into_iter().chain(lb_iter);

            for b in iter {
                if b == substring.len() {
                    continue;
                }
                line_boundary_positions.push(b);
            }
            break;
        }

        let line_boundaries_iter = line_segmenters.get(word_break_strength).segment_str(substring);

        let mut substring_chars = substring.chars();
        if substring_index != 0 {
            global_offset -= substring_chars.next().unwrap().len_utf8();
        }
        // There will always be at least two characters if we are not taking the fast path for
        // a single word break style substring.
        let last_len = substring_chars.next_back().unwrap().len_utf8();

        // Mark line boundaries (overriding word boundaries where present).
        for (index, pos) in line_boundaries_iter.enumerate() {
            // icu adds leading and trailing line boundaries, which we don't use.
            if index == 0 || pos == substring.len() {
                continue;
            }

            // For all but the last substring, we ignore line boundaries caused by the last
            // character, as this character is carried back from the next substring, and will be
            // accounted for there.
            if !last && pos == substring.len() - last_len {
                continue;
            }
            line_boundary_positions.push(pos + global_offset);
        }

        if !last {
            global_offset += substring.len() - last_len;
        }
    }

let Some((first_style, rest)) = lcx.styles.split_first() else {
panic!("No style info");
};
let contiguous_word_break_substrings = WordBreakSegmentIter::new(
Contributor

If I understand WordBreakSegmentIter correctly, then we still iterate through the string even when there's no change in word-break style. Is there a way to avoid that by doing an iteration through styles first, before proceeding with a per-character iteration?

    if lcx.styles.iter().any(|s| s.style.word_break != LineBreakWordOption::Normal) {
       // Evaluate subranges
    } else {
        // Fast path
    }

Or simply update the iterator to first check whether it needs to yield subranges (by performing this check internally)

Contributor

One thing that may be worth looking at is the difference between internal and external iteration: it's possible that internal iteration via Iterator::fold (or Iterator::try_fold / Iterator::for_each) is faster than pulling items with Iterator::next().
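To illustrate the distinction (a standalone example, not Parley code):

    // External iteration: the caller pulls each item with next().
    fn count_external(mut iter: impl Iterator<Item = usize>) -> usize {
        let mut n = 0;
        while let Some(_pos) = iter.next() {
            n += 1;
        }
        n
    }

    // Internal iteration: the iterator drives the loop itself, which can
    // optimize better for some iterator implementations (e.g. chained ones).
    fn count_internal(iter: impl Iterator<Item = usize>) -> usize {
        iter.fold(0, |n, _pos| n + 1)
    }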

Comment on lines +41 to +44
/// Whether the character is a control character.
pub is_control_character: bool,
/// True if the character should be considered when mapping glyphs.
pub contributes_to_shaping: bool,
Contributor

Let's use bit flags for these, either via the bitflags crate or manually.
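For example, with the bitflags crate (a sketch; the flag names are derived from the quoted fields, and the full set would cover the struct's other bools):

    bitflags::bitflags! {
        #[derive(Clone, Copy, Debug, PartialEq, Eq)]
        pub struct CharInfoFlags: u8 {
            const IS_CONTROL_CHARACTER   = 1 << 0;
            const CONTRIBUTES_TO_SHAPING = 1 << 1;
        }
    }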

Contributor

I spent some time creating a custom provider in build.rs from existing ICU data sources that allows for one lookup per character (instead of one lookup per property per character) and saw very good performance improvements.

As per our conversation yesterday, I'll finish this review and prepare that work to be incorporated here.

[benchmark results screenshot]

self.decomp.state = FormState::Invalid;

// Create a string from the original characters to normalize
let mut orig_str = String::with_capacity(self.len as usize * 4);
Contributor

Instead of allocating these strings in composed and decomposed, could we pass a scratch_string (or similar) to this and composed so that we can reuse one allocation?

Something like 352225f?
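In the same spirit, a rough sketch of the pattern (hypothetical signature; the chars field and ch accessor are assumed from the quoted struct):

    fn composed(&mut self, scratch: &mut String) {
        scratch.clear(); // reuse the caller's allocation across clusters
        scratch.extend(self.chars.iter().map(|c| c.ch));
        // ... run NFC over `scratch` and write the result back ...
    }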

}

// BiDi embedding levels:
let bidi_embedding_levels = unicode_bidi::BidiInfo::new_with_data_source(&lcx.analysis_data_sources.bidi_class(), text, None).levels;
Contributor

Is it true that we might only need to run this if one of the characters is from a BidiClass that requires resolution? If so, when we have the composite property trie, we could add a fast path for this.
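A sketch of what that check might look like even without the composite trie (assuming icu_properties' BidiClass map; the exact set of classes that forces resolution needs verifying):

    use icu_properties::{props::BidiClass, CodePointMapData};

    fn may_need_bidi_resolution(text: &str) -> bool {
        let bidi = CodePointMapData::<BidiClass>::new();
        text.chars().any(|c| {
            let class = bidi.get(c);
            class == BidiClass::RightToLeft
                || class == BidiClass::ArabicLetter
                || class == BidiClass::ArabicNumber
                || class == BidiClass::RightToLeftEmbedding
                || class == BidiClass::RightToLeftOverride
                || class == BidiClass::RightToLeftIsolate
        })
    }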

}
}

fn is_emoji_grapheme(analysis_data_sources: &AnalysisDataSources, grapheme: &str) -> bool {
Contributor

This function shows up significantly in profiles, but I'm wondering whether we can address that with the composite property provider. I.e., we store whether some character is an emoji and fast path out of this with something like:

                  let mut is_emoji_or_pictograph = false;

                  let chars = segment_text.char_indices().zip(item_infos_iter.by_ref()).map(|((_, ch), (info, style_index))| {
                      // ...
                      is_emoji_or_pictograph |= info.is_emoji_or_pictograph;
                      // ...
                  });

                  let cluster = CharCluster::new(
                      // ...
                      is_emoji_or_pictograph
                          || (segment_text.len() > 1 && is_emoji_grapheme(analysis_data_sources, segment_text)),
                      // ...
                  );

where is_emoji_or_pictograph is a new field on item infos

let cluster = CharCluster::new(
chars,
is_emoji_grapheme(analysis_data_sources, segment_text),
len,
Contributor

Do we need len if it can be obtained from chars.len()?

}

// BiDi embedding levels:
let bidi_embedding_levels = unicode_bidi::BidiInfo::new_with_data_source(&lcx.analysis_data_sources.bidi_class(), text, None).levels;
Contributor

@taj-p taj-p Oct 23, 2025

This bidi impl doesn't enable passing in allocations (it allocates within its impl). I created an issue in the upstream repo asking whether it's possible to pass allocations in: servo/unicode-bidi#146
