Calculate minimum and maximum content width #259

wfdewith · 2025-01-28T18:31:57Z

This adds min_content_width and max_content_width functions to the layout. These functions provide a lower and an upper bound on the layout width. The content widths are trailing white space aware, which means that they correspond to layout.width and not layout.full_width.
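To make the two bounds concrete, here is a minimal sketch of how they could be derived from a flat list of clusters. The `Cluster` struct, the local `Boundary` enum (loosely modeled on Swash's) and the `content_widths` function are hypothetical simplifications for illustration, not Parley's actual types, and only a single trailing white space cluster is discounted.

```rust
/// Break opportunity *before* a cluster (hypothetical, loosely modeled on Swash's `Boundary`).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Boundary {
    None,
    Line,
    Mandatory,
}

/// Hypothetical, simplified cluster; not Parley's real data structure.
#[derive(Clone, Copy, Debug)]
struct Cluster {
    advance: f32,
    is_whitespace: bool,
    boundary: Boundary,
}

/// Returns `(min_content_width, max_content_width)`, both excluding trailing white space.
fn content_widths(clusters: &[Cluster]) -> (f32, f32) {
    let (mut min_content, mut max_content) = (0.0_f32, 0.0_f32);
    // Width of the current unbreakable segment and of the current unwrapped line.
    let (mut min_width, mut max_width) = (0.0_f32, 0.0_f32);
    let mut trailing_whitespace = 0.0_f32;

    for cluster in clusters {
        if cluster.boundary != Boundary::None {
            // Any break opportunity ends the current unbreakable segment.
            min_content = min_content.max(min_width - trailing_whitespace);
            min_width = 0.0;
            if cluster.boundary == Boundary::Mandatory {
                // Only a mandatory break ends a line when nothing is wrapped.
                max_content = max_content.max(max_width - trailing_whitespace);
                max_width = 0.0;
            }
        }
        trailing_whitespace = if cluster.is_whitespace { cluster.advance } else { 0.0 };
        min_width += cluster.advance;
        max_width += cluster.advance;
    }
    // Flush the final segment and the final line.
    min_content = min_content.max(min_width - trailing_whitespace);
    max_content = max_content.max(max_width - trailing_whitespace);
    (min_content, max_content)
}
```

For the text `aa bbb`, a mandatory break, and then `c`, with 10-unit glyphs and a 5-unit space, this gives a minimum of 30 (the width of `bbb`) and a maximum of 55 (the width of `aa bbb`).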

Follow-up work includes using the max_content_width to ensure the max_advance is always capped. This will become useful for floated boxes (#99), because those require Parley to align individual lines based on their max_advance, which in turn requires a reasonable upper bound on max_advance.

tomcur

I think this is right, modulo potential trailing white space issues with RTL or mixed-direction text.

Firefox also appears to calculate min-content this way, even with overflow-wrap: break-word.

tomcur · 2025-01-29T10:00:33Z

parley/src/layout/data.rs

+                            self.min_content_width =
+                                self.min_content_width.max(min_width - trailing_whitespace);
+                            min_width = 0.0;
+                            if boundary == Boundary::Mandatory {


I wonder if greedy.rs can be rewritten to use Boundary::Mandatory rather than checking for newline white space.

Maybe @dfrg can shed some light on this, but from my quick look at Swash, I assume that Boundary::Mandatory corresponds to the mandatory break in the Unicode standard here (see also Table 1), which does seem to be the correct choice for line breaking as well.

I wonder if greedy.rs can be rewritten to use Boundary::Mandatory rather than checking for newline white space.

It used to! This was changed in the recent refactor of that code to fix selection/cursors. I can't remember why it was changed, but I believe it was a matter of convenience as part of a wider refactor and that it ought to be possible to change it back to using Boundary::Mandatory.

Swash follows the Unicode LBA and attaches Boundary::Mandatory to the cluster after the newline sequence which means we never encounter that state when the source text ends with a newline. This is arguably a swash bug but I “fixed” it in parley by just checking for the newline white space flag and that actually simplified a lot of code.
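The difference between the two detection strategies can be sketched with a toy model. Everything below is hypothetical: one cluster per `char`, only `\n` treated as a newline, and a `Mandatory` boundary attached to the cluster that follows a newline, as described in the comment above. It is not Swash's or Parley's code.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Boundary {
    None,
    Mandatory,
}

/// Hypothetical cluster carrying both signals discussed above.
struct Cluster {
    is_newline: bool,
    boundary: Boundary,
}

/// Toy segmentation: the boundary lands on the cluster *after* the newline.
fn clusters_for(text: &str) -> Vec<Cluster> {
    let mut prev_was_newline = false;
    text.chars()
        .map(|c| {
            let boundary = if prev_was_newline {
                Boundary::Mandatory
            } else {
                Boundary::None
            };
            prev_was_newline = c == '\n';
            Cluster { is_newline: c == '\n', boundary }
        })
        .collect()
}

/// Counts hard breaks by looking at the boundary flag.
fn breaks_by_boundary(clusters: &[Cluster]) -> usize {
    clusters.iter().filter(|c| c.boundary == Boundary::Mandatory).count()
}

/// Counts hard breaks by looking at the newline white space flag.
fn breaks_by_newline_flag(clusters: &[Cluster]) -> usize {
    clusters.iter().filter(|c| c.is_newline).count()
}
```

In this model, `"a\nb\n"` yields one break by boundary but two by newline flag, because no cluster follows the final newline to carry the boundary.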

tomcur · 2025-01-29T10:23:10Z

parley/src/layout/data.rs

+                        trailing_whitespace = if cluster.info.whitespace().is_space_or_nbsp() {
+                            cluster.advance
+                        } else {
+                            0.0
+                        };


Does this also work for RTL runs, or should those take the whitespace of the first cluster? Or perhaps the endianness should be determined based on the layout's bidi base level direction...

Maybe add a TODO note here as a breadcrumb. Parley needs some more tests with RTL and mixed-direction text.

Wow, I was looking at your RTL PR and I still forgot to account for it in my own. To support RTL I just need to iterate the clusters in reverse, it is otherwise correct.

I don't have a good mental model for mixed directions yet, so I have no idea what that should look like in practice, but I'll add a comment.

Perhaps a premature optimisation, but could we just store the index of prev_cluster, and then compute this lazily when a line break is encountered?

Either way, we need to handle the effect of inline boxes on trailing whitespace (they should reset it to 0). If we use the prev_cluster approach then either we also need to record prev_item_kind or prev_cluster should be an Option.

Perhaps a premature optimisation, but could we just store the index of prev_cluster, and then compute this lazily when a line break is encountered?

We can, and hoisting conditionals out of loops always seems sensible if possible. I'd assume this one gets compiled to a conditional move, but it's probably better not to rely on that.

Either way, we need to handle the effect of inline boxes on trailing whitespace (they should reset it to 0). If we use the prev_cluster approach then either we also need to record prev_item_kind or prev_cluster should be an Option.

Very true, I'll also add a test for this scenario.
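A rough sketch of the `prev_cluster: Option<usize>` idea follows. The `Item` enum and both functions are hypothetical, and the sketch assumes that an inline box simply extends the current unbreakable segment; Parley's real handling of inline boxes may differ.

```rust
/// Hypothetical layout item; Parley's real item types differ.
enum Item {
    Cluster { advance: f32, is_whitespace: bool, break_before: bool },
    InlineBox { width: f32 },
}

/// Trailing white space is looked up lazily, only when a segment ends.
fn trailing_whitespace(items: &[Item], prev_cluster: Option<usize>) -> f32 {
    match prev_cluster.map(|index| &items[index]) {
        Some(Item::Cluster { advance, is_whitespace: true, .. }) => *advance,
        // No previous cluster (start of text or an inline box) or not white space.
        _ => 0.0,
    }
}

fn min_content_width(items: &[Item]) -> f32 {
    let mut min_content = 0.0_f32;
    let mut segment = 0.0_f32;
    let mut prev_cluster: Option<usize> = None;

    for (index, item) in items.iter().enumerate() {
        match item {
            Item::Cluster { advance, break_before, .. } => {
                if *break_before {
                    let trailing = trailing_whitespace(items, prev_cluster);
                    min_content = min_content.max(segment - trailing);
                    segment = 0.0;
                }
                segment += *advance;
                prev_cluster = Some(index);
            }
            Item::InlineBox { width } => {
                // Assumption: the box extends the current segment.
                segment += *width;
                // White space before a box is no longer trailing.
                prev_cluster = None;
            }
        }
    }
    // Flush the final segment.
    min_content.max(segment - trailing_whitespace(items, prev_cluster))
}
```

Using `None` for "the previous item was an inline box" avoids tracking a separate `prev_item_kind`.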

nicoburns

This generally looks great, but I have requested some changes which I think are necessary for correctness (but please do push back if you don't agree!)

I have also requested one (lazily computing content sizes) for performance. It's possible that size computation is fast enough in comparison to shaping and building the layout that it doesn't make sense to do it lazily, and I also don't feel like we would need to block this PR on that change.

nicoburns · 2025-01-29T21:14:04Z

parley/src/layout/data.rs

+        self.apply_spacing();
+        self.calculate_content_widths();


I think these should be computed lazily (and fields should be Option<T>). It's possible that there may be multiple edits before the sizes need to be known.

Interesting, I sort of assumed (and never actually checked) that the build_into function on the builders would take a self instead of a &mut self, and that this function would only ever be called once, but that is not the case.

I prefer to do something with interior mutability (OnceCell or LazyCell) to cache the values, because that allows for *_content_width() to work with a &Layout instead of &mut Layout.
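As an illustration of the interior mutability approach, here is a minimal sketch using `std::cell::OnceCell`. The `Layout` struct, its fields, the `rebuild` method and the placeholder width calculation are all hypothetical stand-ins and not Parley's API; the `computations` counter exists only to show that the work happens once.

```rust
use std::cell::{Cell, OnceCell};

#[derive(Clone, Copy, Debug, PartialEq)]
struct ContentWidths {
    min: f32,
    max: f32,
}

/// Hypothetical stand-in for a layout; each advance is one unbreakable word.
struct Layout {
    advances: Vec<f32>,
    content_widths: OnceCell<ContentWidths>,
    /// Demo-only counter of how often the widths were actually computed.
    computations: Cell<u32>,
}

impl Layout {
    fn new(advances: Vec<f32>) -> Self {
        Layout { advances, content_widths: OnceCell::new(), computations: Cell::new(0) }
    }

    /// Rebuilding the layout invalidates the cache but computes nothing.
    fn rebuild(&mut self, advances: Vec<f32>) {
        self.advances = advances;
        self.content_widths = OnceCell::new();
    }

    /// Works with `&self`: the cell is filled on first use.
    fn content_widths(&self) -> ContentWidths {
        *self.content_widths.get_or_init(|| {
            self.computations.set(self.computations.get() + 1);
            // Placeholder calculation, not the real algorithm.
            ContentWidths {
                min: self.advances.iter().fold(0.0_f32, |acc, &a| acc.max(a)),
                max: self.advances.iter().sum(),
            }
        })
    }

    fn min_content_width(&self) -> f32 {
        self.content_widths().min
    }

    fn max_content_width(&self) -> f32 {
        self.content_widths().max
    }
}
```

Several rebuilds in a row cost nothing extra here, since the widths are only computed when one of the getters is first called after a rebuild.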

nicoburns · 2025-01-29T21:16:05Z

parley/src/layout/data.rs

+                            self.min_content_width =
+                                self.min_content_width.max(min_width - trailing_whitespace);
+                            min_width = 0.0;
+                            if boundary == Boundary::Mandatory {


I wonder if greedy.rs can be rewritten to use Boundary::Mandatory rather than checking for newline white space.

It used to! This was changed in the recent refactor of that code to fix selection/cursors. I can't remember why it was changed, but I believe it was a matter of convenience as part of a wider refactor and that it ought to be possible to change it back to using Boundary::Mandatory.

nicoburns · 2025-01-29T21:20:14Z

parley/src/layout/data.rs

+                        trailing_whitespace = if cluster.info.whitespace().is_space_or_nbsp() {
+                            cluster.advance
+                        } else {
+                            0.0
+                        };


Perhaps a premature optimisation, but could we just store the index of prev_cluster, and then compute this lazily when a line break is encountered?

Either way, we need to handle the effect of inline boxes on trailing whitespace (they should reset it to 0). If we use the prev_cluster approach then either we also need to record prev_item_kind or prev_cluster should be an Option.

wfdewith · 2025-01-30T12:53:20Z

I have also requested one (lazily computing content sizes) for performance. It's possible that size computation is fast enough in comparison to shaping and building the layout that it doesn't make sense to do it lazily, and I also don't feel like we would need to block this PR on that change.

It should be fairly straightforward, I need to update a couple of things regardless and this PR is not really blocking any other progress, so I'm fine with just addressing it now.

wfdewith · 2025-01-30T20:35:05Z

Trailing white space and lazy init are fixed. I'll wait for #250 to land before rebasing and fixing RTL.

wfdewith · 2025-02-07T16:59:03Z

Apparently, the shaper splits RTL glyphs into separate runs. I didn't dive too deep into it, but it means that the trailing whitespace is always the only cluster in a separate run. As a result, I cannot write a failing test for the RTL trailing whitespace scenario. I've added code that I think is correct, but it might not be if this behavior ever changes. I also added a comment warning that the method doesn't handle mixed-direction text at all.

I did uncover another issue while writing the RTL test. There are some floating point inaccuracies between how the max content width and the line advances are calculated, so in some cases, the max content width is very slightly smaller than the advance of a line as the line breaker calculates it. I've fixed this by always rounding up the content width values, so that they are equal to or strictly greater than whatever the line breaker calculates.

Finally, I've updated the changelog as well.

tomcur · 2025-02-10T09:06:48Z

parley/src/layout/data.rs

+        ContentWidths {
+            min: min_width.ceil(),
+            max: max_width.ceil(),
+        }


I'm not sure this works in all cases. For example, if the line breaker calculates a layout width of 406.00003, but max_width is calculated as 405.99997, the calculated widths differ by only two units in the last place for an f32. The ceil here would bump max_width to 406.0, but the assumption layout.width() <= max_width would not hold. (As a separate issue, for some funky inputs with very small scales, the bump to ceil could be quite a large magnitude.)
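The counterexample can be checked directly. The two helper functions below are hypothetical names introduced only for this sketch; the values are the ones from the comment above.

```rust
/// Checks the invariant that rounding up was meant to guarantee.
fn ceil_bound_holds(layout_width: f32, max_width: f32) -> bool {
    layout_width <= max_width.ceil()
}

/// Distance between two positive finite `f32` values in units in the last place.
fn ulps_apart(a: f32, b: f32) -> u32 {
    a.to_bits().abs_diff(b.to_bits())
}

/// The values from the review comment: two ULPs apart, on either side of 406.0.
fn counterexample() -> (f32, f32) {
    let layout_width = 406.00003_f32; // as computed by the line breaker
    let max_width = 405.99997_f32; // as computed by the content width pass
    (layout_width, max_width)
}
```

With these inputs, `max_width.ceil()` is exactly 406.0, which is still below the layout width, so `ceil_bound_holds` returns `false`.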

I think the source of the differences in rounding behavior is from subtle differences like the greedy line breaker accumulating ligature cluster advances separately, before adding them to the total. Other than that, we can probably expect the basic arithmetic to always be the same for all line breakers: i.e., a sum of inline box and cluster advances, but with potentially different ordering and accumulating.

Perhaps a way to get this to work is to just fuzz it for the content width calculation: force a round up for every single basic float operation, thereby always overestimating all other possible sums, regardless of their order of operations and accumulation. For example, the following using next_up may work, though I haven't verified it. That method is nightly-only at the moment, but we could copy it here.

running_max_width = (running_max_width + run.advance).next_up()
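A minimal sketch of what copying the method could look like is given below. The free function `next_up` is a local stand-in for the standard library method, and `padded_sum` and `plain_sum` are hypothetical helpers. The sketch only shows the mechanics; as in the comment above, whether this bounds every possible ordering of the sum has not been verified.

```rust
/// Next representable `f32` above `x`: a local stand-in for `f32::next_up`.
fn next_up(x: f32) -> f32 {
    if x.is_nan() || x == f32::INFINITY {
        return x;
    }
    if x == 0.0 {
        // Covers both +0.0 and -0.0: step to the smallest positive subnormal.
        return f32::from_bits(1);
    }
    let bits = x.to_bits();
    if x > 0.0 {
        // Positive values: the next bit pattern is the next larger float.
        f32::from_bits(bits + 1)
    } else {
        // Negative values: a smaller magnitude is a larger value.
        f32::from_bits(bits - 1)
    }
}

/// Sums advances, nudging every partial sum up by one ULP.
fn padded_sum(advances: &[f32]) -> f32 {
    advances.iter().fold(0.0_f32, |acc, &advance| next_up(acc + advance))
}

/// Ordinary left-to-right `f32` accumulation, for comparison.
fn plain_sum(advances: &[f32]) -> f32 {
    advances.iter().fold(0.0_f32, |acc, &advance| acc + advance)
}
```

For non-negative advances, each `next_up` step lands strictly above the exact value of that addition, so `padded_sum` is always greater than the exact mathematical sum.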

If at all possible, I think we ought to try to fix this by applying floating point operations consistently such that there is never a discrepancy, rather than trying to fix it by rounding afterwards.

In this case that's probably possible. For other line breaking algorithms, it could be somewhat tricky.

Another option to consider is to accumulate layout width in f64 for both the content widths and line breaking – all other parts, including cluster advances, remain f32. Cast the content widths to f32 at the end. For all but the most extreme cases, the algorithms should now agree exactly within f32 precision: there's 9 orders of magnitude of tolerance. A smudge factor (one next_up() step in f32) can be added to the line breaking max_advance, to ensure we don't accidentally line wrap when max_advance == max_content_width. Maybe apply next_up() (in f64) for every operation in the content width calculations for correctness, but practically speaking that's probably unnecessary.
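The f64 accumulation idea can be sketched as follows. The function names are hypothetical, and the sketch covers only the accumulation, not the `next_up()` smudge factor on max_advance.

```rust
/// Accumulates `f32` advances in `f64` and casts back to `f32` at the end.
fn sum_in_f64(advances: &[f32]) -> f32 {
    let total: f64 = advances.iter().map(|&advance| f64::from(advance)).sum();
    total as f32
}

/// The same accumulation, but visiting the advances in reverse order.
fn sum_in_f64_reversed(advances: &[f32]) -> f32 {
    let total: f64 = advances.iter().rev().map(|&advance| f64::from(advance)).sum();
    total as f32
}
```

For advances of ordinary magnitude, every partial sum is exactly representable in f64, so both orders produce the same f32 result after the final cast.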

I'd expect the performance hit to be very small, and there's no real memory hit (it's just the accumulation variables that become double precision).

In this case, the problem occurs when summing up advances per cluster or per run (example). It's an easy fix for now, so I can follow Nico's suggestion and just make sure that the operations are exactly the same, which allows this PR to be merged. I'd like to move discussion around this to #273 for the more general case.
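To show why the grouping matters, the following sketch sums the same advances per cluster and per run. The function names are hypothetical and the magnitudes are deliberately exaggerated so that the rounding is easy to follow; real advances differ by far less.

```rust
/// Adds every cluster advance directly to the running total.
fn sum_per_cluster(runs: &[Vec<f32>]) -> f32 {
    let mut total = 0.0_f32;
    for run in runs {
        for &advance in run {
            total += advance;
        }
    }
    total
}

/// Sums each run first, then adds the run advance to the running total.
fn sum_per_run(runs: &[Vec<f32>]) -> f32 {
    let mut total = 0.0_f32;
    for run in runs {
        let run_advance: f32 = run.iter().sum();
        total += run_advance;
    }
    total
}
```

With one run of `[16777216.0]` followed by a run of `[1.0, 1.0]`, the per-cluster sum stays at 16777216.0 because each added 1.0 is rounded away, while the per-run sum adds 2.0 at once and reaches 16777218.0.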

tomcur

Looks good! I haven't closely verified whether the floating point operations are now equivalent, but I think you have.

nicoburns · 2025-02-17T20:54:05Z

@wfdewith Is anything blocking this now? A reminder that Linebender has an "author merges" policy for PRs unless the PR author requests otherwise or isn't a Linebender member (note: if you're just rebasing an approved PR then feel free to rebase and then merge without re-review if you feel confident to do so).

wfdewith · 2025-02-17T21:26:01Z

@wfdewith Is anything blocking this now? A reminder that Linebender has an "author merges" policy for PRs unless the PR author requests otherwise or isn't a Linebender member (note: if you're just rebasing an approved PR then feel free to rebase and then merge without re-review if you feel confident to do so).

Nothing is blocking it, I just wasn't aware of this policy (which means I didn't read the contributor guidelines 🫢).

nicoburns · 2025-02-17T21:32:33Z

No worries. The policy is a little unusual I think. The advantage is that it allows the PR author to add any last minute fixes (e.g. in response to non-blocking review feedback) if they want to. The disadvantage is that it takes a bit longer to get things merged.

tomcur approved these changes Jan 29, 2025

nicoburns requested changes Jan 29, 2025

wfdewith force-pushed the content-widths branch 4 times, most recently from e14c7ad to 5311fc5, January 30, 2025 20:33

nicoburns reviewed Jan 31, 2025

nicoburns reviewed Jan 31, 2025

wfdewith force-pushed the content-widths branch from 1fb104a to 76b16cb, January 31, 2025 08:44

tomcur mentioned this pull request Feb 6, 2025

Prepare Parley and Fontique 0.3.0 #266

Merged

wfdewith force-pushed the content-widths branch from 76b16cb to 8939749, February 7, 2025 15:17

tomcur reviewed Feb 7, 2025

wfdewith force-pushed the content-widths branch from 2d210e9 to 3c876c1, February 7, 2025 16:59

wfdewith added 8 commits February 7, 2025 18:01

Calculate minimum and maximum content width

14a5168

Fix trailing white space with inline boxes

1fc33c3

Lazily initialize content widths

ef77c4d

Create named struct for content widths

3109048

Round content widths up

5c43008

Add RTL test

9306242

Handle RTL text

972f041

Update CHANGELOG

3650ee6

wfdewith force-pushed the content-widths branch from 3c876c1 to 3650ee6, February 7, 2025 17:01

tomcur requested changes Feb 10, 2025

wfdewith mentioned this pull request Feb 13, 2025

Account for floating point inaccuracies #273

Open

Sum max content width per cluster

acf421f

wfdewith requested review from nicoburns and tomcur February 13, 2025 19:06

tomcur approved these changes Feb 15, 2025

nicoburns approved these changes Feb 16, 2025

nicoburns added this to the 0.3 Release milestone Feb 17, 2025

wfdewith added this pull request to the merge queue Feb 17, 2025

Merged via the queue into linebender:main with commit 43ba11d Feb 17, 2025
21 checks passed

wfdewith deleted the content-widths branch February 17, 2025 21:29

wfdewith mentioned this pull request Feb 18, 2025

Fix content width changelog entry #281

Merged

github-merge-queue bot pushed a commit that referenced this pull request Feb 18, 2025

Fix content width changelog entry (#281)

4937e56

Tiny mistake that I made in #259.

wfdewith mentioned this pull request Feb 18, 2025

Make container_width a required alignment parameter #283

Open
