Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate minimum and maximum content width #259

Merged
merged 9 commits into from
Feb 17, 2025

Conversation

wfdewith
Copy link
Contributor

This adds min_content_width and max_content_width functions to the layout. These functions provide an lower and upper bound on the layout width. The content widths are trailing white space aware, which means that they correspond to layout.width and not layout.full_width.

Follow-up work includes using the max_content_width to ensure the max_advance is always capped. This will become useful for floated boxes (#99), because those require Parley to align individual lines based on their max_advance, which in turn requires a reasonable upper bound on max_advance.

Copy link
Member

@tomcur tomcur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is right, modulo potential trailing white space issues with RTL or mixed-direction text.

Firefox also appears to calculate min-content this way, even with overflow-wrap: break-word.

self.min_content_width =
self.min_content_width.max(min_width - trailing_whitespace);
min_width = 0.0;
if boundary == Boundary::Mandatory {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if greedy.rs can be rewritten to use Boundary::Mandatory rather than checking for newline white space.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @dfrg can shine some light on this, but from my quick look at Swash, I assume that Boundary::Mandatory corresponds the mandatory break in the Unicode standard here (see also Table 1), which does seem to be the correct choice for line breaking as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if greedy.rs can be rewritten to use Boundary::Mandatory rather than checking for newline white space.

It used to! This was changed in the recent refactor of that code to fix selection/cursors. I can't remember why it was changed, but I believe it was a matter of convenience as part of a wider refactor and that it ought to be possible to change it back to using Boundary::Mandatory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swash follows the Unicode LBA and attaches Boundary::Mandatory to the cluster after the newline sequence which means we never encounter that state when the source text ends with a newline. This is arguably a swash bug but I “fixed” it in parley by just checking for the newline white space flag and that actually simplified a lot of code.

Comment on lines 504 to 508
trailing_whitespace = if cluster.info.whitespace().is_space_or_nbsp() {
cluster.advance
} else {
0.0
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also work for RTL runs, or should those take the whitespace of the first cluster? Or perhaps the endianness should be determined based on the layout's bidi base level direction...

Copy link
Member

@tomcur tomcur Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a TODO note here as a breadcrumb. Parley needs some more tests with RTL and mixed-direction text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I was looking at your RTL PR and I still forgot to account for it in my own. To support RTL I just need to iterate the clusters in reverse, it is otherwise correct.

I don't have a good mental model for mixed directions yet, so I have no idea what that should look like in practice, but I'll add a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a premature optimisation, but could we just store the index of prev_cluster, and then compute this lazily when a line break is encoutered?

Either way, we need to handle the effect of inline boxes on trailing whitespace (they should reset it to 0). If we use the prev_cluster approach then either we also need to record prev_item_kind or prev_cluster should be an Option.

Copy link
Contributor Author

@wfdewith wfdewith Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a premature optimisation, but could we just store the index of prev_cluster, and then compute this lazily when a line break is encoutered?

We can, and hoisting conditionals out of loops always seems sensible if possible. I'd assume this one gets compiled to a conditional move, but it's probably better not to rely on that.

Either way, we need to handle the effect of inline boxes on trailing whitespace (they should reset it to 0). If we use the prev_cluster approach then either we also need to record prev_item_kind or prev_cluster should be an Option.

Very true, I'll also add a test for this scenario.

Copy link
Contributor

@nicoburns nicoburns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks great, but I have requested some changes which I think are necessary for correctness (but please do push back if you don't agree!)

I have also requested one (lazily computing content sizes) for performance. It's possible that size computation is fast enough in comparison to shaping and building the layout that it doesn't make sense to do it lazily, and I also don't feel like we would need to block this PR on that change.

Comment on lines 453 to 454
self.apply_spacing();
self.calculate_content_widths();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be computed lazily (and fields should be Option<T>). It's possible that there may be multiple edits before the sizes need to be known.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I sort of assumed (and never actually checked) that the build_into function on the builders would take a self instead of a &mut self, and that this function would only ever be called once, but that is not the case.

I prefer to do something with interior mutability (OnceCell or LazyCell) to cache the values, because that allows for *_content_width() to work with a &Layout instead of &mut Layout.

self.min_content_width =
self.min_content_width.max(min_width - trailing_whitespace);
min_width = 0.0;
if boundary == Boundary::Mandatory {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if greedy.rs can be rewritten to use Boundary::Mandatory rather than checking for newline white space.

It used to! This was changed in the recent refactor of that code to fix selection/cursors. I can't remember why it was changed, but I believe it was a matter of convenience as part of a wider refactor and that it ought to be possible to change it back to using Boundary::Mandatory.

Comment on lines 504 to 508
trailing_whitespace = if cluster.info.whitespace().is_space_or_nbsp() {
cluster.advance
} else {
0.0
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a premature optimisation, but could we just store the index of prev_cluster, and then compute this lazily when a line break is encoutered?

Either way, we need to handle the effect of inline boxes on trailing whitespace (they should reset it to 0). If we use the prev_cluster approach then either we also need to record prev_item_kind or prev_cluster should be an Option.

@wfdewith
Copy link
Contributor Author

I have also requested one (lazily computing content sizes) for performance. It's possible that size computation is fast enough in comparison to shaping and building the layout that it doesn't make sense to do it lazily, and I also don't feel like we would need to block this PR on that change.

It should be fairly straightforward, I need to update a couple of things regardless and this PR is not really blocking any other progress, so I'm fine with just addressing it now.

@wfdewith wfdewith force-pushed the content-widths branch 4 times, most recently from e14c7ad to 5311fc5 Compare January 30, 2025 20:33
@wfdewith
Copy link
Contributor Author

Trailing white space and lazy init are fixed. I'll wait for #250 to land before rebasing and fixing RTL.

@wfdewith
Copy link
Contributor Author

wfdewith commented Feb 7, 2025

Apparently, the shaper splits RTL glyphs into separate runs. I didn't dive too deep into it, but it means that the trailing whitespace is always the only cluster in a separate run. I cannot write a failing test for RTL trailing whitespace scenario. I've added code that I think is correct, but might not be when this ever changes. I also added a comment warning that the method doesn't handle mixed-direction text at all.

I did uncover another issue while writing the RTL test. There are some floating point inaccuracies between how the max content width and the line advances are calculated, so in some cases, the max content width is very slightly smaller than the advance of a line as the line breaker calculates it. I've fixed this by always rounding up the content width values, so that they are equal to or strictly greater than whatever the line breaker calculates.

Finally, I've updated the changelog as well.

Comment on lines 543 to 546
ContentWidths {
min: min_width.ceil(),
max: max_width.ceil(),
}
Copy link
Member

@tomcur tomcur Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this works in all cases. For example, if the line breaker calculates a layout width of 406.00003, but max_width is calculated as 405.99997, the calculated widths differ by only two units in the last place for an f32. The ceil here would bump max_width to 406.0, but the assumption layout.width() <= max_width would not hold. (As a separate issue, for some funky inputs with very small scales, the bump to ceil could be quite a large magnitude.)

I think the source of the differences in rounding behavior is from subtle differences like the greedy line breaker accumulating ligature cluster advances separately, before adding them to the total. Other than that, we can probably expect the basic arithmetic to always be the same for all line breakers: i.e., a sum of inline box and cluster advances, but with potentially different ordering and accumulating.

Perhaps a way to get this to work is to just fuzz it for the content width calculation: force a round up for every single basic float operation, thereby always overestimating all other possible sums, regardless of their order of operations and accumulation. For example, the following using next_up may work, though I haven't verified it. That method is nightly-only at the moment, but we could copy it here.

running_max_width = (running_max_width + run.advance).next_up()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If at all possible, I think we ought to try to fix this by applying floating point operations consistently such that there is never a discrepancy, rather than trying to fix it either rounding afterwards.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case that's probably possible. For other line breaking algorithms, it could be somewhat tricky.

Another option to consider is to accumulate layout width in f64 for both the content widths and line breaking ­– all other parts, including cluster advances, remain f32. Cast the content widths to f32 at the end. For all but the most extreme cases, the algorithms should now agree exactly within f32 precision: there's 9 orders of magnitude of tolerance. A smudge factor (one next_up() step in f32) can be added to the line breaking max_advance, to ensure we don't accidentally line wrap when max_advance == max_content_width. Maybe apply next_up() (in f64) for every operation in the content width calculations for correctness, but practically speaking that's probably unnecessary.

I'd expect the performance hit to be very small, and there's no real memory hit (it's just the accumulation variables that become double precision).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the problem occurs when summing up advances per cluster or per run (example). It's an easy fix for now, so I can follow Nico's suggestion and just make sure that the operations are exactly the same, which allows this PR to be merged. I'd like to move discussion around this to #273 for the more general case.

@wfdewith wfdewith requested review from nicoburns and tomcur February 13, 2025 19:06
Copy link
Member

@tomcur tomcur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I haven't closely verified whether the floating point operations are now equivalent, but I think you have.

@nicoburns
Copy link
Contributor

@wfdewith Is anything blocking this now? A reminder that Linebender has an "author merges" policy for PRs unless the PR author requests otherwise or isn't a Linebender member (note: if you're just rebasing an approved PR then feel free to rebase and then merge without re-review if you feel confident to do so).

@nicoburns nicoburns added this to the 0.3 Release milestone Feb 17, 2025
@wfdewith
Copy link
Contributor Author

@wfdewith Is anything blocking this now? A reminder that Linebender has an "author merges" policy for PRs unless the PR author requests otherwise or isn't a Linebender member (note: if you're just rebasing an approved PR then feel free to rebase and then merge without re-review if you feel confident to do so).

Nothing is blocking it, I just wasn't aware of this policy (which means I didn't read the contributor guidelines 🫢).

@wfdewith wfdewith added this pull request to the merge queue Feb 17, 2025
Merged via the queue into linebender:main with commit 43ba11d Feb 17, 2025
21 checks passed
@wfdewith wfdewith deleted the content-widths branch February 17, 2025 21:29
@nicoburns
Copy link
Contributor

No worries. The policy is a little unusual I think. The advantage is that it allows the PR author to add any last minute fixes (e.g. in response to non-blocking review feedback) if they want to. The disadvantage is that it takes a bit longer to get things merged.

github-merge-queue bot pushed a commit that referenced this pull request Feb 18, 2025
Tiny mistake that I made in #259.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants