-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calculate minimum and maximum content width #259
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is right, modulo potential trailing white space issues with RTL or mixed-direction text.
Firefox also appears to calculate min-content
this way, even with overflow-wrap: break-word
.
self.min_content_width = | ||
self.min_content_width.max(min_width - trailing_whitespace); | ||
min_width = 0.0; | ||
if boundary == Boundary::Mandatory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if greedy.rs
can be rewritten to use Boundary::Mandatory
rather than checking for newline white space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @dfrg can shine some light on this, but from my quick look at Swash, I assume that Boundary::Mandatory
corresponds the mandatory break in the Unicode standard here (see also Table 1), which does seem to be the correct choice for line breaking as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if greedy.rs can be rewritten to use Boundary::Mandatory rather than checking for newline white space.
It used to! This was changed in the recent refactor of that code to fix selection/cursors. I can't remember why it was changed, but I believe it was a matter of convenience as part of a wider refactor and that it ought to be possible to change it back to using Boundary::Mandatory
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swash follows the Unicode LBA and attaches Boundary::Mandatory
to the cluster after the newline sequence which means we never encounter that state when the source text ends with a newline. This is arguably a swash bug but I “fixed” it in parley by just checking for the newline white space flag and that actually simplified a lot of code.
parley/src/layout/data.rs
Outdated
trailing_whitespace = if cluster.info.whitespace().is_space_or_nbsp() { | ||
cluster.advance | ||
} else { | ||
0.0 | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this also work for RTL runs, or should those take the whitespace of the first cluster? Or perhaps the endianness should be determined based on the layout's bidi base level direction...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a TODO note here as a breadcrumb. Parley needs some more tests with RTL and mixed-direction text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, I was looking at your RTL PR and I still forgot to account for it in my own. To support RTL I just need to iterate the clusters in reverse, it is otherwise correct.
I don't have a good mental model for mixed directions yet, so I have no idea what that should look like in practice, but I'll add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a premature optimisation, but could we just store the index of prev_cluster
, and then compute this lazily when a line break is encoutered?
Either way, we need to handle the effect of inline boxes on trailing whitespace (they should reset it to 0). If we use the prev_cluster
approach then either we also need to record prev_item_kind
or prev_cluster
should be an Option
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a premature optimisation, but could we just store the index of prev_cluster, and then compute this lazily when a line break is encoutered?
We can, and hoisting conditionals out of loops always seems sensible if possible. I'd assume this one gets compiled to a conditional move, but it's probably better not to rely on that.
Either way, we need to handle the effect of inline boxes on trailing whitespace (they should reset it to 0). If we use the
prev_cluster
approach then either we also need to recordprev_item_kind
orprev_cluster
should be anOption
.
Very true, I'll also add a test for this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks great, but I have requested some changes which I think are necessary for correctness (but please do push back if you don't agree!)
I have also requested one (lazily computing content sizes) for performance. It's possible that size computation is fast enough in comparison to shaping and building the layout that it doesn't make sense to do it lazily, and I also don't feel like we would need to block this PR on that change.
parley/src/layout/data.rs
Outdated
self.apply_spacing(); | ||
self.calculate_content_widths(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should be computed lazily (and fields should be Option<T>
). It's possible that there may be multiple edits before the sizes need to be known.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I sort of assumed (and never actually checked) that the build_into
function on the builders would take a self
instead of a &mut self
, and that this function would only ever be called once, but that is not the case.
I prefer to do something with interior mutability (OnceCell
or LazyCell
) to cache the values, because that allows for *_content_width()
to work with a &Layout
instead of &mut Layout
.
self.min_content_width = | ||
self.min_content_width.max(min_width - trailing_whitespace); | ||
min_width = 0.0; | ||
if boundary == Boundary::Mandatory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if greedy.rs can be rewritten to use Boundary::Mandatory rather than checking for newline white space.
It used to! This was changed in the recent refactor of that code to fix selection/cursors. I can't remember why it was changed, but I believe it was a matter of convenience as part of a wider refactor and that it ought to be possible to change it back to using Boundary::Mandatory
.
parley/src/layout/data.rs
Outdated
trailing_whitespace = if cluster.info.whitespace().is_space_or_nbsp() { | ||
cluster.advance | ||
} else { | ||
0.0 | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a premature optimisation, but could we just store the index of prev_cluster
, and then compute this lazily when a line break is encoutered?
Either way, we need to handle the effect of inline boxes on trailing whitespace (they should reset it to 0). If we use the prev_cluster
approach then either we also need to record prev_item_kind
or prev_cluster
should be an Option
.
It should be fairly straightforward, I need to update a couple of things regardless and this PR is not really blocking any other progress, so I'm fine with just addressing it now. |
e14c7ad
to
5311fc5
Compare
Trailing white space and lazy init are fixed. I'll wait for #250 to land before rebasing and fixing RTL. |
1fb104a
to
76b16cb
Compare
76b16cb
to
8939749
Compare
2d210e9
to
3c876c1
Compare
Apparently, the shaper splits RTL glyphs into separate runs. I didn't dive too deep into it, but it means that the trailing whitespace is always the only cluster in a separate run. I cannot write a failing test for RTL trailing whitespace scenario. I've added code that I think is correct, but might not be when this ever changes. I also added a comment warning that the method doesn't handle mixed-direction text at all. I did uncover another issue while writing the RTL test. There are some floating point inaccuracies between how the max content width and the line advances are calculated, so in some cases, the max content width is very slightly smaller than the advance of a line as the line breaker calculates it. I've fixed this by always rounding up the content width values, so that they are equal to or strictly greater than whatever the line breaker calculates. Finally, I've updated the changelog as well. |
3c876c1
to
3650ee6
Compare
parley/src/layout/data.rs
Outdated
ContentWidths { | ||
min: min_width.ceil(), | ||
max: max_width.ceil(), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this works in all cases. For example, if the line breaker calculates a layout width of 406.00003
, but max_width
is calculated as 405.99997
, the calculated widths differ by only two units in the last place for an f32
. The ceil
here would bump max_width
to 406.0
, but the assumption layout.width() <= max_width
would not hold. (As a separate issue, for some funky inputs with very small scales, the bump to ceil
could be quite a large magnitude.)
I think the source of the differences in rounding behavior is from subtle differences like the greedy line breaker accumulating ligature cluster advances separately, before adding them to the total. Other than that, we can probably expect the basic arithmetic to always be the same for all line breakers: i.e., a sum of inline box and cluster advances, but with potentially different ordering and accumulating.
Perhaps a way to get this to work is to just fuzz it for the content width calculation: force a round up for every single basic float operation, thereby always overestimating all other possible sums, regardless of their order of operations and accumulation. For example, the following using next_up may work, though I haven't verified it. That method is nightly-only at the moment, but we could copy it here.
running_max_width = (running_max_width + run.advance).next_up()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If at all possible, I think we ought to try to fix this by applying floating point operations consistently such that there is never a discrepancy, rather than trying to fix it either rounding afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case that's probably possible. For other line breaking algorithms, it could be somewhat tricky.
Another option to consider is to accumulate layout width in f64
for both the content widths and line breaking – all other parts, including cluster advances, remain f32
. Cast the content widths to f32
at the end. For all but the most extreme cases, the algorithms should now agree exactly within f32
precision: there's 9 orders of magnitude of tolerance. A smudge factor (one next_up()
step in f32
) can be added to the line breaking max_advance
, to ensure we don't accidentally line wrap when max_advance == max_content_width
. Maybe apply next_up()
(in f64
) for every operation in the content width calculations for correctness, but practically speaking that's probably unnecessary.
I'd expect the performance hit to be very small, and there's no real memory hit (it's just the accumulation variables that become double precision).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, the problem occurs when summing up advances per cluster or per run (example). It's an easy fix for now, so I can follow Nico's suggestion and just make sure that the operations are exactly the same, which allows this PR to be merged. I'd like to move discussion around this to #273 for the more general case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I haven't closely verified whether the floating point operations are now equivalent, but I think you have.
@wfdewith Is anything blocking this now? A reminder that Linebender has an "author merges" policy for PRs unless the PR author requests otherwise or isn't a Linebender member (note: if you're just rebasing an approved PR then feel free to rebase and then merge without re-review if you feel confident to do so). |
Nothing is blocking it, I just wasn't aware of this policy (which means I didn't read the contributor guidelines 🫢). |
No worries. The policy is a little unusual I think. The advantage is that it allows the PR author to add any last minute fixes (e.g. in response to non-blocking review feedback) if they want to. The disadvantage is that it takes a bit longer to get things merged. |
Tiny mistake that I made in #259.
This adds
min_content_width
andmax_content_width
functions to the layout. These functions provide an lower and upper bound on the layout width. The content widths are trailing white space aware, which means that they correspond tolayout.width
and notlayout.full_width
.Follow-up work includes using the
max_content_width
to ensure themax_advance
is always capped. This will become useful for floated boxes (#99), because those require Parley to align individual lines based on theirmax_advance
, which in turn requires a reasonable upper bound onmax_advance
.