Suppress hyphenations (splits) on multiple contiguous lines? #5
Comments
The last version of |
bramstein/typeset#27 mentions that a demerit value of 3000 was used in typesetting Seminumerical Algorithms, while the value used in typeset defaults to 100. Perhaps this has something to do with this reported problem, and could be helped by allowing Text::KnuthPlass to set this parameter to a higher value.
My first pass through @bramstein / typeset hasn't turned up any explicit code to find and handle contiguous hyphenated lines, so the only thing it may be doing is discouraging hyphenation overall via a small penalty (and a lack of adjacent hyphenated lines is just a beneficial byproduct). This case of three in a row may just be bad luck.
@PhilterPaper That's correct, there's no explicit handling of consecutive hyphenated lines. It is all handled by the penalty system. It is thus entirely possible to get several consecutive hyphenated lines if that is the optimal justification.
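For readers following along, here is a minimal sketch of how a penalty system can discourage back-to-back hyphens without any explicit run detection. It assumes the per-line demerits formula as given in Knuth and Plass's paper, where an extra amount is added when a line's break and the previous line's break are both flagged (hyphenated). The function name, parameter names, and the default `line_penalty=1` are hypothetical and are not the API of typeset or Text::KnuthPlass; the default of 100 for the flagged demerit is the figure quoted earlier in this thread.

```python
import math


def line_demerits(badness, penalty, flagged, prev_flagged,
                  line_penalty=1, flagged_demerit=100):
    """Demerits for one line, following the formula in Knuth and Plass's paper.

    badness      -- how far the line's glue was stretched or shrunk
    penalty      -- penalty value at the breakpoint ending this line
    flagged      -- True if this line ends at a flagged (hyphen) break
    prev_flagged -- True if the previous line also ended at a flagged break
    All parameter names are illustrative assumptions.
    """
    base = line_penalty + badness
    if penalty >= 0:
        demerits = (base + penalty) ** 2
    elif penalty != -math.inf:
        demerits = base ** 2 - penalty ** 2
    else:
        # Forced break: the penalty itself contributes nothing.
        demerits = base ** 2
    if flagged and prev_flagged:
        # Two hyphenated lines in a row cost extra.
        demerits += flagged_demerit
    return demerits


# Same line, with and without a hyphenated predecessor.
alone = line_demerits(10, 50, flagged=True, prev_flagged=False)
after_hyphen = line_demerits(10, 50, flagged=True, prev_flagged=True)
stricter = line_demerits(10, 50, True, True, flagged_demerit=3000)
```

With these inputs the extra term is small relative to the squared base term when the flagged demerit is 100, and much more significant at 3000, which is consistent with the suggestion that a higher value could reduce runs of hyphenated lines.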
Thanks for the quick response. My thinking is that once a line ending in a hyphenated word has been laid down and "frozen", it would not be too difficult, when the next line also ends with a hyphenated word, to go back and check the previous line for a hyphenation. If there is one, add an additional penalty to this line's hyphenation. On the other hand, if the previous line isn't frozen until it's too late to rearrange the paragraph, that might not do any good. I see that the final output (at least for the Perl version) has a hyphenation penalty after every word fragment's box, so I'm not sure where in the code KP ends up counting just the line-end hyphens for penalties. Back for another dive into the code at some point, I guess. I also need to check whether a naturally hyphenated word split at that hyphen (e.g., "absent-minded") gets the hyphenation penalty.
The KP algorithm tries to minimize the penalties over the entire paragraph, so what is chosen is most likely the optimal choice. That means that if you try to avoid consecutive hyphenated lines, something else has to give (for example, the inter-word spacing). So it's a trade-off: you can avoid the hyphenation, but something else will get (slightly) worse. As for naturally hyphenated words, it depends on the code that generates the sequence of boxes, glue, and penalties. If I remember correctly (and it has been a while), I allow a line break after the hyphen but do not add a penalty for naturally hyphenated words. If you haven't already, you can implement the Unicode line breaking algorithm to find the line breaking properties of the input tokens (for example, see: https://github.com/bramstein/unicode-tokenizer).
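To make the distinction between discretionary and natural hyphens concrete, here is a rough sketch of a node generator that emits a zero-cost break after an explicit hyphen and a flagged, penalized break at each soft hyphen. The `Box` and `Penalty` classes, the `word_nodes` function, and the use of U+00AD to mark hyphenation points are all assumptions made for illustration; this is not the code of typeset or Text::KnuthPlass.

```python
from dataclasses import dataclass

SOFT_HYPHEN = "\u00ad"  # assumed marker for a discretionary hyphenation point


@dataclass
class Box:
    width: float
    value: str


@dataclass
class Penalty:
    width: float    # width of the hyphen glyph added if the break is taken
    penalty: float  # cost of breaking here
    flagged: bool   # True if this break counts as a hyphenated line end


def word_nodes(word, hyphen_width=1.0, hyphen_penalty=100,
               explicit_flagged=False, measure=len):
    """Turn one word into boxes and penalties (hypothetical helper)."""
    nodes = []
    fragment = ""
    for ch in word:
        if ch == SOFT_HYPHEN:
            # Discretionary break: a hyphen is inserted, so charge for it.
            if fragment:
                nodes.append(Box(measure(fragment), fragment))
            nodes.append(Penalty(hyphen_width, hyphen_penalty, True))
            fragment = ""
        elif ch == "-":
            # Natural hyphen: it stays in the box, and the break is free.
            fragment += ch
            nodes.append(Box(measure(fragment), fragment))
            nodes.append(Penalty(0, 0, explicit_flagged))
            fragment = ""
        else:
            fragment += ch
    if fragment:
        nodes.append(Box(measure(fragment), fragment))
    return nodes


natural = word_nodes("absent-minded")
discretionary = word_nodes("hy" + SOFT_HYPHEN + "phen")
```

Whether the free break should still be flagged is a design choice; `explicit_flagged=True` would let a natural hyphen count toward a run of hyphenated lines without carrying a penalty of its own.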
I am aware that KP is trying to globally optimize (minimize) penalties over an entire paragraph, so "fixing" contiguous hyphenated lines may force something else to "give". I still think that multiple contiguous hyphenated lines are really a glaring fault (that really catches the eye), and it would be good to get rid of them, even at the cost of slightly worse fitting elsewhere. For N lines in a row hyphenated, perhaps line 1 gets the standard penalty, line 2 gets 1.5 penalties, line 3 gets 3 penalties, etc. I have briefly looked at the Unicode TR 14 (I think that's the one) on allowed line-breaking points; I will take a look at your code, too. Thanks for the suggestion!
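The escalating scheme can be sketched as a small function that scales the hyphenation penalty by position within a run. The multipliers 1, 1.5, and 3 are the ones proposed above; how the sequence continues past the third line is not specified, so reusing the last multiplier is purely an assumption here, as are the function and parameter names.

```python
def run_penalties(ends_hyphenated, base_penalty=100,
                  multipliers=(1.0, 1.5, 3.0)):
    """Hyphenation penalty per line, scaled by position in a run.

    ends_hyphenated -- one boolean per line: does it end in a split word?
    multipliers     -- factor for the 1st, 2nd, 3rd... line of a run;
                       the last factor is reused for longer runs (assumption).
    """
    penalties = []
    run = 0
    for hyphenated in ends_hyphenated:
        if hyphenated:
            run += 1
            factor = multipliers[min(run, len(multipliers)) - 1]
            penalties.append(base_penalty * factor)
        else:
            run = 0  # an unhyphenated line ends the run
            penalties.append(0.0)
    return penalties


# Three hyphenated lines, a clean line, then one more hyphenated line.
example = run_penalties([True, True, True, False, True])
```

This only scores a finished sequence of lines. Inside the KP dynamic program no line is final until the last backtrack, so the run length would presumably need to be tracked per candidate break rather than read off a finished previous line.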
Anyone with thoughts on how to best update the KP algorithm to discourage multiple contiguous hyphenated lines is welcome to chime in. It should be configurable, of course.

I don't recall seeing anything in Text::KnuthPlass to allow "naturally hyphenated" words (I presume that means a compound word with explicit hyphen(s) already in it) to be split at the hyphen without penalty, although it should probably count as the first of a run (without its own penalty) if following line(s) want to be hyphenated. When I get back to this later this year, I'll see about handling it in that manner (after checking whether typeset actually does it).

Thinking ahead to adding KP to PDF::Builder to do proper paragraph shaping, I may have to extend the code to handle the array of text chunks, each with its own font and size, rather than just one long string. In addition, line lengths may vary unpredictably due to column shapes and where a line falls due to font and image vertical extents (i.e., you can't give a fixed list of line lengths in advance). It promises to be a major task!

I'm also mulling over brewing up my own hyphenation package, improving upon Text::Hyphen et al. in being able to switch among multiple human languages on the fly (among other things), and handling letter changes/repeats needed by some languages. Alex Holkner's thesis (https://citeseerx.ist.psu.edu/pdf/ee95750a9dd047b52901efda59819864bb9ede4a) page 11 has an interesting data structure for such things.
When adding a third sample text to examples/KP.pl, I saw three consecutive lines with hyphenated (split) words, including the penultimate line. It is my understanding that the K-P algorithm is supposed to avoid such runs of hyphenation (as well as hyphenating the next-to-last line), so I'm going to consider this a bug.