-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
content: Handle multiple math blocks in <p>
#1156
Conversation
My test message renders like this: I think this is a result of parsing each |
lib/model/content.dart
Outdated
// it seems like a glitch in the server's Markdown processing, | ||
// so hopefully there just aren't any further such glitches. | ||
if (child case dom.Element(localName: 'br')) continue; | ||
if (child case dom.Text(text: '\n' || '\n\n')) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this no longer checks if the <br>\n
is at the end. Would it be unexpected to have them appear in the middle?
\n\n
on the other hand, appear to be somehow separate from the quirk commented here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separated the handling of both these cases.
'<span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal">b</span></span></span></span></span></p>', [ | ||
MathBlockNode(texSource: 'a'), | ||
MathBlockNode(texSource: 'b'), | ||
]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also have a test with multiple math blocks in a quote.
Thanks for the PR! Apparently I've missed this one as it has been sitting for some time. Overall this works fine for me. Left some comments. |
9d222da
to
f4dde28
Compare
Thanks for the review @PIG208. Pushed a new revision, PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Left some small comments.
test/model/content_test.dart
Outdated
@@ -522,6 +539,26 @@ class ContentExample { | |||
'<br>\n</p>\n</blockquote>', | |||
[QuotationNode([MathBlockNode(texSource: r'\lambda')])]); | |||
|
|||
static const mathBlocksMultipleInQuote = ContentExample( | |||
'math blocks, multiple in quote', | |||
"````quote\n```math\na\n\nb```````", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a \n
between ``` and ````
"````quote\n```math\na\n\nb```\n````"
// each node is interleaved by '\n\n'. Whitespaces are ignored in HTML | ||
// on web but each node has `display: block`, which renders each node | ||
// on a newline. So do the same here. | ||
if (child case dom.Text(text: '\n\n')) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see how the first sentence is relevant. Regarding "do the same here", continue
itself does not seem to address the part about rendering each node with display: block
, as that is more relevant to blocks.add(MathBlockNode(texSource: texSource, debugHtmlNode: debugHtmlNode));
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more info about how it tries to replicate web behavior.
f4dde28
to
5f0bfa1
Compare
Looks good to me! Thanks for the update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rajveermalviya for taking care of this, and thanks @PIG208 for the previous reviews!
Generally this looks good. Comments below, including one fun edge case in the actual content syntax.
lib/model/content.dart
Outdated
result.addAll(parseMathBlocks(node)); | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The state this loop maintains is a bit more complex than this — there's result
but also imageNodes
, and when adding to result
we need to flush imageNodes
first, like is done at the bottom of the loop body.
For an example message where that makes a difference:
https://chat.zulip.org/#narrow/channel/7-test-here/topic/Greg/near/2035891
https://upload.wikimedia.org/wikipedia/commons/7/78/Verregende_bloem_van_een_Helenium_%27El_Dorado%27._22-07-2023._%28d.j.b%29.jpg
```math
a
```
https://upload.wikimedia.org/wikipedia/commons/thumb/7/71/Zaadpluizen_van_een_Clematis_texensis_%27Princess_Diana%27._18-07-2023_%28actm.%29_02.jpg/1280px-Zaadpluizen_van_een_Clematis_texensis_%27Princess_Diana%27._18-07-2023_%28actm.%29_02.jpg
It should be image, then math block, then image, but this version makes it math block, then a row with both images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For another comparison, see the _isPossibleInlineNode
case in the function above this, which has a loop with a very similar structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking of which, let's make the same changes in that other method parseImplicitParagraphBlockContentList
as in this method parseBlockContentList
, so that they remain as similar as possible and differ only in the ways they need to differ for their different jobs.
Probably there should be a refactoring in this code's future so that these two methods have less duplication. But that refactoring will be easier if we minimize the extent they've diverged from each other by then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think it's currently not possible to exercise the case where this logic would matter in parseImplicitParagraphBlockContentList
, because it seems to be currently not possible to get a math block inside a list item in Zulip. But CommonMark allows it — see this example — and in principle it's a useful capability which probably should be allowed, so perhaps in the future it will be.)
lib/model/content.dart
Outdated
blocks.add(UnimplementedBlockContentNode(htmlNode: child)); | ||
continue; | ||
} | ||
blocks.add(MathBlockNode(texSource: texSource, debugHtmlNode: debugHtmlNode)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This debugHtmlNode should be child
(or null, depending), rather than element
, because child
is what this particular MathBlockNode is parsed from.
lib/model/content.dart
Outdated
@@ -1451,6 +1451,61 @@ class _ZulipContentParser { | |||
return tableNode ?? UnimplementedBlockContentNode(htmlNode: tableElement); | |||
} | |||
|
|||
List<BlockContentNode> parseMathBlocks(dom.Element element) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
List<BlockContentNode> parseMathBlocks(dom.Element element) { | |
List<BlockContentNode> parseMathBlocks(dom.NodeList nodes) { |
After my preceding comment about debugHtmlNode, this function doesn't actually make any use of element
except via element.nodes
, so this simplifies things a bit.
lib/model/content.dart
Outdated
})()); | ||
final debugHtmlNode = kDebugMode ? element : null; | ||
|
||
final blocks = <BlockContentNode>[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My performance-paranoia for this parsing code doesn't like the idea of allocating this list, just to immediately empty it into the surrounding list, particularly in the common case where there's only going to be a single math block in the <p>
element. (And here "common case" means "everything but the weird edge case".)
But I think avoiding that would require a larger refactor, so let's go ahead with this way. It's probably fine (that is, it's probably not a big performance impact in practice). And at least it only affects messages with math blocks in them, which isn't a super common feature in the first place.
lib/model/content.dart
Outdated
if (child case dom.Text(text: '\n\n')) continue; | ||
|
||
if (child is! dom.Element) { | ||
blocks.add(UnimplementedBlockContentNode(htmlNode: child)); | ||
continue; | ||
} | ||
if (child.className != 'katex-display') { | ||
blocks.add(UnimplementedBlockContentNode(htmlNode: child)); | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My performance-paranoia about this code also doesn't like having all these checks for the first child, where we already know the answers to them; particularly since that'll almost always be the only child.
But for this one, there is a clean solution, so let's do that. Before the loop, operate on just the first child; it'll need an as dom.Element
, but none of the rest of these checks. And it'll duplicate the last few lines of the loop body, but that's OK.
Then the loop runs from 1 to < length
, so will normally have no iterations at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the suggested optimization for the first child.
127395d
to
d072321
Compare
Thanks for the review @gnprice! Pushed an update, PTAL. |
d072321
to
84ff50f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the revision! This logic generally all looks good; small comments below.
Would you also split the changes into two commits? I think that would help with thinking through exactly what cases are affected. Specifically:
- A first commit is nearly NFC, and moves the logic from
parseBlockContent
up to its callers. I think this can be NFC apart from some unimplemented-content cases. (It wouldn't change which possible messages contain unimplemented content, but might change the details of what the unimplemented nodes look like.) - Then the second commit makes the desired substantive change here, supporting multiple blocks in one
p
element.
lib/model/content.dart
Outdated
if (child case dom.Element(localName: 'span', className: 'katex-display')) { | ||
final texSource = _parseMath(child, block: true); | ||
if (texSource != null) { | ||
result.add(MathBlockNode(texSource: texSource, debugHtmlNode: child)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, #1156 (comment) was maybe ambiguous: this should be child
when in debug mode (rather than the parent node as it was in the previous revision), but debugHtmlNode
should always be null when not in debug mode.
That way we avoid holding onto these DOM nodes after we're done parsing the content — I think they could be a significant amount of memory.
// The case with the `<br>\n` can happen when at the end of a quote; | ||
// it seems like a glitch in the server's Markdown processing, | ||
// so hopefully there just aren't any further such glitches. | ||
return parseMathBlock(child); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this the only call site to parseMathBlock
? That can be removed, then.
(Doing so in the same commit as the other changes is helpful for understanding those other changes, even, because it makes visible in the same diff the old code that the new code corresponds to.)
lib/model/content.dart
Outdated
// on web but each node has `display: block`, which renders each node | ||
// on a newline. Since the emitted MathBlockNode are BlockContentNode, | ||
// we skip these newlines here to replicate the same behavior as on web. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
// on web but each node has `display: block`, which renders each node | |
// on a newline. Since the emitted MathBlockNode are BlockContentNode, | |
// we skip these newlines here to replicate the same behavior as on web. | |
// on web but each node has `display: block`, which renders each node | |
// on a new line. Since the emitted MathBlockNode are BlockContentNode, | |
// we skip these newlines here to replicate the same behavior as on web. |
A "newline" is a particular character (or in some contexts a particular pair of characters); a "new line" is just a line that's new (for any of the meanings that "line" or "new" could have, depending on context). So "newline" is right in the last sentence here but not the one before it.
lib/model/content.dart
Outdated
if (node case | ||
dom.Element(localName: 'p', nodes: [ | ||
dom.Element(localName: 'span', className: 'katex-display'), ...])) { | ||
if (currentParagraph.isNotEmpty) consumeParagraph(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: condition should be indented a level deeper than body (so it doesn't look like the start of the body)
lib/model/content.dart
Outdated
if (node case | ||
dom.Element(localName: 'p', nodes: [ | ||
dom.Element(localName: 'span', className: 'katex-display'), ...])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's check className
on the p
element, too, like the existing logic did. In general our parser keeps to specific expectations of how className
should look, as well as localName
, in order for some HTML to be recognized as a pattern it knows about.
So combined with the nit about indentation:
if (node case | |
dom.Element(localName: 'p', nodes: [ | |
dom.Element(localName: 'span', className: 'katex-display'), ...])) { | |
if (node case dom.Element(localName: 'p', className: '', nodes: [ | |
dom.Element(localName: 'span', className: 'katex-display'), ...])) { |
lib/model/content.dart
Outdated
if (nodes | ||
case [..., dom.Element(localName: 'br'), dom.Text(text: '\n')] | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this fits on one line, right?
664ab54
to
b6036c9
Compare
Thanks for the review @gnprice! Pushed an update, PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the revision! This all looks great; just a couple of small comments below about the commit structure.
lib/model/content.dart
Outdated
} else { | ||
if (currentParagraph.isNotEmpty) consumeParagraph(); | ||
if (imageNodes.isNotEmpty) consumeImageNodes(); | ||
result.add(UnimplementedBlockContentNode(htmlNode: node)); | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this first commit is not quite NFC because the behavior changes in this case. In the old code, I believe this would end up with a ParagraphNode containing an UnimplementedInlineContentNode (and maybe other nodes), instead of with this one UnimplementedBlockContentNode.
That's the non-NFC-ness I was expecting at #1156 (review) . The change is fine; just the commit message should be adjusted: remove the NFC tag, and have the body explain that it's NFC except for this particular behavior change.
(I did work through the logic just now and satisfy myself that there's no other behavior change in this commit. So this splitting of commits was helpful, thanks.)
lib/model/content.dart
Outdated
if (currentParagraph.isNotEmpty) consumeParagraph(); | ||
if (imageNodes.isNotEmpty) consumeImageNodes(); | ||
result.add(parseMathBlock(child)); | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: How about moving the two "if not empty, consume" lines above this if/else, and the continue
below it?
That would deduplicate them between the if
and the else
cases, showing that those parts don't vary. And it would also reduce the diff in the second commit.
b6036c9
to
1d737e4
Compare
Thanks for the review @gnprice! Pushed an update, PTAL. |
Thanks! Looks good; merging. |
Prepares for zulip#1130, this commit is almost NFC with only difference that, in an error case we previously emitted a `ParagraphNode` containing an `UnimplementedInlineContentNode` (along with any adjacent nodes), now we emit a single `UnimplementedBlockContentNode` instead.
1d737e4
to
866faf5
Compare
Fixes: #1130