Improve handling of self-closing syntax #254

Merged
merged 2 commits into from
Dec 30, 2024
10 changes: 9 additions & 1 deletion c-api/include/lol_html.h
@@ -523,7 +523,15 @@ int lol_html_element_tag_name_set(
size_t name_len
);

// Whether the element is explicitly self-closing, e.g. `<foo />`.
// Whether the tag syntactically ends with `/>`. In HTML content this is purely decorative and unnecessary, and has no effect of any kind.
//
// The `/>` syntax only affects parsing of elements in foreign content (SVG and MathML).
// It will never close any HTML tags that aren't already defined as void in HTML.
//
// This function only reports the parsed syntax, and will not report which elements are actually void in HTML.
// Use `lol_html_element_can_have_content` to check if the element is non-void.
//
// If the `/` is part of an unquoted attribute, it's not parsed as the self-closing syntax.
bool lol_html_element_is_self_closing(
lol_html_element_t *element
);
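
Reviewer note: the rule described in the comment above can be sketched as a small decision function. This is only an illustration under stated assumptions, not lol_html's implementation: `SketchNamespace`, `VOID_ELEMENTS`, and `sketch_can_have_content` are hypothetical names, and the void-element list is assumed to match the HTML Standard.

```rust
/// Hypothetical namespace marker for this sketch; it is not a lol_html type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SketchNamespace {
    /// Ordinary HTML content.
    Html,
    /// Foreign content (SVG or MathML).
    Foreign,
}

/// Void elements, assumed here to match the list in the HTML Standard.
pub const VOID_ELEMENTS: [&str; 13] = [
    "area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source",
    "track", "wbr",
];

/// Sketch of the documented rule:
/// - in HTML content only the tag name matters, the trailing `/` is ignored;
/// - in foreign content only the trailing `/` matters.
pub fn sketch_can_have_content(
    ns: SketchNamespace,
    tag_name: &str,
    ends_with_slash: bool,
) -> bool {
    match ns {
        SketchNamespace::Html => !VOID_ELEMENTS
            .iter()
            .any(|void| void.eq_ignore_ascii_case(tag_name)),
        SketchNamespace::Foreign => !ends_with_slash,
    }
}
```

Under these assumptions, `<div/>` in HTML content can still have content, `<br>` cannot regardless of the slash, and `<path/>` inside `<svg>` cannot, which lines up with the `c="y"`/`c="n"` values expected by the `rewrite_incorrect_self_closing` test in this PR.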
30 changes: 26 additions & 4 deletions src/rewritable_units/element.rs
@@ -121,6 +121,10 @@ impl<'r, 't, H: HandlerTypes> Element<'r, 't, H> {
}

/// Sets the tag name of the element.
///
/// The new tag name must be in the same namespace, have the same content model, and be valid in its location.
/// Otherwise, changing the tag name may cause the resulting document to be parsed in an unexpected way,
/// out of sync with this library.
#[inline]
pub fn set_tag_name(&mut self, name: &str) -> Result<(), TagNameError> {
let name = self.tag_name_bytes_from_str(name)?;
@@ -134,16 +138,31 @@ impl<'r, 't, H: HandlerTypes> Element<'r, 't, H> {
Ok(())
}

/// Whether the element is explicitly self-closing, e.g. `<foo />`.
/// Whether the tag syntactically ends with `/>`. In HTML content this is purely decorative and unnecessary, and has no effect of any kind.
///
/// The `/>` syntax only affects parsing of elements in foreign content (SVG and MathML).
/// It will never close any HTML tags that aren't already defined as [void][spec] in HTML.
///
/// This function only reports the parsed syntax, and will not report which elements are actually void in HTML.
/// Use [`can_have_content()`][Self::can_have_content] to check if the element is non-void.
///
/// [spec]: https://html.spec.whatwg.org/multipage/syntax.html#start-tags
///
/// If the `/` is part of an unquoted attribute, it's not parsed as the self-closing syntax.
#[inline]
#[must_use]
pub fn is_self_closing(&self) -> bool {
self.start_tag.self_closing()
}

/// Whether the element can have inner content. Returns `true` unless the element is an [HTML void
/// element](https://html.spec.whatwg.org/multipage/syntax.html#void-elements) or has a
/// self-closing tag (eg, `<foo />`).
/// Whether the element can have inner content.
///
/// Returns `true` if the element isn't a [void element in HTML][void],
/// or is in **foreign content** and doesn't have a self-closing tag (e.g., `<svg />`).
///
/// [void]: https://html.spec.whatwg.org/multipage/syntax.html#void-elements
///
/// Note that the self-closing syntax has no effect in HTML content.
#[inline]
#[must_use]
pub fn can_have_content(&self) -> bool {
@@ -351,6 +370,7 @@ impl<'r, 't, H: HandlerTypes> Element<'r, 't, H> {

fn prepend_chunk(&mut self, chunk: StringChunk) {
if self.can_have_content {
self.start_tag.set_self_closing_syntax(false);
self.start_tag
.mutations
.mutate()
@@ -415,6 +435,7 @@ impl<'r, 't, H: HandlerTypes> Element<'r, 't, H> {

fn append_chunk(&mut self, chunk: StringChunk) {
if self.can_have_content {
self.start_tag.set_self_closing_syntax(false);
self.end_tag_mutations_mut().content_before.push_back(chunk);
}
}
@@ -473,6 +494,7 @@ impl<'r, 't, H: HandlerTypes> Element<'r, 't, H> {

fn set_inner_content_chunk(&mut self, chunk: StringChunk) {
if self.can_have_content {
self.start_tag.set_self_closing_syntax(false);
self.remove_content();
self.start_tag
.mutations
15 changes: 14 additions & 1 deletion src/rewritable_units/tokens/start_tag.rs
@@ -102,12 +102,25 @@ impl<'i> StartTag<'i> {
}
}

/// Whether the tag is explicitly self-closing, e.g. `<foo />`.
/// Whether the tag syntactically ends with `/>`. In HTML content this is purely decorative and unnecessary, and has no effect of any kind.
///
/// The `/>` syntax only affects parsing of elements in foreign content (SVG and MathML).
/// It will never close any HTML tags that aren't already defined as [void][spec] in HTML.
///
/// This function only reports the parsed syntax, and will not report which elements are actually void in HTML.
///
/// [spec]: https://html.spec.whatwg.org/multipage/syntax.html#start-tags
///
/// If the `/` is part of an unquoted attribute, it's not parsed as the self-closing syntax.
#[inline]
pub fn self_closing(&self) -> bool {
self.self_closing
}

pub(crate) fn set_self_closing_syntax(&mut self, has_slash: bool) {
self.self_closing = has_slash;
}

/// Inserts `content` before the start tag.
///
/// Subsequent calls to the method append `content` to the previously inserted content.
24 changes: 24 additions & 0 deletions src/rewriter/mod.rs
@@ -446,6 +446,30 @@ mod tests {
assert_eq!(res, "<!-- 42 --><span><!--hello--></span>");
}

#[test]
fn rewrite_incorrect_self_closing() {
let res = rewrite_str::<LocalHandlerTypes>(
"<title /></title><div/></div><style /></style><script /></script>
<br/><br><embed/><embed> <svg><a/><path/><path></path></svg>",
RewriteStrSettings {
element_content_handlers: vec![element!("*:not(svg)", |el| {
el.set_attribute("s", if el.is_self_closing() { "y" } else { "n" })?;
el.set_attribute("c", if el.can_have_content() { "y" } else { "n" })?;
el.append("…", ContentType::Text);
Ok(())
})],
..RewriteStrSettings::new()
},
)
.unwrap();

assert_eq!(
res,
r#"<title s="y" c="y">…</title><div s="y" c="y">…</div><style s="y" c="y">…</style><script s="y" c="y">…</script>
<br s="y" c="n" /><br s="n" c="n"><embed s="y" c="n" /><embed s="n" c="n"> <svg><a s="y" c="n" /><path s="y" c="n" /><path s="n" c="y">…</path></svg>"#
);
}

#[test]
fn rewrite_arbitrary_settings() {
let res = rewrite_str("<span>Some text</span>", Settings::new()).unwrap();
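
Reviewer note: the `set_self_closing_syntax(false)` calls added to `prepend_chunk`, `append_chunk`, and `set_inner_content_chunk` can be pictured with a small stand-alone model. This is a rough sketch only: `SketchStartTag` and `SketchElement` are hypothetical stand-ins, not the library's `StartTag` and `Element`, and the serialization is simplified to the start tag alone.

```rust
/// Hypothetical stand-in for a start tag; not lol_html's `StartTag`.
pub struct SketchStartTag {
    pub name: String,
    /// Whether the tag will be written with a trailing `/>`.
    pub self_closing: bool,
}

impl SketchStartTag {
    /// Serializes only the start tag, with or without the trailing slash.
    pub fn to_html(&self) -> String {
        if self.self_closing {
            format!("<{} />", self.name)
        } else {
            format!("<{}>", self.name)
        }
    }
}

/// Hypothetical stand-in for an element; not lol_html's `Element`.
pub struct SketchElement {
    pub start_tag: SketchStartTag,
    pub can_have_content: bool,
    /// Chunks queued for insertion before the end tag.
    pub appended: Vec<String>,
}

impl SketchElement {
    pub fn new(name: &str, self_closing: bool, can_have_content: bool) -> Self {
        SketchElement {
            start_tag: SketchStartTag {
                name: name.to_string(),
                self_closing,
            },
            can_have_content,
            appended: Vec::new(),
        }
    }

    /// Mirrors the shape of `append_chunk` in the diff: content is only
    /// accepted when the element can have content, and accepting it clears
    /// the decorative `/` from the start tag.
    pub fn append(&mut self, text: &str) {
        if self.can_have_content {
            self.start_tag.self_closing = false;
            self.appended.push(text.to_string());
        }
    }
}
```

In this model, appending to a `<div />` that can have content yields a `<div>` start tag, while appending to `<br />` is ignored and the slash is preserved. The PR's test expects the same shape: `<div/></div>` is rewritten without the slash, whereas `<br/>` keeps it. Note that the test still records `s="y"` for the `div`, because `is_self_closing()` is read before `append` runs.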