Attempt of getting <pre> to not parse inner contents similar to <script> #582
Is this behavior specified in the HTML parsing specification?
@jdm your question is hard to answer! HTML standard related to
@jdm I checked the tests in html-serializer.rs and they seem correct!
I think that html5ever handles it correctly, and that in sauron, in order to do DOM / vDOM diffs and patches, there needs to be a translation into a different node abstraction. I've added a check on the tag name for "pre" and execute this code instead:

```rust
fn process_handle<MSG>(node: &Handle) -> Result<Option<Node<MSG>>, ParseError> {
    // Convert the children first; children that fail or yield nothing are skipped.
    let children: Vec<Node<MSG>> = node
        .children
        .borrow()
        .iter()
        .filter_map(|child| process_handle(child).ok().flatten())
        .collect();
    match &node.data {
        NodeData::Document => match children.len() {
            0 => Ok(Some(node_list([]))),
            1 => Ok(Some(children.into_iter().next().unwrap())),
            _ => Ok(Some(node_list(children))),
        },
        NodeData::Text { contents } => {
            let content = contents.borrow().to_string();
            Ok(Some(text(content)))
        }
        NodeData::Element { name, attrs, .. } => {
            let tag_name = name.local.to_string();
            if tag_name == "pre" {
                // Serialize the <pre> subtree back to HTML instead of converting it
                // node by node. With the default SerializeOpts only the children are
                // written, so the <pre> wrapper is added back by hand below.
                let mut buffer: Vec<u8> = Vec::new();
                let document: SerializableHandle = node.clone().into();
                serialize(&mut buffer, &document, Default::default())
                    .expect("serialization failed");
                let writer_string =
                    String::from_utf8(buffer).expect("Could not write buffer as string");
                let content = format!("<pre>{}</pre>", writer_string);
                Ok(Some(text(content)))
            } else {
                // ...
```

In words: When a test
The html5ever implementation for
This branch implements the fix for issue #580.
Motivation
With the current implementation, the parser evaluates arbitrary HTML tags inside a `<pre>...</pre>`; with this patch, `<pre>` will behave more like `<script>`. This behaviour should be optional, as it sometimes makes sense to parse tags inside a `<pre>`, for instance for styling. Most often, however, the content inside a `<pre>` should be left alone and copied 1:1 from the source document into the generated output document: it should not be reformatted (by removing spaces, newlines, or tabs), nor should the parsed content have any influence on the overall consistency of the document. In other words:

`<html><pre></html>test foo</pre></html>` should not be fixed into `<html><pre>test foo</pre></html>`.
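To make the intended behaviour concrete, here is a minimal sketch, using only the Rust standard library, of what copying `<pre>` content 1:1 means for the example above: everything between the `<pre>` start tag and the next `</pre>` is returned verbatim, so the stray `</html>` survives as text instead of closing the document. `raw_pre_contents` is a hypothetical helper, not part of html5ever, and it assumes a bare lowercase `<pre>` tag without attributes.

```rust
/// Hypothetical helper: returns the source text between the first `<pre>`
/// start tag and the next `</pre>` end tag, byte for byte, without
/// interpreting any markup inside it.
/// Assumes the start tag is written exactly as `<pre>` (no attributes,
/// lowercase) and that the content is terminated by `</pre>`.
fn raw_pre_contents(doc: &str) -> Option<&str> {
    const OPEN: &str = "<pre>";
    const CLOSE: &str = "</pre>";
    // Byte offset just past the opening tag.
    let start = doc.find(OPEN)? + OPEN.len();
    // Length of the content up to (not including) the closing tag.
    let len = doc[start..].find(CLOSE)?;
    Some(&doc[start..start + len])
}

fn main() {
    let doc = "<html><pre></html>test foo</pre></html>";
    // The `</html>` inside <pre> is kept as text, not acted on.
    println!("{:?}", raw_pre_contents(doc));
}
```

Running it prints `Some("</html>test foo")`. A string search like this only illustrates the expected result; inside the parser the same effect has to come from the tokenizer not emitting tags while it is in the `<pre>` content.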
Status
branch: servo_issue_580 with hash: 2094a85
This evaluates

`<hello>XML</hello><pre>\n<bad> </bad>text-in pre</pre><p>asdf</p><script>script</html> magic string</script>`

into

`<html><head></head><body><hello>XML</hello><pre>\n<bad> </bad>text-in pre</pre><p>asdf</p><script>script</html> magic string</script></body></html>`

This shows that the content inside the `<pre>...</pre>` is already captured without being parsed. However, the result should not be an HTML-escaped string but rather a 1:1 copy of the original tags. This can be checked by running:

`clear && cargo run --example html2html`
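For comparison, the HTML tokenizer treats `<script>` specially: after its start tag it switches into a raw-text state (the spec's script data state) and emits everything up to `</script>` as character data. The toy tokenizer below sketches how the same switch could be extended to `<pre>` behind an opt-in flag, since the motivation asks for this behaviour to be optional. `Token`, `Options`, `raw_text_pre` and `tokenize` are hypothetical names, not html5ever's API, and the sketch ignores attributes, comments, entities, case-insensitive tag names and the extra escaping rules of the real script data state.

```rust
/// Tokens emitted by this toy tokenizer (hypothetical, not html5ever's types).
#[derive(Debug, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

/// Hypothetical parser option: `<script>` is always raw text,
/// `<pre>` only when `raw_text_pre` is true.
struct Options {
    raw_text_pre: bool,
}

fn is_raw_text(tag: &str, opts: &Options) -> bool {
    tag == "script" || (tag == "pre" && opts.raw_text_pre)
}

/// Splits `input` into tags and text. No attributes, comments or entities.
fn tokenize(input: &str, opts: &Options) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        let after_lt = match rest.strip_prefix('<') {
            Some(s) => s,
            None => {
                // Plain text runs until the next `<` (or the end of input).
                let end = rest.find('<').unwrap_or(rest.len());
                tokens.push(Token::Text(rest[..end].to_string()));
                rest = &rest[end..];
                continue;
            }
        };
        let gt = match after_lt.find('>') {
            Some(i) => i,
            None => {
                // An unterminated `<` is kept as text.
                tokens.push(Token::Text(rest.to_string()));
                break;
            }
        };
        let inner = &after_lt[..gt];
        rest = &after_lt[gt + 1..];
        if let Some(name) = inner.strip_prefix('/') {
            tokens.push(Token::EndTag(name.to_string()));
            continue;
        }
        let name = inner.to_string();
        tokens.push(Token::StartTag(name.clone()));
        if is_raw_text(&name, opts) {
            // Raw-text mode: everything up to the matching end tag becomes
            // one text token; markup inside it is not interpreted.
            let close = format!("</{}>", name);
            let end = rest.find(close.as_str()).unwrap_or(rest.len());
            if end > 0 {
                tokens.push(Token::Text(rest[..end].to_string()));
            }
            rest = &rest[end..];
            if let Some(after_close) = rest.strip_prefix(close.as_str()) {
                tokens.push(Token::EndTag(name));
                rest = after_close;
            }
        }
    }
    tokens
}

fn main() {
    let input = "<hello>XML</hello><pre>\n<bad> </bad>text-in pre</pre><script>script</html> magic string</script>";
    for raw_text_pre in [false, true] {
        let opts = Options { raw_text_pre };
        println!("raw_text_pre = {}: {:?}", raw_text_pre, tokenize(input, &opts));
    }
}
```

With `raw_text_pre` enabled, `<pre>\n<bad> </bad>text-in pre</pre>` from the status example comes out as a single text token between the `pre` start and end tags; with it disabled, `<bad>` and `</bad>` are tokenized as ordinary tags. The `<script>` content is raw text in both cases.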
Todo
- `process_to_completion` is called for `<script>` but not for `<pre>`
- `<pre>...</pre>` content or not