Does not parse HTML properly #18

Open
@wosc

Our production application contains quite a few inline <script> tags with accumulated JavaScript inside. An excerpt looks like this:

<head>
<script>
// snip
                            if ( something < other ) {
// snip
                            // explanatory comment: we replace " and ' as late as possible
// snip
</script>

<esi:remove>This directive is not executed</esi:remove>
</head>

When processing this kind of content, the esi crate does not execute the ESI directives inside <head> in this example (directives later in <body> are picked up). I guess this is because the crate uses quick_xml as its parser. quick_xml expects XML, where a < inside the script tag would have to be escaped as &lt;, but it is getting HTML, where the escaping rules are much more relaxed. Conversely, applying XML-style escapes in an HTML document results in JavaScript syntax errors, so that's not a solution. I think we really need an HTML-aware parser here.
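To illustrate what "HTML-aware" could mean here, below is a minimal sketch using only the Rust standard library. It is not how the esi crate or quick_xml works, and it is not a full HTML parser (it ignores comments, CDATA, and `<` inside attribute values). The function name `find_esi_open_tags` is hypothetical. The one idea it shows is treating <script> and <style> as raw-text elements, so a bare < inside them cannot confuse the scan for ESI tags.

```rust
/// Hypothetical helper (not part of the esi crate): returns the lowercased
/// names of ESI opening tags found in `html`, skipping the raw-text content
/// of <script> and <style> elements.
fn find_esi_open_tags(html: &str) -> Vec<String> {
    // ASCII lowercasing keeps byte offsets identical to the input.
    let lower = html.to_ascii_lowercase();
    let mut found = Vec::new();
    let mut pos = 0;

    while let Some(rel) = lower[pos..].find('<') {
        let start = pos + rel;
        let rest = &lower[start + 1..];

        // Tag name: everything up to whitespace, '>' or '/'.
        // Closing tags ("</...") and a bare "< " yield an empty name.
        let name_len = rest
            .find(|c: char| c.is_ascii_whitespace() || c == '>' || c == '/')
            .unwrap_or(rest.len());
        let name = &rest[..name_len];

        if name == "script" || name == "style" {
            // Raw-text element: jump past the matching close tag without
            // interpreting anything in between.
            let close = format!("</{}", name);
            match lower[start..].find(&close) {
                Some(off) => pos = start + off + close.len(),
                None => break, // unterminated element, stop scanning
            }
            continue;
        }

        if name.starts_with("esi:") {
            found.push(name.to_string());
        }
        pos = start + 1;
    }
    found
}
```

Called on the excerpt above, this sketch reports the `esi:remove` tag in <head> even though the script body contains an unescaped < and unbalanced quotes.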
