Skip to content

Conversation

@ruricolist
Copy link
Contributor

I added this change to the previous pull request, but it didn't get merged, I assume because I added it late and it was overlooked. It fixes a bug introduced by the previous fix to DTD embedding, which can result in errors due to duplicate internal subsets and breaks ~60 tests in the SAX test suite (it does not affect Klacks).

Copy link
Member

@scymtym scymtym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what is going on.

This change causes an error to be signaled from the parser before sax:start-internal-subset is called, but the sink, which must be a sink instance for the have-internal-subset call to work, does its own check:

(defmethod sax:start-internal-subset ((sink sink))
  (when (have-internal-subset sink)
    (error "duplicate internal subset"))
  …)

(ensure-dtd)
(sax:start-internal-subset (handler *ctx*))
;; Avoid duplicate internal subsets.
(unless (ignore-errors (have-internal-subset (handler *ctx*)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the ignore-errors necessary? The only method on have-internal-subset is a slot reader, the slot has :initform nil and there doesn't seem to be a `slot-makunbound' call for the slot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just running the tests, I see that cxml:sax-proxy does not inherit from cxml:sink, and so does not have an internal-subset slot at all, and rune-dom::dom-builder, while it does inherit from cxml:sink, overrides the definition of the slot and does not provide an initform. Of course there might be others.

@ruricolist
Copy link
Contributor Author

ruricolist commented Oct 9, 2018

The bug this pull request addresses only arises with XML documents that have both a DTD and an internal subset.

The patch in 731dd98 solves the problem of getting the serializer to print definitions from DTDs as an internal subset by triggering the start-internal-subset and end-internal-subset events when encountering a DTD.

The problem is that a document could actually have both a DTD and an internal subset, in which case start-internal-subset gets called twice and signals an error.

The simplest thing is to avoid calling start-internal-subset twice and thus avoid the error. Alternately, it might (or might not) be worthwhile to dig into the doctype parsing and serializing code and find a better solution to the problem of serializing documents with DTDs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants