Declaring summary to have markup other than text/html? #620

Open
trwnh opened this issue Oct 17, 2024 · 5 comments
Labels: Needs primer page (Need to add a page at https://www.w3.org/wiki/Activity_Streams/Primer on this topic), needs-fep (Needs a FEP)

Comments

@trwnh

trwnh commented Oct 17, 2024

Description of issue

name is defined as "A simple, human-readable, plain-text name for the object. HTML markup MUST NOT be included."

summary is defined as "A natural language summarization of the object encoded as HTML."

content includes in its definition that "By default, the value of content is HTML. The mediaType property can be used in the object to indicate a different content type."

So to synthesize these three definitions:

  • name is always text/plain
  • content is whatever the value of mediaType is, where mediaType defaults to text/html
  • summary is always text/html?
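The three rules above can be written down as a small lookup. This is a minimal sketch that assumes objects are plain dicts parsed from AS2 JSON; the function name `effective_media_type` is hypothetical, and the `summary` branch encodes the reading being questioned here (always text/html).

```python
from collections.abc import Mapping
from typing import Any


def effective_media_type(obj: Mapping[str, Any], prop: str) -> str:
    """Return the media type a consumer should assume for a natural-language property."""
    if prop == "name":
        # name MUST NOT contain HTML markup, so it is always plain text.
        return "text/plain"
    if prop == "content":
        # content defaults to HTML, but mediaType may override it.
        return obj.get("mediaType", "text/html")
    if prop == "summary":
        # summary is defined as HTML; mediaType is not specified to apply to it.
        return "text/html"
    raise ValueError(f"not a natural-language property: {prop!r}")


note = {"type": "Note", "mediaType": "text/markdown", "summary": "s", "content": "*hi*"}
print(effective_media_type(note, "content"))  # text/markdown
print(effective_media_type(note, "summary"))  # text/html
```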

But there are cases where a producer might want to signal a different content type for summary; for example, text/plain or text/markdown. Recently, mastodon/mastodon#32538 came up as an example of wanting to produce a summary that is NOT text/html. So the question is, might it make sense to provide a mechanism for declaring that summary is something other than text/html?

Potential solutions

  • Extending the definition of mediaType to cover both content and summary could work, but would prevent using different formats for each of the two separately.
  • Defining a separate property like mediaTypeOfSummary seems clunky, but might end up making sense or being necessary.
  • More explicit construction of a "text node" where every @value can have its own mediaType, although this would be pretty complicated and not backwards-compatible. (It would also break JSON-LD language containers, so nameMap/summaryMap/contentMap would not work.)
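To compare the first two options concretely, here is a hypothetical consumer-side resolver for `summary` under each. `mediaTypeOfSummary` is the placeholder name from the list above, not a defined AS2 term, and the `proposal` strings are illustrative only.

```python
from collections.abc import Mapping
from typing import Any


def summary_media_type(obj: Mapping[str, Any], proposal: str) -> str:
    """Resolve the media type of summary under a hypothetical extension proposal."""
    if proposal == "shared-mediaType":
        # Option 1: mediaType governs both content and summary,
        # so the two can never differ.
        return obj.get("mediaType", "text/html")
    if proposal == "mediaTypeOfSummary":
        # Option 2: a separate (hypothetical) property governs summary alone,
        # falling back to today's default of text/html.
        return obj.get("mediaTypeOfSummary", "text/html")
    raise ValueError(f"unknown proposal: {proposal!r}")


note = {
    "mediaType": "text/markdown",        # applies to content
    "mediaTypeOfSummary": "text/plain",  # hypothetical, applies to summary
    "summary": "Spoilers <3",
    "content": "**hi**",
}
print(summary_media_type(note, "shared-mediaType"))    # text/markdown
print(summary_media_type(note, "mediaTypeOfSummary"))  # text/plain
```

Under the first option the plain-text summary would be reported as Markdown, which is exactly the limitation the first bullet describes.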

Action items

  • Needs Primer: guidance on the natural language properties, including a section on the content-type considerations (and potential sanitizing/stripping resulting from that)
  • Next Version: Possibly allowing summary to be something other than text/html, with either mediaType extending to cover it, or defining mediaTypeOfSummary as an analogous property.
@nightpool
Collaborator

nightpool commented Oct 17, 2024 via email

evanp added the labels Needs primer page (Need to add a page at https://www.w3.org/wiki/Activity_Streams/Primer on this topic) and needs-fep (Needs a FEP) on Oct 25, 2024

This issue has been labelled as potentially needing a FEP, and contributors are welcome to submit a FEP on the topic.
Note that issues may be closed without the FEP being created; that does not mean that the FEP is no longer needed.

@evanp
Collaborator

evanp commented Oct 25, 2024

So, I think the problem with adding flags to indicate that the summary is not HTML is that it's not backwards compatible; consumers will expect summary to always be HTML as documented.

I agree that a primer page makes sense.

I'd also suggest a FEP for defining a new description or other property that can have different media types. Using a new property instead of summary allows us to define new semantics for that property that aren't encumbered by the pretty strict requirement that summary be HTML.
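A rough consumer-side sketch of why a new property stays backwards compatible: a consumer that knows the property can honor its media type, while an older consumer ignores the unknown key and keeps treating summary as HTML. The property name `description`, its `{"value", "mediaType"}` shape, and the text/plain fallback are all hypothetical placeholders; no FEP defines them.

```python
from collections.abc import Mapping
from typing import Any


def pick_summary(obj: Mapping[str, Any], knows_description: bool) -> tuple[str, str]:
    """Return (text, media_type) to display, preferring a hypothetical description."""
    desc = obj.get("description")  # hypothetical new property, not defined in AS2
    if knows_description and isinstance(desc, Mapping) and "value" in desc:
        # New semantics: the property carries its own media type.
        # text/plain as the fallback is an assumption made for this sketch.
        return desc["value"], desc.get("mediaType", "text/plain")
    # Legacy path: an older consumer ignores the unknown key and treats
    # summary as HTML, exactly as documented today.
    return obj.get("summary", ""), "text/html"


note = {
    "summary": "RDF &lt;Alice&gt; &lt;knows&gt; &lt;bob&gt;",
    "description": {"value": "RDF <Alice> <knows> <bob>", "mediaType": "text/plain"},
}
print(pick_summary(note, knows_description=True))   # ('RDF <Alice> <knows> <bob>', 'text/plain')
print(pick_summary(note, knows_description=False))  # ('RDF &lt;Alice&gt; &lt;knows&gt; &lt;bob&gt;', 'text/html')
```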

@evanp
Collaborator

evanp commented Oct 25, 2024

One thing the primer page should address is what to do when an object has no name and should have a summary without HTML. Not all plain text is valid HTML; for example, text containing unescaped characters that are meaningful in HTML, such as <, >, ', and ".
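For the primer, a producer that only has plain text can still emit a valid HTML summary by escaping it first. A minimal sketch using Python's standard `html` module; the helper name is hypothetical. Note that `&` has to be escaped as well as the characters listed above.

```python
import html


def plain_text_to_html_summary(text: str) -> str:
    """Escape plain text so it can be placed in summary, which is defined as HTML."""
    # html.escape replaces & < > always, and with quote=True also " and '
    # (as &quot; and &#x27;).
    return html.escape(text, quote=True)


raw = "Tom & Jerry's \"<hi>\""
escaped = plain_text_to_html_summary(raw)
print(escaped)  # Tom &amp; Jerry&#x27;s &quot;&lt;hi&gt;&quot;
# A consumer that parses the summary as HTML recovers the original text.
assert html.unescape(escaped) == raw
```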

@trwnh
Author

trwnh commented Nov 5, 2024

> why any content type other than HTML would be useful or preferred.

minimal example where HTML parsing is destructive. what the producer sends:

{
  "summary": "I am trying to serialize the RDF statement <Alice> <knows> <bob> into plain-text, but a naive HTML sanitizer is stripping the statement completely"
}

what is left after a naive sanitizer strips everything that looks like a tag:

{
  "summary": "I am trying to serialize the RDF statement  into plain-text, but a naive HTML sanitizer is stripping the statement completely"
}

the workaround is to HTML-escape the angle brackets (&lt; and &gt;), but not every consumer can be relied on to unescape them
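Both the failure and the workaround can be reproduced with the standard library's `html.parser`. This is a rough sketch: the hypothetical `TextExtractor` keeps only text nodes, which approximates a sanitizer that strips every tag; real sanitizers differ in detail (for example, in exactly how much whitespace is left behind).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only text nodes, roughly what a strip-all-tags sanitizer keeps."""

    def __init__(self) -> None:
        # convert_charrefs defaults to True, so &lt; arrives in handle_data as "<".
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def visible_text(summary_html: str) -> str:
    parser = TextExtractor()
    parser.feed(summary_html)
    parser.close()
    return "".join(parser.parts)


unescaped = "the RDF statement <Alice> <knows> <bob> into plain-text"
escaped = "the RDF statement &lt;Alice&gt; &lt;knows&gt; &lt;bob&gt; into plain-text"

print(visible_text(unescaped))  # the statement is gone: <Alice> etc. parse as start tags
print(visible_text(escaped))    # the RDF statement <Alice> <knows> <bob> into plain-text
print(escaped)                  # what a consumer that never unescapes would display
```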

Projects: None yet
Development: No branches or pull requests
3 participants