Tracking Issue from TPAC: What does JSON-LD compatible JSON mean? #929

Closed
brentzundel opened this issue Sep 15, 2022 · 80 comments

@brentzundel
Member

brentzundel commented Sep 15, 2022

Tracking Issue from TPAC: What does JSON-LD compatible JSON mean?

Please use comments to make concrete proposals.

@dlongley
Contributor

dlongley commented Sep 15, 2022

First attempt at getting things started...

The core data model should be RDF, but serialized using a profile of JSON-LD that is idiomatic JSON. This should be a contextualized, compact form that could be easily checked against a JSON schema and then consumed by JSON developers that are otherwise unfamiliar with JSON-LD.
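To make "a compact form that could be easily checked against a JSON schema" concrete, here is a rough sketch. The document shape and the structural check are illustrative only (a real deployment would use an actual JSON Schema validator rather than this hand-rolled function):

```python
import json

# Illustrative compact-form credential: simple keys, with @context carrying
# the term-to-URL mappings for JSON-LD processors that want them.
vc_json = """
{
  "@context": ["https://www.w3.org/2018/credentials/v1"],
  "id": "http://example.edu/credentials/1872",
  "type": ["VerifiableCredential"],
  "issuer": "did:example:123",
  "credentialSubject": {"id": "did:example:456"}
}
"""

def looks_like_compact_vc(doc: dict) -> bool:
    """Cheap structural check a JSON-only consumer might run,
    without any JSON-LD processing."""
    return (
        isinstance(doc.get("@context"), list)
        and all(isinstance(c, str) for c in doc["@context"])
        and "VerifiableCredential" in doc.get("type", [])
    )

doc = json.loads(vc_json)
print(looks_like_compact_vc(doc))  # True
```

The point of the compact, contextualized form is exactly this: a developer unfamiliar with JSON-LD can treat the document as ordinary JSON with fixed key names.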

@nadalin

nadalin commented Sep 15, 2022

First attempt at getting things started...

The core data model should be RDF, but serialized using a profile of JSON-LD that is idiomatic JSON. This should be a contextualized, compact form that could be easily checked against a JSON schema and then consumed by JSON developers that are otherwise unfamiliar with JSON-LD.

What would be the requirement for RDF from a data model perspective, I don't see anything in the data model that would require RDF

@dlongley
Contributor

dlongley commented Sep 15, 2022

The data model today is essentially subject-property-value statements -- and containers / wrappers around those statements (aka "graphs"). This is essentially RDF, so we should just reuse it; it's a standard (and it saves us having to write some text).
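The subject-property-value view can be sketched in a few lines: nested JSON decomposes into flat statements, which is essentially what RDF standardizes. This is an illustrative toy, not the conformant JSON-LD-to-RDF deserialization algorithm:

```python
def to_statements(node, statements=None):
    """Decompose a JSON object into (subject, property, value) statements.
    Nested objects become their own subjects, linked via their 'id'."""
    if statements is None:
        statements = []
    subject = node.get("id", "_:blank")
    for prop, value in node.items():
        if prop == "id":
            continue
        if isinstance(value, dict):
            to_statements(value, statements)
            statements.append((subject, prop, value.get("id", "_:blank")))
        else:
            statements.append((subject, prop, value))
    return statements

doc = {
    "id": "http://example.edu/credentials/1872",
    "issuer": {"id": "did:example:123", "name": "Example U"},
}
for s, p, o in to_statements(doc):
    print(s, p, o)
```

Every nested object flattens into the same statement shape, which is what lets generic tooling merge and link data from different issuers.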

@msporny
Member

msporny commented Oct 4, 2022

A cut at some language that might provide some clarity around what "JSON-LD compatible JSON" means. The Verifiable Credentials Data Model is "JSON-LD compatible JSON", which means the following:

  • All Verifiable Credentials expressed MUST utilize the @context parameter where the values SHOULD be URLs. That is, in-line JSON-LD Contexts are strongly discouraged (we might even want to go as far as forbidding them).
  • JSON-LD Compact Form is the only allowed form of JSON-LD. JSON-LD expanded form is disallowed to eliminate the requirement to always perform JSON-LD processing in processing pipelines where it's not needed.
  • The underlying data model is JSON-LD, which is a superset of (and round-trippable to/from) RDF.

In order to make development easier for developers coming from a JSON background, we might consider:

  • The Verifiable Credentials specification will provide an "experimental" JSON-LD Context (https://www.w3.org/ns/credentials/experimental/v1), with an @vocab value set to https://www.w3.org/2018/credentials/undefined#, such that developers need not define a JSON-LD Context or Vocabulary semantics as an initial step in the development process.
  • Implementations SHOULD reject verification of any VC that utilizes the https://www.w3.org/ns/credentials/experimental/v1 JSON-LD Context in a production environment.

The above makes the data model crystal clear, is compatible with all known JSON-LD processors, helps developers coming from a purely JSON background to get started quickly, and retains JSON-only processing modes that are compatible with JSON-LD. Some concrete proposals that we could put in front of the group are:

PROPOSAL: Verifiable Credentials MUST utilize the @context parameter where the values SHOULD be URLs.

PROPOSAL: Verifiable Credentials SHOULD NOT utilize inline JSON-LD Contexts (objects as values) for the @context.

PROPOSAL: Verifiable Credentials MUST be expressed in JSON-LD Compact form.

PROPOSAL: The underlying data model for Verifiable Credentials is JSON-LD.

And the proposals to help make development easier for developers coming from a JSON background:

PROPOSAL: As an initial iteration on the idea, the Verifiable Credentials specification will define an "experimental" JSON-LD Context (https://www.w3.org/2018/credentials/experimental/v1), with an @vocab value set to https://www.w3.org/2018/credentials/undefined#, such that developers need not define a JSON-LD Context or Vocabulary semantics as an initial step in the development process.

PROPOSAL: A conforming processor SHOULD raise an error if a VC utilizes the https://www.w3.org/ns/credentials/experimental/v1 JSON-LD Context in a production environment. The definition of "production environment" is left as an exercise to the implementer.
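A rough sketch of how a conforming processor might enforce the @context rules proposed above. The function name, error messages, and production flag are hypothetical, not part of any proposal:

```python
EXPERIMENTAL_CTX = "https://www.w3.org/ns/credentials/experimental/v1"

def check_context(vc: dict, production: bool = True) -> list:
    """Return a list of problems per the proposed @context rules."""
    problems = []
    ctx = vc.get("@context")
    if ctx is None:
        problems.append("@context is required (MUST)")
        return problems
    values = ctx if isinstance(ctx, list) else [ctx]
    if any(not isinstance(v, str) for v in values):
        problems.append("inline (object) contexts are discouraged (SHOULD NOT)")
    if production and EXPERIMENTAL_CTX in values:
        problems.append("experimental context rejected in production (SHOULD)")
    return problems

vc = {"@context": [EXPERIMENTAL_CTX, {"@vocab": "https://example.com#"}]}
print(check_context(vc))  # two problems: inline context, experimental context
```

Note that none of these checks require a JSON-LD processor; they only inspect the `@context` strings, which is the JSON-only processing mode the proposals aim to preserve.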

/cc @OR13 @tplooker @mprorock @peacekeeper @philarcher @mkhraisha @dlongley @brentzundel @Sakurann

@msporny msporny changed the title Tracking Issue from TPAC: What does JSON-compatible JSON-LD mean? Tracking Issue from TPAC: What does JSON-LD compatible JSON mean? Oct 4, 2022
@OR13
Contributor

OR13 commented Oct 4, 2022

@msporny Thank you for taking the time to write this up!

That is, in-line JSON-LD Contexts are strongly discouraged (we might even want to go as far as forbidding them).

-1 to this guidance... it also contradicts conversations with schema.org / google regarding usage of JSON-LD on web pages.

Suggest the working group NOT provide guidance of this form in a W3C TR.

JSON-LD Compact Form is the only allowed form of JSON-LD. JSON-LD expanded form is disallowed to eliminate the requirement to always perform JSON-LD processing in processing pipelines where it's not needed.

+1 to this.

The underlying data model is JSON-LD, which is a superset of (and round-trippable to/from) RDF.

+1 to this.

The Verifiable Credentials specification will provide an "experimental" JSON-LD Context (https://www.w3.org/ns/credentials/experimental/v1), with an @vocab value set to https://www.w3.org/2018/credentials/undefined#, such that developers need not define a JSON-LD Context or Vocabulary semantics as an initial step in the development process.
Implementations SHOULD reject verification of any VC that utilizes the https://www.w3.org/ns/credentials/experimental/v1 JSON-LD Context in a production environment.

-1 to this, counter offer:

{
  "@context": [
    // "https://www.w3.org/2018/credentials/v1",
    "https://www.w3.org/ns/credentials/v2",
    // "https://www.w3.org/2018/credentials/examples/v1"
    { "@vocab": "https://www.w3.org/ns/credentials#" }
  ],
  "id": "http://example.edu/credentials/1872",
  "type": ["VerifiableCredential", "NewCredentialType"],
  "issuer": {
    "id": "did:example:123",
    "type": ["Organization", "OrganizationType"]
  },
  "issuanceDate": "2010-01-01T19:23:24Z",
  "credentialSubject": {
    "id": "did:example:456",
    "type": ["Person", "JobType"],
    "claimName": "claimValue"
  }
}

#935

PROPOSAL: Verifiable Credentials MUST utilize the @context parameter where the values SHOULD be URLs.

+1

PROPOSAL: Verifiable Credentials SHOULD NOT utilize inline JSON-LD Contexts (objects as values) for the @context.

-1 (with serious concerns about how this will cripple adoption in certain use cases).

Suggest the working group NOT provide guidance of this form in a W3C TR.

PROPOSAL: Verifiable Credentials MUST be expressed in JSON-LD Compact form.

+1

PROPOSAL: The underlying data model for Verifiable Credentials is JSON-LD.

+1 (noting that we don't need to propose this, @context is required in v1.1 and the data model is JSON).

PROPOSAL: As an initial iteration on the idea, the Verifiable Credentials specification will define an "experimental" JSON-LD Context (https://www.w3.org/2018/credentials/experimental/v1), with an @vocab value set to https://www.w3.org/2018/credentials/undefined#, such that developers need not define a JSON-LD Context or Vocabulary semantics as an initial step in the development process.

-1 to "experimental"... but this proposal could probably be restructured in a way that I would accept... see my counter offer.

Suggest the working group NOT provide guidance of this form in a W3C TR.

PROPOSAL: A conforming processor SHOULD raise an error if a VC utilizes the https://www.w3.org/ns/credentials/experimental/v1 JSON-LD Context in a production environment. The definition of "production environment" is left as an exercise to the implementer.

-1 to this.

Suggest the working group NOT provide guidance of this form in a W3C TR.

@msporny
Member

msporny commented Oct 4, 2022

@OR13 (and anyone else who weighs in) could you please re-edit your comment above to 1) explain your -1s in more depth, and ideally, 2) provide counter-proposals for all -1s. It will help us figure out areas where it might be possible to reach consensus. Like the @vocab / "experimental" thing seems close... but the schema.org thing feels like it needs a lot more discussion (there's history there that much of the WG is probably missing).

@OR13
Contributor

OR13 commented Oct 4, 2022

edited, mostly my counter proposals are "don't say this in a W3C TR."... leave the power in the hands of the developers / system builders and users.

@dlongley
Contributor

dlongley commented Oct 4, 2022

+1 to all the proposals in #929 (comment).

@selfissued
Contributor

selfissued commented Oct 4, 2022

It’s time to let JSON be JSON

The Verifiable Credentials spec currently has conflicting guidance about the use of @context when the VC is not JSON-LD. On one hand, Section 6.1 (JSON) describes the pure JSON representation without making any requirements to use @context. On the other hand, Section 4.1 (Contexts) says “Verifiable credentials and verifiable presentations MUST include a @context property.” Later in the same section it says “Though this specification requires that a @context property be present, it is not required that the value of the @context property be processed using JSON-LD. This is to support processing using plain JSON libraries”.

Yes, it’s clear what @context means when the VC is JSON-LD. It’s also very unclear what @context means when the VC is pure JSON.

Now that we have experience with deployments of Verifiable Credentials, it’s clear that many developers don’t know how to use @context. As a result, they’ve deployed non-interoperable VCs. As Joe Andrieu said during TPAC in Vancouver, “If we didn’t have Dave Longley as a resource to help us, we wouldn’t have known how to get @context right.” Joe’s far from alone.

Proposal

Modify Section 4.1 (Contexts) to say that @context MUST be present when the VC is JSON-LD and that @context MUST NOT be present when the VC is JSON but not JSON-LD. This has multiple benefits for developers:

  1. When @context is present, it’s an unmistakable indication that the VC is JSON-LD and all JSON-LD processing rules apply.
  2. When @context is not present, it’s an unmistakable indication that the VC is not JSON-LD and no JSON-LD processing rules apply.
  3. When @context is not present, developers do not bear the complexity burden of JSON-LD.

Answering the Issue Question

The tracking issue asked the question “What does JSON-LD compatible JSON mean?”. Given the proposal above, the answer is crystal clear: JSON-LD compatible JSON means JSON-LD; JSON that is not compliant with JSON-LD is not JSON-LD compatible JSON.

@TallTed
Member

TallTed commented Oct 5, 2022

@selfissued -- It's time to let @context be @context. Please edit your latest comment, and wrap all 13 occurrences of @context in code fences. That GitHub user is not part of any relevant groups, and does not need to be pinged every time a comment is made on this thread.

Also, please note that, in fact, JSON that uses URIs for all terms, thus requiring no mapping from "simple" term literals to URIs via @context, is also "JSON-LD compatible JSON". I think this "JSON-LD compatible JSON" would impose no "complexity burden of JSON-LD" (by which I think you actually mean a "complexity burden of @context") because there is no @context and no term mapping; each term URI should be interpreted and maintained exactly as written.
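A minimal illustration of that style, with hypothetical property URIs: every key is already a full URI, so a plain JSON consumer reads it exactly as written and no @context or term mapping is involved.

```python
import json

# Every key is a globally unambiguous URI; there is nothing to map.
doc = {
    "https://www.w3.org/2018/credentials#issuer": "did:example:123",
    "https://schema.org/name": "Example University",
}

# A plain JSON consumer interprets and maintains each term URI as-is.
print(json.dumps(doc, indent=2))
```

The trade-off is verbosity: the compact form with @context exists precisely to avoid repeating these long URIs as keys.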

@nadalin

nadalin commented Oct 5, 2022 via email

@nadalin

nadalin commented Oct 5, 2022 via email

@msporny
Member

msporny commented Oct 5, 2022

@selfissued wrote:

@context MUST NOT be present when the VC is JSON

@nadalin wrote:

There is absolutely no reason to have or process a @context

Could either of you please explain what the extensibility model is for your proposal? How are global semantics achieved?

In the past, the answer has been some variation of:

  • Use a centralized registry to register terms across every market vertical in existence.
  • There are no global semantics, only local semantics based on agreement within a market vertical and/or use case.

Is something new being offered this time around?

@David-Chadwick
Contributor

David-Chadwick commented Oct 5, 2022 via email

@David-Chadwick
Contributor

David-Chadwick commented Oct 5, 2022 via email

@nadalin

nadalin commented Oct 5, 2022 via email

@David-Chadwick
Contributor

David-Chadwick commented Oct 5, 2022 via email

@nadalin

nadalin commented Oct 5, 2022 via email

@msporny
Member

msporny commented Oct 5, 2022

@David-Chadwick wrote:

This sounds like a broken proposal to me.

Yes, agreed. Removing a feature that provides global interoperability and then replacing it with nothing will lead to non-interoperability; thus, it seems a non-solution is being proposed.

"let JSON be JSON" and "keep things simple" are sound bites. Being generous, they could be construed as design guidelines; they are not a workable technical architecture.

Please answer the question being asked instead of stating that we do not need a feature that has achieved consensus (multiple times) over many years. To re-state the question:

Could either of you please explain what the extensibility model is for your proposal? How are global semantics (and thus, interop) achieved?

@dlongley
Contributor

dlongley commented Oct 5, 2022

I agree with "let's keep simple things simple". They should be as simple as they can be -- but not simpler. Here's my perspective on that:


If all you want to do is share a couple of very common fields of information between a closed or mostly closed system of well-known data providers, use a standard that doesn't place constraints on data modeling.

If you don't care about data providers tracking your users' behavior (or you want this property so that you can monetize it), or if you generally don't care about the "phone home problem", use a standard that doesn't have the extra features designed to work against this.

If you don't mind using a centralized registry for achieving interop with how you modeled your data or your data is simple enough so that this solution meets your scaling needs, don't use a standard that introduces a decentralized mechanism.

If you don't need to be able to atomize your data into simple statements, no matter how they are nested, so that they can be merged and linked with other data to build powerful knowledge graphs, or selectively disclosed, don't use a standard that requires you to apply constraints to how you model your data.

If your use cases fit into the above -- use a JWT. JWTs aren't opinionated on the data model, just the data format: JSON. They don't impose any extra constraints to achieve additional use cases -- because they aren't designed for those and you, in particular, don't need them.


Now, what about everyone else?


If you have more than a few common fields of information (perhaps you even have rich and well-connected data) and you want it to be easily shareable and usable across an open ecosystem where you do not even know who might consume it, use a standard that places a simple, minimum set of constraints on data modeling to enable this to happen.

If you care about privacy issues with data providers tracking your users' behavior and want to stop the "phone home problem", use a standard designed to help achieve this.

If you don't want to use a centralized registry for defining your data or if that solution doesn't scale to meet your needs, use a standard that defines a decentralized mechanism.

If you want your data to be mergeable and linkable with other data to build powerful knowledge graphs via interoperable, common tooling, or selectively disclosed, use a standard that requires all data to be modeled in a common way, using the simplest constraints to achieve this goal.

If your use cases fit into this section here, use a VC.


VCs specify a data model, not just a data format. The data model, when expressed in JSON, says that each object is a "thing" with properties. The properties are expressed as JSON keys and link to other values or other things -- where the same rules then repeat. It is true that you cannot just "do whatever you want" when modeling your data, because then there is no common structure for interoperable tools to work with; instead, all the data is bespoke and looks different, which fails to meet the requirements. This data model is the simplest set of constraints to understand and apply when modeling your data -- to achieve the above requirements. And that's why we should do it: it keeps things as simple as they can be, but not simpler.

VCs provide a decentralized registry mechanism called @context, borrowed from another standard, JSON-LD. This mechanism is the simplest, standard way to map simple JSON keys onto globally unambiguous URLs.

So, yes, let's keep simple things simple, but not simpler. That includes making sure we understand that we have a common data model and a decentralized registry mechanism in order to meet requirements that these use cases have. For those use cases that don't have any of these requirements, you don't need to use VCs. Use something like a JWT -- that's a technology that's been around for over a decade. If your use case can be solved with it -- do it. If not, and you've been waiting for over a decade for a standard that provides the additional features you need to solve your use case, VCs may be for you. But it is not simpler to try to change VCs to look like another standard that already exists but is too simple to meet the requirements. That only creates two standards that address the same use cases -- and that both fail to specify what people need to achieve interoperability on other ones.
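Mechanically, a context is a term-to-URL mapping; a consumer that hard-codes a known context can resolve keys without any JSON-LD library. An illustrative sketch (the mapping and helper are hypothetical, and this is not the JSON-LD expansion algorithm):

```python
# Hypothetical hard-coded context: simple keys -> globally unambiguous URLs.
KNOWN_CONTEXT = {
    "issuer": "https://www.w3.org/2018/credentials#issuer",
    "name": "https://schema.org/name",
}

def expand_keys(doc: dict, mapping: dict) -> dict:
    """Rewrite mapped short keys to their full URLs; leave others alone."""
    return {mapping.get(k, k): v for k, v in doc.items()}

compact = {"issuer": "did:example:123", "name": "Example University"}
expanded = expand_keys(compact, KNOWN_CONTEXT)
print(expanded["https://schema.org/name"])  # Example University
```

This is the "decentralized registry" in miniature: whoever publishes the context decides the mappings, and no central authority has to approve the term names.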

@nadalin

nadalin commented Oct 5, 2022 via email

@TallTed
Member

TallTed commented Oct 5, 2022

@David-Chadwick, @nadalin — Please revisit your comments, above, and

  1. remove all unnecessarily quoted content, which makes it harder to digest your comments
  2. codefence all remaining occurrences of @context and any other @ entities, as it's well beyond unfriendly to constantly ping GitHub users who are not otherwise participating in our conversations!

@selfissued
Contributor

@msporny wrote:

Could either of your please explain what the extensibility model is for your proposal? How are global semantics achieved?

The extensibility model is the normal JSON one: add fields as you need them. If you want them to be globally interoperable, use collision-resistant names or register the names in the appropriate claims registry. This model is described at https://datatracker.ietf.org/doc/html/rfc7519#section-4 and implemented in the registry https://www.iana.org/assignments/jwt/jwt.xhtml#claims. We can, and likely will, have a similar registry for interoperable VC claims. This is all normal JSON interop stuff.

@OR13
Contributor

OR13 commented Oct 5, 2022

See also:

(Screenshot attached: "Screen Shot 2022-10-05 at 10 33 52 AM")

I'm firmly against any changes that cause interoperability issues for regulators verifying credentials in VC-JWT or Data Integrity form... some of the proposals in this thread are heading in that direction, and I don't believe the W3C is the right place for that work.

At IETF, we have the ability to sign arbitrary data as JSON or CBOR... We don't need another way to do this at W3C.

@iherman
Member

iherman commented Oct 5, 2022

The issue was discussed in a meeting on 2022-10-05

  • no resolutions were taken
View the transcript

3. Concrete Proposals for Core Data Model.

See github issue vc-data-model#929.

Manu Sporny: attempt at a number of proposals - some discussion over those proposals with a couple of +1's, some back-and-forth with Orie, wondering if the best approach is to put proposals forward to see if there's agreement.

Manu Sporny: See proposal.

Michael Jones: +1 to make VCs easier to use and develop.

Manu Sporny: there are a number of things which make JSON-LD processing mandatory; one idea here would be to not allow JSON-LD expanded form - the only valid JSON-LD would be one in compacted form.

Michael Prorock: +1 to easier to use and develop.

Ivan Herman: See JSON-LD compact form definition.

Manu Sporny: another, have @context use URLs and recommend against inline contexts.

Michael Prorock: -1 to guidance against inline context - see comments in issue re twitter and other discussions on schema.org.

Manu Sporny: finally, a discussion about @vocab and make it easier for developers to pick up and use without the first thing to do being to define a JSON-LD document..
… a suggestion by Orie that its RDF, my proposal is it is JSON-LD which is a superset of RDF, others that it is JSON-only.

Michael Jones: My proposal titled "It's time to let JSON be JSON" is at #929 (comment).

Michael Jones: after discussions with a lot of people including at TPAC, there is ample evidence that there are developers who get it wrong if they use @context, and who would be happy with a more typical JSON model. The current text is halfway between, requiring an @context without requiring JSON-LD. The simplest way to resolve this is to define two kinds of credentials - ones which include @context and are JSON-LD, and ones which don't and are JSON.

Gabe Cohen: +1 Mike.

Michael Jones: it is a little unfortunate if we have two representations, but that is what we are seeing. We should restrict the usage of @context if the data is not JSON-LD, and require it if it is.

Jeremie Miller: +1 to two clearly different kinds, w/ @context and without.

Orie Steele: appreciate comment about developers, various skillsets mean that some struggle with certain technologies while others find it easier. We should strive to make it easier to implement for unskilled developers..

Michael Prorock: +1 orie - ietf vs w3c and a place for all things.

Joe Andrieu: +1 to point out that restricting context would be a violation of JWT's extensibility framework.

Shawn Butterfield: +100 Orie.

Manu Sporny: -1 to splitting into two different formats - that will guarantee a non-interoperable VC ecosystem..

Dave Longley: +1 to Orie.

Michael Prorock: -1 to splitting into two different formats.

Dave Longley: -1 to splitting into two different formats, if don't want data model constraints and open world decentralized semantics, use a JWT -- that already exists..

Michael Prorock: +1 semantics are important to this work.

Manu Sporny: +1 on semantics being important and are a key differentiator here..

Orie Steele: COSE/JOSE work in IETF, have their place in signing unstructured data. To the original point on implementation complexity, should be trivial to implement but should have value in implementing. Combining things together means that they lose the value of their specificity. My value in Verifiable credentials is that they provide semantic data..

Michael Prorock: https://lists.w3.org/Archives/Public/public-credentials/2022Sep/0253.html.

Orie Steele: example of a mill test report signed by a steel company in Mexico - want them to choose between using VC-JWT or Data Integrity - but regulators consuming the document should have the same semantic data at the end.
… mission we are on is to create an open world model for structured semantic data, treating this work as an extension of COSE/JOSE with a few new terms doesn't solve these objectives or help issuers and verifiers..

Manu Sporny: +1 to what Orie is saying..

Dave Longley: +1 to Orie.

Michael Prorock: +1 Orie.

David Chadwick: +1 to Orie (or plus infinity).

Joe Andrieu: comment on selfissued's comment - if the extensibility model is to just add whatever terms to the JSON to extend it, why not allow @context..

Orie Steele: Example of awesome work at IETF, on signing arbitrary data... https://datatracker.ietf.org/doc/html/draft-ietf-cose-countersign.

Michael Jones: My comment on the JSON-only extensibility model is at #929 (comment).

Orie Steele: +1 to adding @vocab to the core data models v2 context..

Michael Prorock: applause to Orie on comments, a reasonable proposal is that @context is required given the semantic nature of work - however, it is important to recognize that there are a large body of use cases existing in the wild that also utilize some of the properties of JSON-LD like @vocab.

Manu Sporny: -1 to adding @vocab in core context, but +1 to add it in a "poc/developer" context..

Dave Longley: +1 to adding @vocab in some way that makes it easy for less skilled developers to use, not necessarily in the core context, but perhaps in another context that can be used and will signal its usage to simple processors (that just read the @context strings).

Michael Prorock: I should also note that @vocab prevents developer errors in terms of what is getting signed or not.

Manu Sporny: It creates errors as well -- :).

Manu Sporny: However, there is a way to address this concern and we shouldn't conflate that discussion w/ the core data model discussion..

Michael Jones: not trying to change JSON extensibility model, as it is a claim with specific meaning that could conflict. We should register it as a claim in the IANA JWT claim registry. You need to use claims in the way they are registered. If you use @context, use it as it is defined..

Orie Steele: +1 to registering JWT claims, -1 to thinking that IANA registries are the only way to understand claims... we are literally here to break that cycle..

Michael Prorock: +1 orie.

Manu Sporny: +1 orie.

Joe Andrieu: +1 to letting @context be used as intended, and allowed anywhere in the JSON serialization.

Dave Longley: +1 to Orie.

Orie Steele: luckily we don't need a new standard to "just sign JSON or CBOR" :).

Manu Sporny: nor do we need a new JWT spec, it's there, if people don't want semantics -- use that. :).

Orie Steele: don't forget about signing with "sd-jwt" :).

Manu Sporny: or jwp! :).

Orie Steele: or acdcs.

Manu Sporny: or AnonCreds.

Michael Jones: Orie made a point that signing things should be a distinct activity from the type of data which is signed - we are defining what is signed, whether with JOSE, COSE, Data Integrity. The value we are adding is in defining the additional claims which are in a typical VC, and what they mean.

Dave Longley: my comments here: #929 (comment) <-- use the right tool for your use case ... if that's a JWT, use one, if it's a VC, use a VC ... but these aren't the same things and shouldn't be made to be the same..

Orie Steele: this is why we shouldn't let the core data model be "wagged" by a security format..

Kristina Yasuda: JWT spec cannot be used as-is to do a JSON-encoded VC.

Orie Steele: ? we use it in 1.1... not sure what you mean kristina..

Michael Prorock: there is a 3rd proposal on the table: @context + @vocab in core data model.

Manu Sporny: https://w3c-ccg.github.io/traceability-vocab/#credentials.

Manu Sporny: the problem here is that we are discussing splitting the ecosystem into two communities with different extensibility models. JSON-LD uses identifiers which do not require a central registry, while JSON would define claims in a centralized registry..

Orie Steele: Don't look at us... look at schema.org, GS1, UN CEFACT, CHEBI, QUDT, FIBO, etc....

Manu Sporny: if you just look at the traceability work, the amount of claims necessary would be massive. Argument is to go register claims in a centralized registry at IANA, use reverse domain names, etc..

Joe Andrieu: +1 to domain-specific terms, managed by each domain, as they wish. No need to centralize everything into a single registry. That's an anti-pattern we're trying to fix here..

Manu Sporny: that approach has been discussed time and time again and that approach just does not scale.

Orie Steele: Look at how people are already using the open world capabilities of JSON-LD in industry today... look at knowledge graphs... look at OntoText and Neo4j..

Manu Sporny: the ramifications of splitting the data model into two things with different extensibility will split the ecosystem, and is one of the greatest things we could do to damage the ecosystem today. Today, some people are doing it wrong but things like @vocab could be used to help.

Kristina Yasuda: @orie: JWT spec defines the claims, but there is a need for a profile like vc-data-model or an ID Token section in oidc to make those claims meaningful - iss/iat/etc are all optional in the JWT spec itself.

Dave Longley: i don't understand how a "vanilla JSON 'VC' that doesn't have data model constraints and uses a centralized claims registry" would be different from a JWT -- what would we be doing here?.

Shawn Butterfield: If I am forced to include @context, but I do nothing to actually use it and none of the relying parties for my use-case rely upon it then what purpose does it serve? Requiring it isn't something I can fully support, but I can absolutely see the value in it for some use-cases, so I'm more than happy to optionally use it..

Orie Steele: kristina: ahh yes, we have the "securing specs" to handle those profiles / mappings..

Kristina Yasuda: +1 shawnb...

Manu Sporny: shawnb if you don't use @context, what's your extensibility story?.

Dave Longley: shawnb, when you read a spec that says what the context is (what the mappings are) and you hard code your software to look for its URL identifier and its mappings, you don't have to programmatically process it..

Orie Steele: Guys... you can sign JSON today... with JOSE... why are you here if you just want to process JSON and JWTs / JWS ?.

Kristina Yasuda: Orie, umm securing is how to secure/sign; JWT body of what is signed is separate - why JWT and JWS are separate...

Manu Sporny: +1 to Orie.

Dave Longley: +1 to Orie.

Joe Andrieu: +1 to Orie.

Jeremie Miller: +1 shawnb.

Antony Nadalin: not proposing to get rid of @context; it should be optional whether you use it or not. You have troubles today because people find they do not need it - but you are forcing the parser and logic to understand it. Mandating @context has made the world more complex - you don't need it while processing just JSON and JWTs. As far as interoperability is concerned - you hurt interoperability by forcing people to go down this route..

Kristina Yasuda: Orie, it's not how to sign, but the body of what's being signed...

Orie Steele: -1 to "hurting interop"... it's like saying OIDC hurts interop.... profiling does not hurt interop, it enables it..

Michael Prorock: +1 orie (to his -1).

Dave Longley: +1 to Orie.

Kevin Dean: if we have an envelope model, where we use @context as a wrapper for the verifiable credential model where inside the envelope the issuer can do what they please..

Shawn Butterfield: @manu I don't need semantic meaning to have extensibility in the datamodel..

Shawn Butterfield: dlongley - if I do that then what purpose does @context serve for my software in processing the payload?.

Orie Steele: shawnb, not sure what your use case is, but maybe JOSE / COSE is a better fit for it ?.

David Waite: One of the issues I have with @context in environments where people are not ready to handle it: @vocab is not ignorable, especially within data integrity and canonicalization of RDF. With it, as well as without it, you wind up having two different data models for the same piece of data, and that matters in a security context..

Michael Prorock: semantic meaning on what a VC itself means is important.

Shawn Butterfield: Orie - yes, generally speaking it is..

Dave Longley: shawnb: it's like adding a type definitions file to make JS into TypeScript.

Orie Steele: You should use JOSE / COSE... if they are a better fit for your use case... You should not try and make everyone use them if you don't understand their use cases.

Dave Longley: shawnb: the @context URL says "these are the types used in here" -- and if your software knows that context, it doesn't have to do any transforms; it only accepts JSON marked with that @context value.

David Waite: If you have multiple ways of expressing things and people understand them in different ways, someone might think an object property means something specific while someone processing @context thinks there's an extra value there. Downloading things dynamically changes semantics as they are being processed, and that's a serious security issue: you can craft messages that are meant to be secure but can be interpreted in different ways by different people.

Manu Sporny: Where we're not encouraging processing as data, you haven't committed to a valid semantic model for extensions; where JSON developers are using static things, there's an explosion of complexity. We are not giving people the flexibility to use both sets of tools; we are requiring them to understand the security ramifications of looking at data in different ways.

Shawn Butterfield: Orie: Agree, I'm not trying to make everyone use them..

Orie Steele: Imagine telling everyone that category theory and type safe languages are bad, because you can use python and javascript..

Michael Jones: We already have a split ecosystem; there are two camps. We are better off supporting both well than being halfway in between, which serves no one.
… responding to manu's comment - the community is already divided. We have those who speak JSON-LD correctly and those who do not. We are better off recognizing that vs. leaving things halfway in between.

Orie Steele: -1 to "there are 2 camps"... there are people who use JOSE / COSE and there are people who use them and the VC Data Model..


@dwaite

dwaite commented Oct 5, 2022

That is, in-line JSON-LD Contexts are strongly discouraged (we might even want to go as far as forbidding them).

-1 to this guidance... it also contradicts conversations with schema.org / google regarding usage of JSON-LD on web pages.

Suggest the working group NOT provide guidance of this form in a W3C TR.

My problem with including inline contexts is that, from a predictability perspective, JSON tools would need to evaluate @context to make sure that it is as expected for a particular type of credential; otherwise the same JSON properties could have been redefined to have different semantic meaning and structure for JSON consumers and RDF consumers.

We can define things to be stricter, in that an implementation could compare a list of URI strings for an exact ordered match. Comparing against an effective JSON-LD context is not something I know of a current algorithm for, and I doubt there is a simple algorithm to accomplish it.

This is also why some (including me) have advocated against such data isomorphism at the proof layer - once I know that I'm evaluating unmodified and integrity-protected data from the issuer it becomes a lot easier for optional consumption of JSON-LD data as RDF or JSON - because you know that the context and data haven't been modified by another party. If there is any manipulation going on for how different verifiers would interpret a credential, you as a verifier know (and can show via non-repudiation) that manipulation was by the issuer themselves.

@dlongley
Contributor

dlongley commented Oct 5, 2022

@dwaite,

We can define things to be stricter, in that an implementation could compare a list of URI strings for an exact ordered match.

Note that this is all that is needed if the approved contexts use @protected definitions, in order to ensure that both JSON and RDF consumers use the same semantic meaning for known (to the application), protected terms.
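The "exact ordered match" check described above can be sketched in a few lines. This is a non-normative illustration, and the context URLs are invented for the example (only the credentials/v1 URL is real); a real verifier would pin the approved context list for each credential type it supports.

```python
# Sketch of the "exact ordered match" @context check discussed above.
# The second URL is hypothetical; a real application would pin the
# approved, ordered context list for the credential type it accepts.

EXPECTED_CONTEXTS = [
    "https://www.w3.org/2018/credentials/v1",
    "https://example.org/my-credential-type/v1",  # hypothetical extension context
]

def has_expected_contexts(credential: dict) -> bool:
    """Accept only credentials whose @context is exactly the approved,
    ordered list of URL strings (no inline context objects)."""
    ctx = credential.get("@context")
    return isinstance(ctx, list) and ctx == EXPECTED_CONTEXTS

good = {"@context": list(EXPECTED_CONTEXTS), "type": ["VerifiableCredential"]}
bad = {"@context": [EXPECTED_CONTEXTS[0], {"location": None}]}  # inline object fails

print(has_expected_contexts(good))  # True
print(has_expected_contexts(bad))   # False
```

Because the comparison is a plain list equality on strings, it needs no JSON-LD processing at all, which is the point being made in this part of the thread.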

@dlongley
Contributor

dlongley commented Oct 6, 2022

@dwaite,

However, adding the ability for the documents to be transformed makes allowing for this in a security context substantially harder. That would include both malicious modifications of @context by an intermediary as I brought up, or modifications allowed by a subject during presentation via selective disclosure schemes.

Yes, but let's not commit a Nirvana Fallacy here. There are trade offs here that many would agree are well worth the canonicalization approach. I gave a presentation on this at TPAC. With the trade offs, we can either prioritize people writing security libraries (few) or application developers (orders of magnitude more).

@David-Chadwick
Contributor

Unfortunately I do not agree that the trade-offs of canonicalisation are worth it. I don't think it is a hassle for application developers to store a copy of the signed data if they need it later; if they do not, they can discard it. If they change the data, canonicalisation does not help them validate the signature later on the modified data. The downsides of canonicalisation are that it is a) hard to get right, b) introduces potential security vulnerabilities, and c) swaps storage for processing. So I prefer the benefits of a fast signing mechanism that does not require canonicalisation, plus storing the signed data. Since VCs are essentially a security data structure, it is prudent to err on the side of security rather than on the side of application developers


@dlongley
Contributor

dlongley commented Oct 6, 2022

@David-Chadwick,

Unfortunately I do not agree that the trade offs of canonicalisation are worth it.

This is what trade off decisions are about: you can choose based on your own needs.

@gkellogg
Member

gkellogg commented Oct 6, 2022

I grabbed this example from traceability-vocab, and cobbled together the additional context to the best of my ability using the JSON-LD playground. The n-quad view of the (corrected) example text and this document appeared to be the same, while a JSON API working on object properties would think the issuer is asserting the organization's address and position as Santa's Workshop.

@dlehn, @gkellogg, or @pchampin are the real experts here, but, in my opinion, the value of the location must be set to "Ratke - Bergstrom"'s address and Santa's workshop is effectively ignored from the output. Indeed, the effect of the null value for the "location" term within the scoped context is that the term, and its value, is ignored from the RDF output (and, as you say, this is what the JSON-LD playground does)

In other words, there is a bug in the JSON API implementation you use ☹️

+1

The effect of setting "location": null in the scoped context intentionally causes this property and its values to be ignored. This is evidenced by the Create Term Definition Algorithm step 14.1:

  14. If value contains the entry @id and its value does not equal term:

  14.1) If the @id entry of value is null, the term is not used for IRI expansion, but is retained to be able to detect future redefinitions of this term.

So, even if there is an @vocab in scope, the "location" will not be expanded. Of course, I don't understand why you would structure an object with conflicting "rdfLocation" and "location" properties in the first place.
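A minimal, hypothetical fragment (context URLs and the "Workplace" vocabulary are invented for illustration; this is not the traceability-vocab example) showing the behavior described above, where a type-scoped context sets "location" to null:

```json
{
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    {
      "Workplace": {
        "@id": "https://example.org/vocab#Workplace",
        "@context": { "location": null }
      }
    }
  ],
  "type": ["VerifiableCredential"],
  "credentialSubject": {
    "type": "Workplace",
    "location": "Santa's Workshop"
  }
}
```

Per the quoted algorithm step, a JSON-LD processor drops "location" and its value from the expanded/RDF output (even with an @vocab in scope), while a JSON-only consumer still sees "Santa's Workshop" in the tree.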

As a general comment on the above discussion points, the JSON-LD data model really is RDF, with the main exception being that JSON-LD allows (but strongly discourages) the use of a blank node as a property. The semantics of VC, as supported by JSON-LD, imply that if the value of a property is a named graph, that implies a direct relationship, while RDF makes no such assertion; this was a point of contention in the RDF 1.1 working group, as named graphs are used for different purposes by different implementations. It's up to a given application to describe these semantics, which VC (and really JSON-LD) does here.

Also, JSON-LD was specifically designed to allow virtually any JSON to be interpreted as JSON-LD, and thus have unambiguous semantics, while JSON as used in the wild typically relies upon the API documentation of the silo in which it is used to describe the meaning of property names and values. I recently discussed this on Twitter.

@dwaite

dwaite commented Oct 6, 2022

So, even if there is an @vocab in scope, the "location" will not be expanded. Of course, I don't understand why you would structure an object with conflicting "rdfLocation" and "location" properties in the first place.

The concept is that the issuer would issue the linked example with a RDF canonicalization based signature, while another party would modify the message before handing it to the verifier.

At the verifier the security layer, needing to understand RDF canonicalization to give a pass/fail on the integrity check, would not see these as meaningful changes. The application layer, using JSON processing logic on the received document, would be operating under false assumptions on document structure and what data is actually integrity protected.

In the context of the issue at hand, this means that the problem of having data which is securely interpretable with both a JSON data model and a graph data model is especially challenging, moreso when you have transformations like canonicalization or selective disclosure in play.

@dlongley
Contributor

dlongley commented Oct 6, 2022

@dwaite,

The concept is that the issuer would issue the linked example with a RDF canonicalization based signature, while another party would modify the message before handing it to the verifier.

The issuer can just use @protected. I think we should move past this now since there's a clear solution for the issuer.

@TallTed
Member

TallTed commented Oct 6, 2022

@dwaite -- Please edit your #929 (comment) and put code fences (`@vocab`) around the now unfenced @vocab in the quote block. That GitHub user doesn't need to be pinged about every update to this discussion, in which they are not (yet) a willing participant.

@melvincarvalho

melvincarvalho commented Oct 6, 2022

What would be the requirement for RDF from a data model perspective

Just going to point out that RDF is a Set and JSON is a Tree/Object. So that means that duplicate triples get merged into one thing. That makes merges cheaper, but some other operations more expensive

Also, arrays are treated slightly differently. Those seem to me to be some of the main differences between JSON and JSON-LD

JSON-LD, among other things, also standardizes a way to include hyperlinks in your JSON. That's useful, as JSON doesn't have this natively

IMHO it would be nice if JSON-LD parsers could largely parse plain old JSON. Perhaps one day, they will.
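The set-vs-tree distinction above can be shown with a toy example. This is only an illustrative sketch (the triples and the "ex:" names are made up): RDF treats a graph as a set of triples, so duplicate statements merge, while a JSON array preserves duplicates and order.

```python
# Toy illustration of "RDF is a Set, JSON is a Tree/Object":
# repeated statements collapse in a set of triples, but repeated
# entries survive in a JSON-style array.

doc = {"name": ["Alice", "Alice"]}          # JSON keeps the duplicate entry
triples = {
    ("ex:subject", "ex:name", "Alice"),
    ("ex:subject", "ex:name", "Alice"),     # identical triple merges into one
}

print(len(doc["name"]))   # 2 -- the JSON array preserves both entries
print(len(triples))       # 1 -- the set (like an RDF graph) merges them
```

This is why graph merges are cheap (set union) while operations that depend on ordering or duplication, which JSON gives for free, need extra machinery (e.g. @list) in JSON-LD/RDF.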

@msporny
Member

msporny commented Oct 6, 2022

@dwaite wrote:

The concept is that the issuer would issue the linked example with a RDF canonicalization based signature, while another party would modify the message before handing it to the verifier.

Your response seems to indicate that you are not grokking the points that multiple people in this thread have been making regarding your proposed attack. It has been demonstrated to be invalid. It does not work if you use the protection mechanisms that exist in the VC spec today:

https://www.w3.org/TR/vc-data-model/#semantic-interoperability

https://www.w3.org/TR/vc-data-model/#syntactic-sugar (read the last bullet item)

Can we do better? Yes, we can and we should expand on the multiple ways to prevent the attack that you're outlining. That said, the fundamental premise of your argument has been dismantled and shown to be invalid. Use static contexts, use @protected, and follow the advice in the sections above. Attack mitigated. Full stop. As has been demonstrated above. :)

Your response seems to either assert that these things do not exist (even though we're linking to them above), or to assert "Yes, but if you take the safeties off, you can do bad things." -- and that's true of any technology, just like alg=none in JWTs, just like it's true that you're asking for trouble if you don't use a secure cryptographic random number generator when creating a cryptographic private key.

Why do you still believe that a MiTM injection or redefinition of a @context value is possible given the mitigations that are outlined in the current VC spec? What am I missing?

@msporny
Member

msporny commented Oct 9, 2022

I've been thinking about this issue over the past couple of days and it seems like we're all conflating a number of things in this issue and perhaps reframing the threads of discussion as separate issues will help us make progress. Here are the two main paths that seem to be emerging:

  1. There is interest from a group of individuals in making @context optional.
  2. There is interest from a group of individuals to make the developer experience better than what we have now if @context is used.

I propose we break the discussions happening in this one issue into two separate issues (that can reference this issue as the starting point).

PROPOSAL: Create a new issue titled "Make the usage of @context optional"
PROPOSAL: Create a new issue titled "Improve the developer experience when @context is used"

Thoughts?

@dwaite

dwaite commented Oct 10, 2022

@dwaite wrote:

@msporny wrote:

The concept is that the issuer would issue the linked example with a RDF canonicalization based signature, while another party would modify the message before handing it to the verifier.

Your response seems to indicate that you are not grokking the points that multiple people in this thread have been making regarding your proposed attack. It has been demonstrated to be invalid. It does not work if you use the protection mechanisms that exist in the VC spec today:

https://www.w3.org/TR/vc-data-model/#semantic-interoperability

https://www.w3.org/TR/vc-data-model/#syntactic-sugar (read the last bullet item)

There are multiple confounding factors there which are related to the topic at hand (JSON-LD compatible JSON).

@OR13 mentioned earlier that he would be against limiting @context to be just URI and not objects, as 1.1 allows today. However, as the Semantic Interoperability section mentions:

JSON-based processors MUST process the @context key, ensuring the expected values exist in the expected order for the credential type being processed.

Is there a proposal on how one would verify an inline context object is as expected?

There is also a goal to allow credentials to be extensible. If a university wanted to include additional information in a transcript, for example, that should be allowed. However, how does a verifier or holder recognize that as an "expected value appearing in the expected order"?

What I don't want to see is a desire for isomorphic data formats and flexible security algorithms to then limit the expressiveness and extensibility of credentials.

I will say that if you take the current text as limiting @context to effectively a static mapping for a given kind of credential, with no dynamic extensibility, then indeed one can't create the sort of ambiguity I described earlier, and at that point @protected doesn't really matter. I don't know if the group would agree to the cost of such a restriction.

Why do you still believe that a MiTM injection or redefinition of a @context value is possible given the mitigations that are outlined in the current VC spec? What am I missing?

There were several additional things proposed to allow for JSON-LD compatible JSON securely above, including:

  1. stricter interpretations of the above semantic-interoperability section which might prohibit context-based extensibility
  2. extensions which are not suggested in the above sections, such as using only pre-understood static context URI and/or hash links, and
  3. features in particular software implementations (e.g. safe mode, which I wasn't quickly able to find documentation on the set of behaviors when this is switched to default or true)

I apologize to the group about elaborating with an example here and sidetracking this topic, and would propose that further conversation happen in another venue. It would probably also be more useful if that discussion was framed by a more stable set of assumptions, such as any additional proposed guidance/restrictions for implementers.

@OR13
Contributor

OR13 commented Oct 10, 2022

IMO, it's JSON compatible JSON-LD when it verifies as a JWS payload. How it's processed after that is up to the issuer and verifier.

I can provide examples of this working with and without the need to cache contexts by URI, and produce the same information.

As a verifier, I want to process structured data from issuers I trust.

As an issuer, I want to produce structured data that meets the requirements of many verifiers.

To me, the VCDM is just a tool to accomplish this objective with some well established standards, JSON-LD and JOSE.

If we add additional restrictions, beyond these, there should be really good reasoning provided, because complexity and optionality on top of these 2 building blocks are the greatest risk to the success of Verifiable Credentials, and we need them desperately.

https://twitter.com/ylecun/status/1579138435398279170?t=qhYsHGueuPHd7QdE7suKhQ&s=19

There are a lot of context-related conversations we need to be having, especially around @container and proof.

It would be nice to get to those conversations.

I suggest we use "discussions" for discussions, and "issues" for working towards pull requests.

@melvincarvalho

melvincarvalho commented Oct 10, 2022

PROPOSAL: Create a new issue titled "Make the usage of @context optional"

@msporny +1 to this

It seems valuable on every level

We're close to plain old JSON then, with @id and @type as a way to tell a parser that an object has a certain URI and has a certain type. That seems to be where much of the benefit of JSON-LD lies

How to deal with extensibility would be a great topic. For example, you can guess what "controller" means, but it might be harder to guess what "timestamp" means. Software should be able to handle this without a fatal error, imho.

There will be some trade offs around parsing, canonicalization and signing, perhaps

Worth it's own issue, imho, and I'm sure @msporny has much valuable experience that we can all learn from

@decentralgabe
Contributor

I've been thinking about this issue over the past couple of days and it seems like we're all conflating a number of things in this issue and perhaps reframing the threads of discussion as separate issues will help us make progress. Here are the two main paths that seem to be emerging:

  1. There is interest from a group of individuals in making @context optional.
  2. There is interest from a group of individuals to make the developer experience better than what we have now if @context is used.

I propose we break the discussions happening in this one issue into two separate issues (that can reference this issue as the starting point).

PROPOSAL: Create a new issue titled "Make the usage of @context optional" PROPOSAL: Create a new issue titled "Improve the developer experience when @context is used"

Thoughts?

+1

Thank you @msporny -- we need this kind of distillation, and I agree with what you have written. Both proposals make sense to me.

@nadalin

nadalin commented Oct 11, 2022 via email

@melvincarvalho

The base model should be JSON, the extensibility is that you may use JSON-LD and "@context" is the trigger point for usage of JSON-LD. I'm in favor of a data model that only requires JSON but can be extensible to support JSON-LD if needed

@nadalin we came to a similar conclusion some time ago, leading to the creation of some notes, for something with the code name "Linked Objects"

https://linkedobjects.org/

Just a one-pager with some notes, not an implementation or production system (yet)

It's basically a JSON base, with extensibility to full JSON-LD. This is not a proposal or suggestion, but hopefully on-topic, and some "food for thought". It also has some references to the history.

A way for developers to get started with regular JSON and guide those that want it to full JSON-LD, including all the benefits

@David-Chadwick
Contributor

Given that there will (hopefully) be thousands of issuers and millions of holders and verifiers, in which a verifier does not necessarily have a-priori knowledge about a VC issuer, it is important to me that we do not split the eco-system into two halves: the @context camp and the not-@context camp. I think an @context issuer should be able to have its VCs accepted by all not-@context verifiers, and vice-versa. But a two-world solution favours no-one (except perhaps today's smartphone OS wallet providers)

@msporny
Member

msporny commented Oct 11, 2022

@nadalin wrote:

The base model should be JSON, the extensibility is that you may use JSON-LD and "@context" is the trigger point for usage of JSON-LD. I'm in favor of a data model that only requires JSON but can be extensible to support JSON-LD if needed

The base data model does not specify any market-specific properties. That is, if you're issuing a University Degree, there are no types/terms for a university degree in the base VC data model (by design). You have to extend the base data model to use market-specific properties and @context has been the mechanism used to do that since the very beginning.

@msporny Can you explain why you would want to enforce the usage of @context for a JSON only VC

For at least the reason above. If you don't use @context it means:

  • The meaning of the properties in your credential are undefined.
  • It is not possible to discover the meaning of the properties in your credential.
  • It is not possible for a Verifier to know what the property means without some sort of out-of-band communication mechanism.

To be clear, just because you add @context to a JSON payload DOES NOT mean you need to use JSON-LD to process the credential. If you follow the rules in the spec today, it is possible to just see if the @context contains the set of URLs that you are expecting and then implement processing for that specific type of credential... that's the compromise the group came to the last time this topic was discussed, which enabled VC-JWTs to include @context but not require JSON-LD processing.

@msporny
Member

msporny commented Oct 11, 2022

@dwaite wrote:

Is there a proposal on how one would verify an inline context object is as expected?

No, there is not, which is why the following proposal was made:

PROPOSAL: Verifiable Credentials SHOULD NOT utilize inline JSON-LD Contexts (objects as values) for the @context.

This would ensure that we don't hit a case where a JSON-only encoding would hit an inline context object. It's a compromise that we are suggesting though there are others that want to keep inline contexts. Doing the above proposal would enable us to further be able to rely on NOT requiring JSON-LD processors for JSON-only folks... we can probably get the pro-@vocab folks what they want too, but that's a separate discussion.

There is also a goal to allow credentials to be extensible. If a university wanted to include additional information in a transcript, for example, that should be allowed. However, how does a verifier or holder recognize that as an "expected value appearing in the expected order"?

It depends if the Verifier wants to process that additional information. If @protected is used, then the Verifier only has to check the first N of M @context values and only process the stuff they understand. They can do this because @protected guarantees that terms that they recognize will not be redefined to mean something else. So, yes, you can have JSON-only (with @context), never do JSON-LD processing, and still be able to depend on the semantics of the VC.
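The "first N of M" approach described above can be sketched as a prefix check. This is a non-normative illustration with invented URLs (only the credentials/v1 URL is real): the verifier pins the contexts it understands as a required prefix of @context and, relying on @protected term definitions in those contexts, tolerates trailing extension contexts it does not process.

```python
# Sketch of the "check the first N of M @context values" approach:
# the verifier pins its known contexts as an ordered prefix and relies
# on @protected definitions in them so that later extension contexts
# cannot redefine the terms it recognizes. URLs are illustrative.

KNOWN_CONTEXT_PREFIX = [
    "https://www.w3.org/2018/credentials/v1",
    "https://example.org/transcript/v1",  # hypothetical context using @protected terms
]

def contexts_acceptable(credential: dict) -> bool:
    ctx = credential.get("@context", [])
    if not isinstance(ctx, list):
        return False
    # All entries must be URL strings (no inline context objects), and the
    # known contexts must appear first, in order.
    return (all(isinstance(c, str) for c in ctx)
            and ctx[:len(KNOWN_CONTEXT_PREFIX)] == KNOWN_CONTEXT_PREFIX)

extended = {"@context": KNOWN_CONTEXT_PREFIX + ["https://example.edu/extra/v1"]}
print(contexts_acceptable(extended))  # True: extra context tolerated, not processed
```

Unlike an exact ordered match, this check leaves room for issuer extensibility: the verifier simply ignores properties defined by the trailing contexts it does not understand.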

@msporny
Member

msporny commented Oct 11, 2022

I'm not seeing any objections to splitting this issue into two more focused issues, so here they are:

#947 Make the usage of @context optional
#948 Limit JSON-LD optionality to enhance developer experience

Closing this issue, please continue the discussion in the more focused issues above.

@msporny msporny closed this as completed Oct 11, 2022
@TallTed
Member

TallTed commented Oct 12, 2022

@nadalin

nadalin commented Oct 19, 2022 via email

@David-Chadwick
Contributor

@nadalin "Your reply assumes that IANA is not used"
This is correct. @context is there precisely so that IANA is not needed.

@nadalin

nadalin commented Oct 19, 2022 via email

@msporny
Member

msporny commented Oct 19, 2022

This issue is closed, take up the discussion in the issues listed here: #929 (comment)

@w3c w3c locked as resolved and limited conversation to collaborators Oct 19, 2022