Should handle less-than and other disallowed characters #47

Open
alexkreidler opened this issue Apr 9, 2021 · 3 comments

Comments

@alexkreidler

I have an entry in a JSON-LD file like this:

        {
          "@id": "ex:BOE/code/INSTRUMENTS/LDA>1Y",
          "@type": "skos:Concept",
          "skos:prefLabel": "Medium and long term deposits",
          "skos:notation": "LDA>1Y"
        },

It gets converted by this library into:

<https://example.com/BOE/code/INSTRUMENTS/LDA>1Y> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<https://example.com/BOE/code/INSTRUMENTS/LDA>1Y> <http://www.w3.org/2004/02/skos/core#hasTopConcept> <https://example.com/BOE/code/INSTRUMENTS> .
<https://example.com/BOE/code/INSTRUMENTS/LDA>1Y> <http://www.w3.org/2004/02/skos/core#notation> "LDA>1Y" .
<https://example.com/BOE/code/INSTRUMENTS/LDA>1Y> <http://www.w3.org/2004/02/skos/core#prefLabel> "Medium and long term deposits" .

As you can see, LDA>1Y> is problematic: N-Triples parsers treat the first > as closing the IRI, so they fail at that position.

I'm not sure whether the JSON-LD spec says anything about this, i.e. whether the > should be URL-encoded or the serializer should just return an error.

But the library should do one of those two: either serialize it properly or throw an error, rather than silently emit invalid N-Triples.

Let me know if I can provide more info. Thanks for this awesome library!
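For the "throw an error" option, a writer could check each IRI against the N-Triples IRIREF production, which forbids U+0000 to U+0020 and the characters <, >, ", {, }, |, ^, \ and the backtick (other than inside \u escapes). A minimal Go sketch (Go because json-gold is a Go library), assuming a hypothetical helper validateIRIREF that is not part of json-gold's API; to stay short it rejects every backslash, including otherwise valid \u escapes.

```go
package main

import (
	"fmt"
	"strings"
)

// forbiddenInIRIREF lists the printable characters excluded by the N-Triples
// IRIREF production; control characters and space (U+0000-U+0020) are checked separately.
const forbiddenInIRIREF = "<>\"{}|^`\\"

// validateIRIREF is a hypothetical helper (not json-gold API). It returns an
// error if iri cannot be written between <...> in N-Triples as-is.
// Simplification: any backslash is rejected, even a valid \uXXXX escape.
func validateIRIREF(iri string) error {
	for i, r := range iri {
		if r <= 0x20 || strings.ContainsRune(forbiddenInIRIREF, r) {
			return fmt.Errorf("invalid IRI %q: %q at byte %d is not allowed in an N-Triples IRIREF", iri, r, i)
		}
	}
	return nil
}

func main() {
	for _, iri := range []string{
		"https://example.com/BOE/code/INSTRUMENTS/LDA>1Y",
		"https://example.com/BOE/code/INSTRUMENTS/LDA%3E1Y",
	} {
		if err := validateIRIREF(iri); err != nil {
			fmt.Println("reject:", err)
		} else {
			fmt.Println("ok:", iri)
		}
	}
}
```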

@kazarena
Member

kazarena commented Apr 9, 2021

@alexkreidler thank you for reporting the issue. The problem is clear, but at first glance I'm not sure what the correct behaviour should be: whether the library should apply URL escaping, or leave it to the caller (so that the library always expects @id fields in escaped form). The JSON-LD Playground, which is a good reference point, behaves the same way as json-gold.

While I'm looking at possible solutions, I'll offer an unhelpful suggestion based on my experience writing financial services software 😄: even if the library produces well-formed N-Triples, I'm afraid such identifiers will cause problems downstream. I would highly recommend using 'safe' identifiers without characters like >, and moving the actual identifier into a separate field.

@gkellogg

gkellogg commented Apr 9, 2021

Looking at the IRI syntax in RFC 3987, "LDA>1Y" would be an isegment within an ipath, and ">" is not a valid ipchar, so it must be percent-encoded. The spec depends on the use of valid IRIs, and a processor may reject invalid IRIs or relative IRI references (such as this one).

My own parser (available at http://rdf.greggkellogg.net/distiller) is happy to expand this, but doesn't generate N-Triples because of the invalid IRIs.

I used the following as an example:

{
  "@context": {
    "@base": "http://example.com/BOE/code/INSTRUMENTS",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ex": "https://example.com/",
    "skos:notation": {"@type": "@id"}
  },
  "@id": "ex:BOE/code/INSTRUMENTS/LDA>1Y",
  "@type": "skos:Concept",
  "skos:prefLabel": "Medium and long term deposits",
  "skos:notation": "LDA>1Y"
}
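Percent-encoding the segment as RFC 3987 requires turns LDA>1Y into LDA%3E1Y. A minimal Go sketch, assuming a hypothetical helper encodeLastSegment that escapes only the final path segment with the standard library's url.PathEscape:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// encodeLastSegment is a hypothetical helper: it percent-encodes only the text
// after the last "/" of iri, leaving scheme, host and earlier segments untouched.
// If iri contains no "/", the whole string is treated as one segment.
func encodeLastSegment(iri string) string {
	i := strings.LastIndex(iri, "/")
	return iri[:i+1] + url.PathEscape(iri[i+1:])
}

func main() {
	fmt.Println(encodeLastSegment("https://example.com/BOE/code/INSTRUMENTS/LDA>1Y"))
	// prints https://example.com/BOE/code/INSTRUMENTS/LDA%3E1Y
}
```

url.PathEscape follows the URI rules of RFC 3986, so it also percent-encodes non-ASCII characters that an IRI could keep as-is. RDF compares IRIs by simple string comparison, so the escaped IRI is a different identifier from the unescaped one; the escaping has to be applied consistently wherever the identifier is used.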

@alexkreidler
Author

Thanks for both your responses.

For my situation I can do a check to make sure the @id is valid, and either just omit the bad records or think about URL-encoding or shortening those IDs.

It would be interesting to see if json-gold could check that it isn't serializing invalid IRIs; rdflib does this. Of course, we wouldn't want the check to hurt performance, so maybe it could be optional?
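An optional check like this could be a flag on the writer, so callers who trust their input skip the per-character scan entirely. The Go sketch below is hypothetical: Writer, its Strict field and Triple are illustrative names, not json-gold types, and every term is treated as an IRI to keep it short.

```go
package main

import (
	"fmt"
	"strings"
)

// Triple is a hypothetical, simplified triple in which all three terms are IRIs.
type Triple struct {
	Subject, Predicate, Object string
}

// Writer is a hypothetical N-Triples writer. When Strict is false no IRI
// check runs, so callers who don't need it pay nothing for it.
type Writer struct {
	Strict bool
}

// validIRIREF reports whether iri avoids every character the N-Triples IRIREF
// production forbids (U+0000-U+0020, <, >, ", {, }, |, ^, backtick, backslash).
func validIRIREF(iri string) bool {
	for _, r := range iri {
		if r <= 0x20 || strings.ContainsRune("<>\"{}|^`\\", r) {
			return false
		}
	}
	return true
}

// Line renders t as one N-Triples line; in strict mode it returns an error
// instead of emitting a line that parsers would reject.
func (w Writer) Line(t Triple) (string, error) {
	if w.Strict {
		for _, iri := range []string{t.Subject, t.Predicate, t.Object} {
			if !validIRIREF(iri) {
				return "", fmt.Errorf("refusing to serialize invalid IRI %q", iri)
			}
		}
	}
	return fmt.Sprintf("<%s> <%s> <%s> .", t.Subject, t.Predicate, t.Object), nil
}

func main() {
	tr := Triple{
		Subject:   "https://example.com/BOE/code/INSTRUMENTS/LDA>1Y",
		Predicate: "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
		Object:    "http://www.w3.org/2004/02/skos/core#Concept",
	}
	if _, err := (Writer{Strict: true}).Line(tr); err != nil {
		fmt.Println("strict:", err)
	}
	line, _ := Writer{}.Line(tr) // lenient mode emits the invalid line unchanged
	fmt.Println("lenient:", line)
}
```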
