Subjects - schemeURI: encourage use of versionIRIs of controlled vocabularies, ontologies, subject schemes and the like #154
Replies: 11 comments 7 replies
-
|
This same concern is also becoming an issue for our efforts in NASA Heliophysics to incorporate controlled vocabularies and taxonomies. |
-
|
Hi there! A related suggestion from me, thank you: I was hoping to re-use DataCite's resource types for our own purposes (tagging our content in FAIRsharing) for Model and OutputManagementPlan. However, I can only presume that the 'official' IRIs for these resource types are not the readthedocs links I have found, and was wondering which IRIs should be used to unambiguously reference these types. I've discovered https://schema.datacite.org/meta/kernel-4/include/datacite-resourceType-v4.xsd but there is no apparent IRI I can use to reference these resource types. That leaves us with two options:
I feel linking to a readthedocs page isn't the right idea for what we want, so it's likely that for now, at least, we will be going with solution 1. Thank you for providing such a great resource and I hope I can make more use of it in future! |
-
|
Thanks for this suggestion, @SArndt-TIB! You can definitely use a version-specific schemeURI in the Subject property. We can also look at adding some examples of this. Out of curiosity, what controlled vocabulary(ies) are you using that you are hoping to include in the Subject property? |
-
|
None in particular, @KellyStathis - this is just a general concern: it may be useful to relate back to a particular version rather than just the latest one. |
-
|
I have an example for this. Take the Unified Astronomy Thesaurus https://astrothesaurus.org/. This is a thesaurus of terms relevant to Astronomy, Space Physics, and Solar Physics. Any of the terms (e.g., solar flares) is likely to have the definition updated over time as our understanding of the science progresses with new research. However, this also means that a term used from an earlier version of the UAT in DataCite metadata for a dataset may not have the same meaning as a term in a later set, so being able to indicate the version of UAT (currently 5.1) would be advantageous to maintaining accurate metadata. @rmcgranaghan |
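A minimal sketch of the idea above, assuming a vocabulary that publishes both an unversioned and a version-specific scheme IRI. The key names (subject, subjectScheme, schemeUri, valueUri) follow the DataCite JSON spelling of the Subject sub-properties; the helper `make_subject` and both `vocab.example.org` URIs are hypothetical placeholders, not real UAT addresses.

```python
from typing import Optional


def make_subject(term: str, scheme: str, scheme_uri: str,
                 value_uri: Optional[str] = None) -> dict:
    """Build a DataCite-style subject entry as a plain dict."""
    subject = {
        "subject": term,
        "subjectScheme": scheme,
        "schemeUri": scheme_uri,
    }
    # valueUri is optional; only include it when a term IRI is known.
    if value_uri is not None:
        subject["valueUri"] = value_uri
    return subject


# Hypothetical URIs: the real vocabulary may publish different ones.
latest = make_subject(
    "Solar flares",
    "Unified Astronomy Thesaurus",
    "https://vocab.example.org/uat/",
)
pinned = make_subject(
    "Solar flares",
    "Unified Astronomy Thesaurus",
    "https://vocab.example.org/uat/5.1/",  # version-specific scheme IRI
)
```

The two entries differ only in schemeUri, so a consumer that cares about the version can read it from the pinned form without any change to the schema.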
-
|
Thanks for raising this, and for the concrete examples in this thread. As part of work within PID4NFDI https://pid.services.base4nfdi.de/ , we are publishing the DataCite terms as linked-data artifacts (classes, properties, vocabularies) with resolvable IRIs and distributions (JSON-LD/Turtle/RDF). Where this still needs design input is whether documentation guidance is enough, or whether a dedicated field (for example a scheme version URI) would materially improve interoperability and downstream reuse. We’d be very interested in getting your input. Please provide your feedback while the RFC is still open https://github.com/datacite/schema.datacite.org-linked-data. |
-
|
Thanks, this is just what would be useful for us! For example, output management plan would have an identifier of https://schema.stage.datacite.org/linked-data/vocab/resourceTypeGeneral/OutputManagementPlan |
-
|
I don't know if this general response is useful, but here goes. (Regarding the original post to use versioned IRIs for the terms of an ontology--note, not just the ontology itself. More recent comments are about #148 and I'll respond to them on that thread.)
When we designed the MMI ontology repository 20+ years ago (?!), Luis Bermudez and I designed the IRIs to have two forms: (1) version-embedded (using a timestamp as the version indicator), and (2) unversioned. So someone could access each term either way. And we definitely tracked ontology versions, so you could at least compare across versions. Alas, we didn't have the political skill to get funding to take advantage of that cleverness, so although we've managed to persist the repo (at https://mmisw.org/) it isn't in common use today and can't leverage that design.
Through this we found that experts were divided: some loved versioned terms because it could 'lock in' meaning; others felt that semantic identifiers shouldn't be used and that once minted, an identifier's basic meaning should never change, so versioning shouldn't be a thing. More recently many principled ontology developers have started identifying when each change is made to an ontology term, but still without versioning the term. But at least if you know when the term was used, you have a fair idea of what its state was at the time.
There are more subtleties to consider. If some term (or lots of terms) in an ontology changes, should all the terms be reversioned? Rigorously speaking, I'd say yes, because the context of the entire ontology changes with any change to its content. But if you're changing the ontology regularly, now you'll have multitudes of identifiers for what is essentially a single term. Tracking those relationships is going to be miserable for all the tools that are trying to figure out what is (related to, sameAs, almost sameAs) what else.
To achieve the same goal today, my recommendation would be treating each major change to the ontology as deserving a different ontology ID, as with DC and DCTerms. This effectively gives new identifiers to all the duplicate terms that are carried over. But as that community knows, that distinction creates its own confusion, as non-experts try to keep everything straight while ontologists try to keep it all rigorous. You can have both at the same time if you're particularly clever, but I think that just gets you both the best, and the worst, of both worlds. |
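A minimal sketch of the two-form IRI design described in this comment, assuming a timestamp path segment as the version indicator. The base URL, the path layout, and all three function names are hypothetical; they illustrate the idea and are not the actual MMI repository scheme.

```python
import re
from datetime import datetime

BASE = "https://ont.example.org"  # hypothetical base, not the MMI repository

# A version segment is assumed to look like 20240102T030405.
_VERSION_SEGMENT = re.compile(r"/\d{8}T\d{6}(?=/)")


def unversioned_iri(ontology: str, term: str) -> str:
    """Form (2): always refers to the current state of the term."""
    return f"{BASE}/{ontology}/{term}"


def versioned_iri(ontology: str, term: str, released: datetime) -> str:
    """Form (1): embeds the release timestamp as the version indicator."""
    stamp = released.strftime("%Y%m%dT%H%M%S")
    return f"{BASE}/{ontology}/{stamp}/{term}"


def strip_version(iri: str) -> str:
    """Map a version-embedded IRI back to its unversioned form."""
    return _VERSION_SEGMENT.sub("", iri, count=1)
```

Because the unversioned form can be derived mechanically from the versioned one, a tool can treat every versioned identifier of a term as belonging to the same unversioned term, which eases (but does not remove) the relationship-tracking burden mentioned above.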
-
|
Hi folks, My current proposal is:
So for a DataCite term such as subject, the canonical IRI would stay the same. What changes from release to release is not the term IRI itself, but the versioned manifest/distribution that contains and describes that term. In other words:
This means I am not proposing versioned per-term endpoints. I am proposing versioned release artifacts. An important implication is that a machine cannot recover exact DataCite-version provenance from the canonical IRI alone. If a client stores only the stable DataCite IRI for subject, and later dereferences it, it will get the current/latest meaning. It will not automatically know whether the original usage was based on 4.6 or 4.7. So if exact DataCite-version provenance matters, the client needs to persist one more piece of information alongside the stable IRI, for example:
I also want to distinguish this from the Subject property discussion itself:
So there are really two separate versioning questions here:
This can be expressed via schemeUri when the vocabulary provides a versioned IRI.
This needs to be captured separately if exact DataCite-version provenance is required.
For me, the advantage of this approach is that it gives us:
But the tradeoff is that exact DataCite-version provenance is only available if the consuming system stores version context in addition to the canonical DataCite IRI.
Without explicit DataCite version info
What this tells a machine:
With explicit DataCite version info
Example wrapper object:
What this tells a machine:
If you want to show the distinction even more clearly
That makes the two versioning layers very visible:
Finally, please check out the work itself: https://github.com/datacite/schema.datacite.org-linked-data and maybe we can move the discussions there. |
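The two versioning layers in the proposal above can be sketched as a small record type. This is an illustration only, not the wrapper object from the original comment: the class name, field names, and the `schema.example.org` and `vocab.example.org` IRIs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubjectUsage:
    # Layer 1: the stable (unversioned) DataCite term IRI plus the
    # DataCite schema version that was in force when it was used.
    datacite_term_iri: str
    datacite_version: Optional[str]
    # Layer 2: the subject vocabulary's scheme URI, versioned when the
    # vocabulary provides a versioned IRI.
    scheme_uri: str

    def has_exact_datacite_provenance(self) -> bool:
        """True only if the DataCite version was stored with the IRI."""
        return self.datacite_version is not None


# Hypothetical IRIs, for illustration only.
without_version = SubjectUsage(
    datacite_term_iri="https://schema.example.org/linked-data/property/subject",
    datacite_version=None,
    scheme_uri="https://vocab.example.org/uat/5.1/",
)
with_version = SubjectUsage(
    datacite_term_iri="https://schema.example.org/linked-data/property/subject",
    datacite_version="4.6",
    scheme_uri="https://vocab.example.org/uat/5.1/",
)
```

In both records the term IRI is identical; only the second lets a client tell later whether the usage was based on 4.6 or 4.7, while the vocabulary version travels independently in scheme_uri.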
-
|
There is already a "schemaVersion" field in DataCite XML and JSON output (e.g., https://commons.datacite.org/doi.org/10.48322/6cfb-rq65). However, the entries in this field are not version specific, as "http://datacite.org/schema/kernel-4" and "http://schema.datacite.org/meta/kernel-4/metadata.xsd" resolve to the latest version only. Perhaps the entries in this field could become version specific? It would also be useful to add this field to the DataCite documentation. |
-
|
First: Overall everything here is top-notch, lots of great work to bring this into existence. So I am testing my understanding of your excellent description, and offering comments based on that understanding. My discussion focuses specifically on DataCite vocabulary terms (like ResourceTypeGeneral/Book) used to describe a digital artifact. Let me know if you want me to move my 3 notes to separate tickets.
If I am using a DataCite metadata schema to define my metadata, I will have declared which (versioned) schema (e.g., 4.6) I am using inside the metadata instance, and so my use of ResourceTypeGeneral/Book will follow the context of version 4.6 and that's the corresponding definition that applies. I can't access the 4.7 version of the vocabulary until I change my DataCite metadata schema. (Is that right? Or is it possible for a metadata instance to be silent about which schema version it is following?)
If I am using some other metadata schema but want to use DataCite-compatible metadata attributes, I will define a schema appropriately. So references for a certain 'resourceType' field would constrain values to only DataCite ResourceTypeGeneral values. Assuming my template schema uses semantic IRIs to constrain the category values, there are two possibilities here as to what is needed: (A) an IRI for a fixed list of possible values that will never change because it's versioned, or (B) an IRI for an always-current list of possible values. In one sense, I can specify either of those, because there's a version-specific dist/ for the ontology files (https://github.com/datacite/schema.datacite.org-linked-data/blob/main/dist/datacite-4.6.ttl) and a non-version-specific JSON-LD file for the specific concept (https://github.com/datacite/schema.datacite.org-linked-data/blob/main/vocab/resourceTypeGeneral/resourceTypeGeneral.jsonld). But the latter is not the whole ontology.
Note 1: Is there a fixed IRI location where the latest ontology will be publicly maintained? (This would ideally be where the namespace resolves as well, crudely speaking.) If I'm missing it, my apologies. Otherwise I'd like to suggest a non-versioned set of distribution files, a la https://github.com/datacite/schema.datacite.org-linked-data/blob/main/dist/datacite.ttl | rdf | jsonld. (If I am trying to always present the latest version of the ontology in a tool like BioPortal, or refer to it in a blog post, I expect to be able to specify the 'latest-greatest' location just once, and the file at that location will be updated whenever the vocabulary changes.)
Note 2: It seems the dist/ files are not self-identifying, in that once the contents are downloaded, you cannot tell from the contents what you have. As a general rule ontologies do contain their own version identification and other self-describing metadata, for all the usual FAIR reasons. (See MOD, recommend at least the 2.0 version.) Even/especially in the 'non-versioned' or most recent ontology, showing the actual version number is important for downstream applications and repositories.
Note 3: A minor thing, but ideally the namespace embedded in each term could be specified as the default namespace in the TTL (and will eventually be resolvable for the ontology and its terms).
(Now returning to my understanding of how this could work.) Then finally, because CEDAR relies on an intermediary (BioPortal) to support any term lookup needs, if someone wanted to refer to a specific version of the ontology in BioPortal, they would need to upload that specific (4.6) version as something like DATACITE-46 (similarly to ICD9, ICD10). Otherwise, to use the always current version they would go to the general DATACITE listing. Does that work for you? |
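Options (A) and (B) above can be sketched as a single lookup that returns either a version-pinned or an always-current distribution location. The versioned file name follows the datacite-4.6.ttl path quoted in this comment; the unversioned datacite.ttl | rdf | jsonld files are the suggestion from Note 1 and may not exist. The function name `dist_url` is hypothetical.

```python
from typing import Optional

REPO_DIST = (
    "https://github.com/datacite/schema.datacite.org-linked-data"
    "/blob/main/dist"
)
ALLOWED_EXTENSIONS = {"ttl", "rdf", "jsonld"}


def dist_url(version: Optional[str] = None, ext: str = "ttl") -> str:
    """Return the location of a distribution file.

    version="4.6" -> option (A): fixed, versioned file.
    version=None  -> option (B): always-current file (assumed to exist
                     only if the suggestion in Note 1 is adopted).
    """
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {ext}")
    name = f"datacite-{version}" if version else "datacite"
    return f"{REPO_DIST}/{name}.{ext}"
```

A template that must never change would store `dist_url("4.6")`; a tool that always presents the latest vocabulary would store `dist_url()` once and rely on the file at that location being updated.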
-
What is the problem that your suggestion solves?
Ontologies, controlled vocabularies and other terminological resources may change over time. It may be helpful to add version info to the subject terms from such terminological resources, so that it is transparent what version of a term a resource refers to.
What solution might meet your needs?
There are several options:
Your name
Susanne Arndt
Your organization
Technische Informationsbibliothek
What alternatives have you tried or considered?
versionIRIs for terms, but this is not feasible
Is there anything else you would like to share?
No response
What group(s) would benefit from your suggestion?
If other group(s), please describe.
users of DataCite metadata