This repository contains JSON-LD resources that describe parts of the DataCite metadata schema (currently staged at version 4.7) as linked data.
If you are new to this topic:
- JSON-LD is a JSON-based format for representing linked data (RDF-style identifiers and relationships).
- This repo is primarily a collection of schema artifacts, vocabularies, contexts, manifests, and example transformation outputs.
This repository organizes DataCite metadata concepts into resolvable JSON-LD files:
- `class/` defines major entities such as `Resource`, `Creator`, and `Title`
- `property/` defines metadata properties such as `identifier`, `creatorName`, and `publicationYear`
- `vocab/` defines controlled vocabularies (enumerated terms) used by DataCite fields (for example `resourceTypeGeneral`, `relationType`, `nameType`)
- `context/` contains JSON-LD contexts used to map compact keys to IRIs
- `manifest/` provides a versioned inventory of classes, properties, and vocabulary schemes/terms
- `dist/` contains integrated distribution artifacts that bundle the schema as a single JSON-LD graph plus alternate RDF serializations
- `Input files/` contains example XML, transformed JSON outputs, validation helpers, and notes used during transformation experiments
If you need to handle a new DataCite schema release, start here:
- DATACITE-RELEASE-RUNBOOK.md: step-by-step guide for detecting, reviewing, applying, and merging a new DataCite release
- UPGRADING-4.6-TO-4.7.md: detailed worked example of the `4.6 -> 4.7` upgrade
Each file in `class/` describes a DataCite entity as a JSON-LD/RDF class.
Example:
- `class/Resource.jsonld` defines the class IRI for a citable resource
Typical structure:
- `@id`: stable IRI of the class
- `@type`: usually `rdfs:Class`
- `rdfs:label`: human-readable name
- `rdfs:comment`: short description
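
The structure above can be checked mechanically. The following is a minimal sketch, not part of the repository's tooling: the sample document, its `@id` (an `example.org` placeholder), and the helper name `missingClassKeys` are assumptions made for illustration.

```javascript
// Hypothetical class document. The @id is a placeholder, not the repo's real IRI.
const resourceClass = {
  "@id": "https://example.org/datacite/class/Resource",
  "@type": "rdfs:Class",
  "rdfs:label": "Resource",
  "rdfs:comment": "A citable resource described by DataCite metadata."
};

// The four keys named in the "typical structure" list.
const EXPECTED_KEYS = ["@id", "@type", "rdfs:label", "rdfs:comment"];

// Return the expected keys that are absent from a parsed class document.
function missingClassKeys(doc) {
  return EXPECTED_KEYS.filter((key) => !(key in doc));
}

console.log(missingClassKeys(resourceClass)); // []
console.log(missingClassKeys({ "@id": "https://example.org/datacite/class/Creator" }));
```

A real file may carry additional keys (for example an `@context` that defines the `rdfs:` prefix), so a check like this should only flag missing keys, not unexpected ones.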
Each file in `property/` describes one DataCite property as a JSON-LD/RDF property.
Example:
- `property/identifier.jsonld` describes the `identifier` field used in DataCite metadata
The `vocab/` folder contains controlled terms used by DataCite (enumerations).
Examples:
- `vocab/resourceTypeGeneral/` for values like `Dataset`, `Software`, `Text`
- `vocab/relationType/` for values like `Cites`, `IsPartOf`, `References`
- `vocab/nameType/` for values like `Personal`, `Organizational`
There are usually two kinds of files in each vocabulary folder:
- a vocabulary scheme file (for example `resourceTypeGeneral.jsonld`)
- individual term files (for example `Dataset.jsonld`)
The `context/` folder contains JSON-LD contexts that define how compact JSON keys map to linked-data identifiers.
Important files:
- `context/fullcontext.jsonld`: a large mapping used to interpret DataCite-like JSON as linked data
- `context/runner.jsonld`: example JSON instance using the context
- `context/runner.nq`: RDF/N-Quads output derived from the example
- `context/runner.err`: transformation/processing notes or errors from a run
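
To make the idea of a context concrete, here is a simplified sketch of what mapping compact keys to IRIs means. It is not a JSON-LD processor: it handles only top-level keys with plain string term definitions, and the IRIs and the `expandKeys` helper are hypothetical. The real mappings live in `context/fullcontext.jsonld`.

```javascript
// Hypothetical mini-context; the IRIs are placeholders, not the repo's real ones.
const context = {
  identifier: "https://example.org/datacite/property/identifier",
  publicationYear: "https://example.org/datacite/property/publicationYear"
};

// Replace each compact key with its IRI. Keys without a mapping are dropped,
// similar to how JSON-LD expansion ignores undefined terms when no @vocab is set.
function expandKeys(doc, ctx) {
  const expanded = {};
  for (const [key, value] of Object.entries(doc)) {
    if (Object.prototype.hasOwnProperty.call(ctx, key)) {
      expanded[ctx[key]] = value;
    }
  }
  return expanded;
}

const record = { identifier: "10.1234/example", publicationYear: "2024", localNote: "draft" };
console.log(expandKeys(record, context));
```

In this sketch `localNote` disappears from the output because the context does not define it, which is one reason a context alone cannot tell you whether a record is complete or valid.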
- `manifest/datacite-4.7.json` is the current staged versioned index of linked-data resources
- `manifest/datacite-4.6.json` remains as a frozen previous release snapshot
- `manifest/datacite-current.json` is the canonical pointer to the current default manifest/distribution targets
- `manifest/release-matrix-4.6-4.7.json` captures the schema-level change matrix between these two versions
This is a useful entry point if you want to programmatically discover what is defined for a given schema version.
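
As a rough illustration of programmatic discovery, the sketch below counts the entries under every array-valued top-level key of a parsed manifest. The manifest's actual key names are not documented here, so the sample object and the `countInventory` helper are assumptions; the function deliberately avoids depending on specific keys.

```javascript
// Count entries for every top-level key whose value is an array.
function countInventory(manifest) {
  const counts = {};
  for (const [key, value] of Object.entries(manifest)) {
    if (Array.isArray(value)) {
      counts[key] = value.length;
    }
  }
  return counts;
}

// Hypothetical sample; the real manifest's keys and entry shapes may differ.
const sample = {
  version: "4.7",
  classes: ["Resource", "Creator"],
  properties: ["identifier"]
};

console.log(countInventory(sample)); // { classes: 2, properties: 1 }
```

To try this against a real file, parse it first, for example with `JSON.parse(fs.readFileSync("manifest/datacite-4.7.json", "utf8"))`, and inspect the result to see which keys the manifest actually uses.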
Release-import automation:
- `node scripts/detect-datacite-release.js` builds a review plan for the next official DataCite `4.x` release or for an explicitly requested version
- `node scripts/apply-datacite-release-plan.js --plan reports/release-import-plan-<version>.json` applies approved plan items and regenerates outputs
The Detect DataCite Release GitHub workflow can now commit the generated plan files back to the selected branch, so the JSON plan can be reviewed and edited directly on GitHub before running the apply workflow.
- `dist/datacite-4.7.jsonld` is the integrated JSON-LD bundle for the current staged schema version
- `dist/datacite-4.7.ttl` is the Turtle serialization of that bundle
- `dist/datacite-4.7.rdf` is the RDF/XML serialization of that bundle
- `dist/datacite.jsonld` is the moving latest full JSON-LD bundle alias
- `dist/datacite.ttl` is the moving latest full Turtle alias
- `dist/datacite.rdf` is the moving latest full RDF/XML alias
- `dist/datacite-current.jsonld` is the canonical pointer to the current default distribution targets

Versioned bundles are generated from the manifest-backed source files by `node scripts/build-distribution.js --version 4.7`. The moving `dist/datacite.*` aliases are refreshed when you update the current version pointers.
Release snapshot automation:
- `node scripts/release-snapshot.js --version 4.7 --release-date 2026-03-03` creates/updates the versioned manifest, distribution bundle, current pointers, latest full distribution aliases, and section index pages.
- `node scripts/update-current-pointers.js --version 4.7` refreshes canonical current pointers and the moving `dist/datacite.*` aliases.
The `Input files/` folder contains working materials and examples used to test conversions and round-tripping:
- DataCite XML examples and XSD files
- XML-shaped JSON examples
- roundtrip XML outputs
- TTL examples
- validation helper script (`validate_xml.rb`)
- notes documenting conversion steps (`codes&steps.md`)
In this repo, linked data mainly means:
- metadata concepts have stable IRIs (web identifiers)
- JSON documents include context mappings (`@context`)
- terms can be interpreted as RDF classes, properties, and controlled concepts
This allows metadata fields like `identifier`, `creator`, or `resourceTypeGeneral` to be described in a machine-readable, interoperable way.
- `class/`: 21 JSON-LD files
- `property/`: 79 JSON-LD files
- `vocab/`: 174 JSON-LD files
- `context/`: 2 JSON-LD files
- `manifest/`: 4 JSON files (`datacite-4.6.json`, `datacite-4.7.json`, `datacite-current.json`, `release-matrix-4.6-4.7.json`)
- `dist/`: 7 distribution/pointer files (`.jsonld`, `.ttl`, `.rdf`)

An individual vocabulary term (for example `vocab/resourceTypeGeneral/Dataset.jsonld`) includes:
- a stable term IRI
- label (`prefLabel`)
- definition
- scheme membership (`inScheme`)
- optional notes/examples/mappings (`scopeNote`, `example`, `closeMatch`)
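
One common use of term files like these is checking whether a value belongs to a controlled vocabulary. The sketch below builds a label-to-IRI index from term documents. It is an illustration only: the term IRIs are `example.org` placeholders, the unprefixed `prefLabel` and `inScheme` keys assume the file's context maps them, and `indexByLabel` is a hypothetical helper.

```javascript
// Hypothetical term documents; real files may use prefixed keys and different IRIs.
const terms = [
  {
    "@id": "https://example.org/datacite/vocab/resourceTypeGeneral/Dataset",
    prefLabel: "Dataset",
    inScheme: "https://example.org/datacite/vocab/resourceTypeGeneral/"
  },
  {
    "@id": "https://example.org/datacite/vocab/resourceTypeGeneral/Software",
    prefLabel: "Software",
    inScheme: "https://example.org/datacite/vocab/resourceTypeGeneral/"
  }
];

// Build a Map from each term's label to its IRI.
function indexByLabel(termDocs) {
  return new Map(termDocs.map((term) => [term.prefLabel, term["@id"]]));
}

const byLabel = indexByLabel(terms);
console.log(byLabel.get("Dataset"));     // the placeholder IRI for Dataset
console.log(byLabel.has("Spreadsheet")); // false
```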
This repo also includes notes and example artifacts for different transformation approaches from DataCite XML.
Two approaches appear in the existing notes:
- `XML -> Bolognese JSON` (semantic normalization)
- `XML -> xml-js JSON` (XML-shaped JSON for structural fidelity / round-trip testing)
The key difference is whether you want semantic convenience or exact XML structure preservation.
| Transformation | Purpose | Reversible | XML fidelity |
|---|---|---|---|
| XML -> Bolognese JSON | Semantic normalization | ❌ No | ❌ No |
| XML -> xml-js JSON | Structural preservation | ✅ Yes | ✅ Yes |
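
The reversibility difference can be seen in a small sketch. The first object follows the compact output style of xml-js, where attributes sit under `_attributes` and text under `_text`. The `normalizeTitle` function is a hypothetical stand-in for a semantic normalizer and does not reproduce Bolognese's actual output; it only shows how flattening can discard information needed to rebuild the XML.

```javascript
// XML-shaped JSON in the style of xml-js compact output:
// attributes and text content are kept as separate fields.
const xmlShaped = {
  title: {
    _attributes: { "xml:lang": "en", titleType: "Subtitle" },
    _text: "An example"
  }
};

// Illustrative normalization (an assumption, not Bolognese's real mapping):
// keep the text and language, drop everything else.
function normalizeTitle(node) {
  return { title: node._text, lang: node._attributes["xml:lang"] };
}

const normalized = normalizeTitle(xmlShaped.title);
console.log(normalized);
console.log("titleType" in normalized); // false
```

Because `titleType` no longer appears in the normalized object, the original element cannot be reconstructed from it, whereas the XML-shaped form still carries every attribute.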
- Use DataCite/Bolognese-style JSON when you want a cleaner application-facing representation for APIs or processing pipelines
- Use XML-shaped JSON when you need to preserve XML attributes/wrappers and support round-trip conversion back to equivalent XML
- Use the JSON-LD files in this repo when you need schema definitions, vocabulary IRIs, and semantic mappings
- DataCite XML and DataCite JSON are not identical representations; they serve different goals
- “Valid XML” and “semantically equivalent JSON” are separate concerns
- JSON-LD contexts do not automatically validate DataCite business rules; they define interpretation/mapping
- Many IRIs in this repo use the `schema.stage.datacite.org` namespace, which indicates that the current artifacts are published under a staging namespace
If you are exploring this repo for the first time:
- Open `dist/datacite.jsonld` for the moving latest full bundle, or `dist/datacite-current.jsonld` for the canonical pointer.
- Read `manifest/datacite-current.json` for current-version pointers, or `manifest/datacite-4.7.json` for the current full inventory.
- Open `context/fullcontext.jsonld` to understand how DataCite-like JSON keys are mapped.
- Compare one file each from `class/`, `property/`, and `vocab/` to see the modeling pattern.
- Review `Input files/codes&steps.md` and the example files if you are evaluating XML <-> JSON round-tripping.
- Some files in `Input files/` are experimental outputs used for validation and comparison, not canonical schema definitions.