Changes from 82 commits
96 commits
1d53e3a
test
svogt0511 Oct 30, 2025
8efb138
Implements JSON Schema for titles
codycooperross May 21, 2025
2ce744c
Array.wrap titles attribute
codycooperross May 21, 2025
20e175b
Linting
codycooperross May 21, 2025
19213ec
Addressed issue where a string submitted in the titles attribute woul…
codycooperross May 22, 2025
1360a4c
Fix title data structure in test
codycooperross May 22, 2025
04497cd
Fix data structure
codycooperross May 23, 2025
853dd15
test
svogt0511 Oct 30, 2025
77e6295
Update activerecord_json_validator to the latest version.
svogt0511 Oct 31, 2025
71143ae
Appease rubocop.
svogt0511 Oct 31, 2025
a5ca110
Merge remote-tracking branch 'origin/pb325-json-metadata-validation' …
svogt0511 Oct 31, 2025
bdb6327
Update titles.json to use the latest schema and a couple more checks.
svogt0511 Nov 7, 2025
42f9f59
Fix test
svogt0511 Nov 7, 2025
b4b3451
fix titles schema
svogt0511 Nov 10, 2025
c5e8678
Merge pull request #1341 from datacite/titles
svogt0511 Nov 10, 2025
8a4a38b
JSON schema for publicationYear.
svogt0511 Nov 10, 2025
49c5ff3
JSON schema for publication_year.
svogt0511 Nov 10, 2025
e90e088
Add creators schema
svogt0511 Nov 11, 2025
3337cc2
Update schema in title_type
svogt0511 Nov 11, 2025
1f8dee6
Merge remote-tracking branch 'origin/master' into pb325-json-metadata…
svogt0511 Nov 12, 2025
ac4864c
Creators test should have failed.
svogt0511 Nov 12, 2025
0c343ad
Temporarily comment out test.
svogt0511 Nov 12, 2025
c6d150f
Temporarily comment out test.
svogt0511 Nov 12, 2025
19a710c
Fix test - creator missing name
svogt0511 Nov 12, 2025
da0e51d
Fix test. Creator.affiliation was not an array.
svogt0511 Nov 12, 2025
d3d118b
Don't validate json on /dois/validate enpoint. Fix a couple of tests.
svogt0511 Nov 13, 2025
d9c046e
Fix some test data - errors detected by validating with json schema.
svogt0511 Nov 13, 2025
e5f7e96
Fix test data
svogt0511 Nov 13, 2025
0351dc2
Add contributors schema.
svogt0511 Nov 13, 2025
9107c7d
Create shared schemas.
svogt0511 Nov 13, 2025
2833a4d
Add subjects schema.
svogt0511 Nov 14, 2025
e3b1429
Fix test data
svogt0511 Nov 14, 2025
de18655
Fix test data.
svogt0511 Nov 14, 2025
15c2f2f
Fix test data.
svogt0511 Nov 14, 2025
156e56f
Dates schema
svogt0511 Nov 17, 2025
c431f7b
Update validation qualifier to only valid schemas.
svogt0511 Nov 19, 2025
9036ca7
Appease rubocop.
svogt0511 Nov 19, 2025
9663a76
Appease rubocop.
svogt0511 Nov 19, 2025
cdd8419
Backing out dates to string rather than verification using regular ex…
svogt0511 Dec 1, 2025
a08363c
Remove validations for dates. Date to remain validated as string onl…
svogt0511 Dec 1, 2025
d545e4b
Merge remote-tracking branch 'origin/master' into pb325-json-metadata…
svogt0511 Dec 1, 2025
225deae
Add json-schema validation for resourceType.
svogt0511 Dec 1, 2025
c2bba5b
Revert test.
svogt0511 Dec 1, 2025
3f6606c
Add json-schema validation for alternageIdentifiers.
svogt0511 Dec 1, 2025
6315505
add json-schema validation for relatedIdentifiers.
svogt0511 Dec 2, 2025
4bb4b5b
Add json-schema validation for sizes.
svogt0511 Dec 2, 2025
2d87449
Add json-schema validation for formats.
svogt0511 Dec 2, 2025
329ee8f
Mods.
svogt0511 Dec 2, 2025
04de852
Add json-schema validation for version.
svogt0511 Dec 2, 2025
44fb68e
Add json-schema validation for descriptions.
svogt0511 Dec 2, 2025
598906b
Fix some errors.
svogt0511 Dec 3, 2025
acb17d4
More fixes.
svogt0511 Dec 3, 2025
ad28046
More fixes.
svogt0511 Dec 3, 2025
c6e093f
Fix a test.
svogt0511 Dec 3, 2025
da5f9e8
Corrections plus new controlled vocabularies.
svogt0511 Dec 4, 2025
111ec55
Restructuring and corrections.
svogt0511 Dec 8, 2025
c917c85
Add language validation, make sure it is used in all references to la…
svogt0511 Dec 8, 2025
e44889b
Appease rubocop.
svogt0511 Dec 8, 2025
743ac08
Fix a test - description.
svogt0511 Dec 8, 2025
235c3f9
Fixes - code and test.
svogt0511 Dec 8, 2025
fae0e97
Fix errors.
svogt0511 Dec 8, 2025
d430aee
Temporarily commenting out some tests having to do with language.
svogt0511 Dec 8, 2025
05b301e
Comment out language validation. There is a problem with it.
svogt0511 Dec 8, 2025
7a96090
Appease rubocop.
svogt0511 Dec 8, 2025
2148812
Fix tests.
svogt0511 Dec 8, 2025
8e953cd
Add fundingReference validation.
svogt0511 Dec 9, 2025
091b665
Fix test.
svogt0511 Dec 9, 2025
f27dcee
geoLocation validation.
svogt0511 Dec 9, 2025
fe15778
Add json schema validation for geoLocations.
svogt0511 Dec 10, 2025
a0b91fe
Merge remote-tracking branch 'origin/master' into pb325-json-metadata…
svogt0511 Dec 10, 2025
5e831a9
Fix
svogt0511 Dec 10, 2025
d719846
Fixes.
svogt0511 Dec 11, 2025
e945040
Fix errors.
svogt0511 Dec 11, 2025
f428d24
Fixes.
svogt0511 Dec 11, 2025
3c9ee96
Merge remote-tracking branch 'origin/master' into pb325-json-metadata…
svogt0511 Dec 16, 2025
bb141dc
fixes
svogt0511 Dec 16, 2025
681654a
Appease rubocop
svogt0511 Dec 16, 2025
7a1071e
Fixes - language
svogt0511 Dec 16, 2025
bb739f4
Fix
svogt0511 Dec 16, 2025
cf14588
Merge remote-tracking branch 'origin/master' into pb325-json-metadata…
svogt0511 Dec 17, 2025
74d0375
Fix language validation - add raw_language.
svogt0511 Dec 22, 2025
86c22a7
Appease rubocop
svogt0511 Dec 22, 2025
d3f2056
Add test.
svogt0511 Jan 6, 2026
64206c3
Review comment - fix
svogt0511 Jan 6, 2026
1162d93
Types/resourceType validation
svogt0511 Jan 6, 2026
3232373
Appease rubocop.
svogt0511 Jan 6, 2026
38f9fbd
Merge remote-tracking branch 'origin/master' into pb325-json-metadata…
svogt0511 Mar 2, 2026
466de93
Merge remote-tracking branch 'origin/master' into pb325-json-metadata…
svogt0511 Mar 3, 2026
fa2327b
JSON-SCHEMA - metadata-4.7 support.
svogt0511 Mar 4, 2026
0cc8590
Use the latest JSON-schema definition in schemas/client/subjects.json…
svogt0511 Mar 4, 2026
30cf06e
Remove null from the controlled vocabs.
svogt0511 Mar 4, 2026
401d91d
Remove null from the controlled vocabs.
svogt0511 Mar 4, 2026
13e2288
Reverse coderabbit comment. Allowing null as alternative to a contro…
svogt0511 Mar 5, 2026
95f1c99
Address review comment on affiliation.json - removed field dependency.
svogt0511 Mar 5, 2026
9fca65c
Affiliation: require nonempty string in name, as specified in DC meta…
svogt0511 Mar 5, 2026
ccda29d
Address review comment: Should both alternateIdentifier and alternate…
svogt0511 Mar 5, 2026
2 changes: 1 addition & 1 deletion Gemfile
@@ -4,7 +4,7 @@ source "https://rubygems.org"

gem "aasm", "~> 5.0", ">= 5.0.1"
gem "active_model_serializers", "~> 0.10.0"
gem "activerecord_json_validator", "~> 2.1", ">= 2.1.5"
gem "activerecord_json_validator", "~> 3.1"
gem "apollo-federation", "1.1.3"
gem "audited", "~> 5.4", ">= 5.4.3"
gem "aws-sdk-s3"
16 changes: 6 additions & 10 deletions Gemfile.lock
@@ -66,9 +66,9 @@ GEM
activemodel (= 7.1.3.2)
activesupport (= 7.1.3.2)
timeout (>= 0.4.0)
activerecord_json_validator (2.1.5)
activerecord (>= 4.2.0, < 8)
json_schemer (~> 0.2.18)
activerecord_json_validator (3.1.0)
activerecord (>= 4.2.0, < 9)
json_schemer (~> 2.2)
activestorage (7.1.3.2)
actionpack (= 7.1.3.2)
activejob (= 7.1.3.2)
@@ -278,8 +278,6 @@ GEM
scanf (~> 1.0)
sxp (~> 1.2)
unicode-types (~> 1.7)
ecma-re-validator (0.4.0)
regexp_parser (~> 2.2)
edtf (3.2.0)
activesupport (>= 3.0, < 9.0)
elasticsearch (7.17.10)
@@ -390,12 +388,11 @@ GEM
json-ld-preloaded (3.2.2)
json-ld (~> 3.2)
rdf (~> 3.2)
json_schemer (0.2.25)
ecma-re-validator (~> 0.3)
json_schemer (2.4.0)
bigdecimal
hana (~> 1.3)
regexp_parser (~> 2.0)
simpleidn (~> 0.2)
uri_template (~> 0.7)
jsonapi-renderer (0.2.2)
jsonapi-serializer (2.2.0)
activesupport (>= 4.2)
@@ -771,7 +768,6 @@ GEM
unicode_utils (1.4.0)
uniform_notifier (1.16.0)
uri (0.13.2)
uri_template (0.7.0)
uuid (2.3.9)
macaddr (~> 1.0)
uuidtools (2.2.0)
@@ -799,7 +795,7 @@ PLATFORMS
DEPENDENCIES
aasm (~> 5.0, >= 5.0.1)
active_model_serializers (~> 0.10.0)
activerecord_json_validator (~> 2.1, >= 2.1.5)
activerecord_json_validator (~> 3.1)
apollo-federation (= 1.1.3)
audited (~> 5.4, >= 5.4.3)
aws-sdk-s3
49 changes: 41 additions & 8 deletions app/models/doi.rb
@@ -4,6 +4,14 @@
require "benchmark"

class Doi < ApplicationRecord
INVALID_SCHEMAS = %w[
http://datacite.org/schema/kernel-2.1
http://datacite.org/schema/kernel-2.2
http://datacite.org/schema/kernel-3.0
http://datacite.org/schema/kernel-3.1
http://datacite.org/schema/kernel-3
].freeze

self.ignored_columns += [:publisher]
PUBLISHER_JSON_SCHEMA = Rails.root.join("app", "models", "schemas", "doi", "publisher.json")
audited only: %i[doi url creators contributors titles publisher_obj publication_year types descriptions container sizes formats version_info language dates identifiers related_identifiers related_items funding_references geo_locations rights_list subjects schema_version content_url landing_page aasm_state source reason]
@@ -110,16 +118,13 @@ class Doi < ApplicationRecord
validates_presence_of :doi
validates_presence_of :url, if: Proc.new { |doi| doi.is_registered_or_findable? }

json_schema_validation = {
message: ->(errors) { errors },
schema: PUBLISHER_JSON_SCHEMA
}

def validate_publisher_obj?(doi)
doi.validatable? && doi.publisher_obj? && !(doi.publisher_obj.blank? || doi.publisher_obj.all?(nil))
def validate_json_attribute?(attribute)
validatable? && !self[attribute].nil? && !INVALID_SCHEMAS.include?(self.schema_version)
end

validates :publisher_obj, if: ->(doi) { validate_publisher_obj?(doi) }, json: json_schema_validation
def schema_file_path(schema_name)
Rails.root.join("app", "models", "schemas", "doi", "#{schema_name}.json")
end

# from https://www.crossref.org/blog/dois-and-matching-regular-expressions/ but using uppercase
validates_format_of :doi, with: /\A10\.\d{4,5}\/[-._;()\/:a-zA-Z0-9*~$=]+\z/, on: :create
@@ -146,6 +151,34 @@ def validate_publisher_obj?(doi)
validate :check_geo_locations, if: :geo_locations?
validate :check_language, if: :language?

# JSON-SCHEMA VALIDATION
# temporarily commenting out this validation.
# validates :doi, if: proc { |doi| doi.validate_json_attribute?(:identifier) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("identifier") } }, unless: :only_validate
validates :creators, if: proc { |doi| doi.validate_json_attribute?(:creators) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("creators") } }, unless: :only_validate
validates :titles, if: proc { |doi| doi.validate_json_attribute?(:titles) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("titles") } }, unless: :only_validate
validates :publisher_obj, if: proc { |doi| doi.validate_json_attribute?(:publisher_obj) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("publisher") } }, unless: :only_validate
validates :publication_year, if: proc { |doi| doi.validate_json_attribute?(:publication_year) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("publication_year") } }, unless: :only_validate
validates :subjects, if: proc { |doi| doi.validate_json_attribute?(:subjects) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("subjects") } }, unless: :only_validate
validates :contributors, if: proc { |doi| doi.validate_json_attribute?(:contributors) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("contributors") } }, unless: :only_validate
validates :dates, if: proc { |doi| doi.validate_json_attribute?(:dates) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("dates") } }, unless: :only_validate
validates :raw_language, if: proc { |doi| doi.validate_json_attribute?(:raw_language) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("language") } }, unless: :only_validate
validates :resource_type, if: proc { |doi| doi.validate_json_attribute?(:resource_type) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("resource_type") } }, unless: :only_validate
Contributor

Should this validate the types property? That's where resourceType and resourceTypeGeneral are stored.

validates :alternate_identifiers, if: proc { |doi| doi.validate_json_attribute?(:alternate_identifiers) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("alternate_identifiers") } }, unless: :only_validate
validates :related_identifiers, if: proc { |doi| doi.validate_json_attribute?(:related_identifiers) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("related_identifiers") } }, unless: :only_validate
validates :sizes, if: proc { |doi| doi.validate_json_attribute?(:sizes) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("sizes") } }, unless: :only_validate
validates :formats, if: proc { |doi| doi.validate_json_attribute?(:formats) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("formats") } }, unless: :only_validate
validates :version, if: proc { |doi| doi.validate_json_attribute?(:version) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("version") } }, unless: :only_validate
validates :rights_list, if: proc { |doi| doi.validate_json_attribute?(:rights_list) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("rights_list") } }, unless: :only_validate
validates :descriptions, if: proc { |doi| doi.validate_json_attribute?(:descriptions) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("descriptions") } }, unless: :only_validate
validates :geolocations, if: proc { |doi| doi.validate_json_attribute?(:geolocations) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("geolocations") } }, unless: :only_validate
⚠️ Potential issue | 🟠 Major

Fix geolocation validator key mismatch.

Line 175 validates :geolocations, but this model uses geo_locations. This typo can prevent geolocation JSON-schema validation from running.

🔧 Proposed fix
-  validates :geolocations, if: proc { |doi| doi.validate_json_attribute?(:geolocations) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("geolocations") } }, unless: :only_validate
+  validates :geo_locations, if: proc { |doi| doi.validate_json_attribute?(:geo_locations) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("geolocations") } }, unless: :only_validate
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/doi.rb` at line 175, The validator is registering :geolocations
but the model's attribute is :geo_locations, so update the validation call in
DOI to use :geo_locations (and also update the schema name passed to
schema_file_path from "geolocations" to "geo_locations" if your JSON schema file
is named accordingly) so that validate_json_attribute?(:geo_locations) and the
JSON schema validation for geolocation run correctly; look for the validates
line in app/models/doi.rb and adjust the attribute symbol and schema_file_path
argument to the consistent "geo_locations" key.
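
To illustrate why the key mismatch matters, here is a minimal plain-Ruby sketch of the guard's logic. It is not the model code: `FakeDoi`, its attribute hash, and the sample values are hypothetical stand-ins, the `validatable?` check is omitted, and a Hash lookup stands in for `self[attribute]`. Under those assumptions, a guard keyed on `:geolocations` finds no value and evaluates to false, so the schema check would be skipped without raising any error.

```ruby
# Schema versions excluded from JSON-schema validation, as listed in the diff.
INVALID_SCHEMAS = %w[
  http://datacite.org/schema/kernel-2.1
  http://datacite.org/schema/kernel-2.2
  http://datacite.org/schema/kernel-3.0
  http://datacite.org/schema/kernel-3.1
  http://datacite.org/schema/kernel-3
].freeze

# Hypothetical stand-in for the Doi model; only the guard logic is mirrored.
class FakeDoi
  attr_reader :schema_version

  def initialize(attributes, schema_version)
    @attributes = attributes
    @schema_version = schema_version
  end

  # Hash lookup standing in for ActiveRecord's self[attribute] (assumption).
  def [](attribute)
    @attributes[attribute]
  end

  # Same shape as the guard in the diff, minus the validatable? check.
  def validate_json_attribute?(attribute)
    !self[attribute].nil? && !INVALID_SCHEMAS.include?(schema_version)
  end
end

doi = FakeDoi.new(
  { geo_locations: [{ "geoLocationPlace" => "Sample place" }] },
  "http://datacite.org/schema/kernel-4"
)

puts doi.validate_json_attribute?(:geo_locations) # attribute found
puts doi.validate_json_attribute?(:geolocations)  # misspelled key, nothing found
```

The same reasoning would apply to any validator whose attribute symbol does not match a stored attribute name.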

validates :funding_references, if: proc { |doi| doi.validate_json_attribute?(:funding_references) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("funding_references") } }, unless: :only_validate
validates :related_items, if: proc { |doi| doi.validate_json_attribute?(:related_items) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("related_items") } }, unless: :only_validate

# See https://github.com/mirego/activerecord_json_validator for an explanation of why this must be done.
def raw_language
self[:language]
end

after_commit :update_url, on: %i[create update]
after_commit :update_media, on: %i[create update]

5 changes: 1 addition & 4 deletions app/models/schemas/client/subjects.json
@@ -14,10 +14,7 @@
]
},
"lang": {
"type": [
"string",
"null"
]
"$ref": "../doi/language.json"
},
"subject": { "type": "string" },
"subjectScheme": { "type": "string" }
23 changes: 23 additions & 0 deletions app/models/schemas/doi/affiliation.json
@@ -0,0 +1,23 @@
{
  "title": "Affiliation",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "affiliationIdentifier": {
      "type": ["string", "null"]
    },
    "affiliationIdentifierScheme": {
      "type": ["string", "null"]
    },
    "name": {
      "type": ["string", "null"]
    },
    "schemeUri": {
      "type": ["string", "null"]
    }
  },
  "dependentRequired": {
    "affiliationIdentifier": ["affiliationIdentifierScheme"]
  },
  "additionalProperties": false
}
9 changes: 9 additions & 0 deletions app/models/schemas/doi/affiliations.json
@@ -0,0 +1,9 @@
{
  "title": "Affiliations",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "minItems": 0,
  "items": {
    "$ref": "affiliation.json"
  }
}
17 changes: 17 additions & 0 deletions app/models/schemas/doi/alternate_identifier.json
@@ -0,0 +1,17 @@
{
"title": "AlternateIdentifier",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"alternateIdentifier": {
"type": "string"
},
"alternateIdentifierType": {
"type": "string"
}
},
"additionalProperties": false,
"dependentRequired": {
"alternateIdentifier": ["alternateIdentifierType"]
Contributor

Should both alternateIdentifier and alternateIdentifierType be required?

Contributor Author

@svogt0511 Mar 5, 2026

Yes. They should both be required.

Question: The metadata.xsd shows:

Affiliation: <xs:extension base="nonemptycontentStringType">
-versus-
AlternateIdentifier: <xs:extension base="xs:string">

Shouldn't they both be nonemptycontentStringType?

Contributor

I think there are some inconsistencies in the Schema like this for sure. CCing @KellyStathis on this :)

Collaborator

@codycooperross alternateIdentifier is an optional property. Can you clarify your suggestion that it should be required here?

Yes, there are definitely inconsistencies like this. So technically, an alternateIdentifier can have a length of 0 characters.
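
The distinction raised in this thread can be sketched in plain Ruby. This is a hand-written illustration of the JSON Schema keyword semantics, not the validator the PR uses; the helper names `missing_required` and `missing_dependent` and the sample identifier are hypothetical. `dependentRequired` only demands `alternateIdentifierType` when `alternateIdentifier` is present, so an empty object passes, whereas `required` rejects it.

```ruby
# Properties listed under "required" must always be present.
def missing_required(object, required)
  required.reject { |key| object.key?(key) }
end

# Under "dependentRequired", the dependents are only demanded
# when the triggering property is present in the object.
def missing_dependent(object, dependent_required)
  dependent_required.flat_map do |trigger, dependents|
    next [] unless object.key?(trigger)

    dependents.reject { |key| object.key?(key) }
  end
end

dependent_required = { "alternateIdentifier" => ["alternateIdentifierType"] }
required = %w[alternateIdentifier alternateIdentifierType]

empty = {}
only_identifier = { "alternateIdentifier" => "abc-123" }

p missing_dependent(empty, dependent_required)           # nothing demanded
p missing_required(empty, required)                      # both properties missing
p missing_dependent(only_identifier, dependent_required) # the type is now demanded
```

Neither keyword says anything about string length; rejecting an empty `alternateIdentifier` would need a separate constraint.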

}
}
9 changes: 9 additions & 0 deletions app/models/schemas/doi/alternate_identifiers.json
@@ -0,0 +1,9 @@
{
  "title": "AlternateIdentifiers",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "minItems": 0,
  "items": {
    "$ref": "alternate_identifier.json"
  }
}
42 changes: 42 additions & 0 deletions app/models/schemas/doi/contributor.json
@@ -0,0 +1,42 @@
{
"title": "Contributor",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"name": {
Collaborator

This is an interesting one. I was testing with metadata pulled from some recently created/updated DOIs, to see if this validation would have impacted them.

I found this DOI which in the JSON has a contributor givenName and familyName, but not name: https://api.datacite.org/dois/10.34804/supra.2021092825

"contributors": [
  {
    "nameType": "Personal",
    "givenName": "Jacopo",
    "familyName": "Torrisi",
    "affiliation": [],
    "contributorType": "DataManager",
    "nameIdentifiers": [
      {
        "nameIdentifier": "",
        "nameIdentifierScheme": "ORCID"
      }
    ]
  }
],

Using this metadata to create a DOI on staging failed with this error:

{
    "errors": [
        {
            "source": "contributors",
            "title": "Object at `/0` is missing required properties: name",
            "uid": "10.1111/742r-wc63"
        }
    ]
}

From my understanding of the XSD, contributorName is required even if givenName and familyName are provided. And the corresponding XML for this DOI does have a contributorName:

<contributors>
    <contributor contributorType="DataManager">
      <contributorName nameType="Personal">Torrisi, Jacopo</contributorName>
      <givenName>Jacopo</givenName>
      <familyName>Torrisi</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI=""/>
      <affiliation affiliationIdentifierScheme="ROR"/>
    </contributor>
  </contributors>

How is that contributorName being generated for the XML? I am just thinking through the potential impact, if we introduce this, on users who are currently not providing contributor.name but are providing contributor.givenName and contributor.familyName.

Collaborator

Tagging @codycooperross for input into this as well :)

Contributor

When converting JSON to XML, we currently generate contributorName and creatorName from the familyName and givenName metadata when those are available. See the description here: https://docs.google.com/spreadsheets/d/1Hy0KXWPxqNx-Pfh-nNFxbsUFXXVYsO8O2sDIytXQv7U/edit?gid=1806954511#gid=1806954511&range=2:2

For the sake of backwards compatibility with existing request patterns and scoping this PR, let's remove the name requirement on creator and contributor for now. Currently invalid JSON metadata, i.e. metadata that contains no name or familyName metadata, will continue to fail when validated against the XSD.
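
A minimal sketch of the fallback described in this comment, assuming the "Family, Given" format seen in the XML quoted earlier in the thread. The helper `display_name` is hypothetical and is not the code that performs the JSON to XML conversion; it only shows how a name could be derived when `name` is absent.

```ruby
# Hypothetical helper: prefer an explicit name, otherwise build
# "Family, Given" from whichever name parts are present.
def display_name(person)
  name = person["name"]
  return name unless name.nil? || name.strip.empty?

  parts = [person["familyName"], person["givenName"]]
  parts = parts.compact.map(&:strip).reject(&:empty?)
  parts.empty? ? nil : parts.join(", ")
end

# Contributor taken from the JSON quoted in this thread (no "name" key).
contributor = {
  "nameType" => "Personal",
  "givenName" => "Jacopo",
  "familyName" => "Torrisi",
  "contributorType" => "DataManager"
}

puts display_name(contributor) # prints "Torrisi, Jacopo"
```

A record with neither `name` nor any name parts yields `nil` here, which corresponds to the case the comment says would still fail XSD validation.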

Contributor Author

This was always very confusing to me.

As I understand it, contributorName is always required.

If nameType = Unknown then Contributor can have a given name, family name and a completely independent name.

If nameType = Organization then Contributor has any name you want to give it. (Although if you convert the type between organization and one of the other types, givenName and familyName are preserved in the DB, and probably in Elasticsearch. A topic for another time.)

If nameType = Person, the Name field is generated from the givenName and familyName.

I would not remove the name requirement. I would not back out our requirements to fit incorrect data.

My understanding was that if there are incorrect DOIs in the DB, the user would be prodded to correct them if and when the DOI is updated. Otherwise, there is not much we could do, except perhaps flag them as incorrect at some point.

Does this make sense?

"type": "string"
},
"nameType": {
"$ref": "controlled_vocabularies/name_type.json"
},
"givenName": {
"type": ["string", "null"]
},
"familyName": {
"type": ["string", "null"]
},
"contributorType": {
"oneOf": [
{
"$ref": "controlled_vocabularies/contributor_type.json"
},
{
"type": "null"
}
]
},
"lang": {
Contributor

I don't think lang is permitted on contributor:

<xs:element name="contributors" minOccurs="0">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="contributor" minOccurs="0" maxOccurs="unbounded">
        <xs:annotation>
          <xs:documentation>The institution or person responsible for collecting, creating, or otherwise contributing to the development of the dataset.</xs:documentation>
          <xs:documentation>The personal name format should be: Family, Given.</xs:documentation>
        </xs:annotation>
        <xs:complexType>
          <xs:sequence>
            <xs:element name="contributorName">
              <xs:complexType>
                <xs:simpleContent>
                  <xs:extension base="nonemptycontentStringType">
                    <xs:attribute name="nameType" type="nameType" use="optional"/>
                    <xs:attribute ref="xml:lang"/>
                  </xs:extension>
                </xs:simpleContent>
              </xs:complexType>
            </xs:element>
            <xs:element name="givenName" minOccurs="0"/>
            <xs:element name="familyName" minOccurs="0"/>
            <xs:element name="nameIdentifier" xsi:type="nameIdentifier" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="affiliation" xsi:type="affiliation" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:attribute name="contributorType" type="contributorType" use="required"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

It doesn't look like it's populated in the API: https://api.datacite.org/dois?query=contributors.lang:*

Collaborator

That's interesting, it should be allowed. In the XML, it is attached to contributorName: https://datacite-metadata-schema.readthedocs.io/en/4.6/properties/contributor/#id2

There is <xs:attribute ref="xml:lang"/> in the XSD.

Contributor

It looks like we do accept this currently in ParamsSanitizer:

Not sure if it's mapped in OpenSearch or to XML. But I'll close this then.

"$ref": "language.json"
},
"affiliation": {
"$ref": "affiliations.json"
},
"nameIdentifiers": {
"$ref": "name_identifiers.json"
}
},
"additionalProperties": false,
"required": [
"name"
Contributor

contributorType is also required on contributor.
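
For illustration, a plain-Ruby sketch of what requiring both properties would mean for each item in `contributors`. The `missing_properties` helper is hypothetical, and the message wording only loosely imitates the staging error quoted in the review discussion on `name`; it is not the validator's actual output.

```ruby
# Assumption for this sketch: both keys are required on every contributor.
REQUIRED_CONTRIBUTOR_KEYS = %w[name contributorType].freeze

# Returns one message per contributor that lacks a required property.
def missing_properties(contributors)
  contributors.each_with_index.filter_map do |contributor, index|
    missing = REQUIRED_CONTRIBUTOR_KEYS.reject { |key| contributor.key?(key) }
    next if missing.empty?

    "Object at `/#{index}` is missing required properties: #{missing.join(', ')}"
  end
end

contributors = [
  { "name" => "Torrisi, Jacopo", "contributorType" => "DataManager" },
  { "givenName" => "Jacopo", "familyName" => "Torrisi" }
]

puts missing_properties(contributors)
```

Only the second item is reported, since the first carries both keys.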

]
}
9 changes: 9 additions & 0 deletions app/models/schemas/doi/contributors.json
@@ -0,0 +1,9 @@
{
  "title": "Contributors",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "minItems": 0,
  "items": {
    "$ref": "contributor.json"
  }
}
@@ -0,0 +1,37 @@
{
  "title": "ContributorType",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": ["string", "null"],
  "anyOf": [
    {
      "type": "string",
      "enum": [
        "ContactPerson",
        "DataCollector",
        "DataCurator",
        "DataManager",
        "Distributor",
        "Editor",
        "HostingInstitution",
        "Producer",
        "ProjectLeader",
        "ProjectManager",
        "ProjectMember",
        "RegistrationAgency",
        "RegistrationAuthority",
        "RelatedPerson",
        "Researcher",
        "ResearchGroup",
        "RightsHolder",
        "Sponsor",
        "Supervisor",
        "Translator",
        "WorkPackageLeader",
        "Other"
      ]
    },
    {
      "type": "null"
    }
  ]
}
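
One interaction that may be worth checking, sketched in plain Ruby under the assumption that the schemas are exactly as shown in this diff: the ContributorType schema already accepts `null` through its own `anyOf`, and `contributor.json` wraps it in a `oneOf` whose second branch is also `{"type": "null"}`. JSON Schema's `oneOf` requires exactly one branch to match, so a `null` value would match both branches and be rejected. The lambdas below are hand-written stand-ins for the schema branches, with a shortened vocabulary list; they are not a real validator.

```ruby
# Shortened, illustrative subset of the ContributorType vocabulary.
CONTRIBUTOR_TYPES = %w[DataManager Editor Other].freeze

# Stand-in for the ContributorType schema: enum string OR null (anyOf).
contributor_type = lambda do |value|
  value.nil? || (value.is_a?(String) && CONTRIBUTOR_TYPES.include?(value))
end

# Stand-in for the inline {"type": "null"} branch in contributor.json.
null_only = ->(value) { value.nil? }

branches = [contributor_type, null_only]

# anyOf: at least one branch matches. oneOf: exactly one branch matches.
any_of = ->(value) { branches.any? { |branch| branch.call(value) } }
one_of = ->(value) { branches.count { |branch| branch.call(value) } == 1 }

p one_of.call("Editor") # one branch matches
p any_of.call(nil)      # both branches match, which anyOf tolerates
p one_of.call(nil)      # both branches match, which oneOf does not
```

If that reading holds, using `anyOf` in the wrapper, or referencing the vocabulary schema directly, would let `null` through.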
27 changes: 27 additions & 0 deletions app/models/schemas/doi/controlled_vocabularies/date_type.json
@@ -0,0 +1,27 @@
{
  "title": "DateType",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": ["string", "null"],
  "anyOf": [
    {
      "type": "string",
      "enum": [
        "Accepted",
        "Available",
        "Copyrighted",
        "Collected",
        "Coverage",
        "Created",
        "Issued",
        "Submitted",
        "Updated",
        "Valid",
        "Withdrawn",
        "Other"
      ]
    },
    {
      "type": "null"
    }
  ]
}
@@ -0,0 +1,13 @@
{
  "title": "DescriptionType",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "string",
  "enum": [
    "Abstract",
    "Methods",
    "SeriesInformation",
    "TableOfContents",
    "TechnicalInfo",
    "Other"
  ]
}
@@ -0,0 +1,12 @@
{
  "title": "FunderIdentifierType",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "string",
  "enum": [
    "Crossref Funder ID",
    "GRID",
    "ISNI",
    "ROR",
    "Other"
  ]
}
14 changes: 14 additions & 0 deletions app/models/schemas/doi/controlled_vocabularies/name_type.json
@@ -0,0 +1,14 @@
{
  "title": "NameType",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": ["string", "null"],
  "anyOf": [
    {
      "type": "string",
      "enum": ["Organizational", "Personal"]
    },
    {
      "type": "null"
    }
  ]
}