Enforce data structure and data type consistency for JSON metadata#1421
Conversation
Implements JSON Schema for titles
KellyStathis
left a comment
Hi @svogt0511, I noted a couple things from reading through this but haven't tested yet. Happy to take a closer look when I'm back next week!
| "if": { | ||
| "properties": { | ||
| "relationType": { | ||
| "enum": ["HasMetadata", "IsMetadataFor"] | ||
| } | ||
| } | ||
| }, | ||
| "then": { | ||
| "properties": { | ||
| "relatedMetadataScheme": { | ||
| "type": "string" | ||
| }, | ||
| "schemeUri": { | ||
| "type": "string" | ||
| }, | ||
| "schemeType": { | ||
| "type": "string" | ||
| } | ||
| } | ||
| }, | ||
| "else": { | ||
| "not": { | ||
| "anyOf": [ | ||
| { "required": ["relatedMetadataScheme"] }, | ||
| { "required": ["schemeUri"] }, | ||
| { "required": ["schemeType"] } | ||
| ] | ||
| } | ||
| }, |
@svogt0511 This restriction is in the schema documentation—that 12.c, 12.d, and 12.e should only be used when 12.b = "HasMetadata" or "IsMetadataFor": https://datacite-metadata-schema.readthedocs.io/en/4.6/properties/relatedidentifier/
However, I don't think it is in the XSD as XSD doesn't support conditional requirements like this.
In the interest of maintaining parity between the XSD and the JSON Schema, could we remove this?
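As an aside, the conditional under discussion can be paraphrased outside JSON Schema; here is a minimal Python sketch of the rule as currently written (the helper name and structure are illustrative, not from this PR):

```python
# Sketch of the if/then/else rule above: the scheme-related fields are only
# permitted when relationType is HasMetadata or IsMetadataFor.
# Helper name and constants are illustrative, not from the codebase.

METADATA_RELATION_TYPES = {"HasMetadata", "IsMetadataFor"}
SCHEME_FIELDS = {"relatedMetadataScheme", "schemeUri", "schemeType"}

def scheme_fields_allowed(related_identifier: dict) -> bool:
    """Return False when scheme fields appear with a non-metadata relationType."""
    if related_identifier.get("relationType") in METADATA_RELATION_TYPES:
        return True  # "then" branch: scheme fields may be present
    # "else" branch: none of the scheme fields may be present
    return not (SCHEME_FIELDS & related_identifier.keys())
```

Removing the conditional, as proposed, amounts to this check returning True unconditionally, which matches the XSD's behavior.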
Agreeing with Kelly—let's remove this conditional since it's not imposed by the XSD.
I think removing this logic will fix this, but I am having an issue including relatedMetadataScheme even when the relationType is HasMetadata. This was my JSON test:
"relatedIdentifiers": [
{
"schemeUri": "https://github.com/citation-style-language/schema/raw/master/csl-data.json",
"relationType": "HasMetadata",
"relatedIdentifier": "https://data.datacite.org/application/citeproc+json/10.5072/example-full",
"relatedIdentifierType": "URL",
"relatedMetadataScheme": "citeproc+json"
},

Which failed with this error:
{
"errors": [
{
"source": "related_identifiers",
"title": "Object property at `/0/relatedMetadataScheme` is a disallowed additional property",
"uid": "10.1111/742r-wc63"
}
]
}| "if": { | ||
| "properties": { | ||
| "relationType": { | ||
| "enum": [ "HasMetadata", "IsMetadataFor" ] | ||
| } | ||
| } | ||
| }, | ||
| "then": { | ||
| "properties": { | ||
| "relatedItemIdentifier": { | ||
| "type": "object", | ||
| "properties": { | ||
| "relatedItemIdentifier": { | ||
| "type": "string" | ||
| }, | ||
| "relatedItemIdentifierType": { | ||
| "$ref": "controlled_vocabularies/related_identifier_type.json" | ||
| }, | ||
| "relatedMetadataScheme": { | ||
| "type": "string" | ||
| }, | ||
| "schemeURI": { | ||
| "type": "string" | ||
| }, | ||
| "schemeType": { | ||
| "type": "string" | ||
| } | ||
| }, | ||
| "additionalProperties": false | ||
| } | ||
| } | ||
| }, | ||
| "else": { | ||
| "properties": { | ||
| "relatedItemIdentifier": { | ||
| "type": "object", | ||
| "properties": { | ||
| "relatedItemIdentifier": { | ||
| "type": "string" | ||
| }, | ||
| "relatedItemIdentifierType": { | ||
| "$ref": "controlled_vocabularies/related_identifier_type.json" | ||
| } | ||
| }, | ||
| "additionalProperties": false | ||
| } | ||
| } | ||
| }, |
@svogt0511 If we update RelatedIdentifier to remove this restriction (see earlier comment), we should also remove it from RelatedItem here.
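For what it's worth, the "disallowed additional property" error reported earlier is consistent with how additionalProperties interacts with conditionals in draft 2020-12: additionalProperties only considers the properties declared in the same subschema, so keys declared solely inside a then branch are still "additional" at the top level. A minimal Python sketch of that interaction (the property list is illustrative, not the actual schema):

```python
# additionalProperties: false only admits keys listed in the same subschema's
# "properties"; keys declared solely inside a "then" branch do not extend it.
# TOP_LEVEL_PROPERTIES is illustrative, not copied from the schema files.

TOP_LEVEL_PROPERTIES = {
    "relationType", "relatedIdentifier", "relatedIdentifierType", "schemeUri",
}

def disallowed_additional_keys(payload: dict) -> set:
    """Keys a validator would flag as disallowed additional properties."""
    return set(payload.keys()) - TOP_LEVEL_PROPERTIES

payload = {
    "relationType": "HasMetadata",
    "relatedIdentifier": "https://data.datacite.org/application/citeproc+json/10.5072/example-full",
    "relatedIdentifierType": "URL",
    "relatedMetadataScheme": "citeproc+json",
}
# relatedMetadataScheme is flagged even though the "then" branch mentions it
```

This is why declaring the scheme fields in the main properties block (or removing the conditional) avoids the failure.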
}
},
"dependentRequired": {
  "affiliationIdentifier": ["nameIdentifierScheme"]
@svogt0511 Should this line be as follows?
"nameIdentifier": ["nameIdentifierScheme"]
As opposed to "affiliationIdentifier": ["nameIdentifierScheme"]?
Also, I think nameIdentifierScheme is simply required rather than having a dependency on nameIdentifier:
<xs:complexType name="nameIdentifier">
<xs:annotation>
<xs:documentation>Uniquely identifies a creator or contributor, according to various identifier schemes.</xs:documentation>
</xs:annotation>
<xs:simpleContent>
<xs:extension base="nonemptycontentStringType">
<xs:attribute name="nameIdentifierScheme" type="xs:string" use="required"/>
<xs:attribute name="schemeURI" type="xs:anyURI" use="optional"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
nameIdentifier also cannot be null since it has nonemptycontentStringType.
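The dependentRequired semantics behind both comments can be sketched in a few lines of Python: the trigger key must be present in the instance for the requirement to fire, so a dependency keyed on affiliationIdentifier never applies to nameIdentifier objects (minimal emulation, not a full validator):

```python
# Sketch of JSON Schema "dependentRequired" semantics: the requirement only
# fires when the trigger key is present in the instance. A dependency keyed
# on "affiliationIdentifier" therefore never fires for nameIdentifier objects
# that cannot contain that key. Key names mirror the schema under review.

def dependent_required_errors(instance: dict, dependent_required: dict) -> list:
    """Return missing-property errors per JSON Schema dependentRequired."""
    errors = []
    for trigger, required in dependent_required.items():
        if trigger in instance:
            errors += [key for key in required if key not in instance]
    return errors

# As written: the trigger is never present, so nothing is ever enforced.
as_written = {"affiliationIdentifier": ["nameIdentifierScheme"]}
# As suggested: nameIdentifier present without its scheme is flagged.
suggested = {"nameIdentifier": ["nameIdentifierScheme"]}
```

Note that making nameIdentifierScheme unconditionally required, per the XSD, would not need dependentRequired at all.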
codycooperross
left a comment
Took a pass on the JSON Schemas aside from relatedItem—these look great! I made some comments where there might be inconsistencies with the XSD or the current JSON representation.
Will get to relatedItem soon!
}
},
"dependentRequired": {
  "affiliationIdentifier": ["nameIdentifierScheme"]
Also, I think nameIdentifierScheme is simply required rather than having a dependency on nameIdentifier:
<xs:complexType name="nameIdentifier">
<xs:annotation>
<xs:documentation>Uniquely identifies a creator or contributor, according to various identifier schemes.</xs:documentation>
</xs:annotation>
<xs:simpleContent>
<xs:extension base="nonemptycontentStringType">
<xs:attribute name="nameIdentifierScheme" type="xs:string" use="required"/>
<xs:attribute name="schemeURI" type="xs:anyURI" use="optional"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
}
},
"dependentRequired": {
  "affiliationIdentifier": ["nameIdentifierScheme"]
nameIdentifier also cannot be null since it has nonemptycontentStringType.
{
  "title": "publicationYear",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": [ "integer", "string" ],
Is this necessary with oneOf below?
| "inclusiveMinimum": true, | ||
| "inclusiveMaximum": true | ||
| } | ||
| ], |
Should publicationYear be required by this schema?
{
  "title": "Version",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": [ "number", "string" ]
Per the XSD, this seems like it should just be a string rather than a number or a string:
<xs:element name="version" type="xs:string" minOccurs="0">
<xs:annotation>
<xs:documentation>Version number of the resource. If the primary resource has changed the version number increases.</xs:documentation>
<xs:documentation>Register a new identifier for a major version change. Individual stewards need to determine which are major vs. minor versions. May be used in conjunction with properties 11 and 12 (AlternateIdentifier and RelatedIdentifier) to indicate various information updates. May be used in conjunction with property 17 (Description) to indicate the nature and file/record range of version.</xs:documentation>
</xs:annotation>
</xs:element>
| "$ref": "geo_location_polygon.json" | ||
| } | ||
| }, | ||
| "required": [ |
Are these meant to be in the top-level geoLocation schema? They don't seem to point to existing attributes.
| "title": "RelatedItems", | ||
| "$schema": "https://json-schema.org/draft/2020-12/schema", | ||
| "type": "array", | ||
| "minItems": 0, |
Also a clear pattern with "minItems": 0, but is it necessary? (0 is already the default for minItems.)
| "schemeUri": { | ||
| "type": "string" | ||
| }, | ||
| "schemeType": { |
I believe schemeType can be null. It is in this example: https://support.datacite.org/docs/api-create-dois#request-payload-1
| "schemeUri": { | ||
| "type": "string" | ||
| }, | ||
| "resourceTypeGeneral": { |
This can also be null, per this example: https://support.datacite.org/docs/api-create-dois#request-payload-1
Since this is a pattern, I'm wondering if we should systematically allow null value for optional properties. What do you think, @codycooperross?
Since the production API currently accepts null values for many fields in my testing, yes, the JSON Schema should permit null values for all fields that are not xs:string or nonemptycontentStringType in the XSD.
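A minimal sketch of the convention this implies, i.e. optional fields declared as "type": ["string", "null"] while XSD-backed string fields stay string-only (toy type check, not a full validator):

```python
# Toy check of a JSON Schema "type" list, covering only the string/null cases
# relevant to this discussion. Not a full validator.

def type_matches(value, allowed_types) -> bool:
    """Check a value against a JSON Schema "type" list (string/null only)."""
    checks = {
        "string": lambda v: isinstance(v, str),
        "null": lambda v: v is None,
    }
    return any(checks[t](value) for t in allowed_types)
```

Under this convention, schemeType and resourceTypeGeneral would use ["string", "null"], while nonemptycontentStringType fields keep plain "string".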
| "$schema": "https://json-schema.org/draft/2020-12/schema", | ||
| "type": "object", | ||
| "properties": { | ||
| "name": { |
This is an interesting one. I was testing with metadata pulled from some recently created/updated DOIs, to see if this validation would have impacted them.
I found this DOI which in the JSON has a contributor givenName and familyName, but not name: https://api.datacite.org/dois/10.34804/supra.2021092825
"contributors": [
{
"nameType": "Personal",
"givenName": "Jacopo",
"familyName": "Torrisi",
"affiliation": [],
"contributorType": "DataManager",
"nameIdentifiers": [
{
"nameIdentifier": "",
"nameIdentifierScheme": "ORCID"
}
]
}
],

Using this metadata to create a DOI on staging failed with this error:
{
"errors": [
{
"source": "contributors",
"title": "Object at `/0` is missing required properties: name",
"uid": "10.1111/742r-wc63"
}
]
}

From my understanding of the XSD, contributorName is required even if givenName and familyName are provided. And the corresponding XML for this DOI does have a contributorName:
<contributors>
<contributor contributorType="DataManager">
<contributorName nameType="Personal">Torrisi, Jacopo</contributorName>
<givenName>Jacopo</givenName>
<familyName>Torrisi</familyName>
<nameIdentifier nameIdentifierScheme="ORCID" schemeURI=""/>
<affiliation affiliationIdentifierScheme="ROR"/>
</contributor>
</contributors>

How is that contributorName being generated for the XML? I am just thinking through the potential impact on users who are currently not providing contributor.name but are providing contributor.givenName and contributor.familyName, if we introduce this.
Tagging @codycooperross for input into this as well :)
With JSON -> XML, we currently generate a contributorName and creatorName based on available familyName and givenName metadata if available. See the description here: https://docs.google.com/spreadsheets/d/1Hy0KXWPxqNx-Pfh-nNFxbsUFXXVYsO8O2sDIytXQv7U/edit?gid=1806954511#gid=1806954511&range=2:2
For the sake of backwards compatibility with existing request patterns and scoping this PR, let's remove the name requirement on creator and contributor for now. Currently invalid JSON metadata, i.e. metadata that contains no name or familyName metadata, will continue to fail when validated against the XSD.
This was always very confusing to me.
As I understand it, contributorName is always required.
If nameType = Unknown then Contributor can have a given name, family name and a completely independent name.
If nameType = Organization then Contributor has any name you want to give it. (Although, if you are converting the nameType to Organization from one of the other types, givenName and familyName are preserved in the DB, and probably in Elasticsearch. A topic for another time.)
If nameType = Person, the Name field is generated from the givenName and familyName.
I would not remove the name requirement. I would not back out our requirements to fit incorrect data.
I thought that if there are incorrect DOIs in the DB, the user would be prodded to correct them if/when the DOI is updated. Otherwise, there is not much we could do, except perhaps to flag them as incorrect at some point.
Does this make sense?
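For illustration, the JSON -> XML name derivation described in this thread could look roughly like the following; the function name and fallback behavior are assumptions, not the actual implementation:

```python
# Rough sketch of deriving a contributorName/creatorName for the XML when the
# JSON supplies only givenName/familyName. The "familyName, givenName" shape
# matches the XML example above; the fallbacks are assumptions.

def derive_contributor_name(contributor: dict):
    if contributor.get("name"):
        return contributor["name"]
    family = contributor.get("familyName", "").strip()
    given = contributor.get("givenName", "").strip()
    if family and given:
        return f"{family}, {given}"
    # Assumed fallback: use whichever part exists, else no derivable name.
    return family or given or None
```

Keeping the JSON Schema's name requirement would reject exactly the payloads where this derivation currently fills the gap, which is the compatibility trade-off being discussed.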
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Bumps activerecord_json_validator to "~> 3.1"; replaces previous DOI JSON-schema validation with many attribute-level json: validations resolved via new schema_file_path helper; introduces Doi::INVALID_SCHEMAS; adds ~60 JSON Schema files (draft 2020-12) and updates validators, tests, and fixtures to new data shapes.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes. Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning).
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
spec/requests/datacite_dois/post_geo_spec.rb (1)
140-144: ⚠️ Potential issue | 🟠 Major
Payload still uses string numerics, so type-consistency validation is not being strictly exercised.
Several request fixtures send numeric geo fields as strings (e.g., Line 142, Line 143, Line 344, Line 346), while expectations assert numeric output. This can pass via coercion and weaken the “enforce data type consistency” objective. Use numeric literals in valid payloads, and add a negative case that expects 422 for string-typed numerics.
Proposed fix (representative)
- "pointLatitude" => "49.0850736", - "pointLongitude" => "-123.3300992" + "pointLatitude" => 49.0850736, + "pointLongitude" => -123.3300992 - "eastBoundLongitude" => "+123.", + "eastBoundLongitude" => 123.0, - "southBoundLatitude" => "-40", + "southBoundLatitude" => -40.0Also applies to: 233-237, 339-347, 507-515, 772-780, 829-837, 886-894
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/requests/datacite_dois/post_geo_spec.rb` around lines 140 - 144, The spec payloads use string numerics for geo fields (e.g., "geoLocationPoint" with "pointLatitude"/"pointLongitude") which masks type validation; update the valid fixtures in spec/requests/datacite_dois/post_geo_spec.rb to use numeric literals for pointLatitude and pointLongitude (and other listed fixtures at the referenced ranges) so expectations assert true numeric types, and add a new negative example in the same spec that submits pointLatitude/pointLongitude as strings and asserts a 422 response; locate and update the payloads that construct "geoLocationPoint" and add the failing-case test alongside existing positive cases.
♻️ Duplicate comments (12)
app/models/schemas/doi/resource_type.json (1)
9-11:⚠️ Potential issue | 🟠 Major
resourceType is over-constrained in required. Line 10 makes
resourceType mandatory, which can incorrectly fail valid payloads that provide only resourceTypeGeneral. Keep only resourceTypeGeneral in required.
Proposed fix
"required": [ - "resourceType", "resourceTypeGeneral" ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/resource_type.json` around lines 9 - 11, The schema's required array currently includes "resourceType" and "resourceTypeGeneral", which wrongly makes resourceType mandatory; update the required array in resource_type.json to only include "resourceTypeGeneral" (remove "resourceType") so payloads that supply only resourceTypeGeneral validate correctly.app/models/schemas/doi/date.json (1)
12-14:⚠️ Potential issue | 🟠 MajorAllow
dateInformation to be nullable to avoid rejecting valid payloads. Line 12 through Line 14 currently restricts
dateInformation to string, which can break requests where dateInformation is null (already flagged in prior review discussion).
Suggested schema adjustment
"dateInformation": { - "type": "string" + "type": ["string", "null"] }To verify impact quickly, search for existing null usage in fixtures/spec payloads:
#!/bin/bash
rg -nP '"dateInformation"\s*:\s*null|dateInformation:\s*null'
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/date.json` around lines 12 - 14, The schema currently forces "dateInformation" to be a string and rejects payloads where dateInformation is null; update the dateInformation schema to accept null as well (e.g., change the type to allow both "string" and "null" or use an anyOf with string and null) so existing fixtures/requests with null pass validation; update the "dateInformation" entry accordingly and run the provided grep search to verify existing fixtures/specs that use null.app/models/schemas/doi/version.json (1)
4-4:⚠️ Potential issue | 🟠 Major
version should stay string-only. Line 4 currently allows numeric values, which weakens type consistency for this field and conflicts with previously raised schema expectations.
Suggested fix
- "type": [ "number", "string" ] + "type": "string"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/version.json` at line 4, The schema's "version" field currently allows both number and string types; change the "version" property in the JSON schema (the "version" key in the DOI schema) to be string-only by removing "number" from its "type" entry (i.e., make the type either a single "string" or an array containing only "string") so the field enforces string values consistently.app/models/schemas/doi/affiliation.json (1)
19-21:⚠️ Potential issue | 🟠 MajorRemove
dependentRequired to stay aligned with XSD-compatible behavior. This dependency was previously called out as inconsistent with the XSD contract. Keeping it here can reject records that should remain valid.
Suggested fix
- "dependentRequired": { - "affiliationIdentifier": ["affiliationIdentifierScheme"] - },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/affiliation.json` around lines 19 - 21, Remove the "dependentRequired" constraint from the affiliation schema to match XSD-compatible behavior: delete the dependentRequired block that ties "affiliationIdentifier" to "affiliationIdentifierScheme" in app/models/schemas/doi/affiliation.json so the schema no longer rejects records that omit the scheme; ensure no other code relies on the "dependentRequired" rule for affiliationIdentifier before committing.app/models/schemas/doi/name_identifier.json (1)
9-18:⚠️ Potential issue | 🟠 MajorFix dependency wiring: Line 17 references a non-existent property.
Line 17 uses
affiliationIdentifier, which is not defined in this schema, so the dependency won’t fire. Also, allowingnullfor identifier fields weakens consistency checks.Suggested fix
"properties": { "schemeUri": { "type": ["string", "null"] }, "nameIdentifier": { - "type": ["string", "null"] + "type": "string", + "minLength": 1 }, "nameIdentifierScheme": { - "type": ["string", "null"] + "type": "string", + "minLength": 1 } }, "dependentRequired": { - "affiliationIdentifier": ["nameIdentifierScheme"] + "nameIdentifier": ["nameIdentifierScheme"] },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/name_identifier.json` around lines 9 - 18, The dependentRequired currently references a non-existent property affiliationIdentifier; change the dependency to reference the actual field nameIdentifier (i.e., set dependentRequired: { "nameIdentifier": ["nameIdentifierScheme"] }) so the dependency fires correctly, and tighten validation by removing "null" from the type arrays for nameIdentifier and nameIdentifierScheme (make them plain "string") so identifier fields must be non-null strings.app/models/schemas/doi/geo_location.json (1)
19-23:⚠️ Potential issue | 🔴 CriticalTop-level
requiredfields are incorrect and make this schema invalid in practice.Line 19-22 requires
pointLatitude/pointLongitude, but those keys only exist insidegeoLocationPoint. With Line 23 (additionalProperties: false), valid payloads will fail.Proposed fix
- "required": [ - "pointLatitude", - "pointLongitude" - ], + "anyOf": [ + { "required": ["geoLocationPoint"] }, + { "required": ["geoLocationBox"] }, + { "required": ["geoLocationPlace"] }, + { "required": ["geoLocationPolygon"] } + ], "additionalProperties": false🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/geo_location.json` around lines 19 - 23, The schema's top-level "required" list wrongly requires pointLatitude and pointLongitude (which are nested under geoLocationPoint) and with "additionalProperties": false will reject valid documents; update the schema so the top-level "required" contains the correct top-level keys (e.g., "geoLocationPoint" if that object is mandatory) and move/ensure "pointLatitude" and "pointLongitude" are required inside the geoLocationPoint subschema (or remove top-level additionalProperties restriction if you intend to allow those nested properties at top level); reference the "required", "additionalProperties", "geoLocationPoint", "pointLatitude", and "pointLongitude" symbols when making the change.app/models/schemas/doi/contributor.json (1)
18-41:⚠️ Potential issue | 🟠 MajorRequired fields are inverted relative to prior decisions (
contributorTypeshould be required;nameshould remain optional for compatibility).Line 40 currently requires
name, whilecontributorTypeis optional/nullable. This reintroduces the previously discussed compatibility risk and misses the prior requirement forcontributorType.Suggested fix
- "contributorType": { - "oneOf": [ - { - "$ref": "controlled_vocabularies/contributor_type.json" - }, - { - "type": "null" - } - ] - }, + "contributorType": { + "$ref": "controlled_vocabularies/contributor_type.json" + }, @@ - "required": [ - "name" - ] + "required": [ + "contributorType" + ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/contributor.json` around lines 18 - 41, The schema currently lists "name" as required and leaves "contributorType" optional/nullable; change the required fields so "contributorType" is required and "name" is optional to match prior decisions. Edit the JSON schema's "required" array to include "contributorType" (and remove "name"), and verify the "contributorType" definition (the oneOf ref to controlled_vocabularies/contributor_type.json and the null option) is consistent with whether null is allowed when required; adjust if you need to enforce non-null values for contributorType in the "contributorType" oneOf definition.app/models/schemas/doi/related_item.json (1)
64-111:⚠️ Potential issue | 🟠 MajorRemove conditional restriction to maintain XSD parity.
Per past review feedback, this conditional restriction for
HasMetadata/IsMetadataForshould be removed since it's not enforced in the XSD. This keeps validation behavior consistent between XML and JSON submissions.,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/related_item.json` around lines 64 - 111, Remove the JSON Schema conditional that branches on relationType ("HasMetadata"/"IsMetadataFor") — specifically delete the entire "if"/"then"/"else" block and replace it with a single unconditional definition for the relatedItemIdentifier property that includes the union of fields currently only allowed in the "then" branch (relatedItemIdentifier, relatedItemIdentifierType, relatedMetadataScheme, schemeURI, schemeType) while keeping type "object", the existing $ref for relatedItemIdentifierType, and "additionalProperties": false so the schema matches the XSD behavior; update any references to relationType-based gating so relatedItemIdentifier is always validated with the extended properties.app/models/schemas/doi/related_identifier.json (2)
25-57:⚠️ Potential issue | 🟠 MajorConditional restriction and
additionalProperties: falseconflict.Two issues:
Per past review feedback, the conditional restriction for
HasMetadata/IsMetadataForshould be removed to maintain parity with the XSD.The
thenblock definesrelatedMetadataScheme,schemeUri, andschemeTypeproperties, but these are not in the mainpropertiesblock. Combined withadditionalProperties: falseat line 57, this will reject valid payloads containing these fields even whenrelationTypeisHasMetadataorIsMetadataFor.Consider moving
schemeTypeto the mainpropertiesblock (alongside the correctedrelatedMetadataSchemeandschemeUri) and removing the conditional logic entirely.,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/related_identifier.json` around lines 25 - 57, Remove the conditional "if"/"then"/"else" block entirely and instead declare relatedMetadataScheme, schemeUri, and schemeType as top-level properties in the schema's main "properties" object (each as "type": "string"), leaving dependentRequired and the existing required relationships (e.g. relatedIdentifier -> relatedIdentifierType, relationType) intact; this ensures payloads with those fields are accepted while keeping "additionalProperties": false and matching the XSD parity (also keep relationType and relatedIdentifier-related fields as before).
15-17:⚠️ Potential issue | 🔴 CriticalTypo:
relatedMetaDataSchemeshould berelatedMetadataScheme.The property name uses a capital "D" (
relatedMetaDataScheme) but the conditional blocks at lines 34 and 48 use lowercase "d" (relatedMetadataScheme). This mismatch will cause validation failures when this field is provided.🐛 Proposed fix
- "relatedMetaDataScheme": { + "relatedMetadataScheme": { "type": "string" },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/related_identifier.json` around lines 15 - 17, The JSON schema property "relatedMetaDataScheme" is misspelled (capital "D") and must be renamed to "relatedMetadataScheme" to match the conditional checks elsewhere; update the key in app/models/schemas/doi/related_identifier.json from "relatedMetaDataScheme" to "relatedMetadataScheme" and search/update any other occurrences (e.g., in validation/conditional blocks that reference relatedMetadataScheme) so the schema and all references use the exact same identifier.app/models/schemas/doi/formats.json (1)
5-7:⚠️ Potential issue | 🟠 MajorIncorrect
$ref- referencessize.jsoninstead offormat.json.This schema is named "Formats" but the items reference
size.json. This appears to be a copy-paste error fromsizes.json. Should referenceformat.jsonto match the semantic intent.🐛 Proposed fix
"items": { - "$ref": "size.json" + "$ref": "format.json" }Run the following script to verify if
format.jsonexists:#!/bin/bash # Check if format.json exists alongside size.json fd -t f 'format.json' app/models/schemas/doi/ fd -t f 'size.json' app/models/schemas/doi/🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/formats.json` around lines 5 - 7, The "Formats" schema's items currently reference "size.json" (copy-paste error); update the "items" $ref in app/models/schemas/doi/formats.json from "size.json" to "format.json" so the schema correctly points to the format definition, and confirm that format.json exists alongside size.json; adjust any callers expecting the old reference if needed (look for the "Formats" schema and its "items" property to locate the change).app/models/doi.rb (1)
125-127:⚠️ Potential issue | 🟠 MajorUse method-safe attribute access in
validate_json_attribute?.The guard currently relies on
self[attribute], which fails for method-backed fields likeraw_languageandraw_types. These methods returnself[:language]andself[:types]respectively, but the current check attempts direct column access viaself[:raw_language]andself[:raw_types], which don't exist as columns. This breaks validation for these attributes. Usepublic_sendwithhas_attribute?fallback to properly access both method-backed and column-backed attributes.Proposed fix
def validate_json_attribute?(attribute) - validatable? && !self[attribute].nil? && !INVALID_SCHEMAS.include?(self.schema_version) + value = + if respond_to?(attribute) + public_send(attribute) + elsif has_attribute?(attribute) + self[attribute] + end + + validatable? && value.present? && !INVALID_SCHEMAS.include?(schema_version) end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/doi.rb` around lines 125 - 127, The guard in validate_json_attribute?(attribute) uses self[attribute] which fails for method-backed fields like raw_language and raw_types; change the nil-check to obtain the attribute value via public_send(attribute) and fall back to column access with has_attribute?(attribute) ? self[attribute] : public_send(attribute) (or vice versa) so both method-backed and column-backed attributes are handled, then use that value in the existing validatable? and INVALID_SCHEMAS checks; keep the method name validate_json_attribute?, and preserve the schema/version guard (INVALID_SCHEMAS and validatable?).
🧹 Nitpick comments (6)
spec/requests/datacite_dois/post_geo_spec.rb (2)
304-1129: Geo test fixtures are heavily duplicated across contexts. The repeated geo payload/expectation blocks are costly to maintain and easy to drift. Extract shared builders (
base_attributes, geo_location_box, geo_location_polygon) and use parameterized/shared examples per geo variant.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/requests/datacite_dois/post_geo_spec.rb` around lines 304 - 1129, The spec repeats large geo payloads inside multiple context blocks and let(:valid_attributes); refactor by extracting shared builders (e.g. define helpers like base_attributes, geo_location_box, geo_location_point, geo_location_polygon) and use them inside the existing contexts that call let(:valid_attributes) and the tests that assert json.dig("data", "attributes", "geoLocations"), and replace duplicated expectation arrays with shared_examples or parameterized examples that accept the geo variant (single geo, all properties, array of geo locations); update contexts that reference geoLocations to call the shared builders and the shared_examples to reduce duplication and keep assertions DRY.
300-300: XML geoLocationBox assertions are brittle due to concatenated string matching. Comparing the full concatenated text (including sign/format artifacts) is fragile. Prefer asserting each child node value independently (or parse as Float) to avoid failures from harmless serialization/order formatting changes.
Also applies to: 731-731
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/requests/datacite_dois/post_geo_spec.rb` at line 300, The test currently asserts the entire concatenated string from doc.at_css("geoLocations").content which is brittle; instead locate the individual child nodes (e.g., the geoLocationBox children like westBoundLongitude, eastBoundLongitude, southBoundLatitude, northBoundLatitude) using doc.at_css or doc.css and assert each node's text independently (or parse each to Float and compare with expected values using exact equals or a small delta) to avoid failures from formatting/ordering changes; update the two failing assertions (the current expect on "geoLocations" and the similar one around line 731) to perform per-child assertions on those specific element names.app/models/schemas/doi/controlled_vocabularies/number_type.json (1)
4-18: Simplify duplicated constraints to reduce schema drift risk. The current schema enforces type boundaries twice (line 4 with `type: ["string", "null"]` and lines 5–18 with `anyOf`), which is valid but increases maintenance burden. Per JSON Schema Draft 2020-12, `enum` arrays may legally contain mixed types (including `null`), making a single `enum` with all allowed values functionally equivalent and clearer.
♻️ Proposed simplification
 {
   "title": "NumberType",
   "$schema": "https://json-schema.org/draft/2020-12/schema",
-  "type": ["string", "null"],
-  "anyOf": [
-    {
-      "type": "string",
-      "enum": [
-        "Article",
-        "Chapter",
-        "Report",
-        "Other"
-      ]
-    },
-    {
-      "type": "null"
-    }
-  ]
+  "enum": [
+    "Article",
+    "Chapter",
+    "Report",
+    "Other",
+    null
+  ]
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/controlled_vocabularies/number_type.json` around lines 4 - 18, Remove the duplicated type constraints in the number_type.json schema by replacing the top-level "type": ["string","null"] and the "anyOf" block with a single "enum" that lists the allowed string values plus null (e.g., ["Article","Chapter","Report","Other", null]); update the schema so only the "enum" remains (remove the "type" and "anyOf" properties) to simplify validation and avoid drift.
app/models/schemas/doi/in_polygon_point.json (1)
5-17: Consider deduplicating this schema with polygon_point.json. This file currently duplicates the same object contract, which can drift over time. Reusing a single shared schema would reduce maintenance risk.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/in_polygon_point.json` around lines 5 - 17, The schema in in_polygon_point.json duplicates polygon_point.json; replace the duplicated object with a single shared reference by removing the local "properties"/"required"/"additionalProperties" block and instead $ref'ing the existing polygon_point.json (or extract the shared object to a new polygon_point.json if needed), ensuring the schema still validates pointLatitude and pointLongitude and preserving additionalProperties: false; update any consumers to reference the single shared schema to avoid drift.
app/models/schemas/doi/controlled_vocabularies/title_type.json (1)
4-13: Consider simplifying nullable enum pattern. The combination of `type: ["string", "null"]` with `anyOf` is redundant. JSON Schema 2020-12 supports a cleaner pattern for nullable enums.
♻️ Proposed simplification
 {
   "title": "TitleType",
   "$schema": "https://json-schema.org/draft/2020-12/schema",
-  "type": ["string", "null"],
-  "anyOf": [
-    {
-      "type": "string",
-      "enum": ["AlternativeTitle", "Subtitle", "TranslatedTitle", "Other"]
-    },
-    {
-      "type": "null"
-    }
-  ]
+  "enum": ["AlternativeTitle", "Subtitle", "TranslatedTitle", "Other", null]
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/controlled_vocabularies/title_type.json` around lines 4 - 13, The schema for title_type.json uses a redundant combination of "type": ["string","null"] and an anyOf block; replace the anyOf with a single "enum" that includes null so the schema becomes a single concise definition: keep the enum values "AlternativeTitle", "Subtitle", "TranslatedTitle", "Other" and add null to the enum, removing the anyOf block (reference the title_type.json schema and its enum values to locate where to change).
spec/requests/datacite_dois/post_spec.rb (1)
5-5: Remove the debug-only `pp` require from request specs. This looks like leftover debugging and adds noise to the test file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/requests/datacite_dois/post_spec.rb` at line 5, Remove the debug-only require by deleting the literal require "pp" from the spec file (post_spec.rb) so the test suite no longer includes the pretty-print debug dependency; update the file to rely on standard test helpers and run the request specs to confirm no behavior changed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/models/doi.rb`:
- Line 175: The validator is registering :geolocations but the model's attribute
is :geo_locations, so update the validation call in DOI to use :geo_locations
(and also update the schema name passed to schema_file_path from "geolocations"
to "geo_locations" if your JSON schema file is named accordingly) so that
validate_json_attribute?(:geo_locations) and the JSON schema validation for
geolocation run correctly; look for the validates line in app/models/doi.rb and
adjust the attribute symbol and schema_file_path argument to the consistent
"geo_locations" key.
- Around line 5-6: Remove the top-level debugger and stdlib requires from the
DOI model: delete require "byebug" and require "pp" from the file that defines
the Doi class, and instead add debugger gems to the development/test groups in
the Gemfile or require them only in scripts/console/debugging entrypoints; if
pretty-printing is needed in code, require "pp" locally inside the specific
method or use Rails.logger/inspect for production-safe logging. Ensure no other
code in the Doi class (class Doi) depends on those top-level requires so the
model can load without those gems present.
In `@app/models/schemas/client/subjects.json`:
- Line 17: The subjects.json schema is still using draft-04 while it $ref's
../doi/language.json (draft-2020-12), causing cross-draft validation issues;
update the local schema by setting the top-level "$schema" in subjects.json to
"https://json-schema.org/draft/2020-12/schema" and then validate and adjust any
draft-04-specific keywords/usages in subjects.json to be compliant with
draft-2020-12 so the "$ref": "../doi/language.json" resolves and validates
consistently.
In `@app/models/schemas/doi/controlled_vocabularies/contributor_type.json`:
- Around line 4-36: The shared controlled vocabulary schema currently allows
null by including "null" in the "type" array and a null branch in "anyOf";
remove the nullability from this shared enum schema by deleting "null" from the
"type" definition and removing the {"type":"null"} branch in the "anyOf" so the
schema only permits the enumerated strings (leave the "enum" list intact), and
update any parent schemas that need to allow null to model that nullability at
the parent property level instead of in contributor_type.json.
In `@app/models/schemas/doi/geo_location_polygon.json`:
- Around line 8-9: The polygon schema contains unresolved $ref values
"polygon_point" and "in_polygon_point" (and the other similar refs later in the
file) that lack the ".json" suffix used elsewhere; update those $ref entries in
app/models/schemas/doi/geo_location_polygon.json (e.g., the entries currently
set to { "$ref": "polygon_point" } and { "$ref": "in_polygon_point" } and the
two similar refs further down) to use the correct filenames with ".json" (for
example "polygon_point.json" and "in_polygon_point.json") so reference
resolution and polygon validation succeed.
In `@app/models/schemas/doi/point_latitude.json`:
- Line 4: Update the numeric types and bounds for geographic coordinate schemas:
change point_latitude.json so the "type" is numeric-only (remove "string") and
add numeric constraints "minimum": -90 and "maximum": 90, and likewise change
point_longitude.json to type number with "minimum": -180 and "maximum": 180;
ensure these files remain referenced by GeoLocationPoint, PolygonPoint,
InPolygonPoint, and GeoLocationBox so the stricter numeric-only validation and
coordinate bounds apply across those schemas.
In `@app/models/schemas/doi/point_longitude.json`:
- Line 4: The longitude schema in point_longitude.json currently allows any
number or string because it only sets "type": ["number","string"]; update the
schema so numeric values are constrained to minimum: -180 and maximum: 180 and
string values are allowed only if they represent a numeric value within
[-180,180] (e.g. replace the loose "type" with an anyOf/oneOf that has one
branch for type: "number" with minimum -180 and maximum 180 and another branch
for type: "string" with a regex pattern that only matches numeric strings in the
same range); this will ensure point_longitude.json enforces valid longitude
bounds while preserving string-compatible input.
In `@app/models/schemas/doi/publication_year.json`:
- Around line 12-18: The JSON Schema in publication_year.json uses invalid
keywords inclusiveMinimum and inclusiveMaximum (they're not part of draft
2020-12); remove those two properties from the schema so that the existing
minimum and maximum (1000 and 3000) remain as inclusive bounds, or if you
intended exclusive bounds replace them with exclusiveMinimum: true and/or
exclusiveMaximum: true; update the object that defines the "type": "integer",
"minimum", and "maximum" accordingly.
In `@app/models/schemas/doi/related_item_identifier.json`:
- Around line 15-17: The property name in the JSON schema is using inconsistent
casing: "schemeURI" should be renamed to "schemeUri" to match the rest of the
API; update the property key in this schema from schemeURI to schemeUri, and
search for and update any references or example fixtures/validators that expect
"schemeURI" so they use "schemeUri" instead to maintain consistency across
schemas and avoid validation failures.
In
`@spec/fixtures/vcr_cassettes/DataciteDoisController/PATCH_/dois/_id/when_the_title_is_changed_wrong_format/error.yml`:
- Around line 22-33: The recorded cassette for the
"when_the_title_is_changed_wrong_format" scenario now contains an authentication
failure (code: 401 and a Www-Authenticate header) instead of the expected
format-validation failure; update the fixture so the PATCH /dois/_id cassette
returns the validation error (e.g., 422 or the API's format-error code) with the
validation message payload rather than the 401 Unauthorized response—either
re-record the cassette while ensuring requests include valid auth or manually
replace the error.yml contents (remove the Www-Authenticate header and set code
and message to the format-validation error) so tests assert metadata validation
behavior.
In `@spec/requests/datacite_dois/post_spec.rb`:
- Around line 2684-2688: The before hook is turning off VCR and allowing net
connections (VCR.eject_cassette, VCR.turn_off!, WebMock.allow_net_connect!),
which leaks global test state; replace the before block with an around hook that
saves current VCR and WebMock state, yields the example, and then restores VCR
(re-enable and re-insert cassette if needed) and WebMock settings after the
example so global state is always restored even on failure — update the example
group to use an around hook that wraps the example execution and toggles
VCR.turn_off!/VCR.turn_on! and
WebMock.allow_net_connect!/WebMock.disable_net_connect! symmetrically.
---
Outside diff comments:
In `@spec/requests/datacite_dois/post_geo_spec.rb`:
- Around line 140-144: The spec payloads use string numerics for geo fields
(e.g., "geoLocationPoint" with "pointLatitude"/"pointLongitude") which masks
type validation; update the valid fixtures in
spec/requests/datacite_dois/post_geo_spec.rb to use numeric literals for
pointLatitude and pointLongitude (and other listed fixtures at the referenced
ranges) so expectations assert true numeric types, and add a new negative
example in the same spec that submits pointLatitude/pointLongitude as strings
and asserts a 422 response; locate and update the payloads that construct
"geoLocationPoint" and add the failing-case test alongside existing positive
cases.
---
Duplicate comments:
In `@app/models/doi.rb`:
- Around line 125-127: The guard in validate_json_attribute?(attribute) uses
self[attribute] which fails for method-backed fields like raw_language and
raw_types; change the nil-check to obtain the attribute value via
public_send(attribute) and fall back to column access with
has_attribute?(attribute) ? self[attribute] : public_send(attribute) (or vice
versa) so both method-backed and column-backed attributes are handled, then use
that value in the existing validatable? and INVALID_SCHEMAS checks; keep the
method name validate_json_attribute?, and preserve the schema/version guard
(INVALID_SCHEMAS and validatable?).
In `@app/models/schemas/doi/affiliation.json`:
- Around line 19-21: Remove the "dependentRequired" constraint from the
affiliation schema to match XSD-compatible behavior: delete the
dependentRequired block that ties "affiliationIdentifier" to
"affiliationIdentifierScheme" in app/models/schemas/doi/affiliation.json so the
schema no longer rejects records that omit the scheme; ensure no other code
relies on the "dependentRequired" rule for affiliationIdentifier before
committing.
In `@app/models/schemas/doi/contributor.json`:
- Around line 18-41: The schema currently lists "name" as required and leaves
"contributorType" optional/nullable; change the required fields so
"contributorType" is required and "name" is optional to match prior decisions.
Edit the JSON schema's "required" array to include "contributorType" (and remove
"name"), and verify the "contributorType" definition (the oneOf ref to
controlled_vocabularies/contributor_type.json and the null option) is consistent
with whether null is allowed when required; adjust if you need to enforce
non-null values for contributorType in the "contributorType" oneOf definition.
In `@app/models/schemas/doi/date.json`:
- Around line 12-14: The schema currently forces "dateInformation" to be a
string and rejects payloads where dateInformation is null; update the
dateInformation schema to accept null as well (e.g., change the type to allow
both "string" and "null" or use an anyOf with string and null) so existing
fixtures/requests with null pass validation; update the "dateInformation" entry
accordingly and run the provided grep search to verify existing fixtures/specs
that use null.
In `@app/models/schemas/doi/formats.json`:
- Around line 5-7: The "Formats" schema's items currently reference "size.json"
(copy-paste error); update the "items" $ref in
app/models/schemas/doi/formats.json from "size.json" to "format.json" so the
schema correctly points to the format definition, and confirm that format.json
exists alongside size.json; adjust any callers expecting the old reference if
needed (look for the "Formats" schema and its "items" property to locate the
change).
In `@app/models/schemas/doi/geo_location.json`:
- Around line 19-23: The schema's top-level "required" list wrongly requires
pointLatitude and pointLongitude (which are nested under geoLocationPoint) and
with "additionalProperties": false will reject valid documents; update the
schema so the top-level "required" contains the correct top-level keys (e.g.,
"geoLocationPoint" if that object is mandatory) and move/ensure "pointLatitude"
and "pointLongitude" are required inside the geoLocationPoint subschema (or
remove top-level additionalProperties restriction if you intend to allow those
nested properties at top level); reference the "required",
"additionalProperties", "geoLocationPoint", "pointLatitude", and
"pointLongitude" symbols when making the change.
In `@app/models/schemas/doi/name_identifier.json`:
- Around line 9-18: The dependentRequired currently references a non-existent
property affiliationIdentifier; change the dependency to reference the actual
field nameIdentifier (i.e., set dependentRequired: { "nameIdentifier":
["nameIdentifierScheme"] }) so the dependency fires correctly, and tighten
validation by removing "null" from the type arrays for nameIdentifier and
nameIdentifierScheme (make them plain "string") so identifier fields must be
non-null strings.
In `@app/models/schemas/doi/related_identifier.json`:
- Around line 25-57: Remove the conditional "if"/"then"/"else" block entirely
and instead declare relatedMetadataScheme, schemeUri, and schemeType as
top-level properties in the schema's main "properties" object (each as "type":
"string"), leaving dependentRequired and the existing required relationships
(e.g. relatedIdentifier -> relatedIdentifierType, relationType) intact; this
ensures payloads with those fields are accepted while keeping
"additionalProperties": false and matching the XSD parity (also keep
relationType and relatedIdentifier-related fields as before).
- Around line 15-17: The JSON schema property "relatedMetaDataScheme" is
misspelled (capital "D") and must be renamed to "relatedMetadataScheme" to match
the conditional checks elsewhere; update the key in
app/models/schemas/doi/related_identifier.json from "relatedMetaDataScheme" to
"relatedMetadataScheme" and search/update any other occurrences (e.g., in
validation/conditional blocks that reference relatedMetadataScheme) so the
schema and all references use the exact same identifier.
In `@app/models/schemas/doi/related_item.json`:
- Around line 64-111: Remove the JSON Schema conditional that branches on
relationType ("HasMetadata"/"IsMetadataFor") — specifically delete the entire
"if"/"then"/"else" block and replace it with a single unconditional definition
for the relatedItemIdentifier property that includes the union of fields
currently only allowed in the "then" branch (relatedItemIdentifier,
relatedItemIdentifierType, relatedMetadataScheme, schemeURI, schemeType) while
keeping type "object", the existing $ref for relatedItemIdentifierType, and
"additionalProperties": false so the schema matches the XSD behavior; update any
references to relationType-based gating so relatedItemIdentifier is always
validated with the extended properties.
In `@app/models/schemas/doi/resource_type.json`:
- Around line 9-11: The schema's required array currently includes
"resourceType" and "resourceTypeGeneral", which wrongly makes resourceType
mandatory; update the required array in resource_type.json to only include
"resourceTypeGeneral" (remove "resourceType") so payloads that supply only
resourceTypeGeneral validate correctly.
In `@app/models/schemas/doi/version.json`:
- Line 4: The schema's "version" field currently allows both number and string
types; change the "version" property in the JSON schema (the "version" key in
the DOI schema) to be string-only by removing "number" from its "type" entry
(i.e., make the type either a single "string" or an array containing only
"string") so the field enforces string values consistently.
---
Nitpick comments:
In `@app/models/schemas/doi/controlled_vocabularies/number_type.json`:
- Around line 4-18: Remove the duplicated type constraints in the
number_type.json schema by replacing the top-level "type": ["string","null"] and
the "anyOf" block with a single "enum" that lists the allowed string values plus
null (e.g., ["Article","Chapter","Report","Other", null]); update the schema so
only the "enum" remains (remove the "type" and "anyOf" properties) to simplify
validation and avoid drift.
In `@app/models/schemas/doi/controlled_vocabularies/title_type.json`:
- Around line 4-13: The schema for title_type.json uses a redundant combination
of "type": ["string","null"] and an anyOf block; replace the anyOf with a single
"enum" that includes null so the schema becomes a single concise definition:
keep the enum values "AlternativeTitle", "Subtitle", "TranslatedTitle", "Other"
and add null to the enum, removing the anyOf block (reference the
title_type.json schema and its enum values to locate where to change).
In `@app/models/schemas/doi/in_polygon_point.json`:
- Around line 5-17: The schema in in_polygon_point.json duplicates
polygon_point.json; replace the duplicated object with a single shared reference
by removing the local "properties"/"required"/"additionalProperties" block and
instead $ref'ing the existing polygon_point.json (or extract the shared object
to a new polygon_point.json if needed), ensuring the schema still validates
pointLatitude and pointLongitude and preserving additionalProperties: false;
update any consumers to reference the single shared schema to avoid drift.
In `@spec/requests/datacite_dois/post_geo_spec.rb`:
- Around line 304-1129: The spec repeats large geo payloads inside multiple
context blocks and let(:valid_attributes); refactor by extracting shared
builders (e.g. define helpers like base_attributes, geo_location_box,
geo_location_point, geo_location_polygon) and use them inside the existing
contexts that call let(:valid_attributes) and the tests that assert
json.dig("data", "attributes", "geoLocations"), and replace duplicated
expectation arrays with shared_examples or parameterized examples that accept
the geo variant (single geo, all properties, array of geo locations); update
contexts that reference geoLocations to call the shared builders and the
shared_examples to reduce duplication and keep assertions DRY.
- Line 300: The test currently asserts the entire concatenated string from
doc.at_css("geoLocations").content which is brittle; instead locate the
individual child nodes (e.g., the geoLocationBox children like
westBoundLongitude, eastBoundLongitude, southBoundLatitude, northBoundLatitude)
using doc.at_css or doc.css and assert each node's text independently (or parse
each to Float and compare with expected values using exact equals or a small
delta) to avoid failures from formatting/ordering changes; update the two
failing assertions (the current expect on "geoLocations" and the similar one
around line 731) to perform per-child assertions on those specific element
names.
In `@spec/requests/datacite_dois/post_spec.rb`:
- Line 5: Remove the debug-only require by deleting the literal require "pp"
from the spec file (post_spec.rb) so the test suite no longer includes the
pretty-print debug dependency; update the file to rely on standard test helpers
and run the request specs to confirm no behavior changed.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
⛔ Files ignored due to path filters (1)
Gemfile.lock is excluded by `!**/*.lock`
📒 Files selected for processing (72)
Gemfile
app/models/doi.rb
app/models/schemas/client/subjects.json
app/models/schemas/doi/affiliation.json
app/models/schemas/doi/affiliations.json
app/models/schemas/doi/alternate_identifier.json
app/models/schemas/doi/alternate_identifiers.json
app/models/schemas/doi/contributor.json
app/models/schemas/doi/contributors.json
app/models/schemas/doi/controlled_vocabularies/contributor_type.json
app/models/schemas/doi/controlled_vocabularies/date_type.json
app/models/schemas/doi/controlled_vocabularies/description_type.json
app/models/schemas/doi/controlled_vocabularies/funder_identifier_type.json
app/models/schemas/doi/controlled_vocabularies/name_type.json
app/models/schemas/doi/controlled_vocabularies/number_type.json
app/models/schemas/doi/controlled_vocabularies/related_identifier_type.json
app/models/schemas/doi/controlled_vocabularies/related_item_type.json
app/models/schemas/doi/controlled_vocabularies/relation_type.json
app/models/schemas/doi/controlled_vocabularies/resource_type_general.json
app/models/schemas/doi/controlled_vocabularies/title_type.json
app/models/schemas/doi/creator.json
app/models/schemas/doi/creators.json
app/models/schemas/doi/date.json
app/models/schemas/doi/dates.json
app/models/schemas/doi/description.json
app/models/schemas/doi/descriptions.json
app/models/schemas/doi/format.json
app/models/schemas/doi/formats.json
app/models/schemas/doi/funding_reference.json
app/models/schemas/doi/funding_references.json
app/models/schemas/doi/geo_location.json
app/models/schemas/doi/geo_location_box.json
app/models/schemas/doi/geo_location_place.json
app/models/schemas/doi/geo_location_point.json
app/models/schemas/doi/geo_location_polygon.json
app/models/schemas/doi/geo_locations.json
app/models/schemas/doi/identifier.json
app/models/schemas/doi/in_polygon_point.json
app/models/schemas/doi/language.json
app/models/schemas/doi/name_identifier.json
app/models/schemas/doi/name_identifiers.json
app/models/schemas/doi/point_latitude.json
app/models/schemas/doi/point_longitude.json
app/models/schemas/doi/polygon_point.json
app/models/schemas/doi/publication_year.json
app/models/schemas/doi/publisher.json
app/models/schemas/doi/related_identifier.json
app/models/schemas/doi/related_identifiers.json
app/models/schemas/doi/related_item.json
app/models/schemas/doi/related_item_identifier.json
app/models/schemas/doi/related_items.json
app/models/schemas/doi/resource_type.json
app/models/schemas/doi/rights.json
app/models/schemas/doi/rights_list.json
app/models/schemas/doi/size.json
app/models/schemas/doi/sizes.json
app/models/schemas/doi/subject.json
app/models/schemas/doi/subjects.json
app/models/schemas/doi/title.json
app/models/schemas/doi/titles.json
app/models/schemas/doi/version.json
app/validators/xml_schema_validator.rb
spec/concerns/indexable_spec.rb
spec/factories/doi.rb
spec/fixtures/files/datacite-user-example.json
spec/fixtures/files/nasa_error.json
spec/fixtures/vcr_cassettes/DataciteDoisController/PATCH_/dois/_id/when_the_title_is_changed_wrong_format/error.yml
spec/graphql/types/work_type_spec.rb
spec/models/doi_spec.rb
spec/requests/datacite_dois/patch_spec.rb
spec/requests/datacite_dois/post_geo_spec.rb
spec/requests/datacite_dois/post_spec.rb
| require "byebug" | ||
| require "pp" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify where byebug is required and whether it is scoped to non-production groups.
fd -a "Gemfile" "gems.rb" "Gemfile.lock" --type f
echo "---"
echo "=== Checking Gemfile for byebug gem group configuration ==="
rg -n "byebug|debug" Gemfile gems.rb Gemfile.lock 2>/dev/null || echo "No matches found"
echo "---"
echo "=== Checking for require statements in app/ and spec/ ==="
rg -n 'require "byebug"|require "pp"' app spec 2>/dev/null || echo "No require statements found"
echo "---"
echo "=== Checking app/models/doi.rb directly ==="
head -20 app/models/doi.rb
Repository: datacite/lupo
Length of output: 1609
Remove byebug require from model code.
Requiring debugger gems in production model code causes boot failures in environments where the gem is not installed and is a release blocker. The require "pp" should also be removed—standard library utilities should not be required at class definition time.
Fix
-require "byebug"
-require "pp"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| require "byebug" | |
| require "pp" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/models/doi.rb` around lines 5 - 6, Remove the top-level debugger and
stdlib requires from the DOI model: delete require "byebug" and require "pp"
from the file that defines the Doi class, and instead add debugger gems to the
development/test groups in the Gemfile or require them only in
scripts/console/debugging entrypoints; if pretty-printing is needed in code,
require "pp" locally inside the specific method or use Rails.logger/inspect for
production-safe logging. Ensure no other code in the Doi class (class Doi)
depends on those top-level requires so the model can load without those gems
present.
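If console debugging is occasionally needed, a defensive pattern keeps boot from failing when the gem is absent. This is only a sketch (the env-var check and helper name are illustrative; the real fix is simply moving byebug into the Gemfile's development/test groups):

```ruby
# Attempt an optional require; a missing gem must never crash boot.
def try_require(name)
  require name
  true
rescue LoadError
  false
end

# Only worth attempting outside production; byebug may or may not be installed.
debugger_loaded =
  ENV["RAILS_ENV"] == "production" ? false : try_require("byebug")
puts "debugger available: #{debugger_loaded}"
```

Unlike a bare top-level `require "byebug"`, this degrades gracefully when the gem is excluded from the production bundle.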
| validates :version, if: proc { |doi| doi.validate_json_attribute?(:version) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("version") } }, unless: :only_validate | ||
| validates :rights_list, if: proc { |doi| doi.validate_json_attribute?(:rights_list) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("rights_list") } }, unless: :only_validate | ||
| validates :descriptions, if: proc { |doi| doi.validate_json_attribute?(:descriptions) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("descriptions") } }, unless: :only_validate | ||
| validates :geolocations, if: proc { |doi| doi.validate_json_attribute?(:geolocations) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("geolocations") } }, unless: :only_validate |
Fix geolocation validator key mismatch.
Line 175 validates :geolocations, but this model uses geo_locations. This typo can prevent geolocation JSON-schema validation from running.
🔧 Proposed fix
- validates :geolocations, if: proc { |doi| doi.validate_json_attribute?(:geolocations) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("geolocations") } }, unless: :only_validate
| validates :geo_locations, if: proc { |doi| doi.validate_json_attribute?(:geo_locations) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("geolocations") } }, unless: :only_validate |
📝 Committable suggestion
| validates :geolocations, if: proc { |doi| doi.validate_json_attribute?(:geolocations) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("geolocations") } }, unless: :only_validate | |
| validates :geo_locations, if: proc { |doi| doi.validate_json_attribute?(:geo_locations) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("geolocations") } }, unless: :only_validate |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/models/doi.rb` at line 175, The validator is registering :geolocations
but the model's attribute is :geo_locations, so update the validation call in
DOI to use :geo_locations (and also update the schema name passed to
schema_file_path from "geolocations" to "geo_locations" if your JSON schema file
is named accordingly) so that validate_json_attribute?(:geo_locations) and the
JSON schema validation for geolocation run correctly; look for the validates
line in app/models/doi.rb and adjust the attribute symbol and schema_file_path
argument to the consistent "geo_locations" key.
| { "$ref": "polygon_point" }, | ||
| { "$ref": "in_polygon_point" } |
Fix unresolved $ref paths in polygon rules.
Line 8, Line 9, Line 14, and Line 18 reference schema names without .json, unlike the rest of this schema set. This can break reference resolution and make polygon validation fail.
Proposed fix
- { "$ref": "polygon_point" },
- { "$ref": "in_polygon_point" }
+ { "$ref": "polygon_point.json" },
+ { "$ref": "in_polygon_point.json" }
@@
- "contains": { "$ref": "polygon_point" },
+ "contains": { "$ref": "polygon_point.json" },
@@
- "contains": { "$ref": "in_polygon_point" },
+ "contains": { "$ref": "in_polygon_point.json" },Also applies to: 14-18
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/models/schemas/doi/geo_location_polygon.json` around lines 8 - 9, The
polygon schema contains unresolved $ref values "polygon_point" and
"in_polygon_point" (and the other similar refs later in the file) that lack the
".json" suffix used elsewhere; update those $ref entries in
app/models/schemas/doi/geo_location_polygon.json (e.g., the entries currently
set to { "$ref": "polygon_point" } and { "$ref": "in_polygon_point" } and the
two similar refs further down) to use the correct filenames with ".json" (for
example "polygon_point.json" and "in_polygon_point.json") so reference
resolution and polygon validation succeed.
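Why the missing `.json` suffix matters can be illustrated with a minimal file-based resolver. This is an assumption about the resolution mechanism (the actual validator gem's logic may differ), but relative-file refs generally must match the neighbouring filenames exactly:

```ruby
require "json"
require "tmpdir"

# Minimal resolver sketch: treat a relative $ref as a filename that lives
# next to the referencing schema.
def resolve_ref(base_dir, ref)
  path = File.join(base_dir, ref)
  return nil unless File.exist?(path)

  JSON.parse(File.read(path))
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "polygon_point.json"), '{"type": "object"}')

  p resolve_ref(dir, "polygon_point")       # fails: no file without ".json"
  p resolve_ref(dir, "polygon_point.json")  # resolves to the parsed schema
end
```

With this model, `{ "$ref": "polygon_point" }` silently resolves to nothing while `{ "$ref": "polygon_point.json" }` loads the sibling schema, which is exactly the failure mode the review describes.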
| { | ||
| "title": "PointLongitude", | ||
| "$schema": "https://json-schema.org/draft/2020-12/schema", | ||
| "type": [ "number", "string" ] |
Longitude lacks range validation.
Line 4 allows values outside valid longitude bounds. Please constrain accepted values to [-180, 180] (including string-form numeric input if compatibility is needed).
Suggested fix
- "type": [ "number", "string" ]
+ "oneOf": [
+ {
+ "type": "number",
+ "minimum": -180,
+ "maximum": 180
+ },
+ {
+ "type": "string",
+ "pattern": "^-?(?:180(?:\\.0+)?|(?:1[0-7]\\d|\\d?\\d)(?:\\.\\d+)?)$"
+ }
+ ]

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- "type": [ "number", "string" ]
+ "oneOf": [
+   {
+     "type": "number",
+     "minimum": -180,
+     "maximum": 180
+   },
+   {
+     "type": "string",
+     "pattern": "^-?(?:180(?:\\.0+)?|(?:1[0-7]\\d|\\d?\\d)(?:\\.\\d+)?)$"
+   }
+ ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/models/schemas/doi/point_longitude.json` at line 4, The longitude schema
in point_longitude.json currently allows any number or string because it only
sets "type": ["number","string"]; update the schema so numeric values are
constrained to minimum: -180 and maximum: 180 and string values are allowed only
if they represent a numeric value within [-180,180] (e.g. replace the loose
"type" with an anyOf/oneOf that has one branch for type: "number" with minimum
-180 and maximum 180 and another branch for type: "string" with a regex pattern
that only matches numeric strings in the same range); this will ensure
point_longitude.json enforces valid longitude bounds while preserving
string-compatible input.
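For reference, the proposed oneOf constraint can be exercised without a schema validator; the hand-rolled check below mirrors its logic (the function name is made up for this sketch, and the regex is copied from the suggestion above):

```python
import re

# String branch of the proposed oneOf: numeric strings within [-180, 180].
LON_STRING = re.compile(r"^-?(?:180(?:\.0+)?|(?:1[0-7]\d|\d?\d)(?:\.\d+)?)$")

def valid_longitude(value):
    if isinstance(value, bool):  # bool subclasses int in Python; exclude it
        return False
    if isinstance(value, (int, float)):
        return -180 <= value <= 180
    if isinstance(value, str):
        return bool(LON_STRING.match(value))
    return False

for v in (179.9, -180, 180.1, "12.5", "-181", "abc"):
    print(v, valid_longitude(v))
# → True, True, False, True, False, False
```

The regex accepts 0–179 with optional decimals plus exactly 180 (optionally with trailing zeros), so "-181" and "180.1" are rejected in both branches.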
{
  "type": "integer",
  "minimum": 1000,
  "maximum": 3000,
  "inclusiveMinimum": true,
  "inclusiveMaximum": true
}
🧩 Analysis chain
🏁 Script executed:
cat -n app/models/schemas/doi/publication_year.json

Repository: datacite/lupo
Length of output: 715
🌐 Web query:
JSON Schema draft 2020-12 inclusiveMinimum inclusiveMaximum keywords specification
💡 Result:
In JSON Schema draft 2020-12, the keywords inclusiveMinimum and inclusiveMaximum are not part of the specification. Numeric range constraints are expressed with:
- minimum: inclusive lower bound (instance must be ≥ minimum) [1]
- maximum: inclusive upper bound (instance must be ≤ maximum) [1]
- exclusiveMinimum: exclusive lower bound (instance must be > exclusiveMinimum) and its value MUST be a number [1]
- exclusiveMaximum: exclusive upper bound (instance must be < exclusiveMaximum) and its value MUST be a number [1]
Note: In older specs (e.g., the IETF draft that corresponds to draft-04-era behavior), exclusiveMinimum/exclusiveMaximum were boolean modifiers of minimum/maximum rather than independent numeric bounds [2].
Sources
[1] JSON Schema 2020-12 Validation vocabulary (numeric keywords: minimum/maximum/exclusiveMinimum/exclusiveMaximum) (json-schema.org)
[2] Older (draft-04-era) behavior: exclusiveMinimum/exclusiveMaximum as booleans (datatracker.ietf.org)
Citations:
- 1: https://json-schema.org/draft/2020-12/json-schema-validation?utm_source=openai
- 2: https://datatracker.ietf.org/doc/html/draft-wright-json-schema-validation-00?utm_source=openai
Invalid JSON Schema keywords: inclusiveMinimum and inclusiveMaximum.
These keywords do not exist in JSON Schema draft 2020-12 (as declared on line 3) and will be silently ignored by validators. In draft 2020-12, minimum and maximum are inclusive by default. Use exclusiveMinimum and exclusiveMaximum for exclusive bounds.
🐛 Proposed fix
{
"type": "integer",
"minimum": 1000,
- "maximum": 3000,
- "inclusiveMinimum": true,
- "inclusiveMaximum": true
+ "maximum": 3000
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/models/schemas/doi/publication_year.json` around lines 12 - 18, The JSON
Schema in publication_year.json uses invalid keywords inclusiveMinimum and
inclusiveMaximum (they're not part of draft 2020-12); remove those two
properties from the schema so that the existing minimum and maximum (1000 and
3000) remain as inclusive bounds, or if you intended exclusive bounds replace
them with exclusiveMinimum: true and/or exclusiveMaximum: true; update the
object that defines the "type": "integer", "minimum", and "maximum" accordingly.
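To illustrate the inclusive-by-default behavior, here is a hand-rolled check standing in for a real draft 2020-12 validator (the helper name is illustrative; a conforming validator would simply ignore the two bogus keywords):

```python
# minimum/maximum in draft 2020-12 are inclusive, so the bounds themselves pass.
schema = {"type": "integer", "minimum": 1000, "maximum": 3000}

def valid_year(value, schema=schema):
    """Return True when value is an integer within the inclusive bounds."""
    return (isinstance(value, int) and not isinstance(value, bool)
            and schema["minimum"] <= value <= schema["maximum"])

print([valid_year(y) for y in (999, 1000, 3000, 3001)])  # → [False, True, True, False]
```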
"schemeURI": {
  "type": "string"
},
Use consistent property name schemeUri (not schemeURI).
Line 15 introduces a casing mismatch that can reject valid metadata and create inconsistent API behavior across schemas.
Suggested fix
- "schemeURI": {
+ "schemeUri": {
"type": "string"
},

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- "schemeURI": {
+ "schemeUri": {
    "type": "string"
  },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/models/schemas/doi/related_item_identifier.json` around lines 15 - 17,
The property name in the JSON schema is using inconsistent casing: "schemeURI"
should be renamed to "schemeUri" to match the rest of the API; update the
property key in this schema from schemeURI to schemeUri, and search for and
update any references or example fixtures/validators that expect "schemeURI" so
they use "schemeUri" instead to maintain consistency across schemas and avoid
validation failures.
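A case-insensitive scan is one way to catch this class of mismatch across schema files before it ships. The sketch below uses hypothetical, hard-coded property lists rather than reading the real files:

```python
from collections import defaultdict

# Illustrative property sets from two schema files (names are assumptions).
schemas = {
    "related_item_identifier.json": ["relatedItemIdentifier", "schemeURI", "schemeType"],
    "related_identifier.json": ["relatedIdentifier", "schemeUri", "schemeType"],
}

# Group property names by their case-folded form; more than one spelling
# per group indicates a casing inconsistency across the schema set.
by_fold = defaultdict(set)
for fname, props in schemas.items():
    for prop in props:
        by_fold[prop.lower()].add(prop)

clashes = {k: sorted(v) for k, v in by_fold.items() if len(v) > 1}
print(clashes)  # → {'schemeuri': ['schemeURI', 'schemeUri']}
```

Running the same grouping over every properties block under app/models/schemas/doi/ would flag any remaining schemeURI/schemeUri splits in one pass.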
code: 401
message: Unauthorized
headers:
  Date:
  - Wed, 04 May 2022 17:40:51 GMT
  Content-Type:
  - application/json; charset=utf-8
  - Thu, 22 May 2025 13:11:35 GMT
  Content-Length:
  - '0'
  Connection:
  - keep-alive
  Status:
  - 200 OK
  Cache-Control:
  - max-age=0, private, must-revalidate
  Vary:
  - Origin
  Etag:
  - W/"4779b1c8cb81d39142cef21ab1d60307"
  X-Runtime:
  - '0.261111'
  X-Request-Id:
  - eda1eb1c-86ed-44bc-abb8-e241e0c9da76
  X-Powered-By:
  - Phusion Passenger 5.2.3
  Server:
  - nginx/1.12.2 + Phusion Passenger 5.2.3
  Content-Encoding:
  - gzip
  Www-Authenticate:
  - Basic realm="handle", Handle sessionId="node0jpq58pljxzd91vdly5e1oc72o5",
    nonce="XJEPmnoCbjbSEdGkeuQcGQ==", error="Identity not verified"
Cassette now captures auth failure instead of the intended format-validation failure.
For a “wrong format” scenario, recording a 401 unauthorized response can make the test pass/fail for the wrong reason and stop asserting metadata validation behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@spec/fixtures/vcr_cassettes/DataciteDoisController/PATCH_/dois/_id/when_the_title_is_changed_wrong_format/error.yml`
around lines 22 - 33, The recorded cassette for the
"when_the_title_is_changed_wrong_format" scenario now contains an authentication
failure (code: 401 and a Www-Authenticate header) instead of the expected
format-validation failure; update the fixture so the PATCH /dois/_id cassette
returns the validation error (e.g., 422 or the API's format-error code) with the
validation message payload rather than the 401 Unauthorized response—either
re-record the cassette while ensuring requests include valid auth or manually
replace the error.yml contents (remove the Www-Authenticate header and set code
and message to the format-validation error) so tests assert metadata validation
behavior.
before do
  VCR.eject_cassette
  VCR.turn_off!
  WebMock.allow_net_connect!
end
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all VCR/WebMock state toggles in the spec file
rg -n "VCR\\.turn_off!|VCR\\.turn_on!|VCR\\.turned_off|WebMock\\.allow_net_connect!|WebMock\\.disable_net_connect!" spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo
Length of output: 181
🏁 Script executed:
#!/bin/bash
# Read the specific section with context (lines 2680-2700) to understand the structure
sed -n '2680,2700p' spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo
Length of output: 484
🏁 Script executed:
#!/bin/bash
# Look for any after hooks or teardown that might restore state around this section
sed -n '2620,2750p' spec/requests/datacite_dois/post_spec.rb | head -100

Repository: datacite/lupo
Length of output: 4656
🏁 Script executed:
#!/bin/bash
# Look at more context to find where the describe block ends and if there's an after hook
sed -n '2720,2850p' spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo
Length of output: 66
🏁 Script executed:
#!/bin/bash
# Search for any after hooks in the file around this section
rg -B 2 -A 2 "^\s*(after|end)" spec/requests/datacite_dois/post_spec.rb | sed -n '1,150p'

Repository: datacite/lupo
Length of output: 3604
🏁 Script executed:
#!/bin/bash
# Count describe and end to see structure, look for corresponding end to the describe block starting at 2629
wc -l spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo
Length of output: 102
🏁 Script executed:
#!/bin/bash
# Check the very end of the file
tail -5 spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo
Length of output: 95
🏁 Script executed:
#!/bin/bash
# Check if there's anything between line 2720 and end of file
sed -n '2715,2721p' spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo
Length of output: 227
🏁 Script executed:
#!/bin/bash
# Check if there are any after hooks at the top level or file level that might restore
rg "^\s*after\s" spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo
Length of output: 39
Avoid leaking global VCR/WebMock state from this example group.
VCR.turn_off! + WebMock.allow_net_connect! in the before hook are not restored, causing global state to leak and making later specs flaky.
Use an around hook instead to ensure state restoration:
Safer pattern
- before do
- VCR.eject_cassette
- VCR.turn_off!
- WebMock.allow_net_connect!
- end
+ around do |example|
+ VCR.turned_off do
+ WebMock.allow_net_connect!
+ example.run
+ ensure
+ WebMock.disable_net_connect!(allow_localhost: true)
+ end
+ end📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- before do
-   VCR.eject_cassette
-   VCR.turn_off!
-   WebMock.allow_net_connect!
- end
+ around do |example|
+   VCR.turned_off do
+     WebMock.allow_net_connect!
+     example.run
+   ensure
+     WebMock.disable_net_connect!(allow_localhost: true)
+   end
+ end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@spec/requests/datacite_dois/post_spec.rb` around lines 2684 - 2688, The
before hook is turning off VCR and allowing net connections (VCR.eject_cassette,
VCR.turn_off!, WebMock.allow_net_connect!), which leaks global test state;
replace the before block with an around hook that saves current VCR and WebMock
state, yields the example, and then restores VCR (re-enable and re-insert
cassette if needed) and WebMock settings after the example so global state is
always restored even on failure — update the example group to use an around hook
that wraps the example execution and toggles VCR.turn_off!/VCR.turn_on! and
WebMock.allow_net_connect!/WebMock.disable_net_connect! symmetrically.
@svogt0511 Drawing your attention to this. Probably needs an after or an around block to take care of this.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Actionable comments posted: 1
♻️ Duplicate comments (1)
app/models/schemas/doi/related_identifier.json (1)
18-18: ⚠️ Potential issue | 🔴 Critical

Fix relatedMetaDataScheme vs relatedMetadataScheme naming mismatch.

Line 18 uses relatedMetaDataScheme, while Line 37/Line 54 use relatedMetadataScheme. With additionalProperties: false (Line 63), this can reject the canonical field and/or validate the wrong one.

Proposed fix

- "relatedMetaDataScheme": {
+ "relatedMetadataScheme": {
    "type": "string"
  },
+ "schemeType": {
+   "type": "string"
+ },

#!/bin/bash
set -euo pipefail
python - <<'PY'
import json, pathlib
p = pathlib.Path("app/models/schemas/doi/related_identifier.json")
s = json.loads(p.read_text())
top = set(s.get("properties", {}).keys())
then_props = set(s.get("then", {}).get("properties", {}).keys())
else_props = set()
for clause in s.get("else", {}).get("not", {}).get("anyOf", []):
    else_props.update(clause.get("required", []))
print("Top-level keys:", sorted(top))
print("Then keys:", sorted(then_props))
print("Else-disallowed keys:", sorted(else_props))
print("Mismatched related metadata key names present:", "relatedMetaDataScheme" in top and "relatedMetadataScheme" in (then_props | else_props))
print("Keys referenced in conditionals but missing from top-level properties:", sorted((then_props | else_props) - top))
PY

Also applies to: 37-57
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/schemas/doi/related_identifier.json` at line 18, The schema has a naming mismatch between relatedMetaDataScheme and relatedMetadataScheme causing validation failures under additionalProperties: false; update the schema so the property name is consistent across the top-level "properties", the conditional "then" and "else" clauses, and any "required" arrays—prefer the canonical relatedMetadataScheme (change relatedMetaDataScheme to relatedMetadataScheme) or add an explicit alias property and mirror the conditional/required references to both names; ensure the "properties" object includes the final name(s) and that no conditional/required entries reference a name that isn't declared so validation succeeds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/models/schemas/doi/related_identifier.json`:
- Around line 28-50: The if/then branch currently only specifies types for
relatedMetadataScheme, schemeUri, schemeType, and relationTypeInformation when
relationType is "HasMetadata" or "IsMetadataFor" but doesn't enforce them;
update the "then" block to include a required array that lists
"relatedMetadataScheme", "schemeUri", "schemeType", and
"relationTypeInformation" so those properties are mandatory whenever
relationType equals HasMetadata or IsMetadataFor (referencing the relationType
enum and the then properties relatedMetadataScheme, schemeUri, schemeType,
relationTypeInformation).
---
Duplicate comments:
In `@app/models/schemas/doi/related_identifier.json`:
- Line 18: The schema has a naming mismatch between relatedMetaDataScheme and
relatedMetadataScheme causing validation failures under additionalProperties:
false; update the schema so the property name is consistent across the top-level
"properties", the conditional "then" and "else" clauses, and any "required"
arrays—prefer the canonical relatedMetadataScheme (change relatedMetaDataScheme
to relatedMetadataScheme) or add an explicit alias property and mirror the
conditional/required references to both names; ensure the "properties" object
includes the final name(s) and that no conditional/required entries reference a
name that isn't declared so validation succeeds.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5f3cd11d-353d-4147-8bbe-3457b38e3cfa
📒 Files selected for processing (6)
- app/models/schemas/doi/controlled_vocabularies/related_identifier_type.json
- app/models/schemas/doi/controlled_vocabularies/related_item_type.json
- app/models/schemas/doi/controlled_vocabularies/relation_type.json
- app/models/schemas/doi/controlled_vocabularies/resource_type_general.json
- app/models/schemas/doi/related_identifier.json
- app/models/schemas/doi/related_item.json
🚧 Files skipped from review as they are similar to previous changes (5)
- app/models/schemas/doi/controlled_vocabularies/related_item_type.json
- app/models/schemas/doi/related_item.json
- app/models/schemas/doi/controlled_vocabularies/resource_type_general.json
- app/models/schemas/doi/controlled_vocabularies/related_identifier_type.json
- app/models/schemas/doi/controlled_vocabularies/relation_type.json
…. Can we share the dois definition?
…led vocabulary term because checking happens prior to save and the API adds null as the value.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/models/schemas/doi/title.json`:
- Around line 6-8: The "title" schema currently only enforces type:string so
empty or whitespace-only titles are allowed; update the "title" property (in
app/models/schemas/doi/title.json) to require non-empty, non-blank values by
adding constraints such as "minLength": 1 and a "pattern" that requires at least
one non-whitespace character (e.g., \\\\S), and ensure the existing required
declaration that references "title" still applies; apply the same constraints
for any other title occurrence mentioned in the schema.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1c82f3ce-0cef-4020-b4ee-a276dd35c1ef
📒 Files selected for processing (12)
- app/models/schemas/doi/controlled_vocabularies/contributor_type.json
- app/models/schemas/doi/controlled_vocabularies/date_type.json
- app/models/schemas/doi/controlled_vocabularies/description_type.json
- app/models/schemas/doi/controlled_vocabularies/funder_identifier_type.json
- app/models/schemas/doi/controlled_vocabularies/name_type.json
- app/models/schemas/doi/controlled_vocabularies/number_type.json
- app/models/schemas/doi/controlled_vocabularies/related_identifier_type.json
- app/models/schemas/doi/controlled_vocabularies/related_item_type.json
- app/models/schemas/doi/controlled_vocabularies/relation_type.json
- app/models/schemas/doi/controlled_vocabularies/resource_type_general.json
- app/models/schemas/doi/controlled_vocabularies/title_type.json
- app/models/schemas/doi/title.json
🚧 Files skipped from review as they are similar to previous changes (8)
- app/models/schemas/doi/controlled_vocabularies/funder_identifier_type.json
- app/models/schemas/doi/controlled_vocabularies/resource_type_general.json
- app/models/schemas/doi/controlled_vocabularies/title_type.json
- app/models/schemas/doi/controlled_vocabularies/number_type.json
- app/models/schemas/doi/controlled_vocabularies/date_type.json
- app/models/schemas/doi/controlled_vocabularies/related_identifier_type.json
- app/models/schemas/doi/controlled_vocabularies/relation_type.json
- app/models/schemas/doi/controlled_vocabularies/contributor_type.json
"title": {
  "type": "string"
},
Disallow empty/blank required titles.
Line 6–Line 8 only enforce type: "string", so "" (or whitespace-only values) will still pass even though Line 17–Line 19 marks title as required. Please add content constraints.
Suggested schema patch
"title": {
- "type": "string"
+ "type": "string",
+ "minLength": 1,
+ "pattern": ".*\\S.*"
},

Also applies to: 17-19
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/models/schemas/doi/title.json` around lines 6 - 8, The "title" schema
currently only enforces type:string so empty or whitespace-only titles are
allowed; update the "title" property (in app/models/schemas/doi/title.json) to
require non-empty, non-blank values by adding constraints such as "minLength": 1
and a "pattern" that requires at least one non-whitespace character (e.g.,
\\\\S), and ensure the existing required declaration that references "title"
still applies; apply the same constraints for any other title occurrence
mentioned in the schema.
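The combined minLength + pattern constraint behaves as below. This hand-rolled check stands in for a JSON Schema validator (note that pattern in JSON Schema is an unanchored search, hence re.search):

```python
import re

# The patched constraint: minLength 1 plus a pattern requiring at least
# one non-whitespace character anywhere in the string.
NON_BLANK = re.compile(r".*\S.*")

def valid_title(value):
    return isinstance(value, str) and len(value) >= 1 and bool(NON_BLANK.search(value))

print([valid_title(t) for t in ("Data set v2", "", "   ")])  # → [True, False, False]
```

Strictly, the pattern alone already rejects both "" and "   ", so minLength: 1 is belt-and-braces; keeping both makes the intent explicit.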
Purpose
closes: https://github.com/datacite/product-backlog/issues/325
Approach
See #1341 for the approach
Open Questions and Pre-Merge TODOs
Learning
Types of changes
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Reviewer, please remember our guidelines:
Summary by CodeRabbit
New Features
Refactor
Chore
Tests