
Enforce data structure and data type consistency for JSON metadata #1421

Open

svogt0511 wants to merge 96 commits into master from pb325-json-metadata-validation

Conversation

@svogt0511
Contributor

@svogt0511 svogt0511 commented Oct 30, 2025

Purpose

closes: https://github.com/datacite/product-backlog/issues/325

Approach

See #1341 for the approach

Open Questions and Pre-Merge TODOs

Learning

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

  • New feature (non-breaking change which adds functionality)

  • Breaking change (fix or feature that would cause existing functionality to change)

Reviewer, please remember our guidelines:

  • Be humble in the language and feedback you give; ask, don't tell.
  • Prefer positive language over neutral when offering feedback, since neutral language can come across as negative.
  • Offer suggestions on how to improve code, e.g. simplification or added clarity.
  • Give reasons for the changes you are proposing.

Summary by CodeRabbit

  • New Features

    • Expanded JSON Schema coverage for DOI metadata (updated draft) with richer controlled vocabularies and new schema validations.
  • Refactor

    • Validation now resolves schemas dynamically and centralizes JSON-schema checks for many metadata fields.
  • Chore

    • Upgraded a JSON validation dependency to a newer version series.
  • Tests

    • Fixtures and specs adjusted to array-based metadata shapes and stricter validation/error expectations.

@svogt0511 svogt0511 self-assigned this Oct 30, 2025
Collaborator

@KellyStathis KellyStathis left a comment


Hi @svogt0511, I noted a couple things from reading through this but haven't tested yet. Happy to take a closer look when I'm back next week!

Comment on lines +25 to +53
"if": {
  "properties": {
    "relationType": {
      "enum": ["HasMetadata", "IsMetadataFor"]
    }
  }
},
"then": {
  "properties": {
    "relatedMetadataScheme": { "type": "string" },
    "schemeUri": { "type": "string" },
    "schemeType": { "type": "string" }
  }
},
"else": {
  "not": {
    "anyOf": [
      { "required": ["relatedMetadataScheme"] },
      { "required": ["schemeUri"] },
      { "required": ["schemeType"] }
    ]
  }
},
Collaborator

@svogt0511 This restriction is in the schema documentation—that 12.c, 12.d, and 12.e should only be used when 12.b = "HasMetadata" or "IsMetadataFor": https://datacite-metadata-schema.readthedocs.io/en/4.6/properties/relatedidentifier/

However, I don't think it is in the XSD as XSD doesn't support conditional requirements like this.

In the interest of maintaining parity between the XSD and the JSON Schema, could we remove this?

Contributor

Agreeing with Kelly—let's remove this conditional since it's not imposed by the XSD.

Collaborator

I think removing this logic will fix this, but I am having an issue including relatedMetadataScheme even when the relationType is HasMetadata. This was my JSON test:

 "relatedIdentifiers": [
        {
          "schemeUri": "https://github.com/citation-style-language/schema/raw/master/csl-data.json",
          "relationType": "HasMetadata",
          "relatedIdentifier": "https://data.datacite.org/application/citeproc+json/10.5072/example-full",
          "relatedIdentifierType": "URL",
          "relatedMetadataScheme": "citeproc+json"
        },

Which failed with this error:

{
    "errors": [
        {
            "source": "related_identifiers",
            "title": "Object property at `/0/relatedMetadataScheme` is a disallowed additional property",
            "uid": "10.1111/742r-wc63"
        }
    ]
}

Comment on lines +64 to +111
"if": {
  "properties": {
    "relationType": {
      "enum": [ "HasMetadata", "IsMetadataFor" ]
    }
  }
},
"then": {
  "properties": {
    "relatedItemIdentifier": {
      "type": "object",
      "properties": {
        "relatedItemIdentifier": { "type": "string" },
        "relatedItemIdentifierType": { "$ref": "controlled_vocabularies/related_identifier_type.json" },
        "relatedMetadataScheme": { "type": "string" },
        "schemeURI": { "type": "string" },
        "schemeType": { "type": "string" }
      },
      "additionalProperties": false
    }
  }
},
"else": {
  "properties": {
    "relatedItemIdentifier": {
      "type": "object",
      "properties": {
        "relatedItemIdentifier": { "type": "string" },
        "relatedItemIdentifierType": { "$ref": "controlled_vocabularies/related_identifier_type.json" }
      },
      "additionalProperties": false
    }
  }
},
Collaborator

@svogt0511 If we update RelatedIdentifier to remove this restriction (see earlier comment), we should also remove it from RelatedItem here.

}
},
"dependentRequired": {
"affiliationIdentifier": ["nameIdentifierScheme"]
Collaborator

@svogt0511 Should this line be as follows?

"nameIdentifier": ["nameIdentifierScheme"]

As opposed to "affiliationIdentifier": ["nameIdentifierScheme"]?

Contributor

Also, I think nameIdentifierScheme is simply required rather than having a dependency on nameIdentifier:

<xs:complexType name="nameIdentifier">
  <xs:annotation>
    <xs:documentation>Uniquely identifies a creator or contributor, according to various identifier schemes.</xs:documentation>
  </xs:annotation>
  <xs:simpleContent>
    <xs:extension base="nonemptycontentStringType">
      <xs:attribute name="nameIdentifierScheme" type="xs:string" use="required"/>
      <xs:attribute name="schemeURI" type="xs:anyURI" use="optional"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>

Contributor

nameIdentifier also cannot be null since it has nonemptycontentStringType.

Contributor

@codycooperross codycooperross left a comment

Took a pass on the JSON Schemas aside from relatedItem—these look great! I made some comments where there might be inconsistencies with the XSD or the current JSON representation.

Will get to relatedItem soon!

{
"title": "publicationYear",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": [ "integer", "string" ],
Contributor

Is this necessary with oneOf below?

"inclusiveMinimum": true,
"inclusiveMaximum": true
}
],
Contributor

Should publicationYear be required by this schema?

{
"title": "Version",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": [ "number", "string" ]
Contributor

Per the XSD, this seems like it should just be a string rather than a number or a string:

<xs:element name="version" type="xs:string" minOccurs="0">
  <xs:annotation>
    <xs:documentation>Version number of the resource. If the primary resource has changed the version number increases.</xs:documentation>
    <xs:documentation>Register a new identifier for a major version change. Individual stewards need to determine which are major vs. minor versions. May be used in conjunction with properties 11 and 12 (AlternateIdentifier and RelatedIdentifier) to indicate various information updates. May be used in conjunction with property 17 (Description) to indicate the nature and file/record range of version.</xs:documentation>
  </xs:annotation>
</xs:element>

"$ref": "geo_location_polygon.json"
}
},
"required": [
Contributor

Are these meant to be in the top-level geoLocation schema? They don't seem to point to existing attributes.

"title": "RelatedItems",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "array",
"minItems": 0,
Contributor

Also a clear pattern with "minItems": 0, but are these necessary?

"schemeUri": {
"type": "string"
},
"schemeType": {
Collaborator

I believe schemeType can be null. It is in this example: https://support.datacite.org/docs/api-create-dois#request-payload-1

"schemeUri": {
"type": "string"
},
"resourceTypeGeneral": {
Collaborator

This can also be null, per this example: https://support.datacite.org/docs/api-create-dois#request-payload-1

Since this is a pattern, I'm wondering if we should systematically allow null values for optional properties. What do you think, @codycooperross?

Contributor

@codycooperross codycooperross Dec 24, 2025

Since the production API currently accepts null values for many fields in my testing, yes, the JSON Schema should permit null values for all fields that are not xs:string or nonemptycontentStringType in the XSD.

"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"name": {
Collaborator

This is an interesting one. I was testing with metadata pulled from some recently created/updated DOIs, to see if this validation would have impacted them.

I found this DOI which in the JSON has a contributor givenName and familyName, but not name: https://api.datacite.org/dois/10.34804/supra.2021092825

"contributors": [
  {
    "nameType": "Personal",
    "givenName": "Jacopo",
    "familyName": "Torrisi",
    "affiliation": [],
    "contributorType": "DataManager",
    "nameIdentifiers": [
      {
        "nameIdentifier": "",
        "nameIdentifierScheme": "ORCID"
      }
    ]
  }
],

Using this metadata to create a DOI on staging failed with this error:

{
    "errors": [
        {
            "source": "contributors",
            "title": "Object at `/0` is missing required properties: name",
            "uid": "10.1111/742r-wc63"
        }
    ]
}

From my understanding of the XSD, contributorName is required even if givenName and familyName are provided. And the corresponding XML for this DOI does have a contributorName:

<contributors>
    <contributor contributorType="DataManager">
      <contributorName nameType="Personal">Torrisi, Jacopo</contributorName>
      <givenName>Jacopo</givenName>
      <familyName>Torrisi</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI=""/>
      <affiliation affiliationIdentifierScheme="ROR"/>
    </contributor>
  </contributors>

How is that contributorName being generated for the XML? I am just thinking through the potential impact on users who are currently not providing contributor.name, but are providing contributor.givenName and contributor.familyName, if we introduce this requirement.

Collaborator

Tagging @codycooperross for input into this as well :)

Contributor

With JSON -> XML, we currently generate a contributorName and creatorName based on available familyName and givenName metadata if available. See the description here: https://docs.google.com/spreadsheets/d/1Hy0KXWPxqNx-Pfh-nNFxbsUFXXVYsO8O2sDIytXQv7U/edit?gid=1806954511#gid=1806954511&range=2:2

For the sake of backwards compatibility with existing request patterns and scoping this PR, let's remove the name requirement on creator and contributor for now. Currently invalid JSON metadata, i.e. metadata that contains no name or familyName metadata, will continue to fail when validated against the XSD.

Contributor Author

This was always very confusing to me.

As I understand it, contributorName is always required.

If nameType = Unknown, the Contributor can have a givenName, a familyName, and a completely independent name.

If nameType = Organization, the Contributor can have any name you want to give it. (Although, if you convert the type from Organization to one of the other types, givenName and familyName are preserved in the DB, and probably in Elasticsearch; a topic for another time.)

If nameType = Person, the name field is generated from the givenName and familyName.

I would not remove the name requirement, and I would not back out our requirements to fit incorrect data.

I thought that if there are incorrect DOIs in the DB, the user would be prompted to correct them if/when the DOI is updated; otherwise, there is not much we could do, except perhaps to flag them as incorrect at some point.

Does this make sense?

@coderabbitai

coderabbitai bot commented Mar 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Bumps activerecord_json_validator to "~> 3.1"; replaces previous DOI JSON-schema validation with many attribute-level json: validations resolved via new schema_file_path helper; introduces Doi::INVALID_SCHEMAS; adds ~60 JSON Schema files (draft-2020-12) and updates validators, tests, and fixtures to new data shapes.

Changes

  • Gemfile (Gemfile): Updated the activerecord_json_validator version constraint from "~> 2.1", ">= 2.1.5" to "~> 3.1".
  • Doi model (app/models/doi.rb): Added the INVALID_SCHEMAS constant, helper methods (validate_json_attribute?, schema_file_path, raw_language, raw_types), and many validates ... json: rules referencing schemas via schema_file_path.
  • XML validator (app/validators/xml_schema_validator.rb): Replaced the inline invalid-schema list with Doi::INVALID_SCHEMAS and returns false after adding an invalid-schema error.
  • DOI JSON Schemas, core (app/models/schemas/doi/*.json, app/models/schemas/doi/*/*.json, e.g. creator.json, creators.json, title.json, titles.json, publication_year.json, resource_type.json, related_item.json, related_identifier.json, related_item_identifier.json, description.json, subject.json, rights.json, funding_reference.json, geo schemas, identifier/point/polygon files, etc.): Added ~60+ draft-2020-12 JSON Schema files defining DOI metadata shapes, conditional rules, dependentRequired constraints, and $ref links used by the new per-attribute validations.
  • Controlled vocabularies (app/models/schemas/doi/controlled_vocabularies/*.json, e.g. name_type.json, contributor_type.json, date_type.json, description_type.json, related_identifier_type.json, relation_type.json, resource_type_general.json, funder_identifier_type.json, title_type.json, number_type.json): Added enum-based nullable controlled-vocabulary schemas referenced by the DOI schemas.
  • Client schema update (app/models/schemas/client/subjects.json): Upgraded $schema to draft-2020-12 and changed lang from ["string","null"] to a $ref to ../doi/language.json.
  • Tests & fixtures (spec/**/*, spec/fixtures/**/*, spec/factories/doi.rb, spec/fixtures/vcr_cassettes/.../error.yml): Adjusted tests and fixtures to new shapes: convert scalar/object fields to arrays (titles, descriptions, contributors, affiliations, nameIdentifiers), add descriptionType/name fields, change funding reference keys, normalize geo numeric formatting, update some expected statuses/errors, and refresh a VCR cassette.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

🚥 Pre-merge checks: 2 passed, 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 25.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The pull request title clearly and concisely describes the main change: enforcing data structure and data type consistency for JSON metadata, which aligns with the core objective of the changeset.




@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
spec/requests/datacite_dois/post_geo_spec.rb (1)

140-144: ⚠️ Potential issue | 🟠 Major

Payload still uses string numerics, so type-consistency validation is not being strictly exercised.

Several request fixtures send numeric geo fields as strings (e.g., Line 142, Line 143, Line 344, Line 346), while expectations assert numeric output. This can pass via coercion and weaken the “enforce data type consistency” objective. Use numeric literals in valid payloads, and add a negative case that expects 422 for string-typed numerics.

Proposed fix (representative)
- "pointLatitude" => "49.0850736",
- "pointLongitude" => "-123.3300992"
+ "pointLatitude" => 49.0850736,
+ "pointLongitude" => -123.3300992

- "eastBoundLongitude" => "+123.",
+ "eastBoundLongitude" => 123.0,
- "southBoundLatitude" => "-40",
+ "southBoundLatitude" => -40.0

Also applies to: 233-237, 339-347, 507-515, 772-780, 829-837, 886-894

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/requests/datacite_dois/post_geo_spec.rb` around lines 140 - 144, The
spec payloads use string numerics for geo fields (e.g., "geoLocationPoint" with
"pointLatitude"/"pointLongitude") which masks type validation; update the valid
fixtures in spec/requests/datacite_dois/post_geo_spec.rb to use numeric literals
for pointLatitude and pointLongitude (and other listed fixtures at the
referenced ranges) so expectations assert true numeric types, and add a new
negative example in the same spec that submits pointLatitude/pointLongitude as
strings and asserts a 422 response; locate and update the payloads that
construct "geoLocationPoint" and add the failing-case test alongside existing
positive cases.
♻️ Duplicate comments (12)
app/models/schemas/doi/resource_type.json (1)

9-11: ⚠️ Potential issue | 🟠 Major

resourceType is over-constrained in required.

Line 10 makes resourceType mandatory, which can incorrectly fail valid payloads that provide only resourceTypeGeneral. Keep only resourceTypeGeneral in required.

Proposed fix
   "required": [
-    "resourceType",
     "resourceTypeGeneral"
   ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/resource_type.json` around lines 9 - 11, The schema's
required array currently includes "resourceType" and "resourceTypeGeneral",
which wrongly makes resourceType mandatory; update the required array in
resource_type.json to only include "resourceTypeGeneral" (remove "resourceType")
so payloads that supply only resourceTypeGeneral validate correctly.
app/models/schemas/doi/date.json (1)

12-14: ⚠️ Potential issue | 🟠 Major

Allow dateInformation to be nullable to avoid rejecting valid payloads.

Line 12 through Line 14 currently restricts dateInformation to string, which can break requests where dateInformation is null (already flagged in prior review discussion).

Suggested schema adjustment
     "dateInformation": {
-      "type": "string"
+      "type": ["string", "null"]
     }

To verify impact quickly, search for existing null usage in fixtures/spec payloads:

#!/bin/bash
rg -nP '"dateInformation"\s*:\s*null|dateInformation:\s*null'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/date.json` around lines 12 - 14, The schema currently
forces "dateInformation" to be a string and rejects payloads where
dateInformation is null; update the dateInformation schema to accept null as
well (e.g., change the type to allow both "string" and "null" or use an anyOf
with string and null) so existing fixtures/requests with null pass validation;
update the "dateInformation" entry accordingly and run the provided grep search
to verify existing fixtures/specs that use null.
app/models/schemas/doi/version.json (1)

4-4: ⚠️ Potential issue | 🟠 Major

version should stay string-only.

Line 4 currently allows numeric values, which weakens type consistency for this field and conflicts with previously raised schema expectations.

Suggested fix
-  "type": [ "number", "string" ]
+  "type": "string"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/version.json` at line 4, The schema's "version" field
currently allows both number and string types; change the "version" property in
the JSON schema (the "version" key in the DOI schema) to be string-only by
removing "number" from its "type" entry (i.e., make the type either a single
"string" or an array containing only "string") so the field enforces string
values consistently.
app/models/schemas/doi/affiliation.json (1)

19-21: ⚠️ Potential issue | 🟠 Major

Remove dependentRequired to stay aligned with XSD-compatible behavior.

This dependency was previously called out as inconsistent with the XSD contract. Keeping it here can reject records that should remain valid.

Suggested fix
-  "dependentRequired": {
-    "affiliationIdentifier": ["affiliationIdentifierScheme"]
-  },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/affiliation.json` around lines 19 - 21, Remove the
"dependentRequired" constraint from the affiliation schema to match
XSD-compatible behavior: delete the dependentRequired block that ties
"affiliationIdentifier" to "affiliationIdentifierScheme" in
app/models/schemas/doi/affiliation.json so the schema no longer rejects records
that omit the scheme; ensure no other code relies on the "dependentRequired"
rule for affiliationIdentifier before committing.
app/models/schemas/doi/name_identifier.json (1)

9-18: ⚠️ Potential issue | 🟠 Major

Fix dependency wiring: Line 17 references a non-existent property.

Line 17 uses affiliationIdentifier, which is not defined in this schema, so the dependency won’t fire. Also, allowing null for identifier fields weakens consistency checks.

Suggested fix
   "properties": {
     "schemeUri": {
       "type": ["string", "null"]
     },
     "nameIdentifier": {
-      "type": ["string", "null"]
+      "type": "string",
+      "minLength": 1
     },
     "nameIdentifierScheme": {
-      "type": ["string", "null"]
+      "type": "string",
+      "minLength": 1
     }
   },
   "dependentRequired": {
-    "affiliationIdentifier": ["nameIdentifierScheme"]
+    "nameIdentifier": ["nameIdentifierScheme"]
   },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/name_identifier.json` around lines 9 - 18, The
dependentRequired currently references a non-existent property
affiliationIdentifier; change the dependency to reference the actual field
nameIdentifier (i.e., set dependentRequired: { "nameIdentifier":
["nameIdentifierScheme"] }) so the dependency fires correctly, and tighten
validation by removing "null" from the type arrays for nameIdentifier and
nameIdentifierScheme (make them plain "string") so identifier fields must be
non-null strings.
app/models/schemas/doi/geo_location.json (1)

19-23: ⚠️ Potential issue | 🔴 Critical

Top-level required fields are incorrect and make this schema invalid in practice.

Line 19-22 requires pointLatitude/pointLongitude, but those keys only exist inside geoLocationPoint. With Line 23 (additionalProperties: false), valid payloads will fail.

Proposed fix
-  "required": [
-    "pointLatitude",
-    "pointLongitude"
-  ],
+  "anyOf": [
+    { "required": ["geoLocationPoint"] },
+    { "required": ["geoLocationBox"] },
+    { "required": ["geoLocationPlace"] },
+    { "required": ["geoLocationPolygon"] }
+  ],
   "additionalProperties": false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/geo_location.json` around lines 19 - 23, The schema's
top-level "required" list wrongly requires pointLatitude and pointLongitude
(which are nested under geoLocationPoint) and with "additionalProperties": false
will reject valid documents; update the schema so the top-level "required"
contains the correct top-level keys (e.g., "geoLocationPoint" if that object is
mandatory) and move/ensure "pointLatitude" and "pointLongitude" are required
inside the geoLocationPoint subschema (or remove top-level additionalProperties
restriction if you intend to allow those nested properties at top level);
reference the "required", "additionalProperties", "geoLocationPoint",
"pointLatitude", and "pointLongitude" symbols when making the change.
app/models/schemas/doi/contributor.json (1)

18-41: ⚠️ Potential issue | 🟠 Major

Required fields are inverted relative to prior decisions (contributorType should be required; name should remain optional for compatibility).

Line 40 currently requires name, while contributorType is optional/nullable. This reintroduces the previously discussed compatibility risk and misses the prior requirement for contributorType.

Suggested fix
-    "contributorType": {
-      "oneOf": [
-        {
-            "$ref": "controlled_vocabularies/contributor_type.json"
-        },
-        {
-          "type": "null"
-        }
-      ]
-    },
+    "contributorType": {
+      "$ref": "controlled_vocabularies/contributor_type.json"
+    },
@@
-  "required": [
-    "name"
-  ]
+  "required": [
+    "contributorType"
+  ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/contributor.json` around lines 18 - 41, The schema
currently lists "name" as required and leaves "contributorType"
optional/nullable; change the required fields so "contributorType" is required
and "name" is optional to match prior decisions. Edit the JSON schema's
"required" array to include "contributorType" (and remove "name"), and verify
the "contributorType" definition (the oneOf ref to
controlled_vocabularies/contributor_type.json and the null option) is consistent
with whether null is allowed when required; adjust if you need to enforce
non-null values for contributorType in the "contributorType" oneOf definition.
app/models/schemas/doi/related_item.json (1)

64-111: ⚠️ Potential issue | 🟠 Major

Remove conditional restriction to maintain XSD parity.

Per past review feedback, this conditional restriction for HasMetadata/IsMetadataFor should be removed since it's not enforced in the XSD. This keeps validation behavior consistent between XML and JSON submissions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/related_item.json` around lines 64 - 111, Remove the
JSON Schema conditional that branches on relationType
("HasMetadata"/"IsMetadataFor") — specifically delete the entire
"if"/"then"/"else" block and replace it with a single unconditional definition
for the relatedItemIdentifier property that includes the union of fields
currently only allowed in the "then" branch (relatedItemIdentifier,
relatedItemIdentifierType, relatedMetadataScheme, schemeURI, schemeType) while
keeping type "object", the existing $ref for relatedItemIdentifierType, and
"additionalProperties": false so the schema matches the XSD behavior; update any
references to relationType-based gating so relatedItemIdentifier is always
validated with the extended properties.
app/models/schemas/doi/related_identifier.json (2)

25-57: ⚠️ Potential issue | 🟠 Major

Conditional restriction and additionalProperties: false conflict.

Two issues:

  1. Per past review feedback, the conditional restriction for HasMetadata/IsMetadataFor should be removed to maintain parity with the XSD.

  2. The then block defines relatedMetadataScheme, schemeUri, and schemeType properties, but these are not in the main properties block. Combined with additionalProperties: false at line 57, this will reject valid payloads containing these fields even when relationType is HasMetadata or IsMetadataFor.

Consider moving schemeType to the main properties block (alongside the corrected relatedMetadataScheme and schemeUri) and removing the conditional logic entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/related_identifier.json` around lines 25 - 57, Remove
the conditional "if"/"then"/"else" block entirely and instead declare
relatedMetadataScheme, schemeUri, and schemeType as top-level properties in the
schema's main "properties" object (each as "type": "string"), leaving
dependentRequired and the existing required relationships (e.g.
relatedIdentifier -> relatedIdentifierType, relationType) intact; this ensures
payloads with those fields are accepted while keeping "additionalProperties":
false and matching the XSD parity (also keep relationType and
relatedIdentifier-related fields as before).

15-17: ⚠️ Potential issue | 🔴 Critical

Typo: relatedMetaDataScheme should be relatedMetadataScheme.

The property name uses a capital "D" (relatedMetaDataScheme) but the conditional blocks at lines 34 and 48 use lowercase "d" (relatedMetadataScheme). This mismatch will cause validation failures when this field is provided.

🐛 Proposed fix
-    "relatedMetaDataScheme": {
+    "relatedMetadataScheme": {
       "type": "string"
     },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/related_identifier.json` around lines 15 - 17, The
JSON schema property "relatedMetaDataScheme" is misspelled (capital "D") and
must be renamed to "relatedMetadataScheme" to match the conditional checks
elsewhere; update the key in app/models/schemas/doi/related_identifier.json from
"relatedMetaDataScheme" to "relatedMetadataScheme" and search/update any other
occurrences (e.g., in validation/conditional blocks that reference
relatedMetadataScheme) so the schema and all references use the exact same
identifier.
app/models/schemas/doi/formats.json (1)

5-7: ⚠️ Potential issue | 🟠 Major

Incorrect $ref - references size.json instead of format.json.

This schema is named "Formats" but the items reference size.json. This appears to be a copy-paste error from sizes.json. Should reference format.json to match the semantic intent.

🐛 Proposed fix
   "items": {
-    "$ref": "size.json"
+    "$ref": "format.json"
   }

Run the following script to verify if format.json exists:

#!/bin/bash
# Check if format.json exists alongside size.json
fd -t f 'format.json' app/models/schemas/doi/
fd -t f 'size.json' app/models/schemas/doi/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/formats.json` around lines 5 - 7, The "Formats"
schema's items currently reference "size.json" (copy-paste error); update the
"items" $ref in app/models/schemas/doi/formats.json from "size.json" to
"format.json" so the schema correctly points to the format definition, and
confirm that format.json exists alongside size.json; adjust any callers
expecting the old reference if needed (look for the "Formats" schema and its
"items" property to locate the change).
app/models/doi.rb (1)

125-127: ⚠️ Potential issue | 🟠 Major

Use method-safe attribute access in validate_json_attribute?.

The guard currently relies on self[attribute], which fails for method-backed fields like raw_language and raw_types. These methods return self[:language] and self[:types] respectively, but the current check attempts direct column access via self[:raw_language] and self[:raw_types], which don't exist as columns. This breaks validation for these attributes. Use public_send with has_attribute? fallback to properly access both method-backed and column-backed attributes.

Proposed fix
 def validate_json_attribute?(attribute)
-  validatable? && !self[attribute].nil? && !INVALID_SCHEMAS.include?(self.schema_version)
+  value =
+    if respond_to?(attribute)
+      public_send(attribute)
+    elsif has_attribute?(attribute)
+      self[attribute]
+    end
+
+  validatable? && value.present? && !INVALID_SCHEMAS.include?(schema_version)
 end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/doi.rb` around lines 125 - 127, The guard in
validate_json_attribute?(attribute) uses self[attribute] which fails for
method-backed fields like raw_language and raw_types; change the nil-check to
obtain the attribute value via public_send(attribute) and fall back to column
access with has_attribute?(attribute) ? self[attribute] : public_send(attribute)
(or vice versa) so both method-backed and column-backed attributes are handled,
then use that value in the existing validatable? and INVALID_SCHEMAS checks;
keep the method name validate_json_attribute?, and preserve the schema/version
guard (INVALID_SCHEMAS and validatable?).
🧹 Nitpick comments (6)
spec/requests/datacite_dois/post_geo_spec.rb (2)

304-1129: Geo test fixtures are heavily duplicated across contexts.

The repeated geo payload/expectation blocks are costly to maintain and prone to drift. Extract shared builders (base_attributes, geo_location_box, geo_location_polygon) and use parameterized/shared examples per geo variant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/requests/datacite_dois/post_geo_spec.rb` around lines 304 - 1129, The
spec repeats large geo payloads inside multiple context blocks and
let(:valid_attributes); refactor by extracting shared builders (e.g. define
helpers like base_attributes, geo_location_box, geo_location_point,
geo_location_polygon) and use them inside the existing contexts that call
let(:valid_attributes) and the tests that assert json.dig("data", "attributes",
"geoLocations"), and replace duplicated expectation arrays with shared_examples
or parameterized examples that accept the geo variant (single geo, all
properties, array of geo locations); update contexts that reference geoLocations
to call the shared builders and the shared_examples to reduce duplication and
keep assertions DRY.

300-300: XML geoLocationBox assertions are brittle due to concatenated string matching.

Comparing the full concatenated text (including sign/format artifacts) is fragile. Prefer asserting each child node value independently (or parse as Float) to avoid failures from harmless serialization/order formatting changes.

Also applies to: 731-731

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/requests/datacite_dois/post_geo_spec.rb` at line 300, The test currently
asserts the entire concatenated string from doc.at_css("geoLocations").content
which is brittle; instead locate the individual child nodes (e.g., the
geoLocationBox children like westBoundLongitude, eastBoundLongitude,
southBoundLatitude, northBoundLatitude) using doc.at_css or doc.css and assert
each node's text independently (or parse each to Float and compare with expected
values using exact equals or a small delta) to avoid failures from
formatting/ordering changes; update the two failing assertions (the current
expect on "geoLocations" and the similar one around line 731) to perform
per-child assertions on those specific element names.
app/models/schemas/doi/controlled_vocabularies/number_type.json (1)

4-18: Simplify duplicated constraints to reduce schema drift risk.

The current schema enforces type boundaries twice (line 4 with type: ["string", "null"] and lines 5–18 with anyOf), which is valid but increases maintenance burden. Per JSON Schema Draft 2020-12, enum arrays legally contain mixed types (including null), making a single enum with all allowed values functionally equivalent and clearer.

♻️ Proposed simplification
 {
   "title": "NumberType",
   "$schema": "https://json-schema.org/draft/2020-12/schema",
-  "type": ["string", "null"],
-  "anyOf": [
-    {
-      "type": "string",
-      "enum": [
-        "Article",
-        "Chapter",
-        "Report",
-        "Other"
-      ]
-    },
-    {
-      "type": "null"
-    }
-  ]
+  "enum": [
+    "Article",
+    "Chapter",
+    "Report",
+    "Other",
+    null
+  ]
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/controlled_vocabularies/number_type.json` around lines
4 - 18, Remove the duplicated type constraints in the number_type.json schema by
replacing the top-level "type": ["string","null"] and the "anyOf" block with a
single "enum" that lists the allowed string values plus null (e.g.,
["Article","Chapter","Report","Other", null]); update the schema so only the
"enum" remains (remove the "type" and "anyOf" properties) to simplify validation
and avoid drift.
app/models/schemas/doi/in_polygon_point.json (1)

5-17: Consider deduplicating this schema with polygon_point.json.

This file currently duplicates the same object contract, which can drift over time. Reusing a single shared schema would reduce maintenance risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/in_polygon_point.json` around lines 5 - 17, The schema
in in_polygon_point.json duplicates polygon_point.json; replace the duplicated
object with a single shared reference by removing the local
"properties"/"required"/"additionalProperties" block and instead $ref'ing the
existing polygon_point.json (or extract the shared object to a new
polygon_point.json if needed), ensuring the schema still validates pointLatitude
and pointLongitude and preserving additionalProperties: false; update any
consumers to reference the single shared schema to avoid drift.
app/models/schemas/doi/controlled_vocabularies/title_type.json (1)

4-13: Consider simplifying nullable enum pattern.

The combination of type: ["string", "null"] with anyOf is redundant. JSON Schema 2020-12 supports a cleaner pattern for nullable enums.

♻️ Proposed simplification
 {
   "title": "TitleType",
   "$schema": "https://json-schema.org/draft/2020-12/schema",
-  "type": ["string", "null"],
-  "anyOf": [
-    {
-      "type": "string",
-      "enum": ["AlternativeTitle", "Subtitle", "TranslatedTitle", "Other"]
-    },
-    {
-      "type": "null"
-    }
-  ]
+  "enum": ["AlternativeTitle", "Subtitle", "TranslatedTitle", "Other", null]
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/controlled_vocabularies/title_type.json` around lines
4 - 13, The schema for title_type.json uses a redundant combination of "type":
["string","null"] and an anyOf block; replace the anyOf with a single "enum"
that includes null so the schema becomes a single concise definition: keep the
enum values "AlternativeTitle", "Subtitle", "TranslatedTitle", "Other" and add
null to the enum, removing the anyOf block (reference the title_type.json schema
and its enum values to locate where to change).
spec/requests/datacite_dois/post_spec.rb (1)

5-5: Remove debug-only pp require from request specs.

This looks like leftover debugging and adds noise to the test file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/requests/datacite_dois/post_spec.rb` at line 5, Remove the debug-only
require by deleting the literal require "pp" from the spec file (post_spec.rb)
so the test suite no longer includes the pretty-print debug dependency; update
the file to rely on standard test helpers and run the request specs to confirm
no behavior changed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/models/doi.rb`:
- Line 175: The validator is registering :geolocations but the model's attribute
is :geo_locations, so update the validation call in DOI to use :geo_locations
(and also update the schema name passed to schema_file_path from "geolocations"
to "geo_locations" if your JSON schema file is named accordingly) so that
validate_json_attribute?(:geo_locations) and the JSON schema validation for
geolocation run correctly; look for the validates line in app/models/doi.rb and
adjust the attribute symbol and schema_file_path argument to the consistent
"geo_locations" key.
- Around line 5-6: Remove the top-level debugger and stdlib requires from the
DOI model: delete require "byebug" and require "pp" from the file that defines
the Doi class, and instead add debugger gems to the development/test groups in
the Gemfile or require them only in scripts/console/debugging entrypoints; if
pretty-printing is needed in code, require "pp" locally inside the specific
method or use Rails.logger/inspect for production-safe logging. Ensure no other
code in the Doi class (class Doi) depends on those top-level requires so the
model can load without those gems present.

In `@app/models/schemas/client/subjects.json`:
- Line 17: The subjects.json schema is still using draft-04 while it $ref's
../doi/language.json (draft-2020-12), causing cross-draft validation issues;
update the local schema by setting the top-level "$schema" in subjects.json to
"https://json-schema.org/draft/2020-12/schema" and then validate and adjust any
draft-04-specific keywords/usages in subjects.json to be compliant with
draft-2020-12 so the "$ref": "../doi/language.json" resolves and validates
consistently.

In `@app/models/schemas/doi/controlled_vocabularies/contributor_type.json`:
- Around line 4-36: The shared controlled vocabulary schema currently allows
null by including "null" in the "type" array and a null branch in "anyOf";
remove the nullability from this shared enum schema by deleting "null" from the
"type" definition and removing the {"type":"null"} branch in the "anyOf" so the
schema only permits the enumerated strings (leave the "enum" list intact), and
update any parent schemas that need to allow null to model that nullability at
the parent property level instead of in contributor_type.json.

In `@app/models/schemas/doi/geo_location_polygon.json`:
- Around line 8-9: The polygon schema contains unresolved $ref values
"polygon_point" and "in_polygon_point" (and the other similar refs later in the
file) that lack the ".json" suffix used elsewhere; update those $ref entries in
app/models/schemas/doi/geo_location_polygon.json (e.g., the entries currently
set to { "$ref": "polygon_point" } and { "$ref": "in_polygon_point" } and the
two similar refs further down) to use the correct filenames with ".json" (for
example "polygon_point.json" and "in_polygon_point.json") so reference
resolution and polygon validation succeed.

In `@app/models/schemas/doi/point_latitude.json`:
- Line 4: Update the numeric types and bounds for geographic coordinate schemas:
change point_latitude.json so the "type" is numeric-only (remove "string") and
add numeric constraints "minimum": -90 and "maximum": 90, and likewise change
point_longitude.json to type number with "minimum": -180 and "maximum": 180;
ensure these files remain referenced by GeoLocationPoint, PolygonPoint,
InPolygonPoint, and GeoLocationBox so the stricter numeric-only validation and
coordinate bounds apply across those schemas.

In `@app/models/schemas/doi/point_longitude.json`:
- Line 4: The longitude schema in point_longitude.json currently allows any
number or string because it only sets "type": ["number","string"]; update the
schema so numeric values are constrained to minimum: -180 and maximum: 180 and
string values are allowed only if they represent a numeric value within
[-180,180] (e.g. replace the loose "type" with an anyOf/oneOf that has one
branch for type: "number" with minimum -180 and maximum 180 and another branch
for type: "string" with a regex pattern that only matches numeric strings in the
same range); this will ensure point_longitude.json enforces valid longitude
bounds while preserving string-compatible input.
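A minimal sketch of what that anyOf could look like (the regex is illustrative only — it accepts decimal degrees in [-180, 180]; note that JSON Schema patterns are unanchored by default, so the `^`/`$` anchors matter):

```json
{
  "title": "PointLongitude",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "anyOf": [
    { "type": "number", "minimum": -180, "maximum": 180 },
    {
      "type": "string",
      "pattern": "^[+-]?((1[0-7][0-9]|[1-9]?[0-9])(\\.[0-9]+)?|180(\\.0+)?)$"
    }
  ]
}
```

The analogous latitude sketch would bound the number branch at ±90 and adjust the pattern accordingly.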

In `@app/models/schemas/doi/publication_year.json`:
- Around line 12-18: The JSON Schema in publication_year.json uses invalid
keywords inclusiveMinimum and inclusiveMaximum (they're not part of draft
2020-12); remove those two properties from the schema so that the existing
minimum and maximum (1000 and 3000) remain as inclusive bounds, or if you
intended exclusive bounds replace them with exclusiveMinimum: true and/or
exclusiveMaximum: true; update the object that defines the "type": "integer",
"minimum", and "maximum" accordingly.
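For reference, draft 2020-12 treats minimum and maximum as inclusive by default, so — assuming inclusive bounds are the intent, as the existing values suggest — the corrected object reduces to:

```json
{
  "title": "PublicationYear",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "integer",
  "minimum": 1000,
  "maximum": 3000
}
```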

In `@app/models/schemas/doi/related_item_identifier.json`:
- Around line 15-17: The property name in the JSON schema is using inconsistent
casing: "schemeURI" should be renamed to "schemeUri" to match the rest of the
API; update the property key in this schema from schemeURI to schemeUri, and
search for and update any references or example fixtures/validators that expect
"schemeURI" so they use "schemeUri" instead to maintain consistency across
schemas and avoid validation failures.

In
`@spec/fixtures/vcr_cassettes/DataciteDoisController/PATCH_/dois/_id/when_the_title_is_changed_wrong_format/error.yml`:
- Around line 22-33: The recorded cassette for the
"when_the_title_is_changed_wrong_format" scenario now contains an authentication
failure (code: 401 and a Www-Authenticate header) instead of the expected
format-validation failure; update the fixture so the PATCH /dois/_id cassette
returns the validation error (e.g., 422 or the API's format-error code) with the
validation message payload rather than the 401 Unauthorized response—either
re-record the cassette while ensuring requests include valid auth or manually
replace the error.yml contents (remove the Www-Authenticate header and set code
and message to the format-validation error) so tests assert metadata validation
behavior.

In `@spec/requests/datacite_dois/post_spec.rb`:
- Around line 2684-2688: The before hook is turning off VCR and allowing net
connections (VCR.eject_cassette, VCR.turn_off!, WebMock.allow_net_connect!),
which leaks global test state; replace the before block with an around hook that
saves current VCR and WebMock state, yields the example, and then restores VCR
(re-enable and re-insert cassette if needed) and WebMock settings after the
example so global state is always restored even on failure — update the example
group to use an around hook that wraps the example execution and toggles
VCR.turn_off!/VCR.turn_on! and
WebMock.allow_net_connect!/WebMock.disable_net_connect! symmetrically.
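One way to sketch that around hook (hypothetical placement — adjust to the example group in question; VCR.turned_off re-enables VCR when the block exits, even if the example raised):

```ruby
around(:each) do |example|
  VCR.eject_cassette while VCR.current_cassette  # drop any active cassettes first
  VCR.turned_off do
    WebMock.allow_net_connect!
    example.run
  ensure
    # restore the suite-wide default even on failure; match the
    # options your suite normally passes to disable_net_connect!
    WebMock.disable_net_connect!(allow_localhost: true)
  end
end
```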

---

Outside diff comments:
In `@spec/requests/datacite_dois/post_geo_spec.rb`:
- Around line 140-144: The spec payloads use string numerics for geo fields
(e.g., "geoLocationPoint" with "pointLatitude"/"pointLongitude") which masks
type validation; update the valid fixtures in
spec/requests/datacite_dois/post_geo_spec.rb to use numeric literals for
pointLatitude and pointLongitude (and other listed fixtures at the referenced
ranges) so expectations assert true numeric types, and add a new negative
example in the same spec that submits pointLatitude/pointLongitude as strings
and asserts a 422 response; locate and update the payloads that construct
"geoLocationPoint" and add the failing-case test alongside existing positive
cases.

---

Duplicate comments:
In `@app/models/doi.rb`:
- Around line 125-127: The guard in validate_json_attribute?(attribute) uses
self[attribute] which fails for method-backed fields like raw_language and
raw_types; change the nil-check to obtain the attribute value via
public_send(attribute) and fall back to column access with
has_attribute?(attribute) ? self[attribute] : public_send(attribute) (or vice
versa) so both method-backed and column-backed attributes are handled, then use
that value in the existing validatable? and INVALID_SCHEMAS checks; keep the
method name validate_json_attribute?, and preserve the schema/version guard
(INVALID_SCHEMAS and validatable?).

In `@app/models/schemas/doi/affiliation.json`:
- Around line 19-21: Remove the "dependentRequired" constraint from the
affiliation schema to match XSD-compatible behavior: delete the
dependentRequired block that ties "affiliationIdentifier" to
"affiliationIdentifierScheme" in app/models/schemas/doi/affiliation.json so the
schema no longer rejects records that omit the scheme; ensure no other code
relies on the "dependentRequired" rule for affiliationIdentifier before
committing.

In `@app/models/schemas/doi/contributor.json`:
- Around line 18-41: The schema currently lists "name" as required and leaves
"contributorType" optional/nullable; change the required fields so
"contributorType" is required and "name" is optional to match prior decisions.
Edit the JSON schema's "required" array to include "contributorType" (and remove
"name"), and verify the "contributorType" definition (the oneOf ref to
controlled_vocabularies/contributor_type.json and the null option) is consistent
with whether null is allowed when required; adjust if you need to enforce
non-null values for contributorType in the "contributorType" oneOf definition.

In `@app/models/schemas/doi/date.json`:
- Around line 12-14: The schema currently forces "dateInformation" to be a
string and rejects payloads where dateInformation is null; update the
dateInformation schema to accept null as well (e.g., change the type to allow
both "string" and "null" or use an anyOf with string and null) so existing
fixtures/requests with null pass validation; update the "dateInformation" entry
accordingly and run the provided grep search to verify existing fixtures/specs
that use null.

In `@app/models/schemas/doi/formats.json`:
- Around line 5-7: The "Formats" schema's items currently reference "size.json"
(copy-paste error); update the "items" $ref in
app/models/schemas/doi/formats.json from "size.json" to "format.json" so the
schema correctly points to the format definition, and confirm that format.json
exists alongside size.json; adjust any callers expecting the old reference if
needed (look for the "Formats" schema and its "items" property to locate the
change).

In `@app/models/schemas/doi/geo_location.json`:
- Around line 19-23: The schema's top-level "required" list wrongly requires
pointLatitude and pointLongitude (which are nested under geoLocationPoint) and
with "additionalProperties": false will reject valid documents; update the
schema so the top-level "required" contains the correct top-level keys (e.g.,
"geoLocationPoint" if that object is mandatory) and move/ensure "pointLatitude"
and "pointLongitude" are required inside the geoLocationPoint subschema (or
remove top-level additionalProperties restriction if you intend to allow those
nested properties at top level); reference the "required",
"additionalProperties", "geoLocationPoint", "pointLatitude", and
"pointLongitude" symbols when making the change.
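In schema terms, the nested requirement sits inside the geoLocationPoint subschema rather than at the top level — roughly like this (a sketch; the $ref names follow the schema files listed in this PR):

```json
{
  "title": "GeoLocation",
  "type": "object",
  "properties": {
    "geoLocationPlace": { "$ref": "geo_location_place.json" },
    "geoLocationPoint": {
      "type": "object",
      "properties": {
        "pointLatitude": { "$ref": "point_latitude.json" },
        "pointLongitude": { "$ref": "point_longitude.json" }
      },
      "required": ["pointLatitude", "pointLongitude"],
      "additionalProperties": false
    }
  },
  "additionalProperties": false
}
```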

In `@app/models/schemas/doi/name_identifier.json`:
- Around line 9-18: The dependentRequired currently references a non-existent
property affiliationIdentifier; change the dependency to reference the actual
field nameIdentifier (i.e., set dependentRequired: { "nameIdentifier":
["nameIdentifierScheme"] }) so the dependency fires correctly, and tighten
validation by removing "null" from the type arrays for nameIdentifier and
nameIdentifierScheme (make them plain "string") so identifier fields must be
non-null strings.

In `@app/models/schemas/doi/related_identifier.json`:
- Around line 25-57: Remove the conditional "if"/"then"/"else" block entirely
and instead declare relatedMetadataScheme, schemeUri, and schemeType as
top-level properties in the schema's main "properties" object (each as "type":
"string"), leaving dependentRequired and the existing required relationships
(e.g. relatedIdentifier -> relatedIdentifierType, relationType) intact; this
ensures payloads with those fields are accepted while keeping
"additionalProperties": false and matching the XSD parity (also keep
relationType and relatedIdentifier-related fields as before).
- Around line 15-17: The JSON schema property "relatedMetaDataScheme" is
misspelled (capital "D") and must be renamed to "relatedMetadataScheme" to match
the conditional checks elsewhere; update the key in
app/models/schemas/doi/related_identifier.json from "relatedMetaDataScheme" to
"relatedMetadataScheme" and search/update any other occurrences (e.g., in
validation/conditional blocks that reference relatedMetadataScheme) so the
schema and all references use the exact same identifier.

In `@app/models/schemas/doi/related_item.json`:
- Around line 64-111: Remove the JSON Schema conditional that branches on
relationType ("HasMetadata"/"IsMetadataFor") — specifically delete the entire
"if"/"then"/"else" block and replace it with a single unconditional definition
for the relatedItemIdentifier property that includes the union of fields
currently only allowed in the "then" branch (relatedItemIdentifier,
relatedItemIdentifierType, relatedMetadataScheme, schemeURI, schemeType) while
keeping type "object", the existing $ref for relatedItemIdentifierType, and
"additionalProperties": false so the schema matches the XSD behavior; update any
references to relationType-based gating so relatedItemIdentifier is always
validated with the extended properties.

In `@app/models/schemas/doi/resource_type.json`:
- Around line 9-11: The schema's required array currently includes
"resourceType" and "resourceTypeGeneral", which wrongly makes resourceType
mandatory; update the required array in resource_type.json to only include
"resourceTypeGeneral" (remove "resourceType") so payloads that supply only
resourceTypeGeneral validate correctly.

In `@app/models/schemas/doi/version.json`:
- Line 4: The schema's "version" field currently allows both number and string
types; change the "version" property in the JSON schema (the "version" key in
the DOI schema) to be string-only by removing "number" from its "type" entry
(i.e., make the type either a single "string" or an array containing only
"string") so the field enforces string values consistently.

---

Nitpick comments:
In `@app/models/schemas/doi/controlled_vocabularies/number_type.json`:
- Around line 4-18: Remove the duplicated type constraints in the
number_type.json schema by replacing the top-level "type": ["string","null"] and
the "anyOf" block with a single "enum" that lists the allowed string values plus
null (e.g., ["Article","Chapter","Report","Other", null]); update the schema so
only the "enum" remains (remove the "type" and "anyOf" properties) to simplify
validation and avoid drift.

In `@app/models/schemas/doi/controlled_vocabularies/title_type.json`:
- Around line 4-13: The schema for title_type.json uses a redundant combination
of "type": ["string","null"] and an anyOf block; replace the anyOf with a single
"enum" that includes null so the schema becomes a single concise definition:
keep the enum values "AlternativeTitle", "Subtitle", "TranslatedTitle", "Other"
and add null to the enum, removing the anyOf block (reference the
title_type.json schema and its enum values to locate where to change).

In `@app/models/schemas/doi/in_polygon_point.json`:
- Around line 5-17: The schema in in_polygon_point.json duplicates
polygon_point.json; replace the duplicated object with a single shared reference
by removing the local "properties"/"required"/"additionalProperties" block and
instead $ref'ing the existing polygon_point.json (or extract the shared object
to a new polygon_point.json if needed), ensuring the schema still validates
pointLatitude and pointLongitude and preserving additionalProperties: false;
update any consumers to reference the single shared schema to avoid drift.

In `@spec/requests/datacite_dois/post_geo_spec.rb`:
- Around line 304-1129: The spec repeats large geo payloads inside multiple
context blocks and let(:valid_attributes); refactor by extracting shared
builders (e.g. define helpers like base_attributes, geo_location_box,
geo_location_point, geo_location_polygon) and use them inside the existing
contexts that call let(:valid_attributes) and the tests that assert
json.dig("data", "attributes", "geoLocations"), and replace duplicated
expectation arrays with shared_examples or parameterized examples that accept
the geo variant (single geo, all properties, array of geo locations); update
contexts that reference geoLocations to call the shared builders and the
shared_examples to reduce duplication and keep assertions DRY.
- Line 300: The test currently asserts the entire concatenated string from
doc.at_css("geoLocations").content which is brittle; instead locate the
individual child nodes (e.g., the geoLocationBox children like
westBoundLongitude, eastBoundLongitude, southBoundLatitude, northBoundLatitude)
using doc.at_css or doc.css and assert each node's text independently (or parse
each to Float and compare with expected values using exact equals or a small
delta) to avoid failures from formatting/ordering changes; update the two
failing assertions (the current expect on "geoLocations" and the similar one
around line 731) to perform per-child assertions on those specific element
names.

In `@spec/requests/datacite_dois/post_spec.rb`:
- Line 5: Remove the debug-only require by deleting the literal require "pp"
from the spec file (post_spec.rb) so the test suite no longer includes the
pretty-print debug dependency; update the file to rely on standard test helpers
and run the request specs to confirm no behavior changed.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 7c92060 and 38f9fbd.

⛔ Files ignored due to path filters (1)
  • Gemfile.lock is excluded by !**/*.lock
📒 Files selected for processing (72)
  • Gemfile
  • app/models/doi.rb
  • app/models/schemas/client/subjects.json
  • app/models/schemas/doi/affiliation.json
  • app/models/schemas/doi/affiliations.json
  • app/models/schemas/doi/alternate_identifier.json
  • app/models/schemas/doi/alternate_identifiers.json
  • app/models/schemas/doi/contributor.json
  • app/models/schemas/doi/contributors.json
  • app/models/schemas/doi/controlled_vocabularies/contributor_type.json
  • app/models/schemas/doi/controlled_vocabularies/date_type.json
  • app/models/schemas/doi/controlled_vocabularies/description_type.json
  • app/models/schemas/doi/controlled_vocabularies/funder_identifier_type.json
  • app/models/schemas/doi/controlled_vocabularies/name_type.json
  • app/models/schemas/doi/controlled_vocabularies/number_type.json
  • app/models/schemas/doi/controlled_vocabularies/related_identifier_type.json
  • app/models/schemas/doi/controlled_vocabularies/related_item_type.json
  • app/models/schemas/doi/controlled_vocabularies/relation_type.json
  • app/models/schemas/doi/controlled_vocabularies/resource_type_general.json
  • app/models/schemas/doi/controlled_vocabularies/title_type.json
  • app/models/schemas/doi/creator.json
  • app/models/schemas/doi/creators.json
  • app/models/schemas/doi/date.json
  • app/models/schemas/doi/dates.json
  • app/models/schemas/doi/description.json
  • app/models/schemas/doi/descriptions.json
  • app/models/schemas/doi/format.json
  • app/models/schemas/doi/formats.json
  • app/models/schemas/doi/funding_reference.json
  • app/models/schemas/doi/funding_references.json
  • app/models/schemas/doi/geo_location.json
  • app/models/schemas/doi/geo_location_box.json
  • app/models/schemas/doi/geo_location_place.json
  • app/models/schemas/doi/geo_location_point.json
  • app/models/schemas/doi/geo_location_polygon.json
  • app/models/schemas/doi/geo_locations.json
  • app/models/schemas/doi/identifier.json
  • app/models/schemas/doi/in_polygon_point.json
  • app/models/schemas/doi/language.json
  • app/models/schemas/doi/name_identifier.json
  • app/models/schemas/doi/name_identifiers.json
  • app/models/schemas/doi/point_latitude.json
  • app/models/schemas/doi/point_longitude.json
  • app/models/schemas/doi/polygon_point.json
  • app/models/schemas/doi/publication_year.json
  • app/models/schemas/doi/publisher.json
  • app/models/schemas/doi/related_identifier.json
  • app/models/schemas/doi/related_identifiers.json
  • app/models/schemas/doi/related_item.json
  • app/models/schemas/doi/related_item_identifier.json
  • app/models/schemas/doi/related_items.json
  • app/models/schemas/doi/resource_type.json
  • app/models/schemas/doi/rights.json
  • app/models/schemas/doi/rights_list.json
  • app/models/schemas/doi/size.json
  • app/models/schemas/doi/sizes.json
  • app/models/schemas/doi/subject.json
  • app/models/schemas/doi/subjects.json
  • app/models/schemas/doi/title.json
  • app/models/schemas/doi/titles.json
  • app/models/schemas/doi/version.json
  • app/validators/xml_schema_validator.rb
  • spec/concerns/indexable_spec.rb
  • spec/factories/doi.rb
  • spec/fixtures/files/datacite-user-example.json
  • spec/fixtures/files/nasa_error.json
  • spec/fixtures/vcr_cassettes/DataciteDoisController/PATCH_/dois/_id/when_the_title_is_changed_wrong_format/error.yml
  • spec/graphql/types/work_type_spec.rb
  • spec/models/doi_spec.rb
  • spec/requests/datacite_dois/patch_spec.rb
  • spec/requests/datacite_dois/post_geo_spec.rb
  • spec/requests/datacite_dois/post_spec.rb

Comment on lines +5 to +6
require "byebug"
require "pp"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify where byebug is required and whether it is scoped to non-production groups.
fd -a "Gemfile" "gems.rb" "Gemfile.lock" --type f
echo "---"
echo "=== Checking Gemfile for byebug gem group configuration ==="
rg -n "byebug|debug" Gemfile gems.rb Gemfile.lock 2>/dev/null || echo "No matches found"
echo "---"
echo "=== Checking for require statements in app/ and spec/ ==="
rg -n 'require "byebug"|require "pp"' app spec 2>/dev/null || echo "No require statements found"
echo "---"
echo "=== Checking app/models/doi.rb directly ==="
head -20 app/models/doi.rb

Repository: datacite/lupo

Length of output: 1609


Remove the byebug require from model code.

Requiring debugger gems in production model code causes boot failures in environments where the gem is not installed, which is a release blocker. The require "pp" should also be removed; standard library utilities should not be required at class definition time.

Fix
-require "byebug"
-require "pp"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/doi.rb` around lines 5 - 6, Remove the top-level debugger and
stdlib requires from the DOI model: delete require "byebug" and require "pp"
from the file that defines the Doi class, and instead add debugger gems to the
development/test groups in the Gemfile or require them only in
scripts/console/debugging entrypoints; if pretty-printing is needed in code,
require "pp" locally inside the specific method or use Rails.logger/inspect for
production-safe logging. Ensure no other code in the Doi class (class Doi)
depends on those top-level requires so the model can load without those gems
present.
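The grouping described above can be sketched as a Gemfile fragment (group names follow standard Rails conventions; this is an illustration, not the project's actual Gemfile):

```ruby
# Gemfile: keep debugging tools out of the production bundle so that
# `require "byebug"` never needs to appear under app/.
group :development, :test do
  gem "byebug"
end
```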

validates :version, if: proc { |doi| doi.validate_json_attribute?(:version) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("version") } }, unless: :only_validate
validates :rights_list, if: proc { |doi| doi.validate_json_attribute?(:rights_list) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("rights_list") } }, unless: :only_validate
validates :descriptions, if: proc { |doi| doi.validate_json_attribute?(:descriptions) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("descriptions") } }, unless: :only_validate
validates :geolocations, if: proc { |doi| doi.validate_json_attribute?(:geolocations) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("geolocations") } }, unless: :only_validate

⚠️ Potential issue | 🟠 Major

Fix geolocation validator key mismatch.

Line 175 validates :geolocations, but this model uses geo_locations. This typo can prevent geolocation JSON-schema validation from running.

🔧 Proposed fix
-  validates :geolocations, if: proc { |doi| doi.validate_json_attribute?(:geolocations) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("geolocations") } }, unless: :only_validate
+  validates :geo_locations, if: proc { |doi| doi.validate_json_attribute?(:geo_locations) }, json: { message: ->(errors) { errors }, schema: lambda { schema_file_path("geolocations") } }, unless: :only_validate
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/doi.rb` at line 175, The validator is registering :geolocations
but the model's attribute is :geo_locations, so update the validation call in
DOI to use :geo_locations (and also update the schema name passed to
schema_file_path from "geolocations" to "geo_locations" if your JSON schema file
is named accordingly) so that validate_json_attribute?(:geo_locations) and the
JSON schema validation for geolocation run correctly; look for the validates
line in app/models/doi.rb and adjust the attribute symbol and schema_file_path
argument to the consistent "geo_locations" key.

Comment on lines +8 to +9
{ "$ref": "polygon_point" },
{ "$ref": "in_polygon_point" }

⚠️ Potential issue | 🔴 Critical

Fix unresolved $ref paths in polygon rules.

Line 8, Line 9, Line 14, and Line 18 reference schema names without .json, unlike the rest of this schema set. This can break reference resolution and make polygon validation fail.

Proposed fix
-      { "$ref": "polygon_point" },
-      { "$ref": "in_polygon_point" }
+      { "$ref": "polygon_point.json" },
+      { "$ref": "in_polygon_point.json" }
@@
-      "contains": { "$ref": "polygon_point" },
+      "contains": { "$ref": "polygon_point.json" },
@@
-      "contains": { "$ref": "in_polygon_point" },
+      "contains": { "$ref": "in_polygon_point.json" },

Also applies to: 14-18

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/geo_location_polygon.json` around lines 8 - 9, The
polygon schema contains unresolved $ref values "polygon_point" and
"in_polygon_point" (and the other similar refs later in the file) that lack the
".json" suffix used elsewhere; update those $ref entries in
app/models/schemas/doi/geo_location_polygon.json (e.g., the entries currently
set to { "$ref": "polygon_point" } and { "$ref": "in_polygon_point" } and the
two similar refs further down) to use the correct filenames with ".json" (for
example "polygon_point.json" and "in_polygon_point.json") so reference
resolution and polygon validation succeed.
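As a minimal illustration of why the bare names fail, assume the validator resolves `$ref` values against a registry keyed by schema filename (the registry and schema bodies below are hypothetical):

```ruby
# Hypothetical filename-keyed schema registry: a "$ref" value that omits
# the ".json" suffix never matches a registered key, so resolution fails.
registry = {
  "polygon_point.json"    => { "type" => "array" },
  "in_polygon_point.json" => { "type" => "array" },
}

resolve = ->(ref) { registry.fetch(ref) }

resolve.call("polygon_point.json") # resolves to the schema hash
begin
  resolve.call("polygon_point")    # bare name: unresolved reference
rescue KeyError
  puts "unresolved: polygon_point"
end
```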

{
"title": "PointLongitude",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": [ "number", "string" ]

⚠️ Potential issue | 🟠 Major

Longitude lacks range validation.

Line 4 allows values outside valid longitude bounds. Please constrain accepted values to [-180, 180] (including string-form numeric input if compatibility is needed).

Suggested fix
-  "type": [ "number", "string" ]
+  "oneOf": [
+    {
+      "type": "number",
+      "minimum": -180,
+      "maximum": 180
+    },
+    {
+      "type": "string",
+      "pattern": "^-?(?:180(?:\\.0+)?|(?:1[0-7]\\d|\\d?\\d)(?:\\.\\d+)?)$"
+    }
+  ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/point_longitude.json` at line 4, The longitude schema
in point_longitude.json currently allows any number or string because it only
sets "type": ["number","string"]; update the schema so numeric values are
constrained to minimum: -180 and maximum: 180 and string values are allowed only
if they represent a numeric value within [-180,180] (e.g. replace the loose
"type" with an anyOf/oneOf that has one branch for type: "number" with minimum
-180 and maximum 180 and another branch for type: "string" with a regex pattern
that only matches numeric strings in the same range); this will ensure
point_longitude.json enforces valid longitude bounds while preserving
string-compatible input.
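To sanity-check the string branch of the suggested fix, the proposed pattern can be exercised against sample values (pattern copied from the suggestion above, with anchors adapted to Ruby's \A/\z):

```ruby
# Pattern from the suggested fix, anchored Ruby-style: matches numeric
# strings whose value lies in [-180, 180].
LONGITUDE = /\A-?(?:180(?:\.0+)?|(?:1[0-7]\d|\d?\d)(?:\.\d+)?)\z/

in_range  = ["180", "-180", "180.0", "0", "-73.97", "179.999"]
out_range = ["181", "-180.1", "200", "12.", "east"]

in_range.each  { |v| raise "#{v} should match"     unless v.match?(LONGITUDE) }
out_range.each { |v| raise "#{v} should not match" if v.match?(LONGITUDE) }
```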

Comment on lines +12 to +18
{
"type": "integer",
"minimum": 1000,
"maximum": 3000,
"inclusiveMinimum": true,
"inclusiveMaximum": true
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n app/models/schemas/doi/publication_year.json

Repository: datacite/lupo

Length of output: 715


🌐 Web query:

JSON Schema draft 2020-12 inclusiveMinimum inclusiveMaximum keywords specification

💡 Result:

In JSON Schema draft 2020-12, the keywords inclusiveMinimum and inclusiveMaximum are not part of the specification. Numeric range constraints are expressed with:

  • minimum: inclusive lower bound (instance must be minimum) [1]
  • maximum: inclusive upper bound (instance must be maximum) [1]
  • exclusiveMinimum: exclusive lower bound (instance must be > exclusiveMinimum) and its value MUST be a number [1]
  • exclusiveMaximum: exclusive upper bound (instance must be < exclusiveMaximum) and its value MUST be a number [1]

Note: In older specs (e.g., the IETF draft that corresponds to draft-04-era behavior), exclusiveMinimum/exclusiveMaximum were boolean modifiers of minimum/maximum rather than independent numeric bounds [2].

Sources
[1] JSON Schema 2020-12 Validation vocabulary (numeric keywords: minimum/maximum/exclusiveMinimum/exclusiveMaximum) (json-schema.org)
[2] Older (draft-04-era) behavior: exclusiveMinimum/exclusiveMaximum as booleans (datatracker.ietf.org)

Citations:


Invalid JSON Schema keywords: inclusiveMinimum and inclusiveMaximum.

These keywords do not exist in JSON Schema draft 2020-12 (as declared on line 3) and will be silently ignored by validators. In draft 2020-12, minimum and maximum are inclusive by default. Use exclusiveMinimum and exclusiveMaximum for exclusive bounds.

🐛 Proposed fix
     {
       "type": "integer",
       "minimum": 1000,
-      "maximum": 3000,
-      "inclusiveMinimum": true,
-      "inclusiveMaximum": true
+      "maximum": 3000
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/publication_year.json` around lines 12 - 18, The JSON
Schema in publication_year.json uses invalid keywords inclusiveMinimum and
inclusiveMaximum (they're not part of draft 2020-12); remove those two
properties from the schema so that the existing minimum and maximum (1000 and
3000) remain as inclusive bounds, or if you intended exclusive bounds replace
them with exclusiveMinimum: true and/or exclusiveMaximum: true; update the
object that defines the "type": "integer", "minimum", and "maximum" accordingly.

Comment on lines +15 to +17
"schemeURI": {
"type": "string"
},

⚠️ Potential issue | 🟠 Major

Use consistent property name schemeUri (not schemeURI).

Line 15 introduces a casing mismatch that can reject valid metadata and create inconsistent API behavior across schemas.

Suggested fix
-    "schemeURI": {
+    "schemeUri": {
       "type": "string"
     },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/related_item_identifier.json` around lines 15 - 17,
The property name in the JSON schema is using inconsistent casing: "schemeURI"
should be renamed to "schemeUri" to match the rest of the API; update the
property key in this schema from schemeURI to schemeUri, and search for and
update any references or example fixtures/validators that expect "schemeURI" so
they use "schemeUri" instead to maintain consistency across schemas and avoid
validation failures.

Comment on lines +22 to +33
code: 401
message: Unauthorized
headers:
Date:
- Wed, 04 May 2022 17:40:51 GMT
Content-Type:
- application/json; charset=utf-8
- Thu, 22 May 2025 13:11:35 GMT
Content-Length:
- '0'
Connection:
- keep-alive
Status:
- 200 OK
Cache-Control:
- max-age=0, private, must-revalidate
Vary:
- Origin
Etag:
- W/"4779b1c8cb81d39142cef21ab1d60307"
X-Runtime:
- '0.261111'
X-Request-Id:
- eda1eb1c-86ed-44bc-abb8-e241e0c9da76
X-Powered-By:
- Phusion Passenger 5.2.3
Server:
- nginx/1.12.2 + Phusion Passenger 5.2.3
Content-Encoding:
- gzip
Www-Authenticate:
- Basic realm="handle", Handle sessionId="node0jpq58pljxzd91vdly5e1oc72o5",
nonce="XJEPmnoCbjbSEdGkeuQcGQ==", error="Identity not verified"

⚠️ Potential issue | 🟠 Major

Cassette now captures auth failure instead of the intended format-validation failure.

For a “wrong format” scenario, recording a 401 unauthorized response can make the test pass/fail for the wrong reason and stop asserting metadata validation behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@spec/fixtures/vcr_cassettes/DataciteDoisController/PATCH_/dois/_id/when_the_title_is_changed_wrong_format/error.yml`
around lines 22 - 33, The recorded cassette for the
"when_the_title_is_changed_wrong_format" scenario now contains an authentication
failure (code: 401 and a Www-Authenticate header) instead of the expected
format-validation failure; update the fixture so the PATCH /dois/_id cassette
returns the validation error (e.g., 422 or the API's format-error code) with the
validation message payload rather than the 401 Unauthorized response—either
re-record the cassette while ensuring requests include valid auth or manually
replace the error.yml contents (remove the Www-Authenticate header and set code
and message to the format-validation error) so tests assert metadata validation
behavior.

Comment on lines +2684 to +2688
before do
VCR.eject_cassette
VCR.turn_off!
WebMock.allow_net_connect!
end
@coderabbitai coderabbitai bot Mar 2, 2026


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all VCR/WebMock state toggles in the spec file
rg -n "VCR\\.turn_off!|VCR\\.turn_on!|VCR\\.turned_off|WebMock\\.allow_net_connect!|WebMock\\.disable_net_connect!" spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo

Length of output: 181


🏁 Script executed:

#!/bin/bash
# Read the specific section with context (lines 2680-2700) to understand the structure
sed -n '2680,2700p' spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo

Length of output: 484


🏁 Script executed:

#!/bin/bash
# Look for any after hooks or teardown that might restore state around this section
sed -n '2620,2750p' spec/requests/datacite_dois/post_spec.rb | head -100

Repository: datacite/lupo

Length of output: 4656


🏁 Script executed:

#!/bin/bash
# Look at more context to find where the describe block ends and if there's an after hook
sed -n '2720,2850p' spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo

Length of output: 66


🏁 Script executed:

#!/bin/bash
# Search for any after hooks in the file around this section
rg -B 2 -A 2 "^\s*(after|end)" spec/requests/datacite_dois/post_spec.rb | sed -n '1,150p'

Repository: datacite/lupo

Length of output: 3604


🏁 Script executed:

#!/bin/bash
# Count describe and end to see structure, look for corresponding end to the describe block starting at 2629
wc -l spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo

Length of output: 102


🏁 Script executed:

#!/bin/bash
# Check the very end of the file
tail -5 spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo

Length of output: 95


🏁 Script executed:

#!/bin/bash
# Check if there's anything between line 2720 and end of file
sed -n '2715,2721p' spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo

Length of output: 227


🏁 Script executed:

#!/bin/bash
# Check if there are any after hooks at the top level or file level that might restore
rg "^\s*after\s" spec/requests/datacite_dois/post_spec.rb

Repository: datacite/lupo

Length of output: 39


Avoid leaking global VCR/WebMock state from this example group.

VCR.turn_off! + WebMock.allow_net_connect! in the before hook are not restored, causing global state to leak and making later specs flaky.

Use an around hook instead to ensure state restoration:

Safer pattern
-    before do
-      VCR.eject_cassette
-      VCR.turn_off!
-      WebMock.allow_net_connect!
-    end
+    around do |example|
+      VCR.turned_off do
+        WebMock.allow_net_connect!
+        example.run
+      ensure
+        WebMock.disable_net_connect!(allow_localhost: true)
+      end
+    end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/requests/datacite_dois/post_spec.rb` around lines 2684 - 2688, The
before hook is turning off VCR and allowing net connections (VCR.eject_cassette,
VCR.turn_off!, WebMock.allow_net_connect!), which leaks global test state;
replace the before block with an around hook that saves current VCR and WebMock
state, yields the example, and then restores VCR (re-enable and re-insert
cassette if needed) and WebMock settings after the example so global state is
always restored even on failure — update the example group to use an around hook
that wraps the example execution and toggles VCR.turn_off!/VCR.turn_on! and
WebMock.allow_net_connect!/WebMock.disable_net_connect! symmetrically.


@svogt0511 Drawing your attention to this. Probably needs an after or an around block to take care of this.


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
app/models/schemas/doi/related_identifier.json (1)

18-18: ⚠️ Potential issue | 🔴 Critical

Fix relatedMetaDataScheme vs relatedMetadataScheme naming mismatch.

Line 18 uses relatedMetaDataScheme, while Line 37/Line 54 use relatedMetadataScheme. With additionalProperties: false (Line 63), this can reject the canonical field and/or validate the wrong one.

Proposed fix
-    "relatedMetaDataScheme": {
+    "relatedMetadataScheme": {
       "type": "string"
     },
+    "schemeType": {
+      "type": "string"
+    },
#!/bin/bash
set -euo pipefail

python - <<'PY'
import json, pathlib
p = pathlib.Path("app/models/schemas/doi/related_identifier.json")
s = json.loads(p.read_text())

top = set(s.get("properties", {}).keys())
then_props = set(s.get("then", {}).get("properties", {}).keys())
else_props = set()
for clause in s.get("else", {}).get("not", {}).get("anyOf", []):
    else_props.update(clause.get("required", []))

print("Top-level keys:", sorted(top))
print("Then keys:", sorted(then_props))
print("Else-disallowed keys:", sorted(else_props))
print("Mismatched related metadata key names present:",
      "relatedMetaDataScheme" in top and "relatedMetadataScheme" in (then_props | else_props))
print("Keys referenced in conditionals but missing from top-level properties:",
      sorted((then_props | else_props) - top))
PY

Also applies to: 37-57

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/related_identifier.json` at line 18, The schema has a
naming mismatch between relatedMetaDataScheme and relatedMetadataScheme causing
validation failures under additionalProperties: false; update the schema so the
property name is consistent across the top-level "properties", the conditional
"then" and "else" clauses, and any "required" arrays—prefer the canonical
relatedMetadataScheme (change relatedMetaDataScheme to relatedMetadataScheme) or
add an explicit alias property and mirror the conditional/required references to
both names; ensure the "properties" object includes the final name(s) and that
no conditional/required entries reference a name that isn't declared so
validation succeeds.
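The interaction with additionalProperties: false can be sketched with a toy checker (declared keys abbreviated here; this is not the full schema):

```ruby
# With "additionalProperties": false, any instance key not listed under
# "properties" is rejected -- so the misnamed key fails outright.
def extra_keys(instance, declared)
  instance.keys - declared
end

declared = ["relatedMetadataScheme", "schemeUri", "schemeType"]
doc = { "relatedMetaDataScheme" => "XSD" } # wrong casing

raise unless extra_keys(doc, declared) == ["relatedMetaDataScheme"]
raise unless extra_keys({ "schemeUri" => "https://example.org" }, declared).empty?
```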
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/models/schemas/doi/related_identifier.json`:
- Around line 28-50: The if/then branch currently only specifies types for
relatedMetadataScheme, schemeUri, schemeType, and relationTypeInformation when
relationType is "HasMetadata" or "IsMetadataFor" but doesn't enforce them;
update the "then" block to include a required array that lists
"relatedMetadataScheme", "schemeUri", "schemeType", and
"relationTypeInformation" so those properties are mandatory whenever
relationType equals HasMetadata or IsMetadataFor (referencing the relationType
enum and the then properties relatedMetadataScheme, schemeUri, schemeType,
relationTypeInformation).

---

Duplicate comments:
In `@app/models/schemas/doi/related_identifier.json`:
- Line 18: The schema has a naming mismatch between relatedMetaDataScheme and
relatedMetadataScheme causing validation failures under additionalProperties:
false; update the schema so the property name is consistent across the top-level
"properties", the conditional "then" and "else" clauses, and any "required"
arrays—prefer the canonical relatedMetadataScheme (change relatedMetaDataScheme
to relatedMetadataScheme) or add an explicit alias property and mirror the
conditional/required references to both names; ensure the "properties" object
includes the final name(s) and that no conditional/required entries reference a
name that isn't declared so validation succeeds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5f3cd11d-353d-4147-8bbe-3457b38e3cfa

📥 Commits

Reviewing files that changed from the base of the PR and between 38f9fbd and fa2327b.

📒 Files selected for processing (6)
  • app/models/schemas/doi/controlled_vocabularies/related_identifier_type.json
  • app/models/schemas/doi/controlled_vocabularies/related_item_type.json
  • app/models/schemas/doi/controlled_vocabularies/relation_type.json
  • app/models/schemas/doi/controlled_vocabularies/resource_type_general.json
  • app/models/schemas/doi/related_identifier.json
  • app/models/schemas/doi/related_item.json
🚧 Files skipped from review as they are similar to previous changes (5)
  • app/models/schemas/doi/controlled_vocabularies/related_item_type.json
  • app/models/schemas/doi/related_item.json
  • app/models/schemas/doi/controlled_vocabularies/resource_type_general.json
  • app/models/schemas/doi/controlled_vocabularies/related_identifier_type.json
  • app/models/schemas/doi/controlled_vocabularies/relation_type.json

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/models/schemas/doi/title.json`:
- Around line 6-8: The "title" schema currently only enforces type:string so
empty or whitespace-only titles are allowed; update the "title" property (in
app/models/schemas/doi/title.json) to require non-empty, non-blank values by
adding constraints such as "minLength": 1 and a "pattern" that requires at least
one non-whitespace character (e.g., \\\\S), and ensure the existing required
declaration that references "title" still applies; apply the same constraints
for any other title occurrence mentioned in the schema.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c82f3ce-0cef-4020-b4ee-a276dd35c1ef

📥 Commits

Reviewing files that changed from the base of the PR and between 30cf06e and 13e2288.

📒 Files selected for processing (12)
  • app/models/schemas/doi/controlled_vocabularies/contributor_type.json
  • app/models/schemas/doi/controlled_vocabularies/date_type.json
  • app/models/schemas/doi/controlled_vocabularies/description_type.json
  • app/models/schemas/doi/controlled_vocabularies/funder_identifier_type.json
  • app/models/schemas/doi/controlled_vocabularies/name_type.json
  • app/models/schemas/doi/controlled_vocabularies/number_type.json
  • app/models/schemas/doi/controlled_vocabularies/related_identifier_type.json
  • app/models/schemas/doi/controlled_vocabularies/related_item_type.json
  • app/models/schemas/doi/controlled_vocabularies/relation_type.json
  • app/models/schemas/doi/controlled_vocabularies/resource_type_general.json
  • app/models/schemas/doi/controlled_vocabularies/title_type.json
  • app/models/schemas/doi/title.json
🚧 Files skipped from review as they are similar to previous changes (8)
  • app/models/schemas/doi/controlled_vocabularies/funder_identifier_type.json
  • app/models/schemas/doi/controlled_vocabularies/resource_type_general.json
  • app/models/schemas/doi/controlled_vocabularies/title_type.json
  • app/models/schemas/doi/controlled_vocabularies/number_type.json
  • app/models/schemas/doi/controlled_vocabularies/date_type.json
  • app/models/schemas/doi/controlled_vocabularies/related_identifier_type.json
  • app/models/schemas/doi/controlled_vocabularies/relation_type.json
  • app/models/schemas/doi/controlled_vocabularies/contributor_type.json

Comment on lines +6 to +8
"title": {
"type": "string"
},

⚠️ Potential issue | 🟠 Major

Disallow empty/blank required titles.

Lines 6-8 only enforce type: "string", so "" (or whitespace-only values) will still pass even though lines 17-19 mark title as required. Please add content constraints.

Suggested schema patch
     "title": { 
-      "type": "string"
+      "type": "string",
+      "minLength": 1,
+      "pattern": ".*\\S.*"
     },

Also applies to: 17-19

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/schemas/doi/title.json` around lines 6 - 8, The "title" schema
currently only enforces type:string so empty or whitespace-only titles are
allowed; update the "title" property (in app/models/schemas/doi/title.json) to
require non-empty, non-blank values by adding constraints such as "minLength": 1
and a "pattern" that requires at least one non-whitespace character (e.g.,
\\\\S), and ensure the existing required declaration that references "title"
still applies; apply the same constraints for any other title occurrence
mentioned in the schema.
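A quick check of the proposed constraints (minLength: 1 plus a non-whitespace pattern), mirrored as a hypothetical Ruby predicate:

```ruby
# Mirrors the suggested schema constraints: a title must be a string
# containing at least one non-whitespace character.
def valid_title?(value)
  value.is_a?(String) && !value.empty? && value.match?(/\S/)
end

raise unless valid_title?("A study of DOIs")
raise if valid_title?("")       # empty string rejected
raise if valid_title?("   \t")  # whitespace-only rejected
raise if valid_title?(nil)      # wrong type rejected
```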
