Skip to content

Fix: Loosing data if XML was downloaded and uploaded again#1054

Merged
McNamara84 merged 10 commits intomainfrom
fix/various-upload-bug-after-save-as-xml
Apr 10, 2026
Merged

Fix: Loosing data if XML was downloaded and uploaded again#1054
McNamara84 merged 10 commits intomainfrom
fix/various-upload-bug-after-save-as-xml

Conversation

@McNamara84
Copy link
Copy Markdown
Owner

@McNamara84 McNamara84 commented Apr 10, 2026

This pull request significantly improves the robustness and flexibility of the XML-to-form mapping logic, particularly in handling XML namespaces and fallback scenarios. The most important changes include enhanced support for both namespaced and non-namespaced XML, improved logic for adding contact persons, and corresponding updates to the test suite to ensure correct behavior.

XML Namespace Handling and Fallbacks:

  • Updated processResourceType, getGeoLocationData, processSpatialTemporalCoverages, and processFunders to use XPath queries that support both namespaced and non-namespaced XML documents, ensuring broader compatibility with varying XML inputs. (js/mappingXmlToInputFields.js) [1] [2] [3] [4] [5]

Contact Person Handling:

  • Enhanced processContactPersons and processContactPersonsFromDataCite to automatically add new author rows if no matching author is found, and populate these rows with the contact person's data. This ensures that all contact persons from the XML are represented in the form, even if not previously present. (js/mappingXmlToInputFields.js) [1] [2]

Test Suite Updates:

  • Updated and extended tests to cover the new logic for adding author rows, including simulating the "add author" button and verifying that new rows are created and populated correctly when no match is found. (tests/js/contactPersonLoading.test.js) [1] [2] [3] [4] [5]
  • Updated module tests to use namespaced XML and pass the namespace resolver, ensuring that the new XML handling logic is covered by tests. (tests/js/mappingXmlToInputFields.module.test.js) [1] [2] [3]

Exports and Integration:

  • Added missing exports for several functions (processDescriptions, processRelatedWorks, processFunders, processSpatialTemporalCoverages) to ensure they are available for use in other modules or tests. (js/mappingXmlToInputFields.js)

These changes make the mapping logic more robust against varying XML formats and ensure all relevant metadata is properly captured in the form UI.

Notes for Reviewer

None.

Checklist

  • My code follows the style guide.
  • I have self-reviewed my code.
  • I added comments for hard-to-understand code.
  • If applicable, PHP code is documented using PHPDoc.
  • If applicable, JavaScript code is documented using JSDoc.
  • If needed, the ELMO Guide has been updated.
  • If needed, the README has been updated.
  • If needed, the API documentation has been updated.
  • If a new feature was added or a bug fixed, the changelog has been updated.
  • My changes do not create new warnings in the test browser console.
  • I have added unit tests that cover my code.
  • All new and existing unit tests pass locally.
  • All new and existing automated unit tests pass in the pull request.
  • If applicable, Playwright tests have been updated and new tests added.
  • All new and existing automated Playwright tests pass in the pull request.
  • I ensured the changes meet accessibility guidelines.

Known Issues

None.

Introduce a comprehensive new test suite (tests/js/xmlRoundtripDataLoss.test.js) to document known XML export→import data-loss scenarios. The file adds helpers (DOM/jQuery mock, Tagify stubs, buildDataCiteXml) and loads the mapping module in a VM to exercise import routines. Tests cover namespace/querySelector issues (funderIdentifier/funderIdType), awardURI handling, contact-person mapping (DataCite-only vs ISO envelope, unmatched names), relation-type matching, multiple funding references cloning, temporal coverage/timezone parsing, loss of xml:lang for descriptions/titles, preservation of thesaurus keyword metadata in Tagify, ORCID/ROR URL stripping, and contributor ORCID extraction. These tests serve to reproduce and prove bugs (phase 1) so they can be fixed and turned into regression tests (phase 2).
@McNamara84 McNamara84 self-assigned this Apr 10, 2026
@McNamara84 McNamara84 added the bug Etwas funktioniert nicht label Apr 10, 2026
@McNamara84 McNamara84 linked an issue Apr 10, 2026 that may be closed by this pull request
Switch test file to run in the jsdom environment and replace the custom jQuery/VM loader with real requires. Add beforeEach/afterEach to set up global $, jQuery, Tagify, translations and mappingModule (require("../../js/mappingXmlToInputFields.js")). Replace mock helpers and VM-based execution with direct DOMParser-parsed XML strings and direct calls to mappingModule.processFunders/processContactPersons. Update assertions to use the jQuery mock's API (e.g. .first().val()) and adjust test names/comments to clarify the namespace-related bugs. Clean up test scaffolding and mocks to improve reliability and readability.
Add processDescriptions, processRelatedWorks and processFunders to the exported API of js/mappingXmlToInputFields.js so these import-mapping helpers are available to other modules/tests. Also remove the large legacy tests/js/xmlRoundtripDataLoss.test.js file (round-trip XML data loss test suite) which was deleted.
Introduce comprehensive Jest tests (tests/js/xmlRoundtripDataLoss.test.js) that document and reproduce known data-loss scenarios when exporting/importing XML via mappingXmlToInputFields.js. The suite includes helpers (loadMappingModule, createJQuery, ns-prefix builder) and covers: funderIdentifier namespace/querySelector bug, awardURI import, contact-person email/website fallback and matching, relation type matching (text vs CamelCase), multiple funding references handling, temporal coverage timezone parsing, description/title xml:lang loss, thesaurus keyword Tagify preservation, and ORCID extraction edge cases under jsdom XPath limitations. These tests serve as phase‑1 failing cases to be turned into regression tests after fixes.
Update unit tests to reflect improved import behavior and XML namespace handling. Contact-person tests now simulate the add-author button (injecting a click handler) and assert that unmatched contact persons create new author rows with email/website populated and contact checkbox set. XML round-trip tests were rewritten to verify XPath usage for namespaced funderIdentifier (no querySelector), adjust ns: prefixes in fixtures, and reframe several BUG comments into FIX/regression or known-limitation descriptions. Also refine relation-type and language-attribute tests to match current importer expectations.
When no matching author was found for a contact person we previously logged a warning and skipped it. Update processContactPersons and processContactPersonsFromDataCite to click the add-author button, populate the new row's given/family names, and proceed to mark it as the contact person. Also improve processFunders by using XPath to retrieve the funderIdentifier node so its text content and funderIdentifierType attribute are read reliably (fixes issues with namespaced XML and querySelector).
Replace querySelector-based XML lookups with XPath evaluation using a namespace resolver.

- mappingXmlToInputFields.js: processResourceType now accepts a resolver and finds resourceType via xmlDoc.evaluate; getGeoLocationData now accepts xmlDoc and resolver and uses XPath helper lookups to read geoLocationPlace, geoLocationBox and geoLocationPoint values. Call sites updated to pass the resolver (including loadXmlToForm and spatial/temporal processing).
- Tests: Updated unit tests to use namespaced (ns:) XML fixtures, provide an nsResolver, and pass xmlDoc/resolver into the updated functions. Added regression tests that assert querySelector is no longer used on XML and verify correct parsing of resourceType and geoLocation values from namespaced XML.

Motivation: querySelector fails on namespaced XML; switching to XPath with a resolver ensures correct extraction of namespaced elements and prevents data-loss/regression when importing XML.
@McNamara84 McNamara84 marked this pull request as ready for review April 10, 2026 09:01
@McNamara84 McNamara84 requested a review from Copilot April 10, 2026 09:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes XML round-trip data loss scenarios by improving DataCite namespace handling during import (moving from querySelector to XPath with a namespace resolver) and by ensuring contact persons are not dropped when they don’t match existing author rows.

Changes:

  • Refactors multiple XML import paths to use XPath + namespace resolver for reliable parsing of namespaced DataCite XML.
  • Updates contact person import logic to create a new author row when no author matches the contact person.
  • Expands/updates Jest tests, including a new regression test suite covering several previously observed round-trip loss cases.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
js/mappingXmlToInputFields.js Switches key extractors to XPath/resolver and changes contact-person handling to add missing author rows.
tests/js/contactPersonLoading.test.js Updates tests to simulate author-row creation and assert new-row creation for unmatched contact persons.
tests/js/mappingXmlToInputFields.test.js Updates helper tests to use namespaced XML and pass a resolver for XPath-based parsing.
tests/js/mappingXmlToInputFields.module.test.js Adjusts module-coverage tests for updated function signatures and namespaced XML fixtures.
tests/js/xmlRoundtripDataLoss.test.js Adds a large regression suite for multiple XML export→import round-trip loss scenarios.

Support both namespaced and non-namespaced XML for resourceType by using a combined XPath (".//ns:resourceType | .//resourceType"). Add safety checks when creating new author rows: record creator-row count before/after clicking the add button, warn and skip if a new row could not be created. Update tests to remove brittle source-code introspection (string-matching for querySelector/evaluate) in favor of behavior-driven checks, add a test for non-namespaced resourceType, convert several xml:lang expectations into focused assertions and todos, and make ORCID-related tests tolerant of jsdom XPath limitations by asserting conditionally. These changes improve robustness when importing varied XML and make tests less dependent on implementation details.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Allow parsing of elements both with and without the expected namespace to avoid data loss when XML lacks namespaces. getGeoLocationData now checks for "ns:NAME | NAME" in getText/getNode, processSpatialTemporalCoverages XPath includes non-namespaced geoLocation nodes, and processFunders XPath and lookups were updated to accept non-namespaced funder/award elements while preserving identifier/URI attributes. Added unit tests to verify funder and geoLocation imports work for non-namespaced XML.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Add an XPath fallback to find date elements in non-namespaced XML (//dates/date[...]) so temporal coverage nodes are discovered when namespaces are absent. Also add processSpatialTemporalCoverages to the module exports. A test was added to xmlRoundtripDataLoss.test.js to verify processing of non-namespaced spatial/temporal coverage; the test verifies spatial fields and documents a jsdom limitation with XPath attribute predicates that prevents asserting temporal fields in this environment.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@McNamara84 McNamara84 merged commit aadeb25 into main Apr 10, 2026
14 checks passed
@McNamara84 McNamara84 deleted the fix/various-upload-bug-after-save-as-xml branch April 10, 2026 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Etwas funktioniert nicht

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] lost relaited work

2 participants