
iFDO Validation #25

Open
GermanHydrogen wants to merge 6 commits into csiro-fair:dev from GermanHydrogen:feat/ifdo-validator

@GermanHydrogen
Collaborator

Added iFDO validation against the iFDO JSON schema to warn the user when an incomplete iFDO is written to a dataset. The warning is logged, and the suffix '.incomplete' is inserted into the iFDO filename (e.g. ifdo.incomplete.yml). To support this, I added the iFDO JSON schema to the marimba package.

This addition does not break any existing marimba behavior.
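
A rough sketch of the mechanism described above. This is not the code in this PR: it uses only the Python standard library and honours only the JSON Schema "required" and "properties" keywords, whereas a full validator (such as the jsonschema package) checks much more. The function names, the toy schema fragment and the warning wording are all hypothetical.

```python
import logging
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)


def missing_required(instance: Any, schema: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths of required properties that are absent.

    Only 'required' and 'properties' are checked; this stands in for
    full JSON Schema validation and is not a replacement for it.
    """
    if not isinstance(instance, dict):
        return []
    missing = [f"{prefix}{key}" for key in schema.get("required", []) if key not in instance]
    # Recurse into nested objects that are present, e.g. the image-set header.
    for key, subschema in schema.get("properties", {}).items():
        if key in instance:
            missing += missing_required(instance[key], subschema, f"{prefix}{key}.")
    return missing


def ifdo_output_path(ifdo: dict[str, Any], schema: dict[str, Any], directory: Path, extension: str) -> Path:
    """Choose ifdo<ext> for a complete iFDO, or ifdo.incomplete<ext> otherwise."""
    missing = missing_required(ifdo, schema)
    if not missing:
        return directory / f"ifdo{extension}"
    path = directory / f"ifdo.incomplete{extension}"
    # Placeholder wording, not marimba's actual log message.
    logger.warning("iFDO is incomplete (missing: %s); saving as %s", ", ".join(missing), path.name)
    return path


# Toy schema fragment for illustration only; the real iFDO schema is much larger.
SCHEMA = {
    "required": ["image-set-header", "image-set-items"],
    "properties": {
        "image-set-header": {"required": ["image-set-name", "image-set-uuid", "image-set-handle"]},
    },
}

ifdo = {
    "image-set-header": {"image-set-name": "survey-01", "image-set-uuid": "0f8fad5b-d9cb-469f-a165-70867728950e"},
    "image-set-items": {},
}
print(ifdo_output_path(ifdo, SCHEMA, Path("dataset"), ".yml").name)  # ifdo.incomplete.yml
```

The same check works for YAML and JSON output because only the extension passed in differs.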

GermanHydrogen and others added 6 commits May 27, 2025 14:23
# Conflicts:
#	marimba/core/wrappers/dataset.py
#	pyproject.toml
Updated the logger to include the file extension and indicate when incomplete iFDO files will be saved with the '.incomplete' suffix. This provides clearer feedback on the file naming and format during validation failures.
@cjackett
Contributor

cjackett commented Jun 6, 2025

Hi @GermanHydrogen,

Thank you for implementing this iFDO validation feature. This is a good improvement to help ensure FAIR compliance.

I've tested the implementation and it works perfectly for both YAML and JSON output formats. I made a small modification to the logging warning message for better clarity and consistency with other dataset logging statements. It now shows the full filename with the correct extension and clearly explains what is happening.

I note that the image-set-handle field is required by the iFDO schema and should contain a URL or DOI pointing to the final published dataset. However, we have a bit of a chicken-and-egg problem in our publishing workflow where we cannot obtain a DOI until after we've published the packaged dataset. This means all of our current pipelines will produce ifdo.incomplete.yml files due to this missing field alone. Have you encountered similar issues with your publication pathways? How do you handle the image-set-handle requirement?

I see two main options to address this:

Option 1: Field-specific ignore list

  • Add an optional ignored_fields parameter to the validator
  • Temporarily ignore image-set-handle validation until our DOI workflow is resolved
  • Still validates all other required fields
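
Option 1 could look roughly like the following sketch, which assumes the validator reports missing fields as dotted paths (e.g. image-set-header.image-set-handle). The ignored_fields name comes from the proposal above; the function name and the rule of matching either the full path or its last component are hypothetical choices.

```python
from collections.abc import Iterable


def unignored_errors(missing_paths: Iterable[str], ignored_fields: Iterable[str] = ()) -> list[str]:
    """Drop validation errors for fields the caller chose to ignore.

    A path is ignored if its full dotted form or its final component
    (e.g. 'image-set-handle') appears in ignored_fields.
    """
    ignored = set(ignored_fields)
    return [
        path
        for path in missing_paths
        if path not in ignored and path.rsplit(".", 1)[-1] not in ignored
    ]


errors = ["image-set-header.image-set-handle", "image-set-header.image-set-uuid"]
print(unignored_errors(errors, ignored_fields=["image-set-handle"]))
# ['image-set-header.image-set-uuid']
```

The iFDO would then be written without the .incomplete suffix only when this filtered list is empty, so a missing handle alone no longer triggers it while any other missing field still does.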

Option 2: Post-processing rename

  • Use a pipeline post-package hook to rename ifdo.incomplete.yml → ifdo.yml
  • Less elegant and could mask other real validation issues
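
For comparison, the rename in Option 2 amounts to something like this sketch. It is a plain function, not marimba's actual post-package hook interface, and its name is hypothetical. It also shows the drawback noted above: the file is renamed regardless of which fields were missing.

```python
import tempfile
from pathlib import Path


def rename_incomplete_ifdo(dataset_dir: Path) -> list[Path]:
    """Rename ifdo.incomplete.<ext> to ifdo.<ext> inside dataset_dir.

    Returns the new paths. This discards the validation signal for every
    missing field, not just image-set-handle.
    """
    renamed = []
    for path in sorted(dataset_dir.glob("ifdo.incomplete.*")):
        target = path.with_name(path.name.replace("ifdo.incomplete.", "ifdo.", 1))
        renamed.append(path.rename(target))  # Path.rename returns the new path
    return renamed


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ifdo.incomplete.yml").write_text("image-set-header: {}\n")
    print([p.name for p in rename_incomplete_ifdo(Path(tmp))])  # ['ifdo.yml']
```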

My preference would be for Option 1 as it maintains validation for all other required fields, provides a clean, temporary workaround, doesn't require immediate changes to all of our existing pipelines, and can be easily removed once our DOI workflow is resolved.

What are your thoughts on this approach? Do you see any other solutions or have experience with similar publication workflow challenges?

@GermanHydrogen
Collaborator Author

Hi @cjackett,

The problem you are describing also applies to image-handle. We are facing a similar problem but are solving it differently.
Our data publisher, PANGAEA, does not include iFDOs in the dataset itself but attaches them as a separate "Additional metadata" file. The curators who handle the publishing process therefore create the publication with the data first, update the iFDO with the handle URL, and then add the iFDO to the publication. PANGAEA also does not support image handles at the moment, but we are working on solving both problems.

GEOMAR is currently working on templating image handles from the image UUIDs so that the handles are known before publication.
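
The templating idea can be illustrated with a short sketch. The URL template is invented for illustration (GEOMAR's actual scheme is not described in this thread), the function name is hypothetical, and items are simplified to a mapping from filename to a single dict carrying an image-uuid field; real iFDO items can be more complex.

```python
import uuid
from typing import Any

# Hypothetical template; the real handle scheme is still being designed.
HANDLE_TEMPLATE = "https://handle.example.org/images/{uuid}"


def template_image_handles(items: dict[str, dict[str, Any]], template: str = HANDLE_TEMPLATE) -> None:
    """Fill in image-handle for every item from its image-uuid, in place.

    Items that already have an image-handle are left unchanged.
    """
    for fields in items.values():
        # uuid.UUID raises ValueError on a malformed UUID, so bad input fails early.
        image_uuid = uuid.UUID(str(fields["image-uuid"]))
        fields.setdefault("image-handle", template.format(uuid=image_uuid))


items = {"IMG_0001.JPG": {"image-uuid": "6b1f2f7e-6f0a-4c1e-9a51-2b3b4f5d6e7a"}}
template_image_handles(items)
print(items["IMG_0001.JPG"]["image-handle"])
# https://handle.example.org/images/6b1f2f7e-6f0a-4c1e-9a51-2b3b4f5d6e7a
```

Because the handle depends only on the UUID, it can be written into the iFDO before the dataset is published.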

I would also support Option 1, but the ignore list should be configurable via a CLI argument or an environment variable so that each user can adapt it. I will look into implementing this next week.
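
A configurable ignore list along these lines might be read as in the sketch below. The --ignore-field option and the MARIMBA_IFDO_IGNORE variable are hypothetical names, not existing marimba options; in this sketch the CLI takes precedence and the environment variable is a comma-separated fallback.

```python
import argparse
import os
from collections.abc import Mapping, Sequence

ENV_VAR = "MARIMBA_IFDO_IGNORE"  # hypothetical variable name


def ignored_fields_from(argv: Sequence[str], environ: Mapping[str, str] = os.environ) -> set[str]:
    """Collect iFDO fields to skip during validation.

    Repeated --ignore-field options win; otherwise fall back to a
    comma-separated list in the environment variable.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--ignore-field", action="append", default=[], metavar="FIELD")
    args = parser.parse_args(argv)
    if args.ignore_field:
        return set(args.ignore_field)
    raw = environ.get(ENV_VAR, "")
    return {name.strip() for name in raw.split(",") if name.strip()}


print(ignored_fields_from(["--ignore-field", "image-set-handle"], environ={}))
# {'image-set-handle'}
```

Passing argv and environ explicitly keeps the function easy to test; in a real CLI they would default to sys.argv[1:] and os.environ.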


2 participants