Added mechanism for pipelines to set the iFDO "image-set-header" section#31
Added mechanism for pipelines to set the iFDO "image-set-header" section#31GermanHydrogen wants to merge 8 commits intocsiro-fair:devfrom
Conversation
There was a problem hiding this comment.
Hi @GermanHydrogen,
Thank you for this PR and for taking the time to implement a solution for #22. I appreciate the thought you put into the design and the consideration for backward compatibility.
After reviewing this implementation, I wanted to let you know that I've recently completed work on this feature using a different approach that I think better fits Marimba's architecture. Please see PR #32
The key architectural differences include:
Ownership and Responsibility Model: Having pipelines directly construct and return MetadataHeader[ImageSetHeader] objects tightly couples pipelines to specific metadata schemas (iFDO in particular). In the new implementation, pipelines instead return generic dictionaries via a get_metadata_header() method, and the metadata schema classes (iFDOMetadata, GenericMetadata) are responsible for interpreting those generic dictionaries according to their own schema requirements. This separation provides several benefits: pipelines remain schema-agnostic and don't need to import iFDO-specific classes, the same pipeline can work with any metadata schema without changes, and there's a clearer separation of concerns where pipelines know about their data and metadata classes know about metadata formats.
Backward Compatibility: Changing the _package() return type signature is a major breaking change that would require updates to all existing pipeline implementations. The new implemented approach avoids this by adding a new optional method with a default implementation that returns an empty dictionary from the base class, allowing existing pipelines to work without any changes.
Context Differentiation: Because Marimba can produce output metadata files at multiple levels (dataset, pipeline, and collection), the feature needs to generate different metadata for each granularity level. Without context differentiation, all three levels receive the same header information. The implemented solution provides a context parameter so pipelines can return different metadata depending on the granularity level, enabling level specific header attributes for dataset, pipeline and collection levels.
Automatic Optimization: I've included your deduplication implementation from the iFDO compressor work - it's integrated into the metadata generation pipeline and operates automatically without requiring pipelines to manage it manually.
I have just submitted the redesigned implementation in PR #32 if you want to review the details. I believe this alternative implementation better serves Marimba's long-term maintainability and aligns with the framework's design principles of separation of concerns and schema-agnostic design.
I appreciate your contribution to the project and it would be great to get your approval on the resultant structure of the output iFDO files.
-Chris
This PR adds a mechanism for pipelines to set the iFDO "image-set-header" section based on the second solution proposed in #22:
Changes:
MetadataHeaderclass for wrapping different header dataBasePipeline._packagesignature to enable pipelines to set header dataiFDOMetdatato use theMetadataHeadersupplied from the pipelinesThis change should be backwards compatible.
To discuss @cjackett
iFDOMetadata.process_filesinsert the metadata into the EXIF Header?pydantic.BaseModelto ensure the header data can be serialized to and parsed fromdict. Alternatively the header class could be bound to a Protocol, which requiresmodel_dumpandmodel_validateto be implemented. (see https://github.com/GermanHydrogen/marimba/blob/4da04c6ceb572fbac27ce8f533bd381844978b98/marimba/core/schemas/header/base.py#L17)