feat: Add archive marking and detection plus Reader into builder options#1813
feat: Add archive marking and detection plus Reader into builder options#1813
Conversation
…active manifest - it doesn't have to be the parent ingrdient anymore.
Adds intelligent detection and handling of three distinct archive types: builder archives, archived ingredients, and regular C2PA manifests. Changes: - Add archive type marker to builder archives via ClaimGeneratorInfo - Implement ArchiveType enum and detection logic in Reader - Update into_builder() to handle three conversion paths: 1. Builder archives: restore original ManifestDefinition 2. Archived ingredients: create new builder with ingredient 3. Regular manifests: convert manifest to parent ingredient - Simplify add_ingredient_from_reader() to leverage into_builder() - Add comprehensive tests (currently ignored due to timestamp issues) All existing tests pass. Changes are backward compatible.
…ilder for both cases The restore_builder_from_archive method was fundamentally broken because it tried to directly restore ingredients without rebuilding their manifest_data. When Builder::to_archive() creates an archive, it: 1. Converts the builder to a unified Claim via to_claim() 2. Flattens all ingredient manifests into one Store 3. Ingredients reference claims within that unified store Therefore, both builder archives AND regular manifests need the same treatment: - Extract each ingredient's claim from the unified store - Recursively collect nested ingredient claims - Rebuild and embed as binary JUMBF manifest_data The manifest_into_builder() method already does this correctly, so we now use it for both ArchiveType::Builder and ArchiveType::Manifest. Removed 49 lines of broken code, simplified the logic.
…r manifests For ArchiveType::Manifest (regular signed C2PA assets), the correct behavior is to convert the ENTIRE reader into ONE parent ingredient in a new builder, not to reconstruct the manifest's internal structure. Added reader_to_parent_ingredient() which: - Creates a new parent Ingredient from the manifest metadata - Embeds the entire manifest store as manifest_data - Returns it for use in a fresh builder Now the three paths are clear: 1. Builder Archive -> Restore builder (rebuild ingredient stores from unified store) 2. Archived Ingredient -> Extract pre-packaged ingredient 3. Regular Manifest -> Convert entire reader to parent ingredient All tests pass.
The enum name ArchiveType was misleading since one of the variants (Manifest) represents a regular signed asset, not an archive. Renamed to ManifestType with clearer variant names: - Builder -> ArchivedBuilder - Ingredient -> ArchivedIngredient - Manifest -> SignedManifest Also renamed archive_type() method to manifest_type(). This better reflects that we're identifying the TYPE of manifest data in the Reader, not just archive formats.
Builder archives created before the com.contentauth.archive_type marker was added can still be detected by their characteristic signature: - They have a BoxHash assertion - The BoxHash contains only a "c2pa" box entry (no media boxes) This fallback detection ensures existing builder archives continue to work correctly with the new three-way conversion logic.
Updated manifest_type() detection logic to properly identify: 1. ArchivedBuilder (new): Has com.contentauth.archive_type marker 2. ArchivedBuilder (old): Has format=application/c2pa + DataHash + BoxHash(c2pa only) 3. ArchivedIngredient: Has format=application/c2pa + BoxHash(c2pa only) + NO DataHash 4. SignedManifest: Everything else (default) Key insight: Builder archives have placeholder signatures (DataHash), while archived ingredients are fully signed (no DataHash). For now, SignedManifest continues to use manifest_into_builder() for backward compatibility. The reader_to_parent_ingredient() method is available for future use when we want to treat entire manifests as single parent ingredients. All tests pass.
Previously, working_store_sign() created archives with BOTH BoxHash and DataHash assertions, which violates C2PA spec (two hard bindings). Changes: - With signer: Use BoxHash ONLY via get_box_hashed_embeddable_manifest() - Without signer: Use DataHash ONLY via get_data_hashed_manifest_placeholder() - Add org.contentauth.archive_type marker for new archives - Simplify manifest_type() detection logic - Remove dependency on format field for detection New archives have only one hard binding (C2PA compliant). Old archives with double bindings can still be detected for backward compatibility.
When converting a Reader to a parent ingredient, preserve the validation status and results from the reader. Changes: - Add pub(crate) setters for validation_status and validation_results to Ingredient - Update reader_to_parent_ingredient to copy validation data to the ingredient This ensures validation errors are preserved when a reader is converted to an ingredient in a new builder.
Merging this PR will not alter performance
Comparing Footnotes
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1813 +/- ##
==========================================
- Coverage 76.22% 76.01% -0.21%
==========================================
Files 170 170
Lines 39416 39522 +106
==========================================
- Hits 30045 30044 -1
- Misses 9371 9478 +107 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Changes: - Update into_builder() to use reader_to_parent_ingredient for SignedManifest - Update add_ingredient_from_reader() to detect ManifestType and handle each case: - ArchivedIngredient: Extract the first ingredient - SignedManifest: Convert to parent ingredient with validation results - ArchivedBuilder: Return error (not supported) - Make reader_to_parent_ingredient pub(crate) and non-consuming (&self) - Removes all dead code warnings This ensures validation results are preserved when converting readers to ingredients.
Previously, this method called into_builder() which would incorrectly handle SignedManifest by creating a nested structure. Changes: - Detect ManifestType like add_ingredient_from_reader does - ArchivedIngredient: Extract the ingredient - SignedManifest: Convert to parent ingredient with validation - ArchivedBuilder: Return error (not supported) This ensures add_ingredient_from_stream behaves correctly for all manifest types.
…chives and archive types. This is stripped out when using archived data.
| /// # Errors | ||
| /// * Returns an [`Error`] if the builder doesn't have exactly one ingredient. | ||
| /// * Returns an [`Error`] if the archive cannot be written. | ||
| pub fn to_ingredient_archive(&mut self, stream: impl Write + Seek) -> Result<()> { |
There was a problem hiding this comment.
What about the information in the enclosed Builder? I think this would make more sense if it was Builder::to_ingredient_archive(ingredient_json, stream), and it archived every ingredient already inside the builder.
| /// * `reader` - The Reader to get the ingredient from (consumed). | ||
| /// # Returns | ||
| /// * A reference to the added ingredient. | ||
| fn add_ingredient_from_reader_owned( |
There was a problem hiding this comment.
Does this need to take an owned Reader since it borrows all its fields?
| fn working_store_sign(&mut self, archive_type: &str) -> Result<Vec<u8>> { | ||
| // Add archive metadata assertion to mark this as an archive | ||
| let archive_metadata = serde_json::json!({ | ||
| "@context": { "archive": "http://contentauth.org/archive/1.0/" }, |
| /// If a signer is available in the context, it uses box hash signing | ||
| /// Otherwise, it falls back to a DataHash placeholder |
There was a problem hiding this comment.
Why box/data hash depending on signer availability? I thought working stores are unsigned? Also, is there anything special that's done for box hash with application/c2pa?
| Ok(()) | ||
| } | ||
|
|
||
| // TODO: These tests require additional network-free test infrastructure |
There was a problem hiding this comment.
Can we disable timestamping for this test?
| ManifestType::SignedManifest => { | ||
| // Regular manifest: convert the entire reader to a parent ingredient | ||
| let ingredient = self.reader_to_parent_ingredient()?; | ||
| let mut builder = crate::Builder::from_shared_context(&self.context); | ||
| builder.add_ingredient(ingredient); | ||
| Ok(builder) | ||
| } | ||
| } |
There was a problem hiding this comment.
Why do we convert the parent to a signed manifest and not the entire thing as a a new ingredient? The latter is what I'd naturally expect.
|
|
||
| /// Converts the entire Reader into a single parent Ingredient. | ||
| /// The entire manifest store is embedded as manifest_data. | ||
| pub(crate) fn reader_to_parent_ingredient(&self) -> Result<Ingredient> { |
There was a problem hiding this comment.
I think it would make more sense to implement From<Reader> for Ingredient and consume an owned Reader since we are cloning all its fields anyways.
NOTE: This is NOT ready to Merge and is just a draft of ideas in progress:
Archive Type Detection Using Custom Metadata Assertion
Added org.contentauth.archive.metadata custom assertion to mark and detect archive types, replacing the format-based approach that failed with v2 claims.
Key Changes:
Archive Creation (builder.rs):
working_store_sign() now adds org.contentauth.archive.metadata assertion with archive:type field
Archive type determined by calling method: to_archive() → "builder", to_ingredient_archive() → "ingredient"
Fixed double hash binding (BoxHash + DataHash): now conditionally uses BoxHash or DataHash based on signer availability - this will sign with the current context signer if it is available, making a valid box hashed signed archive.
Archive Detection (reader.rs):
manifest_type() checks for archive metadata assertion first using Metadata type
Falls back to heuristics for old archives (double binding, presence of ingredients)
manifest_into_builder() strips the archive metadata assertion when restoring (no longer an archive)
Three-Way Conversion Logic (reader.rs, builder.rs):
Reader.into_builder(): ArchivedBuilder → restores builder state, ArchivedIngredient → extracts ingredient, SignedManifest → converts to parent ingredient
Builder.add_ingredient_from_reader(): handles all three types appropriately
Archive metadata naturally removed when converting to ingredient (ingredient archives only contain the ingredient, not the parent manifest)
New API (builder.rs):
Added Builder.to_ingredient_archive() for creating ingredient archives with validation