Skip to content

Comments

feat: Add archive marking and detection plus Reader into builder options#1813

Draft
gpeacock wants to merge 14 commits intomainfrom
gpeacock/archive_handling
Draft

feat: Add archive marking and detection plus Reader into builder options#1813
gpeacock wants to merge 14 commits intomainfrom
gpeacock/archive_handling

Conversation

@gpeacock
Copy link
Collaborator

@gpeacock gpeacock commented Feb 4, 2026

NOTE: This is NOT ready to Merge and is just a draft of ideas in progress:

Archive Type Detection Using Custom Metadata Assertion
Added org.contentauth.archive.metadata custom assertion to mark and detect archive types, replacing the format-based approach that failed with v2 claims.
Key Changes:
Archive Creation (builder.rs):
working_store_sign() now adds org.contentauth.archive.metadata assertion with archive:type field
Archive type determined by calling method: to_archive() → "builder", to_ingredient_archive() → "ingredient"
Fixed double hash binding (BoxHash + DataHash): now conditionally uses BoxHash or DataHash based on signer availability - this will sign with the current context signer if it is available, making a valid box hashed signed archive.
Archive Detection (reader.rs):
manifest_type() checks for archive metadata assertion first using Metadata type
Falls back to heuristics for old archives (double binding, presence of ingredients)
manifest_into_builder() strips the archive metadata assertion when restoring (no longer an archive)
Three-Way Conversion Logic (reader.rs, builder.rs):
Reader.into_builder(): ArchivedBuilder → restores builder state, ArchivedIngredient → extracts ingredient, SignedManifest → converts to parent ingredient
Builder.add_ingredient_from_reader(): handles all three types appropriately
Archive metadata naturally removed when converting to ingredient (ingredient archives only contain the ingredient, not the parent manifest)
New API (builder.rs):
Added Builder.to_ingredient_archive() for creating ingredient archives with validation

…active manifest - it doesn't have to be the parent ingrdient anymore.
Adds intelligent detection and handling of three distinct archive types:
builder archives, archived ingredients, and regular C2PA manifests.

Changes:
- Add archive type marker to builder archives via ClaimGeneratorInfo
- Implement ArchiveType enum and detection logic in Reader
- Update into_builder() to handle three conversion paths:
  1. Builder archives: restore original ManifestDefinition
  2. Archived ingredients: create new builder with ingredient
  3. Regular manifests: convert manifest to parent ingredient
- Simplify add_ingredient_from_reader() to leverage into_builder()
- Add comprehensive tests (currently ignored due to timestamp issues)

All existing tests pass. Changes are backward compatible.
…ilder for both cases

The restore_builder_from_archive method was fundamentally broken because it
tried to directly restore ingredients without rebuilding their manifest_data.

When Builder::to_archive() creates an archive, it:
1. Converts the builder to a unified Claim via to_claim()
2. Flattens all ingredient manifests into one Store
3. Ingredients reference claims within that unified store

Therefore, both builder archives AND regular manifests need the same treatment:
- Extract each ingredient's claim from the unified store
- Recursively collect nested ingredient claims
- Rebuild and embed as binary JUMBF manifest_data

The manifest_into_builder() method already does this correctly, so we now
use it for both ArchiveType::Builder and ArchiveType::Manifest.

Removed 49 lines of broken code, simplified the logic.
…r manifests

For ArchiveType::Manifest (regular signed C2PA assets), the correct behavior
is to convert the ENTIRE reader into ONE parent ingredient in a new builder,
not to reconstruct the manifest's internal structure.

Added reader_to_parent_ingredient() which:
- Creates a new parent Ingredient from the manifest metadata
- Embeds the entire manifest store as manifest_data
- Returns it for use in a fresh builder

Now the three paths are clear:
1. Builder Archive -> Restore builder (rebuild ingredient stores from unified store)
2. Archived Ingredient -> Extract pre-packaged ingredient
3. Regular Manifest -> Convert entire reader to parent ingredient

All tests pass.
The enum name ArchiveType was misleading since one of the variants
(Manifest) represents a regular signed asset, not an archive.

Renamed to ManifestType with clearer variant names:
- Builder -> ArchivedBuilder
- Ingredient -> ArchivedIngredient
- Manifest -> SignedManifest

Also renamed archive_type() method to manifest_type().

This better reflects that we're identifying the TYPE of manifest data
in the Reader, not just archive formats.
Builder archives created before the com.contentauth.archive_type marker
was added can still be detected by their characteristic signature:
- They have a BoxHash assertion
- The BoxHash contains only a "c2pa" box entry (no media boxes)

This fallback detection ensures existing builder archives continue to work
correctly with the new three-way conversion logic.
Updated manifest_type() detection logic to properly identify:

1. ArchivedBuilder (new): Has com.contentauth.archive_type marker
2. ArchivedBuilder (old): Has format=application/c2pa + DataHash + BoxHash(c2pa only)
3. ArchivedIngredient: Has format=application/c2pa + BoxHash(c2pa only) + NO DataHash
4. SignedManifest: Everything else (default)

Key insight: Builder archives have placeholder signatures (DataHash),
while archived ingredients are fully signed (no DataHash).

For now, SignedManifest continues to use manifest_into_builder() for
backward compatibility. The reader_to_parent_ingredient() method is
available for future use when we want to treat entire manifests as
single parent ingredients.

All tests pass.
Previously, working_store_sign() created archives with BOTH BoxHash and
DataHash assertions, which violates C2PA spec (two hard bindings).

Changes:
- With signer: Use BoxHash ONLY via get_box_hashed_embeddable_manifest()
- Without signer: Use DataHash ONLY via get_data_hashed_manifest_placeholder()
- Add org.contentauth.archive_type marker for new archives
- Simplify manifest_type() detection logic
- Remove dependency on format field for detection

New archives have only one hard binding (C2PA compliant).
Old archives with double bindings can still be detected for backward compatibility.
@gpeacock gpeacock marked this pull request as draft February 4, 2026 00:08
When converting a Reader to a parent ingredient, preserve the validation
status and results from the reader.

Changes:
- Add pub(crate) setters for validation_status and validation_results to Ingredient
- Update reader_to_parent_ingredient to copy validation data to the ingredient

This ensures validation errors are preserved when a reader is converted
to an ingredient in a new builder.
@codspeed-hq
Copy link

codspeed-hq bot commented Feb 4, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks
⏩ 2 skipped benchmarks1


Comparing gpeacock/archive_handling (2fa2e43) with main (2f33f6c)

Open in CodSpeed

Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov
Copy link

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 39.75155% with 97 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.01%. Comparing base (2f33f6c) to head (2fa2e43).

Files with missing lines Patch % Lines
sdk/src/reader.rs 46.23% 50 Missing ⚠️
sdk/src/builder.rs 35.00% 39 Missing ⚠️
sdk/src/ingredient.rs 0.00% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1813      +/-   ##
==========================================
- Coverage   76.22%   76.01%   -0.21%     
==========================================
  Files         170      170              
  Lines       39416    39522     +106     
==========================================
- Hits        30045    30044       -1     
- Misses       9371     9478     +107     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Changes:
- Update into_builder() to use reader_to_parent_ingredient for SignedManifest
- Update add_ingredient_from_reader() to detect ManifestType and handle each case:
  - ArchivedIngredient: Extract the first ingredient
  - SignedManifest: Convert to parent ingredient with validation results
  - ArchivedBuilder: Return error (not supported)
- Make reader_to_parent_ingredient pub(crate) and non-consuming (&self)
- Removes all dead code warnings

This ensures validation results are preserved when converting readers to ingredients.
Previously, this method called into_builder() which would incorrectly handle
SignedManifest by creating a nested structure.

Changes:
- Detect ManifestType like add_ingredient_from_reader does
- ArchivedIngredient: Extract the ingredient
- SignedManifest: Convert to parent ingredient with validation
- ArchivedBuilder: Return error (not supported)

This ensures add_ingredient_from_stream behaves correctly for all manifest types.
…chives and archive types. This is stripped out when using archived data.
/// # Errors
/// * Returns an [`Error`] if the builder doesn't have exactly one ingredient.
/// * Returns an [`Error`] if the archive cannot be written.
pub fn to_ingredient_archive(&mut self, stream: impl Write + Seek) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the information in the enclosed Builder? I think this would make more sense if it was Builder::to_ingredient_archive(ingredient_json, stream), and it archived every ingredient already inside the builder.

/// * `reader` - The Reader to get the ingredient from (consumed).
/// # Returns
/// * A reference to the added ingredient.
fn add_ingredient_from_reader_owned(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to take an owned Reader since it borrows all its fields?

fn working_store_sign(&mut self, archive_type: &str) -> Result<Vec<u8>> {
// Add archive metadata assertion to mark this as an archive
let archive_metadata = serde_json::json!({
"@context": { "archive": "http://contentauth.org/archive/1.0/" },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TBD?

Comment on lines +2500 to +2501
/// If a signer is available in the context, it uses box hash signing
/// Otherwise, it falls back to a DataHash placeholder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why box/data hash depending on signer availability? I thought working stores are unsigned? Also, is there anything special that's done for box hash with application/c2pa?

Ok(())
}

// TODO: These tests require additional network-free test infrastructure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we disable timestamping for this test?

Comment on lines +1261 to +1268
ManifestType::SignedManifest => {
// Regular manifest: convert the entire reader to a parent ingredient
let ingredient = self.reader_to_parent_ingredient()?;
let mut builder = crate::Builder::from_shared_context(&self.context);
builder.add_ingredient(ingredient);
Ok(builder)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we convert the parent to a signed manifest and not the entire thing as a a new ingredient? The latter is what I'd naturally expect.


/// Converts the entire Reader into a single parent Ingredient.
/// The entire manifest store is embedded as manifest_data.
pub(crate) fn reader_to_parent_ingredient(&self) -> Result<Ingredient> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make more sense to implement From<Reader> for Ingredient and consume an owned Reader since we are cloning all its fields anyways.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants