Skip to content

feat: Add support for getting embeddable manifests with CAWG/Dynamic assertions#1847

Draft
gpeacock wants to merge 17 commits intomainfrom
gpeacock/cawg_embeddable
Draft

feat: Add support for getting embeddable manifests with CAWG/Dynamic assertions#1847
gpeacock wants to merge 17 commits intomainfrom
gpeacock/cawg_embeddable

Conversation

@gpeacock
Copy link
Collaborator

The existing Builder embeddable apis did not support Context and were not working with dynamic assertions so they do not support CAWG.
This PR changes that by adding some new, more hash agnostic, methods.
They change the model by requiring the caller to first add a hard binding placeholder to the Builder
then getting a placeholder (works for whatever hard binding was added but requires there already be one)
then updating the hash assertion in the Builder
and then passing the placeholder back in to be updated with the final hard binding and signed.

pub fn placeholder(&self, format: &str) -> Result<Vec<u8>>

pub fn sign_placeholder(&mut self, placeholder_jumbf: &[u8], format: &str) -> Result<Vec<u8>>

Workflow

  1. Add a placeholder hash assertion to the Builder with proper sizing
  2. Call this method to create a placeholder manifest
  3. Embed the placeholder into your asset
  4. Calculate the hash of the asset (excluding the placeholder)
  5. Update the hash assertion in the Builder: builder.add_assertion(DataHash::LABEL, &updated_hash)
  6. Call [Builder::sign_placeholder] to sign the manifest

Returns

  • The raw JUMBF bytes of the c2pa_manifest placeholder.

This adds a new builder.placeholder() method.
This allows us to preseve the dynamic assertions in the placeholder while leveraging a hard binding in the builder.
The dev need manages adding the hard binding assertion and updating it.
@gpeacock gpeacock marked this pull request as draft February 11, 2026 17:50
@codecov
Copy link

codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 58.49315% with 303 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.58%. Comparing base (bc3f735) to head (84a9557).
⚠️ Report is 13 commits behind head on main.

Files with missing lines Patch % Lines
sdk/src/assertions/bmff_hash.rs 53.55% 98 Missing ⚠️
c2pa_c_ffi/src/c_api.rs 0.00% 92 Missing ⚠️
sdk/src/store.rs 47.44% 72 Missing ⚠️
sdk/src/builder.rs 89.54% 23 Missing ⚠️
sdk/src/utils/hash_utils.rs 0.00% 13 Missing ⚠️
sdk/src/claim.rs 0.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1847      +/-   ##
==========================================
+ Coverage   76.56%   76.58%   +0.02%     
==========================================
  Files         170      172       +2     
  Lines       39514    40461     +947     
==========================================
+ Hits        30255    30989     +734     
- Misses       9259     9472     +213     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 11, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks
⏩ 2 skipped benchmarks1


Comparing gpeacock/cawg_embeddable (84a9557) with main (85b29f1)

Open in CodSpeed

Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Comment on lines 1964 to 1968
pub fn placeholder(&self, format: &str) -> Result<Vec<u8>> {
// Check that a hash assertion exists before proceeding
let has_data_hash = self.find_assertion::<DataHash>(DataHash::LABEL).is_ok();
let has_bmff_hash = self.find_assertion::<BmffHash>(BmffHash::LABEL).is_ok();

Copy link
Contributor

@ok-nick ok-nick Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this function take a placeholder hash binding assertion as a parameter so we force them to think about it at compile time?

Copy link
Collaborator Author

@gpeacock gpeacock Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would require the user to know how to generate a hash binding placeholder, and know which one to create. That's most of what this method is for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this implementation, doesn't the user create the hash binding placeholder anyways and add it to the builder?

/// large enough to hold the final hash (e.g., with enough exclusions).
///
/// # Workflow
/// 1. Add a placeholder hash assertion to the Builder with proper sizing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need to do this for them right now, but it would be very helpful if we had methods on our hash bindings that can calculate the placeholder hash sizes given a stream.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we could, but that means another pass through the asset

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we move to the asset_io model, we can preserve the asset structure and reuse it, avoiding extra passes. This is more or less a stopgap until we get there.

Comment on lines 1988 to 1990
/// 3. Calculate the hash of the asset (excluding the placeholder)
/// 4. Update the hash assertion in the Builder: `builder.add_assertion(DataHash::LABEL, &updated_hash)`
/// 5. Call this method to sign: `builder.sign_placeholder(&placeholder, signer, format)`
Copy link
Contributor

@ok-nick ok-nick Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this approach and maybe we can change the API to "force" users into doing the right thing at compile time.

Here's what I'm thinking: what if we made Builder::placeholder return a Placeholder struct and the context gets passed along to it from the builder. Then, we can have a method like Placeholder::sign(&self, format, updated_hard_binding_assertion). In this approach, users are forced to do the correct thing and there isn't much room for mistakes.

There can also be a method such as Placeholder::as_bytes so it can be embedded in the asset for calculating offsets. Although if it's a box hash, you can completely skip that step.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see what you mean. Let me think about that. I'm not eager to introduce more structures to the C_api but this may make more sense than extending the builder.

/// * Returns an [`Error`] if the placeholder cannot be signed.
pub fn sign_manifest(&mut self, signer: &dyn Signer, settings: &Settings) -> Result<Vec<u8>> {
let pc = self.provenance_claim_mut().ok_or(Error::ClaimEncoding)?;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function used anywhere?

gpeacock and others added 5 commits February 12, 2026 08:04
added builder.update_hash_from_stream() to handle generating the correct hash values
Implement the Builder placeholder/update_hash/sign_placeholder APIs for
BMFF (MP4) assets using BmffHash v3 assertions with path-based exclusions.

- BmffHash: add set_default_exclusions, add_place_holder_hash,
  gen_hash_from_stream, add_merkle_map_for_mdats, add_exclusions
- Store: add get_bmff_hashed_manifest_placeholder and
  get_bmff_hashed_embeddable_manifest (sync + async)
- Builder: wire up BMFF branch in placeholder() and
  update_hash_from_stream(); fix bmff_version restoration after
  CBOR round-trip in both update_hash_from_stream and sign_placeholder
- BmffIO: implement ComposedManifestRef to wrap raw JUMBF in a C2PA
  UUID box
- Hasher: add finalize_reset() and new(alg) constructors
- Claim: add data_hash_assertions() helper
- test.rs: add write_bmff_placeholder_stream utility and
  NO_MANIFEST_MP4 fixture; add test_bmff_hashed_placeholder_workflow_complete

Co-authored-by: Cursor <[email protected]>
sdk/src/assertions/bmff_hash.rs
Add MAX_MDAT_BOXES = 4 constant
New add_merkle_placeholder(chunk_size_kb) — pre-allocates 4 root-only Merkle map slots with a u32::MAX sentinel count so the placeholder CBOR is never undersized
New add_mdat_leaf_hashes(leaf_hashes, chunk_size_kb) — builds Merkle trees from client-supplied chunk hashes, stores only the root (no UUID proof boxes)
New root-only verification path in validate_merkle_maps_mdat_boxes — rebuilds the full tree from the file and compares the computed root when only one hash is stored
sdk/src/asset_handlers/bmff_io.rs
read_bmff_c2pa_boxes signature made generic (R: Read + Seek + ?Sized) instead of &mut dyn CAIRead
sdk/src/builder.rs
Builder::placeholder — auto-adds Merkle maps when core.merkle_tree_chunk_size_in_kb is set in settings (no pre-call needed)
Builder::set_bmff_mdat_hashes(leaf_hashes) — reads chunk size from settings, returns Result<()>; replaces old 3-param version that returned UUID box bytes
Removed Builder::set_bmff_merkle_hint
Builder::sign_placeholder — zero-pads signed JUMBF to placeholder size when smaller (per C2PA spec: trailing bytes in an MP4 box are ignored)
Updated doc comments; added BMFF workflow steps to sign_placeholder docs
New test test_bmff_mdat_hashed_placeholder_workflow_complete
@tmathern
Copy link
Contributor

Ask: They change the model: does it mean this will be a breaking change compared to previous way of doing things?

Introduces Builder methods for workflows where the caller controls asset reading and writing:
placeholder(format) — returns composed manifest bytes to embed; stores JUMBF length for size-matching at sign time
set_data_hash_exclusions(ranges) — registers where the placeholder was embedded; replaces the dummy DataHash exclusions from placeholder() with real byte ranges
set_bmff_mdat_hashes(leaf_hashes) — provides externally-computed mdat Merkle leaf hashes
update_hash_from_stream(stream) — computes and stores the asset hash; algorithm derived from the existing hard binding assertion → ManifestDefinition::alg → "sha256"
sign_embeddable(format) — replaces sign_placeholder; Mode 1 (after placeholder) pads the signed JUMBF to the pre-committed size for in-place patching; Mode 2 (direct binding) signs and returns without padding
This sets the hash algorithm at the definition level, propagated to the claim and auto-created hash assertions.
example.
Note: we can't use alg because it conflicts with c2patools use of alg for the signer.
…from_stream

Add BuilderSettings::prefer_box_hash (default false): when enabled and the asset format supports BoxHash (JPEG, PNG, GIF), use chunk-based hashing instead of DataHash, eliminating the need for a placeholder
Add Builder::needs_placeholder(format): returns false for BoxHash-capable formats when prefer_box_hash is set, or when a BoxHash assertion is already present
Builder::placeholder(): skips auto-creating a DataHash for formats that don't need one under prefer_box_hash
Builder::update_hash_from_stream(format, stream): adds format parameter; adds BoxHash path that calls BoxHash::generate_box_hash_from_stream when a BoxHash assertion is present or auto-created; bound updated to R: Read + Seek + MaybeSend
BoxHash::generate_box_hash_from_stream: made generic over R: Read + Seek + MaybeSend (was &mut dyn CAIRead); removes #[allow(dead_code)]
ManifestDefinition::alg renamed to hash_alg to avoid collision with signer signature algorithm in flattened JSON configs; all references updated
test_all_setting: moves prefer_box_hash from [Core] to [Builder] section
The test covers four new code paths:
needs_placeholder("image/jpeg") returns false when prefer_box_hash=true (BoxHash-capable format)
needs_placeholder("video/mp4") returns true regardless (BMFF always requires a placeholder)
update_hash_from_stream("image/jpeg", stream) auto-creates a BoxHash assertion with computed hashes for every non-excluded segment
sign_embeddable("image/jpeg") (Mode 2 — no prior placeholder()) returns non-empty composed bytes
placeholder("image/jpeg") returns empty bytes when the format doesn't need one (the new early-return path), and no DataHash/BmffHash/BoxHash is auto-created
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments