feat: Add support for getting embeddable manifests with CAWG/Dynamic assertions#1847
feat: Add support for getting embeddable manifests with CAWG/Dynamic assertions#1847
Conversation
This adds a new builder.placeholder() method.
This allows us to preseve the dynamic assertions in the placeholder while leveraging a hard binding in the builder. The dev need manages adding the hard binding assertion and updating it.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1847 +/- ##
==========================================
+ Coverage 76.56% 76.58% +0.02%
==========================================
Files 170 172 +2
Lines 39514 40461 +947
==========================================
+ Hits 30255 30989 +734
- Misses 9259 9472 +213 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Merging this PR will not alter performance
Comparing Footnotes
|
sdk/src/builder.rs
Outdated
| pub fn placeholder(&self, format: &str) -> Result<Vec<u8>> { | ||
| // Check that a hash assertion exists before proceeding | ||
| let has_data_hash = self.find_assertion::<DataHash>(DataHash::LABEL).is_ok(); | ||
| let has_bmff_hash = self.find_assertion::<BmffHash>(BmffHash::LABEL).is_ok(); | ||
|
|
There was a problem hiding this comment.
Can this function take a placeholder hash binding assertion as a parameter so we force them to think about it at compile time?
There was a problem hiding this comment.
That would require the user to know how to generate a hash binding placeholder, and know which one to create. That's most of what this method is for.
There was a problem hiding this comment.
In this implementation, doesn't the user create the hash binding placeholder anyways and add it to the builder?
sdk/src/builder.rs
Outdated
| /// large enough to hold the final hash (e.g., with enough exclusions). | ||
| /// | ||
| /// # Workflow | ||
| /// 1. Add a placeholder hash assertion to the Builder with proper sizing |
There was a problem hiding this comment.
Maybe we don't need to do this for them right now, but it would be very helpful if we had methods on our hash bindings that can calculate the placeholder hash sizes given a stream.
There was a problem hiding this comment.
Yes we could, but that means another pass through the asset
There was a problem hiding this comment.
If we move to the asset_io model, we can preserve the asset structure and reuse it, avoiding extra passes. This is more or less a stopgap until we get there.
sdk/src/builder.rs
Outdated
| /// 3. Calculate the hash of the asset (excluding the placeholder) | ||
| /// 4. Update the hash assertion in the Builder: `builder.add_assertion(DataHash::LABEL, &updated_hash)` | ||
| /// 5. Call this method to sign: `builder.sign_placeholder(&placeholder, signer, format)` |
There was a problem hiding this comment.
I've been thinking about this approach and maybe we can change the API to "force" users into doing the right thing at compile time.
Here's what I'm thinking: what if we made Builder::placeholder return a Placeholder struct and the context gets passed along to it from the builder. Then, we can have a method like Placeholder::sign(&self, format, updated_hard_binding_assertion). In this approach, users are forced to do the correct thing and there isn't much room for mistakes.
There can also be a method such as Placeholder::as_bytes so it can be embedded in the asset for calculating offsets. Although if it's a box hash, you can completely skip that step.
There was a problem hiding this comment.
I think I see what you mean. Let me think about that. I'm not eager to introduce more structures to the C_api but this may make more sense than extending the builder.
| /// * Returns an [`Error`] if the placeholder cannot be signed. | ||
| pub fn sign_manifest(&mut self, signer: &dyn Signer, settings: &Settings) -> Result<Vec<u8>> { | ||
| let pc = self.provenance_claim_mut().ok_or(Error::ClaimEncoding)?; | ||
|
|
There was a problem hiding this comment.
Is this function used anywhere?
added builder.update_hash_from_stream() to handle generating the correct hash values
… to add bmff support.
Implement the Builder placeholder/update_hash/sign_placeholder APIs for BMFF (MP4) assets using BmffHash v3 assertions with path-based exclusions. - BmffHash: add set_default_exclusions, add_place_holder_hash, gen_hash_from_stream, add_merkle_map_for_mdats, add_exclusions - Store: add get_bmff_hashed_manifest_placeholder and get_bmff_hashed_embeddable_manifest (sync + async) - Builder: wire up BMFF branch in placeholder() and update_hash_from_stream(); fix bmff_version restoration after CBOR round-trip in both update_hash_from_stream and sign_placeholder - BmffIO: implement ComposedManifestRef to wrap raw JUMBF in a C2PA UUID box - Hasher: add finalize_reset() and new(alg) constructors - Claim: add data_hash_assertions() helper - test.rs: add write_bmff_placeholder_stream utility and NO_MANIFEST_MP4 fixture; add test_bmff_hashed_placeholder_workflow_complete Co-authored-by: Cursor <[email protected]>
sdk/src/assertions/bmff_hash.rs Add MAX_MDAT_BOXES = 4 constant New add_merkle_placeholder(chunk_size_kb) — pre-allocates 4 root-only Merkle map slots with a u32::MAX sentinel count so the placeholder CBOR is never undersized New add_mdat_leaf_hashes(leaf_hashes, chunk_size_kb) — builds Merkle trees from client-supplied chunk hashes, stores only the root (no UUID proof boxes) New root-only verification path in validate_merkle_maps_mdat_boxes — rebuilds the full tree from the file and compares the computed root when only one hash is stored sdk/src/asset_handlers/bmff_io.rs read_bmff_c2pa_boxes signature made generic (R: Read + Seek + ?Sized) instead of &mut dyn CAIRead sdk/src/builder.rs Builder::placeholder — auto-adds Merkle maps when core.merkle_tree_chunk_size_in_kb is set in settings (no pre-call needed) Builder::set_bmff_mdat_hashes(leaf_hashes) — reads chunk size from settings, returns Result<()>; replaces old 3-param version that returned UUID box bytes Removed Builder::set_bmff_merkle_hint Builder::sign_placeholder — zero-pads signed JUMBF to placeholder size when smaller (per C2PA spec: trailing bytes in an MP4 box are ignored) Updated doc comments; added BMFF workflow steps to sign_placeholder docs New test test_bmff_mdat_hashed_placeholder_workflow_complete
|
Ask: |
Introduces Builder methods for workflows where the caller controls asset reading and writing: placeholder(format) — returns composed manifest bytes to embed; stores JUMBF length for size-matching at sign time set_data_hash_exclusions(ranges) — registers where the placeholder was embedded; replaces the dummy DataHash exclusions from placeholder() with real byte ranges set_bmff_mdat_hashes(leaf_hashes) — provides externally-computed mdat Merkle leaf hashes update_hash_from_stream(stream) — computes and stores the asset hash; algorithm derived from the existing hard binding assertion → ManifestDefinition::alg → "sha256" sign_embeddable(format) — replaces sign_placeholder; Mode 1 (after placeholder) pads the signed JUMBF to the pre-committed size for in-place patching; Mode 2 (direct binding) signs and returns without padding
This sets the hash algorithm at the definition level, propagated to the claim and auto-created hash assertions. example. Note: we can't use alg because it conflicts with c2patools use of alg for the signer.
…from_stream Add BuilderSettings::prefer_box_hash (default false): when enabled and the asset format supports BoxHash (JPEG, PNG, GIF), use chunk-based hashing instead of DataHash, eliminating the need for a placeholder Add Builder::needs_placeholder(format): returns false for BoxHash-capable formats when prefer_box_hash is set, or when a BoxHash assertion is already present Builder::placeholder(): skips auto-creating a DataHash for formats that don't need one under prefer_box_hash Builder::update_hash_from_stream(format, stream): adds format parameter; adds BoxHash path that calls BoxHash::generate_box_hash_from_stream when a BoxHash assertion is present or auto-created; bound updated to R: Read + Seek + MaybeSend BoxHash::generate_box_hash_from_stream: made generic over R: Read + Seek + MaybeSend (was &mut dyn CAIRead); removes #[allow(dead_code)] ManifestDefinition::alg renamed to hash_alg to avoid collision with signer signature algorithm in flattened JSON configs; all references updated test_all_setting: moves prefer_box_hash from [Core] to [Builder] section
The test covers four new code paths:
needs_placeholder("image/jpeg") returns false when prefer_box_hash=true (BoxHash-capable format)
needs_placeholder("video/mp4") returns true regardless (BMFF always requires a placeholder)
update_hash_from_stream("image/jpeg", stream) auto-creates a BoxHash assertion with computed hashes for every non-excluded segment
sign_embeddable("image/jpeg") (Mode 2 — no prior placeholder()) returns non-empty composed bytes
placeholder("image/jpeg") returns empty bytes when the format doesn't need one (the new early-return path), and no DataHash/BmffHash/BoxHash is auto-created
The existing Builder embeddable apis did not support Context and were not working with dynamic assertions so they do not support CAWG.
This PR changes that by adding some new, more hash agnostic, methods.
They change the model by requiring the caller to first add a hard binding placeholder to the Builder
then getting a placeholder (works for whatever hard binding was added but requires there already be one)
then updating the hash assertion in the Builder
and then passing the placeholder back in to be updated with the final hard binding and signed.
pub fn placeholder(&self, format: &str) -> Result<Vec<u8>>pub fn sign_placeholder(&mut self, placeholder_jumbf: &[u8], format: &str) -> Result<Vec<u8>>Workflow
builder.add_assertion(DataHash::LABEL, &updated_hash)Builder::sign_placeholder] to sign the manifestReturns
c2pa_manifestplaceholder.