How does this work for video authentication? #16

Open
mishmosh opened this issue Dec 4, 2024 · 10 comments

Comments

@mishmosh
Contributor

mishmosh commented Dec 4, 2024

@benhylau from Starling Labs & @eqtylab mentioned this use case in a video call in September:

  • I want to use one implementation to generate a CID for a given video testimony, and publish that CID onchain
  • Next year, I want to fetch that video and use another implementation to generate a CID and make sure it's the same CID as the one from the public chain

@darobin thinks video and other large files can be done with a folder structure where the first block contains metadata (not only metadata about the ingested file, but any pre-processing steps specific to video or other unique formats).

@benhylau Does that work? Do we need more details?

@darobin
Owner

darobin commented Dec 5, 2024

I would separate these two cases out.

If the video is large and all you want is to make it verifiable, then I would not split it at all. I would just use BLAKE3 and call it a day.
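
A minimal sketch of that whole-file check, assuming Python. The standard library has no BLAKE3, so `hashlib.sha256` stands in here; a real implementation would swap in a BLAKE3 library and wrap the digest as a CID. The name `file_digest` and the chunk sizes are illustrative, not part of any spec.

```python
import hashlib
import io

def file_digest(stream, chunk_size=1 << 20):
    """Hash a binary stream incrementally so a large video never has to fit in memory.

    sha256 is a stand-in: the thread proposes BLAKE3, which is not in the stdlib.
    """
    h = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()

# Publisher side: compute the digest once and publish it.
video = b"\x00\x01fake video bytes" * 1000
published = file_digest(io.BytesIO(video))

# Verifier side, later: re-hash the fetched bytes, here with a different read size.
fetched = file_digest(io.BytesIO(video), chunk_size=4096)
matches = (fetched == published)
```

The read size only affects memory use, not the result, which is why two independent implementations can agree on the digest from the file alone.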

If you need to split the video because of delivery constraints (like you want to stream it as DASH) then you should split it and yes I think it's possible to agree on a metadata wrapper that describes the split (so it's reproducible). I think that dCBOR42+CAR is all you need for that, along with an agreed metadata vocabulary.

@mishmosh
Contributor Author

mishmosh commented Dec 5, 2024

We should confirm this, but I remember some domain experts saying it's typical to want to archive multiple versions of the same media, such as high-res, low-res, and raw.

@bumblefudge
Collaborator

bumblefudge commented Dec 6, 2024

How many layers of indirection/wrapping are tolerable? The "status quo" way of doing this would be a CAR file for the "folder" of the various "equivalent" files, and a "content credential" for each containing metadata, plus the DASL and/or chunked CID. I'm tempted to imagine a richer CAR file or richer content credential before mushing them into one credential-containing CAR file.

It's also worth noting that content credentials currently use a UCAN-y syntax for expressing "data equivalence" (these two CIDs point to the same bytes) but not "content equivalence" (this is the same 34 seconds of footage encoded two different ways or at two different resolutions). For the latter, a distance hash like ISCC might be needed. It bears mention that ISCC is already a concrete data model for metadata with proximate content equivalence built in, while we're collecting prior art/building-block options.

@2color

2color commented Dec 6, 2024

If you need to split the video because of delivery constraints (like you want to stream it as DASH) then you should split it and yes I think it's possible to agree on a metadata wrapper that describes the split (so it's reproducible). I think that dCBOR42+CAR is all you need for that, along with an agreed metadata vocabulary.

In this instance, how are the individual DASH video chunks addressed and encoded? Just binary with a raw BLAKE3 hash (which can easily be packed into a CAR) and then linked from the dCBOR42 metadata objects?

@benhylau

If the video is large and all you want is to make it verifiable, then I would not split it at all. I would just use the BLAKE3 and call it a day.

Yes that would be my expectation for files like mp4 and mov.

If you need to split the video because of delivery constraints (like you want to stream it as DASH) then you should split it and yes I think it's possible to agree on a metadata wrapper that describes the split (so it's reproducible).

For DASH and HLS (e.g. .m3u8 manifest + .ts chunks), there would be a directory of files. Each of these files needs to be hashed into a CID, and we have to preserve the directory structure so it can be recreated.

archive multiple versions of the same media, such as high-res, low-res, and raw.

The directory can contain different bit rates, for example:

manifest.m3u8
hi/
  001-hi.ts
  002-hi.ts
low/
  001-low.ts
  002-low.ts

Based on the current spec, how would I represent this directory of files? I assume a Blake3 hash of each file (i.e. .m3u8, .ts). Would I keep the file structure in a dCBOR42 metadata object?

@darobin
Owner

darobin commented Dec 19, 2024

The best option is IMHO dCBOR42 metadata. It's not specified yet but I'm working on a metadata spec that I think would work for you. Here's my thinking, it would be great to hear if you think that's right or not:

Not Directories

I don't think that directories are the right mechanism. That's how you would store them on a file system but we're dealing with URLs and resources, FS abstractions get weird very fast.

So the metadata would be more something like:

resources: {
    /manifest.m3u8: metadata-object
    /hi/001-hi.ts: metadata-object
    /hi/002-hi.ts: metadata-object
    /low/001-low.ts: metadata-object
    /low/002-low.ts: metadata-object
}

This makes pathing trivial (it's just a look up) and it avoids indirection through multiple directories. It corresponds a lot better to how resources work on the web. This is the model used in tiles, it's adapted to "this is a container of a bunch of related but content-addressed things" cases.
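
A small sketch of the difference, assuming Python dicts in place of decoded dCBOR42 maps. The string values are hypothetical placeholders for metadata objects, and `flatten` and `lookup_nested` are illustrative names.

```python
# Hypothetical placeholder values; in the proposal each would be a metadata object.
nested = {
    "manifest.m3u8": "meta-manifest",
    "hi": {"001-hi.ts": "meta-hi-1", "002-hi.ts": "meta-hi-2"},
    "low": {"001-low.ts": "meta-low-1", "002-low.ts": "meta-low-2"},
}

def flatten(tree, prefix=""):
    """Turn a nested directory-style dict into a flat {path: value} map."""
    flat = {}
    for name, value in tree.items():
        path = f"{prefix}/{name}"
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def lookup_nested(tree, path):
    """Directory-style resolution: one hop per path segment."""
    node = tree
    for segment in path.strip("/").split("/"):
        node = node[segment]
    return node

resources = flatten(nested)

# Flat resolution is a single key lookup on the full path string.
flat_hit = resources["/hi/001-hi.ts"]
nested_hit = lookup_nested(nested, "/hi/001-hi.ts")
```

With content-addressed directories each hop in `lookup_nested` would mean fetching and decoding another block, which is the indirection the flat map avoids.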

HTTP Metadata

The metadata-object above is something that should for the most part be HTTP headers in CBOR (possibly using their binary expression, still looking into it). This has the advantage of building on something successful that works and that we don't need to specify. It means you could then use:

  • Content-Length to indicate the size of the chunks in the metadata — this makes it much easier to reproduce and verify the payload.
  • Content-Type to flag the media type, which is required for anything that interacts with a browser and also can be used to capture codec information.
  • Content-Location that's a Tag42 link to the CID. (Or we make our own if we want it distinct, like dasl-src.)
  • Content-Encoding to know whether the resource was compressed before the CID was minted.
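
To make the list above concrete, here is a hedged sketch of what one such metadata object and its verification could look like in Python. The lowercase field names and the helper names are assumptions, since MASL is not specified yet, and the sha256 hex string stands in for a Tag42 link to a BLAKE3-based CID.

```python
import hashlib

def make_metadata(payload: bytes, media_type: str) -> dict:
    """Build a hypothetical metadata object for one resource.

    Field names mirror the HTTP headers listed above; the sha256 hex string
    is a stand-in for a Tag42 link to a BLAKE3-based CID.
    """
    return {
        "content-type": media_type,
        "content-length": len(payload),
        "content-location": hashlib.sha256(payload).hexdigest(),
    }

def verify(payload: bytes, meta: dict) -> bool:
    """Check the length first (cheap), then the digest."""
    if len(payload) != meta["content-length"]:
        return False
    return hashlib.sha256(payload).hexdigest() == meta["content-location"]

segment = b"fake transport stream bytes"
meta = make_metadata(segment, "video/mp2t")
ok = verify(segment, meta)
tampered = verify(segment + b"!", meta)
```

Recording the length lets a verifier reject a truncated or padded payload before hashing anything.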

Magic?

One thing I don't know is if these metadata objects should have a way of distinguishing them from other dCBOR42 content or if duck typing is enough. We could have them have a field like $type: masl-manifest or $type: masl-meta. (MASL is Metadata for Arbitrary Structure and Links, the name of the spec-in-prep.)

@2color

2color commented Dec 23, 2024

That's an interesting idea. Would the metadata also be hashed and addressed by CID?

@lidel

lidel commented Dec 24, 2024

@2color Likely. Either by being part of root CID, or a separate CID referenced from root CID. In both cases, metadata is part of data, and impacts final root CID.

@darobin some late night thoughts on "directories in dCBOR42", happy to brainstorm more if useful:

Not Directories

I really like the general abstraction of "not directories" being a dCBOR map from tiles-style "opaque string paths" to Tag42 CIDs:

/index.html:       tag42-cid
/player.js:        tag42-cid
/manifest.m3u8:    tag42-cid
/hi/001-hi.ts:     tag42-cid
/hi/002-hi.ts:     tag42-cid
/low/001-low.ts:   tag42-cid
/low/002-low.ts:   tag42-cid

When processing a request for /ipfs/cid/low/001-low.ts, an HTTP gateway could see that the cid is CBOR, then check for a key equal to the path remainder /low/001-low.ts and find the tag42-cid it maps to in the CBOR document. If such a key-value pair exists, the HTTP server returns the payload behind that CID.
This composes nicely onto web hosting based on existing IPFS gateways; we could bring it to the legacy stack (boxo/helia) once specs are finalized.
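
A rough sketch of that resolution step in Python, under stated assumptions: `manifests` is a hypothetical in-memory store of already-decoded CBOR maps keyed by root CID string, and the link values are placeholder strings rather than real Tag42 CIDs. It is not how boxo or helia implement gateways.

```python
def resolve(request_path: str, manifests: dict):
    """Resolve /ipfs/<cid>/<remainder> against a flat path map.

    Returns the linked value for the remainder, or None if nothing matches.
    """
    parts = request_path.split("/", 3)   # ['', 'ipfs', cid, remainder]
    if len(parts) < 4 or parts[0] != "" or parts[1] != "ipfs":
        return None
    root_cid, remainder = parts[2], "/" + parts[3]
    manifest = manifests.get(root_cid)
    if manifest is None:
        return None
    # The remainder is treated as one opaque key: no per-segment traversal.
    return manifest.get(remainder)

# Hypothetical store with one decoded manifest.
manifests = {
    "cid": {
        "/manifest.m3u8": "tag42-cid-manifest",
        "/low/001-low.ts": "tag42-cid-low-1",
    }
}
hit = resolve("/ipfs/cid/low/001-low.ts", manifests)
miss = resolve("/ipfs/cid/low/999-low.ts", manifests)
```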

As for metadata, we probably want to be careful to not bring overly complex schemas and syntax into this (learning from IPLD adoption/interop), and perhaps not leak HTTP-only metadata into something that aims to be simpler alternative to UnixFS directory.

Personally, I like the idea of an extremely simplified CBOR mode where HTTP gateways allow following every opaque string that points directly at a Tag42 CID, and the metadata object is optional:

/file:      tag42-cid || metadata-object

HTTP Metadata

If useful, including HTTP headers in metadata was previously requested/discussed in ipfs/specs#257 (comment). At the time the idea was to use a separate _headers file to match _redirects, but the TLDR useful takeaway is that allowing arbitrary headers is likely a security/interop footgun. Implementers will end up hardcoding a limited allowlist anyway, so it's better to be explicit in the spec with a small set of safe and generic headers.

With that lens:

  • Content-Type and Content-Length are both really good and generally useful (and both fix issues with existing UnixFS)
  • Content-Encoding may be out of scope: brings complexity and mixes abstractions, likely harming interop
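
A minimal sketch of such an explicit allowlist in Python. The set contents and the name `filter_headers` are assumptions for illustration; the thread only singles out Content-Type and Content-Length as clearly useful.

```python
# Hypothetical allowlist; only the two headers endorsed above are included.
ALLOWED = {"content-type", "content-length"}

def filter_headers(metadata: dict) -> dict:
    """Keep only allowlisted headers, comparing names case-insensitively."""
    return {
        name.lower(): value
        for name, value in metadata.items()
        if name.lower() in ALLOWED
    }

untrusted = {
    "Content-Type": "video/mp2t",
    "Content-Length": 1024,
    "Set-Cookie": "session=abc",   # should never be replayed by a gateway
    "Content-Encoding": "gzip",    # treated as out of scope in this sketch
}
safe = filter_headers(untrusted)
```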

ps. On DASH / HLS

Representing video as DASH/HLS is effectively doing custom chunking in userland. It does not solve "cid reproducibility". It is the same as UnixFS with custom chunker – you need both file and tools to get to the same CID, file alone is not enough.
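
The reproducibility point can be illustrated with a toy example in Python. This is not UnixFS or any DASL format, just a two-level hash with sha256 standing in for the real hash function; `chunked_root` is an illustrative name.

```python
import hashlib

def chunked_root(data: bytes, chunk_size: int) -> str:
    """Hash each chunk, then hash the concatenated chunk digests.

    A toy two-level structure; sha256 is a stand-in for the real hash.
    """
    digests = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    return hashlib.sha256(b"".join(digests)).hexdigest()

data = bytes(range(256)) * 64            # 16 KiB of sample bytes
root_a = chunked_root(data, 4096)        # one set of chunking parameters
root_b = chunked_root(data, 1024)        # same bytes, different parameters
root_a_again = chunked_root(data, 4096)  # same bytes, same parameters
```

The same bytes give the same root only when the chunking parameters match, so a verifier needs the split description as well as the file.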

Still, it is a nice way of dealing with big files until DASL addresses how to seek in a 4GiB video :P
(only half-joking, maybe my above example where DASH chunks are bundled together with the index.html player is our version of realpolitik, and the way to go?)

@achingbrain

This thread is heading towards files/directories and I am reminded a little of our attempts to land UnixFSv2. It's probably worth looking through old issues on the original repo and the IPLD specs repo to see what was once important.

Some thoughts, not all coherent:

CBOR

I think we like CBOR because it's simple and schema-free like JSON but also concise on the wire and gives us niceties like embedded CIDs.

That said we don't really use CBOR, we use DAG-CBOR which is a subset of CBOR that makes the encoding deterministic (which you want, otherwise you get different CIDs).

It's not perfect - DAG-CBOR says you have to encode a number in its smallest representation but then the original number type is erased, so if you encode a BigInt that is within Number range, it'll decode as a Number.

Numbers and BigInts are incompatible in JavaScript (trying to add them together will throw), so when used in anger you need to add a schema or a post-processing step to convert any values to the expected types - we may even be heading in that sort of direction with the $type: masl-manifest suggestion.
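
To show where the type information goes missing, here is a sketch of shortest-form encoding for CBOR unsigned integers (major type 0), written in Python. Python has a single integer type, so it cannot reproduce the JavaScript Number/BigInt split itself; the point is only that the encoded bytes depend on the value and carry nothing about the type the producer used. `encode_uint` is an illustrative name, not a library call.

```python
import struct

def encode_uint(n: int) -> bytes:
    """Encode a non-negative integer as a CBOR unsigned int (major type 0)
    using the shortest form, as deterministic encodings require."""
    if n < 0 or n >= 1 << 64:
        raise ValueError("outside the CBOR unsigned integer range")
    if n < 24:
        return bytes([n])                       # value fits in the initial byte
    if n < 1 << 8:
        return bytes([0x18, n])                 # 1-byte argument
    if n < 1 << 16:
        return b"\x19" + struct.pack(">H", n)   # 2-byte argument
    if n < 1 << 32:
        return b"\x1a" + struct.pack(">I", n)   # 4-byte argument
    return b"\x1b" + struct.pack(">Q", n)       # 8-byte argument

small = encode_uint(5)
two_byte = encode_uint(500)
large = encode_uint(2**53)
```

Since 5 always encodes as the single byte 0x05, a decoder has no way to tell whether the producer held it as a Number or a BigInt, which is why a schema or post-processing step ends up being needed.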

One of the CBOR tags that DAG-CBOR explicitly doesn't use is 2 - bignum which could solve this problem. I'm not sure why this was not included in the first place, perhaps because most sane programming environments would coerce a 32bit number to a 64bit value when multiplying by another 64bit number - maybe we should relax that requirement?

FWIW Protocol Buffers does not suffer from this problem, and has the advantage of not encoding object key literals so the message sizes are smaller still. Map support is there so you can have arbitrary extensibility if you want it.

I think pb has a bad name with this crowd having been associated with the leaky abstraction/best decision at the time dag-pb/UnixFSv1-metadata thing - I do wonder if we'd look on it more favourably if we'd just combined the DAG-layout/UnixFS-metadata into one structure instead of having one embedded inside the other as is being proposed here.

Mono-codec CIDs

I think having everything be DAG-CBOR would be good for future use-cases we haven't thought of, but structures produced for those use-cases won't necessarily be backwards compatible (for example different file/dir layout object shapes or metadata fields in general, etc), especially if we punt conventions on directories/sharding/chunking/etc to userland.

Part of the joy of the CID codec field is that it tells us how to interpret the block data, if we have only DAG-CBOR that's great for extensibility and flexibility but we push the problem into userland and suddenly it's everyone's problem.

That's not to say we can't combine the two - dag-jose has its own codec but the block data is DAG-CBOR. A schema is sort of implicit here, in that the dag-jose codec lets you infer what shape the DAG-CBOR object should be when decoded.

Embedding HTTP Headers

This is interesting. Some thoughts:

  1. Encoding header names in the DAG-CBOR will make for bigger messages
  2. Defining canonical casing and ordering will be necessary, otherwise different CIDs will be generated
  3. If we use a small set of safe/generic headers, someone needs to be the arbiter of what those are
  4. If we encounter extra headers, are these ignored? Or do we reject the message?
  5. Do these need to be HTTP headers? Or can they be generic metadata that is easy to convert into HTTP headers?
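
Point 2 can be sketched as follows in Python. Lowercasing as the canonical casing is an assumption made for illustration; the ordering (length first, then bytewise) is the map key order that DAG-CBOR specifies. `canonical_pairs` is a hypothetical helper, and the serialization to CBOR itself is left out.

```python
def canonical_pairs(headers: dict) -> list:
    """Lowercase header names and order them length-first, then bytewise.

    Returns a list of (name, value) pairs in the order they would be encoded.
    """
    lowered = {}
    for name, value in headers.items():
        key = name.lower()
        if key in lowered:
            raise ValueError(f"duplicate header after case folding: {key}")
        lowered[key] = value
    ordered = sorted(
        lowered,
        key=lambda k: (len(k.encode("utf-8")), k.encode("utf-8")),
    )
    return [(k, lowered[k]) for k in ordered]

a = canonical_pairs({"Content-Type": "video/mp4", "Content-Length": 10})
b = canonical_pairs({"content-length": 10, "CONTENT-TYPE": "video/mp4"})
```

Both inputs normalize to the same sequence, so they would serialize to the same bytes and therefore the same CID.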

Since we can add arbitrary fields to DAG-CBOR I'm guessing we wouldn't want to just blurt the headers into an HTTP response, so we'd need to do some processing anyway, either via schema validation or userland/application level.

Metadata+file data vs embedded data

If I had some file data with some metadata, I could embed it in one structure like:

{
  "name": "foo.txt",
  "contents": "aGVsbG8gd29ybGQ="
}

Or I could decorate a CID that resolves to the contents like:

{
  "name": "foo.txt",
  "contents": CID("bafyfoo..")
}

This makes the content block de-dupable but at the cost of more wantlist entries and network requests.
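
A toy illustration of that trade-off in Python, with an in-memory dict standing in for a block store and a sha256 hex digest standing in for a CID. `BlockStore` is a hypothetical class, not part of any IPFS library.

```python
import hashlib

class BlockStore:
    """Toy content-addressed store keyed by the sha256 hex digest of each block."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data   # identical bytes land on the same key
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]

store = BlockStore()
contents = b"hello world"

# Two metadata objects with different names linking to the same contents.
foo = {"name": "foo.txt", "contents": store.put(contents)}
bar = {"name": "bar.txt", "contents": store.put(contents)}

# Reading a file now takes two steps: get the metadata object, then follow its link.
foo_bytes = store.get(foo["contents"])
```

The contents are stored once however many names point at them, and the extra `get` is the additional request the comment refers to.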

In the past this was too high a price to pay. I don't know if it still is now we have things like CAR files and bitswap sessions.

Directories

I like the simplicity of:

{
  "resources": {
    "/manifest.m3u8": tag42-cid || metadata-object
    "/hi/001-hi.ts": tag42-cid || metadata-object
    "/hi/002-hi.ts": tag42-cid || metadata-object
    "/low/001-low.ts": tag42-cid || metadata-object
    "/low/002-low.ts": tag42-cid || metadata-object
  }
}

..but I don't know how I would represent Wikipedia with this, or someone's flat directory with 5m NFTs all with absurdly long filenames.

Perhaps something like:

{
  "resources": CID("bafy-DASL-HAMT")
}
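
As a rough illustration of why an indirection helps at that scale, here is a one-level hash-prefix sharding sketch in Python. It is a simplification and not a real HAMT, which would recurse and split buckets as they fill. sha256 and all function names are assumptions for illustration.

```python
import hashlib

def bucket_for(path: str, prefix_bits: int) -> int:
    """Pick a bucket from the leading bits of the path's hash (1 to 16 bits)."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big") >> (16 - prefix_bits)

def shard_resources(resources: dict, prefix_bits: int = 8) -> dict:
    """Split a flat path map into buckets keyed by hash prefix."""
    shards = {}
    for path, link in resources.items():
        shards.setdefault(bucket_for(path, prefix_bits), {})[path] = link
    return shards

def lookup(shards: dict, path: str, prefix_bits: int = 8):
    """Recompute the bucket for a path so only one shard has to be loaded."""
    return shards.get(bucket_for(path, prefix_bits), {}).get(path)

# Hypothetical flat directory with many entries.
resources = {f"/nft/{i}.json": f"link-{i}" for i in range(1000)}
shards = shard_resources(resources)
found = lookup(shards, "/nft/42.json")
```

Each shard could then live in its own block, so resolving one path means fetching the root plus one shard rather than the entire map.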

@bumblefudge
Collaborator

..but I don't know how I would represent Wikipedia with this, or someone's flat directory with 5m NFTs all with absurdly long filenames.

Just chatted with Adin about this today, actually, I think it makes sense to have (at least as an extension, not a requirement for all RASL/DASL/MASL implementations to be able to parse) a non-flat structure for, as you say, huge maps that need something more efficient than seeking, and if we're already making a non-flat manifest option, maybe one that interops with or onramps to bitswap/amino/etc deserves a little extra consideration?

7 participants