rfc: rfcBBL209 stay strictly trustless for now.
Rather than build off protocol#25, I think it better to do all the parts prior
to protocol#25, so we have less complexity to wrap our heads around when
evaluating the security properties etc.

Generalizing as done with protocol#25 I agree is a good idea, but can always be
done as a follow-up step.
Ericson2314 committed Jan 17, 2021
1 parent 8719471 commit fd84635
Showing 1 changed file with 68 additions and 14 deletions.
82 changes: 68 additions & 14 deletions RFC/rfcBBL209/README.md
@@ -22,12 +22,14 @@ MD pseudo-code looks roughly like:
```golang
func Hash(D []byte) []byte {
	pieces = getChunks(D)

	var S state
	var St state
	var Sz uint64 = 0
	for i, p := range pieces {
		S = process(S, p) // Call this result S_i
		St = process(St, p) // Call this result S_i
		Sz += size(p)
	}

	return finalize(S) // Call this H, the final hash
	return finalize(St, Sz) // Call this H, the final hash
}
```

@@ -38,6 +40,23 @@ From the above we can see that:

The implication for Bitswap is that if each piece size is not more than 1MiB then we can send the file **backwards** in 1MiB increments. In particular a server can send `(S_n-2, P_n-1)` and the client can use that to compute that `P_n-1` is in fact the last part of the data associated with the final hash `H`. The server can then send `(S_n-3, P_n-2)` and the client can calculate that `P_n-2` is the last block of `S_n-2` and therefore also the second to last block of `H`, and so on.
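
The backwards verification described above can be sketched in Go. This is only an illustration under stated assumptions: SHA-256 from the standard library stands in for the hash function, its serialized state (via `encoding.BinaryMarshaler`) stands in for the abstract `S_i`, and the helper names `snapshot`, `verifyTail`, `verifyStep`, and `digest` are hypothetical. A real protocol would presumably define its own wire format for the state rather than rely on Go's serialization.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding"
	"errors"
)

// snapshot plays the server: it hashes prefix and serializes the
// resulting unfinalized SHA-256 state (an S_i in the pseudo-code).
func snapshot(prefix []byte) ([]byte, error) {
	h := sha256.New()
	h.Write(prefix)
	m, ok := h.(encoding.BinaryMarshaler)
	if !ok {
		return nil, errors.New("hash state is not serializable")
	}
	return m.MarshalBinary()
}

// resume restores a hash from a serialized state and feeds it piece.
func resume(state, piece []byte) (encoding.BinaryMarshaler, func() []byte, error) {
	h := sha256.New()
	u, ok := h.(encoding.BinaryUnmarshaler)
	if !ok {
		return nil, nil, errors.New("hash state is not restorable")
	}
	if err := u.UnmarshalBinary(state); err != nil {
		return nil, nil, err
	}
	h.Write(piece)
	m, ok := h.(encoding.BinaryMarshaler)
	if !ok {
		return nil, nil, errors.New("hash state is not serializable")
	}
	return m, func() []byte { return h.Sum(nil) }, nil
}

// verifyTail plays the client for the last piece: resuming from state
// and hashing piece must finalize to the requested hash want.
func verifyTail(state, piece, want []byte) (bool, error) {
	_, sum, err := resume(state, piece)
	if err != nil {
		return false, err
	}
	return bytes.Equal(sum(), want), nil
}

// verifyStep plays the client for an earlier piece: resuming from
// prevState and hashing piece must reproduce the already trusted nextState.
func verifyStep(prevState, piece, nextState []byte) (bool, error) {
	m, _, err := resume(prevState, piece)
	if err != nil {
		return false, err
	}
	got, err := m.MarshalBinary()
	if err != nil {
		return false, err
	}
	return bytes.Equal(got, nextState), nil
}

// digest returns the ordinary SHA-256 hash of data.
func digest(data []byte) []byte {
	sum := sha256.Sum256(data)
	return sum[:]
}
```

A client would call `verifyTail` once for `(S_n-2, P_n-1)` against `H`, then `verifyStep` for each earlier pair, so trust flows backwards from `H` one piece at a time.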

N.B. A helpful analogy might be linked lists.
A Merkle–Damgård hash is a (highly imbalanced) Merkle tree in the same sense that a linked list is a (highly imbalanced) tree.
The only caveat is that each node is a "freestart" hash: it starts from the parent's hash state in place of the normal fixed initial state, rather than hashing the parent hash like data.

#### Statelessness

By finalizing the intermediate hashes, we can make "genuine" requests for the prefixes of data.
This makes the reverse-streaming protocol a bit less of a special case.

For example, imagine if whenever a large file was added, every n MiB prefix was also independently added.
Then imagine that the response to the first request, for the tail of the file, is just the remainder:
instead of being a whole 1 MiB, it is just enough to take us back to a MiB boundary.
Now, every subsequent request is for the tail of one of those separately-added prefix files.

Of course in practice, we would not want to do something as naive as separately storing prefixes, wasting quadratic space.
But maybe storing all the (finalized) MiB-boundary hashes would be OK (merely linear space).
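
As a rough sketch of the linear-space idea, the finalized hash of every boundary prefix can be collected in a single pass, because Go's `hash.Hash.Sum` returns the current digest without changing the underlying state. SHA-256 is an assumption here, and `boundaryHashes` and `fullHash` are hypothetical names, not part of any existing Bitswap API.

```go
package main

import "crypto/sha256"

// boundaryHashes returns the finalized SHA-256 hash of each prefix of data
// that ends on a chunkSize boundary, followed by the hash of data itself
// when its length is not a multiple of chunkSize. chunkSize must be > 0.
func boundaryHashes(data []byte, chunkSize int) [][32]byte {
	h := sha256.New()
	var out [][32]byte
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		h.Write(data[off:end])
		// Sum appends the digest of everything written so far and
		// leaves the running state untouched, so hashing continues.
		var sum [32]byte
		copy(sum[:], h.Sum(nil))
		out = append(out, sum)
	}
	return out
}

// fullHash returns the ordinary SHA-256 hash of data.
func fullHash(data []byte) [32]byte {
	return sha256.Sum256(data)
}
```

The result holds one 32-byte digest per boundary, so the storage overhead grows linearly with the file size instead of quadratically.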

#### Extension

This scheme requires downloading a file linearly, which is quite slow with even modest latencies. However, utilizing a scheme like [RFC|BB|L2 - Speed up Bitswap with GraphSync CID only query](https://github.com/protocol/beyond-bitswap/issues/25) (i.e. downloading metadata manifests up front) we can make this fast/parallelizable.
@@ -54,21 +73,53 @@ While SHA-3 is not a Merkle–Damgård construction it follows the same pseudocode

In tree constructions we are not restricted to downloading the file backwards and can instead download the parts of the file that we are looking for, which includes downloading the file forwards for sequential streaming.

There is detail about how to do this for Blake3 in the [Blake3 paper](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf) section 6.4, Verified Streaming
There is detail about how to do this for Blake3 in the [Blake3 paper](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf) section 6.4, Verified Streaming.
Note that the Merkle tree is more legitimate in this case, because there is nothing like the "freestart" caveat that may weaken the security of tail blocks for Merkle–Damgård construction hashes.
Also note that because of the lack of a free start there is less associativity:
whereas the chunking size doesn't matter for the SHA-1 construction, it does for Blake3.
However, there is still finalization in that only the root node is hashed with the `ROOT` flag set.

### Implementation Plan

#### Bitswap changes

* When a server responds to a request for a block if the block is too large then instead send a traversal order list of the block as defined by the particular hash function used (e.g. linear and backwards for SHA-1,2,3)
* Large Manifests
  * If the list is more than 1MiB long then only send the first 1MiB along with an indicator that the manifest is not complete
  * When the client is ready to process more of the manifest then it can send a request WANT_LARGE_BLOCK_MANIFEST containing the multihash of the entire large block and the last hash in the manifest
* When requesting subblocks send requests as `(full block multihash, start index, end index)`
* process subblock responses separately from full block responses verifying the results as they come in
* As in [RFC|BB|L2 - Speed up Bitswap with GraphSync CID only query](https://github.com/protocol/beyond-bitswap/issues/25) specify how much trust goes into a given manifest, examples include
  * download at most 20 unverified blocks at a time from a given manifest
  * grow trust geometrically (e.g. 10 blocks, then if those are good 20, 40, ...)
##### Merkle–Damgård

Let `CHUNK_SIZE` be some constant such that no response overflows the bitswap limit.

* When a server responds to a request for a block, if the block is too large then instead send the final block which will hash to the requested hash: `(S_n-1, P_n, total_size)`.
The length of the block must be such that the remainder can be split into an exact number of whole chunks:
```golang
(total_size - size(P_n)) % CHUNK_SIZE == 0
```

* The client verifies the final hash:
```golang
S_n == finalize(process(S_n-1, P_n), total_size)
```

* The client can request the previous chunk with a finalized hash computed by:
```golang
S_n-1 = finalize(S_n-1, total_size - size(P_n))
```
Unlike with the original request, the client now has an expected size of the remainder which it can verify against the server's response.
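
The size bookkeeping in the steps above can be sketched as follows. This is a hedged illustration rather than part of the proposal: `tailSize` and `requestSizes` are hypothetical helper names, and the sketch assumes the server always sends the largest final block that still satisfies the `CHUNK_SIZE` alignment rule.

```go
package main

// tailSize returns the size of the final block P_n for data of totalSize
// bytes, chosen so that (totalSize - tailSize) % chunkSize == 0 and
// 0 < tailSize <= chunkSize. It returns 0 for empty data.
// chunkSize must be greater than zero.
func tailSize(totalSize, chunkSize uint64) uint64 {
	if totalSize == 0 {
		return 0
	}
	if r := totalSize % chunkSize; r != 0 {
		return r
	}
	return chunkSize
}

// requestSizes lists the total size behind each successive request,
// starting with the whole block and walking backwards one response at a
// time. Every entry after the first is the expected size that the client
// can check against the server's response.
func requestSizes(totalSize, chunkSize uint64) []uint64 {
	var sizes []uint64
	for remaining := totalSize; remaining > 0; remaining -= tailSize(remaining, chunkSize) {
		sizes = append(sizes, remaining)
	}
	return sizes
}
```

For example, with a hypothetical `CHUNK_SIZE` of 1000 bytes and a 2500-byte block, the first response carries a 500-byte tail and the following requests are for prefixes of 2000 and 1000 bytes.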

##### Blake3

* When a server responds to a request for a block, if the block is too large then instead send the final chunk and the size.

* The client can verify the block.

* Using the size, calculate what the sizes of the children should be, or whether in fact the block is a parent block.
The formula is given in the paper.

* If the block is a parent block the client may request the children:

* Simply take the first or second 32 bytes to have a new hash.

* However, use a different multihash to account for `ROOT=false`.

* Verify the size of the response against the calculated subtree size.
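
A minimal sketch of the size calculation for the steps above, based on my reading of the `left_len` rule in the BLAKE3 reference implementation: the left subtree holds the largest power-of-two number of 1024-byte chunks that still leaves at least one byte for the right subtree. The names `leftLen` and `childSizes` are hypothetical, and the rule should be checked against the paper before relying on it.

```go
package main

import "math/bits"

// chunkLen is the BLAKE3 chunk size in bytes.
const chunkLen = 1024

// leftLen returns how many bytes the left subtree covers for a subtree of
// contentLen bytes. contentLen must be greater than chunkLen.
func leftLen(contentLen uint64) uint64 {
	// Subtract 1 so that at least one byte is left for the right side.
	fullChunks := (contentLen - 1) / chunkLen
	// Largest power of two that is <= fullChunks.
	pow := uint64(1) << (bits.Len64(fullChunks) - 1)
	return pow * chunkLen
}

// childSizes reports whether a subtree of the given size is a parent node
// and, if so, the expected sizes of its left and right children.
func childSizes(size uint64) (left, right uint64, isParent bool) {
	if size <= chunkLen {
		return 0, 0, false // a single chunk has no children
	}
	left = leftLen(size)
	return left, size - left, true
}
```

A client could compare `left` and `right` with the sizes of the responses it receives for the two child hashes, recursing until `isParent` is false.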

#### Datastore

@@ -89,7 +140,10 @@ There is detail about how to do this for Blake3 in the [Blake3 paper](https://gi
## Prior Work

* This proposal is almost identical to the one @Stebalien proposed [here](https://discuss.ipfs.io/t/git-on-ipfs-links-and-references/730/6)
* Utilizes overlapping principles with [RFC|BB|L2 - Speed up Bitswap with GraphSync CID only query](https://github.com/protocol/beyond-bitswap/issues/25)

## Future work

To trade trustlessness for less sequentiality, utilize [RFC|BB|L2 - Speed up Bitswap with GraphSync CID only query](https://github.com/protocol/beyond-bitswap/issues/25) to request the children of yet-unfetched chunks.

### Alternatives
