This document serves as an entry point for understanding all of the data structures in filecoin.
An address is an identifier that refers to an actor in the Filecoin state. All actors (miner actors, the storage market actor, account actors) have an address. An address encodes information about the network it belongs to, the type of data it contains, the data itself, and depending on the type, a checksum.
To learn more, take a look at the address spec.
For most objects referenced by Filecoin, a Content Identifier (CID for short) is used. This is effectively a hash value, prefixed with its hash function (multihash) prepended with a few extra labels to inform applications about how to deserialize the given data. To learn more, take a look at the CID Spec.
CIDs are serialized by applying binary multibase encoding, then encoding that as a CBOR byte array with a tag of 42.
A block represents an individual point in time that the network may achieve consensus on. It contains (via merkle links) the full state of the system, references to the previous state, and some notion of a 'weight' for deciding which block is the 'best'.
// Block is a block in the blockchain.
type Block struct {
// Miner is the address of the miner actor that mined this block.
Miner Address
// Tickets is a chain (possibly singleton) of tickets ending with a winning ticket submitted with this block.
Tickets []Ticket
// ElectionProof is a signature over the final ticket that proves this miner
// is the leader at this round
ElectionProof Signature
// Parents is an array of distinct CIDs of parents on which this block was based.
// Typically one, but can be several in the case where there were multiple winning ticket-
// holders for a round.
// The order of parent CIDs is not defined.
Parents []Cid
// ParentWeight is the aggregate chain weight of the parent set.
ParentWeight Integer
// Height is the chain height of this block.
Height Uint64
// StateRoot is a cid pointer to the state tree after application of the
// transactions state transitions.
StateRoot Cid
// Messages is the set of messages included in this block. This field is the Cid
// of the root of a sharray of Messages.
Messages Cid
// BLSAggregate is an aggregated BLS signature for all the messages in this block that
// were signed using BLS signatures
BLSAggregate Signature
// MessageReceipts is a set of receipts matching to the sending of the `Messages`.
// This field is the Cid of the root of a sharray of MessageReceipts.
MessageReceipts Cid
// The block Timestamp is used to enforce a form of block delay by honest miners.
// Unix time UTC timestamp (in seconds) stored as an unsigned integer.
Timestamp Timestamp
// BlockSig is a signature over the hash of the entire block with the miners
// worker key to ensure that it is not tampered with after creation
BlockSig Signature
}
The Message and MessageReceipts fields are each Cids of sharray datastructures. The Messages
sharray contains the Cids of the messages that are included in the block. The MessageReceipts
sharray contains the receipts directly.
type Message struct {
To Address
From Address
// When receiving a message from a user account the nonce in
// the message must match the expected nonce in the from actor.
// This prevents replay attacks.
Nonce Uint64
Value BigInteger
GasPrice Integer
GasLimit Integer
Method uint64
Params []byte
}
Parameters to methods get encoded as described in the basic types section below, and then put into a CBOR encoded array.
(TODO: thinking about this, it might make more sense to just have Params
be an array of things)
A signed message is a wrapper type over the base message.
type SignedMessage struct {
Message Message
Signature Signature
}
The signature is a serialized signature over the serialized base message. For more details on how the signature itself is done, see the signatures spec.
type MessageReceipt struct {
ExitCode uint8
Return []byte
GasUsed Integer
}
Message receipts are serialized by using the FCS.
type Actor struct {
// Code is a pointer to the code object for this actor
Code Cid
// Head is a pointer to the root of this actors state
Head Cid
// Nonce is a counter of the number of messages this actor has sent
Nonce Uint64
// Balance is this actors current balance of filecoin
Balance BigInteger
}
The state trie keeps track of all state in Filecoin. It is a map of addresses to actors
in the system. It is implemented using a HAMT.
{{% notice todo %}} TODO: link to spec for our CHAMP HAMT {{% /notice %}}
All signatures in Filecoin come with a type that signifies which key type was used to create the signature.
For more details on signature creation, see the signatures spec.
type Signature struct {
Type int
Data []byte
}
Key Type | Value |
---|---|
Secp256k1 | 1 |
BLS12-381 ECDSA | 2 |
In their serialized form the raw bytes (only the Data
field) are serialized and then tagged according to the FCS tags, to indicated which signature type they are.
FaultSets are used to denote which sectors failed at which block height.
type FaultSet struct {
Index uint64
BitField BitField
}
The Index
field is a block height offset from the start of the miners proving period (in order to make it more compact).
Types that appear in messages or in state must be encoded as described here.
The public key type is simply an array of bytes. (TODO: discuss specific encoding of key types, for now just calling it bytes is sufficient)
BytesAmount is just a re-typed Integer.
PeerID is just the serialized bytes of a libp2p peer ID.
Spec incomplete, take a look at this PR: libp2p/specs#100
Integers are encoded as LEB128 signed integers.
Bitfields are a set of bits encoded using a custom run length encoding: rle+. rle+ is specified below.
TODO
A fault set is a BitField and a block height, encoding TBD.
BlockHeader is a serialized Block
.
SealProof is an opaque, dynamically-sized array of bytes.
PoStProof is an opaque, dynamically-sized array of bytes.
TokenAmount is a re-typed Integer.
This is taken from the Dwarf Standard 4, Appendix C
do
{
byte = low order 7 bits of value;
value >>= 7;
if (value != 0) /* more bytes to come */
set high order bit of byte;
emit byte;
} while (value != 0);
more = 1;
negative = (value < 0);
size = no. of bits in signed integer;
while(more)
{
byte = low order 7 bits of value;
value >>= 7;
/* the following is unnecessary if the
* implementation of >>= uses an arithmetic rather
* than logical shift for a signed left operand
*/
if (negative)
/* sign extend */
value |= - (1 << (size - 7));
/* sign bit of byte is second high order bit (0x40) */
if ((value == 0 && sign bit of byte is clear) ||
(value == -1 && sign bit of byte is set))
more = 0;
else
set high order bit of byte;
emit byte;
}
result = 0;
shift = 0;
while(true)
{
byte = next byte in input;
result |= (low order 7 bits of byte << shift);
if (high order bit of byte == 0)
break;
shift += 7;
}
result = 0;
shift = 0;
size = number of bits in signed integer;
while(true)
{
byte = next byte in input;
result |= (low order 7 bits of byte << shift);
shift += 7;
/* sign bit of byte is second high order bit (0x40) */
if (high order bit of byte == 0)
break;
}
if ((shift <size) && (sign bit of byte is set))
/* sign extend */
result |= - (1 << shift);
Datastructures in Filecoin are encoded as compactly as is reasonable. At a high level, each object is converted into an ordered array of its fields (ordered by their appearance in the struct declaration), then CBOR marshaled, and prepended with an object type tag.
FCS Type | CBOR tag |
---|---|
Block v1 | 43 |
Message v1 | 44 |
SignedMessage v1 | 45 |
MessageReceipt v1 | 46 |
Signature Secp256k1 v1 | 47 |
Signature BLS12-381 v1 | 48 |
For example, a message would be encoded as:
tag<44>[
msg.To,
msg.From,
msg.Nonce,
msg.Value,
msg.GasPrice,
msg.GasLimit,
msg.Method,
msg.Params
]
Each individual type should be encoded as specified:
type | encoding |
---|---|
Uint64 | CBOR major type 0 |
BigInteger | CBOR bignum |
Address | CBOR major type 2 |
Uint8 | CBOR Major type 0 |
[]byte | CBOR Major type 2 |
string | CBOR Major type 3 |
bool | CBOR Major type 7, value 20/21 |
Objects should be encoded using canonical CBOR, and decoders should operate in strict mode. The maximum size of an FCS Object should be 1MB (2^20 bytes). Objects larger than this are invalid.
Additionally, CBOR Major type 5 is not used. If an FCS object contains it, that object is invalid.
Cids for FCS objects should use the FCS multicodec (0x1f
), and should use a blake2b-256 multihash.
Below are some sample vectors for each data type.
Encoded:
d82c865501fd1d0f4dfcd7e99afcb99a8326b7dc459d32c6285501b882619d4
6558f3d9e316d11b48dcf211327026a1875c245037e11d600666d6574686f64
4d706172616d73617265676f6f64
Decoded:
To: Address("f17uoq6tp427uzv7fztkbsnn64iwotfrristwpryy")
From: Address("f1xcbgdhkgkwht3hrrnui3jdopeejsoatkzmoltqy")
Nonce: uint64(117)
Value: BigInt(15000000000)
Method: string("method")
Params: []byte("paramsaregood")
Encoded:
d82b895501fd1d0f4dfcd7e99afcb99a8326b7dc459d32c628814a69616d617
469636b6574566920616d20616e20656c656374696f6e2070726f6f6681d82a
5827000171a0e40220ce25e43084e66e5a92f8c3066c00c0eb540ac2f2a1733
26507908da06b96f678c242bb6a1a0012d687d82a5827000171a0e40220ce25
e43084e66e5a92f8c3066c00c0eb540ac2f2a173326507908da06b96f678808
0
Decoded:
Miner: Address("f17uoq6tp427uzv7fztkbsnn64iwotfrristwpryy")
Tickets: [][]byte{"iamaticket"}
ElectionProof: []byte("i am an election proof")
Parents: []Cid{"zDPWYqFD5abn4FyknPm1PibXdJ2kwRNVPDabKyzfdXVJGjnDuq4B"}
ParentWeight: NewInt(47978)
Height: uint64(1234567)
StateRoot: Cid("zDPWYqFD5abn4FyknPm1PibXdJ2kwRNVPDabKyzfdXVJGjnDuq4B")
Messages: []SignedMessage{}
MessageReceipts: []MessageReceipt{}
RLE+ is a lossless compression format based on RLE. It's primary goal is to reduce the size in the case of many individual bits, where RLE breaks down quickly, while keeping the same level of compression for large sets of contiugous bits.
In tests it has shown to be more compact than RLE iteself, as well as Concise and Roaring.
The format consists of a header, followed by a series of blocks, of which there are three different types.
The format can be expressed as the following BNF grammar.
<encoding> ::= <header> <blocks>
<header> ::= <version> <bit>
<version> ::= "00"
<blocks> ::= <block> <blocks> | ""
<block> ::= <block_single> | <block_short> | <block_long>
<block_single> ::= "1"
<block_short> ::= "01" <bit> <bit> <bit> <bit>
<block_long> ::= "00" <unsigned_varint>
<bit> ::= "0" | "1"
An <unsigned_varint>
is defined as specified here.
The header indiciates the very first bit of the bit vector to encode. This means the first bit is always the same for the encoded and non encoded form.
The blocks represent how many bits, of the current bit type there are. As 0
and 1
alternate in a bit vector
the inital bit, which is stored in the header, is enough to determine if a length is currently referencing
a set of 0
s, or 1
s.
If the running length of the current bit is only 1
, it is encoded as a single set bit.
If the running length is less than 16
, it can be encoded into up to four bits, which a short block
represents. The length is encoded into a 4 bits, and prefixed with 01
, to indicate a short block.
If the running length is 16
or larger, it is encoded into a varint, and then prefixed with 00
to indicate
a long block.
Note: The encoding is unique, so no matter which algorithm for encoding is used, it should produce the same encoding, given the same input.