Skip to content

Commit

Permalink
Merge pull request ipfs#390 from ipfs/ipfs-principles
Browse files Browse the repository at this point in the history
IPFS Principles
  • Loading branch information
lidel authored Mar 31, 2023
2 parents 74711ac + 52834c6 commit 48da7a6
Show file tree
Hide file tree
Showing 3 changed files with 323 additions and 0 deletions.
21 changes: 21 additions & 0 deletions src/architecture/index.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
---
title: Architecture
description: |
These documents define the architectural principles that IPFS is built upon, and can be used as tools to evaluate
implementations and applications of IPFS.
---

{% include 'header.html' %}

<main>
<dl>
<dt><a href="/architecture/principles/">IPFS Principles</a></dt>
<dd>
IPFS is a suite of specifications and tools that are defined by two key characteristics: content-addressing and
transport-agnosticity. This document provides context and details about these characteristics. In doing so it defines what
is or is not an IPFS implementation.
</dd>
</dl>
</main>

{% include 'footer.html' %}
287 changes: 287 additions & 0 deletions src/architecture/principles.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,287 @@
---
editors:
- name: Robin Berjon
email: [email protected]
url: https://berjon.com/
github: darobin
twitter: robinberjon
mastodon: "@[email protected]"
affiliation:
name: Protocol Labs
url: https://protocol.ai/
maturity: reliable
date: 2023-03-28
---

# IPFS Principles

The IPFS stack is a suite of specifications and tools that share two key characteristics:

1. Data is addressed by its contents using an extensible verifiability mechanism, and
2. Data is moved in ways that are tolerant of arbitrary transport methods.

This document provides context and details about these characteristics. In doing so it defines
what is or is not an IPFS implementation. This is a **living document**; it is expected to
change over time as we define more of the principles that guide the architecture of IPFS or
find clearer ways of describing those we have already defined.

## Addressing

The web's early designers conceived it as a universal space in which identifiers map to
information resources. As the web grew, they enshrined in
[web architecture](https://www.w3.org/TR/webarch/#identification) that all resources
should have an identifier and
defined "addressability" as meaning that
"[*a URI alone is sufficient for an agent to carry out a particular type of interaction.*](https://www.w3.org/2001/tag/doc/whenToUseGet.html#uris)" (:cite[webarch])

This design is tremendously successful. For all its flaws, the web brings together a
huge diversity of software, services, and resources under universal addressability.

Unfortunately, HTTP addressability is based on a hierarchy of authorities that places
resources under the control of a host and places hosts under the control of the DNS system
(further issues with this model are discussed further in the Appendix). As indicated
in :cite[RFC3986]:

> Many URI schemes include a hierarchical element for a naming
> authority so that governance of the name space defined by the
> remainder of the URI is delegated to that authority (which may, in
> turn, delegate it further).
[CIDs](https://github.com/multiformats/cid) in IPFS offer an improvement over HTTP URLs by
maintaining universal addressability while eliminating the attack vectors inherent in
hierarchical authority. Content addressability derives identifiers from the content of
an information resource, such that any party can both mint the identifier and verify
that it maps to the right resource. This eliminates the need for any authority outside
of the resource itself to certify its content. It makes CIDs the universal
self-certifying addressability component of the web.

Addressing data using [CIDs](https://github.com/multiformats/cid) is the first defining
characteristic of IPFS. And the second characteristic, transport-agnosticity, can be
supported thanks to the verifiability that CIDs offer. Across a vast diversity of implementations,
architectures, and services, *IPFS is the space of resources that can be interacted with
over arbitrary transports using a CID*. As Juan Benet once put it,
"[*That's it!*](https://github.com/multiformats/cid/commit/ece08b40a6b1e9eeafc224e2757d8d1ef3317163#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R43)"

Conversely, any system that exposes interactions with resources based on CIDs is
an IPFS implementation. There are
[many contexts in which CIDs can be used for addressing](https://docs.ipfs.tech/how-to/address-ipfs-on-web/)
and [content routing delegation](https://github.com/ipfs/specs/blob/main/routing/ROUTING_V1_HTTP.md)
can support a wealth of interaction options by resolving CIDs.

## Robustness

Common wisdom about network protocol design is captured by *Postel's Law* or the
*Robustness Principle*. Over the years it has developed multiple formulations, but the
canonical one from :cite[RFC1958] ("*Architectural Principles of the Internet*") is:

> Be strict when sending and tolerant when receiving.
This principle is elegant, and expresses an intuitively pleasing behavior of protocol
implementations. However, over the years, the experience of internet and web protocol
designers has been that this principle can have detrimental effects on interoperability.
As discussed in the Internet Architecture Board's recent work on
[*Maintaining Robust Protocols*](https://datatracker.ietf.org/doc/html/draft-iab-protocol-maintenance),
implementations that silently accept faulty input can lead to interoperability defects
accumulating over time, leading the overall protocol ecosystem to decay.

There are two equilibrium points for protocol ecosystems: when deployed implementations
are strict, new implementations, out of necessity, are required to be strict as well, leading to a
strict ecosystem; conversely, when deployed implementations are tolerant, new
implementations will have a strong incentive to tolerate non-compliance so as to
interoperate. Tolerance is highly desirable for extensibility and adaptability to new
environments, but strictness is highly desirable to prevent a protocol ecosystem from
decaying into a complex collection of corner cases with poor or difficult
interoperability (what the IETF refers to as
"[virtuous intolerance](https://datatracker.ietf.org/doc/html/draft-iab-protocol-maintenance#name-virtuous-intolerance)").

IPFS approaches this problem space with a new iteration on the robustness principle:

> Be strict about the outcomes, be tolerant about the methods.
CIDs enforce strict outcomes because the mapping from address to content is verified;
there is no room for outcomes that deviate from the intent expressed in an address.
This strictness is complemented by a design that proactively expects change thanks to
a self-describing format (CIDs are a [multiformat](https://multiformats.io/) implementation and support
an open-ended list of hashes, codecs, etc.). The endpoints being enforceably strict means
that everything else, notably transport, can be tolerant. Being tolerant about methods
enables adaptability in how the protocol works, notably in how it can adapt to specific
environments, and in how intelligence can be applied at the endpoints in novel ways, while
being strict with outcomes guarantees that the result will be correct and interoperable.

Note that this approach to robustness also covers the
[End-to-end Principle](https://en.wikipedia.org/wiki/End-to-end_principle). The end-to-end
principle states that the reliability properties of a protocol have to be
supported at its endpoints and not in intermediary nodes. For instance, you can best guarantee
the confidentiality or authenticity of a message by encrypting or signing at one endpoint and
decrypting or verifying at the other rather than asking relaying nodes to implement local
protections. IPFS's aproach to robustness, via CIDs, is well aligned with that principle.

## IPFS Implementation Requirements

An :dfn[IPFS Implementation]:
- MUST support addressability using CIDs.
- MUST expose operations (eg. retrieval, provision, indexing) on resources using CIDs. The operations
that an implementation may support is an open-ended set, but this requirement should cover any interaction
which the implementation exposes to other IPFS implementations.
- MUST verify that the CIDs it resolves match the resources they address, at least when it
has access to the resources' bytes. Implementations MAY relax this requirement in
controlled environments in which it is possible to ascertain that verification has happened
elsewhere in a trusted part of the system.
- SHOULD name all the important resources it exposes using CIDs. Determining which resources are
important is a matter of judgment, but anything that another agent might legitimately wish to
access is in scope, and it is best to err on the side of inclusion.
- SHOULD expose the logical units of data that structure a resource (eg. a CBOR document, a file or
directory, a branch of a B-tree search index) using CIDs.
- SHOULD support incremental verifiability, notably so that it may process content of arbitrary sizes.
- MAY rely on any transport layer. The transport layer cannot dictate or constrain the way in which
CIDs map to content.

## Boundary Examples

These IPFS principles are broad. This is by design because, like HTTP, IPFS supports an open-ended set of
use cases and is adaptable to a broad array of operating conditions. Considering cases
at the boundary may help develop an intuition for the limits that these principles draw.

### Other Content-Addressing Systems

CIDs are readily made compatible with other content-addressable systems, but this does not
entail that all content-addressable systems are part of IPFS. Git's SHA1 hashes aren't CIDs
but can be converted into CIDs by prefixing them with `f01781114`. Likewise, BitTorrent v2
uses multihashes in the `btmh:` scheme. BitTorrent addresses aren't CIDs, but can be
converted to CIDs by replacing `btmh:` with `f017b`.

The simplicity with which one can expose these existing system over IPFS by simply prefixing
existing addresses to mint CIDs enables radical interoperability with other content-addressable
systems.

### Verification Matters

The requirements above state that an implementation may forgo verification when "*it is
possible to ascertain that verification has happened elsewhere in a trusted part of the system.*"
This is intended as a strict requirement in which implementors take trustlessness seriously, an indication
that it's okay to not constantly spend cycles verifying hashes in an internal setup which you
have reasons to believe is trustworthy. This is *not* a licence to trust an arbitrary data
source just because you like them.

For instance:

- A JS code snippet that fetches data from an IPFS HTTP gateway without verifying it is not an
IPFS implementation.
- An IPFS HTTP gateway that verifies the data that it is pulling from other IPFS implementations
before serving it over HTTP is an IPFS implementation.
- That JS piece of code in the first bullet can be turned into an IPFS implementation if it
fetches from a :cite[trustless-gateway] and verifies what it gets.

## Self-Certifying Addressability

:dfn[Authority] is control over a given domain of competence. :dfn[Naming authority] is
control over what resources are called.

:dfn[Addressability] is the property of a naming system such that its names are sufficient
for an agent to interact with the resources being named.

:dfn[Verifiability] is the property of a naming system such that an agent can certify that
the mapping between a name it uses and a resource it is interacting with is correct without
recourse to an authority other than itself and the resource.

:dfn[Self-certifying addressability] is the property of a naming system such that it is both
addressable and verifiable: any name is sufficient to interact with a resource, and its mapping
to that resource can be certified without recourse to additional authority. Self-certifying
addressability is a key component of a
[self-certifying web](https://jaygraber.medium.com/web3-is-self-certifying-9dad77fd8d81)
and it supports capture-resistance which can help mitigate against centralization.

CIDs support :ref[self-certifying addressability]. With CIDs, the authority to name a resource
resides only with that resource and derives directly from that resource's intrinsic
property: its content. This frees interactions with CID-named resources from the power
relation implicit in a client-server architecture. CIDs are the trust model of IPFS.

An implementation may retrieve a CID without verifying that the resource matches it, but that
loses the resource's naming authority. Such an implementation would be comparable to an HTTP
client looking DNS records up from a random person's resolver: it cannot guarantee that the
addressing is authoritative. Implementers may make informed decisions as to where in their
systems they support verification, but they should ensure that the mapping between CID and resource
is verified whenever they have access to both the resource and the CID that maps to it.

## Appendix: Historical Notes

We tend not to think about addressability because it is so foundational that we
struggle to apprehend a system without it, but that is precisely why it is important
that we get it right. You can find extensive historical evidence that TimBL and others saw
URLs as arguably the most fundamental invention of the Web, and the early groups that
worked on Web architecture discussed and debated the properties of URLs at length. The
problems of centralization we face today trace their lineage back to those decisions.

The hierarchical nature of the HTTP addresses was intentional, as TimBL wrote clearly in
[Web Architecture from 50,000 feet](https://www.w3.org/DesignIssues/Architecture.html):
> The HTTP space consists of two parts, one hierarchically delegated, for which the
> Domain Name System is used, and the second an opaque string whose significance is
> locally defined by the authority owning the domain name.
The model that the Web's earlier designers had in mind was a federated model
in which authority is delegated and addresses are *owned* based on that
authority delegation. This is notably clear in the *URI Ownership* passage of the
[*Architecture of the World Wide Web, Volume One*](https://www.w3.org/TR/webarch/#def-uri-ownership):
>URI ownership is a relation between a URI and a social entity, such as a person,
>organization, or specification. URI ownership gives the relevant social entity certain
>rights, including:
> * to pass on ownership of some or all owned URIs to another owner—delegation; and
> * to associate a resource with an owned URI—URI allocation.
>
> By social convention, URI ownership is delegated from the IANA URI scheme registry,
> itself a social entity, to IANA-registered URI scheme specifications.(…)
>
> The approach taken for the "http" URI scheme, for example, follows the pattern whereby
> the Internet community delegates authority, via the IANA URI scheme registry and the
> DNS, over a set of URIs with a common prefix to one particular owner. One consequence
> of this approach is the Web's heavy reliance on the central DNS registry.(…)
>
> URI owners are responsible for avoiding the assignment of equivalent URIs to multiple
> resources. Thus, if a URI scheme specification does provide for the delegation of
> individual or organized sets of URIs, it should take pains to ensure that ownership
> ultimately resides in the hands of a single social entity. Allowing multiple owners
> increases the likelihood of URI collisions.
>
> URI owners may organize or deploy infrastruture [sic] to ensure that representations of
> associated resources are available and, where appropriate, interaction with the resource
> is possible through the exchange of representations. There are social expectations for
> responsible representation management (§3.5) by URI owners. Additional social
> implications of URI ownership are not discussed here.
This notion of address or name ownership is
[pervasive across architectural documents](https://www.w3.org/DesignIssues/). This passage
from an interview of TimBL
([Philosophical Engineering and Ownerhip of URIs](https://www.w3.org/DesignIssues/PhilosophicalEngineering.html)) is explicit:
> **Alexandre Monnin**: Regarding names and URIs, a URI is not precisely a philosophical
> concept, it's an artifiact [sic]. So you can own a URI while you cannot own a philosophical
> name. The difference is entirely in this respect.\
> **Tim Berners-Lee**: For your definition of a philosophical name, you cannot own it.
> Maybe in your world, in your philosophy, you don't deal with names that are owned, but
> in the world we're talking about, names are owned.
This expectation of delegated naming authority was so strong among early Web architects
that the development of naming conventions in HTTP space (eg. `robots.txt`, `favicon.ico`,
all the `.well-known` paths) is described as "*expropriation*" in the
[Web Architecture](https://www.w3.org/TR/webarch/) and the W3C's Technical Architecture
Group (TAG) issue on the topic stated that it "breaks the web".

Federated models only have weak capture-resistance because the federated entities can always
concede power (precisely because they have ownership) but lack established means to
support collective organization. As a result, any power imbalance will likely become hard
to dislodge. A good example is search: as a publisher (the owner of delegated authority
over your domain) you can cede the rights to index your content but you can't have a voice
in what is done with the indexed content (individual opt out is not an option). This was
fine when you could barter content for links, but once search power consolidated, the
terms of trade deteriorated with no immediate recourse.

## Acknowledgements

Many thanks to the following people, listed alphabetically, whose feedback was instrumental
in producing this document:
Adin Schmahmann,
biglep,
Dietrich Ayala,
Juan Benet,
lidel,
Molly Mackinlay, and
mosh.
15 changes: 15 additions & 0 deletions src/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,21 @@ <h3><a href="/ipns/">InterPlanetary Naming System</a></h3>
</p>
{% include 'specs/ipns.html' %}
</section>
<section>
<h3><a href="/architecture/">Architecture</a></h3>
<p>
These documents define the architectural principles that IPFS is built upon, and can be used as tools to evaluate
implementations and applications of IPFS.
</p>
<dl>
<dt><a href="/architecture/principles/">IPFS Principles</a></dt>
<dd>
IPFS is a suite of specifications and tools that are defined by two key characteristics: content-addressing and
transport-agnosticity. This document provides context and details about these characteristics. In doing so it defines what
is or is not an IPFS implementation.
</dd>
</dl>
</section>
<section>
<h3><a href="/meta/">Meta</a></h3>
<p>
Expand Down

0 comments on commit 48da7a6

Please sign in to comment.