-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathrasl.html
191 lines (188 loc) · 12.7 KB
/
rasl.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
<!DOCTYPE html><html lang="en"><head><!--
!!!!! GENERATED SPEC — DO NOT EDIT. Look for the .src.html instead !!!!
-->
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>RASL — Retrieval of Arbitrary Structures & Links</title>
<link rel="stylesheet" href="spec.css"><link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><rect x=%220%22 y=%220%22 width=%22100%22 height=%22100%22 fill=%22%2300ff75%22></rect></svg>"><meta name="twitter:card" content="summary_large_image"><meta name="twitter:title" property="og:title" content="DASL: RASL — Retrieval of Arbitrary Structures & Links"><meta name="twitter:description" property="og:description" content="RASL is a URL scheme used to identify content-addressed DASL resources along with a simple HTTP-based retrieval method."><meta name="twitter:image" property="og:image" content="https://dasl.ing/rasl.png"><meta name="twitter:image:alt" content="Very colourful stripes, so colourful it hurts"><meta name="twitter:url" property="og:url" content="https://dasl.ing/"><meta property="og:site_name" content="DASL"><meta property="og:locale" content="en"><meta name="theme-color" content="#00ff75"></head>
<body><div class="nav-back">A specification of the <a href="/">DASL Project</a>.</div><main><header><h1>RASL — Retrieval of Arbitrary Structures & Links</h1><table><tbody><tr><th>date</th><td>2025-03-17</td></tr><tr><th>editors</th><td><a href="https://berjon.com/">Robin Berjon</a> <<a href="mailto:[email protected]">[email protected]</a>><br><a href="https://bumblefudge.com/">Juan Caballero</a> <<a href="mailto:[email protected]">[email protected]</a>></td></tr><tr><th>issues</th><td><a href="https://github.com/darobin/dasl.ing/issues">list</a>, <a href="https://github.com/darobin/dasl.ing/issues/new">new</a></td></tr><tr><th>abstract</th><td><div id="abstract">
<p>
RASL is a URL scheme used to identify content-addressed DASL resources
along with a simple HTTP-based retrieval method.
</p>
</div></td></tr></tbody></table></header>
<!--
XXX FOR FUTURE WORK
- API for publishing & looking up over AT, with storage in repos
- do we want hosts gossipping to one another or is AT enough?
-->
<section>
<h2>Introduction</h2>
<p>
Content-addressed resources are "self-certifying," which is to say that
you don't need any external authority to certify that the content you
have when you resolve the identifier is correct: because the identifier
contains a hash, you can (and should) verify that you obtained the right
content yourself ([<a href="#ref-ipfs-principles" class="ref">ipfs-principles</a>]). The identifier is enough to
certify the content. This has several implications, but two are
particularly relevant for this specification:
</p>
<ul>
<li>
When resolving a content-addressed identifier, you can obtain the content
from anyone. It doesn't have to be the content's author. You can even
obtain it from entirely untrusted sources — given that you can always
certify it, you don't need to trust whoever gives it to you. As a result,
the authority part of a URL — the part that can certify the content you
get, which is the domain part in an <code>https</code> URL — is the
CID itself ([<a href="#ref-cid" class="ref">cid</a>]).
</li>
<li>
Because it doesn't matter where you get content from, content-addressed
URLs are inherently transport-independent. There are benefits to agreeing
on transport (if only so that people can find one another's content)
but as a client, if you know of several potential ways of obtaining a
CID you are free to use whichever you prefer or to try several in
whatever order.
</li>
</ul>
<p>
Taking these aspects into consideration, this specification defines a URL
scheme in which the CID is the authority, along with optional hints of
potential look-up locations, and defines a retrieval method but does not
mandate that RASL retrieval rely on it.
</p>
<section>
<h2>The <code>web+rasl</code> URL Scheme</h2>
<p>
RASL URLs look like this:
<code>web+rasl://bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4;berjon.com,bsky.app/</code>.
This breaks down into the following components:
</p>
<ul>
<li>
The <code>web+rasl</code> scheme. This uses the <code>web+</code> prefix
to facilitate registration of the scheme in browser contexts so that
RASL URLs can be used on the web directly.
</li>
<li>
An authority composed of:
<ul>
<li>a DASL CID <code>bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4</code> and</li>
<li>
(optionally) a comma-separated list of URI-encoded hostnames (any authority parts that are
acceptable in HTTP) that can be used to attempt retrieval from.
</li>
</ul>
</li>
<li>
A path (here just <code>/</code>) that is empty or <code>/</code> for raw
CIDs like the above but can contain a complete path if the CID resolves
to MASL data ([<a href="#ref-masl" class="ref">masl</a>]).
</li>
</ul>
<p>
Use the following steps to <dfn id="dfn-parse-a-rasl-url">parse a RASL URL</dfn>:
</p>
<ol>
<li>Accept a string <var>url</var> and parse it according to the <a href="https://url.spec.whatwg.org/#url-parsing">URL Standard</a> ([<a href="#ref-url" class="ref">url</a>]).</li>
<li>If that's a failure, return the failure.</li>
<li>Read the <code>host</code> part of the parsed URL up to either the first <code>;</code> character or to the end of the string. Store that in <var>cid</var>.</li>
<li>If <var>cid</var> is not a valid CID ([<a href="#ref-cid" class="ref">cid</a>]), return failure.</li>
<li>
If there was no <code>;</code> then <var>hints</var> is an empty array. Otherwise:
<ol>
<li>Split the remainder of the string on <code>,</code>.</li>
<li>Apply <code>decodeURIComponent()</code> to each part.</li>
<li>The results is the <var>hints</var> array.</li>
</ol>
</li>
<li>Return the URL's parts as well as <var>cid</var> and <var>hints</var>.</li>
</ol>
</section>
<section>
<h2>Fetching RASL</h2>
<p>
A user agent may retrieve a CID in whichever way it prefers. This section
provides a simple standard for HTTP-based CID retrieval, to make it
easy for authors to publish content to their own sites and have it
retrieved, without having to worry about operating any infrastructure
beyond the web server they already have.
</p>
<p>
Use the following steps to <dfn id="dfn-fetch-a-rasl-url">fetch a RASL URL</dfn>:
</p>
<ol>
<li>Accept a string <var>url</var> and parse it according to the steps to <a href="#dfn-parse-a-rasl-url" class="dfn-ref">parse a RASL URL</a>.</li>
<li>
Construct a <var>request</var> using <var>cid</var> from the <var>url</var> as well as <var>hints</var> that may
be from the URL or from elsewhere (this is entirely up to you):
<ol>
<li>
For each hint, construct a request URL that is the concatenation of <code>https://</code>,
the hint as host, <code>/.well-known/rasl/</code>, and the <var>cid</var>.
</li>
<li>
Prepare the request such that it has a method of either <code>GET</code> or <code>HEAD</code>,
that it is stateless (no cookies, no credentials of any kind), and that it uses no content
negotiation.
</li>
</ol>
</li>
<li>
Fetch the <var>request</var>s. How these get prioritised is entirely up to the implementation. It
is common to run them all in parallel and abort them with the first success response.
Note that the <code>.well-known</code> path may redirect, so be ready to handle
that. This makes it possible to create sites that are published
the usual way and to have a RASL that is simply a redirect to the
resource. So for instance, you may have an existing
<code>https://berjon.com/kitten.jpg</code> the CID for which is
<code>bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4</code>.
This can be published as this RASL URL:
<code>web+rasl://bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4;berjon.com/</code>.
A client can retrieve it by constructing the a request to this URL:
<code>https://berjon.com/.well-known/rasl/bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4</code>.
In turn, the latter may simply 307 back to <code>https://berjon.com/kitten.jpg</code>.
(Yes, this is HTTP with extra steps, but the extra steps get you
self-certifying content.)
</li>
<li>
If the response is a redirect but not a 307, the client should treat it as if it
had been a 307 anyway.
</li>
<li>
If none of the responses are successful, return failure.
</li>
<li>
Set the response's media type to <code>application/octet-stream</code>. (The server should have
done that already, but may not have done so, notably if it relied on a redirect.) The purpose
of RASL is to retrieve data in ways that are independent of the server — any media type
processing must therefore take place at another layer. Without this, we lose the self-certifying
nature of the system. (Note that servers are encouraged to enforce that so as not to have their
RASL endpoints used for general-purpose web serving, which can be a security vector depending on
where the data being served came from.)
</li>
<li>
Produce a CID for the retrieved data. If that CID does not match the requested <var>cid</var>,
return failure.
</li>
<li>
Return the data.
</li>
</ol>
</section>
</section>
<section>
<h2>RASL Pathing</h2>
<p>
The <code>pathname</code> component of a RASL URL can only be interpreted in the context of the
content that is retrieved. It is never transmitted to the RASL server and is purely interpreted
on the client side.
</p>
<p>
As of this time, the only case in which pathing is defined is if the CID has a dCBOR42 type
([<a href="#ref-dcbor42" class="ref">dcbor42</a>]) <em>and</em> the content is MASL ([<a href="#ref-masl" class="ref">masl</a>]) <em>and</em> it has a
<code>resources</code> map. But carrying that resolution happens outside of RASL fetching.
</p>
</section>
<section><h2>References</h2><dl><dt id="ref-cid">[cid]</dt><dd>Robin Berjon & Juan Caballero. <a href="https://dasl.ing/cid.html"><cite>Content IDs (CIDs)</cite></a>. 2025-03-17. URL: <a href="https://dasl.ing/cid.html">https://dasl.ing/cid.html</a></dd><dt id="ref-dcbor42">[dcbor42]</dt><dd>Robin Berjon & Juan Caballero. <a href="https://dasl.ing/dcbor42.html"><cite>Deterministic CBOR with tag 42 (dCBOR42)</cite></a>. 2025-03-17. URL: <a href="https://dasl.ing/dcbor42.html">https://dasl.ing/dcbor42.html</a></dd><dt id="ref-ipfs-principles">[ipfs-principles]</dt><dd>Robin Berjon. <a href="https://specs.ipfs.tech/architecture/principles/"><cite>IPFS Principles</cite></a>. march 2023. URL: <a href="https://specs.ipfs.tech/architecture/principles/">https://specs.ipfs.tech/architecture/principles/</a></dd><dt id="ref-masl">[masl]</dt><dd>Robin Berjon & Juan Caballero. <a href="https://dasl.ing/masl.html"><cite>MASL — Metadata for Arbitrary Structures & Links</cite></a>. 2025-03-17. URL: <a href="https://dasl.ing/masl.html">https://dasl.ing/masl.html</a></dd><dt id="ref-url">[url]</dt><dd>WHATWG. <a href="https://url.spec.whatwg.org/"><cite>URL</cite></a>. Living Standard. URL: <a href="https://url.spec.whatwg.org/">https://url.spec.whatwg.org/</a></dd></dl></section></main></body></html>