the Sunrise Choir is a team developing the next generation of solarpunk peer-to-peer communication systems using Scuttlebutt.
we aim to evolve the fundamentals of the Scuttlebutt core protocol: to create both a future-proofed, maintainable, and accessible implementation and a tightly-coordinated, clearly defined, focused, and rhythmic team.
If you'd like to read more, here is Mikey's original post announcing the project.
- @mikey: Website | GitHub |
@6ilZq3kN0F+dXFHAPjAwMm87JEb/VdB+LC9eIMW3sa0=.ed25519
- @piet: Website | GitHub |
@U5GvOKP/YUza9k53DSXxT0mk3PIrnyAmessvNfZl5E0=.ed25519
- @matt: GitHub |
@FbGoHeEcePDG3Evemrc+hm+S77cXKf8BRQgkYinJggg=.ed25519
- (no longer on the team) @aljoscha: Website | GitHub |
@zurF8X68ArfRM71dF3mKh36W0xDM8QmOnAS5bYOq8hA=.ed25519
In early September, Mikey had an idea to support the next generation of core Scuttlebutt protocol work, and thus began the Sunrise Choir:
thanks to the generosity and leadership of people like @dominic, we now have a well-designed core protocol and fully-functional reference implementation. at the same time, our core protocol has unspecified behaviors, missing context on fundamental design hypotheses, and knowledge centralized in a few minds. our reference implementation has cryptic code, scattered documentation, confusing modularity.
even to the core maintainers among us, we still struggle with understanding the next module where a bug crops up. not everyone is willing to spend hours trying to understand the deep poems within our source code, or know how to search for the "living documentation" of the protocol, or overcome feeling like an imposter when every step of the contribution process is a barrier.
to sum up, Scuttlebutt is an amazing innovation. yet it's inaccessible to new developers, unmaintainable by existing developers, and not yet prepared for the next stage of adoption. i want to help the project evolve to the next stage, the "early adopters". i think this means capturing all our learnings from Scuttlebutt today, designing the next generation of clear well-described specifications, and empowering the next generation of developers to implement, document, and test these.
The team started as Mikey, Piet, and Aljoscha, with a weekly meeting rhythm, a GitHub org, a website (redirect), a Twitter account, and an abundance of excitement. ☀️
To get started, Piet worked on writing JavaScript bindings to private-box and Aljoscha's ssb-legacy-message module.
The focus was on taking a small module with only a few functions (encrypt + decrypt) and using node's new N-API to make bindings. We were hoping to be able to cross-compile the bindings for multiple hosts on one machine.
We now have JavaScript bindings for the legacy message module here. We also factored out a useful rust module that wraps the native calls to N-API here.
This was all well and good, the bindings to ssb-legacy-message
work, but they're not faster than plain JavaScript. We learned that the overhead of JavaScript bindings outweigh the performance benefits of Rust when the module is really small.
Aljoscha spent time writing specifications (spec.scuttlebutt.nz) as well as implementing them in rust.
He did valuable work fuzzing the current implementation and found a lot of edge-cases, prompting a couple of fixes in the reference javascript implementation.
Highlights from his wrap up post:
In general, I produced four different kinds of outputs:
- specifications
- implementations in rust
- test data sets
- long discussions
The first big block of work was about cleanly defining the message format that is currently used by ssb. There is a full specification of the format here (repo). If it contradicts the protocol guide, chances are the specification is right and the protocol guide is wrong - I've done some pretty thorough fuzz-testing (more on that later).
The spec is composed of three parts:
- General data types used throughout the protocol(s) (such as multikeys, multihashes, etc.)
- the free-form data format that forms the content of messages
- the message meta data that links messages together into the signature chain
In general, the spec clearly distinguishes between the logical data structures and concrete encodings. Building on this, it proposes some compact binary encodings that could functions as drop-in replacements with identical semantics. Taking all of these alterative encodings together yields the Concise Legacy Message Representation (clmr). It isn't used anywhere yet, but you could implement e.g. replication rpcs that use clmr, yielding better performance without any really hard work.
Corresponding to the three parts of the spec, there are three repositories with rust code, implementing all of it:
- data types: https://github.com/sunrise-choir/ssb-multiformats
- free-form data format: https://github.com/sunrise-choir/legacy-msg-data
- metadata and feeds: https://github.com/sunrise-choir/ssb-legacy-msg
The third artifacts with respect to the current data formats is test data. This data has been generated by running a fuzzer on the rust implementations, and then using the js code to filter between valid encodings and data that has to be rejected. There's data sets for:
- the general data types: https://github.com/sunrise-choir/multiformats-testdata
- the free-form json data format: https://github.com/sunrise-choir/legacy-value-testdata
While the legacy format has been the main focus of my work, there were a few other things I got to:
Bpmux is a potential alternative for packet-stream, the multiplexing layer of ssb. There's a specification of the logical concepts, and then there's a concrete encoding over reliable, bidirectional byte streams. There's no implementation yet, but the format has quite a few advantages over packet-stream (more efficient, backpressure, heartbeats).
I did a write-up on Merkle trees applied to ssb. There's some pretty good information on the topic in the discussion it triggered, but the overall result was that we can't simply throw Merkle trees at ssb to completely solve the problem of verification for partial feeds.
Aljoscha dropped the full-time work in the sunrise choir for personal reasons, but continues to be excited about the raw potential of ssb. He is however deeply traumatized by the json intricacies of float encoding and string escapes, and hopes his work will protect future developers from suffering the same fate.
Flume is the database which powers the Scuttlebutt stack.
Piet had done some work on writing Flume in Rust before the choir started. We decided that Flume is much bigger module and that by pulling more processing into Rust we'd be able to see more performance benefits in JavaScript.
The flumedb-rs
module isn't very complicated. It defines traits for a FlumeLog
and a FlumeView
and implements FlumeLog
for an OffsetLog
Matt had the idea that instead of trying to port all of Flume to Rust, we could just parse the offset log binary file and insert that information into a SQL database! He called the idea a "Flume follower." It's not a FlumeView
, it's not part of the existing Flume system, but it does follow the Flume log.
Here's the schema from ssb-flume-follower-sql:
Features from the README:
- Easy to use
- models all the common relationships between messages, enabling powerful queries that are generally hard to do
- uses knex for powerful async queries
- Fast + flexible
- built on Rust + Sqlite3
- supports processing the log and inserting into the db in spare cpu time, so your application never chokes
- bulding + queries are FAST (benchmarked!) roughly 10x faster than current js indexing.
- Advanced features
- query the view at any time - for when waiting for the view to be in sync isn't relevant
- supports multiple identities (WIP) (multiple keypairs), with decryption done with Rust
- supports multiple clients running different versions of this view, by giving you control over where this view is saved
- Friendly. We have a code of conduct and are commited to abiding by it.
- Found something missing from the docs? Can't understand the code? Found an un-tested edge case? Spotted a poorly named variable? Raise an issue! We'd love to help.
One tension that continues to plague Scuttlebutt app developers is how to run multiple apps at the same time, each sharing the common database which is ssb-server
, without any conflicts between them.
Prompted by @mix in Proliferate the Apps, @matt shared his ideas on Why it is so hard to build SSB apps ☔:
We can imagine SSB as a protocol like http, where many different kinds of apps can be built on top and interact together.
But for this to be a reality, all of these apps must agree on a set of constraints and implement a compatible interface that allows all of them to run simultaneously and access the important data without clobbering each other.
I think that the current requirement for everyone to agree on what goes in .ssb and how this is accessed is dramatically hindering development of new ideas on this protocol. I think that the level of coordination required is the number one problem with SSB right now!
The great thing about ssb is that all of the peers that make up the network do no need to agree about many things. There is no singleton state. Everyone sees the network slightly differently. However, this does not apply to apps running on the same machine.
In order to move past this current bottleneck, I think we're gonna have to make some concessions to the requirements (shared identity, shared storage, shared data) listed above. We need to decouple the clients from each other.
In this case, you would still have a single sbot, however all that it would contain would be replication and the main log.
It wouldn't have queries or views. These would be done in the individual apps. Apps would probably read the log directly for indexing, and for writes, via some kind pipe directly into the process.
With this you get the shared identity and some amount of shared data/storage. There would still be some duplication of data, since each app would have its own indexes. However, the best part about this option is that the only thing that apps need to coordinate about is "writes". Reads would all be done straight off the clients local index (and taking advantage of the append only nature of the shared log, could just use offsets to read directly).
In support of this possible direction, the Choir met to discuss about a proof-of-concept, using the work currently being done by @piet on follower flumes: %meil3xW/YT4WJ26mAsmLVHKxxh2rbtbM7pyvpA1N4v8=.sha256.
Moving forward, @matt has now joined the Choir and this will become his focus: to unshackle the Scuttlestack from The Great Sbot, may a thousand apps bloom in consensus-free harmony! 🌻
The Sunrise Choir has become a well-functioning team, with increasing momentum, and we're just getting started on our journey. 🌄
Our current roadmap has settled into 3 parts, which we will each be presenting before and during ScuttleCamp:
- Mikey: How to evolve the Scuttlebutt protocol by evolving our development practices. Or, why a maintainable and accessible Scuttlebutt stack might mean moving from loose ad-hoc swarms to tight coordinated crews with agreed best practices, clear roadmaps, and regular rhythms.
- Piet: How to boost Patchwork to read directly from the Flume log and build a SQL-based Flume view to enable faster and friendlier queries, vastly improve initial sync times, remove a bunch of JavaScript dependencies, and make it easier to run multiple clients at the same time.
- Matt: How to support a future Scuttlebutt app ecosystem without conflicts, by reducing the shared sbot to only write to the log, and giving each app their own sbot to read (and index) the log.
Thanks for reading all the way here. If you're interested in our work, we need your support! We've recently created a Sunrise Choir OpenCollective, we'd love if you were able to contribute any amount of money to help us become financially sustainable.