docs: latest batch of meeting notes #44

Merged: 2 commits, Nov 8, 2022

2 changes: 1 addition & 1 deletion notes/2022-07-13.md
@@ -1,5 +1,5 @@
## July 13th - SIG Registries meeting
| | |
| | |
| -------- | -------- |
| Attending | Luke Wagner, Peter Huene, Bailey Hayes, Michelle Dhanani, Lann Martin, Brian Hardock, Nicholas Farshidmehr, Roman Volosatovs, Isabella, jlbirch
| Note Taker |
2 changes: 1 addition & 1 deletion notes/2022-07-20.md
@@ -1,6 +1,6 @@
## July 20th - SIG Registries meeting

| | |
| | |
| -------- | -------- |
| Attending | Luke Wagner, Peter Huene, Bailey Hayes, Michelle Dhanani, Lann Martin, Roman Volosatovs, Radu Matei, Kyle Brown, Nathaniel McCallum, Ralph Squillace, Brian Hardock, Johnnie Birch
| Note Taker | Radu Matei
2 changes: 1 addition & 1 deletion notes/2022-07-27.md
@@ -1,6 +1,6 @@
## July 27th - SIG Registries meeting

| | |
| | |
| -------- | -------- |
| Attending | Luke Wagner, Peter Huene, Lann Martin, Roman Volosatovs, Radu Matei, Kyle Brown, Brian Hardock, Johnnie Birch, Taylor Thomas, Nicholas Farshidmehr
| Note Taker | Radu Matei
2 changes: 1 addition & 1 deletion notes/2022-08-03.md
@@ -2,7 +2,7 @@

## August 3rd - SIG Registries meeting

| | |
| | |
| -------- | -------- |
| Attending | Luke Wagner, Peter Huene, Lann Martin, Roman Volosatovs, Radu Matei, Kyle Brown, Brian Hardock, Johnnie Birch, Michelle Dhanani, Taylor Thomas
| Note Taker | Bailey Hayes
2 changes: 1 addition & 1 deletion notes/2022-08-10.md
@@ -1,6 +1,6 @@
## August 10th - SIG Registries meeting

| | |
| | |
| -------- | -------- |
| Attending | Peter Huene, Lann Martin, Radu Matei, Kyle Brown, Brian Hardock, Johnnie Birch, Taylor Thomas, Ralph Squillace, Bailey Hayes, Nathaniel McCallum, George Kulakowski
| Note Taker | Kyle Brown and Bailey Hayes
48 changes: 48 additions & 0 deletions notes/2022-08-17.md
@@ -0,0 +1,48 @@
## August 17th - SIG Registries meeting

| | |
| -------- | -------- |
| Attending | Kyle Brown, Bailey Hayes, Lann Martin, Peter Huene, George Kulakowski, Nicholas Farshidmehr, Nathaniel McCallum, Luke Wagner, Johnnie Birch
| Note Taker | Lann Martin

### Agenda

- [Kyle] WIP Benchmarking
- [Kyle] Addressing Scaling
- Designing for npm scale
- [Kyle] Higher-level than package log discussions
- Registry-level concerns
- How to manage scale at this level
- Work ongoing
- [Lann] Any opinions on prior topics?
- Namespacing packages by registry / mirroring
- Bailey: Discussion about mirroring a package log from e.g. SingleStore into Bytecode Alliance registry
- Need to define "fork" vs "mirror" vs other kinds of "replications"
- > (Lann is bad at taking notes, sorry)
- Kyle: Registries can choose how to handle package dependencies: mirror, link out, reject, etc.
- Luke: Package dependencies are like module dependencies
- Luke: Feels like Clojure: build a new immutable state, then bump one top-level mutable variable. Mirroring vs. other kinds of replication feels like the difference between copying immutable state and pointing at mutable variables.
- Bailey: From the beginning of asm.js, a white paper was circulated with details that got people excited. We need a little more, but it would be great to publish something about log-based registries.
- Nathaniel: Same idea; once we have clear ideas we should publish. Suspect that this system is sufficiently complex to be patentable. Should we patent to protect against other people patenting?
- Lann: Publishing should protect against others patenting.
- Luke: [...] When might we be ready to publish?
- Kyle: a few weeks?
- Bailey: We should start outlining what we might publish.
- Luke: On the Bytecode Alliance blog?
- Luke: Sounds like these could eventually be published with outside standards bodies (W3C)
- > (more failure to take notes)
- Bailey: We should create discussion topics re: patents / standardization / technical publications and more markety e.g. blog posts / conferences.
- Luke: In the BCA we have researchers that would probably like to be involved in verification / publication
- [Bailey] Package Format
- https://github.com/bytecodealliance/SIG-Registries/issues/30
- Bailey: We've discussed bundling everything in a .wasm; do we have tools that produce those?
- Peter: We have some component model features enabled in wasmtime.
- <discussion about where to put metadata (description, author, etc)>
- Nathaniel: Maybe that metadata should live in the registry?
- Luke: Frank Denis et al. were working on wasm-signatures, which might be helpful.
- Bailey: What about dependencies? Are they in the component?
- Kyle: The registry could also store that information
- Nathaniel: A registry may have package information without having access to the content.
- Luke: The component needs to have information about imports. That may be copied into the registry.
- Luke: Import information would go in the "import string(s)", which would be structured to contain different information.
- Lann: The registry can index content outside of the log, it just won't be cryptographically verified there.
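
Luke's Clojure analogy and Kyle's "mirror vs. link out" options can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory registry whose state is an immutable snapshot (package name to released versions) and whose only mutable piece is the reference to the current snapshot. `RegistryState`, `Registry`, `mirror`, and `link` are illustrative names, not part of any existing registry API.

```python
import threading
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Mapping, Tuple


@dataclass(frozen=True)
class RegistryState:
    """Immutable snapshot: package name -> released versions (hypothetical shape)."""
    packages: Mapping[str, Tuple[str, ...]]

    def with_release(self, name: str, version: str) -> "RegistryState":
        # Build a *new* snapshot; the current one is never modified.
        updated = dict(self.packages)
        updated[name] = self.packages.get(name, ()) + (version,)
        return RegistryState(MappingProxyType(updated))


class Registry:
    """The only mutable thing is the reference to the current snapshot."""

    def __init__(self) -> None:
        self._head = RegistryState(MappingProxyType({}))
        self._lock = threading.Lock()

    def head(self) -> RegistryState:
        return self._head

    def publish(self, name: str, version: str) -> None:
        with self._lock:
            # Compute the next state off to the side, then bump the single pointer.
            self._head = self._head.with_release(name, version)


def mirror(upstream: Registry) -> RegistryState:
    # Mirroring: keep the snapshot as it is now. Snapshots are immutable,
    # so holding the reference is as good as a copy.
    return upstream.head()


def link(upstream: Registry) -> Callable[[], RegistryState]:
    # Linking out: keep a pointer to upstream's mutable head, re-read on every use.
    return upstream.head


upstream = Registry()
upstream.publish("example:pkg", "1.0.0")
mirrored = mirror(upstream)
linked = link(upstream)
upstream.publish("example:pkg", "1.1.0")
print(mirrored.packages["example:pkg"])   # ('1.0.0',)
print(linked().packages["example:pkg"])   # ('1.0.0', '1.1.0')
```

The mirror keeps seeing the state it captured, while the link follows upstream as it changes. That is the "copying immutable state vs. pointing at mutable variables" distinction.
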
62 changes: 62 additions & 0 deletions notes/2022-08-24.md
@@ -0,0 +1,62 @@
## August 24th - SIG Registries meeting

| | |
| -------- | -------- |
| Attending | Kyle Brown, Nathaniel McCallum, Lann Martin, Peter Huene, George Kulakowski, Nicholas Farshidmehr, Luke Wagner, Mossaka
| Note Taker | Bailey Hayes

### Agenda

- [Kyle & Lann & Nathaniel] - Update on high-level registry design
- Discussing log and registry structure
- ppt presentation
- Certificate Transparency project: detects when certificate authorities are lying. It shares problems similar to the ones we are trying to address.
- Verifiable Log (Certificate Transparency). Allows parts of the log to be verified over time, which is valuable given the size of the log we're expecting.
- Verifiable Map (Certificate Revocation). Essentially a tree map, similar to the verifiable log. Used in certificate revocation, where we don't care about the timeline, only whether a particular certificate has been revoked.
- The log is a collection of diffs; the map is a state.
- Two related data structures: you can reconstruct the map from the log.
- Make small queries into the map without needing to replay the log.
- Extending this idea by treating each node in this map as a little log; these little logs are smaller and consumable in their entirety.
- Hash-chained log and this high-level verifiable data structure.
- Registry log => Package logs
- The top-level boxes are a single hash that commits to the state of everything underneath it, i.e. the entire package registry state. This is what lets us scale.
- In the map, quickly locate head of a sublog then quickly walk the tree.
- 1. Permissions 2. Releases 3. Auditing 4. Annotations
- Operator Log: maintaining structures about properties of the log. Key rotation, hash rotation, auditing.
- Operator log: small
- Registry log: ~256GB, 8 million packages. Too big for a client to handle, but fine for the registry to manage. Therefore clients verify consistency and package log inclusion.
- Package logs: small. 8 million of them, but clients only download the packages they need and fully validate those.
- The Linux kernel has had 755 releases, so a very large project may have ~1000 releases.
- Package contents are obviously going to be larger, but this covers the log contents.
- Luke: Some algorithms, like dep-solving, we could calculate from a single-hash input. If we stick a dep-solved (locked) component into the package log, the entire thing becomes a deterministic operation. Each package release gives a closed set of dependencies that it needs.
- Nathaniel: Other important properties emerge out of this too. If the client has some knowledge about some state of the log, it wants the server to commit to some state, making it much more difficult to lie.
- Luke: Vaguely in the zero-knowledge proof setting?
- Nathaniel: Inclusion proofs of the log. Or a consistency proof. Prove that registry knows about old state.
- Lann: Queryable commitment
- Nathaniel: Ask registry what is the state? You're never dealing with mutable state across multiple queries.
- [Luke] Question: what's a high-level summary of our relation to sigstore?
- [Kyle] There doesn't have to be any. You can use an instance backed by them, but there isn't any constraint that you have to use sigstore to use the registry.
- [Luke] At a high level, would these ever be complementary? Does this subsume sigstore?
- [Lann] One way they could overlap is that you could post signatures to sigstore. One of their components is a verifiable log (rekor).
- [Luke] Is there anything that would be a candidate for re-use?
- [Lann] We will have too much data for rekor to consume. The primary way to use rekor is to publish signatures for containers.
- [Nathaniel] We're doing key management within the log and therefore we needed to build something to build it in.
- [Lann] The overlap is at the top-level not at the lower-level.
- [Nathaniel] Trust anchor. I don't have to validate the entire log. Sentinels are publicly available validators. Ask the sentinels which heads of the log they have seen, and ask the trust anchor. Between the two, you can validate the registry and package logs with high confidence.
- [Luke] does sigstore have a web of trust?
- [Lann] sigstore could be like the following metaphor: equivalent to publishing your proof in the newspaper
- [Luke] Let's say you're already using sigstore for other things and you're happy with it. Could the same sigstore log cover components and containers?
- [Lann] Yes, e.g. publish every hour, or a validator connects directly to the registry and receives a live stream of events.
- [Luke] Thank you. I don't have good info on its pervasiveness or success.
- [Lann] Largest in the container space. PyPI was looking at it.
- [Nathaniel] The reason you hear about it is that they do an excellent job at marketing, but we haven't seen much adoption outside the container space. If you go back to the package log slides, it would be very reasonable to publish a release, then annotate it with info like the build and git hash. An auditor comes along and reproduces the build, and can then annotate that they were able to reproduce it.
- [Bailey] events
- [Nathaniel] 2.5x more operations per second than Bitcoin.
- [Kyle] We can parallelize some of this; then the registry log is something we can throw a bunch of events at. We're going to need a very performant content delivery network. All of it will be content-addressable.
- [Luke] Immutable URLs that map to mutable. I might know of BA members that are great at serving content from the edge.
- [Mossaka] Raft consensus
- [Lann] We don't quite have a distributed systems problem. The amount of data we have will fit in RAM and won't need to be distributed. Everything else is embarrassingly parallelizable.
- [Kyle] Passing up to registry logs: if it doesn't see an update to the registry for a bit, that's OK. The key is that the package log/contents shouldn't be broadcast until uploaded to the registry log.
- [Nathaniel] Dev workflow vs. deployment workflow. For dev, we want the latest state minus some diffs. For deployment, we want full dep resolution.
- [PyPI PEP 480](https://discuss.python.org/t/pep-480-surviving-a-compromise-of-pypi-end-to-end-signing-of-packages/5666): is this related?
- [Dan package paper](https://cseweb.ucsd.edu/~dstefan/pubs/brown:2017:spam.pdf)
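
The log/map structure from the presentation can be sketched roughly as below. The notes don't fix a hash function, entry format, or proof format, so this Python sketch makes several assumptions:

- It uses SHA-256.
- It uses a hypothetical hash-chaining rule for package logs.
- For the registry-level commitment, it borrows Merkle tree hashing and audit paths from Certificate Transparency (RFC 6962).

The registry state here is a plain Merkle tree over sorted `(package, head)` leaves rather than a true verifiable (sparse Merkle) map. It therefore supports inclusion proofs but not proofs of absence. All names are illustrative.

```python
import hashlib
from typing import Dict, List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class PackageLog:
    """Hypothetical hash-chained package log: each head commits to all earlier entries."""

    def __init__(self) -> None:
        self.entries: List[bytes] = []
        self.head = sha256(b"")  # assumed genesis value

    def append(self, entry: bytes) -> bytes:
        self.entries.append(entry)
        self.head = sha256(self.head + sha256(entry))
        return self.head


def leaf_hash(data: bytes) -> bytes:
    return sha256(b"\x00" + data)  # RFC 6962 leaf prefix


def node_hash(left: bytes, right: bytes) -> bytes:
    return sha256(b"\x01" + left + right)  # RFC 6962 interior-node prefix


def split(n: int) -> int:
    # Largest power of two strictly less than n (n >= 2), as in RFC 6962.
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        return sha256(b"")
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = split(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def audit_path(index: int, leaves: List[bytes]) -> List[bytes]:
    # Sibling hashes from the leaf up to the root (RFC 6962 PATH).
    if len(leaves) <= 1:
        return []
    k = split(len(leaves))
    if index < k:
        return audit_path(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return audit_path(index - k, leaves[k:]) + [merkle_root(leaves[:k])]


def root_from_path(leaf: bytes, index: int, size: int, path: List[bytes]) -> bytes:
    # Recompute the root from one leaf plus its audit path.
    if size == 1:
        return leaf_hash(leaf)
    k = split(size)
    if index < k:
        return node_hash(root_from_path(leaf, index, k, path[:-1]), path[-1])
    return node_hash(path[-1], root_from_path(leaf, index - k, size - k, path[:-1]))


def registry_leaves(heads: Dict[str, bytes]) -> List[bytes]:
    # One leaf per package log head, sorted by name so the root is deterministic.
    return [name.encode() + b"\x00" + head for name, head in sorted(heads.items())]


logs = {name: PackageLog() for name in ["a:x", "b:y", "c:z"]}
logs["a:x"].append(b"release 1.0.0")
logs["b:y"].append(b"grant publish permission")
logs["b:y"].append(b"release 0.2.0")
logs["c:z"].append(b"release 3.1.0")

heads = {name: log.head for name, log in logs.items()}
leaves = registry_leaves(heads)
root = merkle_root(leaves)  # the single hash committing to every package log

# A client that only cares about "b:y" checks its head against `root`.
i = sorted(heads).index("b:y")
proof = audit_path(i, leaves)
print(root_from_path(leaves[i], i, len(leaves), proof) == root)  # True
```

The client verifies an audit path of O(log n) hashes against one root, then replays only the small `b:y` package log. It never downloads the whole registry log, which matches the "clients verify consistency and package log inclusion" split described in the notes.
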
55 changes: 55 additions & 0 deletions notes/2022-08-31.md
@@ -0,0 +1,55 @@
## August 31st - SIG Registries meeting

| | |
| -------- | -------- |
| Attending | Kyle Brown, Nathaniel McCallum, Luke Wagner, Peter Huene, Lann Martin, Roman Volosatovs, Brian Hardock, Nicholas Farshidmehr, jlbirch
| Note Taker | Kyle Brown

### Agenda

- [Luke] - In registry mirroring/curating scenarios, do we *always* want to maintain the exact fully-qualified dependencies as the original, or are there any use cases for wiring them up differently in the downstream registry?
- Luke: When I have a package published in one registry that depends on a package in another registry, it seems like the default would be that that reference refers to the original registry/name. With mirroring, if I mirrored some other registry, I would store a copy of its contents.
- Luke: Are there ever cases where a package wants to refer to something using a "relative name" that isn't scoped to a fixed registry?
- Lann: That kind of functionality is out of scope for the registry; local packaging tooling could swap out a dependency. But having the registry do this would undermine our guarantees; we're putting effort into preventing the registry from doing exactly that.
- Kyle: ...
- Luke: This all makes sense
- Nathaniel: Another way to express this is that we've been talking about "clients" and we need to talk about "developers" and "deployers". We have a package registry in Rust that is highly dynamic with few guarantees. So we pull down a bunch of stuff, hope it's good, verify whatever, and then lock it down in a lock file with dependency hashes. Hopefully the deployer only deploys from the lock file. With a verifiable registry, the developer can pull down a bunch of stuff, and the lock file just identifies a registry state; everything else is implied by that root. You don't need a separate lock because everything is reproducible.
- Luke: I was thinking this is the usual case; are there exceptions, or is this a guarantee? For a given dependency, am I including the registry identifier (e.g. domain name) in that binding?
- Nathaniel: The question of whether cross-registry dependency is possible has not been settled
- Lann: This use case would be solved by forking and having new maintainers sign this new package, that you then swap in.
- Lann: A big-ish team that wants to track upstream dependencies but needs to carry patches. You could fork that package while carrying patches.
- Luke: `npm link` is popular. It allows you to resolve locally with some substitution.
- Lann: I think that actually even symlinks in the substitute.
- Luke: I can imagine solving this locally, I just wanted to make sure that was the plan
- Lann: There is an open question of whether dependency specification is name/version or hash-based with another lockfile behind that. In some sense that's out of scope, but very important for the local tooling story.
- Luke: I could use a content identifier, and if I verify/trust the log, I can determine the content hash. I could obviously put it inline to be more paranoid.
- Lann: That would open you up to potential transient attacks.
- Nathaniel: Is this also true for "deployers"?
- Luke: It's a b
- Nathaniel: The developer is an important user, but if a registry was vulnerable when the developer updated, they would get these false resolutions if the registry were lying. That could be baked into a deployment, but with the requirement that the developer accepted it.
- Lann: You should be limited to the damage a deployer could do locally. You could write a system where your component has arbitrary disk access. That's a major issue today with supply-chain vulns: developer machines get compromised by these malicious packages, even setting deployment aside. It's quite bad for even a developer machine to be compromised.
- [Follow Up] Are cross registry dependencies allowed?
- Luke: Is there a meaningful difference between pointing to a package in my registry vs. another?
- Lann: That depends on the intersection between mirroring and remote references. The main question at the high-level design is the implications of trust when using an external registry. You implicitly (to some limited extent) trust your primary registry. There is some degree of trust (that you verify) that you have in that, and when you have remote dependencies (if remote means you're leaving the initial trust domain), you're expanding your trust in the universe, potentially surprisingly.
- Nathaniel: There's also the old wine and sewage analogy. What happens in this situation is that you get the trustability of the least trustworthy registry in this web, which is a way to rapidly decrease trust.
- Kyle: Operator policy allows you to limit what links are possible.
- Nathaniel: Registries don't build on each other; they centralize on one main one to keep quality high.
- Lann: At the protocol level, I don't think there's anything wrong with allowing cross-registry dependencies. The policy level could handle this as Kyle is saying. The BA registry may want to do this for example.
- Luke: A theoretical Fastly registry may allow only the BA registry to be remote-linked.
- Nathaniel: The developer is not really trusting the Fastly and BA registries; they are trusting the Fastly registry and, transitively through its trust of the BA registry, trusting the BA registry. Fastly needs.
- Kyle: ...
- Nathaniel: With transitive trust, we want to encourage behavior with good cyclic properties. We want to encourage developers to opt-in to trusting one registry and have that registry manage trust in other registries for them.
- Luke: Separation of development and deployment registries. That deployment registry needs to be the thing that's always up. When you deploy, the transitive dependencies need to be fully mirrored. BA should only be queried at development time. We have hashes that lock things, but the mirroring only happens when you do a deployment.
- Nathaniel: If there is a lock file that locks down the contents, then the only thing that matters about what registry you talk to is Confidentiality and Availability.
- Luke: I want that availability and some of that confidentiality. Maybe "Fastly" is fetching a lot, but they don't know that Luke is doing so. I want the.
- Luke: A deployment registry will be a very different kind of registry. I don't know if these are different implementations or one with different compile flags. It's
- Taylor: You may want to proxy things straight to the Bytecode Alliance, but have things where
- Nathaniel: I think we can support both of them and should. One way we can talk about this is the "correlation" between the different registry logs. The question is "where do those logs get correlated?". The answer has to be before you prepare the content for deployment and locks everything down and resolves everything. The pass-through defers the locking down, the mirroring does it eagerly.
- Luke: Do you mean associating roots for both registries when you say lock?
- Nathaniel: I need to associate every registry in my dependency graph together. I can just assert the heads of all the logs, which makes resolution deterministic. If one log wants to sync with another, then I can derive a head of one log from another. I think we should enable both and it's worth stating that developer->deployer you need to lock them down.
- Luke: I'll note in my EBNF diagram, my dep-solve involved taking in parameters that produced the content hashes. You can verify that the dep-solve produced these.
- Lann: Not to derail too much, but you could provide a content hash for the component that performed the dep-solve.
- Luke: We can spec out the dep-solve, but that characteristic is super interesting.
- Lann: Signing the component lets you verify your implementation in addition to your spec
- Nathaniel: nothing is perfectly reproducible
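
Nathaniel's point, that a lock file can just assert the heads of every registry log in the dependency graph, can be sketched in Python under several assumptions:

- Each hypothetical `VersionedRegistry` keeps every historical state, addressed by a head hash. Here the head is simply the SHA-256 of the serialized state, not a real log head.
- The lock file is a plain `registry id -> head` mapping.
- "Pick the highest version" stands in for a real dependency solver.

The names `VersionedRegistry`, `resolve`, and the `"ba"` registry id are illustrative only.

```python
import hashlib
import json
from typing import Dict, List, Tuple

# Hypothetical state shape: package -> {version -> content hash}
State = Dict[str, Dict[str, str]]


def state_head(state: State) -> str:
    # Stand-in for a real log head: a hash over the canonically serialized state.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


class VersionedRegistry:
    """Keeps every historical state, addressable by its head hash."""

    def __init__(self) -> None:
        self.current: State = {}
        self.states: Dict[str, State] = {}
        self.head = self._commit()

    def _commit(self) -> str:
        # Deep-copy the state so earlier snapshots are unaffected by later publishes.
        snapshot = {pkg: dict(versions) for pkg, versions in self.current.items()}
        head = state_head(snapshot)
        self.states[head] = snapshot
        return head

    def publish(self, package: str, version: str, content: bytes) -> None:
        self.current.setdefault(package, {})[version] = hashlib.sha256(content).hexdigest()
        self.head = self._commit()


def version_key(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def resolve(
    deps: List[Tuple[str, str]],
    registries: Dict[str, VersionedRegistry],
    lock: Dict[str, str],
) -> Dict[str, Tuple[str, str]]:
    """Resolve (registry id, package) pairs against the pinned heads only."""
    resolved = {}
    for registry_id, package in deps:
        state = registries[registry_id].states[lock[registry_id]]
        versions = state[package]
        best = max(versions, key=version_key)  # assumed policy: highest version wins
        resolved[f"{registry_id}/{package}"] = (best, versions[best])
    return resolved


# The developer pins the head they verified; later publishes don't change resolution.
ba = VersionedRegistry()
ba.publish("example:http", "1.0.0", b"component bytes v1")
lock = {"ba": ba.head}
before = resolve([("ba", "example:http")], {"ba": ba}, lock)

ba.publish("example:http", "1.1.0", b"component bytes v2")
after = resolve([("ba", "example:http")], {"ba": ba}, lock)
latest = resolve([("ba", "example:http")], {"ba": ba}, {"ba": ba.head})
print(before["ba/example:http"][0], after["ba/example:http"][0], latest["ba/example:http"][0])
# 1.0.0 1.0.0 1.1.0
```

Resolution only reads states addressed by the pinned heads, so new upstream releases have no effect until the developer deliberately moves the lock forward. With several registries, the same `lock` mapping would simply hold one head per registry id.
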

11 changes: 11 additions & 0 deletions notes/2022-09-14.md
@@ -0,0 +1,11 @@
## September 14th - SIG Registries meeting

| | |
| -------- | -------- |
| Attending | Kyle Brown, Nathaniel McCallum, Luke Wagner, Peter Huene, Lann Martin, Roman Volosatovs, Brian Hardock, Nicholas Farshidmehr, Mossaka, David Justice, Bailey Hayes
| Note Taker |

### Agenda

- [Nathaniel / Kyle] Data structures progress
- Kyle: Initial append-only log is merged, upcoming PR will switch to an index-based API. Next up is working on external API design