Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add suport for layer mounting + garbage collector refactor #92

Merged
merged 13 commits into from
Feb 19, 2025

Conversation

Dramelac
Copy link
Contributor

@Dramelac Dramelac commented Jan 29, 2025

Hello,

Here is a PR to add support for layer mounting operations (issue #85).

Overview

I have chosen to create pseudo-symlink system with R2 file and customMetadata to link a blob file to an existing blob on any other repository inside the registry.
An R2 symlink will include a customMetadata: X-Serverless-Registry-Symlink as key and the name of the targeted repository in its value (to avoid parsing the repository name from the full path of the targeted blob). The content of this symlink file is the full path to the targeted blob as text.

Changes

Registry

  • New mountExistingLayer method for creating a new blob as an R2 symlink targeting another existing blob in another repository (if it exists).
    • This method avoids recursive symlinks
    • A layer mount operation only needs to return the new layer's digest and location URL
  • Downloading a blob (or layer) now automatically follows R2 symlink to the referenced blob.

Router

When uploading a new blob, if the client successfully mounts a layer, the registry should reply with code 201. If there is any error during the layer mounting operation, it will automatically fallback to the classic layer uploading process with response code 202.

Garbage collector

The garbage collector collectInner function has been refactored to address many changes, including some existing bugs and limitations:

  • Config blobs were ignored and not considered as referenced and then removed by GC causing errors
  • Manifest list was not supported, ignoring architecture-specific untagged manifests and referenced blobs
  • The GC was unable to delete untagged manifests in untagged mode, only blobs
    With the new refactored function, it can now:
  • Manage layer and config blobs for manifest v1 and v2
  • Support manifest-list references in both unreferenced and untagged mode
  • Support manifest deletion in untagged mode
  • Add security checks for unreferenced R2 symlinks

Limitation of the new method / design

  • I didn't include the deleteThreshold / number of blobs to delete at once, I don't know if this limit is still needed or not.
  • Because the R2 symlink directly targets another blob in another directory, we need to list all the blobs in each repository (except the current one) to check that there is no symlink reference to the blob before removal. This check can be quite demanding for large registry.
    • If it could be possible to update customMetadata of an existing R2 object with the Worker R2 API, we could update a referenced counter on each symlink target blob and skip this heavy step but as far as I know it doesn't seem possible at the moment. However, if it's planned / soon-to-be release feature, it should be added in this first implementation to avoid adding an upgrade task later on.

The garbage collector list function now includes customMetadata to detect R2 symlinks.

This change requires updating the tsconfig.base.json file by adding a minimum compatibility date. I've added the most recent date 2023-07-01 but it can be replaced by any older one that has the include parameter.

Tests

To check the new features, I've added several tests.

I'm also waiting for PR #89 to be merged as it'll introduce some test changes that will cause a conflict with this PR.
Updated !

Updating existing tests

  • The generateManifest function allow to generate a simple v1 or v2 manifest and upload referenced blobs to a repository.
    This function had several problems:
    • blob data was hardcoded, so several images had the same blob digest, wich introduced unrealistic results
    • the layer blob and config blob were the same, missing the bug in the old garbage collector that deleted it while it was still in use.
      The new version of this function generates random layer data and a unique config blob for each manifest, allowing different bugs to be detected. This change has introduced the need to update several tests to handle the new number of unique blobs.
  • The v2 manifests/PUT then list tags with GET /v2/:name/tags/list test had to be reduced down from 50 images to 40 as it introduced some vitest instability, sometimes it works and sometime I get a Network error without any change or reason...
  • Added blob count check on some tests

Generic methods

  • Added a new generateManifestList method to generate a manifest-list v2 from 2 manifests for AMD and ARM architectures.
  • Added a new getLayersFromManifest method to extract every blob referenced by a given v1 or v2 manifest.
  • Added a new mountLayersFromManifest method to mount every blob of a manifest from one repository to another.
  • Added a new createManifestList method to generate two manifest and a manifest-list and upload all three to a repository with or without a tag.
  • Added a new runGarbageCollector method to run the garbage collector on a given repository in a given mode (unreferenced, untagged or both).

New tests

v2 manifest

Added a new test with recursive layers mounting of a simple image manifest

v2 manifest-list

Added a new tests to upload manifest-list images with or without layer mounting

garbage collector

Added a few tests to check that the garbage collector deletes blobs and manifests as required, no more and no less.

  • Test GC with single arch images
    • no action required
    • in untagged mode
    • in unreferenced blobs
  • Test GC with mutli-arch / manifest-list images
    • no action required
    • in untagged mode
    • in unreferenced blobs
  • Test GC with mutli-arch / manifest-list image including R2 symlink references
    • no action required
    • in unreferenced / untagged mode on repo A
    • in unreferenced / untagged mode on repo B

@Dramelac
Copy link
Contributor Author

Dramelac commented Jan 30, 2025

Hello,

I noticed that the RegistryHTTPClient class also implements the Registry interface in which I defined mountExistingLayer and added it to R2Registry. I don't quite understand the role / purpose of RegistryHTTPClient and if / how I should implement mountExistingLayer too in it.

For now I've added a method with throw new Error("unimplemented");.

@gabivlj
Copy link
Collaborator

gabivlj commented Jan 30, 2025

Hello,

I noticed that the RegistryHTTPClient class also implements the Registry interface in which I defined mountExistingLayer and added it to R2Registry. I don't quite understand the role / purpose of RegistryHTTPClient and if / how I should implement mountExistingLayer too in it.

For now I've added a method with throw new Error("unimplemented");.

Unimplemented for http client is OK as its not supposed to support uploads.

@gabivlj
Copy link
Collaborator

gabivlj commented Jan 30, 2025

Amazing work @Dramelac, lgtm. Just one nit comment.

@gabivlj gabivlj assigned Dramelac and gabivlj and unassigned Dramelac Jan 30, 2025
@Dramelac
Copy link
Contributor Author

Thanks @gabivlj !
About this subject, what you do think ?

If it could be possible to update customMetadata of an existing R2 object with the Worker R2 API, we could update a reference counter on each symlink target blob and skip this heavy step but as far as I know it doesn't seem possible at the moment. However, if it's planned / soon-to-be release feature, it should be added in this first implementation to avoid adding an upgrade task later on.

@gabivlj
Copy link
Collaborator

gabivlj commented Jan 30, 2025

Thanks @gabivlj ! About this subject, what you do think ?

If it could be possible to update customMetadata of an existing R2 object with the Worker R2 API, we could update a reference counter on each symlink target blob and skip this heavy step but as far as I know it doesn't seem possible at the moment. However, if it's planned / soon-to-be release feature, it should be added in this first implementation to avoid adding an upgrade task later on.

I am ok with doing this as an alternative for now. I agree its a very expensive operation though. I think garbage collection with R2 only will always be a harder problem to solve, for now lets just make it work with a small n and if we see people encounter many CPU/memory limits exceeded we can think a bit harder about it.

@Dramelac
Copy link
Contributor Author

Updated !
I also added a small optimization by filtering the target of the symlink with the current repository on which GC is running, without needing to fetch the contents of the symbolic link that targets another repository.

@Dramelac Dramelac requested a review from gabivlj February 11, 2025 17:12
@gabivlj
Copy link
Collaborator

gabivlj commented Feb 19, 2025

Sorry for the delay to merge this. LGTM :). Thank you!

@gabivlj gabivlj merged commit 136562c into cloudflare:main Feb 19, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants