Description
We would like to set up a single global cache for all Roc assets, including compiler versions, packages, and build artifacts. We should first find an appropriate cache folder, one of the following (taken from our current approach):
- The XDG_CACHE_HOME environment variable, if it's set.
- Otherwise, ~/.cache on UNIX and %APPDATA% on Windows.
And then the Roc cache directory will be a folder named "roc" within that folder on Unix systems, and "Roc" on Windows systems. So ~/.cache/roc will be typical on UNIX, and %APPDATA%\Roc will be typical on Windows.
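The lookup order above can be sketched as a small pure function. This is only an illustration: the name roc_cache_dir and its parameters are hypothetical, the environment values are passed in rather than read so the logic is easy to test, and treating an empty XDG_CACHE_HOME as unset is an assumption borrowed from the XDG Base Directory convention rather than something this issue specifies.

```rust
use std::path::PathBuf;

/// Hypothetical helper: resolves the Roc cache directory from environment
/// values that the caller has already read (e.g. via `std::env::var`).
pub fn roc_cache_dir(
    xdg_cache_home: Option<&str>,
    home: Option<&str>,
    appdata: Option<&str>,
    is_windows: bool,
) -> Option<PathBuf> {
    // 1. XDG_CACHE_HOME wins whenever it is set.
    //    Assumption: an empty value is treated the same as "not set".
    let base = match xdg_cache_home.filter(|s| !s.is_empty()) {
        Some(dir) => PathBuf::from(dir),
        // 2. Otherwise use %APPDATA% on Windows...
        None if is_windows => PathBuf::from(appdata?),
        // ...and ~/.cache everywhere else.
        None => PathBuf::from(home?).join(".cache"),
    };
    // 3. The Roc folder is named "Roc" on Windows and "roc" elsewhere.
    Some(base.join(if is_windows { "Roc" } else { "roc" }))
}
```

A caller would pass std::env::var("XDG_CACHE_HOME").ok().as_deref() and friends, plus cfg!(windows) for the last argument. The function returns None when neither the override nor the platform default is available.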
It will have three subdirectories:
compiler/
Note: this is a tentative plan that will be cleaned up later.
The compiler/ directory will be the simplest: it will contain a flat collection of compiler binaries named after their respective versions, e.g. 0.1.0, or a commit hash for nightly releases. There will be one more executable named simply roc, which is a symlink to the currently selected Roc compiler version. This folder will be populated by a future issue to manage compiler versions, which will be properly designed later, but has been at least partially discussed in this thread on Zulip, which surrounded this Google doc.
build/
For each Roc project in the user's filesystem, we will hash the main file for the project (main.roc for packages and platforms and the <app name>.roc for apps) and use that as the root folder for that project in the global cache. The next file level will be the Roc version (e.g. 0.1.0). And then the compile artifacts for the project will be stored in a flat collection within that version-named folder.
For each *.roc source file in the user's project, when caching, we should take the base64-encoded BLAKE3 hash of the source file's contents (the same hashing scheme we use for packaging) and store all cacheable artifacts for that source file (e.g. canonicalization info, type info, etc.) in build/<project hash>/<roc version>/<file content hash>. To manage the cache size, we plan the following strategy for when to write to/read from the cache:
- for every file in the project, find the hash of its contents
- if there is a file with that hash's name in the project's build cache folder, use it
- if it is not there, compile the module in isolation and cache its artifacts in the project's build cache folder
- all other files in the project's build cache folder should be deleted
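The strategy in the list above amounts to three set operations over content hashes. A minimal sketch follows, assuming the hashes have already been computed and the cache folder has already been listed. CachePlan and plan_cache_sync are hypothetical names, not existing Roc APIs.

```rust
use std::collections::BTreeSet;

/// Hypothetical summary of what one compilation should do with a
/// project's build cache folder.
#[derive(Debug, PartialEq)]
pub struct CachePlan {
    /// Hashes with a cache file already present: load these.
    pub reuse: BTreeSet<String>,
    /// Hashes with no cache file: compile the module in isolation, then cache it.
    pub compile: BTreeSet<String>,
    /// Cache files that match no current source file: delete these.
    pub delete: BTreeSet<String>,
}

/// `source_hashes` holds the content hash of every file in the project;
/// `cached` holds the file names found in the project's build cache folder.
pub fn plan_cache_sync(
    source_hashes: &BTreeSet<String>,
    cached: &BTreeSet<String>,
) -> CachePlan {
    CachePlan {
        reuse: source_hashes.intersection(cached).cloned().collect(),
        compile: source_hashes.difference(cached).cloned().collect(),
        delete: cached.difference(source_hashes).cloned().collect(),
    }
}
```

Computing the plan up front keeps the filesystem work (reading, compiling, deleting) separate from the decision logic, which makes the decision logic straightforward to unit test.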
When writing to the cache, we should first generate a random file in the system's temp directory, save the build artifacts to that file, and then atomically rename the file to the intended cache file. This will avoid two compiler instances writing to the same file and corrupting the contents.
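A minimal sketch of this write-then-rename step, using only the Rust standard library. The function name is hypothetical, and the "random" file name here is built from the process ID and a timestamp purely for illustration; a real implementation would likely use a proper random name. One caveat worth keeping in mind: a rename is generally only atomic when the temporary file and the destination live on the same filesystem, so the sketch takes the temp directory as a parameter instead of hard-coding the system one.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: writes `bytes` to a temporary file in `temp_dir`,
/// then renames it over `dest`.
pub fn write_cache_file_atomically(
    temp_dir: &Path,
    dest: &Path,
    bytes: &[u8],
) -> io::Result<()> {
    // Assumption: pid + nanosecond timestamp is unique enough for a sketch.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let temp_path = temp_dir.join(format!("roc-cache-{}-{}.tmp", process::id(), nanos));

    // `create_new` fails if the file already exists, so two writers can
    // never share one temporary file.
    let mut file = fs::OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&temp_path)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    drop(file);

    // Make sure the destination folder exists before renaming into it.
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }

    // Readers see either the old file or the complete new one.
    // On failure, clean up the temporary file and return the error.
    fs::rename(&temp_path, dest).map_err(|e| {
        let _ = fs::remove_file(&temp_path);
        e
    })
}
```

If the rename fails because the two paths are on different filesystems, one option is to create the temporary file inside the cache directory itself.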
packages/
All packages have their cache files in the packages/ subdirectory, and they follow the scheme ~/.cache/roc/packages/<repository website>/<archive hash>/... For example, the v0.5.1 release of Weaver would go in ~/.cache/roc/packages/github.com/nqyqbOkpECWgDUMbY-rG9ug883TVbOimHZFHek-bQeI/... This is the format we are using already.
Each package has two subdirectories, one for the package's source, and the other for its build artifacts.
src/
This directory will contain the uncompressed files in their provided directory structure from the archive downloaded from the internet.
build/
This directory works almost the same way as the primary build/ cache directory for user code. The difference is that we don't first store everything in a folder named by a hash of the project's main.roc file, since the source of the package is immutable. We still read all files and hash their contents, load whatever cached artifacts we have, and calculate the rest. However, there's no need to look for files to delete, given the immutability of the package. In the future, we can attempt to store some info per-package per-Roc version to avoid needing to read and hash all files per package every time.
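To make the package layout concrete, here is a small sketch that builds the paths described in this section. The type and function names are hypothetical, and the website, archive hash, Roc version, and file hash are all assumed to be supplied by the caller.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical pair of per-package folders inside the global cache.
pub struct PackageCacheDirs {
    /// Uncompressed files from the downloaded archive.
    pub src: PathBuf,
    /// Build artifacts, grouped by Roc version.
    pub build: PathBuf,
}

/// Builds <cache root>/packages/<repository website>/<archive hash>/{src,build}.
pub fn package_cache_dirs(
    cache_root: &Path,
    website: &str,
    archive_hash: &str,
) -> PackageCacheDirs {
    let root = cache_root.join("packages").join(website).join(archive_hash);
    PackageCacheDirs {
        src: root.join("src"),
        build: root.join("build"),
    }
}

/// Builds build/<roc version>/<file content hash> for one source file.
/// Unlike user projects, there is no leading project-hash folder.
pub fn package_artifact_path(
    dirs: &PackageCacheDirs,
    roc_version: &str,
    file_hash: &str,
) -> PathBuf {
    dirs.build.join(roc_version).join(file_hash)
}
```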
Directory layout overview
For example, the directory for the mentioned Weaver version would look like this inside of the ~/.cache/roc/ cache:
~/.cache/roc/
    compiler/
        roc
        0.1.0
        <commit hash>
    build/
        <project main.roc hash>/
            <roc version>/
                <build artifacts by file hash>
    packages/
        github.com/
            nqyqbOkpECWgDUMbY-rG9ug883TVbOimHZFHek-bQeI/
                src/
                    <decompressed source files>
                build/
                    <roc version>/
                        <build artifacts by file hash>
Some notes:
- All hashing (besides Git commit hashes) should follow the package URL hashing scheme, which is a base64-encoded BLAKE3 hash of the given data.
- This design is not necessarily written in stone, so whoever implements this should expect the possibility of design discussion before a PR is merged. The implementer can lower the odds of needing to rewrite their work by double-checking that their approach makes sense in the original Zulip thread.
- The current packaging cache should be combined/reworked into a singular cache as a separate crate, maybe named roc_cache in roc-lang/roc/crates/cache/.