Make out/ folder contents (more) reproducible and filesystem layout agnostic (1500USD bounty) #3660
Trying this issue, it looks interesting...
@rahat2134 go for it! Feel free to ask here if you have any questions
…rocesses (#365) Necessary for com-lihaoyi/mill#3660 in Mill. Essentially we need a way to (a) serialize absolute paths as relative paths and (b) set up the necessary symlinks in any subprocess folder such that the relative paths point to the correct absolute location
Does reproducibility have to be a property? What if it is defined as a transformation? A filesystem agnostic image can be created by replacing (string) values with "known env vars" in task output file copies.
I would be interested in working on this if the issue is still open? Edit: Upon further reading, I am requesting a lock on the bounty for 1 week. If I don't have anything mostly done by then, you can free it. Seeing as how it's from October, I don't think that is an unreasonable amount of time. Thank you. Update 2: I have studied the project for the 5 days since, and have a working prototype. In 2 days I should hopefully have a somewhat working prototype. The time after will be left for writing tests :) Update 3: https://github.com/albassort/mill I predict it will mostly be done by Wednesday, maybe the tests will be done on Wednesday
Our large Scala projects work around the reproducibility issues with this hacky tool: https://github.com/Avimitin/mill-ivy-fetcher and finally made mill-based solutions reproducible. It has been fully integrated into chipsalliance/t1. I think we may be able to help with the out dir reproducibility, and want to hear more suggestions from haoyi.
PR Submitted
From the maintainer Li Haoyi: I'm putting a 1500USD bounty on this issue, payable by bank transfer on a merged PR implementing this.
The goal of this ticket is to make the out/ folder contents more reproducible, such that it contains the same bytes and hashes regardless of the user's filesystem layout outside of that folder. This would allow re-using the out/ folder as a build cache between different machines that may have the checkout in different places (e.g. /Users/alice/my-repository vs /Users/charlie/my-repository), both coarse grained (e.g. by sending over a zip file) and fine grained (via the bazel remote cache protocol).
The main thing that needs to happen is that every os.Path and mill.api.PathRef that is serialized within a "known" directory needs to be normalized to a path relative to an abstract reference to that known directory, e.g.:
- /Users/alice/my-repository/out/foo/bar.dest/qux should be serialized as $WORKSPACE/out/foo/bar.dest/qux
- /Users/lihaoyi/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/scala-lang/scala-library/2.13.14/scala-library-2.13.14.jar should be serialized as $COURSIER_CACHE/v1/https/repo1.maven.org/maven2/org/scala-lang/scala-library/2.13.14/scala-library-2.13.14.jar
- /Users/alice/thing-outside-repository should be serialized as $HOME/thing-outside-repository
AFAIK the necessary known roots should all be available globally (e.g. mill.api.workspace.WorkspaceRoot.workspaceRoot, os.home, sys.env("COURSIER_CACHE")). It should be easy enough to add to the serialization logic:
- mill.api.PathRef serialization: mill/main/api/src/mill/api/PathRef.scala, lines 175 to 197 in e0a2c93
- os.Path serialization: mill/main/api/src/mill/api/JsonFormatters.scala, lines 27 to 31 in e0a2c93
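To make the proposed normalization concrete, here is a minimal sketch using only java.nio.file, not Mill's or os-lib's actual serialization code. PathNormalizer, serialize, deserialize and the roots parameter are hypothetical names introduced for illustration. The sketch assumes the caller passes roots ordered most-specific first, since both the workspace and the Coursier cache often live inside the home directory.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper (not part of Mill): maps absolute paths under known
// roots to "$NAME/relative/path" strings, and back again.
object PathNormalizer {

  // `roots` must be ordered most-specific first, e.g. WORKSPACE and
  // COURSIER_CACHE before HOME, so the first matching root is the tightest.
  def serialize(p: Path, roots: Seq[(String, Path)]): String =
    roots
      .collectFirst {
        case (name, root) if p.startsWith(root) =>
          // Use forward slashes so the string is the same on every platform.
          val rel = root.relativize(p).toString.replace('\\', '/')
          if (rel.isEmpty) "$" + name else "$" + name + "/" + rel
      }
      .getOrElse(p.toString) // outside every known root: left absolute

  // Inverse of `serialize`, resolving "$NAME/..." against the local roots.
  def deserialize(s: String, roots: Seq[(String, Path)]): Path =
    roots
      .collectFirst {
        case (name, root) if s == "$" + name => root
        case (name, root) if s.startsWith("$" + name + "/") =>
          root.resolve(s.drop(name.length + 2)) // drop "$", the name and "/"
      }
      .getOrElse(Paths.get(s))
}
```

With roots WORKSPACE -> /Users/alice/my-repository followed by HOME -> /Users/alice, serializing /Users/alice/my-repository/out/foo/bar.dest/qux yields $WORKSPACE/out/foo/bar.dest/qux, and deserializing that string on a machine whose WORKSPACE is /Users/charlie/my-repository yields the corresponding local path. Paths outside every known root stay absolute, which is one possible policy rather than something the issue specifies.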
Apart from PathRef and Path, we will also need to deal with:
- Files in out/ which are naturally non-deterministic: mill-profile.json, mill-chrome-profile.json, mill-server/* and mill-no-server/*, etc.
- Modified times are also expected to vary. These may need to be zeroed out in the process of making zip and jar files such that they do not affect the byte contents, and ignored as part of any equivalence comparison.
- Any foo.json files belonging to workers can also be expected to differ since they contain the toString of the worker, and may need to be renamed to foo.worker.json or similar to make them identifiable.
- There will also be inherent differences between files generated on different platforms (e.g. native binaries). This is fine for now, and likely unavoidable.
- There may be other files that need to be made reproducible that are not listed here.
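For the point about modified times in zip and jar files, the following is a rough sketch of writing an archive whose bytes do not depend on file timestamps or entry order. It uses only java.util.zip and is not how Mill builds jars. DeterministicZip and FixedTime are hypothetical names, and the year-2000 timestamp is an arbitrary assumption. The sketch uses ZipEntry.setTimeLocal (Java 9+) rather than setTime, because setTime converts epoch milliseconds through the default time zone.

```scala
import java.io.ByteArrayOutputStream
import java.time.LocalDateTime
import java.util.zip.{ZipEntry, ZipOutputStream}

// Hypothetical helper (not part of Mill): builds a zip whose bytes depend
// only on entry names and contents.
object DeterministicZip {

  // Arbitrary fixed timestamp, an assumption of this sketch. It is set as a
  // local date-time so the stored DOS time is the same in every time zone.
  private val FixedTime = LocalDateTime.of(2000, 1, 1, 0, 0, 0)

  def zip(entries: Map[String, Array[Byte]]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ZipOutputStream(bytes)
    try {
      // Sort by name so insertion or filesystem listing order is irrelevant.
      for ((name, data) <- entries.toSeq.sortBy(_._1)) {
        val entry = new ZipEntry(name)
        entry.setTimeLocal(FixedTime) // replaces the real modified time
        out.putNextEntry(entry)
        out.write(data)
        out.closeEntry()
      }
    } finally out.close()
    bytes.toByteArray
  }
}
```

Two calls with the same names and contents produce identical bytes on the same JDK. Compressed output could still differ between JDKs that ship different deflate implementations, which this sketch does not address.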
The success criteria would be a test in integration/feature/ that:
1. Copies example/scalalib/web/5-webapp-scalajs-shared into two separate subfolders. example/scalalib/web/5-webapp-scalajs-shared is somewhat arbitrary, but should give us good coverage of a variety of Mill module and task types, exercising a wide range of code paths.
2. Runs ./mill runBackground && ./mill clean runBackground && ./mill jar && ./mill assembly in each folder (with COURSIER_CACHE and -Duser.home passed in).
3. Checks that the out/ folder is byte-for-byte identical.
Related issues with prior discussion:
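Returning to the success criteria, the final byte-for-byte check could be sketched as follows, under the assumption that the naturally non-deterministic entries named earlier are skipped. This is an illustration built on java.nio.file and java.security.MessageDigest, not the actual integration test. OutDirDigest, digest and ignored are hypothetical names, and scala.jdk.CollectionConverters requires Scala 2.13 or later.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest
import scala.jdk.CollectionConverters._

// Hypothetical helper (not part of Mill): summarises an out/ folder as a
// map from relative file path to SHA-256 of the file contents.
object OutDirDigest {

  // Top-level entries the issue names as naturally non-deterministic.
  val ignored: Set[String] =
    Set("mill-profile.json", "mill-chrome-profile.json", "mill-server", "mill-no-server")

  def digest(root: Path): Map[String, String] = {
    val stream = Files.walk(root)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .map(p => root.relativize(p))
        // Skip anything whose first path segment is in the ignore list.
        .filterNot(rel => ignored.contains(rel.getName(0).toString))
        .map { rel =>
          val hash = MessageDigest
            .getInstance("SHA-256")
            .digest(Files.readAllBytes(root.resolve(rel)))
          rel.toString.replace('\\', '/') -> hash.map("%02x".format(_)).mkString
        }
        .toMap // forces the iterator before the stream is closed
    } finally stream.close()
  }
}
```

Two out/ folders would then count as equivalent when OutDirDigest.digest(a) == OutDirDigest.digest(b). Modified times are never read, so they are ignored by construction, and on a mismatch the differing keys of the two maps point at the files to investigate.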