Skip to content

crate_universe: Review how to interract with environment variables #662

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
UebelAndre opened this issue Mar 30, 2021 · 8 comments
Closed

Comments

@UebelAndre
Copy link
Collaborator

Continuing a conversation from #655 (comment)

we should only force things to be included that affect the outputs and allow for users to explicitly request that additional environment variables get used.

I think that's a reasonable approach - given that, what should this actually look like for env vars?

Should we have an allowlist in crate_universe_resolver (requiring changes to rules_rust to allowthings)?
Should we have a user-supplied allowlist in the crate_universe rule (potentially requiring every repo to duplicate the same data)?
Should we have both of the above, so that we have a shared list, but also users can augment it if they need?
Should we ignore or warn or error if you have CARGO_ env vars set which we will ignore, because you may get surprising different results building the same code in cargo and bazel?

This may be my lack of familiarity with cargo's dependency resolution but I feel it's not quite correct to have the version factor into the hash of the lockfile. Can two versions of cargo not generate the same set of dependencies when given the same lockfile?

Given the same Cargo.toml, two versions of cargo can resolve different deps, yeah. e.g. in 1.52 rust-lang/cargo#9255 will be released, and in the next edition the default resolver is going to switch from 1 to 2.

cc @illicitonion

@UebelAndre
Copy link
Collaborator Author

Given the same Cargo.toml, two versions of cargo can resolve different deps, yeah. e.g. in 1.52 rust-lang/cargo#9255 will be released, and in the next edition the default resolver is going to switch from 1 to 2.

I think the smarter thing to do would be to check for the version of the feature resolver and embed that into the metadata since that's the thing impacting the lockfile and not specifically the version of cargo. The version of cargo just has different behavior for when that's not set.

@illicitonion
Copy link
Collaborator

Given the same Cargo.toml, two versions of cargo can resolve different deps, yeah. e.g. in 1.52 rust-lang/cargo#9255 will be released, and in the next edition the default resolver is going to switch from 1 to 2.

I think the smarter thing to do would be to check for the version of the feature resolver and embed that into the metadata since that's the thing impacting the lockfile and not specifically the version of cargo. The version of cargo just has different behavior for when that's not set.

Sure, we could hard-code logic to parse out the edition and the cargo version and embed that. But how would you handle the other example I gave? Where a bug was fixed which means that dependencies which used to accidentally be put into multiple kinds (e.g. a normal dep would also appear as a dev dep) is now only put in one? Or the I'm sure dozens of other bugfixes or changes in behaviour which mean that resolves are slightly different.

My point is there are a bunch of places where different cargo versions can lead to different behaviours. We could try to find all of them and hard-code an indicator into our cache key (where for the other example I gave, the thing we're hard-coding would be "Are you >= 1.52"), but that ends up basically coming down to embedding the cargo version. We'd get a small gain (we would maybe consider 1.50 and 1.51 equivalent, for instance), and the expense of maybe being incorrect because we didn't analyse every change that went into cargo correctly...

@UebelAndre
Copy link
Collaborator Author

My point is there are a bunch of places where different cargo versions can lead to different behaviours. We could try to find all of them and hard-code an indicator into our cache key (where for the other example I gave, the thing we're hard-coding would be "Are you >= 1.52"), but that ends up basically coming down to embedding the cargo version. We'd get a small gain (we would maybe consider 1.50 and 1.51 equivalent, for instance), and the expense of maybe being incorrect because we didn't analyse every change that went into cargo correctly...

Perhaps it's not necessary to include this information at all? There seems to be two modes here (which might be obvious to some):

  1. generating a lockfile
  2. consuming a lockfile

I think it makes sense to store information like the cargo version and any and all information related to generating a lockfile to ensure it's reproducible given the same available revisions and releases. But once a lockfile is generated, I no longer care about the way the file was generated, I want to take a lockfile and go get those dependencies. It sounds like the current implementation combines those two steps and prevents the lockfile from being usable between two different versions of cargo? Is that true?

@illicitonion
Copy link
Collaborator

Going back to the two important properties from #655 (comment):

if you're using a lockfile:

  1. When I first git clone and bazel build a repo with a lockfile in place, I don't need to do expensive work. Expensive work, in my mind, includes fetching a crates index, or performing constraint solving on versions. Basically, we shouldn't need to invoke cargo.
  2. If any of the Cargo.toml files or package entries, or anything else we use in version resolution (e.g. the version of cargo) change in ways that change anything about how the final targets are generated, it's impossible to use stale values from a previous generation.

If we ignore the version of cargo being used, we violate 2.

Example scenario:

  • I have my repo, and generated a lockfile with cargo 1.51.
  • cargo 1.52 is released with a bug in cargo which means my resolve should start failing.
  • I decide to commit a version bump pinning my repo to rust 1.52 - my expectation is that if this update breaks my repo, I'll find out during this commit.
  • My commit has green CI and gets merged.
  • Next time someone adds a dependency to a Cargo.toml file, they re-generate the lockfile, and find that they break the repo.

This is what we're trying to avoid. We want the person who changes the cargo version to be the person who finds out they broke it, rather than the hapless person who next happens to touch the file.

@UebelAndre
Copy link
Collaborator Author

UebelAndre commented Mar 30, 2021

This is what we're trying to avoid. We want the person who changes the cargo version to be the person who finds out they broke it, rather than the hapless person who next happens to touch the file.

This makes the file unsharable. Which is fine, but then how do you allow for cross workspace dependency resolution?

@illicitonion
Copy link
Collaborator

This is what we're trying to avoid. We want the person who changes the cargo version to be the person who finds out they broke it, rather than the hapless person who next happens to touch the file.

This makes the file unsharable.

What makes the file unsharable?

@UebelAndre
Copy link
Collaborator Author

What makes the file unsharable?

WORKSPACE.bazel

workspace(name = "universe")

# Pretend for a moment `cargo-raze` was built using crate_universe
http_archive(
    name = "cargo_raze",
    # ...
)

load("@rules_rust//crate_universe:defs.bzl", "crate_universe")

crate_universe(
    name = "crates",
    some_new_attribute_for_additional_lockfiles = ["@cargo_raze//impl:lockfile"],
)

BUILD.bazel

rust_binary(
    name = "mars",
    deps = [
        "@crates//:cargo_raze",
        # ...
    ],
    # ...
)

This is specifically what I'm looking to be able to support and what I mean by "sharable". Perhaps the lockfile is not the thing that should be sharable. And if that's the case then I agree with you, it'd make sense to have a very locked down lockfile which should largely be based on the configuration of the current repo, then have something else which can be used to share dependency trees.

This is seeming like a separate feature request but I think is important to consider early on.

@UebelAndre
Copy link
Collaborator Author

I don't think this is relevant since the crate_universe rewrite. Closing this out for now but feel free to reopen if there's an aspect of it that I missed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants