Contributor

@AlexTMjugador AlexTMjugador commented Sep 5, 2025

Context, problem statement and description

While giving the new sqlx.toml feature a try, I discovered that its database_url_var setting behaved inconsistently compared to other environment variables: it wasn't loaded from .env files, causing it to be effectively ignored in favor of the default DATABASE_URL variable. This is an inconvenient outcome for the multi-database workspaces that this feature is designed to support.

To address this issue, I reworked the .env loading logic to account for this setting and to be simpler to use, so that all environment variables are handled consistently. The new algorithm works as follows:

  • First, it generates lists of environment variables that can be loaded from .env files and candidate .env file paths. When applicable, the compiler is informed to track changes to their elements.
  • Next, it loads the set of potentially tracked environment variables into a static hash map (in a previous revision of this PR, set_var was used, but I changed it in response to the review comments below). When a variable is defined in both the process environment and a .env file, the process environment takes precedence, as before.
  • Macro code can now use the env function freely to read environment variable values, abstracting itself away from their source, which results in simpler, less error-prone code.
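
To make the precedence rule in the steps above concrete, here is a minimal sketch that uses only the standard library. It is not the code in this PR: `parse_dotenv` and `resolve_vars` are hypothetical names, and the parser is a deliberately simplified stand-in for a real `.env` parser (plain `KEY=VALUE` lines only, no quoting or escapes).

```rust
use std::collections::HashMap;

/// Hypothetical, simplified `.env` parser: `KEY=VALUE` lines and `#` comments
/// only. A real parser also handles quoting, escapes, and `export` prefixes.
fn parse_dotenv(contents: &str) -> HashMap<String, String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_owned(), value.trim().to_owned()))
        .collect()
}

/// Resolves every loadable variable name. The process environment is consulted
/// first, so it takes precedence over the `.env` file when both define a name.
fn resolve_vars(
    loadable: &[&str],
    process_env: &HashMap<String, String>,
    dotenv: &HashMap<String, String>,
) -> HashMap<String, String> {
    loadable
        .iter()
        .filter_map(|name| {
            process_env
                .get(*name)
                .or_else(|| dotenv.get(*name))
                .map(|value| (name.to_string(), value.clone()))
        })
        .collect()
}
```

In this sketch the process environment is passed in as a map so that the precedence logic can be exercised in isolation; a proc macro would read it from `std::env` instead.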

Trivially, this rework resolves the issue I encountered because the database_url_var value is now part of the list of loadable environment variables. Future code can easily make such additions as necessary.

Does your PR solve an issue?

To my knowledge, this PR doesn't directly address any previously reported issue in this repository.

Is this a breaking change?

Technically yes, compared to the released 0.9.0 alpha version, since environment variables such as the one named by database_url_var may now be loaded from another source. However, I don't think this change will cause significant inconvenience for most users: the new behavior is more consistent and useful, and the affected settings are unlikely to be widely used yet because they have only been published in an alpha release so far.

@AlexTMjugador AlexTMjugador force-pushed the fix/consistent-macro-env-handling branch from a8a9e0c to 89858d7 Compare September 5, 2025 10:32
Collaborator

@abonander abonander left a comment

Great start, but needs a bit more work.

Comment on lines 425 to 426
static LOADED_ENV_VARS: Mutex<HashMap<String, String, BuildHasherDefault<DefaultHasher>>> =
Mutex::new(HashMap::with_hasher(BuildHasherDefault::new()));
Collaborator

We can't actually use a single global static here, because of rust-analyzer. RA will load the proc-macro dylib into its process and keep it resident for all compiler invocations, so we can't assume the state is reset between different crates.

This was the cause of #3738 and the solution was to store all relevant context per-crate keyed by CARGO_MANIFEST_DIR.
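
A rough sketch of what a per-crate store could look like, assuming the manifest directory is available as a path (in a proc macro it would typically come from the `CARGO_MANIFEST_DIR` environment variable). The names `record_var` and `lookup_var` are hypothetical and not taken from the sqlx code base:

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Mutex, OnceLock};

type EnvMap = HashMap<String, String>;

/// Hypothetical store: one map of loaded variables per crate manifest
/// directory, so state from one crate cannot leak into another when the
/// proc-macro library stays loaded across expansions.
static LOADED_ENV_VARS: OnceLock<Mutex<HashMap<PathBuf, EnvMap>>> = OnceLock::new();

fn store() -> &'static Mutex<HashMap<PathBuf, EnvMap>> {
    LOADED_ENV_VARS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Records a variable for the crate rooted at `manifest_dir`.
fn record_var(manifest_dir: &Path, name: &str, value: &str) {
    let mut guard = store().lock().unwrap();
    guard
        .entry(manifest_dir.to_path_buf())
        .or_default()
        .insert(name.to_owned(), value.to_owned());
}

/// Looks up a variable, considering only the given crate's entries.
fn lookup_var(manifest_dir: &Path, name: &str) -> Option<String> {
    let guard = store().lock().unwrap();
    let value = guard.get(manifest_dir)?.get(name).cloned();
    value
}
```

With this shape, two crates in the same workspace can hold different values for the same variable name without overwriting each other.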

Collaborator

For the same reason, we might actually want to store the modified time of the .env files we load and check for changes to that. If the file has been modified, we need to re-load it.
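
One possible shape for that check, as a sketch only: `CachedDotenv` and `load_if_changed` are hypothetical names, and the comparison relies on the file system reporting a modification time, which is assumed here rather than guaranteed on every platform.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical cache entry: the file contents plus the modification time
/// observed when they were read.
struct CachedDotenv {
    modified: SystemTime,
    contents: String,
}

/// Returns the cached entry if the file's modification time is unchanged,
/// otherwise re-reads the file and returns a fresh entry.
fn load_if_changed(path: &Path, cached: Option<CachedDotenv>) -> io::Result<CachedDotenv> {
    let modified = fs::metadata(path)?.modified()?;
    match cached {
        Some(entry) if entry.modified == modified => Ok(entry),
        _ => Ok(CachedDotenv {
            modified,
            contents: fs::read_to_string(path)?,
        }),
    }
}
```

A caveat with this approach is timestamp granularity: two writes within the same timestamp tick would not be detected as a change.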

Contributor Author

@AlexTMjugador AlexTMjugador Sep 25, 2025

I've been thinking about our approach to cache invalidation within this macro, and to be honest, I'm not sure I precisely understand what the goals and non-goals here are.

From what I can tell, our code for reacting to environment variable and .env file changes depends on unstable proc_macro features gated by procmacro2_semver_exempt. Since it can be assumed that most people are using a stable Rust toolchain, most people aren't getting any sort of change tracking in the macro either, so why the suggestion to handle .env files in a somewhat special way by manually checking their modification times? My understanding is that the intention is to rely on proc_macro::tracked_path for refreshing these, right?

Now, onto the elephant in the room: the static METADATA cache. While it is keyed by crate manifest path, cache entries are never invalidated, so the way I see it, that means the following sequence of events could happen even without this PR:

  • The proc macro runs once for crate A with procmacro2_semver_exempt enabled, populating METADATA and instructing the compiler to re-evaluate the macro when environment variables change.
  • The value of DATABASE_URL is modified.
  • The proc macro runs again for crate A after the compiler/RA notices the change. However, a METADATA entry for crate A already exists, the proc macro library may not be unloaded between executions, and calling set_var across threads is not necessarily thread-safe, so the stale database URL from the cache may get used instead.

That may not be a huge issue in practice, since users can always restart RA, not use RA, or clear build caches if things go wrong. But it does raise the question of this PR's scope. Should we be trying to tackle cache invalidation here, or is that rabbit hole best left out of scope?

@AlexTMjugador AlexTMjugador force-pushed the fix/consistent-macro-env-handling branch from 2b5e2fd to 57c1dcf Compare September 25, 2025 18:28
@AlexTMjugador
Contributor Author

I've finally addressed most of the review comments in the latest commits I pushed to this PR, which I also rebased on top of the latest main commit at this time! 🎉

As requested, the static map of environment variables loaded from .env files is now keyed by the corresponding crate manifest directory, which should play nicer with RA's proc-macro execution model.

To preserve the ergonomics of the existing env function by not introducing a manifest directory parameter, a new thread-local variable, CURRENT_CRATE_MANIFEST_DIR, is initialized early on with the manifest directory of the crate whose macro is being expanded. This variable can then be read by any env invocation to determine the correct map to query. Using a thread-local here should be alright for the foreseeable future, since there's little reason to introduce multithreading on our part in a proc macro.
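
For readers skimming the thread, the idea can be sketched roughly as follows. This is an illustration under assumptions, not the code in the commits: `enter_crate` and the `LOADED` static are hypothetical names, and only `CURRENT_CRATE_MANIFEST_DIR` and `env` correspond to names mentioned above.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Mutex, OnceLock};

thread_local! {
    /// Manifest directory of the crate whose macro is currently being expanded.
    static CURRENT_CRATE_MANIFEST_DIR: RefCell<Option<PathBuf>> = RefCell::new(None);
}

/// Hypothetical per-crate maps of variables loaded from `.env` files.
static LOADED: OnceLock<Mutex<HashMap<PathBuf, HashMap<String, String>>>> = OnceLock::new();

/// Hypothetical entry point: called early in macro expansion to select the
/// current crate and register the variables loaded for it.
fn enter_crate(manifest_dir: &Path, vars: HashMap<String, String>) {
    CURRENT_CRATE_MANIFEST_DIR.with(|dir| *dir.borrow_mut() = Some(manifest_dir.to_path_buf()));
    LOADED
        .get_or_init(|| Mutex::new(HashMap::new()))
        .lock()
        .unwrap()
        .insert(manifest_dir.to_path_buf(), vars);
}

/// Reads a variable without a manifest directory parameter: the process
/// environment wins, then the current crate's loaded map is consulted.
fn env(name: &str) -> Option<String> {
    if let Ok(value) = std::env::var(name) {
        return Some(value);
    }
    let dir = CURRENT_CRATE_MANIFEST_DIR.with(|dir| dir.borrow().clone())?;
    let maps = LOADED.get()?.lock().unwrap();
    let value = maps.get(&dir)?.get(name).cloned();
    value
}
```

The trade-off of the thread-local is that a lookup from any other thread would not see the current crate, which is why it only works as long as expansion stays single-threaded.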

See also PR #4039 for a different implementation take on this problem.
