What's the deal with = in environment variable names? #23331

Open
squeek502 opened this issue Mar 23, 2025 · 2 comments
Labels
standard library This issue involves writing Zig code for the standard library.

Comments

@squeek502
Collaborator

squeek502 commented Mar 23, 2025

Context: #23265 and #23272

Environment variable blocks on POSIX and Windows contain strings of the format name=value, where the name cannot contain a = character*

Note

On Windows, a = is allowed, but only as the first character of an environment variable name (e.g. =FOO). These are semi-hidden (they don't show up when running set) and are used for things like per-drive CWDs; see https://devblogs.microsoft.com/oldnewthing/20100506-00/?p=14133.
More about this later.

From the POSIX spec on environment variables:

names shall not contain any bytes that have the encoded value of the character '='

Although the POSIX spec seems fairly clear, C libraries/OS APIs don't seem to agree on how to handle getenv calls for a name with a = in it.

Here's the test case:

  • An environment variable named FOO set to the value ABC=123, so in the environment block it looks like FOO=ABC=123
  • Calling the libc's getenv("FOO=ABC") or the equivalent OS API
Test project

Run with zig build

// build.zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const test_step = b.step("test", "Test it");
    b.default_step = test_step;

    const optimize = b.standardOptimizeOption(.{});
    const target = b.standardTargetOptions(.{});

    const main = b.addExecutable(.{
        .name = "main",
        .root_module = b.createModule(.{
            .root_source_file = b.path("main.zig"),
            .target = target,
            .optimize = optimize,
            .link_libc = true,
        }),
    });

    const run = b.addRunArtifact(main);
    run.clearEnvironment();
    run.setEnvironmentVariable("FOO", "ABC=123");
    run.disable_zig_progress = true;
    test_step.dependOn(&run.step);
}

// main.zig
const std = @import("std");

pub fn main() !void {
    if (std.c.getenv("FOO=ABC")) |value_ptr| {
        std.debug.print("{s}\n", .{std.mem.span(value_ptr)});
    } else {
        std.debug.print("null\n", .{});
    }
}

Here are the results from what @g-logunov and I have tested so far:

  • glibc returns 123
  • musl returns null
  • macOS libc returns ABC=123
  • MSVC libc returns 123
  • MinGW libc returns 123
  • GetEnvironmentVariableW returns ERROR_ENVVAR_NOT_FOUND
  • Currently, the Zig environment variable getter APIs return null (as long as they don't call into libc)

Note

On Windows, those semi-hidden variables that start with = cannot be retrieved via the libc getenv API, but can be retrieved with GetEnvironmentVariableW.


So, what should the Zig APIs do? I can think of a few options:

  1. Ensure that we return null if the user ever asks to look up an environment variable name with a = in it (but allow getting semi-hidden environment variables on Windows)
  2. Assume environment variable names that the user provides will not contain = and let whatever happens happen (as long as it doesn't lead to illegal behavior), and focus solely on providing the highest performance implementation possible otherwise
    • This is kind of what I'm assuming the libc implementations (other than musl) have gone with
  3. Do option 2, but also assert that the name does not contain = in optimization modes that have safety enabled (Debug/ReleaseSafe)
  4. Choose some arbitrary libc behavior and go with that across the board
  5. Try to match the 'native' libc behavior for the platform

Personally, I think option 1 or option 3 makes the most sense.

(for the hidden Windows environment variables, I think being able to get them makes sense, so no need to change anything there IMO)
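To make the difference between options 1 and 3 concrete, here is a minimal sketch over an in-memory environment block. Everything in it is hypothetical and for illustration only: `isLookupNameValid`, `rawLookup`, `getOption1`, and `getOption3` are not existing std APIs, and the `allow_leading_equals` flag stands in for "this is Windows, so a leading = is permitted".

```zig
const std = @import("std");

/// Hypothetical helper: a lookup name is valid if it contains no '=',
/// except (optionally) as its very first byte, as in Windows' =FOO names.
fn isLookupNameValid(name: []const u8, allow_leading_equals: bool) bool {
    for (name, 0..) |c, i| {
        if (c == '=' and !(allow_leading_equals and i == 0)) return false;
    }
    return true;
}

/// Hypothetical unchecked lookup: compares the whole name, then expects '='.
fn rawLookup(block: []const []const u8, name: []const u8) ?[]const u8 {
    for (block) |entry| {
        if (entry.len > name.len and
            std.mem.eql(u8, entry[0..name.len], name) and
            entry[name.len] == '=')
        {
            return entry[name.len + 1 ..];
        }
    }
    return null;
}

/// Option 1: names containing '=' are always reported as not found.
fn getOption1(block: []const []const u8, name: []const u8, allow_leading_equals: bool) ?[]const u8 {
    if (!isLookupNameValid(name, allow_leading_equals)) return null;
    return rawLookup(block, name);
}

/// Option 3: assume the name is valid and only check that assumption via an
/// assert, which panics in Debug/ReleaseSafe and is unchecked otherwise.
fn getOption3(block: []const []const u8, name: []const u8, allow_leading_equals: bool) ?[]const u8 {
    std.debug.assert(isLookupNameValid(name, allow_leading_equals));
    return rawLookup(block, name);
}

test "option 1 rejects names containing =" {
    const block = [_][]const u8{ "FOO=ABC=123", "=C:=C:\\dir" };
    try std.testing.expect(getOption1(&block, "FOO=ABC", false) == null);
    try std.testing.expectEqualStrings("ABC=123", getOption1(&block, "FOO", false).?);
    try std.testing.expectEqualStrings("C:\\dir", getOption3(&block, "=C:", true).?);
}
```

The only behavioral difference is what happens for an invalid name: option 1 pays for the scan in every build mode and returns null, while option 3 treats an invalid name as a caller bug.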

squeek502 added the "standard library" label Mar 23, 2025
@rootbeer
Contributor

Given that env vars are often system-specific, and the embedded = behavior is ill-specified (the spec says "don't do that" but doesn't say what the semantics are when it does happen), I think it's fine if Zig's behavior is also vague here. Just don't crash, and don't return unreasonable garbage. So I'm putting in a vote for your option #2. I don't think Zig should assert on the = not being present.

For the FOO=ABC=123 test case, I'm curious how getenv("FOO") and getenv("FOO=DEF") behave. Oh, and getenv("ABC"). My expectation is just that these don't crash and that they behave consistently on a given system (e.g., re-running the program on the same system should give the same behavior). Beyond that, any of not-found, some variation on ABC=123, or 123 seems like a reasonable result.

Here's a bug from NodeJS where they discovered some platforms trim a leading =: nodejs/node-v0.x-archive#8467. Given how at odds that is with the Windows semantics, I don't think we're going to get a unified Zig semantics.
(Also tests with leading and trailing = are probably worth adding.)

Hmm ... a consistent EnvMap behavior might be funky if the single FOO=ABC=123 entry satisfies both getenv("FOO") and getenv("FOO=ABC"). Probably fine if EnvMap just declares a preference. Poking around, it seems a couple other projects (e.g., vscode) had problems when over-aggressively splitting environment entries on = (and the fixes involved just splitting on the first =). So this is a bug Zig should probably avoid introducing. These also imply that environment variable name lookups won't match a = (the extra = are always assumed to be in the value part).

@squeek502
Collaborator Author

I'm curious how getenv("FOO") and getenv("FOO=DEF") behave

There are basically three behaviors in practice:

  1. Compare the entire requested name against the start of the entry, then check for = immediately after the matched name. If the = is there, return everything after it as the value.
    • This is the behavior of glibc, MinGW, and MSVC
  2. Match the name only up to the first =, then treat everything after that = as the value. It doesn't matter if the full name length wasn't matched.
    • This is the behavior of macOS
  3. Check for = in the name and return null if one is found
    • This is the behavior of musl

In all three, getenv("FOO") would return ABC=123.
getenv("FOO=DEF") would return null with 1 and 3, but ABC=123 with 2.
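The three behaviors can be modeled over an in-memory environment block. This is a rough sketch of the logic as described above, not the actual source of any libc; the function names (`lookupFullName`, `lookupUpToEquals`, `lookupRejectEquals`) are made up for illustration, and edge cases such as an empty name are not modeled.

```zig
const std = @import("std");

/// Behavior 1 (glibc, MinGW, MSVC): compare the whole name against the
/// start of each entry, then require '=' right after it.
fn lookupFullName(block: []const []const u8, name: []const u8) ?[]const u8 {
    for (block) |entry| {
        if (entry.len > name.len and
            std.mem.eql(u8, entry[0..name.len], name) and
            entry[name.len] == '=')
        {
            return entry[name.len + 1 ..];
        }
    }
    return null;
}

/// Behavior 2 (macOS): only the part of the name before its first '='
/// takes part in the comparison; the rest of the name is ignored.
fn lookupUpToEquals(block: []const []const u8, name: []const u8) ?[]const u8 {
    var len: usize = 0;
    while (len < name.len and name[len] != '=') len += 1;
    return lookupFullName(block, name[0..len]);
}

/// Behavior 3 (musl): a name containing '=' is never found.
fn lookupRejectEquals(block: []const []const u8, name: []const u8) ?[]const u8 {
    for (name) |c| {
        if (c == '=') return null;
    }
    return lookupFullName(block, name);
}

test "three lookup behaviors on FOO=ABC=123" {
    const block = [_][]const u8{"FOO=ABC=123"};

    // getenv("FOO=ABC"): the original test case.
    try std.testing.expectEqualStrings("123", lookupFullName(&block, "FOO=ABC").?);
    try std.testing.expectEqualStrings("ABC=123", lookupUpToEquals(&block, "FOO=ABC").?);
    try std.testing.expect(lookupRejectEquals(&block, "FOO=ABC") == null);

    // getenv("FOO=DEF"): null with 1 and 3, ABC=123 with 2.
    try std.testing.expect(lookupFullName(&block, "FOO=DEF") == null);
    try std.testing.expectEqualStrings("ABC=123", lookupUpToEquals(&block, "FOO=DEF").?);
    try std.testing.expect(lookupRejectEquals(&block, "FOO=DEF") == null);
}
```

Running this with zig test reproduces the results listed above for the FOO=ABC=123 entry.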

EnvMap behavior

IMO EnvMap already behaves ideally. FOO=ABC=123 will get added to the map as key: FOO, value: ABC=123. On Windows, leading = is accounted for and added as normal, so =FOO=BAR will be added as key: =FOO, value: BAR
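As an illustration of that splitting rule (and not the actual EnvMap implementation), a sketch could look like the following. `Entry`, `splitEntry`, and the `allow_leading_equals` flag are hypothetical names; the flag stands in for the Windows-only handling of a leading =.

```zig
const std = @import("std");

const Entry = struct {
    key: []const u8,
    value: []const u8,
};

/// Splits a "name=value" string on its first '='. When
/// `allow_leading_equals` is set, a '=' at index 0 is treated as part of
/// the name (Windows-style semi-hidden variables) and the search starts
/// at index 1 instead. Returns null if no separator is found.
fn splitEntry(entry: []const u8, allow_leading_equals: bool) ?Entry {
    var i: usize = if (allow_leading_equals and entry.len > 0 and entry[0] == '=') 1 else 0;
    while (i < entry.len) : (i += 1) {
        if (entry[i] == '=') {
            return Entry{ .key = entry[0..i], .value = entry[i + 1 ..] };
        }
    }
    return null;
}

test "split on the first = only" {
    const plain = splitEntry("FOO=ABC=123", false).?;
    try std.testing.expectEqualStrings("FOO", plain.key);
    try std.testing.expectEqualStrings("ABC=123", plain.value);

    const hidden = splitEntry("=FOO=BAR", true).?;
    try std.testing.expectEqualStrings("=FOO", hidden.key);
    try std.testing.expectEqualStrings("BAR", hidden.value);
}
```

Because any later = always lands in the value, a map built this way can never contain a key such as FOO=ABC, which is consistent with name lookups not matching a =.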
