Member

@lloeki lloeki commented Nov 3, 2025

Why?

Deployment mode is one of the major use cases in the wild.

What does this PR do?

Beat bundler and rubygems into submission via a two-stage injector:

  • set up GEM_HOME and GEM_PATH environment variables in ways compatible with BUNDLE_PATH
  • patch bundler to ignore deployment mode and not reset our modifications
  • patch rubygems to not call Bundler.setup before we do

Since vendored mode is then no longer in effect, the Bundler code that filters gem paths no longer discards the additional paths we add.

How to test the change?

CI

Additional Notes:

A tough nut to crack. "Pure" deployment mode is supported (via BUNDLE_DEPLOYMENT=true), but "standalone" vendored mode is not yet (e.g. via BUNDLE_PATH=/some/where, optionally combined with BUNDLE_FROZEN=true for the same effect as deployment mode).

lloeki added 10 commits November 3, 2025 16:42
Deployment mode comes from vendored mode + frozen bundle
Since the test forwarder is written in Ruby and spawned as a separate
process, it would be subject to injection through RUBYOPT during
injection itself, creating a recursion.
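
One way to avoid this kind of recursion is to clear `RUBYOPT` for the child process only. The following is a minimal sketch, not the PR's actual code; `run_without_injection` is a hypothetical helper name.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run a Ruby child process with RUBYOPT removed, so a
# `-r` injected through RUBYOPT is not applied to the helper process itself.
def run_without_injection(*ruby_args)
  # A nil value in the env hash unsets the variable for the child only;
  # the parent's environment is left untouched.
  stdout, status = Open3.capture2({ "RUBYOPT" => nil }, RbConfig.ruby, *ruby_args)
  [stdout, status.success?]
end

out, ok = run_without_injection("-e", 'print ENV.key?("RUBYOPT")')
warn "child saw RUBYOPT: #{out} (success: #{ok})"
```
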
Understanding the state of `Gem.path`, `GEM_PATH`, and `GEM_HOME` is
critical for debugging.
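
A small snapshot helper is one way to expose that state. This is a sketch under the assumption that printing to stderr is acceptable; `gem_env_snapshot` is a hypothetical name, while `Gem.dir` and `Gem.path` are the regular rubygems accessors.

```ruby
require "rubygems"

# Hypothetical debugging helper: capture what the environment says and
# what rubygems actually resolved, side by side.
def gem_env_snapshot(env = ENV)
  {
    "GEM_HOME" => env["GEM_HOME"],
    "GEM_PATH" => env["GEM_PATH"],
    "Gem.dir"  => Gem.dir,  # where gems would be installed
    "Gem.path" => Gem.path  # where gems are looked up, in order
  }
end

gem_env_snapshot.each { |key, value| warn "#{key}=#{value.inspect}" }
```
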
Evaluation on every fetch is costly, especially with a fork.
@lloeki lloeki marked this pull request as ready for review November 14, 2025 15:43
@lloeki lloeki requested a review from a team as a code owner November 14, 2025 15:43
Member

@p-datadog p-datadog left a comment

I don't see anything that is wrong in the diff but I cannot evaluate it for correctness either. The regexp I commented on for example, I don't understand what it is actually doing.

@sarahchen6 sarahchen6 left a comment

Looks reasonable!

Member

@ivoanjo ivoanjo left a comment

I've given it a big pass. In general... I find this really hard to follow. We're doing very specific things in a very specific order to target very specific behaviors and... I'm dearly missing notes explaining why.

Without such context, I'm just looking at path changes, env variable changes, etc., which makes it really hard to review beyond going "well the minimal test is passing so I hope real applications aren't meaningfully different". In particular, it's hard to figure out whether there are any gaps in our approach without the context explaining why and what.

Vendored mode (`BUNDLE_PATH` / `use_system_gems?` => `false`) removes
all paths but the vendor path from the Gem path list.

This effectively hides all other gems, making any injection impossible.

Massage `GEM_PATH` and `GEM_HOME` to set them to values that:
- are identical in behaviour with `BUNDLE_PATH='vendor/bundle'`
- allow injection once vendored mode is relaxed

Note: this does not take an arbitrary `BUNDLE_PATH` into account yet,
instead focusing on the default for deployment mode only
(`vendor/bundle`).
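
As an illustration of the idea, the sketch below computes such values. It assumes the vendored layout is `<path>/<engine>/<ruby ABI version>`; `vendored_gem_env` and `/opt/injected/gems` are hypothetical names, not taken from the PR.

```ruby
require "rbconfig"

# Hypothetical helper: compute GEM_HOME/GEM_PATH values that point at the
# directory used for BUNDLE_PATH=vendor/bundle, while keeping extra gem
# paths visible after it.
# Assumption: the vendored layout is <path>/<engine>/<ruby ABI version>.
def vendored_gem_env(app_root, extra_paths = [])
  engine = defined?(RUBY_ENGINE) ? RUBY_ENGINE : "ruby"
  vendor_dir = File.join(app_root, "vendor", "bundle", engine, RbConfig::CONFIG["ruby_version"])
  gem_path = ([vendor_dir] + extra_paths).uniq

  { "GEM_HOME" => vendor_dir, "GEM_PATH" => gem_path.join(File::PATH_SEPARATOR) }
end

env = vendored_gem_env("/app", ["/opt/injected/gems"])
warn env.inspect
```

The vendored directory comes first so that lookups behave as they would under `BUNDLE_PATH`, and the additional path only becomes useful once vendored mode is relaxed.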
When deployment mode is detected we patch Bundler to act as if it
wasn't set.

This:
- makes Bundler not set `use_system_gems?` to `false`
- makes Bundler not set the vendored path to `vendor/bundle`

But thanks to the `GEM_HOME` and `GEM_PATH` variables that have been set
beforehand, gems are still looked up in the appropriate location, while
we are no longer restricted to seeing _only_ vendored gems.

Ergo, gem injection can proceed.
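
The general shape of such a patch can be sketched with `Module#prepend`. The example uses a stand-in class rather than Bundler's real settings object, so `FakeSettings` and `IgnoreDeployment` are hypothetical, and which keys the real patch masks is an assumption.

```ruby
# Stand-in for a settings object; NOT Bundler's real class.
class FakeSettings
  def initialize(values)
    @values = values
  end

  def [](key)
    @values[key]
  end
end

# Hypothetical patch: report deployment mode as unset and delegate every
# other key to the original implementation.
module IgnoreDeployment
  def [](key)
    return nil if key.to_sym == :deployment

    super
  end
end

FakeSettings.prepend(IgnoreDeployment)

settings = FakeSettings.new(deployment: true, frozen: true)
settings[:deployment] # => nil
settings[:frozen]     # => true
```
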
Since Ruby 1.9 `rubygems` is automatically required by default. Ruby and
`rubygems` contain mechanisms that enforce that rubygems is not just
loaded, but that it is loaded *first*.

Right before executing the Ruby process we make sure that the Ruby
injector appears first, and that whatever else is present still appears,
but after it.
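
A minimal sketch of that reordering, assuming the injector is loaded through a `-r` option in `RUBYOPT` and that options contain no embedded spaces; `rubyopt_with_injector_first` and the injector path are hypothetical.

```ruby
# Hypothetical helper: build a RUBYOPT value in which the injector's `-r`
# comes first and any pre-existing options follow in their original order.
# Limitation: splitting on whitespace does not handle paths with spaces.
def rubyopt_with_injector_first(injector_path, current = ENV["RUBYOPT"])
  flag = "-r#{injector_path}" # written without a space after -r
  rest = current.to_s.split.reject { |opt| opt == flag }
  ([flag] + rest).join(" ")
end

rubyopt_with_injector_first("/opt/injector/host_inject.rb", "-rbundler/setup -W0")
# => "-r/opt/injector/host_inject.rb -rbundler/setup -W0"
```
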
It turns out rubygems bootstraps bundler straight from `rubygems` code.
This exists so that once, say, `bundle exec` completes, any Ruby program
it runs also executes in that bundler context.

Indeed, without this `bundle exec foo` itself would not work! As soon as
it `exec`s to `foo`, any previous Ruby context would be lost. But it
also means that Ruby loads bundler stuff too early.

Unset the undocumented `BUNDLER_SETUP`.
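
A sketch of the unsetting step, assuming the caller wants to keep the original value around to act on it later (whether the PR does so is not stated here); `without_bundler_setup` is a hypothetical helper.

```ruby
# Hypothetical helper: return a copy of the environment without
# BUNDLER_SETUP, plus the removed value (nil when it was not set).
def without_bundler_setup(env = ENV.to_h)
  cleaned = env.dup
  setup_path = cleaned.delete("BUNDLER_SETUP")
  [cleaned, setup_path]
end

cleaned, setup_path = without_bundler_setup(
  "BUNDLER_SETUP" => "/path/to/bundler/setup", # illustrative value only
  "GEM_HOME" => "/app/vendor/bundle"
)
warn "removed #{setup_path.inspect}, kept #{cleaned.keys.inspect}"
```
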
When `bundle exec` happens, it proceeds with processing Bundler things,
then `exec`s into the actual Ruby program to execute.

When it does so, this is a new process, hence any patch that we have
applied goes away, which results in deployment mode being re-armed.

Apply the `bundler` patches to override Bundler behaviours.

We can safely load `bundler` without forking in that case since we are
already in a bundled context.
Both threads and pipes were otherwise leaking.
When the bundle is unlocked `bundle exec` (obviously) fails, and thus
makes the test fail due to the exit code.

Instead, when unlocked, run the stub directly to test guardrails are in
effect.

Note: an extension of this change would be to:

- still run bundle exec and ensure it exits with a non-zero status,
  behaving as expected.
- run without bundle exec in a locked bundle with `RUBYGEMS_GEMDEPS`
  unset (as of this commit).
- run without bundle exec in a locked bundle with `RUBYGEMS_GEMDEPS` set
  to `-` and/or the path to the fixture `Gemfile`.

- `-r` is not supposed to have a space even though it can in some later
  Ruby versions
- Before 1.9 `Gem` isn't present until `rubygems` is required
There's a discrepancy when e.g. `BUNDLE_PATH` is set.
When `BUNDLE_PATH` is set to `/bundle` the result of `bundle install`
is lost, so the bundle is empty come test time.

Store this path in a volume.
There are two cases we don't handle right:
- vendor path by itself: we only patch Bundler to ignore deployment mode
- deployment mode with a non-default vendor path: we hardcode the path
Bundler doesn't define `Bundler::CLI` outside of `lib/bundler/cli.rb`
and defines commands as e.g. `class CLI::Exec` to save nesting.

This makes requiring `lib/bundler/cli/exec.rb` standalone crash.
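
The failure mode can be reproduced without Bundler. In this sketch `Sketch` stands in for the `Bundler` namespace; it only demonstrates the Ruby behaviour and is not the fix applied in the PR.

```ruby
module Sketch; end # stands in for Bundler; Sketch::CLI is not defined yet

# Compact nesting needs every constant but the last to exist already,
# so defining Sketch::CLI::Exec before Sketch::CLI raises NameError.
error =
  begin
    eval("class Sketch::CLI::Exec; end")
    nil
  rescue NameError => e
    e
  end
warn "first attempt: #{error.class}"

# Once the outer class exists, the same definition succeeds.
module Sketch
  class CLI; end
end
eval("class Sketch::CLI::Exec; end")
```
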
Test packages were pinned to a problematic version of `libddwaf` that
causes misresolutions due to multiple overlapping binary gem platforms
being available.
Ruby 2.6 decided to activate the `did_you_mean` default gem via `Kernel.gem`,
which makes it subject to isolation, hence breaking under vendored mode.

This `Kernel.gem` activation happens in `gem_prelude` and can only be skipped
with the `--disable=did_you_mean` CLI flag.

Ruby 2.7 reverted that problematic behaviour, instead resorting to a plain
`require` which will either simply load from `$LOAD_PATH` or use a bundled
version.

To ensure no crash happens on Ruby 2.6 we package the corresponding
`did_you_mean` version.

See:
- gem_prelude.rb calls `Kernel.gem` on 2.6, but not 2.7:
  https://github.com/ruby/ruby/blob/ruby_2_6/gem_prelude.rb#L3-L7
  https://github.com/ruby/ruby/blob/ruby_2_7/gem_prelude.rb#L2
- gem_prelude.rb is included as a prelude script
  https://github.com/ruby/ruby/blob/ruby_2_6/common.mk#L158-L161
- prelude scripts get compiled in prelude.c:
  https://github.com/ruby/ruby/blob/ruby_2_6/common.mk#L1050-L1059
- prelude.c is generated from a template:
  https://github.com/ruby/ruby/blob/ruby_2_6/common.mk#L189
- the template embeds ISeqs of scripts
  https://github.com/ruby/ruby/blob/ruby_2_6/template/prelude.c.tmpl#L167-L168
- prelude targets are just prelude.c:
  https://github.com/ruby/ruby/blob/ruby_2_6/common.mk#L1081-L1082
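
To see why activation through `Kernel.gem` is sensitive to isolation, the probe below reports what happens when a gem is activated by name. It is a sketch only; `activation_outcome` is a hypothetical helper and it does not reproduce the prelude itself.

```ruby
require "rubygems"

# Hypothetical probe: report whether `Kernel#gem` can activate a gem by name.
# Activation goes through the gem specifications rubygems can see, so a gem
# hidden by path isolation fails here even if its files exist elsewhere.
def activation_outcome(name)
  gem name
  :activated
rescue Gem::LoadError => e
  e.class
end

warn "did_you_mean: #{activation_outcome('did_you_mean').inspect}"
```

A plain `require`, by contrast, only needs the file to be reachable, which is why the Ruby 2.7 behaviour is not affected in the same way.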
@lloeki lloeki merged commit ae3671b into main Dec 16, 2025
41 checks passed
@lloeki lloeki deleted the lloeki/deployment-mode branch December 16, 2025 16:12
5 participants