
Commit f5b19dc

fix: don't require system Python to perform bootstrapping (#1929)
This is a pretty major, but surprisingly not that invasive, overhaul of how binaries are started. It fixes several issues and lays groundwork for future improvements.

In brief:

* A system Python is no longer needed to perform bootstrapping.
* Errors due to `PYTHONPATH` exceeding environment variable size limits are no longer an issue.
* Coverage integration is now cleaner and more direct.
* The zipapp `__main__.py` entry point generation is separate from the Bazel binary bootstrap generation.
* Self-executable zips now have actual bootstrap logic.

All of this is accomplished using a two-stage bootstrap process. The first stage is responsible for locating the interpreter, and the second stage is responsible for configuring the runtime environment (e.g. import paths). This allows the first stage to be relatively simple (basically, find a file in runfiles), so implementing it in cross-platform shell is feasible. The second stage, because it's running under the desired interpreter, can then do things like set up import paths and use the `runpy` module to call the program's real main.

This also fixes the issue of long `PYTHONPATH` environment variables causing an error. Instead of passing the import paths using an environment variable, they are embedded into the second-stage bootstrap, which can then add them to `sys.path`.

This also switches from running coverage as a subprocess to using its APIs directly. This is possible because of the second-stage bootstrap, which can rely on `import coverage` occurring in the correct environment.

This new bootstrap method is disabled by default. It can be enabled by setting `--@rules_python//python/config_settings:bootstrap_impl=script`. Once the new APIs are released, a subsequent release will make it the default. This is to allow easier upgrades for people defining their own toolchains.
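The second-stage flow described above (embedded import paths added to `sys.path`, then `runpy` invoking the real main) can be sketched in Python. This is a minimal illustration under stated assumptions, not the contents of `stage2_bootstrap_template.py`: the function names `add_import_paths` and `run_stage2`, and the choice to insert paths right after `sys.path[0]`, are hypothetical.

```python
import runpy
import sys


def add_import_paths(import_paths, path_list):
    """Insert import_paths into path_list right after entry 0, keeping their order.

    Entry 0 is normally the directory of the script being run, so it stays first.
    Paths that are already present are skipped.
    """
    insert_at = 1 if path_list else 0
    for path in import_paths:
        if path in path_list:
            continue
        path_list.insert(insert_at, path)
        insert_at += 1
    return path_list


def run_stage2(import_paths, main_module, argv):
    """Hypothetical stage 2 entry point.

    import_paths would be embedded into the generated script at build time,
    rather than passed through a (size-limited) PYTHONPATH environment variable.
    """
    add_import_paths(import_paths, sys.path)
    sys.argv = [main_module] + list(argv)
    # Execute main_module with __name__ == "__main__", like `python -m main_module`.
    # alter_sys=True temporarily points sys.argv[0] and sys.modules["__main__"]
    # at the module while it runs, then restores them.
    return runpy.run_module(main_module, run_name="__main__", alter_sys=True)
```

A generated script might end with a call such as `run_stage2(["/hypothetical/runfiles/my_repo"], "my_pkg.main", sys.argv[1:])`. Because the paths live in the script itself, their total length is not bound by environment variable size limits.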
The two-stage bootstrap ignores errors during lcov report generation, which partially addresses #1434.

Fixes #691

* Also fixes some doc cross references.
* Also fixes the autodetecting toolchain and directs our alias to it.
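Running coverage in-process, with lcov report errors tolerated rather than fatal, might look roughly like the following sketch. It uses the coverage.py `Coverage` API (`start`, `stop`, `save`, `lcov_report`); `run_with_coverage` and its `coverage_module` parameter are hypothetical, the latter added only so the sketch can be exercised without coverage.py installed. It is not the commit's actual implementation.

```python
import sys


def run_with_coverage(run_main, lcov_path, coverage_module=None):
    """Hypothetical helper: run run_main() under coverage, then try to write lcov.

    Returns True if the lcov report was written, False otherwise. Errors while
    generating the report are printed to stderr but do not fail the program.
    """
    if coverage_module is None:
        # Third-party coverage.py; in a stage 2 bootstrap this import resolves
        # against the sys.path that stage 2 has already configured.
        import coverage as coverage_module

    cov = coverage_module.Coverage()
    cov.start()
    try:
        run_main()
    finally:
        # Always stop and persist data, even if the program raised.
        cov.stop()
        cov.save()
    try:
        cov.lcov_report(outfile=lcov_path)
    except Exception as exc:  # report generation is best-effort
        print(f"Warning: lcov report generation failed: {exc}", file=sys.stderr)
        return False
    return True
```

Calling the API directly, rather than spawning `coverage run` as a subprocess, means the program and the coverage tracer share one interpreter and one `sys.path`.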
1 parent b4b52fc commit f5b19dc

27 files changed (+1440 / -81 lines)

CHANGELOG.md

+14-1
@@ -1,3 +1,6 @@
1+
:::{default-domain} bzl
2+
:::
3+
14
# rules_python Changelog
25

36
This is a human-friendly changelog in a keepachangelog.com style format.
@@ -31,7 +34,7 @@ A brief description of the categories of changes:
3134
marked as `reproducible` and will not include any lock file entries from now
3235
on.
3336

34-
* (gazelle): Remove gazelle plugin's python deps and make it hermetic.
37+
* (gazelle): Remove gazelle plugin's python deps and make it hermetic.
3538
Introduced a new Go-based helper leveraging tree-sitter for syntax analysis.
3639
Implemented the use of `pypi/stdlib-list` for standard library module verification.
3740

@@ -80,6 +83,16 @@ A brief description of the categories of changes:
8083
invalid usage previously but we were not failing the build. From now on this
8184
is explicitly disallowed.
8285
* (toolchains) Added riscv64 platform definition for python toolchains.
86+
* (rules) A new bootstrap implementation that doesn't require a system Python
87+
is available. It can be enabled by setting
88+
{obj}`--@rules_python//python/config_settings:bootstrap_impl=script`. It
89+
will become the default in a subsequent release.
90+
([#691](https://github.com/bazelbuild/rules_python/issues/691))
91+
* (providers) `PyRuntimeInfo` has two new attributes:
92+
{obj}`PyRuntimeInfo.stage2_bootstrap_template` and
93+
{obj}`PyRuntimeInfo.zip_main_template`.
94+
* (toolchains) A replacement for the Bazel-builtin autodetecting toolchain is
95+
available. The `//python:autodetecting_toolchain` alias now uses it.
8396

8497
[precompile-docs]: /precompiling
8598

CONTRIBUTING.md

+1
@@ -175,6 +175,7 @@ Issues should be triaged as follows:
175175
functionality, should also be filed in this repository but without the
176176
`core-rules` label.
177177

178+
(breaking-changes)=
178179
## Breaking Changes
179180

180181
Breaking changes are generally permitted, but we follow a 3-step process for

docs/sphinx/api/python/config_settings/index.md

+31
@@ -1,3 +1,5 @@
1+
:::{default-domain} bzl
2+
:::
13
:::{bzl:currentfile} //python/config_settings:BUILD.bazel
24
:::
35

@@ -66,3 +68,32 @@ Values:
6668
* `include_pyc`: Include `PyInfo.transitive_pyc_files` as part of the binary.
6769
* `disabled`: Don't include `PyInfo.transitive_pyc_files` as part of the binary.
6870
:::
71+
72+
::::{bzl:flag} bootstrap_impl
73+
Determine how programs implement their startup process.
74+
75+
Values:
76+
* `system_python`: Use a bootstrap that requires a system Python available
77+
in order to start programs. This requires
78+
{obj}`PyRuntimeInfo.bootstrap_template` to be a Python program.
79+
* `script`: Use a bootstrap that uses an arbitrary executable script (usually a
80+
shell script) instead of requiring it to be a Python program.
81+
82+
:::{note}
83+
The `script` bootstrap requires the toolchain to provide the `PyRuntimeInfo`
84+
provider from `rules_python`. This loosely translates to using Bazel 7+ with a
85+
toolchain created by rules_python. Most notably, WORKSPACE builds default to
86+
using a legacy toolchain built into Bazel itself which doesn't support the
87+
script bootstrap. If not available, the `system_python` bootstrap will be used
88+
instead.
89+
:::
90+
91+
:::{seealso}
92+
{obj}`PyRuntimeInfo.bootstrap_template` and
93+
{obj}`PyRuntimeInfo.stage2_bootstrap_template`
94+
:::
95+
96+
:::{versionadded} 0.33.0
97+
:::
98+
99+
::::
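The flag values and the fallback described in the note can be summarized as a small decision function. This is a hypothetical Python model for illustration only: the real selection happens in Starlark (the `bootstrap_template` alias selects on `is_script_bootstrap_enabled`), and `BootstrapImpl`, `BOOTSTRAP_TEMPLATES`, and `effective_bootstrap` are not rules_python APIs.

```python
from enum import Enum


class BootstrapImpl(str, Enum):
    # Values documented for the bootstrap_impl flag.
    SYSTEM_PYTHON = "system_python"
    SCRIPT = "script"


# Simplified: the alias in python/private/BUILD.bazel picks the stage 1 shell
# template when the flag is `script`, else the legacy Python template.
BOOTSTRAP_TEMPLATES = {
    BootstrapImpl.SCRIPT: "stage1_bootstrap_template.sh",
    BootstrapImpl.SYSTEM_PYTHON: "python_bootstrap_template.txt",
}


def effective_bootstrap(flag_value, toolchain_has_rules_python_runtime_info):
    """Hypothetical model of which bootstrap a binary ends up using."""
    impl = BootstrapImpl(flag_value)  # raises ValueError for unknown values
    if impl is BootstrapImpl.SCRIPT and not toolchain_has_rules_python_runtime_info:
        # e.g. the legacy toolchain built into Bazel: fall back, per the docs.
        return BootstrapImpl.SYSTEM_PYTHON
    return impl
```

The key point the model captures is that requesting `script` is not sufficient on its own: the toolchain must also provide rules_python's `PyRuntimeInfo`.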

docs/sphinx/api/python/index.md

+20
@@ -1,3 +1,5 @@
1+
:::{default-domain} bzl
2+
:::
13
:::{bzl:currentfile} //python:BUILD.bazel
24
:::
35

@@ -21,3 +23,21 @@ provides:
2123
* `PyRuntimeInfo`: The consuming target's target toolchain information
2224

2325
:::
26+
27+
::::{target} autodetecting_toolchain
28+
29+
A simple toolchain that uses `python3` from the runtime environment.
30+
31+
Note that this toolchain provides no build-time information, which makes it of
32+
limited utility.
33+
34+
This is only provided to aid migration off the builtin Bazel toolchain
35+
(`@bazel_tools//tools/python:autodetecting_toolchain`), and is largely only applicable
36+
to WORKSPACE builds.
37+
38+
:::{deprecated} unspecified
39+
40+
Switch to using a hermetic toolchain or manual toolchain configuration instead.
41+
:::
42+
43+
::::

docs/sphinx/bazel_inventory.txt

+3-2
@@ -10,7 +10,7 @@ bool bzl:type 1 rules/lib/bool -
1010
int bzl:type 1 rules/lib/int -
1111
depset bzl:type 1 rules/lib/depset -
1212
dict bzl:type 1 rules/lib/dict -
13-
label bzl:doc 1 concepts/labels -
13+
label bzl:type 1 concepts/labels -
1414
attr.bool bzl:type 1 rules/lib/toplevel/attr#bool -
1515
attr.int bzl:type 1 rules/lib/toplevel/attr#int -
1616
attr.label bzl:type 1 rules/lib/toplevel/attr#label -
@@ -21,6 +21,7 @@ list bzl:type 1 rules/lib/list -
2121
python bzl:doc 1 reference/be/python -
2222
str bzl:type 1 rules/lib/string -
2323
struct bzl:type 1 rules/lib/builtins/struct -
24-
target-name bzl:doc 1 concepts/labels#target-names -
24+
Name bzl:type 1 concepts/labels#target-names -
2525
CcInfo bzl:provider 1 rules/lib/providers/CcInfo -
2626
CcInfo.linking_context bzl:provider-field 1 rules/lib/providers/CcInfo#linking_context -
27+
ToolchainInfo bzl:type 1 rules/lib/providers/ToolchainInfo.html -

docs/sphinx/pip.md

+1-1
@@ -150,7 +150,7 @@ ARG=$1 # but we don't do anything with it as it's always "get"
150150
# formatting is optional
151151
echo '{'
152152
echo ' "headers": {'
153-
echo ' "Authorization": ["Basic dGVzdDoxMjPCow=="]
153+
echo ' "Authorization": ["Basic dGVzdDoxMjPCow=="]'
154154
echo ' }'
155155
echo '}'
156156
```

docs/sphinx/support.md

+2-1
@@ -46,7 +46,8 @@ incremental fashion.
4646

4747
Breaking changes are allowed, but follow a process to introduce them over
4848
a series of releases so users can still incrementally upgrade. See the
49-
[Breaking Changes](contributing#breaking-changes) doc for the process.
49+
[Breaking Changes](#breaking-changes) doc for the process.
50+
5051

5152
## Experimental Features
5253

docs/sphinx/toolchains.md

+22-1
@@ -1,3 +1,6 @@
1+
:::{default-domain} bzl
2+
:::
3+
14
# Configuring Python toolchains and runtimes
25

36
This documents how to configure the Python toolchain and runtimes for different
@@ -193,7 +196,7 @@ load("@rules_python//python:repositories.bzl", "py_repositories")
193196
py_repositories()
194197
```
195198

196-
#### Workspace toolchain registration
199+
### Workspace toolchain registration
197200

198201
To register a hermetic Python toolchain rather than rely on a system-installed interpreter for runtime execution, you can add to the `WORKSPACE` file:
199202

@@ -221,3 +224,21 @@ pip_parse(
221224
After registration, your Python targets will use the toolchain's interpreter during execution, but a system-installed interpreter
222225
is still used to 'bootstrap' Python targets (see https://github.com/bazelbuild/rules_python/issues/691).
223226
You may also find some quirks while using this toolchain. Please refer to [python-build-standalone documentation's _Quirks_ section](https://gregoryszorc.com/docs/python-build-standalone/main/quirks.html).
227+
228+
## Autodetecting toolchain
229+
230+
The autodetecting toolchain is a deprecated toolchain that is built into Bazel.
231+
Its name is a bit misleading: it doesn't autodetect anything. All it does is
232+
use `python3` from the environment a binary runs within. This provides extremely
233+
limited functionality to the rules (at build time, nothing is knowable about
234+
the Python runtime).
235+
236+
Bazel itself automatically registers `@bazel_tools//tools/python:autodetecting_toolchain`
237+
as the lowest priority toolchain. For WORKSPACE builds, if no other toolchain
238+
is registered, that toolchain will be used. For bzlmod builds, rules_python
239+
automatically registers a higher-priority toolchain, so the builtin one won't be used unless
240+
there is a toolchain misconfiguration somewhere.
241+
242+
To aid migration off the Bazel-builtin toolchain, rules_python provides
243+
{obj}`@rules_python//python:autodetecting_toolchain`. This is an equivalent
244+
toolchain, but is implemented using rules_python's objects.
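Conceptually, the interpreter lookup the autodetecting toolchain performs at run time amounts to resolving `python3` on `PATH`. The following is a hedged Python sketch using `shutil.which`; the function name is hypothetical, and the actual toolchain's interpreter is a shell wrapper (`autodetecting_toolchain_interpreter.sh`), not Python code.

```python
import shutil


def find_autodetected_python(path=None):
    """Return the full path of the first executable `python3` found, or None.

    path: an os.pathsep-separated search path; defaults to the PATH of the
    current process, i.e. whatever environment the binary happens to run in.
    """
    # Hypothetical helper: models the runtime lookup, not rules_python's code.
    return shutil.which("python3", path=path)
```

Because the answer depends on the `PATH` of whatever machine runs the binary, nothing about the interpreter (such as its version) is knowable at build time, which is why this toolchain is of limited utility.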

examples/bzlmod/test.py

+38-6
@@ -14,6 +14,7 @@
1414

1515
import os
1616
import pathlib
17+
import re
1718
import sys
1819
import unittest
1920

@@ -63,16 +64,47 @@ def test_coverage_sys_path(self):
6364
first_item.endswith("coverage"),
6465
f"Expected the first item in sys.path '{first_item}' to not be related to coverage",
6566
)
67+
68+
# We're trying to make sure that the coverage library added by the
69+
# toolchain is _after_ any user-provided dependencies. This lets users
70+
# override what coverage version they're using.
71+
first_coverage_index = None
72+
last_user_dep_index = None
73+
for i, path in enumerate(sys.path):
74+
if re.search("rules_python.*~pip~", path):
75+
last_user_dep_index = i
76+
if first_coverage_index is None and re.search(
77+
".*rules_python.*~python~.*coverage.*", path
78+
):
79+
first_coverage_index = i
80+
6681
if os.environ.get("COVERAGE_MANIFEST"):
82+
self.assertIsNotNone(
83+
first_coverage_index,
84+
"Expected to find toolchain coverage, but "
85+
+ f"it was not found.\nsys.path:\n{all_paths}",
86+
)
87+
self.assertIsNotNone(
88+
last_user_dep_index,
84+
"Expected to find at least one user dep, "
85+
+ f"but none were found.\nsys.path:\n{all_paths}",
91+
)
6792
# we are running under the 'bazel coverage :test'
68-
self.assertTrue(
69-
"_coverage" in last_item,
70-
f"Expected {last_item} to be related to coverage",
93+
self.assertGreater(
94+
first_coverage_index,
95+
last_user_dep_index,
96+
"Expected coverage provided by the toolchain to be after "
97+
+ "user provided dependencies.\n"
98+
+ f"Found coverage at index: {first_coverage_index}\n"
99+
+ f"Last user dep at index: {last_user_dep_index}\n"
100+
+ f"Full sys.path:\n{all_paths}",
71101
)
72-
self.assertEqual(pathlib.Path(last_item).name, "coverage")
73102
else:
74-
self.assertFalse(
75-
"coverage" in last_item, f"Expected coverage tooling to not be present"
103+
self.assertIsNone(
104+
first_coverage_index,
105+
"Expected toolchain coverage to not be present\n"
106+
+ f"Found coverage at index: {first_coverage_index}\n"
107+
+ f"Full sys.path:\n{all_paths}",
76108
)
77109

78110
def test_main(self):

python/BUILD.bazel

+3-5
@@ -24,6 +24,7 @@ that @rules_python//python is only concerned with the core rules.
2424
"""
2525

2626
load("@bazel_skylib//:bzl_library.bzl", "bzl_library")
27+
load("//python/private:autodetecting_toolchain.bzl", "define_autodetecting_toolchain")
2728
load(":current_py_toolchain.bzl", "current_py_toolchain")
2829

2930
package(default_visibility = ["//visibility:public"])
@@ -318,14 +319,11 @@ toolchain_type(
318319
# safe if you know for a fact that your build is completely compatible with the
319320
# version of the `python` command installed on the target platform.
320321

321-
alias(
322-
name = "autodetecting_toolchain",
323-
actual = "@bazel_tools//tools/python:autodetecting_toolchain",
324-
)
322+
define_autodetecting_toolchain(name = "autodetecting_toolchain")
325323

326324
alias(
327325
name = "autodetecting_toolchain_nonstrict",
328-
actual = "@bazel_tools//tools/python:autodetecting_toolchain_nonstrict",
326+
actual = ":autodetecting_toolchain",
329327
)
330328

331329
# ========= Packaging rules =========

python/config_settings/BUILD.bazel

+9
@@ -1,6 +1,7 @@
11
load("@bazel_skylib//rules:common_settings.bzl", "string_flag")
22
load(
33
"//python/private:flags.bzl",
4+
"BootstrapImplFlag",
45
"PrecompileAddToRunfilesFlag",
56
"PrecompileFlag",
67
"PrecompileSourceRetentionFlag",
@@ -52,3 +53,11 @@ string_flag(
5253
# NOTE: Only public because it's an implicit dependency
5354
visibility = ["//visibility:public"],
5455
)
56+
57+
string_flag(
58+
name = "bootstrap_impl",
59+
build_setting_default = BootstrapImplFlag.SYSTEM_PYTHON,
60+
values = sorted(BootstrapImplFlag.__members__.values()),
61+
# NOTE: Only public because it's an implicit dependency
62+
visibility = ["//visibility:public"],
63+
)

python/private/BUILD.bazel

+45
@@ -376,9 +376,54 @@ exports_files(
376376
visibility = ["//visibility:public"],
377377
)
378378

379+
filegroup(
380+
name = "stage1_bootstrap_template",
381+
srcs = ["stage1_bootstrap_template.sh"],
382+
# Not actually public. Only public because it's an implicit dependency of
383+
# py_runtime.
384+
visibility = ["//visibility:public"],
385+
)
386+
387+
filegroup(
388+
name = "stage2_bootstrap_template",
389+
srcs = ["stage2_bootstrap_template.py"],
390+
# Not actually public. Only public because it's an implicit dependency of
391+
# py_runtime.
392+
visibility = ["//visibility:public"],
393+
)
394+
395+
filegroup(
396+
name = "zip_main_template",
397+
srcs = ["zip_main_template.py"],
398+
# Not actually public. Only public because it's an implicit dependency of
399+
# py_runtime.
400+
visibility = ["//visibility:public"],
401+
)
402+
403+
# NOTE: Windows builds don't use this bootstrap. Instead, a native Windows
404+
# program locates some Python exe and runs `python.exe foo.zip` which
405+
# runs the __main__.py in the zip file.
406+
alias(
407+
name = "bootstrap_template",
408+
actual = select({
409+
":is_script_bootstrap_enabled": "stage1_bootstrap_template.sh",
410+
"//conditions:default": "python_bootstrap_template.txt",
411+
}),
412+
# Not actually public. Only public because it's an implicit dependency of
413+
# py_runtime.
414+
visibility = ["//visibility:public"],
415+
)
416+
379417
# Used to determine the use of `--stamp` in Starlark rules
380418
stamp_build_setting(name = "stamp")
381419

420+
config_setting(
421+
name = "is_script_bootstrap_enabled",
422+
flag_values = {
423+
"//python/config_settings:bootstrap_impl": "script",
424+
},
425+
)
426+
382427
print_toolchains_checksums(name = "print_toolchains_checksums")
383428

384429
# Used for py_console_script_gen rule
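The `stage2_bootstrap_template` filegroup above exposes a template from which each binary's second-stage script is generated, with import paths embedded in the script rather than passed via `PYTHONPATH`. The substitution step might look like this sketch; the template text, placeholder names, and `render_stage2` are hypothetical, since the real template is not shown in this view.

```python
import string

# Hypothetical template: the real stage2_bootstrap_template.py and its
# placeholder syntax are not shown here.
STAGE2_TEMPLATE = string.Template(
    "IMPORT_PATHS = $import_paths\n"
    "MAIN_MODULE = $main_module\n"
)


def render_stage2(import_paths, main_module):
    """Hypothetical: produce stage 2 source with build-time values embedded."""
    # repr() yields valid Python literals, so paths containing quotes,
    # spaces, or backslashes survive the round trip into generated code.
    return STAGE2_TEMPLATE.substitute(
        import_paths=repr(list(import_paths)),
        main_module=repr(main_module),
    )
```

Emitting Python literals via `repr()` rather than concatenating raw strings keeps the generated file syntactically valid regardless of what characters the paths contain.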

python/private/autodetecting_toolchain.bzl

+1-1
@@ -32,7 +32,7 @@ def define_autodetecting_toolchain(name):
3232
# buildifier: disable=native-py
3333
py_runtime(
3434
name = "_autodetecting_py3_runtime",
35-
interpreter = ":py3wrapper.sh",
35+
interpreter = "//python/private:autodetecting_toolchain_interpreter.sh",
3636
python_version = "PY3",
3737
stub_shebang = "#!/usr/bin/env python3",
3838
visibility = ["//visibility:private"],

python/private/common/common.bzl

+3-1
@@ -182,7 +182,7 @@ def create_cc_details_struct(
182182
cc_toolchain = cc_toolchain,
183183
)
184184

185-
def create_executable_result_struct(*, extra_files_to_build, output_groups):
185+
def create_executable_result_struct(*, extra_files_to_build, output_groups, extra_runfiles = None):
186186
"""Creates a `CreateExecutableResult` struct.
187187
188188
This is the return value type of the semantics create_executable function.
@@ -192,13 +192,15 @@ def create_executable_result_struct(*, extra_files_to_build, output_groups):
192192
included as default outputs.
193193
output_groups: dict[str, depset[File]]; additional output groups that
194194
should be returned.
195+
extra_runfiles: A runfiles object of additional runfiles to include.
195196
196197
Returns:
197198
A `CreateExecutableResult` struct.
198199
"""
199200
return struct(
200201
extra_files_to_build = extra_files_to_build,
201202
output_groups = output_groups,
203+
extra_runfiles = extra_runfiles,
202204
)
203205

204206
def union_attrs(*attr_dicts, allow_none = False):
