Enable autodiff testcases in CI #855

Open
2 of 3 tasks
ZuseZ4 opened this issue Mar 27, 2025 · 2 comments
Labels
final-comment-period The FCP has started, most (if not all) team members are in agreement major-change A proposal to make a major change to rustc T-compiler Add this label so rfcbot knows to poll the compiler team to-announce Announce this issue on triage meeting

Comments

@ZuseZ4
Member

ZuseZ4 commented Mar 27, 2025

Proposal

Adding std::autodiff to nightly Rust was originally accepted as an MCP here, and as a lang experiment on 2024-05-01. Autodiff is based on an experimental LLVM plugin called Enzyme.

The implementation of std::autodiff was upstreamed over the last year per tracking issue. We have now had a working autodiff implementation in rust-lang/rust for a few months, and it is gradually being improved. Autodiff is available in nightly if the compiler is built after setting llvm.enzyme = true in bootstrap.toml. By default, the value is false.

In this PR I experimented successfully with setting llvm.enzyme = true in CI, but the final PR merged still had it disabled to give it more time to mature. A few months later, I think it is time to enable it on nightly so that people can experiment with it. I intend to enable autodiff for all Tier 1 targets which are LLVM based, with the exception of MSVC targets, which are currently not supported due to an Enzyme bug.
This would result in autodiff testcases running in CI, since they are currently guarded behind a needs-enzyme configuration, which is true if and only if we build rustc with autodiff enabled.
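
The gating described above can be modelled in a few lines. This is only a sketch, not the actual bootstrap or compiletest code: the function names `enzyme_enabled` and `should_run` are hypothetical, the TOML handling is a simplified line scan rather than a real parser, and the `//@ needs-enzyme` spelling assumes compiletest's usual directive syntax.

```rust
/// Returns true if the `[llvm]` table of a bootstrap.toml-like string
/// contains `enzyme = true`. Simplified line scan (assumption): the real
/// bootstrap uses a proper TOML parser.
fn enzyme_enabled(bootstrap_toml: &str) -> bool {
    let mut in_llvm = false;
    for raw in bootstrap_toml.lines() {
        // Drop trailing comments and surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.starts_with('[') {
            // A new table header starts; remember whether it is `[llvm]`.
            in_llvm = line == "[llvm]";
            continue;
        }
        if in_llvm {
            if let Some((key, value)) = line.split_once('=') {
                if key.trim() == "enzyme" {
                    return value.trim() == "true";
                }
            }
        }
    }
    // The key is absent: the default is false, as stated in the proposal.
    false
}

/// Hypothetical model of the gate: a test carrying the `needs-enzyme`
/// directive runs only when rustc was built with Enzyme; all other tests
/// are unaffected.
fn should_run(test_source: &str, enzyme: bool) -> bool {
    let needs_enzyme = test_source
        .lines()
        .any(|line| line.trim() == "//@ needs-enzyme");
    !needs_enzyme || enzyme
}
```

With a config of `[llvm]` followed by `enzyme = true`, a test that carries the directive is run; with the default config it is skipped rather than failed, which is why flipping the setting is what brings the autodiff tests into CI.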

The way in which changes to the test infra are handled is currently under review. I have attached the template from rust-lang/rust-forge#813. This is likely the first usage of this template, so critique on the form should be directed there.

Copied from rust-lang/rust-forge#813

Ecosystem and Integration Test Job/Component Policy

The ecosystem/integration test job/component ("test job/component") proposed for the
rust-lang/rust CI must:

  • Be approved by the compiler team through a proposed MCP, where the MCP is seconded by a compiler
    team member, and the MCP is accepted with no blocking concerns.
  • Have no blocking concerns from the library team.
  • Have the implementation PR be reviewed and approved by the infrastructure team.
  • Be properly documented on rustc-dev-guide (preferably as part of the implementation PR).

Please complete the sections below so rust-lang/rust teams can have sufficient context about the
proposed test job/component.

Test job/component rationale

What does this test job/component do?

  • If an ecosystem test job/component is being proposed, can you briefly describe the intended
    ecosystem users?

Testing the std::autodiff module.

I wrote documentation on https://enzyme.mit.edu/index.fcgi/rust, and I'd be happy to move over documentation to the rustc-dev-guide. However, some guidance on which docs are desired and where in the dev guide to add them would be appreciated.

Ecosystem users: people working in the field of Machine Learning, High Performance Computing, Scientific Computing (e.g. simulations), Graphics, Quantum Computing, ...

What rust-lang/rust changes can potentially break the test job/component?

E.g. changes to rustc, standard library, bootstrap or tools (like clippy/rustfmt/cargo).

We have autodiff tests in tests/ui, tests/pretty, and tests/codegen. Only the codegen tests actually invoke Enzyme; the ui and pretty tests only exercise our macro frontend, that is, my rustc_builtin_macro expansion, so I think they should only break if someone reworks how such macros work.

The codegen tests are more likely to break. I would therefore suggest that we treat ui and pretty tests normally: if a PR refactors the frontend and they start failing, they should be fixed as part of the PR, and I'm happy to help.

Why does this test job/component need to be part of the rust-lang/rust PR and/or Full Merge CI?

Per the MCP and the lang experiment, it was decided to allow automatic differentiation in nightly. We want to have tests even for nightly features.

If the test job/component will block on failure, why does it need to block?

It would block when someone has changed the autodiff parsing code in a way that changes its expansion output without blessing the new output. In that case it's likely an unintended change that shouldn't be merged.

If the test job/component will not block on failure initially but is intended to eventually become
blocking:

  • Why will it become blocking?
  • When will it become blocking?

I am not convinced that Enzyme in the current state is reliable enough to block Rust CI. I would suggest that a change to this policy goes through a new MCP.

Test job/component maintainers

The proposed test job/component for rust-lang/rust CI must have at least one dedicated test
job/component maintainer. The test job/component maintainers understand that they will be pinged
or otherwise contacted about the custom test job/component, particularly for (but not limited to)
its failures.

Please list who will be maintaining this ecosystem/integration test job/component here. Please
format the github handles in the style:

[@github_handle_1](https://github.com/github_handle_1)
[@github_handle_2](https://github.com/github_handle_2)

NOTE: For future readers, you can paste the usernames without formatting them as links via
ctrl-shift-v.

@ZuseZ4

NOTE: If an ecosystem/integration test job/component no longer has an active dedicated
maintainer (or maintainers), and if rust-lang/rust teams find the ecosystem/integration test
job/component causes significant burden or becomes irrelevant, then the ecosystem/integration test
job/component may be removed.

CI infrastructure considerations

You should ask the Infrastructure Team on the
#t-infra zulip channel when
proposing a new ecosystem/integration test job/component to check if there's capacity for the test
job/component.

  • Does the custom test job/component require substantial CI resources (storage and/or CI time)? In
    particular, will it require large runners?

Discussed in: https://rust-lang.zulipchat.com/#narrow/channel/242791-t-infra/topic/MCP.20-.20Enable.20autodiff.20in.20CI/with/508847008

Features and implementation details

Does the proposed test job/component intend to use any unstable features?

Yes, #![feature(autodiff)].

  • If so, are the unstable features ready for exposure (e.g. must an unstable feature be completely
    reworked)?

Yes, I intend to enable tests and shipping on nightly simultaneously (would something else even be possible?)

  • For ecosystem test jobs/components, are the unstable features ready for such exposure to the
    ecosystem, and are the feature stakeholders ready for such usage?

Yes. I've been working on automatic differentiation for Rust for almost 5 years, and it is time to see some more public testing.

Does the proposed test job/component intend to intentionally depend on any implementation details?
This may include but is not limited to: unstable/internal compiler/tool flags and behaviors,
RUSTC_BOOTSTRAP usages, standard library implementation details, etc.

It depends on Enzyme and thus on LLVM; it would be great to have autodiff for cranelift (and gcc) too. We also alter the compilation pipeline and expose/use the RUSTFLAGS=-Zautodiff=<options> flag.

  • If so, are there plans to shrink or expand such dependencies in the future?

It is desired to replace Enzyme with an upstream autodiff feature directly in LLVM. It is also desired to implement autodiff based on a reflection system when possible.

Failure protocol: what to do if the job/component breaks/fails?

NOTE: If the artifacts of an ecosystem/integration test job/component are not shipped as part of
a distribution component/toolchain, the test job/component may be temporarily disabled to unblock
rust-lang/rust PR CI or Full Merge CI without receiving prior approval from the test
job/component maintainers. The test job/component maintainers will be pinged or otherwise notified
about the test job/component being disabled.

How can the test job/component maintainers be contacted in case of failure? By default, it is
assumed that the test job/component maintainer can be pinged via their GitHub handles.

By pinging me in the PR, or via Zulip or Discord.

(If applicable) If the addition of an ecosystem/integration test job is being proposed:

  • How can the test job be run in CI? If so, is there a try job (try-job: ...) invocation? What's
    the job name?
  • Can the test job be run locally? If so, how?

N/A

(If applicable) If the addition of an ecosystem/integration test component is being proposed:

  • Which existing CI jobs will be building and testing this test component?

All Tier 1 targets building LLVM from source, with the exception of MSVC (due to an Enzyme bug).
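
As a rough illustration of this selection rule, the sketch below filters a list of target triples by excluding MSVC. The `CANDIDATE_TARGETS` list is an illustrative assumption and not an authoritative Tier 1 list, the helper names are hypothetical, and the "builds LLVM from source" condition is not modelled.

```rust
/// Illustrative host triples only (assumption); consult the platform
/// support documentation for the actual Tier 1 list.
const CANDIDATE_TARGETS: &[&str] = &[
    "x86_64-unknown-linux-gnu",
    "aarch64-unknown-linux-gnu",
    "aarch64-apple-darwin",
    "x86_64-pc-windows-gnu",
    "x86_64-pc-windows-msvc",
    "i686-pc-windows-msvc",
];

/// MSVC targets are excluded, as stated in the proposal (Enzyme bug).
fn autodiff_enabled_for(target: &str) -> bool {
    !target.ends_with("-msvc")
}

/// Keeps only the targets on which autodiff would be enabled.
fn enabled_targets<'a>(targets: &[&'a str]) -> Vec<&'a str> {
    targets
        .iter()
        .copied()
        .filter(|t| autodiff_enabled_for(t))
        .collect()
}
```

Calling `enabled_targets(CANDIDATE_TARGETS)` on the list above keeps the Linux, macOS, and windows-gnu entries and drops the two MSVC ones.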

  • Can they be built and run as part of a try job? If so, what are the job names and the try job
    (try-job: ...) invocations?
  • Can the test component be built and run locally? If so, how?

By setting llvm.enzyme = true in bootstrap.toml, building rustc, and running the tests.

How can the test job/component be disabled in the event of spurious failures that are blocking PR
and/or Full Merge CI?

If Enzyme fails to build, e.g. due to an LLVM update (not supposed to happen, but still), then llvm.enzyme can be set to false. This will disable autodiff on nightly for some nightly versions until the build gets fixed. It will also cause autodiff tests to be skipped.

If a PR breaks the test job/component:

  • If the breakage seems spurious and retrying does not resolve the spurious breakage, the test
    job may be temporarily disabled (see below).
  • If the breakage is intentional, how will this be resolved?
  • If the breakage is unintentional, is the PR author expected to fix the breakage?
  • If rustc frontend changes break macro expansion for rustc_builtin_macros, then autodiff should be updated/fixed along with all other rustc_builtin_macros.
  • If changes break an autodiff codegen test, I would not recommend that the PR author try to fix the individual test; I suggest disabling it instead.
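
The triage rule in the last two bullets can be summarised as a small decision table. This is a sketch of the suggested policy only; the `Suite` and `Action` types are hypothetical and do not correspond to anything in rust-lang/rust.

```rust
/// The three test suites that contain autodiff tests, per the proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Suite {
    Ui,
    Pretty,
    Codegen,
}

/// What the proposal suggests a PR author does when an autodiff test fails.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// Frontend (macro expansion) tests: fix or re-bless as part of the PR.
    FixInPr,
    /// Codegen tests invoke Enzyme: disable the individual test instead of
    /// trying to fix it in the PR.
    DisableTest,
}

fn recommended_action(suite: Suite) -> Action {
    match suite {
        Suite::Ui | Suite::Pretty => Action::FixInPr,
        Suite::Codegen => Action::DisableTest,
    }
}
```

The split follows from which tests reach Enzyme: ui and pretty tests only cover macro expansion, so a failure there reflects a frontend change the PR itself made.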

Dependencies, build/test environments and reliability

Does the test job/component involve any custom build systems that are not used in the regular
rust-lang/rust CI jobs?

Enzyme uses cmake, which is already in use in CI jobs for building LLVM.

Does the test job/component depend on external resources (e.g. external servers) that may be
subject to network connectivity?

No. As of today we have an Enzyme fork under rust-lang/Enzyme.

  • If so, does the infrastructure team need to help maintain a mirror of the required assets?

I maintain the Enzyme fork (by following upstream closely).

Are there any potential sources of spurious failures due to the test job/component?

Cloning rust-lang/Enzyme might fail. Enzyme is much smaller than LLVM, so the chance of this happening is probably lower.

Are there any other unusual requirements (build environment, dependencies, etc.)?

We will start building rust-lang/Enzyme in CI.

Mentors or Reviewers

@oli-obk is mentoring autodiff on the compiler side,
@traviscross serves as a lang liaison.
Manuel Drehwald (@ZuseZ4) will continue with the implementation work.

Process

The main points of the Major Change Process are as follows:

  • File an issue describing the proposal.
  • A compiler team member or contributor who is knowledgeable in the area can second by writing @rustbot second.
    • Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a -C flag, then full team check-off is required.
    • Compiler team members can initiate a check-off via @rfcbot fcp merge on either the MCP or the PR.
  • Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.

You can read more about Major Change Proposals on forge.

@ZuseZ4 ZuseZ4 added major-change A proposal to make a major change to rustc T-compiler Add this label so rfcbot knows to poll the compiler team labels Mar 27, 2025
@rustbot
Collaborator

rustbot commented Mar 27, 2025

Important

This issue is not meant to be used for technical discussion. There is a Zulip stream for that.
Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

Concerns or objections can formally be registered here by adding a comment.

@rfcbot concern reason-for-concern
<description of the concern>

Concerns can be lifted with:

@rfcbot resolve reason-for-concern

See documentation at https://forge.rust-lang.org

cc @rust-lang/compiler

@jieyouxu
Member

jieyouxu commented Apr 1, 2025

@rustbot second

@rustbot rustbot added the final-comment-period The FCP has started, most (if not all) team members are in agreement label Apr 1, 2025