Easy script for building Linux wheels on temporary EC2 instances

It's cibuildwheel, but running on whatever Linux EC2 instance you want.

This is particularly useful for building linux-aarch64 wheels, since (at the time of writing) no major CI provider offers native aarch64 support.

How to use

$ python3 -m venv myvenv
$ myvenv/bin/pip install -r requirements.txt
$ aws-vault exec ENV -- myvenv/bin/python go.py REPO TAG
# or if your AWS credentials are already in your environment some other way
$ myvenv/bin/python go.py REPO TAG

This uses the given AWS credentials to spawn an EC2 instance, connects to it over ssh, and runs cibuildwheel there to build the wheels. It then copies the wheels back to your local host and tears down all the AWS infrastructure it created.

REPO can be any git clone URL. If you pass an owner/repo shorthand like explosion/spacy, it is assumed to be a GitHub repo.
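
The shorthand rule can be sketched as below. The helper name resolve_repo_url and the exact "two path components, no scheme or host" test are assumptions for illustration, not necessarily how go.py decides.

```python
import re

# Hypothetical rule: an owner/repo shorthand is exactly two path components
# of GitHub-safe characters, with no scheme, host, or user@ prefix.
_SHORTHAND = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def resolve_repo_url(repo: str) -> str:
    """Turn REPO into something `git clone` accepts (hypothetical helper)."""
    if _SHORTHAND.fullmatch(repo):
        # explosion/spacy -> https://github.com/explosion/spacy
        return f"https://github.com/{repo}"
    # Full URLs (https://..., git@host:owner/repo) pass through unchanged.
    return repo
```

One trade-off of any rule like this: a relative local path such as foo/bar is indistinguishable from a GitHub shorthand.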

TAG can be anything git checkout accepts (a tag, branch, or commit).
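
The remote side of that workflow roughly amounts to a short bash script. The real steps live in remote.sh, which isn't shown here, so the function name build_remote_script, the src checkout directory, and the wheelhouse output directory are assumptions; --platform and --output-dir are real cibuildwheel flags.

```python
import shlex


def build_remote_script(clone_url: str, tag: str, workdir: str = "src") -> str:
    """Hypothetical bash script that checks out TAG and builds wheels."""
    q = shlex.quote  # guard against spaces or shell metacharacters
    return "\n".join([
        "set -euo pipefail",
        f"git clone {q(clone_url)} {q(workdir)}",
        f"cd {q(workdir)}",
        # TAG is whatever `git checkout` accepts: a tag, branch, or commit.
        f"git checkout {q(tag)}",
        # Finished wheels land in wheelhouse/, which the local side would
        # then copy back before tearing the instance down.
        "cibuildwheel --platform linux --output-dir wheelhouse",
    ])
```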

For testing, it runs pytest --pyargs PACKAGE_IMPORT_NAME, so it needs to know your package's import name. It tries to guess this from the repo name; if the guess is wrong, use --package-name to override it.
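
A guess from the repo name might look like the following. The heuristic (last path component, drop .git, hyphens to underscores, lowercase) and the helper names are assumptions; the script's actual guessing logic may differ.

```python
from __future__ import annotations


def guess_package_name(repo: str) -> str:
    """Hypothetical heuristic: derive an import name from REPO."""
    # Works for https://.../owner/repo.git, git@host:owner/repo, owner/repo.
    last = repo.rstrip("/").replace(":", "/").rsplit("/", 1)[-1]
    if last.endswith(".git"):
        last = last[: -len(".git")]
    # Import names can't contain hyphens, though repo names often do.
    return last.replace("-", "_").lower()


def pytest_command(package_name: str) -> list[str]:
    # --pyargs makes pytest treat the argument as an importable package.
    return ["pytest", "--pyargs", package_name]
```

This shows why the override exists: for explosion/cython-blis the heuristic yields cython_blis, but the package actually imports as blis, so you would pass --package-name blis.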

By default, it uses an a1.xlarge instance to build aarch64 wheels. To use a different instance size, or to switch to an x86-64 instance, pass --instance.
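
Since the instance type determines which architecture gets built, here is a rough way to tell the two apart from the type name alone. This naming heuristic is an assumption, not something the script is documented to do; the authoritative source is boto3's EC2 describe_instance_types, whose ProcessorInfo.SupportedArchitectures reports arm64 or x86_64. The returned strings follow cibuildwheel's Linux arch names.

```python
import re


def guess_arch(instance_type: str) -> str:
    """Hypothetical heuristic: map an EC2 instance type to a Linux arch."""
    family = instance_type.split(".", 1)[0]
    # a1 is the first-generation Graviton (ARM) family.
    if family == "a1":
        return "aarch64"
    # Later Graviton families put a "g" right after the generation digit:
    # m6g, c7g, t4g, r6gd, c6gn, g5g, ...
    if re.match(r"[a-z]+\d+g", family):
        return "aarch64"
    return "x86_64"
```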

While this script tries hard to clean up after itself, it also has a failsafe: if the EC2 instance is somehow left running, it automatically self-destructs after 12 hours. So you don't need to worry about accidentally leaving an instance running for months without noticing.
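
The README doesn't say how the failsafe is implemented. One common approach, sketched here as an assumption, is to launch with InstanceInitiatedShutdownBehavior set to terminate and a user-data script that schedules a shutdown. The helper name failsafe_launch_kwargs is hypothetical; the dictionary keys are real parameters of boto3's EC2 run_instances, which base64-encodes UserData itself.

```python
def failsafe_launch_kwargs(ami_id: str, instance_type: str, hours: int = 12) -> dict:
    """Hypothetical kwargs for boto3 EC2 `run_instances` with a self-destruct timer."""
    user_data = "\n".join([
        "#!/bin/bash",
        # Runs once at first boot (as root, via cloud-init on Ubuntu AMIs)
        # and powers the machine off after the deadline, in minutes.
        f"shutdown -h +{hours * 60}",
    ])
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # An OS-level shutdown then terminates the instance instead of
        # leaving it in the "stopped" state.
        "InstanceInitiatedShutdownBehavior": "terminate",
        "UserData": user_data,
    }
```

Usage would be along the lines of boto3.client("ec2").run_instances(**failsafe_launch_kwargs(AMI_ID, "a1.xlarge")), where AMI_ID is a placeholder for whatever Ubuntu image you launch.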

Debugging failed builds

If the build fails, the script pauses until you press Enter. This gives you a chance to log into the EC2 machine and poke around to debug things before it gets torn down.
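
The pause-then-teardown behavior fits a try/except/finally shape. This is a sketch with hypothetical names (run_with_cleanup, build, teardown); the pause callable defaults to input and is injectable so the flow can be exercised without a terminal.

```python
from typing import Callable


def run_with_cleanup(
    build: Callable[[], None],
    teardown: Callable[[], None],
    pause: Callable[[str], object] = input,
) -> bool:
    """Run build(); on failure, wait for the user; always tear down."""
    try:
        build()
        return True
    except Exception as exc:
        # Keep the instance alive so the user can ssh in and investigate.
        pause(f"Build failed ({exc!r}); press Enter to tear down the instance...")
        return False
    finally:
        # Runs after success, after the pause, and on Ctrl-C as well.
        teardown()
```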

To facilitate this, we automatically check your local ssh-agent for any loaded keys and add them to authorized_keys on the EC2 machine. As long as you have a running ssh-agent with your key loaded, you should be able to log in with ssh ubuntu@<remote machine ip>.

If you don't have a running ssh-agent, you can instead pass --add-ssh-pubkey ~/.ssh/id_ed25519.pub to set an authorized key on the remote machine manually.
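
Collecting those keys could look like this. go.py may well use paramiko's agent support instead; this sketch shells out to ssh-add -L, which prints each loaded key as a line already in authorized_keys format. The function names are hypothetical, and --add-ssh-pubkey is modeled as an optional file path.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Optional


def agent_public_keys() -> List[str]:
    """Public keys currently loaded in the local ssh-agent (hypothetical)."""
    try:
        result = subprocess.run(["ssh-add", "-L"], capture_output=True, text=True)
    except FileNotFoundError:  # ssh-add is not installed
        return []
    # Non-zero exit: no agent is running, or it holds no identities.
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]


def authorized_keys_text(agent_keys: Iterable[str], extra_pubkey: Optional[Path] = None) -> str:
    """Merge agent keys with an optional .pub file, dropping blanks and duplicates."""
    lines = list(agent_keys)
    if extra_pubkey is not None:
        lines.extend(extra_pubkey.read_text().splitlines())
    seen, out = set(), []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#") and line not in seen:
            seen.add(line)
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```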

Limitations

  • Currently the cibuildwheel configuration is hardcoded to match how Explosion has traditionally built wheels. (See the CIBW_* settings in remote.sh.) The tool would be more generally useful if we got rid of these (e.g. by moving them into the individual projects' pyproject.toml) and just passed through any CIBW_* settings from the local environment, so that this tool had the same features and interface as cibuildwheel proper.

    One of those hard-coded settings in particular unconditionally installs the Rust toolchain. That isn't as elegant as wheelwright's conditional installation, but it is simpler and only adds a few seconds to the build time.

  • Currently, if you pass an owner/repo-style GitHub shorthand on the command line, we run git clone https://github.com/owner/repo. This works fine if the repo is public, but fails for a private repo because there's no way to authenticate. And if you explicitly pass an ssh URL like git@github.com:owner/repo, it fails because the EC2 machine doesn't have access to any credentials.

    Maybe we don't care, since all the repos we care about are public anyway? But if we do, paramiko has some support for ssh-agent forwarding, so supporting private repos might be fairly easy.
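
The pass-through idea from the first limitation could start as small as this. It describes a proposed change, not current behavior, and the helper names cibw_env and export_lines are hypothetical; CIBW_BUILD and CIBW_TEST_COMMAND are real cibuildwheel options. The rendered export lines could be prepended to the script that runs on the instance.

```python
import os
import shlex
from typing import Dict, Mapping, Optional


def cibw_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect every CIBW_* variable from the local environment."""
    env = os.environ if environ is None else environ
    return {k: v for k, v in env.items() if k.startswith("CIBW_")}


def export_lines(settings: Mapping[str, str]) -> str:
    """Render settings as shell `export` lines, quoted for the remote shell."""
    return "\n".join(
        f"export {k}={shlex.quote(v)}" for k, v in sorted(settings.items())
    )
```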

Comparison to wheelwright

The main differences are:

  • This can only build Linux wheels, while wheelwright also supports macOS and Windows.
  • This supports any EC2 instance type, so you can get native ARM64 builds.
  • This uses cibuildwheel, while wheelwright uses multibuild. One consequence is that by default, this builds a wider variety of Linux wheels (not just manylinux, but also musllinux, PyPy wheels, etc.)

General observation: wheelwright was designed for a world where you needed multiple CI providers to get access to a wide range of platforms, mature and widely used tools like cibuildwheel didn't exist yet, and the only convenient way to collect CI artifacts was through GitHub Releases, for some reason. It all works, but the complexity may no longer be justified now that you can get most of the same results with something like this.