Releases: huggingface/kernels

v0.10.4

16 Oct 18:24

What's Changed

Full Changelog: v0.10.3...v0.10.4

v0.10.3

13 Oct 15:26

New features

kernels check

This release adds a check subcommand to the kernels command. This subcommand checks the ABI compatibility of a kernel on the Hub. For example:

$ kernels check kernels-community/activation
Checking variant: torch29-cxx11-cu130-x86_64-linux
  Dynamic library activation/_activation_beeaae6.abi3.so:
    🐍 Python ABI 3.9 compatible
    🐧 manylinux_2_28 compatible
[...]
Checking variant: torch29-cxx11-cu126-x86_64-linux
  Dynamic library activation/_activation_beeaae6.abi3.so:
    🐍 Python ABI 3.9 compatible
    🐧 manylinux_2_28 compatible

Upload a kernel to a branch

kernels upload now has a --branch option for uploading a kernel to a branch of the repository.
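A minimal invocation might look like the following (the repository and branch names are placeholders):

```shell
$ kernels upload . --repo_id="username/kernelname" --branch="my-feature"
```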

What's Changed

Full Changelog: v0.10.2...v0.10.3

v0.10.2

22 Sep 18:16

New features

XPU support

This release adds full support for Intel XPU devices, including kernel layers. XPU variants use the form: torch<torch-version>-cxx<C++-ABI>-xpu<OneAPI-version>-x86_64-linux.
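The variant naming scheme can be illustrated with a small parser. This is a sketch only, not part of the library, and the version numbers in the example string are made up:

```python
import re

# Matches variant names of the form described above:
# torch<torch-version>-cxx<C++-ABI>-xpu<OneAPI-version>-x86_64-linux
VARIANT_RE = re.compile(
    r"torch(?P<torch>\d+)-cxx(?P<abi>\d+)-xpu(?P<oneapi>\d+)-x86_64-linux"
)

# Hypothetical variant name; the version components are examples only.
m = VARIANT_RE.fullmatch("torch29-cxx11-xpu20250-x86_64-linux")
print(m.group("torch"), m.group("abi"), m.group("oneapi"))  # 29 11 20250
```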

kernel upload utility

Upload kernels to the Hub in a single command. For example, to upload the kernel in the current directory:

$ kernels upload . --repo_id="username/kernelname"

The repository will also be created (publicly) if it does not exist yet. For more information, see the documentation.

What's Changed

New Contributors

Full Changelog: v0.10.1...v0.10.2

v0.10.1

10 Sep 07:43

What's Changed

Full Changelog: v0.10.0...v0.10.1

v0.10.0

05 Sep 08:54

New features

Before this release, get_local_kernel only worked with the top-level kernel directory (the one containing build). The function now accepts the top-level directory (mykernel), the build directory (mykernel/build), or a build variant directory (mykernel/build/torch28-cxx11-cu128-x86_64-linux).
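The accepted directory layouts amount to a small normalization step, which can be sketched as follows (an illustration only, assuming a hypothetical variant name; this is not the library's actual code):

```python
from pathlib import Path

def resolve_variant_dir(path, variant="torch28-cxx11-cu128-x86_64-linux"):
    """Normalize any of the three accepted directories to the variant dir."""
    path = Path(path)
    if path.name == variant:         # mykernel/build/<variant>
        return path
    if path.name == "build":         # mykernel/build
        return path / variant
    return path / "build" / variant  # mykernel (top-level)

# All three inputs resolve to the same variant directory.
for p in ("mykernel", "mykernel/build",
          "mykernel/build/torch28-cxx11-cu128-x86_64-linux"):
    print(resolve_variant_dir(p).as_posix())
```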

Breaking API changes

The default for the mode argument of kernelize has been removed.

Before this change, the default mode was Mode.TRAINING | Mode.TORCH_COMPILE. This default had the benefit that kernelize would use kernels that support all use cases. However, it would skip e.g. inference-only kernels, which degrades performance when the user forgets to set mode when running inference.

What's Changed

  • Small markup fixes of the local kernel repo example by @danieldk in #127
  • feat: improve get local kernel importing by @drbh in #129
  • fix: add get local tests by @drbh in #134
  • cpu is not (yet) a supported device type by @danieldk in #132
  • Remove default for mode argument of kernelize by @danieldk in #136
  • Set version to v0.10.0.dev0 by @danieldk in #137

v0.9.0

01 Aug 14:46

New features

Initial ROCm support

This release adds the rocm device type. For instance, to register a kernel that supports both CUDA and ROCm, you can use:

kernel_layer_mapping = {
    "SiluAndMul": {
        "cuda": LayerRepository(
            repo_id="kernels-community/activation",
            layer_name="SiluAndMul",
        ),
        "rocm": LayerRepository(
            repo_id="kernels-community/activation",
            layer_name="SiluAndMul",
        )
    }
}

register_kernel_mapping(kernel_layer_mapping)

Support for loading local kernel layers

For development and debugging, it is often useful to load kernel layers from a local directory. This is supported by the new LocalLayerRepository class; you can use the output of kernel-builder directly. For example:

kernel_layer_mapping = {
    "SiluAndMul": {
        "cuda": LocalLayerRepository(
            repo_path="/home/daniel/kernels/activation",
            package_name="activation",
            layer_name="SiluAndMul",
        )
    }
}

register_kernel_mapping(kernel_layer_mapping)

What's Changed

New Contributors

Full Changelog: v0.8.1...v0.9.0

v0.8.1

23 Jul 12:47

New features

Kernel version bounds

get_kernel adds a version argument, which you can use to fetch the latest version compatible with the given version specifier:

activation = kernels.get_kernel("kernels-community/activation", version=">=0.0.3,<0.1")
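As a rough sketch of how the newest compatible version is selected, consider the following (an illustration with hypothetical tags and a hand-rolled comparison, not the library's resolver):

```python
# Pick the newest tag satisfying ">=0.0.3,<0.1".

def parse(version):
    """Parse a dotted version string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def satisfies(version, lower="0.0.3", upper="0.1.0"):
    """True if lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)

# Hypothetical version tags on a kernel repository.
tags = ["0.0.2", "0.0.3", "0.0.7", "0.1.0"]
compatible = [tag for tag in tags if satisfies(tag)]
latest = max(compatible, key=parse)
print(latest)  # 0.0.7
```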

Version bounds are now also supported when registering layers:

kernel_layer_mapping = {
    "SiluAndMul": {
        "cuda": LayerRepository(
            repo_id="kernels-community/activation",
            version=">=0.0.3,<0.1",
            layer_name="SiluAndMul",
        )
    }
}

Kernel layer locking

Layers can now also use version locks from kernels.lock by using LockedLayerRepository:

kernel_layer_mapping = {
    "SiluAndMul": {
        "cuda": LockedLayerRepository(
            repo_id="kernels-community/activation",
            layer_name="SiluAndMul",
        )
    }
}

See the kernel locking documentation for more information.

What's Changed

New Contributors

Full Changelog: v0.8.0...v0.8.1

v0.8.0

15 Jul 16:47

New features

Kernel mode fallbacks

Before this release, kernelize with a given mode would only look up the exact match in the kernel mapping. Starting with this release, kernelize will fall back to other compatible modes. For instance, when a model is kernelized as

model = kernelize(model, mode=Mode.INFERENCE)

kernelize will try the following kernel mappings (in order):

  • Mode.INFERENCE
  • Mode.INFERENCE | Mode.TORCH_COMPILE
  • Mode.TRAINING
  • Mode.TRAINING | Mode.TORCH_COMPILE
  • Mode.FALLBACK

since all these modes are compatible with inference. See the kernel modes documentation for more information and a per-mode list of possible fallbacks.
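The fallback order above can be sketched with a small Flag enum. This is a self-contained illustration of the semantics for the Mode.INFERENCE case, not the library's implementation:

```python
from enum import Flag, auto

class Mode(Flag):
    INFERENCE = auto()
    TRAINING = auto()
    TORCH_COMPILE = auto()
    FALLBACK = auto()

# Fallback preference for Mode.INFERENCE, as listed above.
FALLBACKS = {
    Mode.INFERENCE: [
        Mode.INFERENCE,
        Mode.INFERENCE | Mode.TORCH_COMPILE,
        Mode.TRAINING,
        Mode.TRAINING | Mode.TORCH_COMPILE,
        Mode.FALLBACK,
    ],
}

def resolve(requested, registered):
    """Return the first registered mode compatible with `requested`."""
    for candidate in FALLBACKS[requested]:
        if candidate in registered:
            return candidate
    return None

# A mapping that only registered a training kernel still serves inference:
print(resolve(Mode.INFERENCE, {Mode.TRAINING | Mode.TORCH_COMPILE}))
```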

Support for registering kernels by compute capability

It is now possible to register multiple CUDA kernels for different compute capabilities. This lets you provide different kernels for, e.g., Ada, Hopper, and Blackwell GPUs. See the docs on Registering kernels for specific CUDA capabilities for more information.

API-breaking changes

Mode.DEFAULT has been renamed to Mode.FALLBACK for clarity.

What's Changed

Full Changelog: v0.7.0...v0.8.0

v0.7.0

07 Jul 13:11

API changes

This version contains an API change to the kernelize function that makes it possible to use different kernels for inference, training, and torch.compile. This requires a small adjustment to how kernelize is called; see the kernelize documentation for more information. In short, to kernelize a model for inference, use:

model = MyModel(...)
model = kernelize(model, mode=Mode.INFERENCE)

For training:

model = MyModel(...)
model = kernelize(model, mode=Mode.TRAINING)

What's Changed

Full Changelog: v0.6.2...v0.7.0

v0.6.2

25 Jun 08:10

What's Changed

Full Changelog: v0.6.1...v0.6.2