Releases: huggingface/kernels
v0.10.4
What's Changed
- start to build xpu kernels from torch 2.7 by @sywangyi in #165
- fix: kernels upload to a repo branch by @sayakpaul in #168
- feat: allow get_kernel to log telemetry. by @sayakpaul in #167
- Set version to 0.10.4.dev0 by @danieldk in #169
Full Changelog: v0.10.3...v0.10.4
v0.10.3
New features
kernels check
This release adds a check subcommand to the kernels command. This subcommand can be used to check the ABI compatibility of a kernel on the Hub. For example:
```
$ kernels check kernels-community/activation
Checking variant: torch29-cxx11-cu130-x86_64-linux
Dynamic library activation/_activation_beeaae6.abi3.so:
  🐍 Python ABI 3.9 compatible
  🐧 manylinux_2_28 compatible
[...]
Checking variant: torch29-cxx11-cu126-x86_64-linux
Dynamic library activation/_activation_beeaae6.abi3.so:
  🐍 Python ABI 3.9 compatible
  🐧 manylinux_2_28 compatible
```
Upload a kernel to a branch
kernels upload now has an additional --branch option to upload a kernel to a branch.
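A hypothetical invocation (the branch name here is made up; `--repo_id` follows the same form as the upload example under v0.10.2):

```shell
$ kernels upload . --repo_id="username/kernelname" --branch="my-feature-branch"
```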
What's Changed
- Add support for NPU kernelize/layers by @zheliuyu in #155
- Only run staging tests in one configuration by @danieldk in #156
- Add a `Makefile` to run formatting one-shot by @sayakpaul in #157
- Add the `kernels check` subcommand by @danieldk in #158
- Add a note on `torch.compile` by @sayakpaul in #159
- Link local kernel and local/locked kernel API docs by @danieldk in #160
- Bump torch version in runner by @MekkCyber in #162
- feat: allow kernels to be uploaded to a revision by @sayakpaul in #161
- Set version to 0.10.3.dev0 by @danieldk in #164
Full Changelog: v0.10.2...v0.10.3
v0.10.2
New Features
XPU support
This release adds full support for Intel XPU devices, including kernel layers. XPU variants use the form: torch<torch-version>-cxx<C++-ABI>-xpu<OneAPI-version>-x86_64-linux.
kernel upload utility
Upload kernels to the Hub in a single command. For example, to upload the kernel in the current directory:
```
$ kernels upload . --repo_id="username/kernelname"
```
The repository will also be created (publicly) if it does not exist yet. For more information, see the documentation.
What's Changed
- Add support for XPU layer repositories by @danieldk in #142
- [feat] add an uploading utility. by @sayakpaul in #138
- Improve errors for layer validation by @danieldk in #145
- Describe the `get_kernel`/`LayerRepository` version argument by @danieldk in #147
- Removing unexisting link in README by @MekkCyber in #148
- Fix some spelling errors to check docs CI is working by @danieldk in #120
- Document the `to-wheel` subcommand by @danieldk in #149
- Bump huggingface_hub upper bound <2.0 by @Wauplin in #151
- faq: why only replace `forward` methods? by @danieldk in #153
- [tests] turn the `kernels upload` tests to be staging tests by @sayakpaul in #152
- Set version to 0.10.2.dev0 by @danieldk in #154
New Contributors
- @sayakpaul made their first contribution in #138
- @Wauplin made their first contribution in #151
Full Changelog: v0.10.1...v0.10.2
v0.10.1
v0.10.0
New features
Before this release, `get_local_kernel` only worked with the top-level kernel directory (the one containing `build`). The function now accepts the top-level directory (`mykernel`), the build directory (`mykernel/build`), and a build variant directory (`mykernel/build/torch28-cxx11-cu128-x86_64-linux`).
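A rough sketch of how the three accepted layouts can be told apart (illustrative only — this is not the library's actual detection logic, and the helper name is made up):

```python
from pathlib import Path

def classify_kernel_path(path: str) -> str:
    """Guess which accepted layout `path` points at (illustrative helper)."""
    p = Path(path)
    if (p / "build").is_dir():
        return "top-level"   # mykernel/
    if p.name == "build":
        return "build"       # mykernel/build/
    if p.parent.name == "build":
        return "variant"     # mykernel/build/torch28-cxx11-cu128-x86_64-linux/
    raise ValueError(f"unrecognized kernel directory layout: {path}")
```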
Breaking API changes
The default for the mode argument of kernelize is removed.
Before this change, the default mode was Mode.TRAIN | Mode.COMPILE. This had the benefit that by default, kernelize would use kernels that support all use cases. However, it would skip e.g. inference-only kernels, which degrades performance when the user forgets to set mode when using kernels for inference.
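The mode flags compose bitwise; a stdlib `enum.Flag` sketch (not the library's actual `Mode` class — member names are taken from these notes) illustrates why the old default skipped inference-only kernels:

```python
from enum import Flag, auto

class Mode(Flag):
    # Illustrative stand-in for kernels' Mode, not the real definition.
    INFERENCE = auto()
    TRAIN = auto()
    COMPILE = auto()

# The old implicit default combined training and compile support:
old_default = Mode.TRAIN | Mode.COMPILE

# A kernel registered for the default covers training...
assert Mode.TRAIN in old_default
# ...but an inference-only kernel never matched it, so it was skipped:
assert Mode.INFERENCE not in old_default
```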
What's Changed
- Small markup fixes of the local kernel repo example by @danieldk in #127
- feat: improve get local kernel importing by @drbh in #129
- fix: add get local tests by @drbh in #134
- `cpu` is not (yet) a supported device type by @danieldk in #132
- Remove default for `mode` argument of `kernelize` by @danieldk in #136
- Set version to v0.10.0.dev0 by @danieldk in #137
v0.9.0
New features
Initial ROCm support
This release adds the rocm device type. For instance, to register a kernel that supports both CUDA and ROCm, you can use:

```python
kernel_layer_mapping = {
    "SiluAndMul": {
        "cuda": LayerRepository(
            repo_id="kernels-community/activation",
            layer_name="SiluAndMul",
        ),
        "rocm": LayerRepository(
            repo_id="kernels-community/activation",
            layer_name="SiluAndMul",
        ),
    }
}
register_kernel_mapping(kernel_layer_mapping)
```
Support for loading local kernel layers
For development and debugging it can often be useful to load kernel layers from a local directory. This is supported by the new LocalLayerRepository class. You can directly use the output of kernel-builder. For example:
```python
kernel_layer_mapping = {
    "SiluAndMul": {
        "cuda": LocalLayerRepository(
            repo_path="/home/daniel/kernels/activation",
            package_name="activation",
            layer_name="SiluAndMul",
        )
    }
}
register_kernel_mapping(kernel_layer_mapping)
```
What's Changed
- Fix typo in layers documentation by @shadeMe in #116
- Update documentation for compatibility with doc-builder by @danieldk in #117
- Test examples in docstrings using mktestdocs by @danieldk in #118
- Add doc build to CI by @danieldk in #119
- Log when using fallback layer by @danieldk in #121
- Add `LocalLayerRepository` to load from a local repo by @danieldk in #123
- Run black check by @danieldk in #124
- Nix: go back to hf-nix main by @danieldk in #125
- Add ROCm device discovery by @ahadnagy in #122
- Set version to 0.9.0.dev0 by @danieldk in #126
Full Changelog: v0.8.1...v0.9.0
v0.8.1
New features
Kernel version bounds
get_kernel adds a version argument, which you can use to fetch the latest version compatible with the given version specifier:
```python
activation = kernels.get_kernel("kernels-community/activation", version=">=0.0.3,<0.1")
```
Version bounds are now also supported when registering layers:
```python
kernel_layer_mapping = {
    "SiluAndMul": {
        "cuda": LayerRepository(
            repo_id="kernels-community/activation",
            version=">=0.0.3,<0.1",
            layer_name="SiluAndMul",
        )
    }
}
```
Kernel layer locking
Layers can now also use version locks from kernels.lock by using LockedLayerRepository:
```python
kernel_layer_mapping = {
    "SiluAndMul": {
        "cuda": LockedLayerRepository(
            repo_id="kernels-community/activation",
            layer_name="SiluAndMul",
        )
    }
}
```
See the kernel locking documentation for more information.
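The version bounds above are Python-style specifiers; assuming they follow the usual PEP 440 grammar, the third-party `packaging` library (used here purely for illustration) shows how such a bound selects the latest compatible version:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet(">=0.0.3,<0.1")
available = [Version(v) for v in ["0.0.2", "0.0.3", "0.0.4", "0.1.0"]]

# Keep only versions matching the bound, then take the newest one.
compatible = [v for v in available if v in spec]
best = max(compatible)
print(best)  # 0.0.4
```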
What's Changed
- `get_kernel`: allow Python-style version specifiers by @danieldk in #111
- triton based kernel could also run in xpu by @sywangyi in #112
- Add version support to `LayerRepository` by @danieldk in #113
- Add support for project-wide locking of layers by @danieldk in #114
- Set version to 0.8.1.dev0 by @danieldk in #115
Full Changelog: v0.8.0...v0.8.1
v0.8.0
New features
Kernel mode fallbacks
Before this release, when using kernelize with a mode, it would only look up the exact match in the kernel mapping. Starting with this release, kernelize will fall back to other compatible modes. For instance, when a model is kernelized as
```python
model = kernelize(model, mode=Mode.INFERENCE)
```
kernelize will try the following kernel mappings, in order:

1. `Mode.INFERENCE`
2. `Mode.INFERENCE | Mode.TORCH_COMPILE`
3. `Mode.TRAINING`
4. `Mode.TRAINING | Mode.TORCH_COMPILE`
5. `Mode.FALLBACK`

All these modes are compatible with inference. See the kernel modes documentation for more information and a list per mode of the possible fallbacks.
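A minimal sketch of this lookup order (illustrative only — the real mapping and selection live inside kernelize, and the `Mode` class below is a stand-in):

```python
from enum import Flag, auto

class Mode(Flag):
    # Illustrative stand-in for kernels' Mode, not the real definition.
    INFERENCE = auto()
    TRAINING = auto()
    TORCH_COMPILE = auto()
    FALLBACK = auto()

# The fallback chain tried for Mode.INFERENCE:
INFERENCE_CHAIN = [
    Mode.INFERENCE,
    Mode.INFERENCE | Mode.TORCH_COMPILE,
    Mode.TRAINING,
    Mode.TRAINING | Mode.TORCH_COMPILE,
    Mode.FALLBACK,
]

def select_kernel(mapping, chain):
    """Return the first kernel registered along the fallback chain."""
    for mode in chain:
        if mode in mapping:
            return mapping[mode]
    return None

# Only a training kernel is registered; inference still finds it:
mapping = {Mode.TRAINING: "training_kernel"}
assert select_kernel(mapping, INFERENCE_CHAIN) == "training_kernel"
```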
Support for registering kernels by compute capability
It is now possible to register multiple CUDA kernels with different capabilities. This will allow you to provide e.g. different kernels for Ada, Hopper, and Blackwell GPUs. See the docs on Registering kernels for specific CUDA capabilities for more information.
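A hypothetical sketch of capability-interval matching (the function, repo names, and data layout are invented for illustration; the actual registration API is described in the linked docs):

```python
def pick_repo(registrations, capability):
    """Pick the first repo whose (min, max) capability interval matches."""
    for (lo, hi), repo in registrations:
        if lo <= capability <= hi:
            return repo
    return None

# Hypothetical registrations keyed by (major, minor) compute capability:
registrations = [
    (((8, 9), (8, 9)), "example/ada-kernel"),         # Ada: SM 8.9
    (((9, 0), (9, 0)), "example/hopper-kernel"),      # Hopper: SM 9.0
    (((10, 0), (12, 0)), "example/blackwell-kernel"), # Blackwell range
]

assert pick_repo(registrations, (9, 0)) == "example/hopper-kernel"
```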
API-breaking changes
Mode.DEFAULT has been renamed to Mode.FALLBACK for clarity.
What's Changed
- Fix macOS tests by marking some CUDA-only tests by @danieldk in #105
- Support registering layers with a range of CUDA capabilities by @danieldk in #106
- Improve mode handling by @danieldk in #108
- Log kernel layer selection by @danieldk in #109
- Set version to 0.8.0.dev0 by @danieldk in #110
Full Changelog: v0.7.0...v0.8.0
v0.7.0
API changes
This version contains an API change to the kernelize function that makes it possible to use different kernels for inference/training/torch.compile. This requires a small adjustment to how kernelize is called, see the kernelize documentation for more information. In short, to kernelize a model for inference, use:
```python
model = MyModel(...)
model = kernelize(model, mode=Mode.INFERENCE)
```
For training:

```python
model = MyModel(...)
model = kernelize(model, mode=Mode.TRAINING)
```
What's Changed
- Add `get_local_kernel` function by @danieldk in #102
- Support registering inference/training-specific layers by @danieldk in #103
- Set version to 0.7.0.dev0 by @danieldk in #104
Full Changelog: v0.6.2...v0.7.0