|
| 1 | +- Feature Name: `simd_ffi` |
| 2 | +- Start Date: 2018-10-12 |
| 3 | +- RFC PR: [rust-lang/rfcs#2574](https://github.com/rust-lang/rfcs/pull/2574) |
| 4 | +- Rust Issue: [rust-lang/rust#63068](https://github.com/rust-lang/rust/issues/63068) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +This RFC allows using SIMD types in C FFI. |
| 10 | + |
| 11 | +# Motivation |
| 12 | +[motivation]: #motivation |
| 13 | + |
| 14 | +The architecture-specific SIMD types provided in [`core::arch`] cannot currently |
| 15 | +be used in C FFI. That is, Rust programs cannot interface with C libraries that |
| 16 | +use these in their APIs. |
| 17 | + |
| 18 | +One notable example would be calling into vectorized [`libm`] implementations |
| 19 | +like [`sleef`], [`libmvec`], or Intel's [`SVML`]. The [`packed_simd`] crate |
| 20 | +relies on C FFI with these fundamental libraries to offer competitive |
| 21 | +performance. |
| 22 | + |
| 23 | +[`core::arch`]: https://doc.rust-lang.org/stable/core/arch/index.html |
| 24 | +[`libm`]: https://sourceware.org/glibc/wiki/libm |
| 25 | +[`sleef`]: https://sleef.org/ |
| 26 | +[`libmvec`]: https://sourceware.org/glibc/wiki/libm |
| 27 | +[`SVML`]: https://software.intel.com/en-us/node/524289 |
| 28 | +[`packed_simd`]: https://github.com/rust-lang-nursery/packed_simd |
| 29 | + |
| 30 | +## Why is using SIMD vectors in C FFI currently disallowed? |
| 31 | + |
| 32 | +Consider the following example |
| 33 | +([playground](https://play.rust-lang.org/?gist=b8cfb63bb4e7fb00bb293f6e27061c52&version=nightly&mode=debug&edition=2015)): |
| 34 | + |
| 35 | +```rust |
| 36 | +extern "C" fn foo(x: __m256); |
| 37 | + |
| 38 | +fn main() { |
| 39 | + unsafe { |
| 40 | + union U { v: __m256, a: [u64; 4] } |
| 41 | + foo(U { a: [0; 4] }.v); |
| 42 | + } |
| 43 | +} |
| 44 | +``` |
| 45 | + |
| 46 | +In this example, a 256-bit wide vector type, `__m256`, is passed to an `extern |
| 47 | +"C"` function via C FFI. Is the behavior of passing `__m256` to the C function |
| 48 | +defined? |
| 49 | + |
| 50 | +That depends on both the platform and how the Rust program was compiled! |
| 51 | + |
| 52 | +First, let's make the platform concrete and assume that it follows the [x64 SysV |
| 53 | +ABI][sysv_abi] which states: |
| 54 | + |
| 55 | +> **3.2.1 Registers and the Stack Frame** |
| 56 | +> |
| 57 | +> Intel AVX (Advanced Vector Extensions) provides 16 256-bit wide AVX registers |
| 58 | +> (`%ymm0` - `%ymm15`). The lower 128-bits of `%ymm0` - `%ymm15` are aliased to |
| 59 | +> the respective 128b-bit SSE registers (`%xmm0` - `%xmm15`). For purposes of |
| 60 | +> parameter passing and function return, `%xmmN` and `%ymmN` refer to the same |
| 61 | +> register. Only one of them can be used at the same time. |
| 62 | +> |
| 63 | +> **3.2.3 Parameter Passing** |
| 64 | +> |
| 65 | +> **SSE** The class consists of types that fit into a vector register. |
| 66 | +> |
| 67 | +> **SSEUP** The class consists of types that fit into a vector register and can |
| 68 | +> be passed and returned in the upper bytes of it. |
| 69 | +
|
| 70 | +[sysv_abi]: https://www.uclibc.org/docs/psABI-x86_64.pdf |
| 71 | + |
| 72 | +Second, in `C`, the `__m256` type is only available if the current translation |
| 73 | +unit is being compiled with `AVX` enabled. |
| 74 | + |
| 75 | +Back to the example: `__m256` is a 256-bit wide vector type, that is, wider than |
| 76 | +128-bit, but it can be passed through a vector register using the lower and |
| 77 | +upper 128-bits of a 256-bit wide register, and in C, if `__m256` can be used, |
| 78 | +these registers are always available. |
| 79 | + |
| 80 | +That is, the C ABI requires two things: |
| 81 | + |
| 82 | +* that Rust passes `__m256` via a 256-bit wide register |
| 83 | +* that `foo` has the `#[target_feature(enable = "avx")]` attribute ! |
| 84 | + |
| 85 | +And this is where things went wrong: in Rust, `__m256` is always available |
| 86 | +independently of whether `AVX` is available or not<sup>[1](#layout_unspecified)</sup>, |
| 87 | +but we haven't specified how we are actually compiling our Rust program above: |
| 88 | + |
| 89 | +* if we compile it with `AVX` globally enabled, e.g., via `-C |
| 90 | + target-feature=+avx`, then the behavior of calling `foo` is defined because |
| 91 | + `__m256` will be passed to C in a single 256-bit wide register, which is what |
| 92 | + the C ABI requires. |
| 93 | + |
| 94 | +* if we compile our program without `AVX` enabled, then the Rust program cannot |
| 95 | + use 256-bit wide registers because they are not available, so independently of |
| 96 | + how `__m256` will be passed to C, it won't be passed in a 256-bit wide |
| 97 | + register, and the behavior is undefined because of an ABI mismatch. |
| 98 | + |
| 99 | +<a name="layout_unspecified">1</a>: its layout is currently unspecified but that |
| 100 | +is not relevant for this issue - what matters is that 256-bit registers are not |
| 101 | +available and therefore they cannot be used. |
| 102 | + |
| 103 | +You might be wondering: why is `__m256` available even if `AVX` is not |
| 104 | +available? The reason is that we want to use `__m256` in some parts of |
| 105 | +Rust's programs even if `AVX` is not globally enabled, and currently we don't |
| 106 | +have great infrastructure for conditionally allowing it in some parts of the |
| 107 | +program and not others. |
| 108 | + |
| 109 | +Ideally, one should only be able to use `__m256` and operations on it if `AVX` |
| 110 | +is available, and this is exactly what this RFC proposes for using vector types |
| 111 | +in C FFI: to always require `#[target_feature(enable = X)]` in C FFI functions |
| 112 | +using SIMD types, where "unblocking" the use of each type requires some |
| 113 | +particular feature to be enabled, e.g., `avx` or `avx2` in the case of `__m256`. |
| 114 | + |
| 115 | +That is, the compiler would reject the example above with an error: |
| 116 | + |
| 117 | +``` |
| 118 | +error[E1337]: `__m256` on C FFI requires `#[target_feature(enable = "avx")]` |
| 119 | + --> src/main.rs:7:15 |
| 120 | + | |
| 121 | +7 | fn foo(x: __m256) -> __m256; |
| 122 | + | ^^^^^^ |
| 123 | +``` |
| 124 | + |
| 125 | +And the following program would always have defined behavior |
| 126 | +([playground](https://play.rust-lang.org/?gist=db651d09441fd16172a5c94711b2ab97&version=nightly&mode=debug&edition=2015)): |
| 127 | + |
| 128 | +```rust |
| 129 | +#[target_feature(enable = "avx")] |
| 130 | +extern "C" fn foo(x: __m256) -> __m256; |
| 131 | + |
| 132 | +fn main() { |
| 133 | + unsafe { |
| 134 | + #[repr(C)] union U { v: __m256, a: [u64; 4] } |
| 135 | + if is_x86_feature_detected!("avx") { |
| 136 | + // note: this operation is used here for readability |
| 137 | + // but its behavior is currently unspecified (see note above). |
| 138 | + let vec = U { a: [0; 4] }.v; |
| 139 | + foo(vec); |
| 140 | + } |
| 141 | + } |
| 142 | +} |
| 143 | +``` |
| 144 | + |
| 145 | +independently of the `-C target-feature`s used globally to compile the whole |
| 146 | +binary. Note that: |
| 147 | + |
| 148 | +* `extern "C" foo` is compiled with `AVX` enabled, so `foo` takes an `__m256` |
| 149 | + like the C ABI expects |
| 150 | +* the call to `foo` is guarded with an `is_x86_feature_detected`, that is, `foo` |
| 151 | + will only be called if `AVX` is available at run-time |
| 152 | +* if the Rust calling convention differs from the calling convention of the |
| 153 | + `extern` function, Rust has to adapt these. |
| 154 | + |
| 155 | +# Guide-level and reference-level explanation |
| 156 | +[reference-level-explanation]: #reference-level-explanation |
| 157 | + |
| 158 | +Architecture-specific vector types require `#[target_feature]`s to be FFI safe. |
| 159 | +That is, they are only safely usable as part of the signature of `extern` |
| 160 | +functions if the function has certain `#[target_feature]`s enabled. |
| 161 | + |
| 162 | +Which `#[target_feature]`s must be enabled depends on the vector types being |
| 163 | +used. |
| 164 | + |
| 165 | +For the stable architecture-specific vector types the following target features |
| 166 | +must be enabled: |
| 167 | + |
| 168 | +* `x86`/`x86_64`: |
| 169 | + * `__m128`, `__m128i`, `__m128d`: `"sse"` |
| 170 | + * `__m256`, `__m256i`, `__m256d`: `"avx"` |
| 171 | + |
| 172 | + |
| 173 | +Future stabilizations of architecture-specific vector types must specify the |
| 174 | +target features required to use them in `extern` functions. |
| 175 | + |
| 176 | +# Drawbacks |
| 177 | +[drawbacks]: #drawbacks |
| 178 | + |
| 179 | +None. |
| 180 | + |
| 181 | +# Rationale and alternatives |
| 182 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 183 | + |
| 184 | +This is an adhoc solution to the problem, but sufficient for FFI purposes. |
| 185 | + |
| 186 | +## Future architecture-specific vector types |
| 187 | + |
| 188 | +In the future, we might want to stabilize some of the following vector types. |
| 189 | +This section explores which target features would they require: |
| 190 | + |
| 191 | +* `x86`/`x86_64`: |
| 192 | + * `__m64`: `mmx` |
| 193 | + * `__m512`, `__m512i`, `__m512f`: "avx512f" |
| 194 | +* `arm`: `neon` |
| 195 | +* `aarch64`: `neon` |
| 196 | +* `ppc64`: `altivec` / `vsx` |
| 197 | +* `wasm32`: `simd128` |
| 198 | + |
| 199 | +## Require the feature to be enabled globally for the binary |
| 200 | + |
| 201 | +Instead of using `#[target_feature]` we could allow vector types on C FFI only |
| 202 | +behind `#[cfg(target_feature)]`, e.g., via something like the portability check. |
| 203 | + |
| 204 | +This would not allow calling C FFI functions with vector types conditionally on, |
| 205 | +e.g., run-time feature detection. |
| 206 | + |
| 207 | +# Prior art |
| 208 | +[prior-art]: #prior-art |
| 209 | + |
| 210 | +In C, the architecture specific vector types are only available if the required |
| 211 | +target features are enabled at compile-time. |
| 212 | + |
| 213 | +# Unresolved questions |
| 214 | +[unresolved-questions]: #unresolved-questions |
| 215 | + |
| 216 | +* Should it be possible to use, e.g., `__m128` on C FFI when the `avx` feature |
| 217 | + is enabled? Does that change the calling convention and make doing so unsafe ? |
| 218 | + We could extend this RFC to also require that to use certain types certain |
| 219 | + features must be disabled. |
0 commit comments