Commit 638af43

Merge pull request #2574 from gnzlbg/simd_ffi
RFC: SIMD vectors in FFI
2 parents dce276e + 2ba7466 commit 638af43

1 file changed

+219
-0
lines changed

text/2574-simd-ffi.md

@@ -0,0 +1,219 @@
- Feature Name: `simd_ffi`
- Start Date: 2018-10-12
- RFC PR: [rust-lang/rfcs#2574](https://github.com/rust-lang/rfcs/pull/2574)
- Rust Issue: [rust-lang/rust#63068](https://github.com/rust-lang/rust/issues/63068)

# Summary
[summary]: #summary

This RFC allows using SIMD types in C FFI.

# Motivation
[motivation]: #motivation

The architecture-specific SIMD types provided in [`core::arch`] cannot currently
be used in C FFI. That is, Rust programs cannot interface with C libraries that
use these in their APIs.

One notable example would be calling into vectorized [`libm`] implementations
like [`sleef`], [`libmvec`], or Intel's [`SVML`]. The [`packed_simd`] crate
relies on C FFI with these fundamental libraries to offer competitive
performance.

[`core::arch`]: https://doc.rust-lang.org/stable/core/arch/index.html
[`libm`]: https://sourceware.org/glibc/wiki/libm
[`sleef`]: https://sleef.org/
[`libmvec`]: https://sourceware.org/glibc/wiki/libm
[`SVML`]: https://software.intel.com/en-us/node/524289
[`packed_simd`]: https://github.com/rust-lang-nursery/packed_simd

## Why is using SIMD vectors in C FFI currently disallowed?

Consider the following example
([playground](https://play.rust-lang.org/?gist=b8cfb63bb4e7fb00bb293f6e27061c52&version=nightly&mode=debug&edition=2015)):

```rust
use std::arch::x86_64::__m256;

// Foreign functions are declared inside an `extern` block.
extern "C" {
    fn foo(x: __m256);
}

fn main() {
    unsafe {
        union U { v: __m256, a: [u64; 4] }
        foo(U { a: [0; 4] }.v);
    }
}
```

In this example, a 256-bit wide vector type, `__m256`, is passed to an `extern
"C"` function via C FFI. Is the behavior of passing `__m256` to the C function
defined?

That depends on both the platform and how the Rust program was compiled!

First, let's make the platform concrete and assume that it follows the [x64 SysV
ABI][sysv_abi] which states:

> **3.2.1 Registers and the Stack Frame**
>
> Intel AVX (Advanced Vector Extensions) provides 16 256-bit wide AVX registers
> (`%ymm0` - `%ymm15`). The lower 128-bits of `%ymm0` - `%ymm15` are aliased to
> the respective 128-bit SSE registers (`%xmm0` - `%xmm15`). For purposes of
> parameter passing and function return, `%xmmN` and `%ymmN` refer to the same
> register. Only one of them can be used at the same time.
>
> **3.2.3 Parameter Passing**
>
> **SSE** The class consists of types that fit into a vector register.
>
> **SSEUP** The class consists of types that fit into a vector register and can
> be passed and returned in the upper bytes of it.

[sysv_abi]: https://www.uclibc.org/docs/psABI-x86_64.pdf

Second, in `C`, the `__m256` type is only available if the current translation
unit is being compiled with `AVX` enabled.

Back to the example: `__m256` is a 256-bit wide vector type, that is, wider than
128-bit, but it can be passed through a vector register using the lower and
upper 128-bits of a 256-bit wide register, and in C, if `__m256` can be used,
these registers are always available.

That is, the C ABI requires two things:

* that Rust passes `__m256` via a 256-bit wide register
* that `foo` has the `#[target_feature(enable = "avx")]` attribute

And this is where things went wrong: in Rust, `__m256` is always available
independently of whether `AVX` is available or not<sup>[1](#layout_unspecified)</sup>,
but we haven't specified how we are actually compiling our Rust program above:

* if we compile it with `AVX` globally enabled, e.g., via `-C
  target-feature=+avx`, then the behavior of calling `foo` is defined because
  `__m256` will be passed to C in a single 256-bit wide register, which is what
  the C ABI requires.

* if we compile our program without `AVX` enabled, then the Rust program cannot
  use 256-bit wide registers because they are not available, so independently of
  how `__m256` will be passed to C, it won't be passed in a 256-bit wide
  register, and the behavior is undefined because of an ABI mismatch.

<a name="layout_unspecified">1</a>: its layout is currently unspecified but that
is not relevant for this issue - what matters is that 256-bit registers are not
available and therefore they cannot be used.

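The two build configurations described above can be told apart from inside a Rust program. The following is a minimal sketch and not part of the RFC: the function names `avx_enabled_at_compile_time` and `avx_detected_at_run_time` are hypothetical, and the code relies only on the standard `cfg!` and `is_x86_feature_detected!` macros.

```rust
/// Hypothetical helper: true if the whole crate was compiled with AVX
/// enabled, e.g. via `-C target-feature=+avx`.
pub fn avx_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "avx")
}

/// Hypothetical helper: true if the CPU running the program reports AVX.
/// On non-x86 targets the detection macro is unavailable, so this sketch
/// simply answers `false` there.
pub fn avx_detected_at_run_time() -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            return true;
        }
    }
    false
}
```

A binary built without `+avx` can still observe `avx_detected_at_run_time() == true` on a capable CPU. That is the situation in which `__m256` exists as a type but code compiled with the global feature set cannot use the 256-bit registers.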
You might be wondering: why is `__m256` available even if `AVX` is not
available? The reason is that we want to use `__m256` in some parts of
Rust programs even if `AVX` is not globally enabled, and currently we don't
have great infrastructure for conditionally allowing it in some parts of the
program and not others.

Ideally, one should only be able to use `__m256` and operations on it if `AVX`
is available, and this is exactly what this RFC proposes for using vector types
in C FFI: to always require `#[target_feature(enable = X)]` in C FFI functions
using SIMD types, where "unblocking" the use of each type requires some
particular feature to be enabled, e.g., `avx` or `avx2` in the case of `__m256`.

That is, the compiler would reject the example above with an error:

```
error[E1337]: `__m256` on C FFI requires `#[target_feature(enable = "avx")]`
 --> src/main.rs:7:15
  |
7 |     fn foo(x: __m256) -> __m256;
  |               ^^^^^^
```

And the following program would always have defined behavior
([playground](https://play.rust-lang.org/?gist=db651d09441fd16172a5c94711b2ab97&version=nightly&mode=debug&edition=2015)):

```rust
use std::arch::x86_64::__m256;

extern "C" {
    // As proposed by this RFC, the attribute goes on the foreign declaration.
    #[target_feature(enable = "avx")]
    fn foo(x: __m256) -> __m256;
}

fn main() {
    unsafe {
        #[repr(C)] union U { v: __m256, a: [u64; 4] }
        if is_x86_feature_detected!("avx") {
            // note: this operation is used here for readability
            // but its behavior is currently unspecified (see note above).
            let vec = U { a: [0; 4] }.v;
            foo(vec);
        }
    }
}
```

independently of the `-C target-feature`s used globally to compile the whole
binary. Note that:

* `extern "C" fn foo` is compiled with `AVX` enabled, so `foo` takes an `__m256`
  like the C ABI expects
* the call to `foo` is guarded with `is_x86_feature_detected!`, that is, `foo`
  will only be called if `AVX` is available at run-time
* if the Rust calling convention differs from the calling convention of the
  `extern` function, Rust has to adapt between the two.

155+
# Guide-level and reference-level explanation
156+
[reference-level-explanation]: #reference-level-explanation
157+
158+
Architecture-specific vector types require `#[target_feature]`s to be FFI safe.
159+
That is, they are only safely usable as part of the signature of `extern`
160+
functions if the function has certain `#[target_feature]`s enabled.
161+
162+
Which `#[target_feature]`s must be enabled depends on the vector types being
163+
used.
164+
165+
For the stable architecture-specific vector types the following target features
166+
must be enabled:
167+
168+
* `x86`/`x86_64`:
169+
* `__m128`, `__m128i`, `__m128d`: `"sse"`
170+
* `__m256`, `__m256i`, `__m256d`: `"avx"`
171+
172+
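The rule above amounts to a lookup from vector type to required feature, followed by a check of each type in an `extern` signature. The following sketch models that check on plain strings. It is an illustration only: `required_feature` and `check_signature` are hypothetical helpers, not compiler APIs, and the error text merely imitates the diagnostic shown earlier in this RFC.

```rust
/// Hypothetical helper: the target feature this RFC requires for a stable
/// x86/x86_64 vector type, or `None` for any other type name.
pub fn required_feature(vector_type: &str) -> Option<&'static str> {
    match vector_type {
        "__m128" | "__m128i" | "__m128d" => Some("sse"),
        "__m256" | "__m256i" | "__m256d" => Some("avx"),
        _ => None,
    }
}

/// Hypothetical helper: checks the type names in an `extern` signature
/// against the features enabled on that function via `#[target_feature]`.
/// Features are compared literally; implied features are not modeled.
pub fn check_signature(types: &[&str], enabled: &[&str]) -> Result<(), String> {
    for ty in types {
        if let Some(feature) = required_feature(ty) {
            if !enabled.contains(&feature) {
                return Err(format!(
                    "`{}` on C FFI requires `#[target_feature(enable = \"{}\")]`",
                    ty, feature
                ));
            }
        }
    }
    Ok(())
}
```

For example, `check_signature(&["__m256", "__m256"], &["avx"])` succeeds, while `check_signature(&["__m256"], &["sse"])` returns an error naming `avx`. Because the comparison is literal, the sketch takes no position on whether enabling `avx` should also permit `__m128`.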
Future stabilizations of architecture-specific vector types must specify the
target features required to use them in `extern` functions.

# Drawbacks
[drawbacks]: #drawbacks

None.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

This is an ad hoc solution to the problem, but it is sufficient for FFI purposes.

## Future architecture-specific vector types

In the future, we might want to stabilize some of the following vector types.
This section explores which target features they would require:

* `x86`/`x86_64`:
  * `__m64`: `mmx`
  * `__m512`, `__m512i`, `__m512d`: `avx512f`
* `arm`: `neon`
* `aarch64`: `neon`
* `ppc64`: `altivec` / `vsx`
* `wasm32`: `simd128`

## Require the feature to be enabled globally for the binary

Instead of using `#[target_feature]` we could allow vector types on C FFI only
behind `#[cfg(target_feature)]`, e.g., via something like the portability check.

This would not allow calling C FFI functions with vector types conditionally on,
e.g., run-time feature detection.

# Prior art
[prior-art]: #prior-art

In C, the architecture-specific vector types are only available if the required
target features are enabled at compile-time.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

* Should it be possible to use, e.g., `__m128` on C FFI when the `avx` feature
  is enabled? Does that change the calling convention and make doing so unsafe?
  We could extend this RFC to also require that, to use certain types, certain
  features must be disabled.