rsqrt as builtin #19302
Comments
Do you have docs on what this function is?
Does it always, for vectors as well?

Code:

export fn rsqrt(num: f32) f32 {
    return 1 / @sqrt(num);
}

Godbolt (default):

# Compilation provided by Compiler Explorer at https://godbolt.org/
.LCPI0_0:
        .long   0x3f800000
rsqrt:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 4
        vmovaps xmm1, xmm0
        vmovss  dword ptr [rbp - 4], xmm1
        vsqrtss xmm1, xmm0, xmm1
        vmovss  xmm0, dword ptr [rip + .LCPI0_0]
        vdivss  xmm0, xmm0, xmm1
        add     rsp, 4
        pop     rbp
        ret

Godbolt:

# Compilation provided by Compiler Explorer at https://godbolt.org/
.LCPI0_0:
        .long   0x3f800000
rsqrt:
        vsqrtss xmm0, xmm0, xmm0
        vmovss  xmm1, dword ptr [rip + .LCPI0_0]
        vdivss  xmm0, xmm1, xmm0
        ret
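For the vector half of the question, here is a minimal sketch of the same `1/@sqrt` expression applied to a `@Vector(4, f32)`. The name `rsqrtVec` and the lane count of 4 are assumptions chosen for illustration; nothing here claims which instructions the compiler will select, which is best checked on Godbolt as above.

```zig
const std = @import("std");

// Hypothetical helper (not from the thread): element-wise 1/sqrt on a 4-lane vector.
fn rsqrtVec(v: @Vector(4, f32)) @Vector(4, f32) {
    // @splat broadcasts the scalar 1.0 to every lane of the result type.
    const one: @Vector(4, f32) = @splat(1.0);
    // @sqrt and `/` both operate lane by lane on vectors.
    return one / @sqrt(v);
}

test "rsqrtVec computes 1/sqrt per lane" {
    const v: @Vector(4, f32) = .{ 1.0, 4.0, 16.0, 64.0 };
    const r = rsqrtVec(v);
    try std.testing.expectApproxEqAbs(@as(f32, 1.0), r[0], 1e-6);
    try std.testing.expectApproxEqAbs(@as(f32, 0.125), r[3], 1e-6);
}
```

Running `zig test` on the file exercises the sketch; exporting a similar function and inspecting the assembly would show whether a packed square root and divide, or a reciprocal square root estimate, is emitted.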
At a glance, it looks like it should be using (V)RSQRTPS instead.
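One relevant detail: (V)RSQRTPS returns an approximation (roughly 12 bits of precision), not a correctly rounded result, so code that uses it usually follows up with a Newton-Raphson step. The sketch below shows that refinement step only; `refineRsqrt` is a hypothetical name, and the rough estimate is passed in by hand rather than produced by the hardware instruction.

```zig
const std = @import("std");

// Hypothetical helper (not from the thread): one Newton-Raphson step for
// y ~= 1/sqrt(x), using y' = y * (1.5 - 0.5 * x * y * y).
fn refineRsqrt(x: f32, estimate: f32) f32 {
    return estimate * (1.5 - 0.5 * x * estimate * estimate);
}

test "one step tightens a rough estimate" {
    const x: f32 = 4.0; // exact 1/sqrt(4) is 0.5
    const rough: f32 = 0.49; // stand-in for a low-precision hardware estimate
    const refined = refineRsqrt(x, rough);
    // The rough estimate is off by 0.01; the refined one is within 0.001.
    try std.testing.expectApproxEqAbs(@as(f32, 0.5), refined, 1e-3);
}
```

Because the refined value can still differ from `1/@sqrt(x)` in the last bits, a compiler would plausibly only make this substitution when relaxed floating-point semantics are requested.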
Somewhat related: #767
Unless
Continue tracking in #767 (comment)
for the hardware instructions where available