Simd & fill optimizations #1628

Open
dhardy opened this issue Apr 16, 2025 · 0 comments
Labels: B-compiler (Breakage: needs compiler upgrade), B-value (Breakage: changes output values), C-optimisation, P-low (Priority: Low)

Comments

Member

dhardy commented Apr 16, 2025

#1579 notes some unfinished business:

Generation of Simd and __m128i (etc.) types should be equivalent, but the code is not: the Simd impls currently use fill to avoid adding more unsafe code here.

Note from the above that u32x4, u16x8 and u8x16 are the same size as u128 and __m128i but cost about twice as much to generate here. This suggests the fill code may be sub-optimal.
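To make the size argument concrete, here is a minimal sketch comparing a byte-fill path with a direct u128 path for a four-lane u32 value. `SplitMix64`, `lanes_via_fill` and `lanes_via_u128` are hypothetical stand-ins written for this illustration; they are not rand's API, and plain arrays are used in place of `Simd` so that the sketch builds on stable. It only shows that both paths can produce identical bits from the same generator state, not how rand's actual impls behave or perform.

```rust
use std::mem::size_of;

// The lane layouts named above occupy exactly as many bytes as a u128.
const _: () = assert!(size_of::<[u32; 4]>() == size_of::<u128>());
const _: () = assert!(size_of::<[u16; 8]>() == size_of::<u128>());
const _: () = assert!(size_of::<[u8; 16]>() == size_of::<u128>());

/// Hypothetical toy generator (SplitMix64), used only for illustration.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Direct path: assemble a u128 from two 64-bit outputs (low half first).
    fn next_u128(&mut self) -> u128 {
        let lo = self.next_u64() as u128;
        let hi = self.next_u64() as u128;
        (hi << 64) | lo
    }

    /// Fill path: write little-endian bytes into a buffer, 8 bytes per output.
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        for chunk in dest.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

/// Interpret 16 little-endian bytes as four u32 lanes.
fn bytes_to_lanes(bytes: [u8; 16]) -> [u32; 4] {
    let mut lanes = [0u32; 4];
    for (lane, c) in lanes.iter_mut().zip(bytes.chunks_exact(4)) {
        *lane = u32::from_le_bytes([c[0], c[1], c[2], c[3]]);
    }
    lanes
}

/// Generate four lanes by filling a byte buffer.
fn lanes_via_fill(rng: &mut SplitMix64) -> [u32; 4] {
    let mut buf = [0u8; 16];
    rng.fill_bytes(&mut buf);
    bytes_to_lanes(buf)
}

/// Generate four lanes by reinterpreting a single u128.
fn lanes_via_u128(rng: &mut SplitMix64) -> [u32; 4] {
    bytes_to_lanes(rng.next_u128().to_le_bytes())
}
```

With two generators seeded identically, `lanes_via_fill` and `lanes_via_u128` return the same lanes in this toy setup, so any cost difference between such paths would come from code generation rather than from the amount of random data consumed.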

Additionally, the __m128i impl performed even worse when transmuting a u128 value (~4.3ns, or +130%). As far as I can tell, this is purely because the u128 value is returned via rax, rdx while the __m128i value is returned via rdx, r10 (with rax equal to the struct address). I don't understand this.
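For reference, the conversion in question is a plain bit reinterpretation. The sketch below shows one way to write it; the function names are hypothetical, and the non-x86_64 fallback exists only so the snippet builds on other targets. It does not reproduce the benchmark or the register allocation described above.

```rust
#[cfg(target_arch = "x86_64")]
mod x86 {
    use std::arch::x86_64::__m128i;
    use std::mem::transmute;

    /// Reinterpret the 16 bytes of a u128 as an __m128i.
    pub fn u128_to_m128i(x: u128) -> __m128i {
        // SAFETY: both types are 16 bytes and every bit pattern is valid for each.
        unsafe { transmute::<u128, __m128i>(x) }
    }

    /// Reinterpret an __m128i as a u128.
    pub fn m128i_to_u128(v: __m128i) -> u128 {
        // SAFETY: as above.
        unsafe { transmute::<__m128i, u128>(v) }
    }
}

/// Round-trip a value through __m128i on x86_64.
#[cfg(target_arch = "x86_64")]
fn round_trip(x: u128) -> u128 {
    x86::m128i_to_u128(x86::u128_to_m128i(x))
}

/// Fallback for other targets: round-trip through native-endian bytes.
#[cfg(not(target_arch = "x86_64"))]
fn round_trip(x: u128) -> u128 {
    u128::from_ne_bytes(x.to_ne_bytes())
}
```

Inspecting the assembly of functions like `u128_to_m128i` with optimizations enabled is one way to check how each type is returned on a given target.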

Optimizing Fill for such cases may not be possible without specialization, and even then it's unclear whether we'd want to, given the implied value-breaking changes.

Optimizing SIMD impls would require either specialization or replacing the generic Simd<$ty, LANES> impls with a (large) number of specific impls.
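A rough sketch of what "a number of specific impls" could look like, using a macro to stamp out one impl per concrete lane layout. The trait `From128Bits` and the macro `impl_from_128_bits` are hypothetical names for this illustration, and plain arrays stand in for `Simd<$ty, LANES>` so the example builds on stable. Adding a blanket `impl<T, const N: usize> From128Bits for [T; N]` next to these would be rejected as overlapping on stable Rust, which is where specialization would come in.

```rust
/// Hypothetical trait: build a value from 128 random bits.
trait From128Bits: Sized {
    fn from_bits(bits: u128) -> Self;
}

/// Generate one specific impl per (lane type, lane count) pair.
macro_rules! impl_from_128_bits {
    ($($ty:ty, $lanes:literal;)+) => {$(
        impl From128Bits for [$ty; $lanes] {
            fn from_bits(bits: u128) -> Self {
                // Each listed layout must be exactly 128 bits wide.
                const _: () = assert!(std::mem::size_of::<[$ty; $lanes]>() == 16);
                let bytes = bits.to_le_bytes();
                let width = std::mem::size_of::<$ty>();
                let mut out = [0 as $ty; $lanes];
                for (i, lane) in out.iter_mut().enumerate() {
                    // Copy this lane's little-endian bytes out of the u128.
                    let mut buf = [0u8; std::mem::size_of::<$ty>()];
                    buf.copy_from_slice(&bytes[i * width..(i + 1) * width]);
                    *lane = <$ty>::from_le_bytes(buf);
                }
                out
            }
        }
    )+};
}

impl_from_128_bits! {
    u8, 16;
    u16, 8;
    u32, 4;
    u64, 2;
}
```

Each new width (256-bit, 512-bit) and each lane type would need its own entry, which is the "large number" of impls this approach implies.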

dhardy added the B-compiler (Breakage: needs compiler upgrade), B-value (Breakage: changes output values), C-optimisation and P-low (Priority: Low) labels on Apr 16, 2025