Enable vectorisation for ZIP reconstruct stage on Windows #2043

Open: wants to merge 2 commits into base: main

Conversation

nikos-foundry

The __SSE4_1__ macro, which controls whether the vectorised version of the reconstruct() function is used, is only defined by GCC and clang. As a result, Windows builds were using the scalar version, which is slower.
This is now fixed by adding a check for MSVC, similar to the existing SSE2 macro check (and to checks elsewhere in the code base).
At the same time, some basic const correctness is applied to the same file, which could help the compiler apply more optimisations.
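The kind of guard described above can be sketched as follows. This is an illustration only, not the PR's actual diff: the macro name EXAMPLE_HAVE_SSE4_1 and the helper example_reconstruct_path() are hypothetical, and the exact MSVC condition used in the code base is assumed rather than quoted. It relies on the fact that MSVC does not define __SSE4_1__ but does define _MSC_VER together with _M_IX86 or _M_X64 on x86 targets.

```c
#include <stdio.h>

/* Hypothetical feature macro: enabled when GCC/clang advertise SSE4.1
 * via __SSE4_1__, or when building with MSVC for an x86/x64 target
 * (assumed condition; MSVC never defines __SSE4_1__ itself). */
#if defined(__SSE4_1__) ||                                                   \
    (defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64)))
#    define EXAMPLE_HAVE_SSE4_1 1
#    include <smmintrin.h> /* SSE4.1 intrinsics header */
#else
#    define EXAMPLE_HAVE_SSE4_1 0
#endif

/* Reports which code path a build with this guard would select. */
static const char*
example_reconstruct_path (void)
{
#if EXAMPLE_HAVE_SSE4_1
    return "sse4.1";
#else
    return "scalar";
#endif
}
```

Calling example_reconstruct_path() in a small test program is a quick way to confirm which branch a given compiler and set of flags actually selects.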

The __SSE4_1__ macro, which controls whether the vectorised version of the
reconstruct() function is used, is only defined by GCC and clang. As a result,
Windows builds were using the scalar version, which is slower.
This commit fixes the issue by adding a check for MSVC, similar to the
existing SSE2 macro check (and to checks elsewhere in the code base).

Signed-off-by: Nikolaos Koutsikos <[email protected]>
@kdt3rd
Contributor

kdt3rd commented May 21, 2025

Did you experiment with any use of the restrict pointer qualifier (in its various forms; I believe I added a macro for that)? I have seen some evidence that it can also help optimisation by ruling out pointer aliasing, but it adds complexity if the compiler can already work that out...
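For readers unfamiliar with the qualifier, here is a minimal sketch of what a restrict-qualified, const-correct loop can look like. Everything in it is hypothetical: EXAMPLE_RESTRICT is not the macro mentioned in the comment, and example_interleave() is an illustrative function, not code from this PR. The qualifier is a promise from the caller that the two buffers do not overlap, which may let the compiler vectorise the loop without emitting aliasing checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portability macro: __restrict is accepted by MSVC, GCC
 * and clang; fall back to the C99 keyword on other compilers. */
#if defined(_MSC_VER) || defined(__GNUC__) || defined(__clang__)
#    define EXAMPLE_RESTRICT __restrict
#else
#    define EXAMPLE_RESTRICT restrict
#endif

/* Hypothetical example: interleave the first and second halves of src
 * into out. The caller guarantees that out and src do not overlap;
 * src is const because it is only read. */
static void
example_interleave (
    uint8_t* EXAMPLE_RESTRICT       out,
    const uint8_t* EXAMPLE_RESTRICT src,
    size_t                          n)
{
    const size_t   half = (n + 1) / 2; /* first half gets the extra byte */
    const uint8_t* a    = src;
    const uint8_t* b    = src + half;

    for (size_t i = 0; i < n / 2; ++i)
    {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
    if (n & 1) out[n - 1] = a[half - 1]; /* odd length: trailing byte */
}
```

Whether this helps in practice depends on the compiler; comparing the generated assembly with and without the qualifier is the usual way to check.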

Contributor

@kdt3rd kdt3rd left a comment

lgtm
