This project looks really nice, thanks for that!
I have a simple question about whether batch_fitness is compatible with SIMD computation, given the layout of the input (decision vector) array and of the output (fitness) array.
From the docs:
for a problem with dimension n the first decision vector in dvs occupies the index range [0, n), the second decision vector
occupies the range [n, 2n), and so on.
Is it really possible to do vectorized operations on input concatenated as described, without allocating a new vector and reordering the data internally before calling SIMD intrinsics? That extra step would probably reduce the benefit of vectorization, or even make it slower than a naive sequential implementation.
I was expecting contiguous storage of each input component: for a batch of size b, the first components of all decision vectors occupy the index range [0, b), the second components occupy [b, 2b), etc.
I understand this layout is handy for a multithreaded BFE to be able to work concurrently on different portions of the input vector. Is it really compatible with SIMD?
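To make the two layouts concrete, here is a minimal sketch of the reordering step the question is worried about. It is plain Python for illustration only; the function name `to_component_major` is hypothetical and not part of the library. It moves element j of decision vector i from index i*n + j (the documented layout) to index j*b + i (the layout where each component is contiguous across the batch).

```python
def to_component_major(dvs, n):
    """Reorder a flat batch from [dv0 | dv1 | ...] to [comp0 | comp1 | ...].

    dvs: flat list holding b decision vectors of dimension n, back to back.
    n:   problem dimension.
    Hypothetical helper, written only to illustrate the index mapping.
    """
    b, rem = divmod(len(dvs), n)
    if rem != 0:
        raise ValueError("len(dvs) must be a multiple of n")
    out = [0.0] * len(dvs)
    for i in range(b):          # i: which decision vector
        for j in range(n):      # j: which component of that vector
            out[j * b + i] = dvs[i * n + j]
    return out


# Three decision vectors of dimension 2, stored as the docs describe.
dvs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
reordered = to_component_major(dvs, n=2)
print(reordered)
```

This copy touches every element once and needs a second buffer of the same size, which is exactly the overhead that could eat into the gain from vectorizing the fitness evaluation itself.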
Thanks for your help, and sorry if I missed something :)
Indeed, you are correct: the data layout was designed with thread/process-based BFEs in mind.
I suppose that, as the adoption of AVX512 increases, the availability of gather/scatter instructions would at least alleviate the issue.
As an alternative, we could think about extending the BFE API to give the user the ability to signal how the data is stored (i.e., row-major vs column-major). Of course we would need to ensure that such extension does not break existing uses of the BFE API, which could be tricky.
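As a rough sketch of what such a flag could look like (all names here, `Layout` and `element_index`, are hypothetical and do not exist in the project), the layout could be an optional argument whose default matches the currently documented behaviour, so that existing callers are unaffected:

```python
from enum import Enum


class Layout(Enum):
    """Hypothetical layout flag; not part of the existing BFE API."""
    DV_MAJOR = "dv_major"                # current layout: one decision vector after another
    COMPONENT_MAJOR = "component_major"  # each component contiguous across the batch


def element_index(i, j, n, b, layout=Layout.DV_MAJOR):
    """Flat index of component j of decision vector i.

    n is the problem dimension and b the batch size. The default layout
    reproduces the documented [0, n), [n, 2n), ... ranges.
    """
    if not (0 <= i < b and 0 <= j < n):
        raise IndexError("i must be in [0, b) and j in [0, n)")
    if layout is Layout.DV_MAJOR:
        return i * n + j
    return j * b + i


# Component 0 of the second decision vector, for n = 2 and b = 3.
print(element_index(1, 0, n=2, b=3))                          # default layout
print(element_index(1, 0, n=2, b=3, layout=Layout.COMPONENT_MAJOR))
```

Making the flag opt-in is only one possible way to keep existing uses working; whether it fits the real API would need a closer look at how BFEs and problems exchange data.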
I love the idea of trying to signal/flag the layout.
I'll have to take a deeper look at the project to see how I could make a relevant PR for that. It will probably take a few weeks before I can find time to really investigate further, but if you're fine with contributions I can give it a try!