Brandon Holt edited this page Nov 27, 2012 · 3 revisions
Situations where giving the programmer no control over the distribution of data raises performance concerns.
For each, list solutions built on top of Grappa and/or solutions within Grappa's memory allocation.
Computation on an array is mostly data parallel but involves consecutive elements.
A forall_local approach preserves ordering but does not guarantee that consecutive elements reside on the same node.
Could cache adjacent elements during the computation: they are not guaranteed to be local, but in most cases would be. This is similar to the future plan for forall_local, where we'll cache items so that an object spanning more than one block still works.
e.g., prefix sum
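To make the "in most cases local" claim concrete, here is a minimal sketch under a toy block-cyclic model. The names `home_node`, `remote_neighbor_fraction`, and `blockwise_prefix_sum`, and the parameters `block_elems` and `num_nodes`, are hypothetical and are not part of Grappa's API; the layout is an assumption for illustration only. It counts how often an element's predecessor lives on a different node, and shows a prefix sum where the only value crossing a block boundary is a single cached running total.

```python
# Toy model only: hypothetical block-cyclic layout, not Grappa's allocator or API.

def home_node(index, start_offset=0, block_elems=8, num_nodes=4):
    """Node owning element `index` of an array that begins `start_offset`
    elements into a block-cyclic global address space."""
    return ((start_offset + index) // block_elems) % num_nodes


def remote_neighbor_fraction(n, block_elems=8, num_nodes=4):
    """Fraction of elements i >= 1 whose predecessor i-1 is on another node."""
    remote = sum(
        1
        for i in range(1, n)
        if home_node(i, 0, block_elems, num_nodes)
        != home_node(i - 1, 0, block_elems, num_nodes)
    )
    return remote / (n - 1)


def blockwise_prefix_sum(values, block_elems=8):
    """Inclusive prefix sum computed block by block.

    Within a block every read is of the adjacent (local) element; the only
    value carried across a block boundary is `carry`, which stands in for
    the cached adjacent element from the previous block.
    """
    out = []
    carry = 0
    for start in range(0, len(values), block_elems):
        running = carry
        for v in values[start:start + block_elems]:
            running += v
            out.append(running)
        carry = running
    return out
```

In this model only one neighbor access per block boundary is potentially remote, so larger blocks make the cached-neighbor approach cheaper.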
Computation on 2+ arrays is data parallel but involves element-wise operations.
The arrays may be distributed differently, which rules out a straightforward forall_local approach.
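The mismatch can be illustrated with the same kind of toy block-cyclic model. This is a sketch under assumed parameters; `home_node`, `colocated_fraction`, `block_elems`, and `num_nodes` are hypothetical names, not Grappa's API. It measures the fraction of indices `i` for which `a[i]` and `b[i]` land on the same node, given where each array starts in the global address space.

```python
# Toy model only: hypothetical block-cyclic layout, not Grappa's allocator or API.

def home_node(index, start_offset, block_elems=8, num_nodes=4):
    """Node owning element `index` of an array that begins `start_offset`
    elements into a block-cyclic global address space."""
    return ((start_offset + index) // block_elems) % num_nodes


def colocated_fraction(n, start_a, start_b, block_elems=8, num_nodes=4):
    """Fraction of indices i in [0, n) where a[i] and b[i] share a node."""
    same = sum(
        1
        for i in range(n)
        if home_node(i, start_a, block_elems, num_nodes)
        == home_node(i, start_b, block_elems, num_nodes)
    )
    return same / n
```

In this model, two arrays whose starts differ by a whole block never line up, a half-block shift lines up for half of the indices, and only starts that agree modulo `block_elems * num_nodes` line up everywhere.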
Within-Grappa solutions
Provide a construct to malloc an array of the same type T with the same start node as another array. For allocator simplicity, this might be done in a way that wastes space.
Atop-Grappa solutions
Allocate the second array with enough extra space that its effective start pointer can be placed on the same node as the first array's.
Struct-of-arrays to array-of-structs transformation, to force the two arrays to be distributed alike.
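Both atop-Grappa ideas can be sketched in a toy block-cyclic model. Everything here is an assumption for illustration: `home_node`, `alignment_padding`, `to_array_of_structs`, `block_elems`, and `num_nodes` are hypothetical names, not Grappa's API, and equal element sizes are assumed for the two arrays. The padding helper computes how many leading elements of an over-allocated second array to skip so that its effective start matches the first array's position in the distribution cycle; the second helper interleaves the two arrays so that each pair travels together.

```python
# Toy model only: hypothetical block-cyclic layout, not Grappa's allocator or API.

def home_node(index, start_offset, block_elems=8, num_nodes=4):
    """Node owning element `index` of an array that begins `start_offset`
    elements into a block-cyclic global address space."""
    return ((start_offset + index) // block_elems) % num_nodes


def alignment_padding(start_a, raw_start_b, block_elems=8, num_nodes=4):
    """Leading elements of b's allocation to skip so that b's effective start
    is congruent to a's start modulo one full distribution cycle.

    The result is at most block_elems * num_nodes - 1, which is the extra
    space the second allocation must reserve.
    """
    cycle = block_elems * num_nodes
    return (start_a - raw_start_b) % cycle


def to_array_of_structs(a, b):
    """Interleave two equal-length arrays into one array of (a[i], b[i]) pairs,
    so both fields of index i are placed by a single distribution."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return list(zip(a, b))


# Usage: array a starts at offset 0, the allocator hands b a start of 5.
pad = alignment_padding(0, 5)
effective_start_b = 5 + pad
```

The padding approach trades wasted space for unchanged data structures, while the array-of-structs approach wastes nothing but changes how every consumer indexes the data.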