Description
The current default for these operations on Spark arrays is axis=(0,), which may incur a swap to distribute along that axis (if the array isn't already distributed along it). The default could instead be axis=None, which would mean apply over the distributed axes (whatever they are) and would never incur a swap.
Suggested by @shoyer, thanks!
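To make the swap concern concrete, here is a rough sketch (not the library's actual internals) that models a distributed array as (key, value) records, where the key indexes the distributed axes and the value is the local block over the remaining axes:

import numpy as np

data = np.ones((2, 3, 4))

# Distributed along axis 0: keys index axis 0, values are local (3, 4) blocks
dist_axis0 = {(i,): data[i] for i in range(2)}

# Distributed along axes (0, 1): keys index axes 0 and 1, values are (4,) blocks
dist_axis01 = {(i, j): data[i, j] for i in range(2) for j in range(3)}

In this picture, a map with axis=(0,) on the second layout would first have to swap (regroup the records so that only axis 0 is in the key) before applying the function, whereas a map with axis=None would simply apply the function to each value as it stands.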
This generally seems like a friendlier default; the only issue arises not with map but with reduce, when considering sequences of mixed operations. For example, in the following two cases, where the map is a no-op,
data = ones((2, 3, 4), sc)
data.map(lambda x: x, axis=(0,)).reduce(add)
data.map(lambda x: x, axis=(0,1)).reduce(add)
if the default for reduce is over the partitioned axes, the answer will be different in the two cases, whereas if the default is over axis=(0,) it will be the same.
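Here is a rough local analogy (assuming add is e.g. operator.add and that reduce(add) aggregates over the partitioned axes) of why the two chains would then give different results:

import numpy as np
from operator import add
from functools import reduce

data = np.ones((2, 3, 4))

# reduce over the partitioned axis (0,): result has shape (3, 4), filled with 2
out1 = reduce(add, [data[i] for i in range(2)])

# reduce over the partitioned axes (0, 1): result has shape (4,), filled with 6
out2 = reduce(add, [data[i, j] for i in range(2) for j in range(3)])

With a fixed default of axis=(0,), both chains would instead return the first result.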
I can see an argument that these really should give the same result with the default parameters, but I'm curious to get other opinions. Another option is to use different defaults for map/filter and reduce.
cc @andrewosh