
Consider changing default axes for map / reduce / filter #66

@freeman-lab

Description

The current default for these operations on Spark arrays is axis=(0,), which may incur a swap to distribute along that axis (if the array isn't already distributed along it). The default could instead be axis=None, which would mean applying over the distributed axes (whatever they are) and would never incur a swap.

Suggested by @shoyer, thanks!
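
To make the axis=None proposal concrete, here's a rough sketch of how the default could be resolved. The key_axes attribute is a made-up stand-in for whichever axes the array is currently distributed along, not the real bolt API:

def resolve_axes(arr, axis=None):
    # axis=None: apply over whatever axes are already distributed, so no swap is needed.
    if axis is None:
        return tuple(arr.key_axes)  # hypothetical attribute: the currently distributed axes
    # An explicit axis may require a swap to redistribute along those axes first.
    return axis if isinstance(axis, tuple) else (axis,)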

This generally seems like a friendlier default. The only issue arises not with map but with reduce, when considering sequences of mixed operations. For example, take the following two cases, where the map is a no-op:

from operator import add
from bolt import ones  # bolt's Spark-backed array constructor
data = ones((2, 3, 4), sc)  # sc is an existing SparkContext
data.map(lambda x: x, axis=(0,)).reduce(add)
data.map(lambda x: x, axis=(0, 1)).reduce(add)

if the default for reduce is over the partitioned axes, the answer will be different in the two cases, whereas if the default is over axis=(0,) it will be the same.
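
Just to make the difference concrete, the two reductions can be mimicked locally with NumPy, with sum standing in for the distributed reduce(add):

import numpy as np

a = np.ones((2, 3, 4))
print(a.sum(axis=0).shape)       # reduce over axis (0,): shape (3, 4)
print(a.sum(axis=(0, 1)).shape)  # reduce over axes (0, 1): shape (4,)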

I can see an argument that these really should give the same result with the default parameters, but I'm curious to get other opinions. Another option is using different defaults for map/filter and reduce.

cc @andrewosh
