This repository was archived by the owner on Aug 19, 2024. It is now read-only.

Commit 4179cbb

committed
adding get callable
1 parent c4189e4 commit 4179cbb

File tree

1 file changed

+31
-19
lines changed

Partitioned.md

Lines changed: 31 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -14,9 +14,12 @@ While the interface is designed to be generic enough to cover any distribution b
1414
A conforming implementation of the partitioned-interface standard must provide and support a data structure object having a `__partitioned_interface__` method which returns a Python dictionary with the following fields:
1515
* `shape`: a tuple defining the number of partitions per dimension of the container's global data-(index-)space.
1616
* `partitions`: a dictionary mapping a position in the partition grid (as defined by `shape`) to a dictionary providing the partition object, the partition shape and locality information.
17-
* `locals`: Only for SPMD/MPI-like: list of the positions of the locally owned partitions. The positions serve as lookup keys in the `partitions` dictionary. Must not be available if not SPMD/MPI-like.
17+
* `locals`: Only for SPMD/MPI-like: list of the positions of the locally owned partitions. The positions serve as look-up keys in the `partitions` dictionary. Must not be available if not SPMD/MPI-like.
18+
* `get`: A callable converting a handle into a data object.
1819

19-
In addition to the above required keys a container is encouraged to provide more information that could be potentially benefitial for consuming the distributed data structure.
20+
In addition to the above:
21+
* Beyond the required keys, a container is encouraged to provide more information that could be beneficial for consuming the distributed data structure.
22+
* The dictionary must be picklable.
2023

2124
## `shape`
2225
The shape of the partition grid must be of the same dimensionality as the underlying data-structure. `shape` provides the number of partitions along each dimension. Specifying `1` in a given dimension means the dimension is not cut.
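As a minimal sketch of how `shape` defines the partition grid, the positions that can appear as keys in `partitions` can be enumerated from it. The helper name `grid_positions` is hypothetical and not part of the interface.

```python
from itertools import product


def grid_positions(shape):
    """Enumerate every position in a partition grid of the given shape.

    Hypothetical helper for illustration; not part of the interface.
    """
    return list(product(*(range(n) for n in shape)))


# A 2d container cut in two along each dimension has four partitions.
print(grid_positions((2, 2)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]

# Specifying 1 in a dimension leaves that dimension uncut.
print(grid_positions((4, 1)))  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```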
@@ -47,78 +50,86 @@ In addition to the above required keys a container is encouraged to provide more
4750
* When the underlying backend supports it and for all non-SPMD backends partitions must be provided as references. This avoids unnecessary data movement.
4851
* Ray: ray.ObjectRef
4952
* Dask: dask.Future
53+
* It is recommended to access the actual data through the callable in the `get` field of `__partitioned_interface__`. This way, consumers do not need to differentiate between handle types, and type checks can be limited to basic types like pandas.DataFrame and numpy.ndarray.
54+
5055
* For SPMD-MPI-like backends: partitions which are not locally available may be `None`. This is the recommended behavior unless the underlying backend supports references such as promises to avoid unnecessary data movement.
5156
* `location`
5257
* The location information must include all necessary data to uniquely identify the location of the partition/data. The exact information depends on the underlying distribution system:
5358
* Ray: ip-address
5459
* Dask: worker-Id (name, ip, or ip:port)
55-
* SPMD/MPI-like frameworks such as MPI, SHMEM etc: rank
60+
* SPMD/MPI-like frameworks such as MPI, SHMEM etc.: rank
61+
62+
## `get`
63+
This provides a callable which returns a raw data object when called with a handle provided in the `data` field of an entry in `partitions`. Raw data objects are standard data structures like pandas.DataFrame and numpy.ndarray.
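The following sketch illustrates how a consumer might use `get` without knowing the handle type. It makes several assumptions for the sake of a self-contained example: a standard-library `concurrent.futures.Future` stands in for a Ray or Dask handle, plain lists stand in for raw data objects, and `make_block` and the `worker-N` locations are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_block(start, size):
    """Hypothetical producer: a plain list stands in for a raw data object."""
    return list(range(start, start + size))


with ThreadPoolExecutor(max_workers=2) as pool:
    parted = {
        'shape': (2,),
        'partitions': {
            (0,): {'start': (0,), 'shape': (4,),
                   'data': pool.submit(make_block, 0, 4),
                   'location': 'worker-0'},
            (1,): {'start': (4,), 'shape': (4,),
                   'data': pool.submit(make_block, 4, 4),
                   'location': 'worker-1'},
        },
        # The producer knows how to turn its own handles into data.
        'get': lambda handle: handle.result(),
    }

    # The consumer never inspects the handle type; it only calls `get`.
    get = parted['get']
    blocks = {pos: get(part['data'])
              for pos, part in parted['partitions'].items()}

print(blocks)  # {(0,): [0, 1, 2, 3], (1,): [4, 5, 6, 7]}
```

Because the consumer goes through `get`, the same loop would work unchanged if the handles were of a different type, as long as the producer supplies a matching callable.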
5664

5765
## `locals`
5866
This is basically a short-cut for SPMD environments which allows processes/ranks to quickly extract the local partition. It saves processes from parsing the `partitions` dictionary for the local rank/address which is helpful when the number of ranks/processes/PEs is large.
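A minimal sketch of the shortcut, seen from rank 0 of a two-rank run: the strings `'df0'` and `'df2'` are placeholders for local data objects, the `start` values assume an 8x8 structure split into row blocks of two rows, and `local_partitions` is a hypothetical helper.

```python
def local_partitions(parted):
    """Return {position: data} for the locally owned partitions only.

    Hypothetical helper; it uses `locals` as lookup keys instead of
    scanning every entry of `partitions` for the local rank.
    """
    get = parted['get']
    return {pos: get(parted['partitions'][pos]['data'])
            for pos in parted['locals']}


# View from rank 0; partitions owned by rank 1 are None here.
parted = {
    'shape': (4, 1),
    'partitions': {
        (0, 0): {'start': (0, 0), 'shape': (2, 8), 'data': 'df0', 'location': 0},
        (1, 0): {'start': (2, 0), 'shape': (2, 8), 'data': None, 'location': 1},
        (2, 0): {'start': (4, 0), 'shape': (2, 8), 'data': 'df2', 'location': 0},
        (3, 0): {'start': (6, 0), 'shape': (2, 8), 'data': None, 'location': 1},
    },
    'get': lambda x: x,
    'locals': [(0, 0), (2, 0)],
}

print(local_partitions(parted))  # {(0, 0): 'df0', (2, 0): 'df2'}
```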
5967

68+
6069
## Examples
6170
### 1d-data-structure (64 elements), 1d-partition-grid, 4 partitions on 4 nodes, blocked distribution, partitions are of type `Ray.ObjRef`, Ray
6271
```python
6372
__partitioned_interface__ = {
64-
shape: (4,),
65-
partitions: {
73+
'shape': (4,),
74+
'partitions': {
6675
(0,): {
6776
'start': (0,),
6877
'shape': (16,),
6978
'data': ObjRef0,
70-
'location': 1.1.1.1’, }
79+
'location': '1.1.1.1', },
7180
(1,): {
7281
'start': (16,),
7382
'shape': (16,),
7483
'data': ObjRef1,
75-
'location': 1.1.1.2’, }
84+
'location': '1.1.1.2', },
7685
(2,): {
7786
'start': (32,),
7887
'shape': (16,),
7988
'data': ObjRef2,
80-
'location': 1.1.1.3’, }
89+
'location': '1.1.1.3', },
8190
(3,): {
8291
'start': (48,),
8392
'shape': (16,),
8493
'data': ObjRef3,
85-
'location': ‘1.1.1.4’, }
86-
}
94+
'location': '1.1.1.4', },
95+
},
96+
'get': lambda x: ray.get(x)
8797
}
8898
```
8999
### 2d-structure (64 elements), 2d-partition-grid, 4 partitions on 2 nodes, block-cyclic distribution, partitions are of type `dask.Future`, dask
90100
```python
91101
__partitioned_interface__ = {
92-
shape’: (2,2),
93-
partitions’: {
102+
'shape': (2,2),
103+
'partitions': {
94104
(1,1): {
95105
'start': (4, 4),
96106
'shape': (4, 4),
97107
'data': future0,
98-
'location': Alice’, },
108+
'location': 'Alice', },
99109
(1,0): {
100110
'start': (4, 0),
101111
'shape': (4, 4),
102112
'data': future1,
103-
'location': 1.1.1.2:55667’, },
113+
'location': '1.1.1.2:55667', },
104114
(0,1): {
105115
'start': (0, 4),
106116
'shape': (4, 4),
107117
'data': future2,
108-
'location': Alice’, },
118+
'location': 'Alice', },
109119
(0,0): {
110120
'start': (0,0),
111121
'shape': (4, 4),
112122
'data': future3,
113-
'location': 1.1.1.2:55667’, },
123+
'location': '1.1.1.2:55667', },
114124
},
125+
'get': lambda x: x.result()
115126
}
116127
```
117128
### 2d-structure (64 elements), 1d-partition-grid, 4 partitions on 2 ranks, row-block-cyclic distribution, partitions are of type `pandas.DataFrame`, MPI
118129
```python
119130
__partitioned_interface__ = {
120-
shape’: (4,1),
121-
partitions’: {
131+
'shape': (4,1),
132+
'partitions': {
122133
(0,0): {
123134
'start': (0, 0),
124135
'shape': (2, 8),
@@ -139,7 +150,8 @@ __partitioned_interface__ = {
139150
'shape': (2, 8),
140151
'data': None, # this is for rank 0, for rank 1 it'd be df3
141152
'location': 1, },
142-
}
153+
},
154+
'get': lambda x: x,
143155
'locals': [(0,0), (2,0)] # this is for rank 0, for rank 1 it'd be [(1,0), (3,0)]
144156
}
145157
```
