This repository was archived by the owner on Aug 19, 2024. It is now read-only.
Partitioned.md
Lines changed: 31 additions & 19 deletions
@@ -14,9 +14,12 @@ While the interface is designed to be generic enough to cover any distribution b
14
14
A conforming implementation of the partitioned-interface standard must provide and support a data structure object having a `__partitioned_interface__` method which returns a Python dictionary with the following fields:
15
15
* `shape`: a tuple defining the number of partitions per dimension of the container's global data-(index-)space.
16
16
* `partitions`: a dictionary mapping a position in the partition grid (as defined by `shape`) to a dictionary providing the partition object, the partition shape and locality information.
17
-
* `locals`: Only for SPMD/MPI-like: list of the positions of the locally owned partitions. The positions serve as lookup keys in the `partitions` dictionary. Must not be available if not SPMD/MPI-like.
17
+
* `locals`: Only for SPMD/MPI-like: list of the positions of the locally owned partitions. The positions serve as look-up keys in the `partitions` dictionary. Must not be available if not SPMD/MPI-like.
18
+
* `get`: A callable converting a handle into a data object.
18
19
19
-
In addition to the above required keys a container is encouraged to provide more information that could be potentially benefitial for consuming the distributed data structure.
20
+
In addition to the above required keys:
21
+
* A container is encouraged to provide more information that could be potentially beneficial for consuming the distributed data structure.
22
+
* The dictionary must be pickle'able.
20
23
21
24
## `shape`
22
25
The shape of the partition grid must be of the same dimensionality as the underlying data-structure. `shape` provides the number of partitions along each dimension. Specifying `1` in a given dimension means the dimension is not cut.
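As a minimal sketch of how `shape` relates to the data, the following computes a partition-grid shape for a blocked layout. The helper name `grid_shape` and the fixed block size per dimension are assumptions made for illustration; the interface itself does not prescribe how a container derives its grid.

```python
import math


def grid_shape(global_shape, block_shape):
    """Return the number of partitions per dimension (hypothetical helper).

    Assumes a blocked layout where every dimension is cut into chunks of
    at most block_shape[d] elements.
    """
    if len(global_shape) != len(block_shape):
        raise ValueError("grid must have the same dimensionality as the data")
    return tuple(math.ceil(g / b) for g, b in zip(global_shape, block_shape))


# 64 elements in blocks of 16 give a 1d grid with 4 partitions.
print(grid_shape((64,), (16,)))    # (4,)
# An 8x8 structure cut into 2x8 row blocks: the second dimension is not cut.
print(grid_shape((8, 8), (2, 8)))  # (4, 1)
```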
@@ -47,78 +50,86 @@ In addition to the above required keys a container is encouraged to provide more
47
50
* When the underlying backend supports it, and for all non-SPMD backends, partitions must be provided as references. This avoids unnecessary data movement.
48
51
* Ray: ray.ObjectRef
49
52
* Dask: dask.Future
53
+
* It is recommended to access the actual data through the callable in the `get` field of `__partitioned_interface__`. This way, differentiating between different handle types can be avoided and type checks can be limited to basic types like pandas.DataFrame and numpy.ndarray.
54
+
50
55
* For SPMD/MPI-like backends: partitions which are not locally available may be `None`. This is the recommended behavior unless the underlying backend supports references such as promises to avoid unnecessary data movement.
51
56
* `location`
52
57
* The location information must include all necessary data to uniquely identify the location of the partition/data. The exact information depends on the underlying distribution system:
53
58
* Ray: ip-address
54
59
* Dask: worker-Id (name, ip, or ip:port)
55
-
* SPMD/MPI-like frameworks such as MPI, SHMEM etc: rank
60
+
* SPMD/MPI-like frameworks such as MPI, SHMEM etc.: rank
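A consumer might use `location` to schedule work close to the data. The sketch below groups grid positions by location; the function name `positions_by_location` is hypothetical, and the `data` values are plain strings standing in for real handles.

```python
from collections import defaultdict


def positions_by_location(interface):
    """Group partition-grid positions by their 'location' entry (illustrative)."""
    groups = defaultdict(list)
    for pos, part in interface['partitions'].items():
        groups[part['location']].append(pos)
    return dict(groups)


# Placeholder dictionary; string handles stand in for real futures.
example = {
    'shape': (2, 2),
    'partitions': {
        (1, 1): {'start': (4, 4), 'shape': (4, 4), 'data': 'future0', 'location': 'Alice'},
        (1, 0): {'start': (4, 0), 'shape': (4, 4), 'data': 'future1', 'location': '1.1.1.2:55667'},
        (0, 1): {'start': (0, 4), 'shape': (4, 4), 'data': 'future2', 'location': 'Alice'},
        (0, 0): {'start': (0, 0), 'shape': (4, 4), 'data': 'future3', 'location': '1.1.1.2:55667'},
    },
}

by_loc = positions_by_location(example)
```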
61
+
62
+
## `get`
63
+
This provides a callable which returns a raw data object when called with a handle provided in the `data` field of an entry in `partitions`. Raw data objects are standard data structures like pandas.DataFrame and numpy.ndarray.
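To illustrate how a consumer could stay independent of the handle type, here is a small sketch that resolves every available partition through `get`. The in-memory `store` dictionary and the `materialize` helper are assumptions for this example only; with a real backend the callable would wrap the backend's own retrieval mechanism.

```python
# Stand-in for a remote object store: handles are keys, values are raw data.
store = {'h0': [0, 1], 'h1': [2, 3]}

interface = {
    'shape': (2,),
    'partitions': {
        (0,): {'start': (0,), 'shape': (2,), 'data': 'h0', 'location': 'node-a'},
        (1,): {'start': (2,), 'shape': (2,), 'data': 'h1', 'location': 'node-b'},
    },
    # The consumer never needs to know that handles are dictionary keys.
    'get': store.__getitem__,
}


def materialize(interface):
    """Resolve all available handles to raw data objects (hypothetical helper)."""
    get = interface['get']
    return {
        pos: get(part['data'])
        for pos, part in interface['partitions'].items()
        if part['data'] is not None  # skip partitions that are not available
    }


raw = materialize(interface)
```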
56
64
57
65
## `locals`
58
66
This is a shortcut for SPMD environments which allows processes/ranks to quickly extract their local partitions. It saves processes from searching the `partitions` dictionary for the local rank/address, which is helpful when the number of ranks/processes/PEs is large.
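A sketch of the lookup this enables, seen from rank 0 of a two-rank run. The helper `local_partitions` is hypothetical, and the strings `'df0'` and `'df2'` stand in for real `pandas.DataFrame` objects.

```python
def local_partitions(interface):
    """Return the raw data of all locally owned partitions (illustrative)."""
    if 'locals' not in interface:
        raise KeyError("'locals' is only provided by SPMD/MPI-like containers")
    get = interface['get']
    parts = interface['partitions']
    # Positions in 'locals' are used directly as keys; no scan over all ranks.
    return [get(parts[pos]['data']) for pos in interface['locals']]


# View of rank 0: partitions owned by rank 1 are None.
rank0_view = {
    'shape': (4, 1),
    'partitions': {
        (0, 0): {'start': (0, 0), 'shape': (2, 8), 'data': 'df0', 'location': 0},
        (1, 0): {'start': (2, 0), 'shape': (2, 8), 'data': None, 'location': 1},
        (2, 0): {'start': (4, 0), 'shape': (2, 8), 'data': 'df2', 'location': 0},
        (3, 0): {'start': (6, 0), 'shape': (2, 8), 'data': None, 'location': 1},
    },
    'get': lambda x: x,
    'locals': [(0, 0), (2, 0)],
}

mine = local_partitions(rank0_view)
```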
59
67
68
+
60
69
## Examples
61
70
### 1d-data-structure (64 elements), 1d-partition-grid, 4 partitions on 4 nodes, blocked distribution, partitions are of type `ray.ObjectRef`, Ray
62
71
```python
63
72
__partitioned_interface__ = {
64
-
‘shape’: (4,),
65
-
‘partitions’: {
73
+
'shape': (4,),
74
+
'partitions': {
66
75
(0,): {
67
76
'start': (0,),
68
77
'shape': (16,),
69
78
'data': ObjRef0,
70
-
'location': ‘1.1.1.1’, }
79
+
'location': '1.1.1.1', },
71
80
(1,): {
72
81
'start': (16,),
73
82
'shape': (16,),
74
83
'data': ObjRef1,
75
-
'location': ‘1.1.1.2’, }
84
+
'location': '1.1.1.2', },
76
85
(2,): {
77
86
'start': (32,),
78
87
'shape': (16,),
79
88
'data': ObjRef2,
80
-
'location': ‘1.1.1.3’, }
89
+
'location': '1.1.1.3', },
81
90
(3,): {
82
91
'start': (48,),
83
92
'shape': (16,),
84
93
'data': ObjRef3,
85
-
'location': ‘1.1.1.4’, }
86
-
}
94
+
'location': '1.1.1.4', }
95
+
},
96
+
'get': lambda x: ray.get(x)
87
97
}
88
98
```
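The `start` and `shape` entries of a 1d blocked layout like the one above can be sanity-checked by a consumer. The following is a sketch under the assumption that partitions are expected to tile the index space without gaps or overlap; the helper `covers_1d` and the string handles are hypothetical.

```python
def covers_1d(interface, n):
    """Check that 1d partitions tile the range [0, n) exactly (illustrative)."""
    spans = sorted(
        (p['start'][0], p['start'][0] + p['shape'][0])
        for p in interface['partitions'].values()
    )
    expected_begin = 0
    for begin, end in spans:
        if begin != expected_begin:  # gap or overlap
            return False
        expected_begin = end
    return expected_begin == n


# Same layout as the Ray example; strings stand in for object references.
blocked = {
    'shape': (4,),
    'partitions': {
        (i,): {'start': (16 * i,), 'shape': (16,), 'data': f'ObjRef{i}',
               'location': f'1.1.1.{i + 1}'}
        for i in range(4)
    },
}

ok = covers_1d(blocked, 64)
```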
89
99
### 2d-structure (64 elements), 2d-partition-grid, 4 partitions on 2 nodes, block-cyclic distribution, partitions are of type `dask.Future`, dask
90
100
```python
91
101
__partitioned_interface__ = {
92
-
‘shape’: (2,2),
93
-
‘partitions’: {
102
+
'shape': (2,2),
103
+
'partitions': {
94
104
(1,1): {
95
105
'start': (4, 4),
96
106
'shape': (4, 4),
97
107
'data': future0,
98
-
'location': ‘Alice’, },
108
+
'location': 'Alice', },
99
109
(1,0): {
100
110
'start': (4, 0),
101
111
'shape': (4, 4),
102
112
'data': future1,
103
-
'location': ‘1.1.1.2:55667’, },
113
+
'location': '1.1.1.2:55667', },
104
114
(0,1): {
105
115
'start': (0, 4),
106
116
'shape': (4, 4),
107
117
'data': future2,
108
-
'location': ‘Alice’, },
118
+
'location': 'Alice', },
109
119
(0,0): {
110
120
'start': (0,0),
111
121
'shape': (4, 4),
112
122
'data': future3,
113
-
'location': ‘1.1.1.2:55667’, },
123
+
'location': '1.1.1.2:55667', },
114
124
},
125
+
'get': lambda x: x.result()
115
126
}
116
127
```
117
128
### 2d-structure (64 elements), 1d-partition-grid, 4 partitions on 2 ranks, row-block-cyclic distribution, partitions are of type `pandas.DataFrame`, MPI
118
129
```python
119
130
__partitioned_interface__ = {
120
-
‘shape’: (4,1),
121
-
‘partitions’: {
131
+
'shape': (4,1),
132
+
'partitions': {
122
133
(0,0): {
123
134
'start': (0, 0),
124
135
'shape': (2, 8),
@@ -139,7 +150,8 @@ __partitioned_interface__ = {
139
150
'shape': (2, 8),
140
151
'data': None, # this is for rank 0, for rank 1 it'd be df3
141
152
'location': 1, },
142
-
}
153
+
},
154
+
'get': lambda x: x,
143
155
'locals': [(0,0), (2,0)] # this is for rank 0, for rank 1 it'd be [(1,0), (3,0)]