.. _python-spec:

Python Specification for DLPack
===============================

The Python specification for DLPack is a part of the
`Python array API standard <https://data-apis.org/array-api/latest/index.html>`_.
More details about the spec can be found under the :ref:`data-interchange` page.


Syntax for data interchange with DLPack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The array API will offer the following syntax for data interchange:

1. A ``from_dlpack(x, ...)`` function, which accepts any (array) object with
   the two DLPack methods implemented (see below) and uses them to construct
   a new array containing the data from ``x``.
2. ``__dlpack__`` and ``__dlpack_device__`` methods on the
   array object, which will be called from within ``from_dlpack``, to query
   what device the array is on (may be needed to pass in the correct
   stream, e.g. in the case of multiple GPUs) and to access the data; a
   minimal sketch of this call sequence follows the list.
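
To make the call sequence concrete, here is a minimal, hypothetical
consumer-side sketch. The helper ``_wrap_capsule`` is an illustrative
placeholder (not part of the standard) for the consumer's typically C-level
code that takes ownership of the ``DLManagedTensor`` and wraps it in an array:

.. code-block:: python

    def from_dlpack(x):
        """Sketch of a consumer-side from_dlpack (illustrative only)."""
        # Query the device first; it may determine which stream to pass below.
        device_type, device_id = x.__dlpack_device__()
        # Ask the producer for a PyCapsule wrapping a DLManagedTensor.
        capsule = x.__dlpack__()
        # Hypothetical helper: takes ownership of the DLManagedTensor inside
        # the capsule and builds the consumer's array type around it.
        return _wrap_capsule(capsule, device_type, device_id)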


Semantics
~~~~~~~~~

DLPack describes the memory layout of dense, strided, n-dimensional arrays.
When a user calls ``y = from_dlpack(x)``, the library implementing ``x`` (the
"producer") will provide access to the data from ``x`` to the library
containing ``from_dlpack`` (the "consumer"). If possible, this must be
zero-copy (i.e. ``y`` will be a *view* on ``x``). If not possible, that library
may flag this and make a copy of the data. In both cases:

- The producer keeps owning the memory of ``x`` (and of ``y`` if a copy is made)
- ``y`` may or may not be a view; the user must therefore keep in mind the
  recommendation to avoid mutating ``y`` - see :ref:`copyview-mutability` and
  the example below.
- Both ``x`` and ``y`` may continue to be used just like arrays created in other ways.
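
For example, with NumPy (version 1.22 or later), which implements both sides of
the protocol, a NumPy-to-NumPy exchange is zero-copy, so mutations are visible
through both arrays:

.. code-block:: python

    import numpy as np

    x = np.arange(4)
    y = np.from_dlpack(x)  # zero-copy in this case: y is a view on x
    y[0] = 99              # the mutation is visible through x as well,
    assert x[0] == 99      # which is why mutating y is discouraged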

If an array that is accessed via the interchange protocol lives on a device that
the requesting (consumer) library does not support, it is recommended to raise a
``BufferError``, unless an explicit copy is requested (see below) and the producer
can support the request.
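
A consumer might implement that recommendation along the following lines
(a sketch; ``SUPPORTED_DEVICE_TYPES`` is a hypothetical set of ``DLDeviceType``
values the consumer can handle):

.. code-block:: python

    device_type, device_id = x.__dlpack_device__()
    if device_type not in SUPPORTED_DEVICE_TYPES:  # hypothetical consumer-defined set
        raise BufferError(
            f"from_dlpack: data lives on an unsupported device type ({device_type})"
        )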

Stream handling through the ``stream`` keyword applies to CUDA and ROCm (and
perhaps to other devices that have a stream concept, though those haven't been
considered in detail). The consumer must pass the stream it will use to the
producer; the producer must synchronize or wait on that stream when necessary.
In the common case of the default stream being used, no synchronization is
needed, so asynchronous execution remains possible.
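
For example, a CUDA consumer might pass the stream it will launch its kernels
on as an integer handle (a sketch; ``consumer_stream_ptr`` is a hypothetical
``cudaStream_t`` value, and the exact integer conventions for ``stream`` are
specified by the array API standard):

.. code-block:: python

    device_type, device_id = x.__dlpack_device__()
    if device_type == 2:  # kDLCUDA
        # The producer must make the data safe to use on this stream,
        # synchronizing or recording an event if necessary.
        capsule = x.__dlpack__(stream=consumer_stream_ptr)
    else:
        capsule = x.__dlpack__()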

Starting with the v2023 release of the Python array API standard, a copy can be
explicitly requested (or disabled) through the new ``copy`` argument of
``from_dlpack()``. When a copy is made, the producer must set the
``DLPACK_FLAG_BITMASK_IS_COPIED`` bit flag. It is also possible to request
cross-device copies through the new ``device`` argument, though the v2023
standard only mandates support for ``kDLCPU``.
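
Usage might look as follows (a sketch; the exact semantics and accepted
``device`` values are specified by the array API standard):

.. code-block:: python

    y = from_dlpack(x, copy=True)   # always copy; the producer must then set
                                    # the DLPACK_FLAG_BITMASK_IS_COPIED flag
    z = from_dlpack(x, copy=False)  # never copy; fails if a copy is unavoidable
    w = from_dlpack(x, device=(1, 0))  # (kDLCPU, 0): request a cross-device
                                       # copy to CPU, the only target the
                                       # v2023 standard mandates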

Implementation
~~~~~~~~~~~~~~

*Note that while this API standard largely tries to avoid discussing
implementation details, some discussion and requirements are needed
here because data interchange requires coordination between
implementers on, e.g., memory management.*

.. image:: /_static/images/DLPack_diagram.png
   :alt: Diagram of DLPack structs

*DLPack diagram. Dark blue are the structs it defines, light blue
struct members, gray text enum values of supported devices and data
types.*

Starting with the v2023 array API standard, a new ``max_version`` argument
was added to ``__dlpack__`` so that the consumer can signal to the producer
the maximum DLPack version it supports. Starting with DLPack 1.0, the
``DLManagedTensorVersioned`` struct should be used and the existing
``DLManagedTensor`` struct is considered deprecated, though a library should
try to support both during the transition period if possible.

In the rest of this document, ``DLManagedTensorVersioned`` and ``DLManagedTensor``
are treated as synonyms, assuming ``max_version`` has been handled properly so
that the right struct is chosen. As far as capsule names are concerned, when
``DLManagedTensorVersioned`` is in use the names ``dltensor``
and ``used_dltensor`` need a ``_versioned`` suffix.
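
On the producer side, version negotiation might look like the following sketch
(``_make_legacy_capsule`` and ``_make_versioned_capsule`` are hypothetical
helpers that build the ``"dltensor"`` and ``"dltensor_versioned"`` capsules,
respectively):

.. code-block:: python

    def __dlpack__(self, *, stream=None, max_version=None, dl_device=None,
                   copy=None):
        if max_version is None or max_version[0] < 1:
            # Consumer predates DLPack 1.0: fall back to the legacy
            # DLManagedTensor during the transition period.
            return self._make_legacy_capsule(stream=stream)
        # Consumer understands DLPack >= 1.0: use DLManagedTensorVersioned.
        return self._make_versioned_capsule(stream=stream, dl_device=dl_device,
                                            copy=copy)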

The ``__dlpack__`` method will produce a ``PyCapsule`` containing a
``DLManagedTensor``, which will be consumed immediately within
``from_dlpack`` - therefore it is consumed exactly once, and it will not be
visible to users of the Python API.

The producer must set the ``PyCapsule`` name to ``"dltensor"`` so that
it can be inspected by name, and set a ``PyCapsule_Destructor`` that calls
the ``deleter`` of the ``DLManagedTensor`` when the ``"dltensor"``-named
capsule is no longer needed.

The consumer must transfer ownership of the ``DLManagedTensor`` from the
capsule to its own object. It does so by renaming the capsule to
``"used_dltensor"`` to ensure that ``PyCapsule_Destructor`` will not get
called (ensured if ``PyCapsule_Destructor`` calls ``deleter`` only for
capsules whose name is ``"dltensor"``); instead, the ``deleter`` of the
``DLManagedTensor`` will be called by the destructor of the consumer
library object created to own the ``DLManagedTensor`` obtained from the
capsule. Below is an example of the capsule deleter, written with the Python
C API, which is called either when the refcount on the capsule named
``"dltensor"`` reaches zero or when the consumer decides to deallocate its array:

.. code-block:: C

    static void dlpack_capsule_deleter(PyObject *self){
       if (PyCapsule_IsValid(self, "used_dltensor")) {
          return; /* Do nothing if the capsule has been consumed. */
       }

       DLManagedTensor *managed = (DLManagedTensor *)PyCapsule_GetPointer(self, "dltensor");
       if (managed == NULL) {
          PyErr_WriteUnraisable(self);
          return;
       }
       /* The spec says the deleter can be NULL if there is no way for
          the caller to provide a reasonable destructor. */
       if (managed->deleter) {
          managed->deleter(managed);
       }
    }

Note: the capsule names ``"dltensor"`` and ``"used_dltensor"`` must be
statically allocated.

The ``DLManagedTensor`` deleter must ensure that sharing beyond Python
boundaries is possible; this means that the GIL must be acquired explicitly
if the deleter uses Python objects or the Python API.
In Python, the deleter usually needs to ``Py_DECREF()`` the original owner
and free the ``DLManagedTensor`` allocation.
For example, NumPy uses the following code to ensure that sharing with
arbitrary non-Python code is safe:

.. code-block:: C

    static void array_dlpack_deleter(DLManagedTensor *self)
    {
        /*
         * Leak the Python object if the Python runtime is not available.
         * This can happen if the DLPack consumer destroys the tensor late
         * after Python runtime finalization (for example in case the tensor
         * was indirectly kept alive by a C++ static variable).
         */
        if (!Py_IsInitialized()) {
            return;
        }

        PyGILState_STATE state = PyGILState_Ensure();

        PyObject *array = (PyObject *)self->manager_ctx;
        // This will also free the shape and strides as it's one allocation.
        PyMem_Free(self);
        Py_XDECREF(array);

        PyGILState_Release(state);
    }

When the ``strides`` field in the ``DLTensor`` struct is ``NULL``, it indicates a
row-major compact array. If the array is of size zero, the data pointer in
``DLTensor`` should be set to either ``NULL`` or ``0``.
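
For illustration, a hypothetical consumer-side helper could reconstruct the
strides (in numbers of elements, as DLPack counts them) implied by a ``NULL``
``strides`` field:

.. code-block:: python

    def compact_row_major_strides(shape):
        """Strides of a compact row-major array, in elements, as implied
        when DLTensor.strides is NULL (illustrative helper only)."""
        strides = [1] * len(shape)
        for i in range(len(shape) - 2, -1, -1):
            strides[i] = strides[i + 1] * shape[i + 1]
        return strides

    assert compact_row_major_strides([2, 3, 4]) == [12, 4, 1]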

For further details on DLPack design and how to implement support for it,
refer to `github.com/dmlc/dlpack <https://github.com/dmlc/dlpack>`_.

.. warning::
   DLPack contains a ``device_id``, which will be the device
   ID (an integer, ``0, 1, ...``) which the producer library uses. In
   practice this will likely be the same numbering as that of the
   consumer, however that is not guaranteed. Depending on the hardware
   type, it may be possible for the consumer library implementation to
   look up the actual device from the pointer to the data - this is
   possible for example for CUDA device pointers.

   It is recommended that implementers of this array API consider and document
   whether the ``.device`` attribute of the array returned from ``from_dlpack`` is
   guaranteed to be in a certain order or not.


Reference Implementations
~~~~~~~~~~~~~~~~~~~~~~~~~

Several Python libraries have adopted this standard using the Python C API, C++,
Cython, ctypes, cffi, etc.:

* NumPy: `Python C API <https://github.com/numpy/numpy/blob/main/numpy/core/src/multiarray/dlpack.c>`__
* CuPy: `Cython <https://github.com/cupy/cupy/blob/master/cupy/_core/dlpack.pyx>`__
* TensorFlow: `C++ <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/eager/dlpack.cc>`__,
  `Python wrapper using the Python C API <https://github.com/tensorflow/tensorflow/blob/a97b01a4ff009ed84a571c138837130a311e74a7/tensorflow/python/tfe_wrapper.cc#L1562>`__,
  `XLA <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/python/dlpack.cc>`__
* PyTorch: `C++ <https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/DLConvertor.cpp>`__,
  `Python wrapper using the Python C API <https://github.com/pytorch/pytorch/blob/c22b8a42e6038ed2f6a161114cf3d8faac3f6e9a/torch/csrc/Module.cpp#L355>`__
* MXNet: `ctypes <https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/dlpack.py>`__
* TVM: `ctypes <https://github.com/apache/tvm/blob/main/python/tvm/_ffi/_ctypes/ndarray.py>`__,
  `Cython <https://github.com/apache/tvm/blob/main/python/tvm/_ffi/_cython/ndarray.pxi>`__
* mpi4py: `Cython <https://github.com/mpi4py/mpi4py/blob/master/src/mpi4py/MPI/asdlpack.pxi>`_