Commit 1b53995

author: dlpack-gh-actions-bot
Generate DLPack website

55 files changed: +26941 −0 lines
.nojekyll

Whitespace-only changes.

latest/.buildinfo

# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: aa5259f4f73fb5580435e0671ae0c59c
tags: 645f666f9bcd5a90fca523b33c5a78b7

latest/.doctrees/c_api.doctree

129 KB
Binary file not shown.

latest/.doctrees/environment.pickle

152 KB
Binary file not shown.

latest/.doctrees/index.doctree

22.3 KB
Binary file not shown.

latest/.doctrees/python_spec.doctree

41.3 KB
Binary file not shown.

latest/_images/DLPack_diagram.png

23.4 KB
Binary file not shown.

latest/_sources/c_api.rst.txt

.. _c_api:

C API (``dlpack.h``)
====================

Macros
~~~~~~

.. doxygendefine:: DLPACK_EXTERN_C

.. doxygendefine:: DLPACK_MAJOR_VERSION

.. doxygendefine:: DLPACK_MINOR_VERSION

.. doxygendefine:: DLPACK_DLL

.. doxygendefine:: DLPACK_FLAG_BITMASK_READ_ONLY

.. doxygendefine:: DLPACK_FLAG_BITMASK_IS_COPIED

Enumerations
~~~~~~~~~~~~

.. doxygenenum:: DLDeviceType

.. doxygenenum:: DLDataTypeCode

Structs
~~~~~~~

.. doxygenstruct:: DLPackVersion
   :members:

.. doxygenstruct:: DLDevice
   :members:

.. doxygenstruct:: DLDataType
   :members:

.. doxygenstruct:: DLTensor
   :members:

.. doxygenstruct:: DLManagedTensor
   :members:

.. doxygenstruct:: DLManagedTensorVersioned
   :members:
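To make the struct layout concrete, here is a minimal sketch of what ``DLDevice``, ``DLDataType``, and ``DLTensor`` look like when mirrored in Python with the standard library's ``ctypes``. This is an illustration only, assuming the field order from the DLPack 1.x ``dlpack.h`` header; it is not part of the documented API.

```python
import ctypes

class DLDevice(ctypes.Structure):
    # Device on which the tensor memory lives.
    _fields_ = [
        ("device_type", ctypes.c_int),   # a DLDeviceType enum value
        ("device_id", ctypes.c_int32),
    ]

class DLDataType(ctypes.Structure):
    # Packed dtype descriptor: 4 bytes total.
    _fields_ = [
        ("code", ctypes.c_uint8),    # a DLDataTypeCode enum value
        ("bits", ctypes.c_uint8),    # bits per lane, e.g. 32 for float32
        ("lanes", ctypes.c_uint16),  # vector lanes; 1 for scalar types
    ]

class DLTensor(ctypes.Structure):
    # The plain tensor view; DLManagedTensor(Versioned) wraps this together
    # with a manager context and a deleter.
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("device", DLDevice),
        ("ndim", ctypes.c_int32),
        ("dtype", DLDataType),
        ("shape", ctypes.POINTER(ctypes.c_int64)),
        ("strides", ctypes.POINTER(ctypes.c_int64)),
        ("byte_offset", ctypes.c_uint64),
    ]

# Example dtype descriptor: scalar float32 (kDLFloat has code 2).
f32 = DLDataType(code=2, bits=32, lanes=1)
print(ctypes.sizeof(DLDataType))   # -> 4
```

A mirror like this is how ctypes-based bindings (e.g. MXNet's) interact with DLPack capsules from pure Python.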

latest/_sources/index.rst.txt

Welcome to DLPack's documentation!
==================================


Purpose
~~~~~~~

In order for an ndarray system to interact with a variety of frameworks, a
stable in-memory data structure is needed.

DLPack is one such data structure that allows exchange between major
frameworks. It is developed with input from many deep learning system core
developers. Highlights include:

* Minimal and stable: :ref:`simple header <c_api>`
* Designed for cross-hardware support: CPU, CUDA, OpenCL, Vulkan, Metal, VPI,
  ROCm, WebGPU, Hexagon
* Already a standard with wide community adoption and support:

  * `NumPy <https://numpy.org/doc/stable/release/1.22.0-notes.html#add-nep-47-compatible-dlpack-support>`_
  * `CuPy <https://docs.cupy.dev/en/stable/reference/generated/cupy.fromDlpack.html>`_
  * `PyTorch <https://pytorch.org/docs/stable/dlpack.html>`_
  * `Tensorflow <https://www.tensorflow.org/api_docs/python/tf/experimental/dlpack/from_dlpack>`_
  * `MXNet <https://mxnet.apache.org/versions/master/api/python/docs/_modules/mxnet/dlpack.html>`_
  * `TVM <https://tvm.apache.org/docs/reference/api/python/contrib.html#module-tvm.contrib.dlpack>`_
  * `mpi4py <https://mpi4py.readthedocs.io/en/stable/overview.html#support-for-gpu-aware-mpi>`_

* Clean C ABI compatibility.

  * Means you can create and access it from any language.
  * It is also essential for building JIT and AOT compilers to support these
    data types.


Scope
~~~~~

The main design rationale of DLPack is minimalism. DLPack drops consideration
of allocators and device APIs and focuses on the minimal data structure, while
still accounting for cross-hardware needs (e.g. the data field is opaque on
platforms that do not support normal addressing).

It also simplifies parts of the design to avoid legacy issues (e.g. everything
is assumed to be row-major, strides can be used to support other cases, and
the complexity of considering more layouts is avoided).


Roadmap
~~~~~~~

* C API that could be exposed as a new Python attribute ``__dlpack_info__``
  for returning API and ABI versions. (see `#34 <https://github.com/dmlc/dlpack/issues/34>`_,
  `#72 <https://github.com/dmlc/dlpack/pull/72>`_)
* Clarify alignment requirements. (see
  `data-apis/array-api#293 <https://github.com/data-apis/array-api/issues/293>`_,
  `numpy/numpy#20338 <https://github.com/numpy/numpy/issues/20338>`_,
  `data-apis/array-api#293 (comment) <https://github.com/data-apis/array-api/issues/293#issuecomment-964434449>`_)
* Adding support for the boolean data type. (see `#75 <https://github.com/dmlc/dlpack/issues/75>`_)
* Adding a read-only flag (ABI break), or making it a hard requirement in the
  spec that imported arrays must be treated as read-only. (see
  `data-apis/consortium-feedback#1 (comment) <https://github.com/data-apis/consortium-feedback/issues/1#issuecomment-675857753>`_,
  `data-apis/array-api#191 <https://github.com/data-apis/array-api/issues/191>`_)
* Standardize a C interface for stream exchange. (see `#74 <https://github.com/dmlc/dlpack/issues/74>`_,
  `#65 <https://github.com/dmlc/dlpack/issues/65>`_)


DLPack Documentation
~~~~~~~~~~~~~~~~~~~~

.. toctree::
   :maxdepth: 2

   c_api
   python_spec


Indices and tables
==================

* :ref:`genindex`
* :ref:`search`

latest/_sources/python_spec.rst.txt

.. _python-spec:

Python Specification for DLPack
===============================

The Python specification for DLPack is part of the
`Python array API standard <https://data-apis.org/array-api/latest/index.html>`_.
More details about the spec can be found under the :ref:`data-interchange` page.


Syntax for data interchange with DLPack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The array API will offer the following syntax for data interchange:

1. A ``from_dlpack(x, ...)`` function, which accepts any (array) object with
   the two DLPack methods implemented (see below) and uses them to construct
   a new array containing the data from ``x``.
2. ``__dlpack__`` and ``__dlpack_device__`` methods on the array object, which
   will be called from within ``from_dlpack``, to query what device the array
   is on (may be needed to pass in the correct stream, e.g. in the case of
   multiple GPUs) and to access the data.
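The call sequence above can be sketched with a toy producer; ``ToyArray`` and its return values are stand-ins for illustration (a real producer returns a ``PyCapsule``, and a real consumer builds an array from it), not part of the spec.

```python
class ToyArray:
    """A fake producer implementing the two protocol methods."""

    def __dlpack_device__(self):
        # Returns (device_type, device_id); 1 is kDLCPU in dlpack.h.
        return (1, 0)

    def __dlpack__(self, stream=None):
        # A real producer returns a PyCapsule named "dltensor" wrapping a
        # DLManagedTensor; a plain string stands in for it here.
        return "capsule:dltensor"

def from_dlpack(x):
    # Consumer side: query the device first (to decide on stream handling),
    # then request the data.
    device = x.__dlpack_device__()
    capsule = x.__dlpack__(stream=None)
    return device, capsule

print(from_dlpack(ToyArray()))   # -> ((1, 0), 'capsule:dltensor')
```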
23+
24+
25+
Semantics
26+
~~~~~~~~~
27+
28+
DLPack describes the memory layout of dense, strided, n-dimensional arrays.
29+
When a user calls ``y = from_dlpack(x)``, the library implementing ``x`` (the
30+
"producer") will provide access to the data from ``x`` to the library
31+
containing ``from_dlpack`` (the "consumer"). If possible, this must be
32+
zero-copy (i.e. ``y`` will be a *view* on ``x``). If not possible, that library
33+
may flag this and make a copy of the data. In both cases:
34+
35+
- The producer keeps owning the memory of ``x`` (and ``y`` if a copy is made)
36+
- ``y`` may or may not be a view, therefore the user must keep the recommendation to
37+
avoid mutating ``y`` in mind - see :ref:`copyview-mutability`.
38+
- Both ``x`` and ``y`` may continue to be used just like arrays created in other ways.

If an array that is accessed via the interchange protocol lives on a device
that the requesting (consumer) library does not support, it is recommended to
raise a ``BufferError``, unless an explicit copy is requested (see below) and
the producer can support the request.

Stream handling through the ``stream`` keyword applies to CUDA and ROCm
(perhaps to other devices that have a stream concept as well, however those
haven't been considered in detail). The consumer must pass the stream it will
use to the producer; the producer must synchronize or wait on that stream when
necessary. In the common case of the default stream being used, no
synchronization is necessary, so asynchronous execution is enabled.
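The producer's side of this handshake can be sketched with made-up names (``FakeGPUArray``, ``producer_stream``, ``sync_log`` are illustrative only); a real library would enqueue an actual synchronization (e.g. a stream-wait-event) instead of appending to a list:

```python
class FakeGPUArray:
    """Toy producer that records when it would synchronize streams."""

    def __init__(self):
        self.producer_stream = 7   # stream our pending writes run on
        self.sync_log = []

    def __dlpack__(self, stream=None):
        # If the consumer will read on a different stream, make that
        # stream wait for the producer's pending work before handing
        # over the data (a real impl. would do this via the device API).
        if stream is not None and stream != self.producer_stream:
            self.sync_log.append((self.producer_stream, stream))
        return "capsule:dltensor"

a = FakeGPUArray()
a.__dlpack__(stream=3)
print(a.sync_log)   # -> [(7, 3)]
```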

Starting with the v2023 revision of the Python array API standard, a copy can
be explicitly requested (or disabled) through the new ``copy`` argument of
``from_dlpack()``. When a copy is made, the producer must set the
``DLPACK_FLAG_BITMASK_IS_COPIED`` bit flag. It is also possible to request
cross-device copies through the new ``device`` argument, though the v2023
standard only mandates support of ``kDLCPU``.
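A consumer checks these bits on the ``flags`` field of ``DLManagedTensorVersioned``. The sketch below uses the bit values from ``dlpack.h`` (read-only is bit 0, is-copied is bit 1); the ``describe`` helper is illustrative, and ``flags`` is a plain integer standing in for the struct field:

```python
# Bit values as defined in dlpack.h.
DLPACK_FLAG_BITMASK_READ_ONLY = 1 << 0
DLPACK_FLAG_BITMASK_IS_COPIED = 1 << 1

def describe(flags):
    # Decode the flags field of a DLManagedTensorVersioned.
    return {
        "read_only": bool(flags & DLPACK_FLAG_BITMASK_READ_ONLY),
        "is_copied": bool(flags & DLPACK_FLAG_BITMASK_IS_COPIED),
    }

print(describe(DLPACK_FLAG_BITMASK_IS_COPIED))
# -> {'read_only': False, 'is_copied': True}
```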

Implementation
~~~~~~~~~~~~~~

*Note that while this API standard largely tries to avoid discussing
implementation details, some discussion and requirements are needed
here because data interchange requires coordination between
implementers on, e.g., memory management.*

.. image:: /_static/images/DLPack_diagram.png
   :alt: Diagram of DLPack structs

*DLPack diagram. Dark blue are the structs it defines, light blue the
struct members, and gray text the enum values of supported devices and
data types.*

Starting with the v2023 revision of the Python array API standard, a new
``max_version`` argument is added to ``__dlpack__`` for the consumer to
signal to the producer the maximal supported DLPack version. Starting with
DLPack 1.0, the ``DLManagedTensorVersioned`` struct should be used and the
existing ``DLManagedTensor`` struct is considered deprecated, though a
library should try to support both during the transition period if possible.

In the rest of this document, ``DLManagedTensorVersioned`` and
``DLManagedTensor`` are treated as synonyms, assuming proper handling of
``max_version`` has been done to choose the right struct. As far as the
capsule name is concerned, when ``DLManagedTensorVersioned`` is in use the
capsule names ``dltensor`` and ``used_dltensor`` need a ``_versioned`` suffix.
86+
The ``__dlpack__`` method will produce a ``PyCapsule`` containing a
87+
``DLManagedTensor``, which will be consumed immediately within
88+
``from_dlpack`` - therefore it is consumed exactly once, and it will not be
89+
visible to users of the Python API.
90+
91+
The producer must set the ``PyCapsule`` name to ``"dltensor"`` so that
92+
it can be inspected by name, and set ``PyCapsule_Destructor`` that calls
93+
the ``deleter`` of the ``DLManagedTensor`` when the ``"dltensor"``-named
94+
capsule is no longer needed.
95+
96+
The consumer must transer ownership of the ``DLManagedTensor`` from the
97+
capsule to its own object. It does so by renaming the capsule to
98+
``"used_dltensor"`` to ensure that ``PyCapsule_Destructor`` will not get
99+
called (ensured if ``PyCapsule_Destructor`` calls ``deleter`` only for
100+
capsules whose name is ``"dltensor"``), but the ``deleter`` of the
101+
``DLManagedTensor`` will be called by the destructor of the consumer
102+
library object created to own the ``DLManagedTensor`` obtained from the
103+
capsule. Below is an example of the capsule deleter written in the Python
104+
C API which is called either when the refcount on the capsule named
105+
``"dltensor"`` reaches zero or the consumer decides to deallocate its array:
106+
107+
.. code-block:: C
108+
109+
static void dlpack_capsule_deleter(PyObject *self){
110+
if (PyCapsule_IsValid(self, "used_dltensor")) {
111+
return; /* Do nothing if the capsule has been consumed. */
112+
}
113+
114+
DLManagedTensor *managed = (DLManagedTensor *)PyCapsule_GetPointer(self, "dltensor");
115+
if (managed == NULL) {
116+
PyErr_WriteUnraisable(self);
117+
return;
118+
}
119+
/* the spec says the deleter can be NULL if there is no way for the caller to provide a reasonable destructor. */
120+
if (managed->deleter) {
121+
managed->deleter(managed);
122+
}
123+
}
124+
125+
Note: the capsule names ``"dltensor"`` and ``"used_dltensor"`` must be
126+
statically allocated.
127+
128+
The ``DLManagedTensor`` deleter must ensure that sharing beyond Python
129+
boundaries is possible, this means that the GIL must be acquired explicitly
130+
if it uses Python objects or API.
131+
In Python, the deleter usually needs to ``Py_DECREF()`` the original owner
132+
and free the ``DLManagedTensor`` allocation.
133+
For example, NumPy uses the following code to ensure sharing with arbitrary
134+
non-Python code is safe:
135+
136+
.. code-block:: C
137+
138+
static void array_dlpack_deleter(DLManagedTensor *self)
139+
{
140+
/*
141+
* Leak the Python object if the Python runtime is not available.
142+
* This can happen if the DLPack consumer destroys the tensor late
143+
* after Python runtime finalization (for example in case the tensor
144+
* was indirectly kept alive by a C++ static variable).
145+
*/
146+
if (!Py_IsInitialized()) {
147+
return;
148+
}
149+
150+
PyGILState_STATE state = PyGILState_Ensure();
151+
152+
PyObject *array = (PyObject *)self->manager_ctx;
153+
// This will also free the shape and strides as it's one allocation.
154+
PyMem_Free(self);
155+
Py_XDECREF(array);
156+
157+
PyGILState_Release(state);
158+
}
159+
160+
When the ``strides`` field in the ``DLTensor`` struct is ``NULL``, it indicates a
161+
row-major compact array. If the array is of size zero, the data pointer in
162+
``DLTensor`` should be set to either ``NULL`` or ``0``.
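What ``strides == NULL`` implies can be made explicit with a small helper (illustrative, not part of the spec): the compact row-major strides a consumer may reconstruct from the shape, expressed in element counts as in ``DLTensor``, not bytes:

```python
def compact_row_major_strides(shape):
    # The innermost dimension has stride 1; each outer dimension's stride
    # is the product of all inner dimension extents.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

print(compact_row_major_strides([2, 3, 4]))   # -> [12, 4, 1]
```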

For further details on DLPack design and how to implement support for it,
refer to `github.com/dmlc/dlpack <https://github.com/dmlc/dlpack>`_.

.. warning::
   DLPack contains a ``device_id``, which will be the device
   ID (an integer, ``0, 1, ...``) which the producer library uses. In
   practice this will likely be the same numbering as that of the
   consumer, however that is not guaranteed. Depending on the hardware
   type, it may be possible for the consumer library implementation to
   look up the actual device from the pointer to the data - this is
   possible for example for CUDA device pointers.

It is recommended that implementers of this array API consider and document
whether the ``.device`` attribute of the array returned from ``from_dlpack``
is guaranteed to be in a certain order or not.


Reference Implementations
~~~~~~~~~~~~~~~~~~~~~~~~~

Several Python libraries have adopted this standard using the Python C API,
C++, Cython, ctypes, cffi, etc.:

* NumPy: `Python C API <https://github.com/numpy/numpy/blob/main/numpy/core/src/multiarray/dlpack.c>`__
* CuPy: `Cython <https://github.com/cupy/cupy/blob/master/cupy/_core/dlpack.pyx>`__
* Tensorflow: `C++ <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/eager/dlpack.cc>`__,
  `Python wrapper using Python C API <https://github.com/tensorflow/tensorflow/blob/a97b01a4ff009ed84a571c138837130a311e74a7/tensorflow/python/tfe_wrapper.cc#L1562>`__,
  `XLA <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/python/dlpack.cc>`__
* PyTorch: `C++ <https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/DLConvertor.cpp>`__,
  `Python wrapper using Python C API <https://github.com/pytorch/pytorch/blob/c22b8a42e6038ed2f6a161114cf3d8faac3f6e9a/torch/csrc/Module.cpp#L355>`__
* MXNet: `ctypes <https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/dlpack.py>`__
* TVM: `ctypes <https://github.com/apache/tvm/blob/main/python/tvm/_ffi/_ctypes/ndarray.py>`__,
  `Cython <https://github.com/apache/tvm/blob/main/python/tvm/_ffi/_cython/ndarray.pxi>`__
* mpi4py: `Cython <https://github.com/mpi4py/mpi4py/blob/master/src/mpi4py/MPI/asdlpack.pxi>`__
