Merge pull request #9 from asmeurer/docs-update
Update documentation
asmeurer authored Jan 24, 2024
2 parents e8a9164 + c0f1534 commit 397713f
Showing 4 changed files with 230 additions and 117 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -158,3 +158,5 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

.DS_Store
30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,30 @@
# array-api-strict Changelog

## 1.0 (????)

This is the first release of `array_api_strict`. It is extracted from
`numpy.array_api`, which was included as an experimental submodule in NumPy
versions prior to 2.0. Note that the commit history in this repository is
extracted from the git history of numpy/array_api/ (see the [README](README.md)).

Additionally, the following changes are new to `array_api_strict` from
`numpy.array_api` in NumPy 1.26 (the last NumPy feature release to include
`numpy.array_api`):

- ``array_api_strict`` was made more portable. In particular:

- ``array_api_strict`` no longer uses ``"cpu"`` as its "device", but rather a
separate ``CPU_DEVICE`` object (which is not accessible in the namespace).
This is because "cpu" is not part of the array API standard.

- ``array_api_strict`` now uses separate wrapped objects for dtypes.
Previously it reused the ``numpy`` dtype objects. This makes it clear
which behaviors on dtypes are part of the array API standard (effectively,
the standard only requires ``==`` on dtype objects).

- ``array_api_strict.nonzero`` now errors on zero-dimensional arrays, as
required by the array API standard.

- Support for the optional [fft
extension](https://data-apis.org/array-api/latest/extensions/fourier_transform_functions.html)
was added.
188 changes: 184 additions & 4 deletions README.md
@@ -1,13 +1,193 @@
array-api-strict
================
# array-api-strict

A strict, minimal implementation of the [Python array
`array_api_strict` is a strict, minimal implementation of the [Python array
API](https://data-apis.org/array-api/latest/).

The purpose of array-api-strict is to provide an implementation of the array
API for consuming libraries to test against so they can be completely sure
their usage of the array API is portable.

It is *not* intended to be used by end-users. End-users of the array API
should just use their favorite array library (NumPy, CuPy, PyTorch, etc.) as
usual. It is also not intended to be used as a dependency by consuming
libraries. Consuming library code should use the
[array-api-compat](https://github.com/data-apis/array-api-compat) package to
support the array API. Rather, it is intended to be used in the test suites of
consuming libraries to test their array API usage.

## Install

`array-api-strict` is available on both
[PyPI](https://pypi.org/project/array-api-strict/)

```
python -m pip install array-api-strict
```

and [Conda-forge](https://anaconda.org/conda-forge/array-api-strict)

```
conda install --channel conda-forge array-api-strict
```

array-api-strict supports NumPy 1.26 and (the upcoming) NumPy 2.0.

## Rationale

The array API has many functions and behaviors that are required to be
implemented by conforming libraries, but it does not, in most cases, disallow
implementing additional functions, keyword arguments, and behaviors that
aren't explicitly required by the standard.

However, this poses a problem for consumers of the array API, as they may
accidentally use a function or rely on a behavior which just happens to be
implemented in every array library they test against (e.g., NumPy and
PyTorch), but isn't required by the standard and may not be included in other
libraries.

array-api-strict solves this problem by providing a strict, minimal
implementation of the array API standard. Only those functions and behaviors
that are explicitly *required* by the standard are implemented. For example,
most NumPy functions accept Python scalars as inputs:

```py
>>> import numpy as np
>>> np.sin(0.0)
0.0
```

However, the standard only specifies function inputs on `Array` objects. And
indeed, some libraries, such as PyTorch, do not allow this:

```py
>>> import torch
>>> torch.sin(0.0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sin(): argument 'input' (position 1) must be Tensor, not float
```

In array-api-strict, this is also an error:

```py
>>> import array_api_strict as xp
>>> xp.sin(0.0)
Traceback (most recent call last):
...
AttributeError: 'float' object has no attribute 'dtype'
```
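
A portable call wraps the scalar in an array first. The following is a minimal sketch and not code from array-api-strict itself: `portable_sin` is a hypothetical helper, and `stand_in` is a toy namespace built from the standard library so that the example runs without any array library installed. The helper only uses `asarray` and `sin`, both of which are defined in the standard.

```python
import math
from types import SimpleNamespace


def portable_sin(xp, value):
    """Compute sin of a Python scalar using only calls the standard specifies."""
    # Wrap the scalar first: the standard only specifies sin() for array inputs.
    return xp.sin(xp.asarray(value))


# Stand-in namespace (an assumption, for illustration only) so that the sketch
# runs without any array library installed. In a real test suite, xp would be
# array_api_strict or another conforming namespace.
stand_in = SimpleNamespace(asarray=float, sin=math.sin)
result = portable_sin(stand_in, 0.0)
```

Passing `array_api_strict` as `xp` in place of `stand_in` should exercise the same code path against the strict implementation.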

Here is an (incomplete) list of the sorts of ways that array-api-strict is
strict/minimal:

- Only those functions and methods that are [defined in the
standard](https://data-apis.org/array-api/latest/API_specification/index.html)
are included.

- In those functions, only the keyword arguments that are defined by the
standard are included. All signatures in array-api-strict use
[positional-only
arguments](https://data-apis.org/array-api/latest/API_specification/function_and_method_signatures.html#function-and-method-signatures).
As noted above, only `array_api_strict` array objects are accepted by
functions, except in the places where the standard allows Python scalars
(i.e., functions do not automatically call `asarray` on their inputs).

- Only those [dtypes that are defined in the
standard](https://data-apis.org/array-api/latest/API_specification/data_types.html)
are included.

- All functions and methods reject inputs if the standard does not *require*
the input dtype(s) to be supported. This is one of the most restrictive
aspects of the library. For example, in NumPy, most transcendental functions
like `sin` will accept integer array inputs, but the [standard only requires
them to accept floating-point
inputs](https://data-apis.org/array-api/latest/API_specification/generated/array_api.sin.html#array_api.sin),
so in array-api-strict, `sin(integer_array)` will raise an exception.

- The
[indexing](https://data-apis.org/array-api/latest/API_specification/indexing.html)
semantics required by the standard are limited compared to those implemented
by NumPy (e.g., out-of-bounds slices are not supported, integer array
indexing is not supported, only a single boolean array index is supported).

- There are no distinct "scalar" objects as in NumPy. There are only 0-D
arrays.

- Dtype objects are just empty objects that only implement [equality
comparison](https://data-apis.org/array-api/latest/API_specification/generated/array_api.data_types.__eq__.html).
The way to access dtype objects in the standard is by name, like
`xp.float32`.

- The array object type itself is private and should not be accessed.
Subclassing or otherwise trying to directly initialize this object is not
supported. Arrays should be created with one of the [array creation
functions](https://data-apis.org/array-api/latest/API_specification/creation_functions.html)
such as `asarray`.
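
Several of the restrictions listed above can be pictured with a small pure-Python sketch. This is not the array-api-strict source: `DType`, `Array`, `strict_sin`, and `raised` are hypothetical names, and the choice of `TypeError` for the dtype check is an assumption. The sketch shows a dtype object that only supports equality, a positional-only signature, rejection of dtypes the standard does not require, and no implicit `asarray` call on inputs.

```python
import math


class DType:
    """Hypothetical dtype object: supports equality comparison and nothing else."""

    def __init__(self, name):
        self._name = name

    def __eq__(self, other):
        if not isinstance(other, DType):
            return NotImplemented
        return self._name == other._name

    def __hash__(self):
        # Defining __eq__ disables the default hash, so restore one.
        return hash(self._name)


float64 = DType("float64")
int64 = DType("int64")
FLOATING_DTYPES = (float64,)


class Array:
    """Hypothetical 0-D array holding a single Python value and a dtype."""

    def __init__(self, value, dtype):
        self.value = value
        self.dtype = dtype


def strict_sin(x, /):
    """sin() that accepts only floating-point arrays, as a positional argument."""
    # No asarray() call here: a bare Python float fails on the .dtype lookup.
    if x.dtype not in FLOATING_DTYPES:
        raise TypeError("Only floating-point dtypes are allowed in sin")
    return Array(math.sin(x.value), x.dtype)


def raised(func, *args, **kwargs):
    """Return the type of the exception raised by func, or None if it succeeds."""
    try:
        func(*args, **kwargs)
    except Exception as exc:
        return type(exc)
    return None


ok = strict_sin(Array(0.0, float64))
errors = {
    "integer array": raised(strict_sin, Array(0, int64)),
    "python scalar": raised(strict_sin, 0.0),
    "keyword argument": raised(strict_sin, x=Array(0.0, float64)),
}
```

In this sketch the Python scalar fails with `AttributeError` because the function reads `x.dtype` without first checking the input type, which is the same failure mode shown for `xp.sin(0.0)` earlier in this README.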

## Caveats

array-api-strict is a thin pure Python wrapper around NumPy. NumPy 2.0 fully
supports the array API but NumPy 1.26 does not, so many behaviors are wrapped
in NumPy 1.26 to provide array API compatible behavior. Although it is based
on NumPy, mixing NumPy arrays with array-api-strict arrays is not supported.
This should generally raise an error, as it indicates a potential portability
issue, but this hasn't necessarily been tested thoroughly.

1. array-api-strict is validated against the [array API test
suite](https://github.com/data-apis/array-api-tests). However, there may be
a few minor instances where NumPy deviates from the standard in a way that
is inconvenient to work around in array-api-strict, since it aims to remain
pure Python. You can see the full list of tests that are known to fail in
the [xfails
file](https://github.com/data-apis/array-api-strict/blob/main/array-api-tests-xfails.txt).

The most notable of these is that in NumPy 1.26, the `copy=False` flag is
not implemented for `asarray` and therefore `array_api_strict` raises
`NotImplementedError` in that case.

2. Since NumPy is a CPU-only library, the [device
support](https://data-apis.org/array-api/latest/design_topics/device_support.html)
in array-api-strict is superficial only. `x.device` is always a (private)
`CPU_DEVICE` object, and `device` keywords to creation functions only
accept either this object or `None`. A future version of array-api-strict
[may add support for a CuPy
backend](https://github.com/data-apis/array-api-strict/issues/5) so that
more significant device support can be tested.

3. Although only array types are expected in array-api-strict functions,
currently most functions do not do extensive type checking on their inputs,
so a sufficiently duck-typed object may pass through silently (or at best,
you may get `AttributeError` instead of `TypeError`). However, all type
signatures have type annotations (based on those from the standard), so
this deviation may be tested with type checking. This [behavior may improve
in the future](https://github.com/data-apis/array-api-strict/issues/6).

4. There are some behaviors in the standard that are not required to be
implemented by libraries that cannot support [data-dependent
shapes](https://data-apis.org/array-api/latest/design_topics/data_dependent_output_shapes.html).
This includes [the `unique_*`
functions](https://data-apis.org/array-api/latest/API_specification/set_functions.html),
[boolean array
indexing](https://data-apis.org/array-api/latest/API_specification/indexing.html#boolean-array-indexing),
and the
[`nonzero`](https://data-apis.org/array-api/latest/API_specification/generated/array_api.nonzero.html)
function. array-api-strict currently implements all of these. In the
future, [there may be a way to disable them](https://github.com/data-apis/array-api-strict/issues/7).

5. array-api-strict currently only supports the latest version of the array
API standard. [This may change in the future depending on
need](https://github.com/data-apis/array-api-strict/issues/8).
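
The superficial device support described in caveat 2 can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `_Device`, `check_device`, and `toy_zeros` are hypothetical names, `toy_zeros` does not follow the standard's `zeros` signature, and raising `ValueError` for an unsupported device is an assumption.

```python
class _Device:
    """Hypothetical private device type with a single instance."""

    def __repr__(self):
        return "CPU_DEVICE"


CPU_DEVICE = _Device()


def check_device(device):
    """Accept only the private CPU device object or None."""
    if device is not None and device is not CPU_DEVICE:
        raise ValueError(f"Unsupported device {device!r}")


def toy_zeros(n, /, *, device=None):
    """Toy creation function: validate the device, then return a list of zeros."""
    check_device(device)
    return [0.0] * n


# Both the default (None) and the private device object are accepted.
accepted = [toy_zeros(2), toy_zeros(2, device=CPU_DEVICE)]

# The string "cpu" is rejected, since it is not part of the standard.
try:
    toy_zeros(2, device="cpu")
except ValueError as exc:
    rejected = str(exc)
```

Because the device object is private, portable code would obtain it from an existing array (`x.device`) and pass it along, instead of constructing or naming a device itself.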

## Usage

TODO: Add a sample CI script here.

## Relationship to `numpy.array_api`

Previously this implementation was available as `numpy.array_api`, but it was
moved to a separate package for NumPy 2.0.

Note: the history of this repo prior to commit
Note that the history of this repo prior to commit
fbefd42e4d11e9be20e0a4785f2619fc1aef1e7c was generated automatically
from the numpy git history, using the following
[git-filter-repo](https://github.com/newren/git-filter-repo) command:
127 changes: 14 additions & 113 deletions array_api_strict/__init__.py
@@ -1,117 +1,18 @@
"""
A NumPy sub-namespace that conforms to the Python array API standard.
This submodule accompanies NEP 47, which proposes its inclusion in NumPy. It
is still considered experimental, and will issue a warning when imported.
This is a proof-of-concept namespace that wraps the corresponding NumPy
functions to give a conforming implementation of the Python array API standard
(https://data-apis.github.io/array-api/latest/). The standard is currently in
an RFC phase and comments on it are both welcome and encouraged. Comments
should be made either at https://github.com/data-apis/array-api or at
https://github.com/data-apis/consortium-feedback/discussions.
NumPy already follows the proposed spec for the most part, so this module
serves mostly as a thin wrapper around it. However, NumPy also implements a
lot of behavior that is not included in the spec, so this serves as a
restricted subset of the API. Only those functions that are part of the spec
are included in this namespace, and all functions are given with the exact
signature given in the spec, including the use of position-only arguments, and
omitting any extra keyword arguments implemented by NumPy but not part of the
spec. The behavior of some functions is also modified from the NumPy behavior
to conform to the standard. Note that the underlying array object itself is
wrapped in a wrapper Array() class, but is otherwise unchanged. This submodule
is implemented in pure Python with no C extensions.
The array API spec is designed as a "minimal API subset" and explicitly allows
libraries to include behaviors not specified by it. But users of this module
that intend to write portable code should be aware that only those behaviors
that are listed in the spec are guaranteed to be implemented across libraries.
Consequently, the NumPy implementation was chosen to be both conforming and
minimal, so that users can use this implementation of the array API namespace
and be sure that behaviors that it defines will be available in conforming
namespaces from other libraries.
A few notes about the current state of this submodule:
- There is a test suite that tests modules against the array API standard at
https://github.com/data-apis/array-api-tests. The test suite is still a work
in progress, but the existing tests pass on this module, with a few
exceptions:
- DLPack support (see https://github.com/data-apis/array-api/pull/106) is
not included here, as it requires a full implementation in NumPy proper
first.
The test suite is not yet complete, and even the tests that exist are not
guaranteed to give a comprehensive coverage of the spec. Therefore, when
reviewing and using this submodule, you should refer to the standard
documents themselves. There are some tests in array_api_strict.tests, but
they primarily focus on things that are not tested by the official array API
test suite.
- There is a custom array object, array_api_strict.Array, which is returned by
all functions in this module. All functions in the array API namespace
implicitly assume that they will only receive this object as input. The only
way to create instances of this object is to use one of the array creation
functions. It does not have a public constructor on the object itself. The
object is a small wrapper class around numpy.ndarray. The main purpose of it
is to restrict the namespace of the array object to only those dtypes and
only those methods that are required by the spec, as well as to limit/change
certain behavior that differs in the spec. In particular:
- The array API namespace does not have scalar objects, only 0-D arrays.
Operations on Array that would create a scalar in NumPy create a 0-D
array.
- Indexing: Only a subset of indices supported by NumPy are required by the
spec. The Array object restricts indexing to only allow those types of
indices that are required by the spec. See the docstring of the
array_api_strict.Array._validate_indices helper function for more
information.
- Type promotion: Some type promotion rules are different in the spec. In
particular, the spec does not have any value-based casting. The spec also
does not require cross-kind casting, like integer -> floating-point. Only
those promotions that are explicitly required by the array API
specification are allowed in this module. See NEP 47 for more info.
- Functions do not automatically call asarray() on their input, and will not
work if the input type is not Array. The exception is array creation
functions, and Python operators on the Array object, which accept Python
scalars of the same type as the array dtype.
- All functions include type annotations, corresponding to those given in the
spec (see _typing.py for definitions of some custom types). These do not
currently fully pass mypy due to some limitations in mypy.
- Dtype objects are just the NumPy dtype objects, e.g., float64 =
np.dtype('float64'). The spec does not require any behavior on these dtype
objects other than that they be accessible by name and be comparable by
equality, but it was considered too much extra complexity to create custom
objects to represent dtypes.
- All places where the implementations in this submodule are known to deviate
from their corresponding functions in NumPy are marked with "# Note:"
comments.
Still TODO in this module are:
- DLPack support for numpy.ndarray is still in progress. See
https://github.com/numpy/numpy/pull/19083.
- The copy=False keyword argument to asarray() is not yet implemented. This
requires support in numpy.asarray() first.
- Some functions are not yet fully tested in the array API test suite, and may
require updates that are not yet known until the tests are written.
- The spec is still in an RFC phase and may still have minor updates, which
will need to be reflected here.
- Complex number support in array API spec is planned but not yet finalized,
as are the fft extension and certain linear algebra functions such as eig
that require complex dtypes.
array_api_strict is a strict, minimal implementation of the Python array
API (https://data-apis.org/array-api/latest/)
The purpose of array-api-strict is to provide an implementation of the array
API for consuming libraries to test against so they can be completely sure
their usage of the array API is portable.
It is *not* intended to be used by end-users. End-users of the array API
should just use their favorite array library (NumPy, CuPy, PyTorch, etc.) as
usual. It is also not intended to be used as a dependency by consuming
libraries. Consuming library code should use the
array-api-compat (https://github.com/data-apis/array-api-compat) package to
support the array API. Rather, it is intended to be used in the test suites of
consuming libraries to test their array API usage.
"""
