Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding order argument to asarray #571

Closed
fcharras opened this issue Dec 14, 2022 · 9 comments
Closed

Adding order argument to asarray #571

fcharras opened this issue Dec 14, 2022 · 9 comments
Labels
API change Changes to existing functions or objects in the API. status: Rejected Proposed change was not accepted.

Comments

@fcharras
Copy link

A usecase I'm confronted to is to ensure that a given array has either a C or F memory layout.

The layout can be important when optimizing memory access patterns in lower level code. For instance in sklearn KMeans requires C order. Typically xp.asarray(X, order=order) is used to check that the input order matches the required order (if copy=False), or will trigger a copy if necessary to get the expected order (if copy!=False), before passing the array to lower level code (e.g cython or jitted functions).

Here's an example of API implemented by dpctl.tensor, an array-API compatible array library, that also exposes the order parameter to asarray.

@leofang
Copy link
Contributor

leofang commented Dec 14, 2022

I think we purposely leave strides (and thus order) out of the standard because it's considered too low-level. However this information is totally available through DLPack, I would think this should meet your need?

@fcharras
Copy link
Author

fcharras commented Dec 20, 2022

For my particular needs I have a workaround already, in general I would think that people that really need this can think of workarounds in some way, but at the cost of some extra lines of code that are going to be specific to one array library. Having order in asarray could help having better code for developers that do input validation in python.

@leofang
Copy link
Contributor

leofang commented Jan 15, 2023

at the cost of some extra lines of code that are going to be specific to one array library

Sorry but I disagree. If you check the strides obtained through DLPack (see, for example, mpi4py) it's generic to the any array library. This is arguably not natural to Python-only developers, and suggestions are welcome in this DLPack thread: dmlc/dlpack#116 🙂

Currently Array API is designed to not standardizing over strides (and thus order) so that people do not have to argue over the execution model. For example, it allows the two extremes, PyTorch (which is always C-order unless you tweak it very hard) and Dask (which has no notion of single-node strides, as it's distributed), to implement Array API support.

@fcharras
Copy link
Author

Sorry

No need to be ☺️ thank you for the insights.

I by no mean want to imply that the perspective I offer on the matter goes beyond the scope of the limited project I've started working on, that happens to be somewhat related to the Array API, which made me think this proposal could be relevant. If it resonates with other people experience, that would be interesting ! if it's not the case, let it be. Maybe the issue can remain opened for some time to see if there's other feedback ?

And as you say maybe I don't need strides at all myself. After all, all arrays can be seen as C-contiguous, it's a matter of defining carefully row and column indexes 🤷 . But then it's an abstraction that must be implemented at the level of my own user code if I really want to and it seems very verbose.

In the end I don't think the dlpack interface answers what I'm looking for specifically. Unless I'm missing something, the from_dlpack function isn't supposed to take any other argument (such as strides/order) than the object exposing __dlpack__, and the PyCapsule object doesn't expose attributes readable from python code, which are things I would have been looking for.

@rgommers rgommers added API change Changes to existing functions or objects in the API. status: Rejected Proposed change was not accepted. labels Jan 16, 2023
@rgommers
Copy link
Member

This has been brought up a couple of times before, although I cannot locate the most relevant discussion right now. The problem is mainly that the array API standard wants to only standardize things that are implemented or can be implemented in all array libraries. This is not the case for order=. TensorFlow, JAX and PyTorch all do not have Fortran memory order as a user-visible concept, and do not want to add it.

On top of that I'd say that it's not good UX. The problem with NumPy is that, due to the lack of more advanced facilities for automatically optimizing function calls for performance, it exposed a host of things that really should be implementation details (like views and C/Fortran memory order) directly to end users.

For order=, it will have to remain specific to the libraries that have such a concept. It can be implemented as a superset of what's in the array API standard, so there is no problem here.

The layout can be important when optimizing memory access patterns in lower level code. For instance in sklearn KMeans requires C order.

This is very much true. But it's an implementation detail; how to get optimized code internally within a library should be a separate question from what API is exposed to end users.

@fcharras
Copy link
Author

Thanks for your in-depth explanations @rgommers @leofang it makes the choice clear for me.

This has been brought up a couple of times before, although I cannot locate the most relevant discussion right now

Couldn't find previous discussions either, this issue now documents the answer :-)

It can be implemented as a superset of what's in the array API standard, so there is no problem here.

Yes that sounds like the right thing to do. (I hope array libraries that want to expose additional parameters can use the same semantics for similar concepts)

@rgommers
Copy link
Member

rgommers commented Jan 16, 2023

I hope array libraries that want to expose additional parameters can use the same semantics for similar concepts

I hope so too. AFAIK that's the case here, libraries that have the concept follow NumPy in API design I believe.

@leofang
Copy link
Contributor

leofang commented Jan 17, 2023

This has been brought up a couple of times before

Yeah IIRC I raised the discussion on strides during the meetings at least twice, and others did too. I had the same idea in mind as @fcharras did that it'd allow better interoperability with 3rd party (non-array/tensor) libraries. But in the end people (the stakeholders) are happy to settle on this choice for reasons Ralf and I mentioned above 🙂

@rgommers
Copy link
Member

Let's close this, since the reasons given seem enough. Thanks again @fcharras and @leofang.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API change Changes to existing functions or objects in the API. status: Rejected Proposed change was not accepted.
Projects
None yet
Development

No branches or pull requests

3 participants