[API Proposal]: IsDense/Sparse/Contiguous #111964

Closed
@michaelgsharp

Background and motivation

Tensors can be either dense (every element is represented in memory and the elements are next to each other) or sparse (either the elements are not next to each other in memory, for example after slicing, or the strides are manipulated to conserve memory so that the tensor represents more elements than are actually present in memory). We have a way to determine this internally, but there is no public way of doing so. A user who needs to know whether a tensor is dense or sparse has to work out how to calculate that themselves. We need to expose this to users.

We expect this to be a common query, so it should be a property rather than something that has to be calculated each time.
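
For illustration, the kind of calculation a user currently has to write by hand might look like the sketch below. It assumes a row-major layout and strides measured in elements; `LayoutCheck` and `IsDenseRowMajor` are hypothetical names, not part of System.Numerics.Tensors, and the library's internal check may well differ.

```csharp
using System;

// Hypothetical helper for illustration; not part of System.Numerics.Tensors.
public static class LayoutCheck
{
    // Returns true when the strides describe a row-major layout in which
    // every element is stored exactly once with no gaps.
    // Assumption: strides are measured in elements, not bytes.
    public static bool IsDenseRowMajor(ReadOnlySpan<nint> lengths, ReadOnlySpan<nint> strides)
    {
        if (lengths.Length != strides.Length)
        {
            throw new ArgumentException("lengths and strides must have the same rank.");
        }

        nint expectedStride = 1;
        // Walk from the innermost dimension outwards.
        for (int i = lengths.Length - 1; i >= 0; i--)
        {
            // A dimension of length 1 never advances, so its stride is irrelevant.
            if (lengths[i] != 1 && strides[i] != expectedStride)
            {
                return false;
            }
            expectedStride *= lengths[i];
        }
        return true;
    }
}
```

Under these assumptions, a 2 x 2 tensor with strides `[2, 1]` passes the check, while the same shape with strides `[0, 0]` (one stored element) or `[4, 1]` (a slice of a wider tensor) does not.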

API Proposal

namespace System.Numerics.Tensors;

public interface IReadOnlyTensor
{
    bool IsSparse { get; }
    bool IsContiguous { get; }
    bool IsDense { get; }
    bool IsLinear { get; }
    IReadOnlyTensor ToDenseTensor();
}

API Usage

Tensor<int> denseTensor = Tensor.Create<int>([1, 2, 3, 4], [2, 2]);
// Will be false: all four elements are stored next to each other.
bool denseIsSparse = denseTensor.IsSparse;

// Create a tensor with only 1 element in memory that represents a 2 x 2 tensor with all values 1.
Tensor<int> broadcastTensor = Tensor.Create<int>([1], [2, 2], [0, 0]);
// Will be true: the zero strides map every index to the same stored element.
bool broadcastIsSparse = broadcastTensor.IsSparse;
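
To make the zero-stride case concrete, the sketch below shows how strides map an index to a position in the backing storage. It assumes strides are measured in elements and that the position is the sum of each index multiplied by its stride; `StrideDemo` and `LinearOffset` are hypothetical names used only for illustration, not library API.

```csharp
using System;

// Hypothetical helper for illustration; not part of System.Numerics.Tensors.
public static class StrideDemo
{
    // Computes the position in the backing storage for a multi-dimensional index.
    // Assumption: offset = sum(index[i] * strides[i]), with strides in elements.
    public static nint LinearOffset(ReadOnlySpan<nint> index, ReadOnlySpan<nint> strides)
    {
        if (index.Length != strides.Length)
        {
            throw new ArgumentException("index and strides must have the same rank.");
        }

        nint offset = 0;
        for (int i = 0; i < index.Length; i++)
        {
            offset += index[i] * strides[i];
        }
        return offset;
    }
}
```

With strides `[0, 0]`, every index of the 2 x 2 tensor resolves to offset 0, so a single stored value stands in for all four elements. With strides `[2, 1]`, index `(1, 1)` resolves to offset 3, the last of four distinct stored elements.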

Alternative Designs

IsSparse is what PyTorch uses, but we could do the inverse on our side with something like IsDense. We would then need another property, IsContiguous, since dense and contiguous could be separate. Sparse does carry some nuance about exactly what it means in other frameworks, though (see glossary below).

IsView - could be used to refer to anything that is not fully dense/contiguous. See glossary below for additional details.

ONNX Runtime does not have a property to represent this; they just check whether the tensor is a DenseTensor<T>. I don't think this is a good approach for us.

IsDistinct - could be used when the data matches exactly 1 to 1 with its representation. No other frameworks that I could find use this though, so it would be very different from existing frameworks.

Risks

This would be a new API on a preview type, so the risks are minimal.

Glossary

PyTorch Sparse - https://pytorch.org/docs/stable/sparse.html. PyTorch uses Sparse to refer to a tensor where "elements are mostly zero valued." They support 5 different formats of sparse (see prior link). If a tensor is sparse, they also track how many dimensions are represented in a sparse format and how many are represented in a dense format (in the same tensor).

PyTorch Views - https://pytorch.org/docs/stable/tensor_view.html. PyTorch uses a view to essentially let you know that this "Tensor" is actually pointing to the memory of another tensor. Kinda like our TensorSpan. But these views don't have to be contiguous.

PyTorch IsContiguous - https://pytorch.org/docs/stable/generated/torch.Tensor.is_contiguous.html. Basically the same as we would consider contiguous (the data is contiguous in memory), but they provide additional options/details about how the memory is laid out.

The ONNX Runtime C# API doesn't have sparse tensors, but their Python API does, and it matches PyTorch exactly: https://onnxruntime.ai/docs/api/python/api_summary.html#sparsetensor. In fact, you can bind PyTorch tensors directly as input/output.

TensorFlow also supports sparse tensors: https://www.tensorflow.org/guide/sparse_tensor. They are interpreted the same way as in PyTorch, but TensorFlow supports only 1 format compared to PyTorch's 5.
