Description
Background and motivation
Tensors can be either dense (all elements are next to each other in memory and every element is represented in memory) or sparse (either the elements are not next to each other in memory, for example as a result of slicing, or the strides are manipulated to conserve memory so that the tensor represents more elements than are actually present in memory). We can already determine this internally, but there is no public way to do so. A user who needs to know whether a tensor is dense or sparse has to work out how to calculate it themselves. We need to expose this to users.
We expect this to be a common query so instead of calculating it each time it should be a property.
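To illustrate what a user has to write today, here is a minimal sketch of a density check based only on lengths and strides. It assumes "dense" means a gap-free row-major layout in which every element has its own slot in memory. The helper name IsDenseLayout is hypothetical and is not part of System.Numerics.Tensors, and the internal check may differ from this.

```csharp
using System;

// Hypothetical helper, not part of System.Numerics.Tensors.
// Returns true when the strides describe a row-major layout with no gaps
// and no element shared between indices. Empty tensors get no special
// treatment in this sketch.
static bool IsDenseLayout(ReadOnlySpan<nint> lengths, ReadOnlySpan<nint> strides)
{
    if (lengths.Length != strides.Length)
    {
        throw new ArgumentException("lengths and strides must have the same rank.");
    }

    // Walk from the innermost dimension outwards, tracking the stride a
    // fully packed layout would have at each dimension.
    nint expectedStride = 1;
    for (int i = lengths.Length - 1; i >= 0; i--)
    {
        // A dimension of length 1 is never stepped over, so its stride is irrelevant.
        if (lengths[i] != 1 && strides[i] != expectedStride)
        {
            return false;
        }

        expectedStride *= lengths[i];
    }

    return true;
}

// 2 x 2 tensor backed by 4 elements.
Console.WriteLine(IsDenseLayout(new nint[] { 2, 2 }, new nint[] { 2, 1 })); // True
// 2 x 2 tensor backed by a single element (strides of 0).
Console.WriteLine(IsDenseLayout(new nint[] { 2, 2 }, new nint[] { 0, 0 })); // False
```

Every caller that needs this answer would have to repeat a loop like this, which is the cost a built-in property would remove.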
API Proposal
namespace System.Numerics.Tensors;

public interface IReadOnlyTensor
{
    bool IsSparse { get; }
    bool IsContiguous { get; }
    bool IsDense { get; }
    bool IsLinear { get; }
    IReadOnlyTensor ToDenseTensor();
}
API Usage
Tensor<int> denseTensor = Tensor.Create<int>([1, 2, 3, 4], [2, 2]);
// Will be false.
bool denseIsSparse = denseTensor.IsSparse;
// Create a tensor with only 1 element in memory but actually representing a 2 x 2 tensor with all values 1.
Tensor<int> broadcastTensor = Tensor.Create<int>([1], [2, 2], [0, 0]);
// Will be true.
bool broadcastIsSparse = broadcastTensor.IsSparse;
Alternative Designs
IsSparse is what PyTorch uses, but we could do the inverse on our side and expose something like IsDense. We would then need another property, IsContiguous, since dense and contiguous could be separate. Sparse does have some nuance about exactly what it means in other frameworks, though (see glossary below).
IsView - could be used to refer to anything that is not fully dense/contiguous. See glossary below for additional details.
ONNX Runtime does not have any property to represent this; they just check whether the tensor is a DenseTensor<T>. I don't think this is a good approach for us.
IsDistinct - could be used when the data matches exactly 1 to 1 with its representation. No other frameworks that I could find use this though, so it would be very different from existing frameworks.
Risks
This would be a new API on a preview type, so the risks are minimal.
Glossary
PyTorch Sparse - https://pytorch.org/docs/stable/sparse.html. PyTorch uses Sparse to refer to a tensor where "elements are mostly zero valued." They support 5 different sparse formats (see prior link). If a tensor is sparse, they also track how many dimensions are represented in a sparse format and how many are represented in a dense format (in the same tensor).
PyTorch Views - https://pytorch.org/docs/stable/tensor_view.html. PyTorch uses a view to essentially let you know that this "Tensor" is actually pointing to the memory of another tensor, much like our TensorSpan. But these views don't have to be contiguous.
PyTorch IsContiguous - https://pytorch.org/docs/stable/generated/torch.Tensor.is_contiguous.html. Basically the same as we would consider contiguous (the data is contiguous in memory), but they provide additional options/details about how the memory is laid out.
The ONNX Runtime C# API doesn't have sparse tensors, but their Python API does, and it matches PyTorch exactly: https://onnxruntime.ai/docs/api/python/api_summary.html#sparsetensor. In fact, you can bind PyTorch tensors directly as input/output.
TensorFlow also supports sparse tensors, https://www.tensorflow.org/guide/sparse_tensor. They are interpreted the same way as in PyTorch, but TensorFlow only supports 1 format compared to PyTorch's 5.