
Do not require DenseArray #2


Open
orenbenkiki opened this issue Jun 7, 2023 · 27 comments · May be fixed by #6

Comments

@orenbenkiki

Right now, the constructor of StridedView requires the base array to be a DenseArray.
However, there are types which support strides but are not DenseArray,
and so would work in a StridedView if it were not for this requirement.

How about removing the <:DenseArray from the signature? If the argument has strides, it "should" just work; if it does not, it would fail with a more informative error message (no strides method) than today's (no matching StridedView constructor).

Right now StridedView knows about some special cases (e.g. Transpose) which can be made strided w/o being DenseArray, but this doesn't work with non-Base types. I encountered this trying to pass PyArray from PythonCall to @strided, but this is just one example.

julia> using BenchmarkTools

julia> using Strided

julia> using PythonCall

julia> a = rand(4000, 4000);

julia> pa = PyArray(a);

julia> b = similar(a);

julia> pb = PyArray(b);

julia> @btime b .= (a .+ a) ./ 2;
  18.413 ms (4 allocations: 128 bytes)

julia> @btime pb .= (a .+ a) ./ 2;
  18.694 ms (4 allocations: 128 bytes)

julia> @btime @strided b .= (a .+ a) ./ 2;
  8.684 ms (17 allocations: 1.06 KiB)

julia> @btime @strided pb .= (pa .+ pa) ./ 2;
ERROR: MethodError: no method matching StridedView(::PyArray{Float64, 2, true, true, Float64})
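The relaxation proposed above could be probed with a small helper. This is only a sketch; `maybe_strides` is a hypothetical name, not part of Strided.jl, and it simply checks whether `strides` is implemented for the given array:

```julia
# Hypothetical helper (not part of Strided.jl): probe whether `strides`
# is implemented for an array, returning `nothing` when it is not.
function maybe_strides(a::AbstractArray)
    try
        return strides(a)
    catch
        return nothing   # `strides` not implemented (or invalid) for this array
    end
end

maybe_strides(rand(3, 4))   # (1, 3): Arrays are strided
```

A sparse matrix, for example, has no `strides` method, so the helper would return `nothing` and the constructor could raise a clear "array is not strided" error.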
@Jutho
Owner

Jutho commented Oct 5, 2023

The problem is that all the methods are implemented assuming that there is ultimately a dense memory region underlying the data, as all indices are eventually transformed into a linear index into this dense parent array. I guess I could add a generic fallback for AbstractArray that tries to apply parent (possibly over several levels of recursion) to see if this yields a DenseArray.

I should check whether that would work with the PyArray implementation.

@orenbenkiki
Author

Why not simply call strides on the array? If it works, great, you can use your implementation. If it doesn't, the user gets a clear error that the array isn't strided, so it makes no sense to use @strided on it.

If you want to fall back to an implementation for any AbstractArray, that's also possible, but that's a separate feature.

@Jutho
Owner

Jutho commented Dec 15, 2023

My apologies for not responding sooner. Maybe this is a good strategy nowadays; I think there used to be a time when strides returned something even for arrays that were not strided.

Still, in the end the current implementation needs an object into which it can index linearly, using the strides to compute the linear index. Maybe that fails for some cases which are not DenseArray?

@orenbenkiki
Author

I don't see what strides can (meaningfully) return for arrays that are not strided. Being paranoid, one can test not only that it returns at all (rather than throwing), but also that it returns a tuple with the right number of integers; otherwise, consider the array non-strided. If it does return a tuple with the right number of integers, then it is pretty safe to say the array elements can be indexed linearly using the strides, right?
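A minimal sketch of this paranoid check (the helper name `has_valid_strides` is hypothetical): the result must be an `NTuple` of `Int`s whose length matches `ndims`:

```julia
# Hypothetical validity check: `strides` must succeed and return an
# NTuple{N,Int} with one integer stride per dimension of the array.
function has_valid_strides(a::AbstractArray{<:Any,N}) where {N}
    s = try
        strides(a)
    catch
        return false   # no (working) strides method: not a strided array
    end
    return s isa NTuple{N,Int}
end
```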

lkdvos added a commit that referenced this issue Jan 3, 2024
This addresses #2.

In particular, while this allows `StridedView`s to be used for non-`DenseArray` types, it does not define a constructor for them.
This reflects the fact that in general, `StridedView`s are only well-defined for `DenseArray`s, but allows users to manually tap into the machinery of `Strided`, in which case it is up to them to ensure correctness.
@Jutho
Owner

Jutho commented Jan 3, 2024

My apologies for the late response. I am afraid that what you say is not true. Consider the following:

julia> A = randn(10,10);

julia> B = view(A, 1:2:10, 1:2:10); strides(B)
(2, 20)

julia> B[3,2], B[1 + (3-1)*2 + (2-1)*20], B.parent[1 + (3-1)*2+(2-1)*20]
(-1.3798464342130103, -1.7417661865291487, -1.3798464342130103)

julia> B[3,4], B[1 + (3-1)*2 + (4-1)*20]
ERROR: BoundsError: attempt to access 5×5 view(::Matrix{Float64}, 1:2:9, 1:2:9) with eltype Float64 at index [65]
Stacktrace:
 [1] throw_boundserror(A::SubArray{Float64, 2, Matrix{Float64}, Tuple{StepRange{Int64, Int64}, StepRange{Int64, Int64}}, false}, I::Tuple{Int64})
   @ Base ./abstractarray.jl:744
 [2] checkbounds
   @ ./abstractarray.jl:709 [inlined]
 [3] _getindex
   @ ./abstractarray.jl:1340 [inlined]
 [4] getindex(A::SubArray{Float64, 2, Matrix{Float64}, Tuple{StepRange{Int64, Int64}, StepRange{Int64, Int64}}, false}, I::Int64)
   @ Base ./abstractarray.jl:1296
 [5] top-level scope
   @ REPL[10]:1

As you can see, it is only because I can do linear indexing in the parent array associated with the view (or, more generally, in the memory associated with that array and its strides) that I can use the strides to compute all the index information. Hence, I take DenseArray as a substitute for a contiguous block of memory in which I can compute index information using strides. I should check the PyArray implementation, but I assume it has something similar, despite maybe not being declared as DenseArray.

@Jutho
Owner

Jutho commented Jan 3, 2024

To add to that: we are loosening the restriction of DenseArray in the type definition, but there will still need to be something like a contiguous block of memory in which linear indexing using stride calculations makes sense, in order for a custom array to be compatible with StridedViews.jl.

@orenbenkiki
Author

orenbenkiki commented Jan 4, 2024

I don't follow the code in the issue above (what's "B"?)

If the point is that an array can be strided, but still not contiguous, well, of course.
To be contiguous, one stride must be 1 element, another must be the dimension size, another the product of this dimension size by another dimension size, etc.

That is, if I have a 5x6 array, then strides (1, 5) and (6, 1) indicate a contiguous layout in memory, and any other strides will indicate a "Swiss cheese" array - e.g., (2, 10).

It is reasonable that some operations will only work for strided arrays that are also contiguous. Still, not all such arrays happen to be DenseArray. It still seems to me that the most robust way to check would be to call strides and examine the results.

Am I missing something?
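The contiguity condition described here can be expressed as a small check (a sketch; `iscontiguous` is a hypothetical helper): sort the dimensions by stride and require each stride to equal the product of the sizes of all smaller-stride dimensions:

```julia
# Hypothetical helper: do strides `st` describe a contiguous, gap-free
# layout for an array of size `sz`? Visiting dimensions in order of
# increasing stride, each stride must equal the product of the sizes of
# all dimensions with smaller strides.
function iscontiguous(sz::NTuple{N,Int}, st::NTuple{N,Int}) where {N}
    expected = 1
    for k in sortperm(collect(st))
        st[k] == expected || return false
        expected *= sz[k]
    end
    return true
end

iscontiguous((5, 6), (1, 5))   # true: column-major layout
iscontiguous((5, 6), (6, 1))   # true: row-major layout
iscontiguous((5, 6), (2, 10))  # false: "Swiss cheese" layout
```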

@Jutho
Owner

Jutho commented Jan 4, 2024

My example is indeed missing a line; my apologies for that. Not sure how that happened. I have now corrected it.

The point is that linear indexing into an array does not mean that you are accessing the underlying memory. Julia's indexing behaviour makes it such that you can do linear indexing into any AbstractArray as if it were an Array, i.e. as if it were a contiguous block of memory with the strides given by the cumulative product of the previous sizes.

There is no general way to directly index into the memory that backs this AbstractArray, for which one would compute a "linear memory index" using the actual stride information. For the Base wrapper types, this is only possible by recursively calling parent, until the underlying DenseArray is obtained.
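The recursive-`parent` strategy for Base wrapper types might look like this sketch (the helper name `dense_root` is hypothetical; it relies on Base's fallback where `parent` of a non-wrapper array returns the array itself):

```julia
# Hypothetical sketch: recursively unwrap `parent` until a DenseArray is
# reached. `parent` on a non-wrapper array returns the array itself, which
# terminates the recursion with `nothing` when no DenseArray is found.
function dense_root(a::AbstractArray)
    a isa DenseArray && return a
    p = parent(a)
    p === a && return nothing   # hit Base's `parent` fallback: no dense root
    return dense_root(p)
end
```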

@orenbenkiki
Author

orenbenkiki commented Jan 4, 2024

I thought the whole point of the strides function was that it only existed in case one can compute the memory address of each element using the linear indexing using the strides. For any array where this isn't the case (e.g., sparse arrays or whatever), then the strides function is simply not implemented. All this has nothing to do with the fact that accessing the array using a single index still works.

Consider this session:

julia> using SparseArrays

julia> a = randn(3, 4)
3×4 Matrix{Float64}:
 -0.811094    0.205791   0.649853   -0.393014
  0.0505602   1.39507   -0.23117     0.158512
  0.838679   -0.235745  -0.0231283   0.623796

julia> a[a .< 0] .= 0
5-element view(::Vector{Float64}, ...) with eltype Float64:
 0.0
 0.0
 0.0
 0.0
 0.0

julia> a
3×4 Matrix{Float64}:
 0.0        0.205791  0.649853  0.0
 0.0505602  1.39507   0.0       0.158512
 0.838679   0.0       0.0       0.623796

julia> s = sparse(a)
3×4 SparseMatrixCSC{Float64, Int64} with 7 stored entries:
 .          0.205791  0.649853   .
 0.0505602  1.39507   .          0.158512
 0.838679   .         .          0.623796

julia> a[1]
0.0

julia> s[1]
0.0

julia> a[2]
0.05056023785407835

julia> s[2]
0.05056023785407835

julia> a[3]
0.8386790273652533

julia> s[3]
0.8386790273652533

julia> a[4]
0.2057912550575144

julia> s[4]
0.2057912550575144

julia> a[5]
1.39507

julia> s[5]
1.39507

julia> a[6]
0.0

julia> s[6]
0.0

julia> strides(a)
(1, 3)

julia> strides(s)
ERROR: MethodError: no method matching strides(::SparseMatrixCSC{Float64, Int64})
...

Yes, one can write both a[4] and s[4] and get the same (correct) result, but s is not strided while a is. That is, for a the implementation of a[4] would be trivial and efficient - the address of the element in memory is exactly the 4th consecutive element starting from the base array pointer; while for s the implementation of s[4] would be complex and inefficient as it needs to account for the unstored 0 values (the offset in memory would be 3, not 4).

Looking at the parent doesn't tell us anything because anyone is free to define a non-dense array type that has a DenseArray parent. For example, a hypothetical zero-copy view of a triangular matrix out of a full matrix would have a DenseArray parent, but would most definitely not be a strided array.

It seems to me that the only safe way to distinguish between the cases is by checking whether strides is implemented. If it is, we know that the memory address of elements by using a linear combination of the indices using the strides; if it is not implemented, then the array isn't strided. If the array is strided, we can further check whether the strides are such that the array is contiguous in memory or not (e.g., a zero-copy view of every other row of a DenseArray would be strided, but not contiguous).

@akriegman

akriegman commented May 2, 2025

I have a different use case with a different solution. I was hoping to compose multiple StridedViews, similar to what they do in tinygrad. So to index into a composition of strided views, you would do outer multiindex -> linear index -> inner multiindex -> linear index, and then finally use that to index into memory. So in other words we could pretend that the parent is a dense block of memory, and in the cases where it actually is hopefully the compiler would recognize that. Sounds like maybe that's not the point of this package though.

@Jutho
Owner

Jutho commented May 4, 2025

The point is that I still need to know, if I compute a linear index using the strides, what to index into. Consider the following:

julia> A = randn(5,5)
5×5 Matrix{Float64}:
 -1.83299     0.373954  -0.0752648   0.504695    1.99652
 -0.442616    0.411342   1.24364    -0.0993078  -0.670033
  0.394759    0.672543   1.53753    -0.569871   -1.55445
 -0.714782    0.910227   0.0708801  -0.317265    0.563627
 -0.0564112  -1.16758   -1.93046     2.44655     1.28567

julia> B = view(A, 1:3, 1:5);

julia> strides(B)
(1, 5)

julia> (x, y) = (3, 3)
(3, 3)

julia> B[x, y]
1.537533054119266

julia> B[1 + (x-1) * stride(B, 1) + (y-1) * stride(B, 2)]
1.9965179361688334

julia> parent(B)[1 + (x-1) * stride(B, 1) + (y-1) * stride(B, 2)]
1.537533054119266

The only reason I can get the correct element B[x,y] via linear indexing using the strides is that I know B has an underlying DenseArray parent, for which the "memory index" computed via the strides is equivalent to Julia's linear indexing. If I didn't know and use this, I would only have B to index into, and I could not use the "memory index" computed via the strides. Hence, even though B is a strided array, Julia does not provide an established high-level interface to use those strides to do indexing. One could do this using pointers and unsafe_ operations, but ultimately that amounts to assuming that there is a DenseArray underlying the given input array.

@akriegman

I don't fully understand the considerations here, but I was imagining we could define linear indexing into sparse arrays to agree with their multiindexing behavior. So we could define B[idx] by

(s1, s2, ...) := size(B)
i1 := idx % s1
i2 := idx ÷ s1 % s2
i3 := idx ÷ (s1*s2) % s3
...
B[idx] := B[i1, i2, ...]

instead of just treating idx as a memory offset.
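Shifted to Julia's 1-based indexing, the decomposition above matches what Base's linear indexing (via `CartesianIndices`) already computes. A sketch, with a hypothetical `decompose` helper:

```julia
# 1-based version of the index decomposition above; this reproduces
# Julia's built-in linear-to-cartesian index mapping.
function decompose(idx::Int, sz::NTuple{N,Int}) where {N}
    out = Int[]
    i = idx - 1                  # work in 0-based offsets
    for s in sz
        push!(out, i % s + 1)    # extract this dimension's index
        i ÷= s
    end
    return Tuple(out)
end

decompose(4, (3, 4))   # (1, 2), same as Tuple(CartesianIndices((3, 4))[4])
```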

@Jutho
Owner

Jutho commented May 7, 2025

Yes, that is what Julia's linear indexing already allows you to do. The point is that Strided.jl is specifically about data that is laid out according to a strided pattern in memory, and that it will optimize the data access pattern using that information.

@orenbenkiki
Author

orenbenkiki commented May 8, 2025

I think I have identified a confusion (in my head at least) between two aspects of strides. Making this explicit would help clarify the discussion.

The way I see it, there are two possible meanings of "strides":

One is that if you look at the memory layout - the physical addresses of where the data is - then it is possible to compute the address of the element at (i, j) by a formula address = a * i * sizeof(eltype) + b * j * sizeof(eltype) + c. You can then access the element directly using this physical memory address. Call this "physical strides".

Another applies to a "strided view" and says that if you have some parent array, and a strided view, it is possible to compute the index-in-the-parent of the element at (i, j) in the view by a formula index = a * i + b * j. You can then pass this index to the [] operator of the parent array to access the element. Call this "virtual strides".

Note that one doesn't imply the other - one can have virtual strides and not physical strides. I think that in practice physical strides imply virtual strides, but I suppose one could construct a pathological case where this isn't the case?

Either way, I think that in this discussion the term "strides" sometimes means "physical strides" and sometimes "virtual strides", depending on the author and the context.

As a concrete example of the difference between the two, consider a subset of just the even-indexed columns of a DenseMatrix parent, as opposed to the same subset of a SparseMatrixCSC parent.

There is a second distinction between cases where one of a or b in the formulas is equal to 1 ("continuous") and a case where both are larger than one ("swiss cheese").

Is it correct to say that, at the end of the day, what StridedView actually needs is a continuous, physically strided array? This would allow manipulating the data with efficient machine code using the vector ISA.

The original motivation for this issue is that one can have continuous, physically strided arrays which are not DenseArray. One example is a PyArray view of a numpy array. Another is a subset of a dense array, as long as it spans a continuous range of rows and has a fixed gap between the columns.

I assumed that the strides function is meant to be physical. I still think this is correct (am I wrong here?). In this case, if strides returns a tuple of two integers, where one of them is 1, then StridedView could "just work". Am I missing something?

@Jutho
Owner

Jutho commented May 8, 2025

I was indeed talking about physical strides, and this is also the meaning of strides in the Julia ecosystem as defined by Julia Base:

help?> strides
search: strides stride trues string otimes strip sortslices

  strides(A)

  Return a tuple of the memory strides in each dimension.

Unfortunately, that interface is incomplete since, once you have computed the physical memory address or offset, Julia does not provide a standardized interface to request the corresponding element at that position, other than via pointer conversion and other unsafe operations and pointer arithmetic. In that sense, requiring a DenseArray parent is what I need to ensure that the computed memory index can just be used as a linear index into that parent. If there were another way to do this, I would happily drop the DenseArray restriction.

@lkdvos
Collaborator

lkdvos commented May 8, 2025

Perhaps this might be interesting to you: Julia already offers various tools to work with simply "mapping indices", often also by combining some wrapper types. Some examples of things you might be interested in are below, which can be used to obtain the mapping behavior you like.

julia> using SparseArrays

julia> v = sprand(100, 0.2)
100-element SparseVector{Float64, Int64} with 26 stored entries:
[...]

julia> b = view(v, 3:60)
58-element view(::SparseVector{Float64, Int64}, 3:60) with eltype Float64:
[...]

julia> c = reshape(b, 2, 29)
2×29 reshape(view(::SparseVector{Float64, Int64}, 3:60), 2, 29) with eltype Float64:
[...]

julia> d = PermutedDimsArray(c, (2, 1))
29×2 PermutedDimsArray(reshape(view(::SparseVector{Float64, Int64}, 3:60), 2, 29), (2, 1)) with eltype Float64:
[...]

julia> c[2, 2] == c[4] == b[4] == v[3 + 4 - 1] == d[2, 2]
true

Note that these constructions are completely generic, in the sense that they are designed to work with any AbstractArray. Being this generic comes at a cost however, since it is possible to write more performant algorithms when you have additional information, for example if you know the memory layout is strided.
That is where this package comes in, as @Jutho mentioned, optimizing these code patterns whenever the data is laid out in a strided pattern in memory.
Whenever the data is not actually organized like this, for example with the sparse array example you mentioned, it would be reasonable to expect that this might actually hinder performance rather than help, since the code assumes that it can efficiently do some kind of linear indexing, and would not at all make use of any sparsity.

If you could elaborate a bit more on what you would like to achieve, given that I'm not too familiar with tinygrad, we might be able to help out better?

@orenbenkiki
Author

@Jutho - It seems you can:

1 - Call strides to see that it exists and that at least one of the strides is 1 (technically you could support any strides, but you indicated this is important to the implementation).

2 - Call pointer to obtain the address of the 1st element of the array.

3 - Call unsafe_wrap to obtain a DenseArray wrapper object.

4 - Use the formula a * i + b * j to access only the elements you need, using the usual DenseArray APIs.

Basically, if the array you were given is not a DenseArray, you create a DenseArray wrapper object that is "functionally equivalent" to it. The rest of the code would work exactly as it does today - using the public API of the DenseArray.

Admittedly the call to unsafe_wrap is a concern, but as long as you only access the elements you are allowed to, and use the linear formula, this should be safe. Also, you need to hold on to the original parent array as well as to the DenseArray wrapper, to ensure it won't get GC-ed from under your feet.

This is a change only in the constructor (and adding a data member to hold on to the original parent). The rest of the code would be unaffected.

Makes sense?
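The proposed constructor path could be sketched roughly as follows. All names here are hypothetical (this is not the actual StridedView implementation), and it only covers isbits element types on CPU arrays for which `pointer` is defined:

```julia
# Hypothetical sketch of the proposal above: wrap the memory of a strided
# input in a dense Vector via `unsafe_wrap`, and keep a reference to the
# original array so it is not garbage-collected out from under us.
struct WrappedStridedView{T,N,P<:AbstractArray{T,N}} <: AbstractArray{T,N}
    wrapped::Vector{T}       # dense alias of the parent's memory
    original::P              # kept alive for GC safety
    size::NTuple{N,Int}
    strides::NTuple{N,Int}
end

function WrappedStridedView(a::AbstractArray{T,N}) where {T,N}
    st = strides(a)                          # MethodError here => not strided
    len = sum((size(a) .- 1) .* st) + 1      # span of memory actually touched
    wrapped = unsafe_wrap(Vector{T}, pointer(a), len)
    return WrappedStridedView{T,N,typeof(a)}(wrapped, a, size(a), st)
end

Base.size(v::WrappedStridedView) = v.size
Base.getindex(v::WrappedStridedView{T,N}, I::Vararg{Int,N}) where {T,N} =
    v.wrapped[1 + sum((I .- 1) .* v.strides)]
```

Within this sketch, the rest of the machinery only ever indexes `wrapped` linearly with the physical-stride formula, exactly as it does today for a DenseArray parent.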

@Jutho
Owner

Jutho commented May 12, 2025

Probably; there was an UnsafeStridedArray in older versions of this package, which did something similar. However, the problem starts if you also want to support e.g. GPU Arrays. They are/can be strided, but the pointer and unsafe_wrap approach will not work. Even though Strided.jl doesn't really have any GPU support, we do use StridedViews of CuArrays in e.g. TensorOperations.jl.

@orenbenkiki
Author

Interesting - so there's a case where strides exists but pointer doesn't. This highlights another distinction - StridedView is useful in two different ways.

  1. One is just as a convenient way to slice a regular subset of the parent array (e.g. just the even rows). I'm not sure how common this is, but it is certainly nice to have this explicitly supported in a convenient way. It is also more efficient than using an array of indices (though not dramatically so?). At any rate, in this use case, why restrict the type of the parent array at all? It seems like it could be any AbstractMatrix.

  2. Another use is to generate more efficient code (inside @strided). This would only work for some, not all, parent array types. For example, if both strides and pointer are defined for the base array, and one of the strides is 1, then it is possible to generate SIMD code for many operations via the trick of creating the DenseArray wrapper described above.

I'm not sure under which category the StridedView of a CuArray falls. But it doesn't matter much, right? If StridedView worked on all parent types, then either @strided is enhanced to support CuArray parents to generate more efficient CUDA code, or not (and you get today's behavior, I think). But the overall scheme remains valid.

The current state of things is counter-intuitive: StridedView doesn't work for PyArray (where you could get the full performance benefits of @strided) but does work for CuArray (where you don't, if I understand you correctly).

@Jutho
Owner

Jutho commented May 13, 2025

As stated above, I agree that it would be nice if there were a standard Julia interface to complement strides, namely one that allows you to use the "memory index" that you compute using the strides. Currently, there is not actually much point to the strides function if you cannot do anything useful with its output (well, I guess it is mostly there to pass on to non-Julia libraries).

Nonetheless, I would also like to argue that PyArray should probably be a subtype of DenseArray. That is why that abstract subtype is there in the AbstractArray type hierarchy of the Julia Base library. CuArray is a subtype of DenseArray, which is why it natively works with StridedView.

@orenbenkiki
Author

DenseArray is a concrete type and one can't create subtypes of it. That would have been great if it were possible...

Isn't the combination of strides and also pointer working for some array type sufficient as the "standard interface allowing using the memory index"? It might not be elegant but is there any edge case that makes this invalid?

@akriegman

I made my own

struct View{T,N,P<:AbstractArray{T}} <: AbstractArray{T,N}
    parent::P
    dims::NTuple{N,Int}
    strides::NTuple{N,Int}
    offset::Int
    function View{T,N,P}(parent, dims, strides, offset) where {T,N,P<:AbstractArray{T}}
        @inline
        if any(dims .< 1) ||
           sum(max.(0, strides) .* (dims .- 1)) + offset >= length(parent) ||
           sum(min.(0, strides) .* (dims .- 1)) + offset < 0
            throw(ArgumentError("unsafe view, parentdims: $(size(parent)) dims: $dims strides: $strides offset: $offset"))
        end
        new(parent, dims, strides, offset)
    end
end
View(parent::P, dims::NTuple{N,Int}, strides::NTuple{N,Int}, offset) where {T,N,P<:AbstractArray{T}} = View{T,N,P}(parent, dims, strides, offset)

Base.size(A::View) = A.dims

function Base.getindex(A::View{T,N}, I::Vararg{Int,N}) where {T,N}
    @inline
    @boundscheck checkbounds(A, I...)
    idx = sum(A.strides .* (I .- 1)) + A.offset + 1
    @inbounds out = A.parent[idx]
    out
end

function Base.setindex!(A::View{T,N}, val, I::Vararg{Int,N}) where {T,N}
    @inline
    @boundscheck checkbounds(A, I...)
    idx = sum(A.strides .* (I .- 1)) + A.offset + 1
    @inbounds A.parent[idx] = val
    A
end

@Jutho
Owner

Jutho commented May 14, 2025

DenseArray is a concrete type and one can't create sub types for it. That would have been great if it were possible...

No it's not. Array is a subtype of the abstract type DenseArray, as are all GPUArrays (which is itself abstract and has CuArray etc as concrete types). That is why StridedViews.jl works with CuArray.

julia> supertype(Array)
DenseArray

julia> supertype(DenseArray)
AbstractArray

@akriegman, as I mentioned above, this doesn't work. The index idx you compute via strides cannot simply be used as a linear index into parent; that is not how linear indexing in Julia works. I gave a counterexample above. Linear indexing with idx will only be correct if parent has its data laid out contiguously in memory. This is exactly what DenseArray is for, so really, I think that other array types that have this property should subtype DenseArray instead of just AbstractArray.

@Jutho
Owner

Jutho commented May 14, 2025

Isn't the combination of strides and also pointer working for some array type sufficient as the "standard interface allowing using the memory index"? It might not be elegant but is there any edge case that makes this invalid?

If the proposal is to use the pointer to the beginning of the parent, and then unsafe_load to get values from a certain memory index, this only works for isbits types, i.e.:

julia> a = rand(5)
5-element Vector{Float64}:
 0.9133291929143837
 0.17935218848356715
 0.31691884142854054
 0.7247547504715043
 0.7583702853388546

julia> pa = pointer(a)
Ptr{Float64} @0x000000010dee8460

julia> unsafe_load(pa, 3), a[3]
(0.31691884142854054, 0.31691884142854054)

julia> b = [randn(5) for n=1:5]
5-element Vector{Vector{Float64}}:
 [-0.39054832410530715, 1.2970022449051175, -1.1243745025700178, 1.2447151441862772, -0.4948897579954707]
 [0.035007918114120774, -1.4004292420450568, -0.17782930468583014, 0.4504752715516825, 0.07151919367908639]
 [1.3622083111074377, -0.6179471078219791, 0.7859175178496625, 0.07098911546608531, -0.5013870379927964]
 [-1.693626877184436, 0.11838104727817299, 0.27541815011285736, 1.1682348106310123, 0.7477141794602888]
 [0.03151197005787201, 0.5780909060892557, 0.8253062475739976, -1.7227093088929948, -0.10541693001699494]

julia> pb = pointer(b)
Ptr{Vector{Float64}} @0x000000010f195020

julia> unsafe_load(pb, 3)
ERROR: pointerref: invalid pointer type
Stacktrace:
 [1] unsafe_load(p::Ptr{Vector{Float64}}, i::Int64)
   @ Base ./pointer.jl:153
 [2] top-level scope
   @ REPL[25]:1

julia> b[3]
5-element Vector{Float64}:
  1.3622083111074377
 -0.6179471078219791
  0.7859175178496625
  0.07098911546608531
 -0.5013870379927964

@orenbenkiki
Author

You are right, I misspoke, sorry: DenseArray is indeed an abstract type. However... PyArray wraps numpy arrays. Sometimes the result is a dense array, sometimes it isn't, so PyArray can't just derive from DenseArray. It can, however, implement strides and pointer when possible. They could have used two types, PyDenseArray and PyGeneralArray, with an API to create the right one, but they chose not to (I suppose due to the added complexity in their APIs and backward-compatibility issues).

I do want to nitpick about "Linear indexing with idx will only be correct if parent has its data laid out contiguously in memory" - that's not technically correct. The entries of each column must be continuous, but the start of one column does not have to be immediately after the end of the previous column. It is sufficient that the offset between the start of each column and the previous one is constant. That is, any strides of the form (N, 1) would work, even if N > num-rows. Sure there may be gaps between the columns, so the memory isn't "contiguous", but linear indexing and even SIMD code will work (flip "column" with "row" for a row-major layout).
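A concrete instance of this nitpick using only Base types: a view that drops one row is strided with a column stride larger than its number of visible rows, yet the stride formula still indexes correctly into the parent:

```julia
# A strided but non-contiguous view: 5 of 6 rows, so there is a one-element
# gap between consecutive columns, yet strides-based linear indexing into
# the parent still works.
A = randn(6, 4)
B = view(A, 1:5, :)
strides(B)                                      # (1, 6), with 6 > 5 rows
B[3, 2] == parent(B)[1 + (3-1)*1 + (2-1)*6]     # true
```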

Your claim that this would only work for isbits types is also incorrect. The problem with your example is that you have a vector of vectors and try to treat it as a vector of pointers. Consider:

julia> as = [rand(5) for _ in 1:3]
3-element Vector{Vector{Float64}}:
 [0.8570337283732056, 0.5836870755788675, 0.8906730627989266, 0.7433736550569628, 0.5104071661040798]
 [0.5203320795433448, 0.31176796596130063, 0.07296318700925619, 0.7560497079331108, 0.8671193957589669]
 [0.33311012811037755, 0.2226398369703596, 0.16547795306923163, 0.548449211344936, 0.648321627195505]

julia> strides(as)
(1,)

julia> bs = unsafe_wrap(Vector{Vector{Float64}}, pointer(as), 3)
3-element Vector{Vector{Float64}}:
 [0.8570337283732056, 0.5836870755788675, 0.8906730627989266, 0.7433736550569628, 0.5104071661040798]
 [0.5203320795433448, 0.31176796596130063, 0.07296318700925619, 0.7560497079331108, 0.8671193957589669]
 [0.33311012811037755, 0.2226398369703596, 0.16547795306923163, 0.548449211344936, 0.648321627195505]

julia> bs == as
true

julia> bs === as
false

Also, your counter-example #2 (comment) isn't one. It merely demonstrates that one can't take the physical strides and use them as logical strides (hence my rambling about the difference between the two). But that's irrelevant: the proposed implementation would not pass the linear address from the strides to the original array; it would pass it to a different "physical-array" constructed by unsafe_wrap. When passed to the "physical-array", linear addressing will work. Of course, this would only be possible in the cases where it is safe to construct such a "physical-array" (using strides and pointer, provided they both exist, neither throws, and both return sensible values).

Other examples of arrays that could be accepted by StridedView are transpose or permuted dimensions arrays (and sure, you deal with the standard types for these). Instead of having to deal with these on a case-by-case basis, converting all of them to a "physical-array" would make the API "just work" for any such type, regardless of where and when it is defined. That is, the extra code for using a "physical-array" would be offset by removing existing code for transpose and permuted dimensions.

At the end of the day, the question is whether the StridedView constructor should make the effort to look at the specific instance it was given and accept it whenever possible (using the "physical-array" approach), or just test for the type and reject possibly valid cases (e.g. PyArray). The "physical-array" approach is obviously more effort, but it would eliminate most (all?) of the false negatives that happen when looking just at the type. There's no technical reason not to do this, though, and @strided should "just work" for such arrays, giving the full performance benefits.

A separate question is whether StridedView should only work for physical strides or whether it should be allowed to support merely logical strides (e.g., a StridedView of a SparseMatrixCSC). I expect that @strided would always only provide performance benefits for the physical strides case, but there's an argument to be made that StridedView of a SparseMatrixCSC is useful even without these performance benefits. But that's a separate issue and mostly unrelated to the one above.

@Jutho
Owner

Jutho commented May 15, 2025

I feel like I am repeating myself here.

Yes, StridedViews.jl assumes strides means what you call "physical strides". I don't see a reason to invent the name "logical strides", since those quantities are given by size, i.e. they are just the sizes of the individual dimensions. That's how Julia's linear indexing works.

So, when claiming that the proposed approach does not work, I was specifically referring to the implementation put forward by @akriegman, which uses linear indexing with idx computed using "physical" strides, and where there was no indication in the code that parent would not be the actual parent array provided by the user, but some other parent created via unsafe_wrap and pointer.

I fully understand that with unsafe_wrap and pointer and keeping a reference to the original array so that it does not get GC'ed, you can come up with another functional StridedView implementation. If you want to make a PR that does so, I am open to considering that. I don't have the bandwidth to do so myself right now, and we have not needed this for the things we use StridedViews.jl for. The main problem I could still foresee is how to deal with CuArray and friends, but I guess with the proper specialisation that can be made to work.

Out of curiosity, have you also checked out https://github.com/JuliaSIMD/StrideArrays.jl to see whether it could do what you want?

@orenbenkiki
Author

Fair enough, I didn't mean to get this thread into a rut, sorry. I have some bandwidth issues myself (who doesn't :-) but I'll see if I can find some for a PR. I'll also take a look at StrideArrays; it seems to be along the lines I described. Thanks for the pointer!
