Add support for NamedArray #65
Comments
Do these have a table structure, though? I'm just not sure... |
If dimension is no more than 2, I think so. |
Interesting, interesting... I think the general pattern here might be some way to get rows out of a matrix, both a named and a normal matrix. For Query.jl integration it would be nice if these rows were tuples and named tuples respectively. But I'm not sure that is a good idea in general: I'm not sure whether very large tuples (if there are very many rows) work well... It would have to be a special function, in any case, something like |
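To make the idea of "rows as tuples / named tuples" concrete, here is a minimal sketch in plain Julia (1.1 or later, for `eachrow`). The helper names `rows_as_tuples` and `rows_as_namedtuples` are hypothetical and are not part of Query.jl, NamedArrays.jl, or IterableTables.jl; column names are passed in explicitly rather than read from a NamedArray.

```julia
# Hypothetical helpers; neither name comes from Query.jl or NamedArrays.jl.

# Rows of a plain matrix as ordinary tuples.
rows_as_tuples(m::AbstractMatrix) = [Tuple(row) for row in eachrow(m)]

# Rows of a matrix as named tuples, given one name per column.
function rows_as_namedtuples(m::AbstractMatrix, colnames::NTuple{N,Symbol}) where {N}
    size(m, 2) == N || throw(ArgumentError("need one name per column"))
    return [NamedTuple{colnames}(Tuple(row)) for row in eachrow(m)]
end

m = [1 2 3; 4 5 6]
rows_as_tuples(m)                      # [(1, 2, 3), (4, 5, 6)]
rows_as_namedtuples(m, (:a, :b, :c))   # [(a = 1, b = 2, c = 3), (a = 4, b = 5, c = 6)]
```

Each tuple has one field per column, so the tuple length grows with the width of the matrix, which is where the concern about very large tuples would show up.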
See davidavdav/NamedArrays.jl#55 about implementing an Iterator of NamedTuples for NamedArrays |
Pinging @davidavdav |
You rang? I am a bit out of context. I've tried to read up on the references, but it is not clear to me what an iterator of named tuples would do with a NamedArray. Do you want to be able to iterate over a dimension of a NamedArray, thereby getting a tuple of the name and the array slice of one dimension lower, and do you want this to be of type Iterator for NamedTuples? |
I think we should first tackle NamedArrays with dimension of 2 and see how such a NamedArray can be converted to a DataFrame (or any table-like data structure that IterableTables deals with as a sink). The problem of dimensions greater than 2 can't be handled in this project (I think... or at least not for now). |
In R, tables of arbitrary dimensionality can be converted to data frames. Each dimension gets transformed into a column, and an additional column holds the values of the array entries. |
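A rough sketch of that R-style conversion in plain Julia, for illustration only: `to_long` is a hypothetical helper (not an existing function in any of the packages discussed here), the dimension names are supplied by the caller, and integer indices stand in for the names a NamedArray would carry.

```julia
# Hypothetical helper: turn an N-dimensional array into "long" columns,
# one index column per dimension plus a `value` column (R-style).
function to_long(a::AbstractArray{T,N}, dimnames::NTuple{N,Symbol}) where {T,N}
    idx = vec(collect(CartesianIndices(a)))      # all indices, column-major order
    cols = [[I[d] for I in idx] for d in 1:N]    # one index vector per dimension
    colnames = (dimnames..., :value)
    return NamedTuple{colnames}((cols..., vec(collect(a))))
end

a = [10 30; 20 40]
t = to_long(a, (:row, :col))
# t.row   == [1, 2, 1, 2]
# t.col   == [1, 1, 2, 2]
# t.value == [10, 20, 30, 40]
```

The result is a named tuple of equal-length vectors, which is one simple columnar shape a table sink could consume; every entry is kept, zeros included.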
@nalimilan so all the columns except the last one would hold the indices of that respective dimension? For example, this matrix:
Would be transformed into this table:
? I think that might be a really good general solution. It would still be nice if there was some easy way to handle a matrix differently, i.e. keep the table structure of the matrix, but one that could be done by say a I guess this also somehow interacts with this idea of how associative are handled. This would essentially treat an array as an associative, with the dimension indices as the key. I didn't follow that debate in detail, though... |
Yes, that's it, though it's even clearer when there are dimension names rather than indices. |
It looks like an R-style |
If we can have |
So you would want to treat |
I agree zeros should not be dropped. |
How would this do for you?
using NamedArrays
using IndexedTables
import IndexedTables.IndexedTable
function IndexedTable(n::NamedArray)
L = length(n) # elements in array
cols = Dict{Symbol, Array}()
factor = 1
for d in 1:ndims(n)
nlevels = size(n, d)
nrep = L ÷ (nlevels * factor)
data = repmat(vcat([fill(x, factor) for x in names(n, d)]...), nrep)
cols[Symbol(dimnames(n, d))] = data
factor *= nlevels
end
return IndexedTable(Columns(;cols...), array(n)[:])
end
|
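For readers on Julia 1.0 or later, where `repmat` no longer exists, the index-expansion step above can be sketched with `repeat` and its `inner`/`outer` keywords. This is only an illustration in plain Julia: `label_columns` is a hypothetical name, and the labels are passed in directly instead of being read from a NamedArray.

```julia
# Hypothetical helper: build one label column per dimension, in the same
# column-major order as the flattened array values.
function label_columns(labels::Vector{<:AbstractVector})
    total = prod(length, labels)     # number of elements in the array
    cols = Vector{Vector}()
    factor = 1                       # how many times each label repeats in a row
    for l in labels
        nrep = total ÷ (length(l) * factor)
        push!(cols, repeat(l; inner = factor, outer = nrep))
        factor *= length(l)
    end
    return cols
end

cols = label_columns([["a", "b"], ["x", "y", "z"]])
# cols[1] == ["a", "b", "a", "b", "a", "b"]
# cols[2] == ["x", "x", "y", "y", "z", "z"]
```

Here `inner = factor` plays the role of the `fill(x, factor)` calls and `outer = nrep` the role of `repmat(..., nrep)`.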
The two behaviours could both be considered when converting a NamedArray to an IndexedTable.
|
The simple implementation above would not be very memory-efficient for very large and very sparse tables, if we filter out the |
To filter out some values, we should probably accept an anonymous function rather than a fixed value such as "0".
|
There are two issues here, right? How to convert something to an IndexedTable, and how to convert something to just any table. Only the latter interacts with iterable tables at this point. |
Not sure if there are really two issues here, in fact... |
On the other hand... there is an issue with producing IndexedTables output through IterableTables: there is no way to filter out values and so keep the sparse nature of IndexedTables.
julia> using IterableTables
julia> using IndexedTables
julia> a=[0 0 1 0;2 0 3 0;0 0 5 0;2 0 0 1]
4×4 Array{Int64,2}:
0 0 1 0
2 0 3 0
0 0 5 0
2 0 0 1
julia> IndexedTable(a)
─────┬──
1 1 │ 0
1 2 │ 0
1 3 │ 1
1 4 │ 0
2 1 │ 2
2 2 │ 0
2 3 │ 3
2 4 │ 0
3 1 │ 0
3 2 │ 0
3 3 │ 5
3 4 │ 0
4 1 │ 2
4 2 │ 0
4 3 │ 0
4 4 │ 1
We could expect an API like
julia> IndexedTable(a, x -> x == 0)
─────┬──
1 3 │ 1
2 1 │ 2
2 3 │ 3
3 3 │ 5
4 1 │ 2
4 4 │ 1
if we want the anonymous function to define which values to filter out, or
julia> IndexedTable(a, x -> x != 0)
if we want the anonymous function to define which values to keep.
So in this case... this is clearly another issue. Issue opened at JuliaData/IndexedTables.jl#91 |
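As a sketch of the "keep" variant of that proposal, written in plain Julia so it does not depend on any particular IndexedTables version: `sparse_long` is a hypothetical helper, not the API requested in JuliaData/IndexedTables.jl#91, and it returns plain vectors rather than an IndexedTable.

```julia
# Hypothetical helper: keep only the entries for which `keep(value)` is true,
# returning row indices, column indices and values sorted by (row, col).
function sparse_long(keep, a::AbstractMatrix)
    idx = sort!([I for I in CartesianIndices(a) if keep(a[I])]; by = Tuple)
    return (row = [I[1] for I in idx],
            col = [I[2] for I in idx],
            value = [a[I] for I in idx])
end

a = [0 0 1 0; 2 0 3 0; 0 0 5 0; 2 0 0 1]
t = sparse_long(x -> x != 0, a)
# t.row   == [1, 2, 2, 3, 4, 4]
# t.col   == [3, 1, 3, 3, 1, 4]
# t.value == [1, 2, 3, 5, 2, 1]
```

Taking the predicate as the first argument also allows `do`-block syntax; the "filter out" variant is the same call with the predicate negated.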
Thanks to @davidavdav's commit davidavdav/NamedArrays.jl@5b8205f, a NamedArray of any dimension can now be flattened (returning a flattened NamedArray), as discussed in #65 (comment)
julia> using NamedArrays
julia> srand(1234);
julia> n=NamedArray(rand(2,4,3))
2×4×3 Named Array{Float64,3}
[:, :, C=1] =
A ╲ B │ 1 2 3 4
──────┼───────────────────────────────────────
1 │ 0.590845 0.566237 0.794026 0.200586
2 │ 0.766797 0.460085 0.854147 0.298614
[:, :, C=2] =
A ╲ B │ 1 2 3 4
──────┼───────────────────────────────────────────
1 │ 0.246837 0.648882 0.066423 0.646691
2 │ 0.579672 0.0109059 0.956753 0.112486
[:, :, C=3] =
A ╲ B │ 1 2 3 4
──────┼───────────────────────────────────────────
1 │ 0.276021 0.0566425 0.950498 0.945775
2 │ 0.651664 0.842714 0.96467 0.789904
julia> n[:]
24-element Named Array{Float64,1}
(:A, :B, :C) │
────────────────┼──────────
("1", "1", "1") │ 0.590845
("2", "1", "1") │ 0.766797
("1", "2", "1") │ 0.566237
("2", "2", "1") │ 0.460085
("1", "3", "1") │ 0.794026
("2", "3", "1") │ 0.854147
("1", "4", "1") │ 0.200586
("2", "4", "1") │ 0.298614
("1", "1", "2") │ 0.246837
("2", "1", "2") │ 0.579672
("1", "2", "2") │ 0.648882
("2", "2", "2") │ 0.0109059
("1", "3", "2") │ 0.066423
("2", "3", "2") │ 0.956753
("1", "4", "2") │ 0.646691
("2", "4", "2") │ 0.112486
("1", "1", "3") │ 0.276021
("2", "1", "3") │ 0.651664
("1", "2", "3") │ 0.0566425
("2", "2", "3") │ 0.842714
("1", "3", "3") │ 0.950498
("2", "3", "3") │ 0.96467
("1", "4", "3") │ 0.945775
("2", "4", "3") │ 0.789904
|
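The tuple names in the flattened output follow column-major order, with the first dimension varying fastest. A small plain-Julia sketch that reproduces the same ordering, assuming nothing about how NamedArrays implements it internally (`flat_names` is a hypothetical helper):

```julia
# Hypothetical helper: one tuple of labels per element, in the same
# column-major order as the flattened array.
flat_names(dimlabels...) = vec(collect(Iterators.product(dimlabels...)))

labels = flat_names(["1", "2"], ["1", "2", "3", "4"], ["1", "2", "3"])
# length(labels) == 24
# labels[1]  == ("1", "1", "1")
# labels[2]  == ("2", "1", "1")
# labels[3]  == ("1", "2", "1")
# labels[24] == ("2", "4", "3")
```

Splitting each tuple back into its components gives one column per dimension, which is the shape a table sink would need.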
Hello,
I'm using FreqTables.jl freqtable.
This function outputs NamedArray objects.
Maybe it could be a good idea to add support for NamedArrays into IterableTables.
Kind regards