Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The Index tree interface #107

Open
dmbates opened this issue Jun 12, 2022 · 11 comments
Open

The Index tree interface #107

dmbates opened this issue Jun 12, 2022 · 11 comments

Comments

@dmbates
Copy link
Contributor

dmbates commented Jun 12, 2022

I am having difficulty getting my head around the indexed tree interface. I have a small package RangeTrees.jl that creates interval trees in what I think is the indexed tree form. A RangeTree{T} consists of a Vector{RangeNode{T}} and the index of the root node in that vector. Each RangeNode contains the indexes of its left and right child nodes with the convention that a zero index indicates there is no node on that side.

If someone (@ExpandingMan @ararslan ?) can describe for me what AbstractTrees methods I should implement for RangeTree in my package, I will create a documentation PR for the index tree interface.

@dmbates
Copy link
Contributor Author

dmbates commented Jun 12, 2022

Sorry for the noise - I hadn't reread the documentation after the v0.4.0 release when I opened the issue. The section on IndexNode in there makes it clear what I need to do.

@dmbates dmbates closed this as completed Jun 12, 2022
@dmbates
Copy link
Contributor Author

dmbates commented Jun 12, 2022

See dmbates/RangeTrees.jl#1

@dmbates dmbates reopened this Jun 12, 2022
@ExpandingMan
Copy link
Contributor

ExpandingMan commented Jun 12, 2022

I did not read the entirety of the article so I'm still more than a bit confused about what the actual tree structure of a RangeTree is supposed to be...

However, on the matter of the indexed tree interface, it sounds like you may be conflating this a bit with the regular abstract tree interface. The case for an indexed tree is essentially if you have all nodes stored in a single object and the tree structure is given implicitly.

Here is an example that works, though I don't think it's what you are looking for

using AbstractTrees

struct RangeTree
    ranges::Vector{UnitRange{Int}}
end

Base.getindex(rt::RangeTree, idx::Integer) = rt.ranges[idx]

function AbstractTrees.childindices(rt::RangeTree, idx::Integer)
    n = length(rt.ranges)
    idx == n && return ()
    idx == n - 1 && return (idx+1,)
    (idx+1, idx+2)
end

AbstractTrees.parentindex(rt::RangeTree, idx::Integer) = iszero(idx) ? nothing : idx-1

Then

◖◗ rt = RangeTree([1:2, 3:4, 5:6])
RangeTree(UnitRange{Int64}[1:2, 3:4, 5:6])

◖◗ n = IndexNode(rt, 1)
IndexNode{RangeTree, Int64}(RangeTree(UnitRange{Int64}[1:2, 3:4, 5:6]), 1)

◖◗ print_tree(n)
1:2
├─ 3:4
│  └─ 5:6
└─ 5:6

As you can see in the RangeTree structure the concept of nodes is entirely implicit, the nodes are created via IndexNode. The function childindices takes as its argument the index of the node you want the children of and returns the indices of its children.

Does this help? I'll have to read the range tree stuff more thoroughly to give a better example.

@dmbates
Copy link
Contributor Author

dmbates commented Jun 12, 2022

Thank you for this. I am further along but still not quite where I want to be. As you might suspect, my immediate goal is to be able to use print_tree. In dmbates/RangeTrees.jl@e3d8107 I have added methods for Base.getindex, AbstractTrees.childindices, AbstractTrees.parentindex, AbstractTrees.rootindex, and AbstractTrees.nodevalue. What other methods would I need to define to get print_tree to regard a RangeTree as something other than a singleton?

Right now in the tests calling print_tree on

 rt = RangeTree([0:0, 3:40, 10:14, 20:35, 29:98]) # example from Wikipedia page

produces

RangeTree{Int64}(RangeNode{Int64}[RangeNode{Int64}(0:0, 0, 0, 2, 0), RangeNode{Int64}(3:40, 1, 0, 3, 40), RangeNode{Int64}(10:14, 2, 5, 0, 98), RangeNode{Int64}(20:35, 0, 0, 5, 35), RangeNode{Int64}(29:98, 4, 0, 3, 98)])

which is what I mean when I say that the tree is regarded as a singleton.

FYI. A RangeTree represents a balanced binary tree of a set of ranges representing intervals. The particular application for this package is taking a set of reference segments of DNA from chromosomes and determining which of these reference segments overlaps with a target segment. This implementation of an interval tree stores the reference segments sorted by first and adds one more piece of information, which is maxlast - the maximum value of last in the subtree rooted at this node.

Generally the intersection proceeds as a PreOrderDFS (although not using that iterator) which terminates at nodes where maxlast < first(target). Also, the sorting of the nodes allows you to skip the right subtree if last(target) < first(intvl).

This is all to say that the tree structure is just a binary tree of nodes sorted according to a particular criterion but each node has that one additional piece of information, maxlast.

@ExpandingMan
Copy link
Contributor

I'm still a bit confused about what you're trying to achieve here (I still haven't read everything thoroughly, though I did read your responses.

I suspect you are trying to construct something as an indexed tree which is better served as a regular tree. For example, I think the following is a special case of what you are trying to do

struct RangeNode
    a::Int
    b::Int
end

RangeNode(ab::UnitRange) = RangeNode(first(ab), last(ab))

AbstractTrees.nodevalue(r::RangeNode) = r.a:r.b

function AbstractTrees.children(r::RangeNode)
    n = r.b - r.a + 1
    n == 1 && return ()
    c = cld(n, 2)
    (RangeNode(r.a, r.a + c - 1), RangeNode(r.a + c, r.b))
end

which gives

◖◗ r = RangeNode(1, 4);

◖◗ print_tree(r)
1:4
├─ 1:2
│  ├─ 1:1
│  └─ 2:2
└─ 3:4
   ├─ 3:3
   └─ 4:4

This is a binary partition tree. Note that in this case we didn't need arrays at all and everything is stack allocated, so even though the original object was an AbstractArray (1:4) the tree nodes don't involve AbstractArray at all.

Can you try implementing your nodes as a regular tree rather than an indexed tree?

If that still doesn't make sense I'll have to go through and try to fully implement this.

@dmbates
Copy link
Contributor Author

dmbates commented Jun 13, 2022

Thanks again for responding. Just in the last hour the light finally dawned that I had misunderstood the role of IndexNode and I have been able to achieve what I want with that.

I will look at your suggestion but I am having difficulty imagining how I would adapt it. In my case the range in a node isn't related to the node's position in a tree other than the requirement that the ranges are sorted by first value in the array from which the tree is constructed and in the tree itself. This may just be a failure of imagination on my part - my background is statistical computing and numerical linear algebra rather than computer science and data structures. I can reason about vectors and other arrays but trees always seem to be black magic to some extent.

Shall we close this issue? I will open a documentation PR if I succeed in getting things to work with index arrays.

@ExpandingMan
Copy link
Contributor

It's still sounding to me a bit like you don't want indexed trees at all, from your description I'm still imagining that what I wrote above is a special case of what you're trying to do, so perhaps serves as an example.

The entire indexed tree interface is really just an adapter for IndexNode, I came to the decision that it was problematic to have two completely separate interfaces for a number of reasons. IndexNode works just like a regular tree node, but you can use it on top of something implementing the indexed tree interface.

Feel free to close if you feel this is resolved.

@dmbates
Copy link
Contributor Author

dmbates commented Jun 14, 2022

The penny finally dropped last night that in your example the every node is itself a tree. I have been able to implement your approach and am now writing tests and docs. I'll reply here when I have committed this version and give a link to an example showing why I want to do this.

@ExpandingMan
Copy link
Contributor

ExpandingMan commented Jun 14, 2022

The penny finally dropped last night that in your example the every node is itself a tree.

This is essentially the difference between the indexed trees and regular trees. An indexed tree requires a "central" tree object which can be indexed to get its nodes, IndexNode wraps such a tree to give you nodes which are true tree nodes, each of which, like you said, is the root of its own tree (though they do not in general satisfy isroot which in AbstractTrees.jl just means parent(node) \equiv nothing).

For what it's worth, I do myself have a rather hard time imagining good use cases for indexed trees and struggled to do so somewhat when writing unit tests. I considered dropping the concept altogether, but at the time I was not aware of quite how rarely used this was in dependencies (though this might owe largely to lack of documentation).

@dmbates
Copy link
Contributor Author

dmbates commented Jun 14, 2022

I was able to implement something based on your approach in dmbates/RangeTrees.jl#3 but it does not perform as well as my clumsy approach in the main branch using both RangeTree and RangeNode, which is wrapped with IndexNode, in my large-scale example.

Part of the problem was that the real-world project depended indirectly on MathTeXEngine.jl which forced version 0.3 of AbstractTrees.jl with unfortunate consequences. I installed a dev version of MathTeXEngine.jl and was able to get better performance by defining a nodetype method but it still does not match the version from the main branch of RangeTrees.jl or even that of intervalTrees.jl. Because I was doing this primarily to define methods for print_tree, treeheight, etc., I think I will stay with the IndexNode approach.

@ExpandingMan
Copy link
Contributor

This is the situation we wind up in when we upgrade a package with so many dependencies. Any help pushing through PR's to update bounds is appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants