Skip to content

Enable sampling multiple nodes per "example"#155

Open
kmontemayor2-sc wants to merge 2 commits intoalibaba:mainfrom
kmontemayor2-sc:kmonte/node-inputs
Open

Enable sampling multiple nodes per "example"#155
kmontemayor2-sc wants to merge 2 commits intoalibaba:mainfrom
kmontemayor2-sc:kmonte/node-inputs

Conversation

@kmontemayor2-sc
Copy link
Contributor

We have a use case where we'd like to sample against multiple nodes at the same time. For instance, we'd like to generate subgraphs for both A and B in the below graph.

image

I'd like to be able to specify that input_nodes = [[A, B], [C, D]] as inputs to DistNeighborLoader 1 so that we sample against A, B for one example and B, C for another example.

While it is possible to achieve the same functionality via batching, I feel like it will be very tedious to keep track of both the "example count" and "batch size" and how they interplay.
Additionally, if we must batch here, then we would need to construct our input_nodes very carefully, which I feel is also quite error prone.

Without this change, when we are iterating through the NodeSamplerInput in _sampling_worker_loop 2, if we have something like NodeSamplerInput([[0, 1], [2, 3]]) then when the dataloader outputs tensor([0]) we would pass in tensor([[0, 1]]), whose shape is (1, 2) to the inducer 3, and the inducer would use seed_size = tensor([[0, 1]]).size(0) = 1

I feel like the existing behavior is somewhat surprising, as it essentially means if you pass in a tensor of greater than dim 1, some of your inputs get ignored.

Alternatives:

I considered making it so we could pass in NodeSamplerInput (or a subclass) to DisNeighborLoader but that seemed quite complicated given that the sampler input has some interplay with the worker options/etc and I wasn't sure how to consolidate all of the types of input_data (input_data: list[RemoteNodeSplitSamplerInput] | RemoteNodeSplitSamplerInput | RemoteNodePathSamplerInput | NodeSamplerInput) 4

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants