Question: Any way to attach PipelineResult.run_id to the nodes and relationships that were touched by the SimpleKGPipeline run? #289

Open
nickjfrench opened this issue Feb 27, 2025 · 3 comments


@nickjfrench

I want to be able to MATCH what nodes and relationships were added or edited by a run. I know SimpleKGPipeline returns a PipelineResult that contains the number of nodes and the run_id, but it doesn't seem to attach itself to any nodes or relationships within the graph DB. There is an id property added, but that seems to come from the chunk's id.

Am I missing something that already exists either within the GraphRAG package or Neo4j GraphDB, or is there an easy way to attach it to nodes and rels myself?
If I attach it myself, I imagine the property would need to be an array, as nodes will be edited by multiple runs.
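
To make the array idea concrete, here is a minimal sketch of appending a run_id to a list property without duplicating it. The property name `run_ids`, the helper names, and matching nodes on their `id` property are all assumptions for illustration; none of this is provided by the GraphRAG package. The Cypher string would be passed to a Neo4j driver session together with the parameters, and the Python function only mirrors what the `CASE` expression does.

```python
from typing import Any, Iterable, Optional

# Hypothetical property name; the library does not set it.
RUN_IDS_PROPERTY = "run_ids"

# Appends $run_id to the list property only when it is not already present.
# Matching on n.id is an assumption about how the touched nodes are identified.
APPEND_RUN_ID_QUERY = f"""
MATCH (n)
WHERE n.id IN $node_ids
SET n.{RUN_IDS_PROPERTY} = CASE
    WHEN $run_id IN coalesce(n.{RUN_IDS_PROPERTY}, []) THEN n.{RUN_IDS_PROPERTY}
    ELSE coalesce(n.{RUN_IDS_PROPERTY}, []) + $run_id
END
"""


def append_run_id(existing: Optional[list[str]], run_id: str) -> list[str]:
    """Pure-Python mirror of the CASE expression above."""
    current = list(existing or [])
    if run_id not in current:
        current.append(run_id)
    return current


def build_params(node_ids: Iterable[str], run_id: str) -> dict[str, Any]:
    """Build the parameter map expected by APPEND_RUN_ID_QUERY."""
    return {"node_ids": list(node_ids), "run_id": run_id}
```

Keeping the append idempotent means the same run can be replayed without growing the list, while nodes edited by several runs accumulate one entry per run.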

@stellasia
Contributor

Hi @nickjfrench ,

This is not possible at the moment: the internal pipeline components, especially the KGWriter, do not know about this run_id (which is used to store and access component results during pipeline execution).

I understand the use case though, so I'll keep this issue open until we can provide a real solution.

@oskrocha

Just wanted to chime in: this matters a lot for my use case as well. I'm working on enriching an existing KG, and being able to identify which nodes and relationships were touched by a specific SimpleKGPipeline run (via a run_id or similar) is crucial.

Looking forward to any updates on this.

@stellasia
Contributor

Hi,

New in release 1.6.1: it is now possible to access the run_id from within a component. To do so, you must implement the run_with_context method instead of run (note that these two methods will eventually be merged once the API is stabilized). The run_id is attached to the RunContext that is passed as the first argument to this method. Here is an example:

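# Import paths as of neo4j-graphrag 1.6.x; check them against your installed version.
from typing import Any

from neo4j_graphrag.experimental.pipeline import Component, DataModel
from neo4j_graphrag.experimental.pipeline.types.context import RunContext


class ComponentResult(DataModel):
    # Result model assumed for this example; it only carries the run_id.
    run_id: str


class MyComponent(Component):

    async def run_with_context(
        self,
        context_: RunContext,
        numbers: list[int],
        **kwargs: Any,
    ) -> ComponentResult:
        # The pipeline passes the RunContext as the first argument.
        run_id = context_.run_id

        return ComponentResult(run_id=run_id)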

In order to attach it to all created nodes, at the moment you still need to create your own extractor. We're discussing this point internally, I'll come back to you as soon as possible.
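
As a rough illustration of what such a custom step could do with the run_id once it has it, the sketch below stamps it onto every node and relationship before they are written. The `Node`, `Relationship`, and `Graph` dataclasses are simplified stand-ins rather than the library's own types, and `tag_graph_with_run_id` and the `run_id` property key are hypothetical names. It assumes only that each item exposes a `properties` dict.

```python
from dataclasses import dataclass, field
from typing import Any


# Simplified stand-ins for the graph objects an extractor produces.
@dataclass
class Node:
    id: str
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start_node_id: str
    end_node_id: str
    type: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Graph:
    nodes: list[Node] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)


def tag_graph_with_run_id(graph: Graph, run_id: str, key: str = "run_id") -> Graph:
    """Set properties[key] = run_id on every node and relationship, in place."""
    for item in [*graph.nodes, *graph.relationships]:
        item.properties[key] = run_id
    return graph


# Usage: inside run_with_context, run_id would come from context_.run_id.
graph = Graph(
    nodes=[Node(id="0", label="Person"), Node(id="1", label="Company")],
    relationships=[Relationship("0", "1", "WORKS_AT")],
)
tagged = tag_graph_with_run_id(graph, "example-run-id")
```

Once the property is persisted, the items written by a run can be found by filtering on it. A scalar property like this one records only the most recent run that wrote each item, so tracking every run that touched a node would need a list property instead.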
