Migrate neo4j retriever #109

Merged 7 commits on Mar 25, 2025

vga91 (Contributor) commented Mar 21, 2025

Issue

Closes #

Change

Migration of Neo4jContentRetriever

General checklist

  • There are no breaking changes
  • I have added unit and integration tests for my change
  • I have manually run all the unit tests in all modules, and they are all green
  • I have manually run all integration tests in the module I have added/changed, and they are all green

Checklist for adding new maven module

  • I have added my new module in the root pom.xml and langchain4j-community-bom/pom.xml

Checklist for adding new embedding store integration

  • I have added a {NameOfIntegration}EmbeddingStoreIT that extends from either EmbeddingStoreIT or EmbeddingStoreWithFilteringIT
  • I have added a {NameOfIntegration}EmbeddingStoreRemovalIT that extends from EmbeddingStoreWithRemovalIT

Checklist for changing existing embedding store integration

  • I have manually verified that the {NameOfIntegration}EmbeddingStore works correctly with the data persisted using the latest released version of LangChain4j

vga91 marked this pull request as draft March 21, 2025 13:21
vga91 force-pushed the migrate-neo4j-retriever branch from d2cfa4f to 4d2a7b4 March 21, 2025 13:22
vga91 mentioned this pull request Mar 21, 2025
11 tasks
Martin7-1 (Collaborator) left a comment

@vga91 Thank you!

<relativePath>../../pom.xml</relativePath>
</parent>

<artifactId>langchain4j-community-neo4j-retriever</artifactId>
Collaborator

What about naming it langchain4j-community-content-retriever-neo4j?

is that the naming pattern? otherwise, what other kinds of retrievers are there? structure retrievers?

Collaborator

Actually, there is no naming pattern for it... It just tells users which component is in this module.

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.input.PromptTemplate;

public class Neo4jContentRetrieverBuilder {
Collaborator

What about moving it into Neo4jContentRetriever and making it a static inner class?
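
A minimal sketch of what this suggestion could look like: the builder lives inside the retriever as a static nested class and is reached through a `builder()` factory method. `RetrieverWithBuilderSketch` and its two fields are hypothetical stand-ins chosen for illustration, not the actual Neo4jContentRetriever API.

```java
import java.util.Objects;

// Hypothetical stand-in for the retriever; the real class has different fields.
final class RetrieverWithBuilderSketch {
    private final String databaseName;
    private final String promptTemplate;

    // Private constructor: instances are only created through the nested Builder.
    private RetrieverWithBuilderSketch(Builder builder) {
        this.databaseName = Objects.requireNonNull(builder.databaseName, "databaseName");
        this.promptTemplate = builder.promptTemplate;
    }

    static Builder builder() {
        return new Builder();
    }

    String databaseName() {
        return databaseName;
    }

    String promptTemplate() {
        return promptTemplate;
    }

    // Static nested class: it needs no reference to an enclosing instance.
    static final class Builder {
        private String databaseName;
        private String promptTemplate = "Answer using this schema: {{schema}}";

        Builder databaseName(String databaseName) {
            this.databaseName = databaseName;
            return this;
        }

        Builder promptTemplate(String promptTemplate) {
            this.promptTemplate = promptTemplate;
            return this;
        }

        RetrieverWithBuilderSketch build() {
            return new RetrieverWithBuilderSketch(this);
        }
    }
}
```

Callers would then write `RetrieverWithBuilderSketch.builder().databaseName("movies").build()`, so the builder is discoverable from the class it builds and no separate top-level builder class is needed.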

void shouldRetrieveContentWhenQueryIsValidAndOpenAiChatModelIsUsed() {

// With
ChatLanguageModel openAiChatModel = OpenAiChatModel.builder()
Collaborator

I'm curious why you need to test with OpenAiChatModel in particular. Is it different from a mocked ChatLanguageModel?

I mean for an IT you want to use a real model/infra, not just a mock.
I don't know if LC4J also has local models as part of the IT infrastructure that could be run on the GH action server?

Collaborator

LC4J only has local embedding models for now.

Maybe this test should move to a new Neo4jContentRetrieverIT that does the integration test and is annotated with @EnabledIfEnvironmentVariable, and the old one should be renamed to Neo4jContentRetrieverTest and use a mock model. WDYT?

jexp commented Mar 24, 2025

Some format violations:

Error:  Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.43.0:check (default) on project langchain4j-community-neo4j-retriever: The following files had format violations:
Error:      src/test/java/dev/langchain4j/rag/content/retriever/neo4j/Neo4jContentRetrieverIT.java
Error:          @@ -28,10 +28,11 @@
Error:           @ExtendWith(MockitoExtension.class)
Error:           class·Neo4jContentRetrieverIT·{
Error:           ····private·static·final·String·NEO4J_VERSION·=·System.getProperty("neo4jVersion",·"5.26");
Error:          -····
Error:          +
Error:           ····@Container
Error:          -····private·static·final·Neo4jContainer<?>·neo4jContainer·=
Error:          -············new·Neo4jContainer<>("neo4j:"·+·NEO4J_VERSION).withoutAuthentication().withPlugins("apoc");
Error:          +····private·static·final·Neo4jContainer<?>·neo4jContainer·=·new·Neo4jContainer<>("neo4j:"·+·NEO4J_VERSION)
Error:          +············.withoutAuthentication()
Error:          +············.withPlugins("apoc");
Error:           
Error:           ····private·Driver·driver;
Error:           ····private·Neo4jGraph·graph;
Error:  Run 'mvn spotless:apply' to fix these violations.

import org.neo4j.driver.types.Type;
import org.neo4j.driver.types.TypeSystem;

public class Neo4jContentRetriever implements ContentRetriever {

We should probably name this something like Neo4jText2CypherRetriever?

}
}

private static final String NODE_PROPERTIES_QUERY =

we could probably do this in one query, but that's an improvement for the future


public List<Record> executeRead(String queryString) {

try (Session session = this.driver.session()) {

we can move this to driver.executeQuery which automatically does retries

List<Record> records = graph.executeRead(cypherQuery);
return records.stream()
.flatMap(r -> r.values().stream())
.map(value -> NODE.isTypeOf(value) ? value.asMap().toString() : value.toString())

should probably also handle RELATIONSHIP and PATH?


public void refreshSchema() {

List<String> nodeProperties = formatNodeProperties(executeRead(NODE_PROPERTIES_QUERY));

in the future we should also add the enhanced schema with sample property values

}

@Test
void shouldReturnEmptyListWhenQueryIsInvalid() {

is the query invalid or does it just return empty list?

we should probably also add a test for a truly invalid query generated that fails during execution


String question = query.text();
String schema = graph.getSchema();
String cypherQuery = generateCypherQuery(schema, question);

in the future we should also add the capability to re-generate and re-run failed queries, best with the question, previous query and error message from the database, for the model to fix (up to N retries)
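
A rough sketch of the retry idea described in this comment, written against plain `java.util.function` interfaces rather than the real model and driver classes. `CypherRetrySketch`, `Attempt`, and `retrieveWithRetries` are hypothetical names; the `generateCypher` function stands in for the chat-model call and `runQuery` for the database call.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class CypherRetrySketch {

    // Hypothetical holder for what the model is shown on each attempt.
    static final class Attempt {
        private final String question;
        private final String previousQuery; // null on the first attempt
        private final String errorMessage;  // null on the first attempt

        Attempt(String question, String previousQuery, String errorMessage) {
            this.question = question;
            this.previousQuery = previousQuery;
            this.errorMessage = errorMessage;
        }

        String question() { return question; }
        String previousQuery() { return previousQuery; }
        String errorMessage() { return errorMessage; }
    }

    // Generates a query, runs it, and on failure feeds the failed query and
    // the database error back to the generator, up to maxRetries extra attempts.
    static Optional<List<String>> retrieveWithRetries(
            String question,
            Function<Attempt, String> generateCypher,
            Function<String, List<String>> runQuery,
            int maxRetries) {
        String previousQuery = null;
        String errorMessage = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String cypher = generateCypher.apply(new Attempt(question, previousQuery, errorMessage));
            try {
                return Optional.of(runQuery.apply(cypher));
            } catch (RuntimeException e) {
                // Remember what failed so the next generation can try to fix it.
                previousQuery = cypher;
                errorMessage = e.getMessage();
            }
        }
        return Optional.empty(); // every attempt failed
    }
}
```

Returning an empty `Optional` after the last failure is one possible design; rethrowing the final exception would be equally reasonable, depending on how the retriever is expected to report errors.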

jexp left a comment

LGTM, most of my comments are future improvements.

vga91 force-pushed the migrate-neo4j-retriever branch from cd690dd to 3e12612 March 24, 2025 10:24
Martin7-1 (Collaborator) left a comment

@vga91 Thank you!

import org.neo4j.driver.types.Type;
import org.neo4j.driver.types.TypeSystem;

public class Neo4jText2CypherRetriever implements ContentRetriever {
Collaborator

Renaming Neo4jContentRetriever to Neo4jText2CypherRetriever is a breaking change... Any idea how to keep compatibility? (Mark Neo4jContentRetriever as @Deprecated and remove it in the future?)

Contributor Author

Added the @Deprecated class, lemme know if it's ok this way
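
One common way to keep the old name working after a rename is a deprecated subclass that adds no behavior of its own. The sketch below uses simplified stand-in types (`RetrieverSketch` replaces the real ContentRetriever interface, and `retrieve` takes a plain String); the class actually added in this PR may be structured differently.

```java
// Simplified stand-in for the ContentRetriever interface.
interface RetrieverSketch {
    String retrieve(String question);
}

// The new, preferred name (stand-in implementation).
class Neo4jText2CypherRetriever implements RetrieverSketch {
    @Override
    public String retrieve(String question) {
        return "cypher-for:" + question;
    }
}

/**
 * Old name kept so that existing code still compiles.
 *
 * @deprecated use {@link Neo4jText2CypherRetriever} instead.
 */
@Deprecated
class Neo4jContentRetriever extends Neo4jText2CypherRetriever {
    // Intentionally empty: all behavior is inherited from the renamed class.
}
```

Existing callers that construct `Neo4jContentRetriever` keep working and get a deprecation warning at compile time, which gives them a release cycle to switch before the old class is removed.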

vga91 marked this pull request as ready for review March 24, 2025 14:02
Martin7-1 (Collaborator) commented Mar 24, 2025

Maybe the IT should be split into two parts:

  1. Neo4JText2CypherRetrieverIT: the real IT, which uses OpenAiChatModel and should be annotated with @EnabledIfEnvironmentVariable(named = "OPENAI_API_KEY", matches = ".+")
  2. Neo4JText2CypherRetrieverTest: a test that uses a mock model to test functionality.

@vga91 WDYT?

vga91 (Contributor Author) commented Mar 24, 2025

> Maybe the IT should be split into two parts:
>
>   1. Neo4JText2CypherRetrieverIT: the real IT, which uses OpenAiChatModel and should be annotated with @EnabledIfEnvironmentVariable(named = "OPENAI_API_KEY", matches = ".+")
>   2. Neo4JText2CypherRetrieverTest: a test that uses a mock model to test functionality.
>
> @vga91 WDYT?

Yes, great idea. I just split them.

Martin7-1 (Collaborator) left a comment

@vga91 Thank you! Could you please remove langchain4j-neo4j from the langchain4j repo? Also, the embedding store and content retriever need detailed migration docs.

@jexp Thanks for the review!

Martin7-1 merged commit 43eff19 into langchain4j:main Mar 25, 2025
2 of 4 checks passed