Support Metadata filtering with Neo4J #114

Merged
merged 7 commits into from
Apr 4, 2025

Conversation

Contributor

@vga91 vga91 commented Mar 25, 2025

Issue

Closes langchain4j/langchain4j#1252

Change

Migration from langchain4j core repo PR: langchain4j/langchain4j#2577

  • Added Filter implementation to Neo4j embedding.

    • If the filter is null, we find the relevant documents using the existing vector index;
    • otherwise, we prepare the base Cypher statement for the metadata pre-filtering approach and then execute vector.similarity.cosine, as explained for the LangChain Python implementation.
  • Upgraded neo4j container to latest 5.26.x version

  • Added Neo4jFilterMapperTest.java

Similar to https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/vectorstores/neo4j_vector.py#L1064.
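
For readers following along, the two search paths described above could be sketched roughly as below. This is only an illustration, not the code from this PR: the class and method names are hypothetical, the label and property are assumed to be sanitized already, and the exact query text in the PR may differ. It relies on the Neo4j procedure db.index.vector.queryNodes for the index path and on the vector.similarity.cosine function for the pre-filtered path.

```java
// Hypothetical sketch: builds only the Cypher text for the two search paths.
final class SearchQuerySketch {

    // Path 1: no filter, so the existing vector index answers the query.
    static String indexQuery() {
        return "CALL db.index.vector.queryNodes($indexName, $maxResults, $embeddingValue) "
                + "YIELD node, score RETURN node, score";
    }

    // Path 2: pre-filter on metadata, then score the surviving nodes by cosine similarity.
    // label and embeddingProperty are assumed to be sanitized by the caller.
    static String filteredQuery(String label, String embeddingProperty, String filterClause) {
        return String.format(
                "MATCH (n:%1$s) WHERE n.%2$s IS NOT NULL AND %3$s "
                        + "WITH n, vector.similarity.cosine(n.%2$s, $embeddingValue) AS score "
                        + "RETURN n, score ORDER BY score DESC LIMIT $maxResults",
                label, embeddingProperty, filterClause);
    }

    // A null filter clause selects the index path, anything else the pre-filtered path.
    static String queryFor(String label, String embeddingProperty, String filterClause) {
        return filterClause == null
                ? indexQuery()
                : filteredQuery(label, embeddingProperty, filterClause);
    }

    public static void main(String[] args) {
        System.out.println(queryFor("Document", "embedding", null));
        System.out.println(queryFor("Document", "embedding", "n.type = $param_1"));
    }
}
```

The filter values themselves never appear in the query text; they travel separately as query parameters.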

General checklist

  • There are no breaking changes
  • I have added unit and integration tests for my change
  • I have manually run all the unit tests in all modules, and they are all green
  • I have manually run all integration tests in the module I have added/changed, and they are all green

Checklist for adding new maven module

  • I have added my new module in the root pom.xml and langchain4j-community-bom/pom.xml

Checklist for adding new embedding store integration

  • I have added a {NameOfIntegration}EmbeddingStoreIT that extends from either EmbeddingStoreIT or EmbeddingStoreWithFilteringIT
  • I have added a {NameOfIntegration}EmbeddingStoreRemovalIT that extends from EmbeddingStoreWithRemovalIT

Checklist for changing existing embedding store integration

  • I have manually verified that the {NameOfIntegration}EmbeddingStore works correctly with the data persisted using the latest released version of LangChain4j

@Martin7-1 Martin7-1 added enhancement New feature or request P3 Medium priority theme: embedding store Issues/PRs related to embedding store labels Mar 25, 2025

try (var session = session()) {
String statement = String.format(
"CALL { MATCH (n:%1$s) WHERE n.%2$s IS NOT NULL AND size(n.%2$s) = toInteger(%3$s) AND %4$s DETACH DELETE n } IN TRANSACTIONS ",

doesn't seem to apply the filter?

Contributor Author

The filter is applied via filterEntry.getKey() in the String.format call and filterEntry.getValue() in the params.
For example, if the Filter is IsEqualTo(key=type, comparisonValue=a), then filterEntry.getKey() is "n.type = $param_1" and filterEntry.getValue() is Map.of("param_1", "a").

Therefore the result is:

 session.run( "CALL { MATCH  .... AND n.type = $param_1 DETACH DELETE n } IN TRANSACTIONS ", Map.of("param_1", "a") )

Passing the value as a parameter lets us handle any Neo4j data type.
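
A minimal sketch of how the clause and the parameters from the example above fit into the delete statement. The class name, method name, and the sample label, property, and dimension are hypothetical; only the statement template is taken from the diff under review.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical sketch: shows where the filter clause and its parameters end up.
final class DeleteStatementSketch {

    // The filter clause is spliced into the statement text; the values stay out of it.
    static String deleteStatement(String label, String embeddingProperty, int dimension, String filterClause) {
        return String.format(
                "CALL { MATCH (n:%1$s) WHERE n.%2$s IS NOT NULL AND size(n.%2$s) = toInteger(%3$s) "
                        + "AND %4$s DETACH DELETE n } IN TRANSACTIONS",
                label, embeddingProperty, dimension, filterClause);
    }

    public static void main(String[] args) {
        // What the mapper returns for IsEqualTo(key=type, comparisonValue=a), per the comment above.
        Map.Entry<String, Map<String, Object>> filterEntry =
                new AbstractMap.SimpleEntry<>("n.type = $param_1", Map.of("param_1", "a"));

        String statement = deleteStatement("Document", "embedding", 384, filterEntry.getKey());
        System.out.println(statement);
        // The parameters would be handed to the driver next to the statement,
        // e.g. session.run(statement, filterEntry.getValue()).
        System.out.println(filterEntry.getValue());
    }
}
```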

String statement =
String.format("CALL { MATCH (n:%1$s) DETACH DELETE n } IN TRANSACTIONS", this.sanitizedLabel);
String statement = String.format(
"CALL { MATCH (n:%1$s) WHERE n.%2$s IS NOT NULL AND size(n.%2$s) = toInteger(%3$s) DETACH DELETE n } IN TRANSACTIONS",

why do we need to check for embedding and embedding size match? this will make it super expensive?

final AbstractMap.SimpleEntry<String, Map<?, ?>> entry = new Neo4jFilterMapper().map(filter);
final String query =
"""
CYPHER runtime = parallel parallelRuntimeSupport=all
@jexp jexp Mar 28, 2025

you forgot to apply the filter in the query?

oh, it's this? The "AND %4$s" placeholder, filled in by entry.getKey().

we should probably switch to cypher-dsl soon (separate PR), so all this text-formatting doesn't get out of hand.

Contributor Author

Yeah it makes sense, we'll switch to cypher-dsl 👍

}
}

/*
Private methods
*/
private EmbeddingSearchResult getSearchResUsingVectorSimilarity(
EmbeddingSearchRequest request, Filter filter, Value embeddingValue, Session session) {
final AbstractMap.SimpleEntry<String, Map<?, ?>> entry = new Neo4jFilterMapper().map(filter);

A Map.Entry as the value is quite odd? Shouldn't it be a list of QueryFilter records or something?
Or the appropriate thing from cypher-dsl in the future.

}
}

private String getOperation(String key, String operator, Object value) {

Usually this is an enum of operators that is also able to format itself,
plus a record(property, Operator, value), and then a list of those.

but I think Cypher DSL has this stuff out of the box.
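
The suggestion above could look roughly like the following. This is a sketch under assumptions, not code from the PR or from Cypher DSL: the names Operator, QueryFilter, toCypher, and toParams are hypothetical, and the filters are simply joined with AND.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "an enum of operators plus a record, then a list of those".
final class OperatorSketch {

    // Each operator knows its Cypher symbol and how to format a predicate.
    enum Operator {
        EQ("="), NEQ("<>"), GT(">"), LT("<"), IN("IN");

        private final String symbol;

        Operator(String symbol) {
            this.symbol = symbol;
        }

        String format(String property, String paramName) {
            return String.format("n.%s %s $%s", property, symbol, paramName);
        }
    }

    record QueryFilter(String property, Operator operator, Object value) {}

    // Joins all predicates with AND, numbering the parameters from 1.
    static String toCypher(List<QueryFilter> filters) {
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < filters.size(); i++) {
            QueryFilter f = filters.get(i);
            clauses.add(f.operator().format(f.property(), "param_" + (i + 1)));
        }
        return String.join(" AND ", clauses);
    }

    // Collects the values under the same parameter names used by toCypher.
    static Map<String, Object> toParams(List<QueryFilter> filters) {
        Map<String, Object> params = new LinkedHashMap<>();
        for (int i = 0; i < filters.size(); i++) {
            params.put("param_" + (i + 1), filters.get(i).value());
        }
        return params;
    }

    public static void main(String[] args) {
        List<QueryFilter> filters = List.of(
                new QueryFilter("type", Operator.EQ, "a"),
                new QueryFilter("age", Operator.GT, 10));
        System.out.println(toCypher(filters)); // n.type = $param_1 AND n.age > $param_2
        System.out.println(toParams(filters)); // {param_1=a, param_2=10}
    }
}
```

A flat list like this cannot express nested AND/OR/NOT trees on its own, which is one reason a query builder may be the better long-term fit.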


public static final String UNSUPPORTED_FILTER_TYPE_ERROR = "Unsupported filter type: ";

public static class IncrementalKeyMap {

a list might be better


final And filter = new And(
new And(new IsEqualTo("key1", "value1"), new IsEqualTo("key2", "10")),
new Not(new Or(new IsIn("key3", asList("1", "2")), new IsNotEqualTo("key4", "value4"))));

do we test the formatting somewhere explicitly, including the parentheses and parameters?

Contributor Author

Yes, we added some complex conditions, like in the should_map_or_not_and test
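
To make the parenthesization question concrete, here is a small stand-alone sketch that applies the "(%s) AND (%s)", "(%s) OR (%s)", and "NOT (%s)" templates from the diff to the nested filter quoted above. The types are simplified stand-ins, not the LangChain4j Filter classes, and the use of "<>" for IsNotEqualTo is an assumption; the string produced by the real mapper may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the mapper: same templates, simplified filter types.
final class NestedFilterSketch {

    interface Filter {}
    record Leaf(String key, String operator, Object value) implements Filter {}
    record And(Filter left, Filter right) implements Filter {}
    record Or(Filter left, Filter right) implements Filter {}
    record Not(Filter expression) implements Filter {}

    // Parameters are numbered in the order the leaves are visited (left to right).
    private final Map<String, Object> params = new LinkedHashMap<>();

    String map(Filter f) {
        if (f instanceof Leaf leaf) {
            String name = "param_" + (params.size() + 1);
            params.put(name, leaf.value());
            return String.format("n.%s %s $%s", leaf.key(), leaf.operator(), name);
        }
        if (f instanceof And and) {
            return String.format("(%s) AND (%s)", map(and.left()), map(and.right()));
        }
        if (f instanceof Or or) {
            return String.format("(%s) OR (%s)", map(or.left()), map(or.right()));
        }
        if (f instanceof Not not) {
            return String.format("NOT (%s)", map(not.expression()));
        }
        throw new IllegalArgumentException("Unsupported filter type: " + f);
    }

    Map<String, Object> params() {
        return params;
    }

    public static void main(String[] args) {
        Filter filter = new And(
                new And(new Leaf("key1", "=", "value1"), new Leaf("key2", "=", "10")),
                new Not(new Or(new Leaf("key3", "IN", List.of("1", "2")), new Leaf("key4", "<>", "value4"))));
        NestedFilterSketch mapper = new NestedFilterSketch();
        System.out.println(mapper.map(filter));
        System.out.println(mapper.params());
    }
}
```

With these templates the nested filter maps to ((n.key1 = $param_1) AND (n.key2 = $param_2)) AND (NOT ((n.key3 IN $param_3) OR (n.key4 <> $param_4))), so every sub-expression keeps its own pair of parentheses.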

@jexp jexp left a comment

LGTM see my small comments

@vga91 vga91 mentioned this pull request Mar 30, 2025
11 tasks
@Martin7-1
Collaborator

@vga91 There are some conflicts, could you please resolve them?

@vga91 vga91 force-pushed the issue-core-repo-1252 branch from 87782f5 to 3cc1a24 Compare March 30, 2025 13:44
@vga91
Contributor Author

vga91 commented Mar 30, 2025

@Martin7-1 just resolved it, thanks!

Collaborator

@Martin7-1 Martin7-1 left a comment

@vga91 Thank you!

ORDER BY score DESC
LIMIT $maxResults
"""
.formatted(
Collaborator

Same .formatted problem like previous PR.

Contributor Author

Changed all of them

this.dimension,
entry.getKey(),
embeddingValue);
final Map params = entry.getValue();
Collaborator

nit: Could you please specify the generic type? like Map<String, Object>?

return new RuntimeException(invalidSanitizeValue);
});

return "n.%s %s $%s".formatted(sanitizedKey, operator, param);
Collaborator

Same .formatted problem like previous PR.


public String mapNotIn(IsNotIn filter) {
final String inOperation = getOperation(filter.key(), "IN", filter.comparisonValues());
return "NOT (%s)".formatted(inOperation);
Collaborator

Same .formatted problem like previous PR.

}

private String mapAnd(And filter) {
return "(%s) AND (%s)".formatted(getStringMapping(filter.left()), getStringMapping(filter.right()));
Collaborator

Same .formatted problem like previous PR.

}

private String mapOr(Or filter) {
return "(%s) OR (%s)".formatted(getStringMapping(filter.left()), getStringMapping(filter.right()));
Collaborator

Same .formatted problem like previous PR.

}

private String mapNot(Not filter) {
return "NOT (%s)".formatted(getStringMapping(filter.expression()));
Collaborator

Same .formatted problem like previous PR.


private int counter = 1;

public String put(Object value) {
Collaborator

Looks like it should use AtomicInteger to keep thread-safe. WDYT?

Contributor Author

changed to private final AtomicInteger integer = new AtomicInteger(); 👍
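
For illustration, a thread-safe variant of the incremental key map might look like the sketch below. The class name, field names, and the choice of ConcurrentHashMap are assumptions, not the code merged here. Note that new AtomicInteger() starts at 0, whereas the int counter it replaces started at 1, so this sketch passes the initial value explicitly to keep the first name at param_1.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a thread-safe incremental key map.
final class IncrementalKeyMapSketch {

    // ConcurrentHashMap rejects null values; this sketch assumes filter values are non-null.
    private final Map<String, Object> map = new ConcurrentHashMap<>();

    // Starts at 1 so the first generated name is param_1.
    private final AtomicInteger counter = new AtomicInteger(1);

    // Stores the value under the next free parameter name and returns that name.
    String put(Object value) {
        String key = "param_" + counter.getAndIncrement();
        map.put(key, value);
        return key;
    }

    Map<String, Object> getMap() {
        return map;
    }

    public static void main(String[] args) {
        IncrementalKeyMapSketch params = new IncrementalKeyMapSketch();
        System.out.println(params.put("a")); // param_1
        System.out.println(params.put(10));  // param_2
    }
}
```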

String statement = String.format(
"CALL { MATCH (n:%1$s) WHERE n.%2$s IS NOT NULL AND size(n.%2$s) = toInteger(%3$s) AND %4$s DETACH DELETE n } IN TRANSACTIONS ",
this.sanitizedLabel, this.embeddingProperty, this.dimension, filterEntry.getKey());
final Map params = filterEntry.getValue();
Collaborator

nit: Could you please specify the generic type? like Map<String, Object>?

@Martin7-1
Collaborator

@vga91 Looks like the AtomicInteger is not initialized correctly, which causes the Neo4jFilterMapperTest to fail. Could you please check?

Collaborator

@Martin7-1 Martin7-1 left a comment

@vga91 Thank you!

@Martin7-1 Martin7-1 merged commit c314460 into langchain4j:main Apr 4, 2025
4 checks passed
@vga91 vga91 mentioned this pull request Apr 11, 2025
11 tasks
Labels
enhancement New feature or request P3 Medium priority theme: embedding store Issues/PRs related to embedding store
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Support Metadata filtering with Neo4J
3 participants