-
Notifications
You must be signed in to change notification settings - Fork 29
Support Metadata filtering with Neo4J #114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
try (var session = session()) { | ||
String statement = String.format( | ||
"CALL { MATCH (n:%1$s) WHERE n.%2$s IS NOT NULL AND size(n.%2$s) = toInteger(%3$s) AND %4$s DETACH DELETE n } IN TRANSACTIONS ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't seem to apply the filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filter is applied via filterEntry.getKey()
in the String.format and filterEntry.getValue()
in the params.
For example, if the Filter is IsEqualTo(key=type, comparisonValue=a)
, the filterEntry.getKey()
is "n.type = $param_1" and the filterEntry.getValue()
is Map.of("param_1", "a") .
Therefore the result is:
session.run( "CALL { MATCH .... AND n.type = $param_1 DETACH DELETE n } IN TRANSACTIONS ", Map.of("param_1", "a") )
so that we can handle any neo4j data type
String statement = | ||
String.format("CALL { MATCH (n:%1$s) DETACH DELETE n } IN TRANSACTIONS", this.sanitizedLabel); | ||
String statement = String.format( | ||
"CALL { MATCH (n:%1$s) WHERE n.%2$s IS NOT NULL AND size(n.%2$s) = toInteger(%3$s) DETACH DELETE n } IN TRANSACTIONS", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to check for embedding and embedding size match? this will make it super expensive?
final AbstractMap.SimpleEntry<String, Map<?, ?>> entry = new Neo4jFilterMapper().map(filter); | ||
final String query = | ||
""" | ||
CYPHER runtime = parallel parallelRuntimeSupport=all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you forgot to apply the filter in the query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, it's this? AND %4$s
entry.getKey(),
we should probably switch to cypher-dsl soon (separate PR), so all this text-formatting doesn't get out of hand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it makes sense, we'll switch to cypher-dsl 👍
} | ||
} | ||
|
||
/* | ||
Private methods | ||
*/ | ||
private EmbeddingSearchResult getSearchResUsingVectorSimilarity( | ||
EmbeddingSearchRequest request, Filter filter, Value embeddingValue, Session session) { | ||
final AbstractMap.SimpleEntry<String, Map<?, ?>> entry = new Neo4jFilterMapper().map(filter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A map.entry as value is quite odd? shouldn't it be a list of record QueryFilter
or something?
Or the appropriate thing from cypher-dsl in the future
} | ||
} | ||
|
||
private String getOperation(String key, String operator, Object value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually this is an enum of operators which is then also able to format it
and a record(property, Operator, value)
and then a list of that
|
||
public static final String UNSUPPORTED_FILTER_TYPE_ERROR = "Unsupported filter type: "; | ||
|
||
public static class IncrementalKeyMap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a list might be better
|
||
final And filter = new And( | ||
new And(new IsEqualTo("key1", "value1"), new IsEqualTo("key2", "10")), | ||
new Not(new Or(new IsIn("key3", asList("1", "2")), new IsNotEqualTo("key4", "value4")))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we test teh formatting somewhere explicitely including the parentheses and parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we added some complex condition, like in the should_map_or_not_and test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM see my small comments
@vga91 There are some conflicts, could you please resolve it? |
87782f5
to
3cc1a24
Compare
@Martin7-1 just resolved it, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vga91 Thank you!
ORDER BY score DESC | ||
LIMIT $maxResults | ||
""" | ||
.formatted( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same .formatted
problem like previous PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed all of them
this.dimension, | ||
entry.getKey(), | ||
embeddingValue); | ||
final Map params = entry.getValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could you please specify the generic type? like Map<String, Object>
?
return new RuntimeException(invalidSanitizeValue); | ||
}); | ||
|
||
return "n.%s %s $%s".formatted(sanitizedKey, operator, param); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same .formatted problem like previous PR.
|
||
public String mapNotIn(IsNotIn filter) { | ||
final String inOperation = getOperation(filter.key(), "IN", filter.comparisonValues()); | ||
return "NOT (%s)".formatted(inOperation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same .formatted problem like previous PR.
} | ||
|
||
private String mapAnd(And filter) { | ||
return "(%s) AND (%s)".formatted(getStringMapping(filter.left()), getStringMapping(filter.right())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same .formatted problem like previous PR.
} | ||
|
||
private String mapOr(Or filter) { | ||
return "(%s) OR (%s)".formatted(getStringMapping(filter.left()), getStringMapping(filter.right())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same .formatted problem like previous PR.
} | ||
|
||
private String mapNot(Not filter) { | ||
return "NOT (%s)".formatted(getStringMapping(filter.expression())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same .formatted problem like previous PR.
|
||
private int counter = 1; | ||
|
||
public String put(Object value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it should use AtomicInteger
to keep thread-safe. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to private final AtomicInteger integer = new AtomicInteger();
👍
String statement = String.format( | ||
"CALL { MATCH (n:%1$s) WHERE n.%2$s IS NOT NULL AND size(n.%2$s) = toInteger(%3$s) AND %4$s DETACH DELETE n } IN TRANSACTIONS ", | ||
this.sanitizedLabel, this.embeddingProperty, this.dimension, filterEntry.getKey()); | ||
final Map params = filterEntry.getValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could you please specify the generic type? like Map<String, Object>
?
@vga91 Looks like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vga91 Thank you!
Issue
Closes langchain4j/langchain4j#1252
Change
Migration from langchain4j core repo PR: langchain4j/langchain4j#2577
Added
Filter
implementation to Neo4j embedding.vector.similarity.cosine
will be executed.As explained. here for the langchain python implementation.
Upgraded neo4j container to latest 5.26.x version
Added
Neo4jFilterMapperTest.java
Similar to https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/vectorstores/neo4j_vector.py#L1064.
General checklist
Checklist for adding new maven module
pom.xml
andlangchain4j-community-bom/pom.xml
Checklist for adding new embedding store integration
{NameOfIntegration}EmbeddingStoreIT
that extends from eitherEmbeddingStoreIT
orEmbeddingStoreWithFilteringIT
{NameOfIntegration}EmbeddingStoreRemovalIT
that extends fromEmbeddingStoreWithRemovalIT
Checklist for changing existing embedding store integration
{NameOfIntegration}EmbeddingStore
works correctly with the data persisted using the latest released version of LangChain4j