-
Notifications
You must be signed in to change notification settings - Fork 93
feat: support Nested Lists #627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: support Nested Lists #627
Conversation
18967e4 to
9e99091
Compare
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments. Thanks!
core/src/main/java/io/substrait/expression/proto/ExpressionProtoConverter.java
Outdated
Show resolved
Hide resolved
core/src/main/java/io/substrait/expression/proto/ProtoExpressionConverter.java
Outdated
Show resolved
Hide resolved
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java
Show resolved
Hide resolved
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java
Outdated
Show resolved
Hide resolved
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java
Outdated
Show resolved
Hide resolved
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just left a few more comments but the core/ part is looking great!
core/src/main/java/io/substrait/expression/proto/ProtoExpressionConverter.java
Outdated
Show resolved
Hide resolved
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java
Show resolved
Hide resolved
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java
Show resolved
Hide resolved
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the core stuff looks great! I will try and take a pass at the isthmus stuff, but as I mentioned to you online, I am not as comfortable with that part of the codebase. If the changes there seem simple enough that I am comfortable approving, I will do so. Otherwise, I'll leave it to @vbarua or others to make the final judgement call 🙂
isthmus/src/main/java/io/substrait/isthmus/NestedFunctions.java
Outdated
Show resolved
Hide resolved
vbarua
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working this, its looking really good overall. I did have some comments, but they are minor suggestions. If you could take a look and respond to them, I think we can get this PR merge in this week easily 🙂
core/src/main/java/io/substrait/expression/proto/ProtoExpressionConverter.java
Outdated
Show resolved
Hide resolved
core/src/main/java/io/substrait/expression/ExpressionCreator.java
Outdated
Show resolved
Hide resolved
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java
Outdated
Show resolved
Hide resolved
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java
Outdated
Show resolved
Hide resolved
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java
Outdated
Show resolved
Hide resolved
isthmus/src/main/java/io/substrait/isthmus/SubstraitRelVisitor.java
Outdated
Show resolved
Hide resolved
isthmus/src/test/java/io/substrait/isthmus/NestedExpressionsTest.java
Outdated
Show resolved
Hide resolved
isthmus/src/test/java/io/substrait/isthmus/NestedExpressionsTest.java
Outdated
Show resolved
Hide resolved
isthmus/src/test/java/io/substrait/isthmus/NestedExpressionsTest.java
Outdated
Show resolved
Hide resolved
asolimando
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments purely on the Isthmus side of things
isthmus/src/main/java/io/substrait/isthmus/expression/NestedExpressionConverter.java
Outdated
Show resolved
Hide resolved
isthmus/src/main/java/io/substrait/isthmus/SubstraitRelVisitor.java
Outdated
Show resolved
Hide resolved
isthmus/src/test/java/io/substrait/isthmus/NestedExpressionsTest.java
Outdated
Show resolved
Hide resolved
| * constructor creates a special type of SqlKind.ARRAY_VALUE_CONSTRUCTOR for lists that store | ||
| * non-literal expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually and by looking at the tests (NestedListWithJustLiteralsTest), it seems this can indeed handle literal expressions, can you double check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can handle both literals and non-literals. I'll update the comment to avoid the confusion
| * constructor creates a special type of SqlKind.ARRAY_VALUE_CONSTRUCTOR for lists that store | ||
| * non-literal expressions. | ||
| */ | ||
| public class NestedListConstructor extends SqlMultisetValueConstructor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you picked SqlMultisetValueConstructor instead of the more natural SqlArrayValueConstructor because CallConverters.java#L144 would then match on your NestedList and treat it as the regular SqlArrayValueConstructor (then invoke LiteralConstructorConverter.java#L32 which is not what we want here).
SqlMultisetValueConstructor is conceptually wrong as a multiset is radically different from an array/list in standard SQL, on top of that I imagine that tomorrow you might want to support something similar and hit the same issue you are trying to avoid here by using this class in the first place.
The impedance mismatch is that in SQL (and Calcite), arrays and lists are technically the same entity, while IIRC in Substrait they are treated as different entities (@benbellick can you confirm this?).
By looking at LiteralConstructorConverter, there is an implicit assumption that arrays store only literals, we go down that route without checking if elements in the array are really literals (LiteralConstructorConverter.java#L62).
It's probably enough to change LiteralConstructorConverter::toNonEmptyListLiteral to something like this (haven't tested it):
private Optional<Expression> toNonEmptyListLiteral(
RexCall call, Function<RexNode, Expression> topLevelConverter) {
List<Expression> expressions = call.operands.stream()
.map(topLevelConverter)
.collect(Collectors.toList());
// Check if all operands are actually literals
if (expressions.stream().allMatch(e -> e instanceof Expression.Literal)) {
return Optional.of(ExpressionCreator.list(
call.getType().isNullable(),
expressions.stream()
.map(e -> (Expression.Literal) e)
.collect(Collectors.toList())));
}
return Optional.empty();
}
I suggest to extend SqlArrayValueConstructor (which, I know, extends SqlMultisetValueConstructor, but still), then fix LiteralConstructorConverter as suggested, so that we can continue with NestedExpressionConverter which comes just after, and we should be good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I am understanding it correctly, let me know your thoughts on the following scenarios: We want to ensure that the roundtrip of both a nested list with literals and non-literals are both returned to a nested list. If the literalConstructorConverter is run first on a list of just literals, then it would pass and then wouldn't be mapped back to a nested list. In the other case, where the nestedExpressionConverter is run first, the literal lists that were originally not a NestedList would become a nested list. Does the above account for this or is the difference not important?
Also, is there a way to meaningfully extend the SqlArrayValueConstructor class? Its definition is bare with just a constructor to its parent type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no distinction between lists and arrays in substrait. Only list is a legitimate type. Array is informally used in some places in the docs, but that doesn't actually exist as a distinct type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To your point about
We want to ensure that the roundtrip of both a nested list with literals and non-literals are both returned to a nested list.
The design you have works to solve the problem of "NestedList in is always NestedList out". Thinking about what Alessandro is saying though, I actually don't think it's worth preserving that. There is a little bit of redundancy on the Substrait side in that list containing only literal values can be expressed equivalently as a Literal.List or a Nested.List.
In Calcite, these can both be mapped a SqlArrayValueConstructor. You introduced your NestedListConstructor to be able to distinguish between the two incoming cases so as to roundtrip them, but what if instead of doing that we just said:
- A
SqlArrayValueConstructorcall with all literals gets turned into a Substrait Literal.List - A
SqlArrayValueConstructorcall with any non-literals gets turned into Substrat Nested.List
This isn't as nice from a roundtrip perspective, but from a pure Calcite perspective we're mapping to the most specific Substrait construct that we can use, which is better in my opinion.
One thing I would push for that's different from what Alessandro is suggesting is to have a single call converter SqlArrayValueConstructorCallConverter that combines the array converting code in LiteralConstructorConverter and your code in NestedExpressionCallConverter. Effectively, let's not split the code that handles arrays with only literals, and arrays with non-literals, and just put it all in one class so that it's easy to see how the behaviour works and you don't have to look at 2 different converters for 1 Calcite construct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @benbellick and @vbarua for the clarification, the fact that only List exists in Substrait actually makes things easier, as we can map it to Array in standard SQL (and in Calcite, where it's informally called List too sometimes).
In light of this clarification, my previous suggestion is an unneeded over complication, and I fully agree with Victor's proposal.
| Expression.NestedList.builder().addValues(b.str("xzy")).addValues(b.str("abc")).build(); | ||
|
|
||
| Rel project = | ||
| io.substrait.relation.Project.builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I am not very familiar with coding style conventions in substrait-java, but I generally prefer to use imports over fully qualified names for Java classes (in case you accept to change it, I saw this pattern in quite a few places, it should be fixed consistently)
This PR is to add support for the Expression Nested List type.
Issue
Key Changes:
Testing:
./gradlew test --tests io.substrait.isthmus.NestedExpressionsTest --debug-jvm