Skip to content

feat(core): add convenience methods Plan.toProto() and Plan.toJsonString()#381

Closed
nielspardon wants to merge 1 commit intosubstrait-io:mainfrom
nielspardon:par-toproto
Closed

feat(core): add convenience methods Plan.toProto() and Plan.toJsonString()#381
nielspardon wants to merge 1 commit intosubstrait-io:mainfrom
nielspardon:par-toproto

Conversation

@nielspardon
Copy link
Copy Markdown
Member

@nielspardon nielspardon commented Apr 9, 2025

I noticed that whenever we wanted to serialize a Plan to Proto one would have to instantiate a PlanToProtoConverter and call the PlanToProtoConverter.toProto() instead I think we can offer some convenience by adding a Plan.toProto() method which generates the proto from the current plan.

This PR also changes the existing code to use the new Plan.toProto() method. In that context I'm also fixing an issue with the Spark example code using an older static version of Substrait instead of the most recent one.

Similarly, generating the proto JSON representation is quite complicated since it requires people to first convert a Plan to proto and then set up a protobuf JsonFormat printer to generate the proto JSON String which also requires people to declare an additional dependency. Instead we can hide all of this complexity and simply have a Plan.toJsonString() method which generates the proto JSON for the current plan.

@vbarua
Copy link
Copy Markdown
Member

vbarua commented Apr 9, 2025

I noticed that whenever we wanted to serialize a Plan to Proto one would have to instantiate a PlanToProtoConverter and call the PlanToProtoConverter.toProto() instead I think we can offer some convenience by adding a Plan.toProto() method which generates the proto from the current plan.

I would push against this actually. Specifically, in your Plan#toProto method you create a new PlanProtoConverter(). At the moment, you can't really customize the PlanProtoConverter with a different RelProtoConverter, but it's something that we should eventually support, in the same way that the RelProtoConverter supports providing custom Expression and Type converters when you extend it.

/** Converts from {@link io.substrait.relation.Rel} to {@link io.substrait.proto.Rel} */
public class RelProtoConverter implements RelVisitor<Rel, RuntimeException> {
protected final ExpressionProtoConverter exprProtoConverter;
protected final TypeProtoConverter typeProtoConverter;
protected final ExtensionCollector extensionCollector;
public RelProtoConverter(ExtensionCollector extensionCollector) {
this.extensionCollector = extensionCollector;
this.exprProtoConverter = new ExpressionProtoConverter(extensionCollector, this);
this.typeProtoConverter = new TypeProtoConverter(extensionCollector);
}
public ExpressionProtoConverter getExpressionProtoConverter() {
return this.exprProtoConverter;
}
public TypeProtoConverter getTypeProtoConverter() {
return this.typeProtoConverter;
}

Thinking about the PlanProtoConverter as something that is intended to be extended, I'm wary of hardcoding it like you have Plan#toProto.

Similarly, generating the proto JSON representation is quite complicated since it requires people to first convert a Plan to proto and then set up a protobuf JsonFormat printer to generate the proto JSON String which also requires people to declare an additional dependency. Instead we can hide all of this complexity and simply have a Plan.toJsonString() method which generates the proto JSON for the current plan.

My preference would be to not include this as it requires bringing in an extra dependency. Additionally, there's also lots of different ways to configure the JsonFormat used when marshalling, which will depend on your preferences. It is convenient, but it's not too difficult for end users to configure their own utility class for it.

@nielspardon
Copy link
Copy Markdown
Member Author

I do understand your concerns, however JSON is the recommended text serialization format of the Substrait specification according to this: https://substrait.io/serialization/text_serialization/
in my opinion there should be a built-in (default) way in substrait-java/core to generate both the protobuf and the JSON serialization in a convenient way. that does not exclude that substrait-java/core should also allow for customization/extension.

@vbarua
Copy link
Copy Markdown
Member

vbarua commented Apr 17, 2025

I do understand your concerns, however JSON is the recommended text serialization format of the Substrait specification

The JSON format described in that documentation does not exist yet, and is not the same as the JSON that arises from converting protobuf messages to JSON.

@nielspardon
Copy link
Copy Markdown
Member Author

The JSON format described in that documentation does not exist yet, and is not the same as the JSON that arises from converting protobuf messages to JSON.

We can skip the JSON part then. I understand you have further concerns about the toProto() methods. We can close this PR and revisit the topic at a later point time when things become clearer.

*/
public io.substrait.proto.Plan toProto() {
return new PlanProtoConverter().toProto(this);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking on this some more, it's fine to include this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants