Merged
4 changes: 4 additions & 0 deletions examples/isthmus-api/.gitignore
@@ -0,0 +1,4 @@
_apps
_data
**/*/bin
build
139 changes: 139 additions & 0 deletions examples/isthmus-api/README.md
@@ -0,0 +1,139 @@
# Isthmus API Examples

The Isthmus library converts Substrait plans to and from SQL. There are two examples, one for each direction of conversion.

## How does this work in theory?

The [Calcite](https://calcite.apache.org/) library parses and generates the SQL string. Calcite has its own relational object model, distinct from Substrait's. Isthmus contains classes that convert between Substrait and Calcite's object model.

The conversion flows work as follows:

**SQL to Substrait:**
`SQL ---[Calcite parsing]---> Calcite Object Model ---[Isthmus conversion]---> Substrait`

**Substrait to SQL:**
`Substrait ---[Isthmus conversion]---> Calcite Object Model ---[Calcite SQL generation]---> SQL`

## Running the examples

There are two example classes:

- [FromSql](./src/main/java/io/substrait/examples/FromSql.java) that creates a plan starting from SQL
- [ToSql](./src/main/java/io/substrait/examples/ToSql.java) that reads a plan and creates the SQL
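
Both classes are started through the `IsthmusAppExamples` launcher, which treats the first argument as the simple name of a class in `io.substrait.examples`, instantiates it reflectively, and passes on the remaining arguments. The following is a minimal, self-contained sketch of that dispatch pattern, not the launcher itself: `DispatchSketch`, `Echo`, and `lastArgs` are hypothetical names, and the lookup uses nested classes instead of the package so that the sketch fits in one file.

```java
import java.util.Arrays;

final class DispatchSketch {

  /** Same shape as the Action interface declared in IsthmusAppExamples. */
  interface Action {
    void run(String[] args);
  }

  /** Hypothetical example class; stands in for FromSql or ToSql. */
  static final class Echo implements Action {
    // Recorded only so that the effect of run() can be observed from outside.
    static String lastArgs = "";

    @Override
    public void run(final String[] args) {
      lastArgs = String.join(",", args);
      System.out.println("Echo ran with [" + lastArgs + "]");
    }
  }

  /** Resolves args[0] to a nested class of DispatchSketch and runs it with the remaining args. */
  static void dispatch(final String[] args) {
    if (args.length == 0) {
      throw new IllegalArgumentException("Name of the example to run is missing");
    }
    try {
      // Assumption of this sketch: examples are nested classes, hence the '$' separator.
      final String className = DispatchSketch.class.getName() + "$" + args[0];
      // asSubclass fails early if the named class does not implement Action.
      final Class<? extends Action> clz = Class.forName(className).asSubclass(Action.class);
      final Action action = clz.getDeclaredConstructor().newInstance();
      // Everything after the class name is handed to the example unchanged.
      action.run(Arrays.copyOfRange(args, 1, args.length));
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot run example " + args[0], e);
    }
  }

  public static void main(final String[] args) {
    dispatch(new String[] {"Echo", "substrait.plan"});
  }
}
```

An unknown name surfaces as an `IllegalArgumentException` in this sketch, whereas the launcher in this pull request prints the stack trace and exits with a non-zero status.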


### Requirements

To run these you will need Java 17 or greater, and this repository cloned to your local system.


## Creating a Substrait Plan from SQL

To run [`FromSql.java`](./src/main/java/io/substrait/examples/FromSql.java), execute the command below from the root of this repository.

```bash
./gradlew examples:isthmus-api:run --args "FromSql substrait.plan"
```

The example writes a binary plan to `substrait.plan` and outputs the text format of the protobuf to stdout. The output is quite lengthy, so it has been abbreviated here.

```bash
> Task :examples:isthmus-api:run
extension_uris {
  extension_uri_anchor: 2
  uri: "/functions_aggregate_generic.yaml"
}
extension_uris {
  extension_uri_anchor: 1
  uri: "/functions_comparison.yaml"
}
extensions {
  extension_function {
    extension_uri_reference: 1
    function_anchor: 1
    name: "equal:any_any"
    extension_urn_reference: 1
  }
}
extensions {
  extension_function {
    extension_uri_reference: 2
    function_anchor: 2
    name: "count:"
    extension_urn_reference: 2
  }
}
relations {....}
version {
  minor_number: 77
  producer: "isthmus"
}
extension_urns {
  extension_urn_anchor: 1
  urn: "extension:io.substrait:functions_comparison"
}
extension_urns {
  extension_urn_anchor: 2
  urn: "extension:io.substrait:functions_aggregate_generic"
}

File written to substrait.plan
```

Please see the code comments for details of how the conversion is done.
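
The file handling in this step is plain `java.nio`. As a rough sketch of the write that `FromSql` performs and the matching read that any consumer of the file has to do, the snippet below round-trips a byte array through a file. The class `PlanFileSketch`, its method names, and the placeholder bytes are illustrative assumptions; in the example the bytes come from `protoPlan.toByteArray()`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class PlanFileSketch {

  /** Writes the serialized plan to the file, replacing any existing content. */
  static void write(final Path file, final byte[] planBytes) {
    try {
      Files.write(file, planBytes);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads the whole file back into memory as a byte array. */
  static byte[] read(final Path file) {
    try {
      return Files.readAllBytes(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(final String[] args) throws IOException {
    // Placeholder content only; these bytes are not a valid Substrait plan.
    final byte[] planBytes = {0x0a, 0x00};
    final Path file = Files.createTempFile("substrait", ".plan");
    write(file, planBytes);
    System.out.println("Round trip ok: " + Arrays.equals(planBytes, read(file)));
    Files.delete(file);
  }
}
```

Because the file holds the binary protobuf encoding and not the text format printed to stdout, it should be read back as bytes, not as a string.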

## Creating SQL from a Substrait Plan

To run [`ToSql.java`](./src/main/java/io/substrait/examples/ToSql.java), execute the command below from the root of this repository.

```bash
./gradlew examples:isthmus-api:run --args "ToSql substrait.plan"
```

The example reads from `substrait.plan` (likely the file created by `FromSql`) and outputs SQL. The text format of the protobuf has been abbreviated.

```bash
> Task :examples:isthmus-api:run
Reading from substrait.plan
extension_uris {
  extension_uri_anchor: 2
  uri: "/functions_aggregate_generic.yaml"
}
extension_uris {
  extension_uri_anchor: 1
  uri: "/functions_comparison.yaml"
}
extensions {
  extension_function {
    extension_uri_reference: 1
    function_anchor: 1
    name: "equal:any_any"
    extension_urn_reference: 1
  }
}
extensions {....}
relations {....}
version {
  minor_number: 77
  producer: "isthmus"
}
extension_urns {
  extension_urn_anchor: 1
  urn: "extension:io.substrait:functions_comparison"
}
extension_urns {
  extension_urn_anchor: 2
  urn: "extension:io.substrait:functions_aggregate_generic"
}


SELECT `t2`.`colour0` AS `COLOUR`, `t2`.`$f1` AS `COLOURCOUNT`
FROM (SELECT `vehicles`.`colour` AS `colour0`, COUNT(*) AS `$f1`
FROM `vehicles`
INNER JOIN `tests` ON `vehicles`.`vehicle_id` = `tests`.`vehicle_id`
WHERE `tests`.`test_result` = 'P'
GROUP BY `vehicles`.`colour`
ORDER BY COUNT(*) IS NULL, 2) AS `t2`

```

The SQL statement is generated in the selected dialect (MySQL is used in the example).
Member

How is dialect selected?

Member

Maybe something like this is better?

The example generates SQL in the MySQL dialect. Different SQL dialects can be specified when converting from Substrait to SQL.

22 changes: 22 additions & 0 deletions examples/isthmus-api/build.gradle.kts
@@ -0,0 +1,22 @@
plugins {
  // Apply the application plugin to add support for building a CLI application in Java.
  id("application")
  alias(libs.plugins.spotless)
  id("substrait.java-conventions")
}

repositories { mavenCentral() }

dependencies {
  implementation(project(":isthmus"))
  implementation(libs.calcite.core)
  implementation(libs.calcite.server)
}

application { mainClass = "io.substrait.examples.IsthmusAppExamples" }

tasks.named<Test>("test") { useJUnitPlatform() }

java { toolchain { languageVersion.set(JavaLanguageVersion.of(17)) } }

tasks.pmdMain { dependsOn(":core:shadowJar") }
107 changes: 107 additions & 0 deletions examples/isthmus-api/src/main/java/io/substrait/examples/FromSql.java
@@ -0,0 +1,107 @@
package io.substrait.examples;

import io.substrait.examples.IsthmusAppExamples.Action;
import io.substrait.isthmus.SqlToSubstrait;
import io.substrait.isthmus.SubstraitTypeSystem;
import io.substrait.isthmus.sql.SubstraitCreateStatementParser;
import io.substrait.plan.Plan;
import io.substrait.plan.PlanProtoConverter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import org.apache.calcite.config.CalciteConnectionConfig;
import org.apache.calcite.config.CalciteConnectionProperty;
import org.apache.calcite.jdbc.CalciteSchema;
import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.prepare.CalciteCatalogReader;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.sql.SqlDialect;
import org.apache.calcite.sql.parser.SqlParseException;

/**
 * Substrait from SQL conversions.
 *
 * <p>The conversion process involves four steps:
 *
 * <p>1. Create a fully typed schema for the inputs. Within a SQL context this represents the CREATE
 * TABLE commands, which need to be converted to a Calcite Schema.
 *
 * <p>2. Parse the SQL query to convert (in the source SQL dialect).
 *
 * <p>3. Convert the SQL query to Calcite Relations.
 *
 * <p>4. Convert the Calcite Relations to Substrait relations.
 *
 * <p>Note that the schema could be created from other means, such as Calcite's reflection-based
 * schema.
 */
public class FromSql implements Action {

  @Override
  public void run(final String[] args) {
    try {
      final String createSql =
          """
          CREATE TABLE "vehicles" ("vehicle_id" varchar(15), "make" varchar(40), "model" varchar(40),
              "colour" varchar(15), "fuel_type" varchar(15),
              "cylinder_capacity" int, "first_use_date" varchar(15));

          CREATE TABLE "tests" ("test_id" varchar(15), "vehicle_id" varchar(15),
              "test_date" varchar(20), "test_class" varchar(20), "test_type" varchar(20),
              "test_result" varchar(15),"test_mileage" int, "postcode_area" varchar(15));

          """;

      // Create the Calcite Schema from the CREATE TABLE statements.
      // The Isthmus helper classes assume a standard SQL format for parsing.
      final CalciteSchema calciteSchema = CalciteSchema.createRootSchema(false);
      SubstraitCreateStatementParser.processCreateStatements(createSql)
          .forEach(t -> calciteSchema.add(t.getName(), t));

      // Type factory based on Java types
      final RelDataTypeFactory typeFactory =
          new JavaTypeFactoryImpl(SubstraitTypeSystem.TYPE_SYSTEM);

      // Default configuration for Calcite, with case-insensitive identifier matching
      final CalciteConnectionConfig calciteDefaultConfig =
          CalciteConnectionConfig.DEFAULT.set(
              CalciteConnectionProperty.CASE_SENSITIVE, Boolean.FALSE.toString());

      final CalciteCatalogReader catalogReader =
          new CalciteCatalogReader(calciteSchema, List.of(), typeFactory, calciteDefaultConfig);

      // Query that needs to be converted; again this could be in a variety of SQL dialects
      final String apacheDerbyQuery =
          """
          SELECT vehicles.colour, count(*) as colourcount FROM vehicles INNER JOIN tests
          ON vehicles.vehicle_id=tests.vehicle_id WHERE tests.test_result = 'P'
          GROUP BY vehicles.colour ORDER BY count(*)
          """;
      final SqlToSubstrait sqlToSubstrait = new SqlToSubstrait();

      // Choose Apache Derby as an example dialect
      final SqlDialect dialect = SqlDialect.DatabaseProduct.DERBY.getDialect();
      final Plan substraitPlan = sqlToSubstrait.convert(apacheDerbyQuery, catalogReader, dialect);

      // Create the proto plan to display to stdout, as it has a more readable text format
      final PlanProtoConverter planToProto = new PlanProtoConverter();
      final io.substrait.proto.Plan protoPlan = planToProto.toProto(substraitPlan);
      System.out.println(protoPlan);

      // Write out to a file if given a file name:
      // convert to a protobuf byte array and write it as a binary file
      if (args.length == 1) {
        final byte[] buffer = protoPlan.toByteArray();
        final Path outputFile = Paths.get(args[0]);
        Files.write(outputFile, buffer);
        System.out.println("File written to " + outputFile);
      }

    } catch (SqlParseException | IOException e) {
      e.printStackTrace();
    }
  }
}
@@ -0,0 +1,53 @@
package io.substrait.examples;

import java.util.Arrays;

/** Entry point that runs the example class named by the first command-line argument. */
public final class IsthmusAppExamples {

  /** Implemented by all examples. */
  @FunctionalInterface
  public interface Action {

    /**
     * Runs the example.
     *
     * @param args the command-line arguments that follow the example class name
     */
    void run(String[] args);
  }

  private IsthmusAppExamples() {}

  /**
   * Looks up the example class by its simple name and runs it.
   *
   * @param args the simple class name of the example, followed by the example's own arguments
   */
  public static void main(final String[] args) {
    try {
      if (args.length == 0) {
        System.err.println(
            "Please provide the base class name of the example to run, e.g. ToSql to run class io.substrait.examples.ToSql");
        System.exit(-1);
      }
      final String exampleClass = args[0];

      // asSubclass checks that the loaded class implements Action, so no unchecked cast is needed
      final Class<? extends Action> clz =
          Class.forName(
                  String.format("%s.%s", IsthmusAppExamples.class.getPackageName(), exampleClass))
              .asSubclass(Action.class);
      final Action action = clz.getDeclaredConstructor().newInstance();

      // copyOfRange yields an empty array when only the class name was given
      action.run(Arrays.copyOfRange(args, 1, args.length));
    } catch (Exception e) {
      e.printStackTrace();
      System.exit(-1);
    }
  }
}
@@ -0,0 +1,39 @@
package io.substrait.examples;

import io.substrait.isthmus.calcite.SubstraitTable;
import io.substrait.isthmus.sql.SubstraitCreateStatementParser;
import java.util.ArrayList;
import java.util.List;
import org.apache.calcite.jdbc.CalciteSchema;
import org.apache.calcite.sql.parser.SqlParseException;

/** Helper functions for schemas. */
public final class SchemaHelper {

  private SchemaHelper() {}

  /**
   * Parses one or more SQL strings containing only CREATE statements into a {@link CalciteSchema}.
   *
   * @param createStatements SQL strings, each containing only CREATE statements
   * @return a root {@link CalciteSchema} holding the tables defined by the CREATE statements
   * @throws SqlParseException if any of the statements cannot be parsed
   */
  public static CalciteSchema processCreateStatementsToSchema(final List<String> createStatements)
      throws SqlParseException {

    final List<SubstraitTable> tables = new ArrayList<>();
    for (final String statement : createStatements) {
      tables.addAll(SubstraitCreateStatementParser.processCreateStatements(statement));
    }

    final CalciteSchema rootSchema = CalciteSchema.createRootSchema(false);
    for (final SubstraitTable table : tables) {
      rootSchema.add(table.getName(), table);
    }

    return rootSchema;
  }
}