From 8a9fd6926e1d936263a7a5e1539f78836ee31151 Mon Sep 17 00:00:00 2001
From: Danny Meijer
Date: Tue, 2 Jun 2026 11:41:40 +0200
Subject: [PATCH 01/11] feature - implement RFC 003 query blocks (#4)

---
 .gitignore | 1 +
 docs/language/README.md | 2 +
 docs/language/reference/dataset_carriers.md | 2 +-
 docs/language/reference/dataset_methods.md | 14 +-
 docs/language/reference/query_blocks.md | 59 ++
 docs/release_notes/v0_1.md | 5 +-
 docs/rfcs/003_inql_query_blocks.md | 36 +-
 docs/rfcs/013_function_catalog_program.md | 5 +-
 docs/rfcs/README.md | 4 +-
 examples/README.md | 2 +-
 examples/dataset_api.incn | 4 +-
 incan.toml | 3 +
 scripts/smoke_pub_consumer.sh | 289 +++++++++
 src/aggregate_builders.incn | 49 ++
 src/dataset/mod.incn | 271 ++++++---
 src/dataset/ops.incn | 59 +-
 src/lib.incn | 7 +-
 src/prism/lower.incn | 41 +-
 src/prism/mod.incn | 176 +++---
 src/prism/output_columns.incn | 46 +-
 src/prism/rewrite.incn | 30 +-
 src/prism/store.incn | 40 +-
 src/prism/types.incn | 13 +-
 src/projection_builders.incn | 5 +
 src/session/datafusion_backend.incn | 70 ++-
 src/session/types.incn | 9 +-
 src/substrait/expr_lowering.incn | 70 ++-
 src/substrait/extensions.incn | 16 +
 src/substrait/mod.incn | 2 +
 src/substrait/relations.incn | 108 +++-
 src/substrait/schema.incn | 42 ++
 tests/test_dataset.incn | 87 ++-
 tests/test_prism.incn | 33 +-
 tests/test_session_generators.incn | 20 +
 tests/test_session_projection.incn | 2 +-
 tests/test_substrait_plan.incn | 4 +-
 vocab_companion/Cargo.lock | 114 ++++
 vocab_companion/Cargo.toml | 10 +
 vocab_companion/src/desugar.rs | 636 ++++++++++++++++++++
 vocab_companion/src/lib.rs | 139 +++++
 40 files changed, 2202 insertions(+), 323 deletions(-)
 create mode 100644 docs/language/reference/query_blocks.md
 create mode 100644 vocab_companion/Cargo.lock
 create mode 100644 vocab_companion/Cargo.toml
 create mode 100644 vocab_companion/src/desugar.rs
 create mode 100644 vocab_companion/src/lib.rs

diff --git a/.gitignore b/.gitignore
index 3632b66..b4fa2c1 100644
--- a/.gitignore +++ b/.gitignore @@ -2,6 +2,7 @@ # will have compiled files and executables debug target +incan # These are backup files generated by rustfmt **/*.rs.bk diff --git a/docs/language/README.md b/docs/language/README.md index bcf5bc0..e157418 100644 --- a/docs/language/README.md +++ b/docs/language/README.md @@ -11,6 +11,7 @@ This section documents the current InQL package surface. - [Dataset carriers (Reference)][dataset-reference] - [Dataset carriers (Explanation)][dataset-explanation] +- [Query blocks (Reference)][query-blocks-reference] ### Execution and materialization @@ -29,6 +30,7 @@ This section documents the current InQL package surface. [explanation]: explanation/ [dataset-reference]: reference/dataset_carriers.md [dataset-explanation]: explanation/dataset_carriers.md +[query-blocks-reference]: reference/query_blocks.md [execution-reference]: reference/execution_context.md [execution-explanation]: explanation/execution_context.md [substrait-read-root]: reference/substrait/read_root_binding_contract.md diff --git a/docs/language/reference/dataset_carriers.md b/docs/language/reference/dataset_carriers.md index 00c867a..e40d77a 100644 --- a/docs/language/reference/dataset_carriers.md +++ b/docs/language/reference/dataset_carriers.md @@ -37,7 +37,7 @@ Deferred logical pipeline. Always bounded. ### `DataStream[T]` -Streaming specialization. Shares the `DataSet[T]` API while carrying unbounded semantics. +Streaming specialization. Shares the carrier method surface while carrying unbounded semantics. ## Related reference pages diff --git a/docs/language/reference/dataset_methods.md b/docs/language/reference/dataset_methods.md index 00458c6..4215a9d 100644 --- a/docs/language/reference/dataset_methods.md +++ b/docs/language/reference/dataset_methods.md @@ -1,6 +1,6 @@ # Dataset methods (Reference) -This page documents the current `DataSet[T]` method surface. Builder-function details live under `reference/builders/`. 
+This page documents the current carrier method surface. Builder-function details live under `reference/builders/`. The Substrait helper surface behind these methods is split by semantic role: @@ -9,21 +9,25 @@ The Substrait helper surface behind these methods is split by semantic role: - `src/substrait/inspect.incn` owns relation/plan inspection and output-column inference - `src/schema_registry.incn` owns logical named-table schema binding -## Shared method surface +## Carrier method surface | Method | Signature | Meaning | | ------------- | ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------- | | `filter` | `def filter(self, predicate: ColumnExpr) -> Self` | Restrict rows by a boolean scalar expression. | | `join` | `def join(self, other: Self, on: bool) -> Self` | Combine with another same-carrier relation using the package's boolean join predicate surface. | -| `select` | `def select(self) -> Self` | Preserve the current projection shape as an identity projection. | +| `select` | `def select[U](self, assignments: list[ProjectionAssignment] = []) -> SameCarrier[U]` | Project an output row shape while preserving the carrier kind. | | `with_column` | `def with_column(self, name: str, expr: ColumnExpr) -> Self` | Add or replace one projected column using a scalar expression. | | `group_by` | `def group_by(self, columns: list[ColumnExpr]) -> Self` | Define grouping keys using scalar expressions. | | `agg` | `def agg(self, measures: list[AggregateMeasure]) -> Self` | Apply aggregate measures over the current relation or current grouping. | | `generate` | `def generate(self, generator: GeneratorApplication) -> Self` | Apply a relation-shaping generator such as `explode(...)` with explicit output aliases. 
| -| `with_window_column` | `def with_window_column(self, name: str, application: WindowFunctionApplication) -> Self` | Add or replace one projected column using a placed window function. | +| `with_window_column` | `def with_window_column(self, name: str, application: WindowFunctionApplication) -> Self` | Add or replace one projected column using a named window function. | | `order_by` | `def order_by(self, columns: list[ColumnExpr]) -> Self` | Sort rows by scalar expressions or ordering helpers such as `asc(...)` and `desc(...)`. | | `limit` | `def limit(self, n: int) -> Self` | Cap row count. | +`SameCarrier[U]` means `DataFrame[U]` for `DataFrame[T]`, `LazyFrame[U]` for `LazyFrame[T]`, and `DataStream[U]` +for `DataStream[T]`. The root `DataSet[T]` trait remains the common plan/schema contract; schema-changing +projection is expressed on concrete carriers until Incan grows native trait type-family support. + ## `with_column` ### Signature @@ -67,7 +71,7 @@ def enrich(orders: LazyFrame[Order]) -> LazyFrame[Order]: ## Capability notes - `join(...)` is constrained to same-carrier inputs and the boolean join predicate surface shown in the signature. -- `select(...)` preserves projection shape; explicit projection lists are represented today through `with_column(...)` and scalar-expression builders. +- `select(...)` is the schema-changing projection boundary used by query blocks. Identity `select()` preserves the current row model through its surrounding expected type, while explicit assignments can retarget to a new row model. - `generate(...)` preserves all input columns and appends generated output aliases for `explode`, `explode_outer`, `posexplode`, `posexplode_outer`, `inline`, `inline_outer`, `flatten`, and `stack` generator applications. Alias collisions are rejected during planning/lowering. - `with_window_column(...)` supports placed ranking, distribution, offset, value, and aggregate-over-window helpers over explicit window specs. 
Portable helpers lower through Substrait window relations and execute through the DataFusion session adapter. - `DataFrame[T]` exposes materialized metadata and preview text; row-level accessors belong to the materialized DataFrame API surface. diff --git a/docs/language/reference/query_blocks.md b/docs/language/reference/query_blocks.md new file mode 100644 index 0000000..06c6157 --- /dev/null +++ b/docs/language/reference/query_blocks.md @@ -0,0 +1,59 @@ +# Query blocks (Reference) + +Query blocks are dependency-activated InQL expressions. Import `pub::inql` to make the vocabulary and helper surface +available in a downstream Incan package. + +```incan +from pub::inql import DataFrame, count, desc, sum +from models import Order, OrderSummary + +def summarize_orders(orders: DataFrame[Order]) -> DataFrame[OrderSummary]: + return query { + FROM orders + GROUP BY .customer_id + SELECT + .customer_id as customer_id, + sum(.amount) as total, + count() as order_count, + ORDER BY desc(.total) + LIMIT 10 + } +``` + +InQL also accepts the colon spelling in expression position: + +```incan +selected = query: + FROM orders + SELECT: + .customer_id as customer_id + .amount as amount +``` + +## Clauses + +The implemented v0.1 query-block surface supports: + +- `FROM ` +- `WHERE ` before or after `SELECT` +- `GROUP BY , ...` +- `SELECT as , ...` +- `SELECT DISTINCT as , ...` +- `ORDER BY , ...` +- `LIMIT ` +- `JOIN ON ` +- `LEFT JOIN ON ` +- `EXPLODE as ` +- `WINDOW BY = ` + +`ORDER BY` uses InQL ordering helpers such as `asc(...)` and `desc(...)`; postfix SQL spellings such as +`.amount DESC` are not part of the v0.1 query-block grammar. + +## Resolution + +- `.column` refers to the primary `FROM` relation or the current query schema after a projection boundary. +- `relation.column` refers to an explicitly joined relation. +- `SELECT` aliases become the output schema for later clauses. +- A `SELECT` alias may be reused by later expressions in the same `SELECT` list. 
+ +Query blocks lower into the same Dataset, Prism, Substrait, and Session adapter path as equivalent method-chain code. diff --git a/docs/release_notes/v0_1.md b/docs/release_notes/v0_1.md index c22a47e..1fef3fd 100644 --- a/docs/release_notes/v0_1.md +++ b/docs/release_notes/v0_1.md @@ -9,7 +9,10 @@ Entries will be filled in as work lands (link RFCs and PRs when applicable). - **Language:** Foundational InQL syntax and semantics (naming, query schema, layer boundaries). - **Carriers:** `DataSet[T]` hierarchy including bounded vs unbounded traits and concrete frame/stream types. - **Plans:** Apache Substrait as the logical interchange contract. -- **Authoring:** method-chain lowering into a real Substrait boundary today, with `query {}` work still ahead. +- **Authoring:** method-chain lowering and RFC 003 `query {}` blocks share the same InQL logical planning path and + Substrait boundary. Query blocks support the brace spelling and expression-position `query:` spelling, including + SELECT aliases, lateral alias reuse, grouped aggregates, `SELECT DISTINCT`, post-SELECT filters, ordering, limits, + inner and left joins, generator clauses, and named window expressions. - **Aggregates:** builder-based `col`, `sum`, `count`, `count_expr`, `count_distinct`, `count_if`, `avg`, `min`, and `max` helpers now lower grouped and global aggregates through Prism, Substrait, and Session execution. `count()` counts rows, `count(expr)` counts non-null expression values, `count_expr(expr)` remains a compatibility spelling, and the first aggregate modifier slice supports `DISTINCT` plus aggregate-local `FILTER` where valid. - **Scalar expressions:** RFC 012 unifies filter predicates, computed projection values, grouping keys, and aggregate inputs around one `ColumnExpr` surface with canonical `lit(...)` and typed literal helpers. 
- **Core scalar functions:** RFC 015 adds registry-backed scalar function applications and the first core helper slice for casts, comparisons, boolean logic, null/NaN predicates, arithmetic, conditionals, membership/range predicates, and ordering expressions. Primitive cast targets can use source-level type tokens such as `cast(col("amount_text"), float)`, while explicit string target spellings remain available for compatibility aliases such as `int64` and `float64`. Implemented helpers lower to Substrait IR through registry metadata, built-in Rex shapes, or structural sort-field lowering; DataFusion remains the first execution adapter rather than the semantic boundary. diff --git a/docs/rfcs/003_inql_query_blocks.md b/docs/rfcs/003_inql_query_blocks.md index 40586f4..9928d89 100644 --- a/docs/rfcs/003_inql_query_blocks.md +++ b/docs/rfcs/003_inql_query_blocks.md @@ -1,6 +1,6 @@ # InQL RFC 003: `query {}` blocks — syntax, typing, Substrait -- **Status:** Planned +- **Status:** Implemented - **Created:** 2026-03-22 - **Author(s):** Danny Meijer - **Related:** @@ -9,12 +9,12 @@ - InQL RFC 002 (Apache Substrait — **normative `Rel`-level contract** for lowering) - **Issue:** [InQL #4](https://github.com/dannys-code-corner/InQL/issues/4) - **RFC PR:** - -- **Written against:** Incan v0.2 -- **Shipped in:** - +- **Written against:** Incan v0.3 +- **Shipped in:** InQL v0.1 ## Summary -This RFC specifies the **`query { ... }`** expression: grammar, typechecking (including clause-level use of `.column`, `relation.column`, bare identifiers, and aggregate rules), vocabulary activation for the `query` keyword (InQL package as dependency), and lowering to Apache Substrait. Naming-form semantics and current query schema are defined in InQL RFC 000; this RFC **must** remain consistent with that document. 
It depends on InQL RFC 001: `FROM` sources **must** conform to InQL RFC 001's `DataSet[T]` trait (`DataFrame[T]`, `LazyFrame[T]`, or `DataStream[T]`) so that `T` supplies fields for resolution. InQL RFC 002 owns the Substrait `Rel` and expression contract, mapping catalog, and read vs binding boundaries; this RFC **must** conform to InQL RFC 002 for serialized plan semantics. `SELECT DISTINCT` is part of the minimum clause surface defined here. +This RFC specifies the **`query { ... }`** expression: grammar, typechecking (including clause-level use of `.column`, `relation.column`, bare identifiers, and aggregate rules), vocabulary activation for the `query` keyword (InQL package as dependency), and lowering to Apache Substrait. InQL also accepts the expression-position colon spelling `query:` for consistency with Incan vocabulary declarations. Naming-form semantics and current query schema are defined in InQL RFC 000; this RFC **must** remain consistent with that document. It depends on InQL RFC 001: `FROM` sources **must** conform to InQL RFC 001's `DataSet[T]` trait (`DataFrame[T]`, `LazyFrame[T]`, or `DataStream[T]`) so that `T` supplies fields for resolution. InQL RFC 002 owns the Substrait `Rel` and expression contract, mapping catalog, and read vs binding boundaries; this RFC **must** conform to InQL RFC 002 for serialized plan semantics. `SELECT DISTINCT` is part of the minimum clause surface defined here. 
## Motivation @@ -39,8 +39,7 @@ A SQL-familiar surface inside Incan improves readability and enables compile-tim ## Guide-level explanation ```incan -from pub::inql import DataFrame -from pub::inql.functions import count, sum, avg +from pub::inql import DataFrame, avg, count, desc, sum from models import Order, OrderSummary def summarize_orders(orders: DataFrame[Order]) -> DataFrame[OrderSummary]: @@ -49,11 +48,11 @@ def summarize_orders(orders: DataFrame[Order]) -> DataFrame[OrderSummary]: WHERE .status == "completed" GROUP BY .region SELECT - region, + .region as region, count() as order_count, sum(.amount) as total_revenue, avg(.amount) as avg_order_value, - ORDER BY total_revenue DESC + ORDER BY desc(.total_revenue) } ``` @@ -64,8 +63,9 @@ The compiler checks `.status`, `.amount`, `GROUP BY` / `SELECT` consistency, and ### Packaging and activation - Projects that depend on InQL **must** obtain `query` through library-driven vocabulary activation in the host compiler. -- A compilation unit with InQL active **must** parse `query { ... }` as specified here. -- Aggregate helpers such as `count`, `sum`, `avg`, `min`, and `max` are library symbols, instead of ambient builtins. Examples in this RFC import them from `pub::inql.functions`; implementations **must** provide an equivalent importable surface for aggregate functions used in relational expressions. +- A compilation unit with InQL active **must** parse `query { ... }` as specified here. It **may** also accept + expression-position `query:` as an equivalent spelling. +- Aggregate helpers such as `count`, `sum`, `avg`, `min`, and `max` are library symbols, instead of ambient builtins. Examples in this RFC import them from the `pub::inql` facade; implementations **must** provide an equivalent importable surface for aggregate functions used in relational expressions. 
### `FROM` and relation to InQL RFC 001 @@ -95,8 +95,8 @@ Inside relational expression positions (`WHERE`, `JOIN ON`, `GROUP BY`, `ORDER B ### Aggregates - Under `GROUP BY`, `SELECT` references **must** be grouped or aggregated; illegal mixing **must** error. -- Aggregate function calls in relational expressions **must** resolve through imported library symbols (for example `from pub::inql.functions import count, sum, avg`). The compiler **must not** treat `count`, `sum`, `avg`, `min`, or `max` as implicitly in scope ambient names. -- This RFC defines the minimum required aggregate-function surface and import model for `query {}`; it is not an exhaustive catalog of all present or future InQL functions. Additional functions **may** be added later through additive library evolution or follow-up RFCs, provided they do not change the semantics of the required set defined here. +- Aggregate function calls in relational expressions **must** resolve through imported library symbols (for example `from pub::inql import count, sum, avg`). The compiler **must not** treat `count`, `sum`, `avg`, `min`, or `max` as implicitly in scope ambient names. +- This RFC defines the minimum required aggregate-function surface and import model for `query {}`; it is not an exhaustive catalog of all InQL functions. Additional functions require additive library evolution or follow-up RFCs that do not change the semantics of the required set defined here. - `SELECT DISTINCT` **must** be supported as a projection modifier in the minimum `query {}` surface. It removes duplicate rows from the projected schema and lowers using the distinct-row contract defined by InQL RFC 002. ### Clause inventory (minimum) @@ -105,8 +105,8 @@ This RFC **must** require at least: - `FROM`, `WHERE`, `SELECT`, `GROUP BY`, `ORDER BY`, `LIMIT` - inner `JOIN ... ON`, `LEFT JOIN ... 
ON` -- `EXPLODE` (or equivalent) for `List` fields -- `WINDOW BY` (or equivalent) for ranked/windowed forms in scope +- `EXPLODE as ` for list-valued expressions +- `WINDOW BY = ` for ranked/windowed forms in scope Post-`SELECT` filters on the projected schema use `WHERE` again (a `WHERE` clause ordered after `SELECT` in the block). `HAVING` is **not** InQL syntax and **must not** be introduced. @@ -155,9 +155,11 @@ Post-`SELECT` filters on the projected schema use `WHERE` again (a `WHERE` claus - **Return type inference**: `query {}` infers the output schema from the `SELECT` list. The result preserves the collection kind of the `FROM` source: a `query {}` over a `DataStream` yields a `DataStream`; over a `LazyFrame` yields a `LazyFrame`; over a `DataFrame` yields a `DataFrame`. Explicit type annotation at the call site is optional but recommended for documentation. - **Post-`SELECT` clause ordering**: the canonical clause order is `FROM` → `JOIN` → `WHERE` → `GROUP BY` → `SELECT` → `WHERE` (post-`SELECT` filter) → `ORDER BY` → `LIMIT`. `HAVING` is not InQL syntax (InQL RFC 000). Exact diagnostic wording for ordering violations is an implementation detail. - **Aggregate minimum set**: the initial implementation requires at least `count`, `sum`, `avg`, `min`, `max`. Window functions (`WINDOW BY` and ranked expressions) are part of the clause inventory but their detailed builtin set evolves during implementation. `DataStream` source restrictions follow InQL RFC 001's static capability gating: operations requiring unbounded state are statically rejected. -- **Aggregate function scope:** the minimum aggregate set is exposed through an importable InQL functions module (examples use `pub::inql.functions`). These names are ordinary imported symbols that gain aggregate meaning in aggregate-capable relational positions; they are not special ambient builtins. -- **`IN` clause**: deferred to a future amendment. 
The initial implementation does not require `IN` as a clause operator. If added later, the RHS **must** conform to `DataSet[T]` with a compatible schema. +- **Aggregate function scope:** the minimum aggregate set is exposed through an importable InQL facade (examples use `pub::inql`). These names are ordinary imported symbols that gain aggregate meaning in aggregate-capable relational positions; they are not special ambient builtins. +- **`IN` clause**: not part of the RFC003 clause grammar. A separate RFC is required before introducing `IN` as a query-block clause operator, and its RHS contract must conform to `DataSet[T]` with a compatible schema. - **Substrait version and mapping catalog**: InQL RFC 002 owns pinning policy, the north-star operator → `Rel` catalog, and extension URI requirements; the exact revision shipped with a toolchain is documented in release artifacts alongside the implementation. - **Alternate surfaces**: pipe-forward is InQL RFC 005; method chains are InQL RFC 001. This RFC does not mandate alternative surfaces in the initial implementation. -- **Minimum join surface:** the required v0.1 clause inventory includes `JOIN ... ON` (inner join) and `LEFT JOIN ... ON`. `RIGHT` and `FULL OUTER` joins are not part of the required minimum and **may** be added later as additive extensions. +- **Minimum join surface:** the required v0.1 clause inventory includes `JOIN ... ON` (inner join) and `LEFT JOIN ... ON`. `RIGHT` and `FULL OUTER` joins are not part of the required RFC003 minimum; adding them requires an additive extension that preserves the current join semantics. - **`SELECT DISTINCT`:** `query {}` **must** support `SELECT DISTINCT` in the minimum clause surface. It is the canonical clause-level spelling for duplicate elimination in this surface; method-chain APIs may expose equivalent operations, but they do not replace the query-surface keyword. 
+- **Ordering syntax:** the implemented v0.1 query surface uses ordering helpers such as `asc(.amount)` and + `desc(.amount)` rather than postfix SQL tokens such as `.amount DESC`. diff --git a/docs/rfcs/013_function_catalog_program.md b/docs/rfcs/013_function_catalog_program.md index 92a19ca..a6733f4 100644 --- a/docs/rfcs/013_function_catalog_program.md +++ b/docs/rfcs/013_function_catalog_program.md @@ -1,6 +1,6 @@ # InQL RFC 013: Function catalog program -- **Status:** In Progress +- **Status:** Implemented - **Created:** 2026-04-27 - **Author(s):** Danny Meijer (@dannymeijer) - **Related:** @@ -21,7 +21,7 @@ - **Issue:** [InQL #30](https://github.com/dannys-code-corner/InQL/issues/30) - **RFC PR:** — - **Written against:** Incan v0.2 -- **Shipped in:** — +- **Shipped in:** InQL v0.1 ## Summary @@ -141,3 +141,4 @@ Existing helpers may remain while the child RFCs migrate them into the registry- - **Child RFC scope:** the current child RFC set is the scope of the function catalog program. InQL RFC 014 through InQL RFC 026 are required children unless this umbrella RFC is later amended or superseded. - **Implemented status:** this umbrella RFC may be marked Implemented only when all required child RFCs through InQL RFC 026 are implemented, rejected, or superseded by explicit design decision. Extension, sketch, typed sketch value, and semi-structured value families are part of the umbrella scope, not optional follow-on scope. +- **Closeout:** the required child RFC set is now resolved. InQL RFC 014 through InQL RFC 026 are implemented, and the shared catalog surface also includes typed value-or-column inputs plus schema-aware scalar input validation as cross-cutting support grounded in InQL RFC 000, InQL RFC 012, and InQL RFC 014. No additional child RFC is required for that support because it tightens the existing registry and query-schema contracts rather than adding a new function family. 
diff --git a/docs/rfcs/README.md b/docs/rfcs/README.md index 9f177dc..e11824e 100644 --- a/docs/rfcs/README.md +++ b/docs/rfcs/README.md @@ -9,7 +9,7 @@ InQL uses its **own** RFC series (starting at 000), independent of the [Incan la | [000][rfc-000] | Planned | Language specification — core model, naming, schema shapes, layer boundaries | | | [001][rfc-001] | In Progress | Dataset types and carriers (`DataSet[T]`, `BoundedDataSet[T]`, `UnboundedDataSet[T]`) | | | [002][rfc-002] | In Progress | Apache Substrait — `Rel`-level contract, mapping catalog, binding boundaries | | -| [003][rfc-003] | Planned | `query {}` blocks — grammar, typing, Substrait lowering | | +| [003][rfc-003] | Implemented | `query {}` blocks — grammar, typing, Substrait lowering | | | [004][rfc-004] | In Progress | Execution context — session, DataFusion, read/transform/write | | | [005][rfc-005] | Blocked | Pipe-forward relational syntax (`\|>`) — optional surface | | | [006][rfc-006] | Blocked | Promote unnest/explode to core Substrait lowering — blocked on upstream Substrait standardization | | @@ -19,7 +19,7 @@ InQL uses its **own** RFC series (starting at 000), independent of the [Incan la | [010][rfc-010] | Draft | CSV dialect and interpretation contract | | | [011][rfc-011] | Draft | Source discovery and parse-unit expansion | | | [012][rfc-012] | Implemented | Unified scalar expression surface | | -| [013][rfc-013] | In Progress | Function catalog program | | +| [013][rfc-013] | Implemented | Function catalog program | | | [014][rfc-014] | Implemented | Function registry and catalog governance | | | [015][rfc-015] | Implemented | Core scalar functions and operators | | | [016][rfc-016] | Implemented | Core aggregate functions | | diff --git a/examples/README.md b/examples/README.md index f2e7291..15860ac 100644 --- a/examples/README.md +++ b/examples/README.md @@ -59,5 +59,5 @@ These RFCs provide the trait and interop foundation InQL builds on. 
## Scope boundaries - **Materialized row access** — Session collection exposes typed `DataFrame[T]` materialization metadata and preview text; row iteration/accessor design belongs to the DataFrame API work. -- **Output row retargeting** — projection and aggregate methods currently preserve the carrier type parameter while planned columns expose shape changes. +- **Output row retargeting** — `select[U](...)` preserves the carrier kind while allowing the projected row model to change. - **Convenience authoring** — these examples use the explicit builder surface that concise query and operator surfaces lower into. diff --git a/examples/dataset_api.incn b/examples/dataset_api.incn index 9e5546d..834f91e 100644 --- a/examples/dataset_api.incn +++ b/examples/dataset_api.incn @@ -9,7 +9,7 @@ describe portable relational intent. - builder-based grouping and aggregation via `col(...)`, `sum(...)`, and `count()` """ -from functions import add, col, count, eq, gt, lit, mul, sum +from functions import add, always_true, col, count, eq, gt, lit, mul, sum from dataset import DataSet, BoundedDataSet, DataFrame, LazyFrame from models import AggregateOrder, Customer, Order, OrderLine @@ -31,7 +31,7 @@ def customer_dimension(customers: LazyFrame[Customer]) -> LazyFrame[Customer]: def join_orders_with_revision(orders: LazyFrame[Order], revised_orders: LazyFrame[Order]) -> LazyFrame[Order]: - return orders.join(revised_orders, true) + return orders.join(revised_orders, always_true()) def customer_dimension_preview(customers: LazyFrame[Customer]) -> LazyFrame[Customer]: diff --git a/incan.toml b/incan.toml index d0a33f6..6c3091e 100644 --- a/incan.toml +++ b/incan.toml @@ -15,3 +15,6 @@ datafusion_expr = { package = "datafusion-expr", version = "53" } datafusion-substrait = { version = "53", features = ["protoc"] } crc32fast = "1.5" url = "2.5" + +[vocab] +crate = "vocab_companion" diff --git a/scripts/smoke_pub_consumer.sh b/scripts/smoke_pub_consumer.sh index b7b275b..00cb3cf 100644 
--- a/scripts/smoke_pub_consumer.sh +++ b/scripts/smoke_pub_consumer.sh @@ -40,7 +40,296 @@ def main() -> Result[None, SessionError]: return Ok(None) EOF +cat > "$PROJECT_DIR/src/query_blocks_smoke.incn" < LazyFrame[AggregateOrder]: + """Load the shared aggregate-order fixture.""" + return assert_is_ok( + session.read_csv(table_name, AGGREGATE_ORDERS_CSV_FIXTURE), + "aggregate orders fixture should load", + ) + + +def _collect_or_fail[T with Clone](mut session: Session, frame: LazyFrame[T]) -> DataFrame[T]: + """Collect a query-block frame or fail with the backend diagnostic.""" + match session.collect(frame): + Ok(df) => return df + Err(err) => return fail_t(err.error_message()) + + +def _preview_line_contains_all(line: str, expected_cells: list[str]) -> bool: + """Return whether one rendered preview row contains every expected cell value.""" + for cell in expected_cells: + if not line.contains(cell): + return false + return true + + +def _assert_preview_row_contains(payload: str, expected_cells: list[str], context: str) -> None: + """Assert one rendered preview row carries the expected materialized cells together.""" + for line in payload.split("\n"): + if _preview_line_contains_all(line, expected_cells): + return + return fail_t(context) + + +def _brace_select_aliases_and_lateral_aliases_materialize() -> None: + # -- Arrange -- + mut session = Session.default() + orders = _orders(session, "query_orders_select") + + # -- Act -- + selected: LazyFrame[SelectedOrder] = query { + FROM orders + SELECT + .customer_id as customer, + .amount + 5 as adjusted, + adjusted * 2 as doubled, + } + df = _collect_or_fail(session, selected) + payload = df.preview_text() + + # -- Assert -- + assert df.row_count() == 3, "query SELECT should preserve source row count" + assert df.resolved_columns() == ["customer", "adjusted", "doubled"], "query SELECT should publish requested aliases" + _assert_preview_row_contains(payload, ["A", "15", "30"], "SELECT should materialize computed 
aliases") + _assert_preview_row_contains(payload, ["B", "12", "24"], "SELECT should materialize later rows") + + +def _grouped_aggregate_select_materializes_group_and_measures() -> None: + # -- Arrange -- + mut session = Session.default() + orders = _orders(session, "query_orders_grouped") + + # -- Act -- + grouped: LazyFrame[CustomerRollup] = query { + FROM orders + GROUP BY .customer_id + SELECT + .customer_id as customer, + sum(.amount) as total, + count() as order_count, + } + df = _collect_or_fail(session, grouped) + payload = df.preview_text() + + # -- Assert -- + assert df.row_count() == 2, "query GROUP BY should produce one row per group" + assert df.resolved_columns() == ["customer", "total", "order_count"], "query aggregate SELECT should preserve SELECT order and aliases" + _assert_preview_row_contains(payload, ["A", "25", "2"], "customer A grouped values should materialize") + _assert_preview_row_contains(payload, ["B", "7", "1"], "customer B grouped values should materialize") + + +def _select_distinct_materializes_unique_rows() -> None: + # -- Arrange -- + mut session = Session.default() + orders = _orders(session, "query_orders_distinct") + + # -- Act -- + distinct_customers: LazyFrame[CustomerOnly] = query { + FROM orders + SELECT DISTINCT + .customer_id as customer, + } + df = _collect_or_fail(session, distinct_customers) + payload = df.preview_text() + + # -- Assert -- + assert df.row_count() == 2, "SELECT DISTINCT should collapse duplicate projection rows" + assert df.resolved_columns() == ["customer"], "SELECT DISTINCT should preserve selected output columns" + _assert_preview_row_contains(payload, ["A"], "SELECT DISTINCT should include customer A") + _assert_preview_row_contains(payload, ["B"], "SELECT DISTINCT should include customer B") + + +def _post_select_where_order_and_limit_materialize() -> None: + # -- Arrange -- + mut session = Session.default() + orders = _orders(session, "query_orders_order_limit") + + # -- Act -- + selected: 
LazyFrame[CustomerAmount] = query { + FROM orders + SELECT + .customer_id as customer, + .amount as amount, + WHERE .amount > 10 + ORDER BY desc(.amount) + LIMIT 1 + } + df = _collect_or_fail(session, selected) + payload = df.preview_text() + + # -- Assert -- + assert df.row_count() == 1, "post-SELECT WHERE plus LIMIT should leave one row" + assert df.resolved_columns() == ["customer", "amount"], "post-SELECT filtering should keep projected columns" + _assert_preview_row_contains(payload, ["A", "15"], "ORDER BY DESC should keep the highest filtered amount") + + +def _join_and_left_join_materialize() -> None: + # -- Arrange -- + mut session = Session.default() + orders = _orders(session, "query_orders_join_left") + discounts = _orders(session, "query_orders_join_right") + missing = _orders(session, "query_orders_left_missing").filter(eq(col("customer_id"), lit("Z"))) + + # -- Act -- + joined: LazyFrame[JoinedAmount] = query { + FROM orders + JOIN discounts + ON .customer_id == discounts.customer_id + SELECT + .customer_id as customer, + discounts.amount as joined_amount, + LIMIT 1 + } + left_joined: LazyFrame[LeftMatchedAmount] = query { + FROM orders + LEFT JOIN missing + ON .customer_id == missing.customer_id + SELECT + .customer_id as customer, + missing.amount as matched_amount, + } + joined_df = _collect_or_fail(session, joined) + left_df = _collect_or_fail(session, left_joined) + + # -- Assert -- + assert joined_df.row_count() == 1, "JOIN query block should materialize joined rows" + assert joined_df.resolved_columns() == ["customer", "joined_amount"], "JOIN SELECT should publish requested aliases" + assert left_df.row_count() == 3, "LEFT JOIN query block should preserve unmatched left rows" + assert left_df.resolved_columns() == ["customer", "matched_amount"], "LEFT JOIN SELECT should publish requested aliases" + + +def _explode_and_window_by_materialize() -> None: + # -- Arrange -- + mut session = Session.default() + orders = _orders(session, 
"query_orders_explode_window") + enriched = orders.with_column("tags", array([lit("paid"), col("customer_id")])) + + # -- Act -- + windowed: LazyFrame[ExplodedWindowOrder] = query { + FROM enriched + EXPLODE .tags as tag + WINDOW BY row_num = row_number().over(window().partition_by([.customer_id]).order_by([desc(.amount)])) + SELECT + .customer_id as customer, + .amount as amount, + .tag as tag, + .row_num as row_num, + ORDER BY desc(.amount) + LIMIT 2 + } + df = _collect_or_fail(session, windowed) + payload = df.preview_text() + + # -- Assert -- + assert df.row_count() == 2, "EXPLODE plus WINDOW BY should materialize generated rows" + assert df.resolved_columns() == ["customer", "amount", "tag", "row_num"], "EXPLODE/WINDOW SELECT should publish requested aliases" + _assert_preview_row_contains(payload, ["A", "paid", "1"], "WINDOW BY should rank generated rows") + + +def _colon_spelling_materializes_same_select_surface() -> None: + # -- Arrange -- + mut session = Session.default() + orders = _orders(session, "query_orders_colon_select") + + # -- Act -- + selected: LazyFrame[CustomerAmount] = query: + FROM orders + SELECT: + .customer_id as customer + .amount as amount + df = _collect_or_fail(session, selected) + payload = df.preview_text() + + # -- Assert -- + assert df.row_count() == 3, "query: SELECT should preserve source row count" + assert df.resolved_columns() == ["customer", "amount"], "query: SELECT should publish requested aliases" + _assert_preview_row_contains(payload, ["A", "10"], "query: SELECT should materialize the first row") + _assert_preview_row_contains(payload, ["B", "7"], "query: SELECT should materialize later rows") + + +def main() -> None: + println("query smoke: select") + _brace_select_aliases_and_lateral_aliases_materialize() + println("query smoke: aggregate") + _grouped_aggregate_select_materializes_group_and_measures() + println("query smoke: distinct") + _select_distinct_materializes_unique_rows() + println("query smoke: post-select") 
+ _post_select_where_order_and_limit_materialize() + println("query smoke: joins") + _join_and_left_join_materialize() + println("query smoke: explode-window") + _explode_and_window_by_materialize() + println("query smoke: colon") + _colon_spelling_materializes_same_select_surface() +EOF + (cd "$PROJECT_DIR" && "$INCAN_BIN" lock >/dev/null) (cd "$PROJECT_DIR" && "$INCAN_BIN" --check src/main.incn >/dev/null) +(cd "$PROJECT_DIR" && "$INCAN_BIN" --check src/query_blocks_smoke.incn >/dev/null) +(cd "$PROJECT_DIR" && "$INCAN_BIN" run src/query_blocks_smoke.incn) echo "✓ pub consumer smoke check passed" diff --git a/src/aggregate_builders.incn b/src/aggregate_builders.incn index 35cbb12..c810112 100644 --- a/src/aggregate_builders.incn +++ b/src/aggregate_builders.incn @@ -54,6 +54,8 @@ pub model AggregateMeasure: pub has_filter: bool pub ordering: list[ColumnExpr] pub sketch_type: Option[SketchLogicalType] + pub output_name: str + pub has_output_name: bool def distinct(self) -> Self: """Return this aggregate measure with `DISTINCT` input semantics.""" @@ -69,6 +71,8 @@ pub model AggregateMeasure: has_filter=self.has_filter, ordering=self.ordering, sketch_type=self.sketch_type, + output_name=self.output_name, + has_output_name=self.has_output_name, ) def filter(self, predicate: ColumnExpr) -> Self: @@ -85,6 +89,8 @@ pub model AggregateMeasure: has_filter=true, ordering=self.ordering, sketch_type=self.sketch_type, + output_name=self.output_name, + has_output_name=self.has_output_name, ) def order_by(self, ordering: list[ColumnExpr]) -> Self: @@ -101,6 +107,8 @@ pub model AggregateMeasure: has_filter=self.has_filter, ordering=ordering, sketch_type=self.sketch_type, + output_name=self.output_name, + has_output_name=self.has_output_name, ) def over(self, spec: WindowSpec) -> WindowFunctionApplication: @@ -133,6 +141,8 @@ pub def sum(expr: ColumnExpr) -> AggregateMeasure: has_filter=false, ordering=[], sketch_type=None, + output_name="", + has_output_name=false, ) @@ -151,6 
+161,8 @@ pub def count(expr: Option[ColumnExpr] = None) -> AggregateMeasure: has_filter=false, ordering=[], sketch_type=None, + output_name="", + has_output_name=false, ) return AggregateMeasure( kind=AggregateKind.Count, @@ -164,6 +176,8 @@ pub def count(expr: Option[ColumnExpr] = None) -> AggregateMeasure: has_filter=false, ordering=[], sketch_type=None, + output_name="", + has_output_name=false, ) @@ -186,6 +200,8 @@ pub def avg(expr: ColumnExpr) -> AggregateMeasure: has_filter=false, ordering=[], sketch_type=None, + output_name="", + has_output_name=false, ) @@ -203,6 +219,8 @@ pub def min(expr: ColumnExpr) -> AggregateMeasure: has_filter=false, ordering=[], sketch_type=None, + output_name="", + has_output_name=false, ) @@ -220,6 +238,8 @@ pub def max(expr: ColumnExpr) -> AggregateMeasure: has_filter=false, ordering=[], sketch_type=None, + output_name="", + has_output_name=false, ) @@ -237,6 +257,8 @@ pub def approx_count_distinct(expr: ColumnExpr) -> AggregateMeasure: has_filter=false, ordering=[], sketch_type=None, + output_name="", + has_output_name=false, ) @@ -258,6 +280,8 @@ pub def approx_percentile(expr: ColumnExpr, percentile: float, accuracy: int = 1 has_filter=false, ordering=[], sketch_type=None, + output_name="", + has_output_name=false, ) @@ -281,6 +305,8 @@ pub def hll_sketch( has_filter=false, ordering=[], sketch_type=Some(sketch_type), + output_name="", + has_output_name=false, ) @@ -299,11 +325,34 @@ pub def hll_merge(sketch: SketchExpr) -> AggregateMeasure: has_filter=false, ordering=[], sketch_type=Some(checked.sketch_type), + output_name="", + has_output_name=false, + ) + + +pub def aggregate_as(measure: AggregateMeasure, output_name: str) -> AggregateMeasure: + """Return this aggregate measure with an explicit logical output column name.""" + return AggregateMeasure( + kind=measure.kind, + function_ref=measure.function_ref, + canonical_name=measure.canonical_name, + expr=measure.expr, + arguments=measure.arguments, + 
has_expr=measure.has_expr, + is_distinct=measure.is_distinct, + filter_expr=measure.filter_expr, + has_filter=measure.has_filter, + ordering=measure.ordering, + sketch_type=measure.sketch_type, + output_name=output_name, + has_output_name=true, ) pub def aggregate_measure_output_name(measure: AggregateMeasure) -> str: """Return the logical output column name for one aggregate measure.""" + if measure.has_output_name: + return measure.output_name mut output_name = measure.kind.value() if measure.is_distinct and measure.has_expr: output_name = output_name + "_distinct_" + scalar_expr_output_name(measure.expr, "expr") diff --git a/src/dataset/mod.incn b/src/dataset/mod.incn index 8c58459..6d7bada 100644 --- a/src/dataset/mod.incn +++ b/src/dataset/mod.incn @@ -56,7 +56,13 @@ See also: from rust::substrait::proto import Plan, Rel from aggregate_builders import AggregateMeasure from generator_builders import GeneratorApplication -from projection_builders import ColumnExpr +from projection_builders import ( + ColumnExpr, + ProjectionAssignment, + project_output_columns, + select_project_output_columns, + with_column_assignment, +) from window_builders import WindowFunctionApplication from dataset.materialization import DataFrameMaterialization from schema_registry import named_table_columns @@ -68,9 +74,11 @@ from dataset.ops import ( filter_ds_of_columns, generate_ds_of_columns, group_by_ds_of_columns, - join_ds, + join_ds_of_columns, + left_join_ds_of_columns, limit_ds, order_by_ds_of_columns, + select_project_ds_of_columns, select_ds_of_columns, with_column_ds, with_window_column_ds, @@ -82,6 +90,7 @@ from prism import ( prism_cursor_apply_filter, prism_cursor_apply_generate, prism_cursor_apply_group_by, + prism_cursor_apply_left_join, prism_cursor_apply_join, prism_cursor_apply_limit, prism_cursor_apply_order_by, @@ -99,8 +108,8 @@ pub trait DataSet[T with Clone]: def try_to_substrait_plan(self) -> Result[Plan, SubstraitLoweringError] def filter(self, predicate: 
ColumnExpr) -> Self # TODO(InQL RFC 004): Replace the Self-only join surface with heterogeneous join typing and an explicit output-schema contract. - def join(self, other: Self, on: bool) -> Self - def select(self) -> Self + def join(self, other: Self, on: ColumnExpr, relation_name: str = "") -> Self + def left_join(self, other: Self, on: ColumnExpr, relation_name: str = "") -> Self def with_column(self, name: str, expr: ColumnExpr) -> Self def group_by(self, columns: list[ColumnExpr]) -> Self def agg(self, measures: list[AggregateMeasure]) -> Self @@ -132,6 +141,9 @@ pub class DataFrame[T with Clone] with BoundedDataSet: pub _type_witness: list[T] pub _materialization: DataFrameMaterialization pub _substrait_rel: Rel + # A bare Substrait Rel does not carry root field names; the Plan root does. Rel-backed carriers keep the current + # plan-time names here so SELECT aliases survive across further transforms and plan rebuilding. + pub _planned_columns: list[str] def preview_text(self) -> str: """Return the human-readable preview text captured during collection, if available.""" @@ -155,6 +167,8 @@ pub class DataFrame[T with Clone] with BoundedDataSet: def planned_columns(self) -> list[str]: """Return logical output columns inferred from the current relation tree before execution.""" + if len(self._planned_columns) > 0: + return [name for name in self._planned_columns] return relation_output_columns(self._substrait_rel.clone()) def columns(self) -> list[str]: @@ -162,7 +176,10 @@ pub class DataFrame[T with Clone] with BoundedDataSet: resolved = self.resolved_columns() if len(resolved) > 0: return resolved - return self.planned_columns() + planned = self.planned_columns() + if len(planned) > 0: + return planned + return self.declared_columns() def schema(self) -> CarrierSchema: """Return one combined schema view containing declared, planned, and resolved columns.""" @@ -182,59 +199,92 @@ pub class DataFrame[T with Clone] with BoundedDataSet: def filter(self, predicate: 
ColumnExpr) -> Self: """Return one new DataFrame with a filter stage appended and stale materialization cleared.""" + input_columns = self.columns() return _data_frame_with_invalidated_materialization( - filter_ds_of_columns(self._substrait_rel, self.planned_columns(), predicate), + filter_ds_of_columns(self._substrait_rel, input_columns, predicate), + input_columns, ) - def join(self, other: Self, on: bool) -> Self: + def join(self, other: Self, on: ColumnExpr, relation_name: str = "") -> Self: """Return one new DataFrame with a join stage and stale materialization cleared.""" - return _data_frame_with_invalidated_materialization(join_ds(self._substrait_rel, other._substrait_rel, on)) - - def select(self) -> Self: - """Return one new DataFrame with an identity projection stage and stale materialization cleared.""" - return _data_frame_with_invalidated_materialization( - select_ds_of_columns(self._substrait_rel, self.planned_columns()), + left_columns = self.columns() + right_columns = other.columns() + rel = join_ds_of_columns( + self._substrait_rel, + left_columns, + other._substrait_rel, + right_columns, + on, + relation_name, + ) + return _data_frame_with_invalidated_materialization(rel, relation_output_columns(rel.clone())) + + def left_join(self, other: Self, on: ColumnExpr, relation_name: str = "") -> Self: + """Return one new DataFrame with a left join stage and stale materialization cleared.""" + left_columns = self.columns() + right_columns = other.columns() + rel = left_join_ds_of_columns( + self._substrait_rel, + left_columns, + other._substrait_rel, + right_columns, + on, + relation_name, + ) + return _data_frame_with_invalidated_materialization(rel, relation_output_columns(rel.clone())) + + def select[U with Clone](self, assignments: list[ProjectionAssignment] = []) -> DataFrame[U]: + """Return one new DataFrame with a SELECT projection stage and stale materialization cleared.""" + input_columns = self.columns() + if len(assignments) == 0: + return 
_data_frame_with_invalidated_materialization[U]( + select_ds_of_columns(self._substrait_rel, input_columns), + input_columns, + ) + return _data_frame_with_invalidated_materialization[U]( + select_project_ds_of_columns(self._substrait_rel, input_columns, assignments), + select_project_output_columns(assignments), ) def with_column(self, name: str, expr: ColumnExpr) -> Self: """Return one new DataFrame with an add-or-replace projection stage and stale materialization cleared.""" + input_columns = self.columns() return _data_frame_with_invalidated_materialization( - with_column_ds(self._substrait_rel, self.planned_columns(), name, expr), + with_column_ds(self._substrait_rel, input_columns, name, expr), + project_output_columns(input_columns, [with_column_assignment(name, expr)]), ) def group_by(self, columns: list[ColumnExpr]) -> Self: """Return one new DataFrame with a grouping stage and stale materialization cleared.""" - return _data_frame_with_invalidated_materialization( - group_by_ds_of_columns(self._substrait_rel, self.planned_columns(), columns), - ) + rel = group_by_ds_of_columns(self._substrait_rel, self.columns(), columns) + return _data_frame_with_invalidated_materialization(rel, relation_output_columns(rel.clone())) def agg(self, measures: list[AggregateMeasure]) -> Self: """Return one new DataFrame with an aggregation stage and stale materialization cleared.""" - return _data_frame_with_invalidated_materialization( - agg_ds_of_columns(self._substrait_rel, self.planned_columns(), measures), - ) + rel = agg_ds_of_columns(self._substrait_rel, self.columns(), measures) + return _data_frame_with_invalidated_materialization(rel, relation_output_columns(rel.clone())) def generate(self, generator: GeneratorApplication) -> Self: """Return one new DataFrame with a generator stage and stale materialization cleared.""" - return _data_frame_with_invalidated_materialization( - generate_ds_of_columns(self._substrait_rel, self.planned_columns(), generator), - ) + rel = 
generate_ds_of_columns(self._substrait_rel, self.columns(), generator) + return _data_frame_with_invalidated_materialization(rel, relation_output_columns(rel.clone())) def with_window_column(self, name: str, application: WindowFunctionApplication) -> Self: """Return one new DataFrame with a named window projection stage and stale materialization cleared.""" - return _data_frame_with_invalidated_materialization( - with_window_column_ds(self._substrait_rel, self.planned_columns(), name, application), - ) + rel = with_window_column_ds(self._substrait_rel, self.columns(), name, application) + return _data_frame_with_invalidated_materialization(rel, relation_output_columns(rel.clone())) def order_by(self, columns: list[ColumnExpr]) -> Self: """Return one new DataFrame with an ordering stage and stale materialization cleared.""" + input_columns = self.columns() return _data_frame_with_invalidated_materialization( - order_by_ds_of_columns(self._substrait_rel, self.planned_columns(), columns), + order_by_ds_of_columns(self._substrait_rel, input_columns, columns), + input_columns, ) def limit(self, n: int) -> Self: """Return one new DataFrame with a row-limit stage and stale materialization cleared.""" - return _data_frame_with_invalidated_materialization(limit_ds(self._substrait_rel, n)) + return _data_frame_with_invalidated_materialization(limit_ds(self._substrait_rel, n), self.columns()) pub class LazyFrame[T with Clone] with BoundedDataSet: @@ -261,7 +311,10 @@ pub class LazyFrame[T with Clone] with BoundedDataSet: def columns(self) -> list[str]: """Return plan-time output columns for lazy carriers.""" - return self.planned_columns() + planned = self.planned_columns() + if len(planned) > 0: + return planned + return self.declared_columns() def schema(self) -> CarrierSchema: """Return lazy schema metadata (declared/planned columns only; resolved columns are unavailable pre-collect).""" @@ -283,13 +336,17 @@ pub class LazyFrame[T with Clone] with BoundedDataSet: """Return 
one new lazy carrier with an appended filter stage.""" return LazyFrame(_cursor=prism_cursor_apply_filter(self._cursor, predicate)) - def join(self, other: Self, on: bool) -> Self: + def join(self, other: Self, on: ColumnExpr, relation_name: str = "") -> Self: """Return one new lazy carrier with an appended join stage against another lazy carrier.""" - return LazyFrame(_cursor=prism_cursor_apply_join(self._cursor, other._cursor.clone(), on)) + return LazyFrame(_cursor=prism_cursor_apply_join(self._cursor, other._cursor.clone(), on, relation_name)) - def select(self) -> Self: - """Return one new lazy carrier with an appended identity projection stage.""" - return LazyFrame(_cursor=prism_cursor_apply_select(self._cursor)) + def left_join(self, other: Self, on: ColumnExpr, relation_name: str = "") -> Self: + """Return one new lazy carrier with an appended left join stage against another lazy carrier.""" + return LazyFrame(_cursor=prism_cursor_apply_left_join(self._cursor, other._cursor.clone(), on, relation_name)) + + def select[U with Clone](self, assignments: list[ProjectionAssignment] = []) -> LazyFrame[U]: + """Return one new lazy carrier with an appended SELECT projection stage.""" + return LazyFrame(_cursor=prism_cursor_apply_select[T, U](self._cursor, assignments)) def with_column(self, name: str, expr: ColumnExpr) -> Self: """Return one new lazy carrier with an appended add-or-replace projection stage.""" @@ -334,12 +391,16 @@ def _empty_type_witness[T with Clone]() -> list[T]: return [] -def _data_frame_with_invalidated_materialization[T with Clone](rel: Rel) -> DataFrame[T]: +def _data_frame_with_invalidated_materialization[T with Clone]( + rel: Rel, + planned_columns: list[str] = [], +) -> DataFrame[T]: """Build one DataFrame whose logical relation changed and whose prior collected materialization is no longer valid.""" return DataFrame( _type_witness=_empty_type_witness(), _materialization=DataFrameMaterialization.empty(), _substrait_rel=rel, + 
_planned_columns=planned_columns, ) @@ -356,8 +417,10 @@ pub trait UnboundedDataSet[T with Clone] with DataSet[T]: pub class DataStream[T with Clone] with UnboundedDataSet: - pub _row_schema_marker: T + pub _type_witness: list[T] pub _substrait_rel: Rel + # Streaming carriers are also Rel-backed, so they need the same plan-time name sidecar as DataFrame. + pub _planned_columns: list[str] def declared_columns(self) -> list[str]: """Return source-bound declared schema columns inferred from the first reachable named-table root.""" @@ -365,11 +428,16 @@ pub class DataStream[T with Clone] with UnboundedDataSet: def planned_columns(self) -> list[str]: """Return logical output columns for the stream carrier before execution.""" + if len(self._planned_columns) > 0: + return [name for name in self._planned_columns] return relation_output_columns(self._substrait_rel.clone()) def columns(self) -> list[str]: """Return plan-time output columns for stream carriers.""" - return self.planned_columns() + planned = self.planned_columns() + if len(planned) > 0: + return planned + return self.declared_columns() def schema(self) -> CarrierSchema: """Return stream schema metadata (declared/planned columns only; resolved columns are unavailable pre-collect).""" @@ -389,103 +457,122 @@ pub class DataStream[T with Clone] with UnboundedDataSet: def filter(self, predicate: ColumnExpr) -> Self: """Return one new DataStream with a filter stage.""" + input_columns = self.columns() return DataStream( - _row_schema_marker=self._row_schema_marker.clone(), - _substrait_rel=filter_ds_of_columns( - self._substrait_rel, - relation_output_columns(self._substrait_rel.clone()), - predicate, - ), + _type_witness=_empty_type_witness(), + _substrait_rel=filter_ds_of_columns(self._substrait_rel, input_columns, predicate), + _planned_columns=input_columns, ) - def join(self, other: Self, on: bool) -> Self: + def join(self, other: Self, on: ColumnExpr, relation_name: str = "") -> Self: """Return one new 
DataStream with a join stage against another stream.""" + left_columns = self.columns() + right_columns = other.columns() + rel = join_ds_of_columns( + self._substrait_rel, + left_columns, + other._substrait_rel, + right_columns, + on, + relation_name, + ) + return DataStream( + _type_witness=_empty_type_witness(), + _substrait_rel=rel, + _planned_columns=relation_output_columns(rel.clone()), + ) + + def left_join(self, other: Self, on: ColumnExpr, relation_name: str = "") -> Self: + """Return one new DataStream with a left join stage against another stream.""" + left_columns = self.columns() + right_columns = other.columns() + rel = left_join_ds_of_columns( + self._substrait_rel, + left_columns, + other._substrait_rel, + right_columns, + on, + relation_name, + ) return DataStream( - _row_schema_marker=self._row_schema_marker.clone(), - _substrait_rel=join_ds(self._substrait_rel, other._substrait_rel, on), + _type_witness=_empty_type_witness(), + _substrait_rel=rel, + _planned_columns=relation_output_columns(rel.clone()), ) - def select(self) -> Self: - """Return one new DataStream with an identity projection stage.""" + def select[U with Clone](self, assignments: list[ProjectionAssignment] = []) -> DataStream[U]: + """Return one new DataStream with a SELECT projection stage.""" + input_columns = self.columns() + if len(assignments) == 0: + return DataStream( + _type_witness=_empty_type_witness[U](), + _substrait_rel=select_ds_of_columns(self._substrait_rel, input_columns), + _planned_columns=input_columns, + ) return DataStream( - _row_schema_marker=self._row_schema_marker.clone(), - _substrait_rel=select_ds_of_columns( - self._substrait_rel, - relation_output_columns(self._substrait_rel.clone()), - ), + _type_witness=_empty_type_witness[U](), + _substrait_rel=select_project_ds_of_columns(self._substrait_rel, input_columns, assignments), + _planned_columns=select_project_output_columns(assignments), ) def with_column(self, name: str, expr: ColumnExpr) -> Self: 
"""Return one new DataStream with an add-or-replace projection stage.""" + input_columns = self.columns() return DataStream( - _row_schema_marker=self._row_schema_marker.clone(), - _substrait_rel=with_column_ds( - self._substrait_rel, - relation_output_columns(self._substrait_rel.clone()), - name, - expr, - ), + _type_witness=_empty_type_witness(), + _substrait_rel=with_column_ds(self._substrait_rel, input_columns, name, expr), + _planned_columns=project_output_columns(input_columns, [with_column_assignment(name, expr)]), ) def group_by(self, columns: list[ColumnExpr]) -> Self: """Return one new DataStream with a grouping stage.""" + rel = group_by_ds_of_columns(self._substrait_rel, self.columns(), columns) return DataStream( - _row_schema_marker=self._row_schema_marker.clone(), - _substrait_rel=group_by_ds_of_columns( - self._substrait_rel, - relation_output_columns(self._substrait_rel.clone()), - columns, - ), + _type_witness=_empty_type_witness(), + _substrait_rel=rel, + _planned_columns=relation_output_columns(rel.clone()), ) def agg(self, measures: list[AggregateMeasure]) -> Self: """Return one new DataStream with an aggregation stage.""" + rel = agg_ds_of_columns(self._substrait_rel, self.columns(), measures) return DataStream( - _row_schema_marker=self._row_schema_marker.clone(), - _substrait_rel=agg_ds_of_columns( - self._substrait_rel, - relation_output_columns(self._substrait_rel.clone()), - measures, - ), + _type_witness=_empty_type_witness(), + _substrait_rel=rel, + _planned_columns=relation_output_columns(rel.clone()), ) def generate(self, generator: GeneratorApplication) -> Self: """Return one new DataStream with a generator stage.""" + rel = generate_ds_of_columns(self._substrait_rel, self.columns(), generator) return DataStream( - _row_schema_marker=self._row_schema_marker.clone(), - _substrait_rel=generate_ds_of_columns( - self._substrait_rel, - relation_output_columns(self._substrait_rel.clone()), - generator, - ), + 
_type_witness=_empty_type_witness(), + _substrait_rel=rel, + _planned_columns=relation_output_columns(rel.clone()), ) def with_window_column(self, name: str, application: WindowFunctionApplication) -> Self: """Return one new DataStream with a named window projection stage.""" + rel = with_window_column_ds(self._substrait_rel, self.columns(), name, application) return DataStream( - _row_schema_marker=self._row_schema_marker.clone(), - _substrait_rel=with_window_column_ds( - self._substrait_rel, - relation_output_columns(self._substrait_rel.clone()), - name, - application, - ), + _type_witness=_empty_type_witness(), + _substrait_rel=rel, + _planned_columns=relation_output_columns(rel.clone()), ) def order_by(self, columns: list[ColumnExpr]) -> Self: """Return one new DataStream with an ordering stage.""" + input_columns = self.columns() return DataStream( - _row_schema_marker=self._row_schema_marker.clone(), - _substrait_rel=order_by_ds_of_columns( - self._substrait_rel, - relation_output_columns(self._substrait_rel.clone()), - columns, - ), + _type_witness=_empty_type_witness(), + _substrait_rel=order_by_ds_of_columns(self._substrait_rel, input_columns, columns), + _planned_columns=input_columns, ) def limit(self, n: int) -> Self: """Return one new DataStream with a row-limit stage.""" return DataStream( - _row_schema_marker=self._row_schema_marker.clone(), + _type_witness=_empty_type_witness(), _substrait_rel=limit_ds(self._substrait_rel, n), + _planned_columns=self.columns(), ) diff --git a/src/dataset/ops.incn b/src/dataset/ops.incn index f7aa816..5a4f33e 100644 --- a/src/dataset/ops.incn +++ b/src/dataset/ops.incn @@ -17,8 +17,11 @@ from substrait.relations import ( fetch_rel, filter_rel_of_columns, join_rel, + join_rel_of_columns, project_rel_of_columns, + select_project_rel_of_columns, sort_rel_of_columns, + SubstraitJoinKind, generator_rel_of_columns, window_rel_of_columns, ) @@ -43,7 +46,7 @@ pub def filter_ds_of_columns(rel: Rel, input_columns: list[str], 
predicate: Colu return filter_rel_of_columns(rel, input_columns, predicate) -pub def join_ds(left_rel: Rel, right_rel: Rel, on: bool) -> Rel: +pub def join_ds(left_rel: Rel, right_rel: Rel, on: ColumnExpr) -> Rel: """ Apply dataset-level join intent to two relations. @@ -58,6 +61,51 @@ pub def join_ds(left_rel: Rel, right_rel: Rel, on: bool) -> Rel: return join_rel(left_rel, right_rel, on) +pub def join_ds_of_columns( + left_rel: Rel, + left_columns: list[str], + right_rel: Rel, + right_columns: list[str], + on: ColumnExpr, + right_relation_name: str = "", +) -> Rel: + """Apply dataset-level inner join intent using explicit input-column names.""" + return join_rel_of_columns( + left_rel, + left_columns, + right_rel, + _qualified_right_columns(right_columns, right_relation_name), + on, + SubstraitJoinKind.Inner, + ) + + +pub def left_join_ds_of_columns( + left_rel: Rel, + left_columns: list[str], + right_rel: Rel, + right_columns: list[str], + on: ColumnExpr, + right_relation_name: str = "", +) -> Rel: + """Apply dataset-level left join intent using explicit input-column names.""" + return join_rel_of_columns( + left_rel, + left_columns, + right_rel, + _qualified_right_columns(right_columns, right_relation_name), + on, + SubstraitJoinKind.Left, + ) + + +def _qualified_right_columns(columns: list[str], relation_name: str) -> list[str]: + """Return right-side join predicate names using a relation prefix when provided.""" + if len(relation_name) == 0: + return columns + return [f"{relation_name}.{column}" for column in columns] + + pub def select_ds(rel: Rel) -> Rel: """ Apply dataset-level identity projection intent to one relation. 
@@ -86,6 +134,15 @@ pub def project_ds_of_columns(rel: Rel, input_columns: list[str], assignments: l return project_rel_of_columns(rel, input_columns, assignments) +pub def select_project_ds_of_columns( + rel: Rel, + input_columns: list[str], + assignments: list[ProjectionAssignment], +) -> Rel: + """Apply dataset-level exclusive projection assignments using explicit input-column names.""" + return select_project_rel_of_columns(rel, input_columns, assignments) + + pub def group_by_ds(rel: Rel, columns: list[ColumnExpr]) -> Rel: """ Apply dataset-level grouping intent to one relation. diff --git a/src/lib.incn b/src/lib.incn index 3b51204..a192466 100644 --- a/src/lib.incn +++ b/src/lib.incn @@ -15,10 +15,11 @@ pub from dataset.ops import ( join_ds, limit_ds, order_by_ds, + select_project_ds_of_columns, select_ds, with_window_column_ds, ) -pub from aggregate_builders import AggregateKind, AggregateMeasure +pub from aggregate_builders import AggregateKind, AggregateMeasure, aggregate_as pub from sketch_types import ( SKETCH_FAMILY_OPTION, SKETCH_FORMAT_OPTION, @@ -89,6 +90,8 @@ pub from projection_builders import ( column_expr_kind, column_expr_name, column_expr_option_value, + select_project_output_columns, + with_column_assignment, ) pub from functions.registry import function_registry pub from functions import ( @@ -426,6 +429,7 @@ pub from substrait.relations import ( generator_rel, generator_rel_of_columns, join_rel, + join_rel_of_columns, join_rel_of_kind, project_rel, read_local_files_rel, @@ -434,6 +438,7 @@ pub from substrait.relations import ( reference_rel, set_rel, set_rel_of_kind, + select_project_rel_of_columns, sort_rel, sort_rel_of_columns, window_rel, diff --git a/src/prism/lower.incn b/src/prism/lower.incn index 06cf205..53609b6 100644 --- a/src/prism/lower.incn +++ b/src/prism/lower.incn @@ -2,22 +2,24 @@ from rust::substrait::proto import Plan, Rel from rust::incan_stdlib::errors import raise_value_error -from prism.output_columns import 
rewritten_output_schema +from prism.output_columns import rewritten_output_columns, rewritten_output_schema from substrait.plans import plan_from_root_relation from substrait.relations import ( + SubstraitJoinKind, fetch_rel, - join_rel, read_named_table_rel, try_sort_rel_for_columns, try_aggregate_rel_for_columns, try_filter_rel_for_columns, try_generator_rel_for_columns, try_project_rel_for_columns, + try_join_rel_of_columns, + try_select_project_rel_for_columns, try_window_rel_for_columns, ) from substrait.errors import SubstraitLoweringError from prism.rewrite import derive_rewritten_view, rewritten_node_at -from prism.types import PrismNodeKind, PrismOptimizedView, PrismStoreId +from prism.types import PrismJoinKind, PrismNodeKind, PrismOptimizedView, PrismStoreId pub def lower_prism_tip(store_id: PrismStoreId, tip_id: int) -> Rel: @@ -83,12 +85,13 @@ def _try_lower_node(view: PrismOptimizedView, node_id: int) -> Result[Rel, Subst node.filter_predicate, ) PrismNodeKind.Join => - return Ok( - join_rel( - _try_lower_node(view, node.input_ids[0])?, - _try_lower_node(view, node.input_ids[1])?, - node.join_predicate, - ), + return try_join_rel_of_columns( + _try_lower_node(view, node.input_ids[0])?, + rewritten_output_columns(view, node.input_ids[0]), + _try_lower_node(view, node.input_ids[1])?, + _right_join_predicate_columns(rewritten_output_columns(view, node.input_ids[1]), node.join_relation_name), + node.join_predicate, + _substrait_join_kind(node.join_kind), ) PrismNodeKind.Project => return try_project_rel_for_columns( @@ -96,6 +99,12 @@ def _try_lower_node(view: PrismOptimizedView, node_id: int) -> Result[Rel, Subst rewritten_output_schema(view, node.input_ids[0]), node.projection_assignments, ) + PrismNodeKind.SelectProject => + return try_select_project_rel_for_columns( + _try_lower_node(view, node.input_ids[0])?, + rewritten_output_schema(view, node.input_ids[0]), + node.projection_assignments, + ) PrismNodeKind.GroupBy => return 
try_aggregate_rel_for_columns( _try_lower_node(view, node.input_ids[0])?, @@ -144,3 +153,17 @@ def _try_lower_node(view: PrismOptimizedView, node_id: int) -> Result[Rel, Subst node.sort_columns, ) PrismNodeKind.Limit => return Ok(fetch_rel(_try_lower_node(view, node.input_ids[0])?, 0, node.limit_count)) + + +def _right_join_predicate_columns(columns: list[str], relation_name: str) -> list[str]: + """Return right-side join columns using relation-qualified names when a query relation alias is available.""" + if len(relation_name) == 0: + return columns + return [f"{relation_name}.{column}" for column in columns] + + +def _substrait_join_kind(kind: PrismJoinKind) -> SubstraitJoinKind: + """Map Prism logical join variants to Substrait join variants.""" + match kind: + PrismJoinKind.Left => return SubstraitJoinKind.Left + _ => return SubstraitJoinKind.Inner diff --git a/src/prism/mod.incn b/src/prism/mod.incn index 365547f..716b3d3 100644 --- a/src/prism/mod.incn +++ b/src/prism/mod.incn @@ -37,7 +37,14 @@ from prism.store import ( reachable_node_ids, store_nodes, ) -from prism.types import authored_node_kind_name, PrismNodeKind, PrismOptimizedView, PrismRewriteExplain, PrismStoreId +from prism.types import ( + authored_node_kind_name, + PrismJoinKind, + PrismNodeKind, + PrismOptimizedView, + PrismRewriteExplain, + PrismStoreId, +) from substrait.errors import SubstraitLoweringError @@ -64,36 +71,36 @@ pub class PrismCursor[T with Clone]: store_id=self.store_id, kind=PrismNodeKind.Filter, input_ids=[self.tip_id], - named_table="", - join_predicate=false, + join_predicate=always_true(), filter_predicate=predicate, - limit_count=0, - group_columns=[], - sort_columns=[], - aggregate_measures=[], - generator_applications=[], - window_projections=[], - projection_assignments=[], ) return PrismCursor(store_id=self.store_id, tip_id=next_tip_id, _type_marker=[]) - def join(self, other: Self, predicate: bool) -> Self: + def join(self, other: PrismCursor[T], predicate: 
ColumnExpr, relation_name: str = "") -> Self: """Append one join against another cursor, adopting foreign authored state only when stores differ.""" + return self._join_with_kind(other, predicate, relation_name, PrismJoinKind.Inner) + + def left_join(self, other: PrismCursor[T], predicate: ColumnExpr, relation_name: str = "") -> Self: + """Append one left join against another cursor.""" + return self._join_with_kind(other, predicate, relation_name, PrismJoinKind.Left) + + def _join_with_kind( + self, + other: PrismCursor[T], + predicate: ColumnExpr, + relation_name: str, + kind: PrismJoinKind, + ) -> Self: + """Append one typed join node, adopting foreign authored state only when stores differ.""" if self.store_id.0 == other.store_id.0: next_tip_id = append_node( store_id=self.store_id, kind=PrismNodeKind.Join, input_ids=[self.tip_id, other.tip_id], - named_table="", + join_kind=kind, + join_relation_name=relation_name, join_predicate=predicate, filter_predicate=always_true(), - limit_count=0, - group_columns=[], - sort_columns=[], - aggregate_measures=[], - generator_applications=[], - window_projections=[], - projection_assignments=[], ) return PrismCursor(store_id=self.store_id, tip_id=next_tip_id, _type_marker=[]) adoption = adopt_cursor_subgraph(self.store_id, other.store_id, other.tip_id) @@ -101,35 +108,31 @@ pub class PrismCursor[T with Clone]: store_id=self.store_id, kind=PrismNodeKind.Join, input_ids=[self.tip_id, adoption.adopted_tip_id], - named_table="", + join_kind=kind, + join_relation_name=relation_name, join_predicate=predicate, filter_predicate=always_true(), - limit_count=0, - group_columns=[], - sort_columns=[], - aggregate_measures=[], - generator_applications=[], - window_projections=[], - projection_assignments=[], ) return PrismCursor(store_id=self.store_id, tip_id=next_tip_id, _type_marker=[]) - def select(self) -> Self: - """Append one identity projection node and return the derived tip.""" + def select[U with Clone](self, assignments: 
list[ProjectionAssignment] = []) -> PrismCursor[U]: + """Append one SELECT projection node and return the derived tip.""" + if len(assignments) > 0: + next_tip_id = append_node( + store_id=self.store_id, + kind=PrismNodeKind.SelectProject, + input_ids=[self.tip_id], + join_predicate=always_true(), + filter_predicate=always_true(), + projection_assignments=assignments, + ) + return PrismCursor(store_id=self.store_id, tip_id=next_tip_id, _type_marker=[]) next_tip_id = append_node( store_id=self.store_id, kind=PrismNodeKind.Project, input_ids=[self.tip_id], - named_table="", - join_predicate=false, + join_predicate=always_true(), filter_predicate=always_true(), - limit_count=0, - group_columns=[], - sort_columns=[], - aggregate_measures=[], - generator_applications=[], - window_projections=[], - projection_assignments=[], ) return PrismCursor(store_id=self.store_id, tip_id=next_tip_id, _type_marker=[]) @@ -139,15 +142,8 @@ pub class PrismCursor[T with Clone]: store_id=self.store_id, kind=PrismNodeKind.Project, input_ids=[self.tip_id], - named_table="", - join_predicate=false, + join_predicate=always_true(), filter_predicate=always_true(), - limit_count=0, - group_columns=[], - sort_columns=[], - aggregate_measures=[], - generator_applications=[], - window_projections=[], projection_assignments=[with_column_assignment(name, expr)], ) return PrismCursor(store_id=self.store_id, tip_id=next_tip_id, _type_marker=[]) @@ -158,16 +154,9 @@ pub class PrismCursor[T with Clone]: store_id=self.store_id, kind=PrismNodeKind.GroupBy, input_ids=[self.tip_id], - named_table="", - join_predicate=false, + join_predicate=always_true(), filter_predicate=always_true(), - limit_count=0, group_columns=columns, - sort_columns=[], - aggregate_measures=[], - generator_applications=[], - window_projections=[], - projection_assignments=[], ) return PrismCursor(store_id=self.store_id, tip_id=next_tip_id, _type_marker=[]) @@ -177,16 +166,9 @@ pub class PrismCursor[T with Clone]: 
store_id=self.store_id, kind=PrismNodeKind.Aggregate, input_ids=[self.tip_id], - named_table="", - join_predicate=false, + join_predicate=always_true(), filter_predicate=always_true(), - limit_count=0, - group_columns=[], - sort_columns=[], aggregate_measures=measures, - generator_applications=[], - window_projections=[], - projection_assignments=[], ) return PrismCursor(store_id=self.store_id, tip_id=next_tip_id, _type_marker=[]) @@ -196,16 +178,9 @@ pub class PrismCursor[T with Clone]: store_id=self.store_id, kind=PrismNodeKind.OrderBy, input_ids=[self.tip_id], - named_table="", - join_predicate=false, + join_predicate=always_true(), filter_predicate=always_true(), - limit_count=0, - group_columns=[], sort_columns=columns, - aggregate_measures=[], - generator_applications=[], - window_projections=[], - projection_assignments=[], ) return PrismCursor(store_id=self.store_id, tip_id=next_tip_id, _type_marker=[]) @@ -215,16 +190,9 @@ pub class PrismCursor[T with Clone]: store_id=self.store_id, kind=PrismNodeKind.Limit, input_ids=[self.tip_id], - named_table="", - join_predicate=false, + join_predicate=always_true(), filter_predicate=always_true(), limit_count=n, - group_columns=[], - sort_columns=[], - aggregate_measures=[], - generator_applications=[], - window_projections=[], - projection_assignments=[], ) return PrismCursor(store_id=self.store_id, tip_id=next_tip_id, _type_marker=[]) @@ -234,16 +202,9 @@ pub class PrismCursor[T with Clone]: store_id=self.store_id, kind=PrismNodeKind.Generate, input_ids=[self.tip_id], - named_table="", - join_predicate=false, + join_predicate=always_true(), filter_predicate=always_true(), - limit_count=0, - group_columns=[], - sort_columns=[], - aggregate_measures=[], generator_applications=[generator], - window_projections=[], - projection_assignments=[], ) return PrismCursor(store_id=self.store_id, tip_id=next_tip_id, _type_marker=[]) @@ -253,16 +214,9 @@ pub class PrismCursor[T with Clone]: store_id=self.store_id, 
kind=PrismNodeKind.Window, input_ids=[self.tip_id], - named_table="", - join_predicate=false, + join_predicate=always_true(), filter_predicate=always_true(), - limit_count=0, - group_columns=[], - sort_columns=[], - aggregate_measures=[], - generator_applications=[], window_projections=[window_projection(name, application)], - projection_assignments=[], ) return PrismCursor(store_id=self.store_id, tip_id=next_tip_id, _type_marker=[]) @@ -297,17 +251,9 @@ pub def prism_cursor_named_table[T with Clone](table_name: str) -> PrismCursor[T tip_id = append_node( store_id=store_id, kind=PrismNodeKind.ReadNamedTable, - input_ids=[], named_table=table_name, - join_predicate=false, + join_predicate=always_true(), filter_predicate=always_true(), - limit_count=0, - group_columns=[], - sort_columns=[], - aggregate_measures=[], - generator_applications=[], - window_projections=[], - projection_assignments=[], ) return PrismCursor(store_id=store_id, tip_id=tip_id, _type_marker=[]) @@ -320,15 +266,29 @@ pub def prism_cursor_apply_filter[T with Clone](cursor: PrismCursor[T], predicat pub def prism_cursor_apply_join[T with Clone]( left: PrismCursor[T], right: PrismCursor[T], - predicate: bool, + predicate: ColumnExpr, + relation_name: str = "", ) -> PrismCursor[T]: """Apply dataset-level join intent while preserving same-store cheapness and cross-store correctness.""" - return left.join(right, predicate) + return left.join(right, predicate, relation_name) -pub def prism_cursor_apply_select[T with Clone](cursor: PrismCursor[T]) -> PrismCursor[T]: - """Apply dataset-level identity projection intent through Prism.""" - return cursor.select() +pub def prism_cursor_apply_left_join[T with Clone]( + left: PrismCursor[T], + right: PrismCursor[T], + predicate: ColumnExpr, + relation_name: str = "", +) -> PrismCursor[T]: + """Apply dataset-level left join intent through Prism.""" + return left.left_join(right, predicate, relation_name) + + +pub def prism_cursor_apply_select[T with Clone, U 
with Clone]( + cursor: PrismCursor[T], + assignments: list[ProjectionAssignment] = [], +) -> PrismCursor[U]: + """Apply dataset-level SELECT projection intent through Prism.""" + return cursor.select[U](assignments) pub def prism_cursor_apply_with_column[T with Clone]( diff --git a/src/prism/output_columns.incn b/src/prism/output_columns.incn index 07d1a13..6f2c631 100644 --- a/src/prism/output_columns.incn +++ b/src/prism/output_columns.incn @@ -48,6 +48,11 @@ pub def authored_output_schema(store_id: PrismStoreId, tip_id: int) -> list[Scal return authored_output_schema(store_id, node.input_ids[0]) if node.kind == PrismNodeKind.Project: return _project_output_schema(authored_output_schema(store_id, node.input_ids[0]), node.projection_assignments) + if node.kind == PrismNodeKind.SelectProject: + return _select_project_output_schema( + authored_output_schema(store_id, node.input_ids[0]), + node.projection_assignments, + ) if node.kind == PrismNodeKind.Generate: return _generator_output_schema( authored_output_schema(store_id, node.input_ids[0]), @@ -57,7 +62,10 @@ pub def authored_output_schema(store_id: PrismStoreId, tip_id: int) -> list[Scal return _window_output_schema(authored_output_schema(store_id, node.input_ids[0]), node.window_projections) if node.kind == PrismNodeKind.Join: lhs_columns = authored_output_schema(store_id, node.input_ids[0]) - rhs_columns = authored_output_schema(store_id, node.input_ids[1]) + rhs_columns = _right_join_output_schema( + authored_output_schema(store_id, node.input_ids[1]), + node.join_relation_name, + ) mut columns: list[ScalarColumnSpec] = [] columns.extend(lhs_columns) columns.extend(rhs_columns) @@ -87,13 +95,21 @@ pub def rewritten_output_schema(view: PrismOptimizedView, node_id: int) -> list[ return rewritten_output_schema(view, node.input_ids[0]) if node.kind == PrismNodeKind.Project: return _project_output_schema(rewritten_output_schema(view, node.input_ids[0]), node.projection_assignments) + if node.kind == 
PrismNodeKind.SelectProject: + return _select_project_output_schema( + rewritten_output_schema(view, node.input_ids[0]), + node.projection_assignments, + ) if node.kind == PrismNodeKind.Generate: return _generator_output_schema(rewritten_output_schema(view, node.input_ids[0]), node.generator_applications[0]) if node.kind == PrismNodeKind.Window: return _window_output_schema(rewritten_output_schema(view, node.input_ids[0]), node.window_projections) if node.kind == PrismNodeKind.Join: lhs_columns = rewritten_output_schema(view, node.input_ids[0]) - rhs_columns = rewritten_output_schema(view, node.input_ids[1]) + rhs_columns = _right_join_output_schema( + rewritten_output_schema(view, node.input_ids[1]), + node.join_relation_name, + ) mut columns: list[ScalarColumnSpec] = [] columns.extend(lhs_columns) columns.extend(rhs_columns) @@ -126,6 +142,25 @@ def _project_output_schema( return output_columns +def _select_project_output_schema( + input_columns: list[ScalarColumnSpec], + assignments: list[ProjectionAssignment], +) -> list[ScalarColumnSpec]: + """Return exclusive select output schema facts while honoring lateral aliases.""" + mut bindings: list[ScalarColumnSpec] = [] + bindings.extend(input_columns) + mut output_columns: list[ScalarColumnSpec] = [] + for assignment in assignments: + output = _expression_output_schema(bindings, assignment.expr, assignment.output_name) + output_columns.append(output) + existing_idx = _index_of_column(bindings, assignment.output_name) + if existing_idx >= 0: + bindings[existing_idx] = output + else: + bindings.append(output) + return output_columns + + def _expression_output_schema( input_columns: list[ScalarColumnSpec], expr: ColumnExpr, @@ -185,6 +220,13 @@ def _window_output_schema( return output_columns +def _right_join_output_schema(columns: list[ScalarColumnSpec], relation_name: str) -> list[ScalarColumnSpec]: + """Qualify right-side join output schema names when query lowering supplied a relation alias.""" + if 
len(relation_name) == 0: + return columns + return [ScalarColumnSpec(name=f"{relation_name}.{column.name}", kind=column.kind, nullable=column.nullable) for column in columns] + + def _authored_aggregate_prefix_schema(store_id: PrismStoreId, node: PrismNode) -> list[ScalarColumnSpec]: """Return grouping-prefix schema facts that survive into one authored aggregate node output.""" if len(node.group_columns) > 0: diff --git a/src/prism/rewrite.incn b/src/prism/rewrite.incn index 58d3eda..5ecce62 100644 --- a/src/prism/rewrite.incn +++ b/src/prism/rewrite.incn @@ -1,7 +1,7 @@ """Prism optimized-view derivation, canonical rewrite rules, and explain artifacts.""" from filter_builders import always_true -from prism.types import PrismNode, PrismNodeKind, PrismOptimizedView, PrismRewriteExplain, PrismStoreId +from prism.types import PrismJoinKind, PrismNode, PrismNodeKind, PrismOptimizedView, PrismRewriteExplain, PrismStoreId from prism.store import node_at, reachable_node_ids, remap_input_ids, store_nodes, window_specs_structurally_equal from projection_builders import is_bool_literal_expr from window_builders import WindowProjection @@ -181,7 +181,9 @@ def _build_collapsed_limit_node( kind=PrismNodeKind.Limit, input_ids=[limit_input.input_ids[0]], named_table=str(""), - join_predicate=false, + join_kind=PrismJoinKind.Inner, + join_relation_name=str(""), + join_predicate=always_true(), filter_predicate=always_true(), limit_count=_min_int(limit_input.limit_count, node.limit_count), group_columns=[], @@ -219,7 +221,9 @@ def _build_collapsed_project_node( kind=PrismNodeKind.Project, input_ids=[project_input.input_ids[0]], named_table=str(""), - join_predicate=false, + join_kind=PrismJoinKind.Inner, + join_relation_name=str(""), + join_predicate=always_true(), filter_predicate=always_true(), limit_count=0, group_columns=[], @@ -264,7 +268,9 @@ def _build_collapsed_aggregate_node( kind=PrismNodeKind.Aggregate, input_ids=[aggregate_input.input_ids[0]], named_table=str(""), - 
join_predicate=false, + join_kind=PrismJoinKind.Inner, + join_relation_name=str(""), + join_predicate=always_true(), filter_predicate=always_true(), limit_count=0, group_columns=aggregate_input.group_columns, @@ -289,7 +295,9 @@ def _build_fused_group_by_aggregate_node( kind=PrismNodeKind.Aggregate, input_ids=[group_input.input_ids[0]], named_table=str(""), - join_predicate=false, + join_kind=PrismJoinKind.Inner, + join_relation_name=str(""), + join_predicate=always_true(), filter_predicate=always_true(), limit_count=0, group_columns=group_input.group_columns, @@ -325,7 +333,9 @@ def _build_collapsed_order_by_node( kind=PrismNodeKind.OrderBy, input_ids=[order_input.input_ids[0]], named_table=str(""), - join_predicate=false, + join_kind=PrismJoinKind.Inner, + join_relation_name=str(""), + join_predicate=always_true(), filter_predicate=always_true(), limit_count=0, group_columns=[], @@ -366,7 +376,9 @@ def _build_collapsed_window_node( kind=PrismNodeKind.Window, input_ids=[window_input.input_ids[0]], named_table=str(""), - join_predicate=false, + join_kind=PrismJoinKind.Inner, + join_relation_name=str(""), + join_predicate=always_true(), filter_predicate=always_true(), limit_count=0, group_columns=[], @@ -399,6 +411,8 @@ def _build_rewritten_node(node: PrismNode, remapped_inputs: list[int], rewritten kind=node.kind, input_ids=remapped_inputs, named_table=node.named_table, + join_kind=node.join_kind, + join_relation_name=node.join_relation_name, join_predicate=node.join_predicate, filter_predicate=node.filter_predicate.clone(), limit_count=node.limit_count, @@ -446,6 +460,8 @@ def _compact_optimized_view(view: PrismOptimizedView) -> PrismOptimizedView: kind=old_node.kind, input_ids=remapped_inputs, named_table=old_node.named_table, + join_kind=old_node.join_kind, + join_relation_name=old_node.join_relation_name, join_predicate=old_node.join_predicate, filter_predicate=old_node.filter_predicate.clone(), limit_count=old_node.limit_count, diff --git 
a/src/prism/store.incn b/src/prism/store.incn index fe40abc..b4ce564 100644 --- a/src/prism/store.incn +++ b/src/prism/store.incn @@ -16,7 +16,7 @@ from projection_builders import ( StringLiteralExpr, untyped_column_expr, ) -from prism.types import PrismNode, PrismNodeKind, PrismStoreAdoption, PrismStoreId +from prism.types import PrismJoinKind, PrismNode, PrismNodeKind, PrismStoreAdoption, PrismStoreId model PrismStoredNode: @@ -50,17 +50,19 @@ pub def allocate_prism_store_id() -> PrismStoreId: pub def append_node( store_id: PrismStoreId, kind: PrismNodeKind, - input_ids: list[int], - named_table: str, - join_predicate: bool, + join_predicate: ColumnExpr, filter_predicate: ColumnExpr, - limit_count: int, - group_columns: list[ColumnExpr], - sort_columns: list[ColumnExpr], - aggregate_measures: list[AggregateMeasure], - generator_applications: list[GeneratorApplication], - window_projections: list[WindowProjection], - projection_assignments: list[ProjectionAssignment], + input_ids: list[int] = [], + named_table: str = "", + join_kind: PrismJoinKind = PrismJoinKind.Inner, + join_relation_name: str = "", + limit_count: int = 0, + group_columns: list[ColumnExpr] = [], + sort_columns: list[ColumnExpr] = [], + aggregate_measures: list[AggregateMeasure] = [], + generator_applications: list[GeneratorApplication] = [], + window_projections: list[WindowProjection] = [], + projection_assignments: list[ProjectionAssignment] = [], ) -> int: """ Append one immutable node to the target store. 
@@ -74,6 +76,8 @@ pub def append_node( kind=kind, input_ids=input_ids, named_table=named_table, + join_kind=join_kind, + join_relation_name=join_relation_name, join_predicate=join_predicate, filter_predicate=filter_predicate, limit_count=limit_count, @@ -177,6 +181,8 @@ pub def adopt_cursor_subgraph( kind=source_node.kind, input_ids=remapped_input_ids, named_table=source_node.named_table, + join_kind=source_node.join_kind, + join_relation_name=source_node.join_relation_name, join_predicate=source_node.join_predicate, filter_predicate=source_node.filter_predicate.clone(), limit_count=source_node.limit_count, @@ -193,6 +199,8 @@ pub def adopt_cursor_subgraph( kind=source_node.kind, input_ids=remapped_input_ids, named_table=source_node.named_table, + join_kind=source_node.join_kind, + join_relation_name=source_node.join_relation_name, join_predicate=source_node.join_predicate, filter_predicate=source_node.filter_predicate.clone(), limit_count=source_node.limit_count, @@ -272,7 +280,11 @@ def _nodes_structurally_equal(candidate: PrismNode, source_node: PrismNode, rema return false if candidate.named_table != source_node.named_table: return false - if candidate.join_predicate != source_node.join_predicate: + if candidate.join_kind != source_node.join_kind: + return false + if candidate.join_relation_name != source_node.join_relation_name: + return false + if not _filter_predicates_structurally_equal(candidate.join_predicate, source_node.join_predicate): return false if not _filter_predicates_structurally_equal(candidate.filter_predicate, source_node.filter_predicate): return false @@ -323,6 +335,10 @@ def _aggregate_measures_structurally_equal(left: AggregateMeasure, right: Aggreg return false if left.has_filter != right.has_filter: return false + if left.has_output_name != right.has_output_name: + return false + if left.output_name != right.output_name: + return false if not _column_exprs_structurally_equal(left.expr, right.expr): return false if not 
_column_expr_lists_structurally_equal(left.arguments, right.arguments): diff --git a/src/prism/types.incn b/src/prism/types.incn index 0750b9d..8379860 100644 --- a/src/prism/types.incn +++ b/src/prism/types.incn @@ -17,6 +17,7 @@ pub enum PrismNodeKind(str): Filter = "Filter" Join = "Join" Project = "Project" + SelectProject = "SelectProject" GroupBy = "GroupBy" Aggregate = "Aggregate" Generate = "Generate" @@ -25,6 +26,14 @@ pub enum PrismNodeKind(str): Limit = "Limit" +@derive(Clone) +pub enum PrismJoinKind(str): + """Logical join variants Prism carries before Substrait lowering.""" + + Inner = "inner" + Left = "left" + + @derive(Clone) pub model PrismNode: """ @@ -38,7 +47,9 @@ pub model PrismNode: pub kind: PrismNodeKind pub input_ids: list[int] pub named_table: str - pub join_predicate: bool + pub join_kind: PrismJoinKind + pub join_relation_name: str + pub join_predicate: ColumnExpr pub filter_predicate: ColumnExpr pub limit_count: int pub group_columns: list[ColumnExpr] diff --git a/src/projection_builders.incn b/src/projection_builders.incn index b3d7784..ba3d77f 100644 --- a/src/projection_builders.incn +++ b/src/projection_builders.incn @@ -335,6 +335,11 @@ pub def project_output_columns(input_columns: list[str], assignments: list[Proje return output_columns +pub def select_project_output_columns(assignments: list[ProjectionAssignment]) -> list[str]: + """Return output column names for an exclusive projection assignment list.""" + return [assignment.output_name for assignment in assignments] + + def _index_of_column(columns: list[str], name: str) -> int: """Return the first matching column index for `name`, or `-1` when absent.""" for idx, column_name in enumerate(columns): diff --git a/src/session/datafusion_backend.incn b/src/session/datafusion_backend.incn index 62e0f4a..9bbd2ec 100644 --- a/src/session/datafusion_backend.incn +++ b/src/session/datafusion_backend.incn @@ -31,7 +31,6 @@ from rust::substrait::proto::expression::window_function import 
Bound, BoundsTyp from rust::substrait::proto::expression::window_function::bound import Kind as BoundKind from rust::substrait::proto::function_argument import ArgType from rust::substrait::proto::rel import RelType -from rust::substrait::proto::join_rel import JoinType from rust::substrait::proto::sort_field import SortDirection, SortKind from rust::datafusion::common import Column, ScalarValue, UnnestOptions from rust::datafusion::arrow::datatypes import DataType as ArrowDataType @@ -250,10 +249,32 @@ async def _dataframe_from_plan(ctx: SessionContext, plan: Plan) -> Result[RustDa """Build a DataFusion DataFrame, including InQL-owned relation bridges.""" root = root_rel(plan.clone()) match _window_rel(root.clone()): - Some(window_rel) => return await _dataframe_from_window_rel(ctx, window_rel, root_names(plan.clone())) + Some(_) => return await _dataframe_from_window_root_plan(ctx, plan) None => return await _dataframe_from_non_window_plan(ctx, plan) +async def _dataframe_from_window_root_plan(ctx: SessionContext, plan: Plan) -> Result[RustDataFrame, BackendError]: + """Build a DataFusion DataFrame for plans whose semantic root is a window relation.""" + mut rewritten_root = root_rel(plan.clone()) + materializations = _collect_materialized_generator_relations(rewritten_root.clone(), "root") + if len(materializations) > 0: + for materialization in materializations: + table_name = _materialized_generator_table_name(materialization.path) + df = await _dataframe_from_generator_extension(ctx.clone(), materialization.extension)? + await _register_materialized_dataframe(ctx.clone(), table_name, df)? 
+ rewritten_root = _replace_materialized_generator_relations(rewritten_root, "root") + + match _window_rel(rewritten_root): + Some(window_rel) => return await _dataframe_from_window_rel(ctx, window_rel, root_names(plan)) + None => + return Err( + backend_error( + BackendErrorKind.BackendPlanningError, + "window root planning lost the window relation during adapter materialization", + ), + ) + + async def _dataframe_from_non_window_plan(ctx: SessionContext, plan: Plan) -> Result[RustDataFrame, BackendError]: """Build a DataFusion DataFrame for plans that do not have a window root.""" root = root_rel(plan.clone()) @@ -483,6 +504,11 @@ def _materialization_child_paths(rel: Rel, path: str) -> list[RelationChildPath] if let Some(child) = fetch.input: return [RelationChildPath(rel=child.as_ref().clone(), path=f"{path}_fetch")] return [] + Some(RelType.Window(window_rel)) => + window = window_rel.as_ref().clone() + if let Some(child) = window.input: + return [RelationChildPath(rel=child.as_ref().clone(), path=f"{path}_window")] + return [] Some(RelType.Set(set_rel)) => mut children: list[RelationChildPath] = [] for idx, input in enumerate(set_rel.inputs): @@ -530,6 +556,8 @@ def _replace_materialized_generator_children(rel: Rel, path: str) -> Rel: return _replace_aggregate_generator_child(rel, aggregate_rel.as_ref().clone(), path) Some(RelType.Sort(sort_rel)) => return _replace_sort_generator_child(rel, sort_rel.as_ref().clone(), path) Some(RelType.Fetch(fetch_rel)) => return _replace_fetch_generator_child(rel, fetch_rel.as_ref().clone(), path) + Some(RelType.Window(window_rel)) => + return _replace_window_generator_child(rel, window_rel.as_ref().clone(), path) Some(RelType.Set(set_rel)) => return _replace_set_generator_children(set_rel, path) Some(RelType.ExtensionSingle(extension_rel)) => return _replace_extension_generator_child(rel, extension_rel.as_ref().clone(), path) @@ -591,7 +619,6 @@ def _replace_join_generator_children(join: JoinRel, path: str) -> Rel: left = 
Some(_rewritten_generator_child(child, f"{path}_join_left")) if let Some(child) = right: right = Some(_rewritten_generator_child(child, f"{path}_join_right")) - # Public InQL joins currently lower only inner joins; preserve that contract while rewriting child inputs. return Rel( rel_type=Some( RelType.Join( @@ -602,7 +629,7 @@ def _replace_join_generator_children(join: JoinRel, path: str) -> Rel: right=right, expression=join.expression, post_join_filter=join.post_join_filter, - type=JoinType.Inner.into(), + type=join.type, advanced_extension=join.advanced_extension, ), ), @@ -696,6 +723,29 @@ def _replace_fetch_generator_child(original: Rel, fetch: FetchRel, path: str) -> None => return original +def _replace_window_generator_child(original: Rel, window: ConsistentPartitionWindowRel, path: str) -> Rel: + """Rebuild a WindowRel with its input child rewritten when present.""" + match window.input: + Some(child) => + return Rel( + rel_type=Some( + RelType.Window( + Box.new( + ConsistentPartitionWindowRel( + common=window.common, + input=Some(_rewritten_generator_child(child, f"{path}_window")), + window_functions=window.window_functions, + partition_expressions=window.partition_expressions, + sorts=window.sorts, + advanced_extension=window.advanced_extension, + ), + ), + ), + ), + ) + None => return original + + def _replace_set_generator_children(set_rel: SetRel, path: str) -> Rel: """Rebuild a SetRel with every input child rewritten.""" mut inputs: list[Rel] = [] @@ -865,7 +915,7 @@ def _replace_join_window_children(join: JoinRel, path: str) -> Rel: right=right, expression=join.expression, post_join_filter=join.post_join_filter, - type=JoinType.Inner.into(), + type=join.type, advanced_extension=join.advanced_extension, ), ), @@ -1050,6 +1100,16 @@ def _register_materialized_schema_from_batch(table_name: str, batch: RecordBatch def _substrait_kind_for_arrow_type_name(type_name: str) -> SubstraitPrimitiveKind: """Map the current DataFusion primitive output type names 
into InQL's minimal Substrait schema kinds.""" + if type_name.starts_with("List(") or type_name.starts_with("LargeList("): + if type_name.contains("Int64") or type_name.contains("Int32") or type_name.contains("UInt64") or type_name.contains( + "UInt32", + ): + return SubstraitPrimitiveKind.I64List + if type_name.contains("Float64") or type_name.contains("Float32"): + return SubstraitPrimitiveKind.F64List + if type_name.contains("Boolean"): + return SubstraitPrimitiveKind.BoolList + return SubstraitPrimitiveKind.StringList if type_name == "Int64" or type_name == "Int32" or type_name == "UInt64" or type_name == "UInt32": return SubstraitPrimitiveKind.I64 if type_name == "Float64" or type_name == "Float32": diff --git a/src/session/types.incn b/src/session/types.incn index 2bb529a..86ee38c 100644 --- a/src/session/types.incn +++ b/src/session/types.incn @@ -32,7 +32,7 @@ from session.backend_dispatch import backend_collect_plan, backend_execute_plan, from session.backend_types import BackendError, BackendErrorKind, BackendRegistration from substrait.errors import SubstraitLoweringError, SubstraitLoweringErrorKind from substrait.schema_registry import register_named_table_schema -from substrait.inspect import root_rel, read_named_table_name +from substrait.inspect import relation_output_columns, root_rel, read_named_table_name @derive(Clone) @@ -182,7 +182,12 @@ pub class Session: match backend_collect_plan(self._backend, _to_backend_registrations(self._registrations), plan): Ok(materialization) => return Ok( - DataFrame(_type_witness=_empty_type_witness(), _materialization=materialization, _substrait_rel=rel), + DataFrame( + _type_witness=_empty_type_witness(), + _materialization=materialization, + _substrait_rel=rel, + _planned_columns=relation_output_columns(rel.clone()), + ), ) Err(err) => return Err(_session_error_from_backend_error(err)) diff --git a/src/substrait/expr_lowering.incn b/src/substrait/expr_lowering.incn index b93e0a7..06db5b0 100644 --- 
a/src/substrait/expr_lowering.incn +++ b/src/substrait/expr_lowering.incn @@ -478,14 +478,18 @@ def _function_input_type_name(kind: FunctionInputType) -> str: FunctionInputType.Temporal => return "temporal" -def _schema_kind_scalar_type(kind: SubstraitPrimitiveKind) -> FunctionInputType: - """Convert one concrete schema kind into the matching scalar type category.""" +def _schema_kind_scalar_type(kind: SubstraitPrimitiveKind) -> Option[FunctionInputType]: + """Convert one scalar schema kind into the matching scalar type category when possible.""" match kind: - SubstraitPrimitiveKind.Bool => return FunctionInputType.Bool - SubstraitPrimitiveKind.I64 => return FunctionInputType.Int - SubstraitPrimitiveKind.F64 => return FunctionInputType.Float - SubstraitPrimitiveKind.Timestamp => return FunctionInputType.Timestamp - SubstraitPrimitiveKind.String => return FunctionInputType.String + SubstraitPrimitiveKind.Bool => return Some(FunctionInputType.Bool) + SubstraitPrimitiveKind.I64 => return Some(FunctionInputType.Int) + SubstraitPrimitiveKind.F64 => return Some(FunctionInputType.Float) + SubstraitPrimitiveKind.Timestamp => return Some(FunctionInputType.Timestamp) + SubstraitPrimitiveKind.String => return Some(FunctionInputType.String) + SubstraitPrimitiveKind.BoolList => return None + SubstraitPrimitiveKind.I64List => return None + SubstraitPrimitiveKind.F64List => return None + SubstraitPrimitiveKind.StringList => return None def _argument_kind_context(bindings: list[ResolvedProjectionBinding], expr: ColumnExpr) -> str: @@ -547,7 +551,7 @@ def _known_expression_scalar_type( ScalarFunctionApplicationExpr(_) => return Ok(None) _ => if let Some(kind) = _known_expression_kind(bindings, expr)?: - return Ok(Some(_schema_kind_scalar_type(kind))) + return Ok(_schema_kind_scalar_type(kind)) return Ok(None) @@ -766,6 +770,56 @@ pub def lower_project_for_columns( ) +pub def lower_select_project( + input_columns: list[str], + assignments: list[ProjectionAssignment], +) -> 
Result[LoweredProject, SubstraitLoweringError]: + """Lower one exclusive projection, preserving lateral aliases for later expressions in the same list.""" + return lower_select_project_for_columns(scalar_columns_from_names(input_columns), assignments) + + +pub def lower_select_project_for_columns( + input_columns: list[ScalarColumnSpec], + assignments: list[ProjectionAssignment], +) -> Result[LoweredProject, SubstraitLoweringError]: + """Lower one exclusive projection against typed-or-name-only input column facts.""" + input_column_count = len(input_columns) + mut bindings = _resolved_projection_bindings_for_column_specs(input_columns) + mut expressions: list[ResolvedProjectExpression] = [] + mut mapping_indexes: list[int] = [] + for assignment in assignments: + resolved_expr = _resolved_projection_expr(bindings, assignment.expr)? + output_kind = _known_expression_kind(bindings, assignment.expr)? + expressions.append(ResolvedProjectExpression(expr=resolved_expr.clone())) + output_index = input_column_count + len(expressions) - 1 + mapping_indexes.append(output_index) + existing_idx = _index_of_binding(bindings, assignment.output_name) + if existing_idx >= 0: + bindings[existing_idx] = ResolvedProjectionBinding( + name=assignment.output_name, + expr=resolved_expr, + output_index=output_index, + kind=output_kind, + nullable=true, + ) + else: + bindings.append( + ResolvedProjectionBinding( + name=assignment.output_name, + expr=resolved_expr, + output_index=output_index, + kind=output_kind, + nullable=true, + ), + ) + return Ok( + LoweredProject( + expressions=[resolved.expr.clone() for resolved in expressions], + output_mapping=[to_rust_i32(index) for index in mapping_indexes], + ), + ) + + pub def scalar_expr(input_columns: list[str], expr: ColumnExpr) -> Result[Expression, SubstraitLoweringError]: """Lower one scalar-expression builder against an input-column list.""" return _resolved_projection_expr(_resolved_projection_bindings_for_input_columns(input_columns), 
expr) diff --git a/src/substrait/extensions.incn b/src/substrait/extensions.incn index 34e49fb..06aecac 100644 --- a/src/substrait/extensions.incn +++ b/src/substrait/extensions.incn @@ -414,6 +414,14 @@ def _rel_uses_scalar_function_anchor(rel: Rel, expected_anchor: u32) -> bool: for window_function in window_rel.window_functions: if _window_function_uses_scalar_function_anchor(window_function, expected_anchor): return true + Some(RelType.Join(join_rel)) => + join = join_rel.as_ref().clone() + if let Some(condition) = join.expression: + if _expr_uses_scalar_function_anchor(condition.as_ref().clone(), expected_anchor): + return true + if let Some(condition) = join.post_join_filter: + if _expr_uses_scalar_function_anchor(condition.as_ref().clone(), expected_anchor): + return true Some(RelType.ExtensionSingle(extension_rel)) => if _generator_extension_uses_scalar_function_anchor(extension_rel.as_ref().clone(), expected_anchor): return true @@ -457,6 +465,14 @@ def _rel_uses_if_then(rel: Rel) -> bool: for window_function in window_rel.window_functions: if _window_function_uses_if_then(window_function): return true + Some(RelType.Join(join_rel)) => + join = join_rel.as_ref().clone() + if let Some(condition) = join.expression: + if _expr_uses_if_then(condition.as_ref().clone()): + return true + if let Some(condition) = join.post_join_filter: + if _expr_uses_if_then(condition.as_ref().clone()): + return true Some(RelType.ExtensionSingle(extension_rel)) => if _generator_extension_uses_if_then(extension_rel.as_ref().clone()): return true diff --git a/src/substrait/mod.incn b/src/substrait/mod.incn index 48e4b33..524e62c 100644 --- a/src/substrait/mod.incn +++ b/src/substrait/mod.incn @@ -29,6 +29,7 @@ pub from substrait.relations import ( generator_rel, generator_rel_of_columns, join_rel, + join_rel_of_columns, join_rel_of_kind, project_rel, project_rel_of_columns, @@ -40,6 +41,7 @@ pub from substrait.relations import ( reference_rel, set_rel, set_rel_of_kind, + 
select_project_rel_of_columns, sort_rel, sort_rel_of_columns, ) diff --git a/src/substrait/relations.incn b/src/substrait/relations.incn index 084d228..172828d 100644 --- a/src/substrait/relations.incn +++ b/src/substrait/relations.incn @@ -92,6 +92,7 @@ from substrait.expr_lowering import ( i64_expr, lower_project_for_columns, lower_project, + lower_select_project_for_columns, scalar_expr_for_columns, scalar_expr, string_expr, @@ -894,6 +895,48 @@ pub def try_project_rel_for_columns( ) +pub def select_project_rel_of_columns( + input: Rel, + input_columns: list[str], + assignments: list[ProjectionAssignment], +) -> Rel: + """Wrap a child relation in `ProjectRel` with exclusive projection semantics.""" + return _lowered_rel_or_raise(try_select_project_rel_of_columns(input, input_columns, assignments)) + + +pub def try_select_project_rel_of_columns( + input: Rel, + input_columns: list[str], + assignments: list[ProjectionAssignment], +) -> Result[Rel, SubstraitLoweringError]: + """Fallibly wrap a child relation in an exclusive `ProjectRel`.""" + return try_select_project_rel_for_columns(input, scalar_columns_from_names(input_columns), assignments) + + +pub def try_select_project_rel_for_columns( + input: Rel, + input_columns: list[ScalarColumnSpec], + assignments: list[ProjectionAssignment], +) -> Result[Rel, SubstraitLoweringError]: + """Fallibly wrap a child relation in an exclusive `ProjectRel` using typed-or-name-only input columns.""" + lowered = lower_select_project_for_columns(input_columns, assignments)? 
+ common = RelCommon( + hint=None, + advanced_extension=None, + emit_kind=Some(EmitKind.Emit(Emit(output_mapping=lowered.output_mapping))), + ) + return Ok( + _rel_project( + ProjectRel( + common=Some(common), + input=Some(Box.new(input)), + expressions=lowered.expressions, + advanced_extension=None, + ), + ), + ) + + pub def project_rel_with_expressions(input: Rel, expressions: list[Expression]) -> Rel: """Append already-lowered Substrait expressions to a relation.""" # The DataFusion adapter uses this to evaluate generator arguments into temporary columns before unnesting. @@ -907,26 +950,69 @@ pub def project_rel_with_expressions(input: Rel, expressions: list[Expression]) ) -pub def join_rel(left: Rel, right: Rel, on_predicate: bool) -> Rel: +pub def join_rel(left: Rel, right: Rel, on_predicate: ColumnExpr) -> Rel: """Wrap two child relations in an inner `JoinRel`.""" return join_rel_of_kind(left, right, on_predicate, SubstraitJoinKind.Inner) -pub def join_rel_of_kind(left: Rel, right: Rel, on_predicate: bool, kind: SubstraitJoinKind) -> Rel: +pub def join_rel_of_kind(left: Rel, right: Rel, on_predicate: ColumnExpr, kind: SubstraitJoinKind) -> Rel: """Wrap two child relations in `JoinRel` using one explicit Substrait join variant.""" - return _rel_join( - JoinRel( - common=Some(_direct_common()), - left=Some(Box.new(left)), - right=Some(Box.new(right)), - expression=Some(Box.new(bool_expr(on_predicate))), - post_join_filter=None, - type=_join_type_from_kind(kind), - advanced_extension=None, + return join_rel_of_columns( + left.clone(), + relation_output_columns(left), + right.clone(), + relation_output_columns(right), + on_predicate, + kind, + ) + + +pub def join_rel_of_columns( + left: Rel, + left_columns: list[str], + right: Rel, + right_columns: list[str], + on_predicate: ColumnExpr, + kind: SubstraitJoinKind, +) -> Rel: + """Wrap two child relations in `JoinRel` using explicit join predicate input columns.""" + return 
_lowered_rel_or_raise(try_join_rel_of_columns(left, left_columns, right, right_columns, on_predicate, kind)) + + +pub def try_join_rel_of_columns( + left: Rel, + left_columns: list[str], + right: Rel, + right_columns: list[str], + on_predicate: ColumnExpr, + kind: SubstraitJoinKind, +) -> Result[Rel, SubstraitLoweringError]: + """Fallibly wrap two child relations in a typed `JoinRel`.""" + input_columns = _join_input_columns(left_columns, right_columns) + predicate = filter_predicate_expr(input_columns, on_predicate)? + return Ok( + _rel_join( + JoinRel( + common=Some(_direct_common()), + left=Some(Box.new(left)), + right=Some(Box.new(right)), + expression=Some(Box.new(predicate)), + post_join_filter=None, + type=_join_type_from_kind(kind), + advanced_extension=None, + ), ), ) +def _join_input_columns(left_columns: list[str], right_columns: list[str]) -> list[str]: + """Return the scalar-expression input columns visible to one join predicate.""" + mut input_columns: list[str] = [] + input_columns.extend(left_columns) + input_columns.extend(right_columns) + return input_columns + + pub def cross_rel(left: Rel, right: Rel) -> Rel: """Wrap two child relations in `CrossRel`.""" return _rel_cross( diff --git a/src/substrait/schema.incn b/src/substrait/schema.incn index 936fb9e..5906385 100644 --- a/src/substrait/schema.incn +++ b/src/substrait/schema.incn @@ -14,12 +14,14 @@ from rust::substrait::proto::type import ( Fp64, I64, Kind, + List as SubstraitList, Nullability, PrecisionTimestamp, String as SubstraitString, Struct, ) from rust::incan_stdlib::errors import raise_value_error +from rust::std::boxed import Box @derive(Clone) @@ -29,6 +31,10 @@ pub enum SubstraitPrimitiveKind(str): F64 = "f64" Timestamp = "timestamp" String = "string" + BoolList = "list" + I64List = "list" + F64List = "list" + StringList = "list" @derive(Clone) @@ -59,6 +65,14 @@ def _primitive_kind_from_type_name(type_name: str) -> SubstraitPrimitiveKind: return SubstraitPrimitiveKind.F64 if 
type_name == "str": return SubstraitPrimitiveKind.String + if type_name == "list[bool]" or type_name == "List[bool]": + return SubstraitPrimitiveKind.BoolList + if type_name == "list[int]" or type_name == "List[int]": + return SubstraitPrimitiveKind.I64List + if type_name == "list[float]" or type_name == "List[float]": + return SubstraitPrimitiveKind.F64List + if type_name == "list[str]" or type_name == "List[str]": + return SubstraitPrimitiveKind.StringList message = f"unsupported model field type `{type_name}` for InQL row shape" return raise_value_error(message) @@ -71,6 +85,10 @@ def _schema_type_name(kind: SubstraitPrimitiveKind) -> str: SubstraitPrimitiveKind.F64 => return "DOUBLE" SubstraitPrimitiveKind.Timestamp => return "TIMESTAMP" SubstraitPrimitiveKind.String => return "STRING" + SubstraitPrimitiveKind.BoolList => return "ARRAY" + SubstraitPrimitiveKind.I64List => return "ARRAY" + SubstraitPrimitiveKind.F64List => return "ARRAY" + SubstraitPrimitiveKind.StringList => return "ARRAY" def _field_type_name(raw_type_name: str) -> str: @@ -137,6 +155,30 @@ def _type_from_primitive(kind: SubstraitPrimitiveKind, nullable: bool) -> Type: ) SubstraitPrimitiveKind.String => return Type(kind=Some(Kind.String(SubstraitString(type_variation_reference=0, nullability=n.into())))) + SubstraitPrimitiveKind.BoolList => + return _list_type(_type_from_primitive(SubstraitPrimitiveKind.Bool, true), nullable) + SubstraitPrimitiveKind.I64List => + return _list_type(_type_from_primitive(SubstraitPrimitiveKind.I64, true), nullable) + SubstraitPrimitiveKind.F64List => + return _list_type(_type_from_primitive(SubstraitPrimitiveKind.F64, true), nullable) + SubstraitPrimitiveKind.StringList => + return _list_type(_type_from_primitive(SubstraitPrimitiveKind.String, true), nullable) + + +def _list_type(element_type: Type, nullable: bool) -> Type: + """Lower a primitive list element type into a Substrait list type.""" + mut n = Nullability.Required + if nullable: + n = 
Nullability.Nullable + return Type( + kind=Some( + Kind.List( + Box.new( + SubstraitList(type=Some(Box.new(element_type)), type_variation_reference=0, nullability=n.into()), + ), + ), + ), + ) def _row_shape_field_types(shape: RowShapeSpec) -> list[Type]: diff --git a/tests/test_dataset.incn b/tests/test_dataset.incn index b9bfa8d..d28abd3 100644 --- a/tests/test_dataset.incn +++ b/tests/test_dataset.incn @@ -37,7 +37,7 @@ from functions import ( stack, window, ) -from projection_builders import ColumnExprKind, column_expr_kind, column_expr_name +from projection_builders import ColumnExprKind, column_expr_kind, column_expr_name, with_column_assignment from substrait.function_extensions import ( EXPLODE_EXTENSION_URI, EXPLODE_OUTER_EXTENSION_URI, @@ -61,6 +61,11 @@ model Order: id: int +@derive(Clone) +model OrderProjection: + order_id: int + + @derive(Clone) model Event: id: int @@ -143,6 +148,10 @@ def _order_witness() -> list[Order]: return [Order(id=0)] +def _event_witness() -> list[Event]: + return [Event(id=0)] + + def _preview_materialization(columns: list[str], row_count: int, preview_text: str) -> DataFrameMaterialization: """Build one structured materialization fixture for direct DataFrame tests.""" return DataFrameMaterialization(resolved_columns=columns, row_count=row_count, preview_text=preview_text) @@ -214,9 +223,13 @@ def test_type_contracts__signature_tiers_compile() -> None: _type_witness=_order_witness(), _materialization=DataFrameMaterialization.empty(), _substrait_rel=read_named_table_rel("orders"), + _planned_columns=[], + ) + st: DataStream[Event] = DataStream( + _type_witness=_event_witness(), + _substrait_rel=read_named_table_rel("events"), + _planned_columns=[], ) - ev = Event(id=2) - st: DataStream[Event] = DataStream(_row_schema_marker=ev, _substrait_rel=read_named_table_rel("events")) # -- Act -- any_df = _accept_data_set_generic(df) @@ -239,10 +252,14 @@ def test_type_contracts__concrete_carriers_compile() -> None: 
_type_witness=_order_witness(), _materialization=DataFrameMaterialization.empty(), _substrait_rel=read_named_table_rel("orders"), + _planned_columns=[], ) lf: LazyFrame[Order] = lazy_frame_named_table("orders") - ev = Event(id=4) - st: DataStream[Event] = DataStream(_row_schema_marker=ev, _substrait_rel=read_named_table_rel("events")) + st: DataStream[Event] = DataStream( + _type_witness=_event_witness(), + _substrait_rel=read_named_table_rel("events"), + _planned_columns=[], + ) # -- Act -- concrete_df = _accept_data_frame_concrete(df) @@ -262,10 +279,14 @@ def test_hierarchy__concrete_and_supertrait_assignability() -> None: _type_witness=_order_witness(), _materialization=DataFrameMaterialization.empty(), _substrait_rel=read_named_table_rel("orders"), + _planned_columns=[], ) lf: LazyFrame[Order] = lazy_frame_named_table("orders") - ev = Event(id=6) - st: DataStream[Event] = DataStream(_row_schema_marker=ev, _substrait_rel=read_named_table_rel("events")) + st: DataStream[Event] = DataStream( + _type_witness=_event_witness(), + _substrait_rel=read_named_table_rel("events"), + _planned_columns=[], + ) # -- Act -- _compile_hierarchy_assignability(df, lf, st) @@ -283,10 +304,14 @@ def test_type_contracts__concrete_and_trait_types_match_generic_arguments() -> N _type_witness=_order_witness(), _materialization=DataFrameMaterialization.empty(), _substrait_rel=read_named_table_rel("orders"), + _planned_columns=[], ) lf: LazyFrame[Order] = lazy_frame_named_table("orders") - ev = Event(id=8) - st: DataStream[Event] = DataStream(_row_schema_marker=ev, _substrait_rel=read_named_table_rel("events")) + st: DataStream[Event] = DataStream( + _type_witness=_event_witness(), + _substrait_rel=read_named_table_rel("events"), + _planned_columns=[], + ) # -- Act -- bounded_df = _accept_bounded_generic(df) @@ -318,6 +343,7 @@ def test_dataset_ops__method_wrapper_matches_canonical_function() -> None: _type_witness=_order_witness(), _materialization=DataFrameMaterialization.empty(), 
_substrait_rel=read_named_table_rel("orders"), + _planned_columns=[], ) # -- Act -- @@ -337,6 +363,7 @@ def test_dataset_ops__data_frame_transforms_invalidate_stale_materialization() - _type_witness=_order_witness(), _materialization=_preview_materialization(["id"], 1, "| id |\n| 1 |"), _substrait_rel=read_named_table_rel("orders"), + _planned_columns=[], ) # -- Act -- @@ -355,9 +382,10 @@ def test_dataset_ops__all_carriers_emit_real_plans() -> None: _type_witness=_order_witness(), _materialization=DataFrameMaterialization.empty(), _substrait_rel=read_named_table_rel("orders"), + _planned_columns=[], ) lf: LazyFrame[Order] = lazy_frame_named_table("orders") - st = DataStream(_row_schema_marker=Event(id=12), _substrait_rel=read_named_table_rel("events")) + st = DataStream(_type_witness=_event_witness(), _substrait_rel=read_named_table_rel("events"), _planned_columns=[]) # -- Act -- df_plan = df.to_substrait_plan() @@ -391,13 +419,42 @@ def test_dataset_ops__with_column_updates_lazy_planned_columns_with_add_or_repla assert replaced_cols[1] == "double_id", "later projected columns should keep their relative order after replacement" +def test_dataset_ops__select_retargets_output_row_type_while_preserving_carrier_kind() -> None: + """SELECT is the schema-changing boundary used by RFC003 query blocks.""" + # -- Arrange -- + _register_projection_test_schema("orders_projection_types") + assignments = [with_column_assignment("order_id", col("id"))] + base_df: DataFrame[Order] = DataFrame( + _type_witness=_order_witness(), + _materialization=DataFrameMaterialization.empty(), + _substrait_rel=read_named_table_rel("orders_projection_types"), + _planned_columns=[], + ) + base_lazy: LazyFrame[Order] = lazy_frame_named_table("orders_projection_types") + base_stream: DataStream[Event] = DataStream( + _type_witness=_event_witness(), + _substrait_rel=read_named_table_rel("orders_projection_types"), + _planned_columns=[], + ) + + # -- Act -- + projected_df: DataFrame[OrderProjection] 
= base_df.select(assignments) + projected_lazy: LazyFrame[OrderProjection] = base_lazy.select(assignments) + projected_stream: DataStream[OrderProjection] = base_stream.select(assignments) + + # -- Assert -- + assert projected_df.planned_columns() == ["order_id"], "DataFrame SELECT should expose the projected row shape" + assert projected_lazy.planned_columns() == ["order_id"], "LazyFrame SELECT should expose the projected row shape" + assert projected_stream.planned_columns() == ["order_id"], "DataStream SELECT should expose the projected row shape" + + def test_dataset_ops__api_lowered_boundary_facts_stay_stable() -> None: # -- Arrange -- left = read_named_table_rel("orders") right = read_named_table_rel("orders_archive") # -- Act -- - joined_plan = plan_from_root_relation(join_ds(left, right, true), ["id"]) + joined_plan = plan_from_root_relation(join_ds(left, right, always_true()), ["id"]) # -- Assert -- assert relation_kind_name(root_rel(joined_plan)) == "JoinRel", "canonical join function should still lower to a JoinRel root" @@ -422,7 +479,7 @@ def test_lazy_frame__independent_roots_can_join_and_lower() -> None: right: LazyFrame[Order] = lazy_frame_named_table("orders_archive").filter(always_false()) # -- Act -- - joined: LazyFrame[Order] = left.join(right, true) + joined: LazyFrame[Order] = left.join(right, always_true()) plan = joined.to_substrait_plan() # -- Assert -- @@ -444,7 +501,7 @@ def test_lazy_frame__native_prism_ops_preserve_current_boundary_shapes() -> None ) # -- Act -- - projected: LazyFrame[Order] = lazy_frame_named_table("orders").select() + projected: LazyFrame[Order] = lazy_frame_named_table[Order]("orders").select() grouped: LazyFrame[Order] = lazy_frame_named_table("orders").group_by([col("id")]) aggregated = grouped.agg([count()]) @@ -510,9 +567,9 @@ def test_lazy_frame__deeper_independent_roots_still_lower_with_stable_shapes() - # -- Act -- right_joined: LazyFrame[Order] = right_base.filter(always_false()).order_by([col("id")]).join( 
right_base.filter(always_false()).order_by([col("id")]), - true, + always_true(), ) - joined: LazyFrame[Order] = left.join(right_joined, true) + joined: LazyFrame[Order] = left.join(right_joined, always_true()) plan = joined.to_substrait_plan() # -- Assert -- diff --git a/tests/test_prism.incn b/tests/test_prism.incn index f3478e5..fc777a9 100644 --- a/tests/test_prism.incn +++ b/tests/test_prism.incn @@ -107,7 +107,7 @@ def test_prism__same_store_join_reuses_shared_history() -> None: right: PrismCursor[Order] = base.select() # -- Act -- - joined: PrismCursor[Order] = left.join(right.clone(), true) + joined: PrismCursor[Order] = left.join(right.clone(), always_true()) plan = joined.to_substrait_plan() # -- Assert -- @@ -131,7 +131,7 @@ def test_prism__same_store_join_with_longer_branches_is_still_one_append() -> No right: PrismCursor[Order] = base.select().limit(5) # -- Act -- - joined: PrismCursor[Order] = left.join(right.clone(), true) + joined: PrismCursor[Order] = left.join(right.clone(), always_true()) plan = joined.to_substrait_plan() # -- Assert -- @@ -151,7 +151,7 @@ def test_prism__cross_store_join_adopts_reachable_subgraph() -> None: right: PrismCursor[Order] = prism_cursor_named_table(str("orders_archive")).filter(always_false()) # -- Act -- - joined: PrismCursor[Order] = left.join(right.clone(), true) + joined: PrismCursor[Order] = left.join(right.clone(), always_true()) plan = joined.to_substrait_plan() # -- Assert -- @@ -172,8 +172,8 @@ def test_prism__cross_store_join_dedups_equivalent_reachable_rhs_nodes() -> None right_right: PrismCursor[Order] = right_base.filter(always_false()) # -- Act -- - right_joined: PrismCursor[Order] = right_left.join(right_right, true) - joined: PrismCursor[Order] = left.join(right_joined.clone(), true) + right_joined: PrismCursor[Order] = right_left.join(right_right, always_true()) + joined: PrismCursor[Order] = left.join(right_joined.clone(), always_true()) plan = joined.to_substrait_plan() # -- Assert -- @@ -195,8 
+195,8 @@ def test_prism__cross_store_join_dedups_equivalent_rhs_multistep_branches() -> N right_right: PrismCursor[Order] = right_base.filter(always_false()).order_by([col("id")]) # -- Act -- - right_joined: PrismCursor[Order] = right_left.join(right_right.clone(), true) - joined: PrismCursor[Order] = left.join(right_joined.clone(), true) + right_joined: PrismCursor[Order] = right_left.join(right_right.clone(), always_true()) + joined: PrismCursor[Order] = left.join(right_joined.clone(), always_true()) plan = joined.to_substrait_plan() # -- Assert -- @@ -219,8 +219,8 @@ def test_prism__cross_store_adoption_keeps_distinct_aggregate_modifier_state() - right_distinct: PrismCursor[Order] = right_base.group_by([col("id")]).agg([count(col("id")).distinct()]) # -- Act -- - right_joined: PrismCursor[Order] = right_plain.join(right_distinct.clone(), true) - joined: PrismCursor[Order] = left.join(right_joined.clone(), true) + right_joined: PrismCursor[Order] = right_plain.join(right_distinct.clone(), always_true()) + joined: PrismCursor[Order] = left.join(right_joined.clone(), always_true()) plan = joined.to_substrait_plan() # -- Assert -- @@ -241,8 +241,8 @@ def test_prism__cross_store_adoption_keeps_aggregate_arguments() -> None: right_tail: PrismCursor[Order] = right_base.group_by([col("id")]).agg([approx_percentile(col("id"), 0.95)]) # -- Act -- - right_joined: PrismCursor[Order] = right_median.join(right_tail.clone(), true) - joined: PrismCursor[Order] = left.join(right_joined.clone(), true) + right_joined: PrismCursor[Order] = right_median.join(right_tail.clone(), always_true()) + joined: PrismCursor[Order] = left.join(right_joined.clone(), always_true()) plan = joined.to_substrait_plan() # -- Assert -- @@ -257,7 +257,7 @@ def test_prism__cursor_native_nodes_cover_current_method_surface() -> None: # -- Arrange -- _register_projection_test_schema(str("orders")) _register_generator_test_schema(str("orders_generator_prism")) - projected: PrismCursor[Order] = 
prism_cursor_named_table(str("orders")).select() + projected: PrismCursor[Order] = prism_cursor_named_table[Order](str("orders")).select() grouped: PrismCursor[Order] = prism_cursor_named_table(str("orders")).group_by([col("id")]) # -- Act -- @@ -318,7 +318,7 @@ def test_prism__rewrite_eliminates_filter_true_by_default() -> None: def test_prism__rewrite_collapses_adjacent_limits_projects_and_order_by() -> None: # -- Arrange -- limited: PrismCursor[Order] = prism_cursor_named_table(str("orders")).limit(10).limit(3) - projected: PrismCursor[Order] = prism_cursor_named_table(str("orders")).select().select() + projected: PrismCursor[Order] = prism_cursor_named_table[Order](str("orders")).select[Order]().select[Order]() # -- Act -- ordered: PrismCursor[Order] = prism_cursor_named_table(str("orders")).order_by([col("id")]).order_by([col("id")]) @@ -435,8 +435,11 @@ def test_prism__cursor_methods_match_apply_helpers() -> None: base: PrismCursor[Order] = prism_cursor_named_table(str("orders")) # -- Act -- - via_methods = base.filter(always_false()).select().limit(3) - via_helpers = prism_cursor_apply_limit(prism_cursor_apply_select(prism_cursor_apply_filter(base, always_false())), 3) + via_methods: PrismCursor[Order] = base.filter(always_false()).select[Order]().limit(3) + via_helpers: PrismCursor[Order] = prism_cursor_apply_limit( + prism_cursor_apply_select[Order, Order](prism_cursor_apply_filter(base, always_false())), + 3, + ) # -- Assert -- assert prism_cursor_tip_kind_name(via_methods) == prism_cursor_tip_kind_name(via_helpers), "method and helper paths should produce the same tip kind" diff --git a/tests/test_session_generators.incn b/tests/test_session_generators.incn index 871c9aa..4a27c6b 100644 --- a/tests/test_session_generators.incn +++ b/tests/test_session_generators.incn @@ -271,3 +271,23 @@ def test_session_generators__generate_after_window_materializes_nested_window() assert df.row_count() == 6, "generate above a window should execute the nested window 
first" assert resolved == ["customer_id", "amount", "row_num", "tag"], "generated window output columns should stay stable" _assert_preview_row_contains(payload, ["A", "10", "1", "paid"], "generated rows should retain window values") + + +def test_session_generators__window_after_generate_materializes_nested_generator() -> None: + # -- Arrange -- + mut session = Session.default() + lazy = _aggregate_orders(session) + spec = window().partition_by([col("tag")]).order_by([col("amount")]) + + # -- Act -- + df = _collect_or_fail( + session, + lazy.generate(explode(_tags(), "tag")).with_window_column("row_num", row_number().over(spec)), + ) + payload = df.preview_text() + resolved = df.resolved_columns() + + # -- Assert -- + assert df.row_count() == 6, "window above generate should execute the nested generator first" + assert resolved == ["customer_id", "amount", "tag", "row_num"], "window output should append after generated columns" + _assert_preview_row_contains(payload, ["B", "7", "B", "1"], "window functions should see generated columns") diff --git a/tests/test_session_projection.incn b/tests/test_session_projection.incn index aabe04e..f40dac6 100644 --- a/tests/test_session_projection.incn +++ b/tests/test_session_projection.incn @@ -681,7 +681,7 @@ def test_session_projection__collect_executes_identity_select() -> None: session.read_csv("aggregate_orders", AGGREGATE_ORDERS_CSV_FIXTURE), "aggregate orders fixture should load", ) - selected = lazy.select() + selected: LazyFrame[AggregateOrder] = lazy.select() df = assert_is_ok(session.collect(selected), "identity select collect should execute") resolved = df.resolved_columns() diff --git a/tests/test_substrait_plan.incn b/tests/test_substrait_plan.incn index 9e559ad..52952aa 100644 --- a/tests/test_substrait_plan.incn +++ b/tests/test_substrait_plan.incn @@ -1245,7 +1245,7 @@ def test_plan__enum_backed_join_and_set_builders_expose_boundary_facts() -> None left_join_rel = join_rel_of_kind( 
read_named_table_rel("orders"), read_named_table_rel("customers"), - true, + always_true(), SubstraitJoinKind.Left, ) union_distinct_rel = set_rel_of_kind( @@ -1373,7 +1373,7 @@ def test_conformance__core_scenarios_validate_emission_output() -> None: join_rel_of_kind( read_named_table_rel(_fixture_table_main()), read_named_table_rel(_fixture_table_aux()), - true, + always_true(), SubstraitJoinKind.Left, ), [_fixture_col_primary()], diff --git a/vocab_companion/Cargo.lock b/vocab_companion/Cargo.lock new file mode 100644 index 0000000..a7a37aa --- /dev/null +++ b/vocab_companion/Cargo.lock @@ -0,0 +1,114 @@ +# This file is automatically @generated by Cargo. +# It is not intended for manual editing. +version = 4 + +[[package]] +name = "incan_vocab" +version = "0.3.0" +dependencies = [ + "serde", + "serde_json", +] + +[[package]] +name = "inql_vocab_companion" +version = "0.1.0" +dependencies = [ + "incan_vocab", +] + +[[package]] +name = "itoa" +version = "1.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8f42a60cbdf9a97f5d2305f08a87dc4e09308d1276d28c869c684d7777685682" + +[[package]] +name = "memchr" +version = "2.8.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6b947ae49db0d222b1dbc6b113ce7248a3fc3a6ca21b696717bfc000ba4484d8" + +[[package]] +name = "proc-macro2" +version = "1.0.106" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8fd00f0bb2e90d81d1044c2b32617f68fcb9fa3bb7640c23e9c748e53fb30934" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "quote" +version = "1.0.45" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41f2619966050689382d2b44f664f4bc593e129785a36d6ee376ddf37259b924" +dependencies = [ + "proc-macro2", +] + +[[package]] +name = "serde" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9a8e94ea7f378bd32cbbd37198a4a91436180c5bb472411e48b5ec2e2124ae9e" 
+dependencies = [ + "serde_core", + "serde_derive", +] + +[[package]] +name = "serde_core" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41d385c7d4ca58e59fc732af25c3983b67ac852c1a25000afe1175de458b67ad" +dependencies = [ + "serde_derive", +] + +[[package]] +name = "serde_derive" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "serde_json" +version = "1.0.150" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e8014e44b4736ed0538adeecded0fce2a272f22dc9578a7eb6b2d9993c74cfb9" +dependencies = [ + "itoa", + "memchr", + "serde", + "serde_core", + "zmij", +] + +[[package]] +name = "syn" +version = "2.0.117" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e665b8803e7b1d2a727f4023456bbbbe74da67099c585258af0ad9c5013b9b99" +dependencies = [ + "proc-macro2", + "quote", + "unicode-ident", +] + +[[package]] +name = "unicode-ident" +version = "1.0.24" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75" + +[[package]] +name = "zmij" +version = "1.0.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa" diff --git a/vocab_companion/Cargo.toml b/vocab_companion/Cargo.toml new file mode 100644 index 0000000..e870d1e --- /dev/null +++ b/vocab_companion/Cargo.toml @@ -0,0 +1,10 @@ +[package] +name = "inql_vocab_companion" +version = "0.1.0" +edition = "2021" + +[lib] +crate-type = ["rlib", "cdylib"] + +[dependencies] +incan_vocab = { path = "../incan/crates/incan_vocab" } diff --git a/vocab_companion/src/desugar.rs b/vocab_companion/src/desugar.rs new file mode 100644 index 0000000..a46bf8b --- 
/dev/null +++ b/vocab_companion/src/desugar.rs @@ -0,0 +1,636 @@ +use incan_vocab::{ + DesugarError, DesugarOutput, IncanBinaryOp, IncanExpr, IncanScopedSurfacePayload, IncanUnaryOp, + VocabBodyItem, VocabClause, VocabClauseBody, VocabDeclaration, VocabDesugarer, + VocabExpressionItem, VocabFieldSpec, VocabSyntaxNode, +}; + +use crate::{QUERY_FIELD_DESCRIPTOR, QUERY_KW}; + +#[derive(Default)] +pub struct InqlQueryDesugarer; + +struct PendingJoin { + source: IncanExpr, + relation_name: String, + method_name: &'static str, +} + +impl VocabDesugarer for InqlQueryDesugarer { + fn desugar(&self, node: &VocabSyntaxNode) -> Result<DesugarOutput, DesugarError> { + let declaration = match node { + VocabSyntaxNode::Declaration(declaration) if declaration.keyword == QUERY_KW => { + declaration + } + VocabSyntaxNode::Declaration(_) => { + return Err(DesugarError::new( + "InQL query desugarer expected a `query:` declaration", + )); + } + _ => { + return Err(DesugarError::new( + "InQL query desugarer expected a declaration node", + )) + } + }; + Ok(DesugarOutput::Expression(lower_query(declaration)?)) + } +} + +fn lower_query(declaration: &VocabDeclaration) -> Result<IncanExpr, DesugarError> { + let from_clause = required_clause(declaration, "FROM")?; + let mut current = required_expression(from_clause, "FROM")?.clone(); + let mut saw_select = false; + let mut pending_join: Option<PendingJoin> = None; + + for item in &declaration.body { + let VocabBodyItem::Clause(clause) = item else { + continue; + }; + let spelling = clause_spelling(clause); + match spelling.as_str() { + "FROM" => {} + "JOIN" | "LEFT JOIN" => { + ensure_no_pending_join(&pending_join)?; + let source = required_expression(clause, &spelling)?.clone(); + pending_join = Some(PendingJoin { + relation_name: join_relation_name(&source)?, + source, + method_name: if spelling == "LEFT JOIN" { + "left_join" + } else { + "join" + }, + }); + } + "ON" => { + let Some(join) = pending_join.take() else { + return Err(DesugarError::new( + "ON clauses require a preceding JOIN or LEFT JOIN
clause", + )); + }; + current = method_call( + current, + join.method_name, + vec![ + join.source, + lower_column_expr(required_expression(clause, "ON")?)?, + IncanExpr::Str(join.relation_name), + ], + ); + } + "WHERE" => { + ensure_no_pending_join(&pending_join)?; + current = method_call( + current, + "filter", + vec![lower_column_expr(required_expression(clause, "WHERE")?)?], + ); + } + "GROUP BY" => { + ensure_no_pending_join(&pending_join)?; + current = method_call( + current, + "group_by", + vec![IncanExpr::List(lower_expression_list(clause)?)], + ); + } + "EXPLODE" => { + ensure_no_pending_join(&pending_join)?; + let generator = lower_explode_clause(clause)?; + current = method_call(current, "generate", vec![generator]); + } + "WINDOW BY" => { + ensure_no_pending_join(&pending_join)?; + for field in field_set(clause)? { + let application = field.default_value.as_ref().ok_or_else(|| { + DesugarError::new("WINDOW BY entries require `name = window_call`") + })?; + current = method_call( + current, + "with_window_column", + vec![ + IncanExpr::Str(field.name.clone()), + lower_column_expr(application)?, + ], + ); + } + } + "SELECT" => { + ensure_no_pending_join(&pending_join)?; + let (items, distinct) = select_items_and_distinct(clause, false)?; + current = lower_select(current, &items, distinct)?; + saw_select = true; + } + "SELECT DISTINCT" => { + ensure_no_pending_join(&pending_join)?; + let (items, distinct) = select_items_and_distinct(clause, true)?; + current = lower_select(current, &items, distinct)?; + saw_select = true; + } + "ORDER BY" => { + ensure_no_pending_join(&pending_join)?; + current = method_call( + current, + "order_by", + vec![IncanExpr::List(lower_expression_list(clause)?)], + ); + } + "LIMIT" => { + ensure_no_pending_join(&pending_join)?; + current = method_call( + current, + "limit", + vec![required_expression(clause, "LIMIT")?.clone()], + ); + } + _ => {} + } + } + + ensure_no_pending_join(&pending_join)?; + if !saw_select { + return 
Err(DesugarError::new( + "query blocks require a SELECT or SELECT DISTINCT clause", + )); + } + Ok(current) +} + +fn ensure_no_pending_join(pending_join: &Option<PendingJoin>) -> Result<(), DesugarError> { + if pending_join.is_some() { + return Err(DesugarError::new( + "JOIN clauses must be followed immediately by ON", + )); + } + Ok(()) +} + +fn join_relation_name(expr: &IncanExpr) -> Result<String, DesugarError> { + match expr { + IncanExpr::Name(name) => Ok(name.clone()), + _ => Err(DesugarError::new( + "JOIN sources must be a named relation so relation.column references can be resolved", + )), + } +} + +fn lower_explode_clause(clause: &VocabClause) -> Result<IncanExpr, DesugarError> { + let items = expression_items(clause)?; + if items.len() != 1 { + return Err(DesugarError::new("EXPLODE requires exactly one expression")); + } + let Some(alias) = &items[0].alias else { + return Err(DesugarError::new( + "EXPLODE requires `as ` for the generated column", + )); + }; + Ok(helper_call( + "explode", + vec![ + lower_column_expr(&items[0].expr)?, + IncanExpr::Str(alias.clone()), + ], + )) +} + +fn select_items_and_distinct( + clause: &VocabClause, + explicit_distinct: bool, +) -> Result<(Vec<VocabExpressionItem>, bool), DesugarError> { + let mut items = expression_items(clause)?.to_vec(); + let distinct = explicit_distinct || strip_distinct_prefix(&mut items)?; + Ok((items, distinct)) +} + +fn strip_distinct_prefix(items: &mut [VocabExpressionItem]) -> Result<bool, DesugarError> { + let Some(first) = items.first_mut() else { + return Ok(false); + }; + match &mut first.expr { + IncanExpr::ScopedSurface(surface) if surface.descriptor_key == QUERY_FIELD_DESCRIPTOR => { + let IncanScopedSurfacePayload::LeadingDotPath { segments, ..
} = &mut surface.payload + else { + return Ok(false); + }; + return strip_distinct_segments(segments); + } + IncanExpr::CurrentField(field) | IncanExpr::Name(field) => { + if let Some(stripped) = field.strip_prefix("DISTINCT.") { + if stripped.is_empty() { + return Err(DesugarError::new( + "SELECT DISTINCT requires an expression after DISTINCT", + )); + } + *field = stripped.to_string(); + return Ok(true); + } + } + IncanExpr::RelationField { relation, field } if relation == "DISTINCT" => { + if field.is_empty() { + return Err(DesugarError::new( + "SELECT DISTINCT requires an expression after DISTINCT", + )); + } + first.expr = IncanExpr::CurrentField(field.clone()); + return Ok(true); + } + _ => {} + } + Ok(false) +} + +fn strip_distinct_segments(segments: &mut Vec<String>) -> Result<bool, DesugarError> { + let Some(first) = segments.first() else { + return Ok(false); + }; + if first == "DISTINCT" { + segments.remove(0); + } else if let Some(stripped) = first.strip_prefix("DISTINCT.") { + segments[0] = stripped.to_string(); + } else { + return Ok(false); + } + if segments.is_empty() || segments[0].is_empty() { + return Err(DesugarError::new( + "SELECT DISTINCT requires an expression after DISTINCT", + )); + } + Ok(true) +} + +fn lower_select( + source: IncanExpr, + items: &[VocabExpressionItem], + distinct: bool, +) -> Result<IncanExpr, DesugarError> { + if items.iter().any(item_is_aggregate) { + let mut measures = Vec::new(); + let mut assignments = Vec::new(); + for item in items { + if expr_is_aggregate(&item.expr) { + let measure = lower_column_expr(&item.expr)?; + let output_name = aggregate_item_output_name(item)?; + measures.push(match &item.alias { + Some(alias) => { + helper_call("aggregate_as", vec![measure, IncanExpr::Str(alias.clone())]) + } + None => measure, + }); + assignments.push(helper_call( + "with_column_assignment", + vec![ + IncanExpr::Str(output_name.clone()), + helper_call("col", vec![IncanExpr::Str(output_name)]), + ], + )); + } else { + let output_name = select_item_output_name(item)?; +
assignments.push(helper_call( + "with_column_assignment", + vec![IncanExpr::Str(output_name), lower_column_expr(&item.expr)?], + )); + } + } + let mut current = method_call(source, "agg", vec![IncanExpr::List(measures)]); + current = method_call(current, "select", vec![IncanExpr::List(assignments)]); + if distinct { + current = method_call( + current, + "group_by", + vec![IncanExpr::List(select_output_columns(items)?)], + ); + } + return Ok(current); + } + + let assignments = items + .iter() + .map(|item| { + let output_name = select_item_output_name(item)?; + let expr = lower_column_expr(&item.expr)?; + Ok(helper_call( + "with_column_assignment", + vec![IncanExpr::Str(output_name), expr], + )) + }) + .collect::<Result<Vec<_>, DesugarError>>()?; + let mut current = method_call(source, "select", vec![IncanExpr::List(assignments)]); + if distinct { + current = method_call( + current, + "group_by", + vec![IncanExpr::List(select_output_columns(items)?)], + ); + } + Ok(current) +} + +fn aggregate_item_output_name(item: &VocabExpressionItem) -> Result<String, DesugarError> { + if let Some(alias) = &item.alias { + return Ok(alias.clone()); + } + match &item.expr { + IncanExpr::Call { callee, args } => aggregate_call_output_name(callee, args), + IncanExpr::ScopedSymbolCall(call) => { + aggregate_default_output_name(&call.symbol, &call.args) + } + _ => Err(DesugarError::new( + "aggregate SELECT expressions require `as `", + )), + } +} + +fn aggregate_call_output_name( + callee: &IncanExpr, + args: &[IncanExpr], +) -> Result<String, DesugarError> { + match callee { + IncanExpr::Name(name) => aggregate_default_output_name(name, args), + _ => Err(DesugarError::new( + "aggregate SELECT expects a direct aggregate helper call", + )), + } +} + +fn aggregate_default_output_name(name: &str, args: &[IncanExpr]) -> Result<String, DesugarError> { + if !is_aggregate_name(name) { + return Err(DesugarError::new( + "aggregate SELECT expected an aggregate call", + )); + } + if args.is_empty() { + return Ok(name.to_string()); + } + Ok(format!("{}_{}", name,
scalar_default_output_name(&args[0]))) +} + +fn scalar_default_output_name(expr: &IncanExpr) -> String { + match expr { + IncanExpr::ScopedSurface(surface) if surface.descriptor_key == QUERY_FIELD_DESCRIPTOR => { + if let IncanScopedSurfacePayload::LeadingDotPath { segments, .. } = &surface.payload { + return segments.join("."); + } + "expr".to_string() + } + IncanExpr::CurrentField(field) | IncanExpr::Name(field) => field.clone(), + IncanExpr::RelationField { relation, field } => format!("{relation}.{field}"), + _ => "expr".to_string(), + } +} + +fn select_output_columns(items: &[VocabExpressionItem]) -> Result<Vec<IncanExpr>, DesugarError> { + items + .iter() + .map(|item| { + Ok(helper_call( + "col", + vec![IncanExpr::Str(select_item_output_name(item)?)], + )) + }) + .collect() +} + +fn select_item_output_name(item: &VocabExpressionItem) -> Result<String, DesugarError> { + if let Some(alias) = &item.alias { + return Ok(alias.clone()); + } + match &item.expr { + IncanExpr::ScopedSurface(surface) if surface.descriptor_key == QUERY_FIELD_DESCRIPTOR => { + if let IncanScopedSurfacePayload::LeadingDotPath { segments, ..
} = &surface.payload { + return Ok(segments.join(".")); + } + Err(DesugarError::new( + "query field shorthand expected a leading-dot path", + )) + } + IncanExpr::CurrentField(field) | IncanExpr::Name(field) => Ok(field.clone()), + IncanExpr::RelationField { relation, field } => Ok(format!("{relation}.{field}")), + _ => Err(DesugarError::new( + "computed SELECT expressions require `as `", + )), + } +} + +fn lower_expression_list(clause: &VocabClause) -> Result<Vec<IncanExpr>, DesugarError> { + match &clause.body { + VocabClauseBody::ExpressionList(expressions) => expressions + .iter() + .map(|item| lower_column_expr(&item.expr)) + .collect(), + VocabClauseBody::Expression(expr) => Ok(vec![lower_column_expr(expr)?]), + _ => Err(DesugarError::new(format!( + "{} requires an expression list", + clause_spelling(clause) + ))), + } +} + +fn lower_column_expr(expr: &IncanExpr) -> Result<IncanExpr, DesugarError> { + match expr { + IncanExpr::ScopedSurface(surface) if surface.descriptor_key == QUERY_FIELD_DESCRIPTOR => { + if let IncanScopedSurfacePayload::LeadingDotPath { segments, ..
} = &surface.payload { + return Ok(helper_call("col", vec![IncanExpr::Str(segments.join("."))])); + } + Err(DesugarError::new( + "query field shorthand expected a leading-dot path", + )) + } + IncanExpr::CurrentField(field) => { + Ok(helper_call("col", vec![IncanExpr::Str(field.clone())])) + } + IncanExpr::RelationField { relation, field } => Ok(helper_call( + "col", + vec![IncanExpr::Str(format!("{relation}.{field}"))], + )), + IncanExpr::Str(_) | IncanExpr::Int(_) | IncanExpr::Bool(_) => { + Ok(helper_call("lit", vec![expr.clone()])) + } + IncanExpr::Binary(left, op, right) => Ok(helper_call( + binary_helper(*op)?, + vec![lower_value_expr(left)?, lower_value_expr(right)?], + )), + IncanExpr::Unary(op, inner) => Ok(helper_call( + unary_helper(*op)?, + vec![lower_value_expr(inner)?], + )), + IncanExpr::Call { callee, args } => lower_call(callee, args), + IncanExpr::ScopedSymbolCall(call) => Ok(helper_call( + call.symbol.as_str(), + call.args + .iter() + .map(lower_value_expr) + .collect::<Result<Vec<_>, DesugarError>>()?, + )), + IncanExpr::Name(name) => Ok(helper_call("col", vec![IncanExpr::Str(name.clone())])), + IncanExpr::Field { object, field } => Ok(IncanExpr::Field { + object: Box::new(lower_column_expr(object)?), + field: field.clone(), + }), + IncanExpr::List(items) => Ok(IncanExpr::List( + items + .iter() + .map(lower_column_expr) + .collect::<Result<Vec<_>, _>>()?, + )), + _ => Err(DesugarError::new( + "query expression form is not part of the RFC003 grammar", + )), + } +} + +fn lower_value_expr(expr: &IncanExpr) -> Result<IncanExpr, DesugarError> { + match expr { + IncanExpr::Str(_) | IncanExpr::Int(_) | IncanExpr::Bool(_) => Ok(expr.clone()), + _ => lower_column_expr(expr), + } +} + +fn lower_call(callee: &IncanExpr, args: &[IncanExpr]) -> Result<IncanExpr, DesugarError> { + match callee { + IncanExpr::Name(name) => Ok(helper_call( + helper_name(name), + args.iter() + .map(lower_value_expr) + .collect::<Result<Vec<_>, _>>()?, + )), + IncanExpr::Field { object, field } => Ok(IncanExpr::Call { + callee: Box::new(IncanExpr::Field { + object:
Box::new(lower_column_expr(object)?), + field: field.clone(), + }), + args: args + .iter() + .map(lower_value_expr) + .collect::<Result<Vec<_>, _>>()?, + }), + _ => Err(DesugarError::new( + "query calls require a direct helper or method callee", + )), + } +} + +fn helper_name(name: &str) -> &str { + match name { + "mod" => "modulo", + other => other, + } +} + +fn binary_helper(op: IncanBinaryOp) -> Result<&'static str, DesugarError> { + match op { + IncanBinaryOp::Add => Ok("add"), + IncanBinaryOp::Sub => Ok("sub"), + IncanBinaryOp::Mul => Ok("mul"), + IncanBinaryOp::Div => Ok("div"), + IncanBinaryOp::Mod => Ok("modulo"), + IncanBinaryOp::Eq => Ok("eq"), + IncanBinaryOp::NotEq => Ok("ne"), + IncanBinaryOp::Lt => Ok("lt"), + IncanBinaryOp::LtEq => Ok("lte"), + IncanBinaryOp::Gt => Ok("gt"), + IncanBinaryOp::GtEq => Ok("gte"), + IncanBinaryOp::And => Ok("and_"), + IncanBinaryOp::Or => Ok("or_"), + _ => Err(DesugarError::new( + "query binary operator is not part of the RFC003 grammar", + )), + } +} + +fn unary_helper(op: IncanUnaryOp) -> Result<&'static str, DesugarError> { + match op { + IncanUnaryOp::Not => Ok("not_"), + IncanUnaryOp::Neg => Ok("neg"), + _ => Err(DesugarError::new( + "query unary operator is not part of the RFC003 grammar", + )), + } +} + +fn method_call(receiver: IncanExpr, method: &str, args: Vec<IncanExpr>) -> IncanExpr { + IncanExpr::Call { + callee: Box::new(IncanExpr::Field { + object: Box::new(receiver), + field: method.to_string(), + }), + args, + } +} + +fn helper_call(name: &str, args: Vec<IncanExpr>) -> IncanExpr { + IncanExpr::Call { + callee: Box::new(IncanExpr::Helper(name.to_string())), + args, + } +} + +fn required_clause<'a>( + declaration: &'a VocabDeclaration, + spelling: &str, +) -> Result<&'a VocabClause, DesugarError> { + declaration + .body + .iter() + .find_map(|item| match item { + VocabBodyItem::Clause(clause) if clause_spelling(clause) == spelling => Some(clause), + _ => None, + }) + .ok_or_else(|| DesugarError::new(format!("query blocks require a {spelling}
clause"))) +} + +fn required_expression<'a>( + clause: &'a VocabClause, + spelling: &str, +) -> Result<&'a IncanExpr, DesugarError> { + match &clause.body { + VocabClauseBody::Expression(expr) => Ok(expr), + _ => Err(DesugarError::new(format!( + "{spelling} requires one expression" + ))), + } +} + +fn field_set(clause: &VocabClause) -> Result<&[VocabFieldSpec], DesugarError> { + match &clause.body { + VocabClauseBody::FieldSet(fields) => Ok(fields), + _ => Err(DesugarError::new(format!( + "{} requires a field body", + clause_spelling(clause) + ))), + } +} + +fn expression_items(clause: &VocabClause) -> Result<&[VocabExpressionItem], DesugarError> { + match &clause.body { + VocabClauseBody::ExpressionList(items) => Ok(items), + _ => Err(DesugarError::new(format!( + "{} requires an expression list", + clause_spelling(clause) + ))), + } +} + +fn clause_spelling(clause: &VocabClause) -> String { + if clause.compound_tokens.is_empty() { + return clause.keyword.clone(); + } + format!("{} {}", clause.keyword, clause.compound_tokens.join(" ")) +} + +fn item_is_aggregate(item: &VocabExpressionItem) -> bool { + expr_is_aggregate(&item.expr) +} + +fn expr_is_aggregate(expr: &IncanExpr) -> bool { + match expr { + IncanExpr::Call { callee, .. } => { + matches!(callee.as_ref(), IncanExpr::Name(name) if is_aggregate_name(name)) + } + IncanExpr::ScopedSymbolCall(call) => is_aggregate_name(&call.symbol), + _ => false, + } +} + +fn is_aggregate_name(name: &str) -> bool { + matches!(name, "sum" | "count" | "avg" | "min" | "max") +} diff --git a/vocab_companion/src/lib.rs b/vocab_companion/src/lib.rs new file mode 100644 index 0000000..04c4801 --- /dev/null +++ b/vocab_companion/src/lib.rs @@ -0,0 +1,139 @@ +//! InQL query-block vocabulary companion. +//! +//! The Incan compiler owns the generic vocabulary contract. InQL owns this package-specific `query:` surface and +//! lowers it into ordinary InQL helper/method calls that continue through Prism, Substrait, and the active backend. 
+ +mod desugar; + +use incan_vocab::{ + ClauseSurface, DeclarationSurface, DslSurface, HelperBinding, LibraryManifest, + ScopedSurfaceDescriptor, ScopedSurfaceDiagnosticKind, ScopedSurfaceDiagnosticTemplate, + ScopedSurfaceEligibility, ScopedSurfaceMisuseScope, ScopedSurfaceReceiver, VocabRegistration, +}; + +pub use desugar::InqlQueryDesugarer; + +pub const NAMESPACE: &str = "inql"; +pub const QUERY_KW: &str = "query"; +pub const QUERY_FIELD_DESCRIPTOR: &str = "inql.query.field"; + +const HELPER_EXPORTS: &[&str] = &[ + "col", + "lit", + "array", + "add", + "sub", + "mul", + "div", + "modulo", + "eq", + "ne", + "lt", + "lte", + "gt", + "gte", + "and_", + "or_", + "not_", + "neg", + "asc", + "desc", + "explode", + "sum", + "count", + "avg", + "min", + "max", + "aggregate_as", + "with_column_assignment", + "window", + "row_number", + "rank", + "dense_rank", + "percent_rank", + "cume_dist", + "ntile", + "lag", + "lead", + "first_value", + "last_value", + "nth_value", +]; + +#[must_use] +pub fn library_vocab() -> VocabRegistration { + VocabRegistration::new() + .with_surface( + DslSurface::on_import(NAMESPACE) + .with_declaration( + DeclarationSurface::named(QUERY_KW) + .with_clause_body() + .desugars_to_expression() + .with_clauses([ + ClauseSurface::expr("FROM").required(), + ClauseSurface::expr("JOIN").repeating().after("FROM"), + ClauseSurface::expr("LEFT JOIN").repeating().after("FROM"), + ClauseSurface::expr("ON").repeating().after("JOIN"), + ClauseSurface::expr("WHERE").repeating().after("FROM"), + ClauseSurface::expr_list("GROUP BY") + .optional() + .after("WHERE"), + ClauseSurface::expr_list("EXPLODE") + .repeating() + .after("FROM"), + ClauseSurface::fields("WINDOW BY") + .optional() + .after("GROUP BY"), + ClauseSurface::expr_list("SELECT").optional().after("FROM"), + ClauseSurface::expr_list("SELECT DISTINCT") + .optional() + .after("FROM"), + ClauseSurface::expr_list("ORDER BY") + .optional() + .after("SELECT"), + 
ClauseSurface::expr("LIMIT").optional().after("ORDER BY"), + ]), + ) + .with_scoped_surface( + ScopedSurfaceDescriptor::leading_dot_path(QUERY_FIELD_DESCRIPTOR) + .with_eligibilities([ + ScopedSurfaceEligibility::clause_body(QUERY_KW, "WHERE"), + ScopedSurfaceEligibility::clause_body(QUERY_KW, "GROUP"), + ScopedSurfaceEligibility::clause_body(QUERY_KW, "SELECT"), + ScopedSurfaceEligibility::clause_body(QUERY_KW, "ORDER"), + ScopedSurfaceEligibility::clause_body(QUERY_KW, "EXPLODE"), + ScopedSurfaceEligibility::clause_body(QUERY_KW, "WINDOW"), + ScopedSurfaceEligibility::clause_body(QUERY_KW, "ON"), + ]) + .with_receiver(ScopedSurfaceReceiver::OwningDeclaration) + .with_misuse_scope(ScopedSurfaceMisuseScope::ActivatingFile) + .with_diagnostic( + ScopedSurfaceDiagnosticTemplate::new( + "inql-query-field-outside-scope", + ScopedSurfaceDiagnosticKind::OutsideScope, + "query field shorthand is only valid inside InQL query clauses", + ) + .with_help( + "move the leading-dot field reference into a `query:` clause", + ), + ), + ), + ) + .with_library_manifest(LibraryManifest { + helper_bindings: helper_bindings(), + ..LibraryManifest::default() + }) + .with_desugarer(InqlQueryDesugarer) +} + +fn helper_bindings() -> Vec<HelperBinding> { + HELPER_EXPORTS + .iter() + .map(|name| HelperBinding { + key: (*name).to_string(), + exported_name: (*name).to_string(), + }) + .collect() +} + +incan_vocab::export_wasm_desugarer!(InqlQueryDesugarer); From 512f01ad3c243b302e52f40ad11ff5ef2eb43fa8 Mon Sep 17 00:00:00 2001 From: Danny Meijer Date: Thu, 4 Jun 2026 20:10:34 +0200 Subject: [PATCH 02/11] feature - add RFC003 query-block closeout example (#4) --- Makefile | 22 +- examples/README.md | 50 +- examples/advanced_retail_analytics.incn | 160 + .../advanced_retail_query_blocks/incan.lock | 3691 +++++++++++++++++ .../advanced_retail_query_blocks/incan.toml | 9 + .../src/main.incn | 251 ++ ..._read_transform_write_order_lines_csv.incn | 8 +- src/window_builders.incn | 8 +-
tests/fixtures/advanced_retail_orders.csv | 101 + tests/test_dataset.incn | 8 +- tests/test_prism.incn | 27 +- tests/test_session_aggregates.incn | 28 +- tests/test_session_filters.incn | 8 +- tests/test_session_projection.incn | 237 +- tests/test_session_windows.incn | 21 +- tests/test_window_functions.incn | 8 +- vocab_companion/src/desugar.rs | 5 +- vocab_companion/src/lib.rs | 14 + 18 files changed, 4471 insertions(+), 185 deletions(-) create mode 100644 examples/advanced_retail_analytics.incn create mode 100644 examples/advanced_retail_query_blocks/incan.lock create mode 100644 examples/advanced_retail_query_blocks/incan.toml create mode 100644 examples/advanced_retail_query_blocks/src/main.incn create mode 100644 tests/fixtures/advanced_retail_orders.csv diff --git a/Makefile b/Makefile index c20879c..6ad1bbb 100644 --- a/Makefile +++ b/Makefile @@ -62,14 +62,17 @@ test-locked: ## Run tests with `--locked` @$(INCAN) test $(INQL_TEST_DIR) --locked # ============================================================================= -# Formatting (Incan source — package only) +# Formatting (Incan source) # ============================================================================= # -# Scope to `src/`, `tests/`, and `examples/` only. CI checks out the Incan -# compiler under `./incan/`; formatting `.` would walk that tree and fail on -# stdlib snapshots and test fixtures that are not meant for `incan fmt`. +# Scope to InQL-owned source paths. CI checks out the Incan compiler under +# `./incan/`; formatting `.` would walk that tree and fail on stdlib snapshots +# and test fixtures that are not meant for `incan fmt`. Standalone example +# packages are listed by source directory so generated `target/` output stays +# outside the formatting walk. 
-INQL_FMT_DIRS := src tests examples +INQL_FMT_DIRS := src tests examples/advanced_retail_query_blocks/src +INQL_FMT_FILES := examples/*.incn .PHONY: fmt fmt: ## Format package `.incn` sources (`incan fmt` per directory) @@ -77,6 +80,9 @@ fmt: ## Format package `.incn` sources (`incan fmt` per directory) @for d in $(INQL_FMT_DIRS); do \ if [ -d "$$d" ]; then $(INCAN) fmt "$$d"; fi; \ done + @for f in $(INQL_FMT_FILES); do \ + if [ -f "$$f" ]; then $(INCAN) fmt "$$f"; fi; \ + done .PHONY: fmt-check fmt-check: ## Check formatting without writing (`incan fmt --check` per directory) @@ -87,6 +93,12 @@ fmt-check: ## Check formatting without writing (`incan fmt --check` per director $(INCAN) fmt --check "$$d" || exit $$?; \ fi; \ done + @for f in $(INQL_FMT_FILES); do \ + if [ -f "$$f" ]; then \ + echo "\033[1m -> $$f\033[0m"; \ + $(INCAN) fmt --check "$$f" || exit $$?; \ + fi; \ + done # ============================================================================= # Aggregates (local gates) diff --git a/examples/README.md b/examples/README.md index 15860ac..ba2d66b 100644 --- a/examples/README.md +++ b/examples/README.md @@ -17,6 +17,8 @@ ordinary Incan code. 
- `session_read_transform_write_order_lines_csv.incn` — Same flow with a realistic multi-column `OrderLine` model and fixture - `session_grouped_aggregate_csv.incn` — Grouped aggregate over `LazyFrame[AggregateOrder]` using `col(...)`, `sum(...)`, and `count()` - `session_with_column_csv.incn` — Derived-column example over `LazyFrame[AggregateOrder]` using `with_column(...)`, `mul(...)`, and `lit(...)` +- `advanced_retail_analytics.incn` — Larger 100-row retail method-chain spike covering scalar functions, JSON, URL parsing, hashing, aggregates, windows, and generators +- `advanced_retail_query_blocks/` — Dependency-consumer query-block version of the retail spike, covering RFC 003 vocab over the same 100-row fixture - `models.incn` — Shared `@derive(Clone)` row models for examples ## Running examples @@ -29,10 +31,55 @@ incan run examples/session_read_transform_write_csv.incn incan run examples/session_read_transform_write_order_lines_csv.incn incan run examples/session_grouped_aggregate_csv.incn incan run examples/session_with_column_csv.incn +incan run examples/advanced_retail_analytics.incn +(cd examples/advanced_retail_query_blocks && incan lock && incan run src/main.incn) ``` > Note: Session examples expect repo fixtures in `tests/fixtures/` and write output files to `tests/target/`. +## Advanced spike + +`advanced_retail_analytics.incn` reads `tests/fixtures/advanced_retail_orders.csv`, a 100-row CSV fixture with +quoted JSON event payloads. It materializes three outputs: + +- an enriched high-value order view with string cleanup, math, date extraction, JSON validation/extraction, URL query + extraction, hashing, regex, and nested array helpers +- a grouped paid-order rollup using `sum`, `avg`, `min`, `max`, `count`, and `count_distinct` +- a generated tag view that composes window ranking with `explode(...)` + +`advanced_retail_query_blocks/` is the same fixture exercised from a standalone dependency consumer. 
It imports +`pub::inql` and runs real RFC 003 query blocks for the high-value projection, grouped rollup, and generated-tag window +view: + +```incan +high_value = query { + FROM paid + SELECT + .order_id as order_id, + .customer_id as customer_id, + .region_norm as region_norm, + .net_amount as net_amount, + .campaign as campaign, + .channel as channel, + WHERE .net_amount > 100 + ORDER BY desc(.net_amount) + LIMIT 8 +} + +rollup = query { + FROM enriched + WHERE eq(.status_norm, "paid") + GROUP BY .region_norm, .channel + SELECT + .region_norm as region_norm, + .channel as channel, + sum(.net_amount) as total_net_amount, + avg(.net_amount) as avg_net_amount, + count() as order_count + ORDER BY desc(.total_net_amount) +} +``` + ## What these examples show These examples document the API patterns for the InQL dataset and Session surface: @@ -41,7 +88,8 @@ These examples document the API patterns for the InQL dataset and Session surfac 2. Carrier transformations remain typed Incan functions rather than stringly runtime scripts 3. Builder-based aggregation runs through `col(...)`, `sum(...)`, and `count()` 4. Builder-based scalar expressions run through `col(...)`, `lit(...)`, `eq(...)`, `gt(...)`, `add(...)`, and `mul(...)` -5. Session execution provides `collect`, `display`, and write sinks over DataFusion +5. Query blocks activate through `pub::inql` in dependency consumers and lower into the same Dataset/Prism/Substrait path +6. Session execution provides `collect`, `display`, and write sinks over DataFusion They serve three purposes: diff --git a/examples/advanced_retail_analytics.incn b/examples/advanced_retail_analytics.incn new file mode 100644 index 0000000..a0ef6aa --- /dev/null +++ b/examples/advanced_retail_analytics.incn @@ -0,0 +1,160 @@ +""" +Example: advanced retail analytics over a 100-row CSV fixture. 
+ +This is a compact spike of the larger InQL surface: + +- typed Session CSV ingestion +- scalar cleanup and derivation +- JSON validation and path extraction +- arbitrary URL query-parameter extraction +- hashing, regex, math, and date/time helpers +- grouped aggregate measures +- window functions composed with generators +""" + +from dataset import DataFrame, LazyFrame +from functions import ( + array, + array_contains, + array_distinct, + avg, + check_json, + col, + count, + count_distinct, + date_part, + desc, + eq, + explode, + gt, + json_array_length, + json_extract_path_text, + lower, + max, + min, + modulo, + mul, + parse_url, + regexp_extract, + regexp_like, + round, + row_number, + sha256, + sub, + sum, + trim, + upper, + window, +) +from projection_builders import ColumnExpr +from session import Session, SessionError, report_session_error + + +@derive(Clone) +pub model RetailOrder: + pub order_id: int + pub customer_id: str + pub region: str + pub status: str + pub quantity: int + pub unit_price: float + pub discount_pct: float + pub created_at: str + pub product_url: str + pub event_json: str + + +const ADVANCED_RETAIL_CSV_FIXTURE: str = "tests/fixtures/advanced_retail_orders.csv" + + +def main() -> None: + mut session = Session.default() + match run_advanced_retail_spike(session): + Ok(_) => println("advanced retail analytics spike complete") + Err(err) => report_session_error("example.advanced_retail_analytics", err) + + +def run_advanced_retail_spike(mut session: Session) -> Result[None, SessionError]: + orders: LazyFrame[RetailOrder] = session.read_csv("advanced_retail_orders", ADVANCED_RETAIL_CSV_FIXTURE)? + enriched = enrich_orders(orders.clone()) + + _print_collected( + session.clone(), + "enriched high-value orders", + enriched + .filter(eq(col("status_norm"), "paid")) + .filter(gt(col("net_amount"), 100.0)) + .order_by([desc(col("net_amount"))]) + .limit(8), + )? 
+ _print_collected(session.clone(), "paid rollup by region and channel", paid_rollup(enriched.clone()))? + _print_collected(session.clone(), "generated tags with customer order rank", generated_ranked_tags(orders))? + return Ok(None) + + +def enrich_orders(orders: LazyFrame[RetailOrder]) -> LazyFrame[RetailOrder]: + gross_amount = round(mul(col("quantity"), col("unit_price")), 2) + net_amount = round(mul(gross_amount.clone(), sub(1.0, col("discount_pct"))), 2) + return orders + .with_column("region_norm", upper(trim(col("region")))) + .with_column("status_norm", lower(trim(col("status")))) + .with_column("gross_amount", gross_amount) + .with_column("net_amount", net_amount) + .with_column("order_bucket", modulo(col("order_id"), 10)) + .with_column("order_year", date_part("year", col("created_at"))) + .with_column("customer_hash", sha256(col("customer_id"))) + .with_column("json_valid", check_json(col("event_json"))) + .with_column("event_type", json_extract_path_text(col("event_json"), "$.type")) + .with_column("channel", json_extract_path_text(col("event_json"), "$.channel")) + .with_column("tag_count", json_array_length(json_extract_path_text(col("event_json"), "$.tags"))) + .with_column("campaign", parse_url(col("product_url"), "campaign")) + .with_column("product_page", parse_url(col("product_url"), "page")) + .with_column("sku_family", regexp_extract(parse_url(col("product_url"), "sku"), "^SKU-([A-Z]+)", 1)) + .with_column("status_looks_clean", regexp_like(col("status_norm"), "^[a-z]+$")) + .with_column("is_paid_event", array_contains(_derived_tags(), "paid")) + + +def paid_rollup(enriched: LazyFrame[RetailOrder]) -> LazyFrame[RetailOrder]: + paid = enriched.filter(eq(col("status_norm"), "paid")) + mut measures = [sum(col("net_amount"))] + measures.append(avg(col("net_amount"))) + measures.append(min(col("net_amount"))) + measures.append(max(col("net_amount"))) + measures.append(count()) + measures.append(count_distinct(col("customer_id"))) + return 
paid.group_by([col("region_norm"), col("channel")]).agg(measures).order_by([desc(col("sum_net_amount"))]) + + +def generated_ranked_tags(orders: LazyFrame[RetailOrder]) -> LazyFrame[RetailOrder]: + tagged = orders + .with_column("status_norm", lower(trim(col("status")))) + .with_column("json_valid", check_json(col("event_json"))) + .with_column("campaign", parse_url(col("product_url"), "campaign")) + .with_column("channel", json_extract_path_text(col("event_json"), "$.channel")) + ranked = tagged.with_window_column( + "customer_order_rank", + row_number().over(window().partition_by([col("customer_id")]).order_by([desc(col("unit_price"))])), + ) + return ranked + .generate(explode(_derived_tags(), "derived_tag")) + .filter(eq(col("json_valid"), true)) + .order_by([col("customer_id"), col("customer_order_rank")]) + .limit(12) + + +def _derived_tags() -> ColumnExpr: + return array_distinct(array([col("status_norm"), col("campaign"), col("channel")])) + + +def _print_collected[T with Clone](mut session: Session, label: str, frame: LazyFrame[T]) -> Result[None, SessionError]: + df = session.collect(frame)? 
+ _print_data_frame(label, df) + return Ok(None) + + +def _print_data_frame[T with Clone](label: str, df: DataFrame[T]) -> None: + println("") + println(label) + println(f"columns: {df.resolved_columns():?}") + println(f"rows: {df.row_count()}") + println(df.preview_text()) diff --git a/examples/advanced_retail_query_blocks/incan.lock b/examples/advanced_retail_query_blocks/incan.lock new file mode 100644 index 0000000..5953fe3 --- /dev/null +++ b/examples/advanced_retail_query_blocks/incan.lock @@ -0,0 +1,3691 @@ +# Auto-generated by Incan - do not edit manually +# Regenerate with: incan lock + +[incan] +format = 1 +incan-version = "0.3.0-rc49" +deps-fingerprint = "sha256:d9b25e5e158c43d31394908f35a1310ece1dee50b3ad7ee523a3e736a091b10b" +cargo-features = [] +cargo-no-default-features = false +cargo-all-features = false + +[cargo] +lock = """ +# This file is automatically @generated by Cargo. +# It is not intended for manual editing. +version = 4 + +[[package]] +name = "adler2" +version = "2.0.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "320119579fcad9c21884f5c4861d16174d0e06250625266f50fe6898340abefa" + +[[package]] +name = "advanced_retail_query_blocks" +version = "0.3.0-rc49" +dependencies = [ + "incan_derive", + "incan_stdlib", + "inql", +] + +[[package]] +name = "ahash" +version = "0.8.12" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5a15f179cd60c4584b8a8c596927aadc462e27f2ca70c04e0071964a73ba7a75" +dependencies = [ + "cfg-if", + "const-random", + "getrandom 0.3.4", + "once_cell", + "version_check", + "zerocopy", +] + +[[package]] +name = "aho-corasick" +version = "1.1.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ddd31a130427c27518df266943a5308ed92d4b226cc639f5a8f1002816174301" +dependencies = [ + "memchr", +] + +[[package]] +name = "alloc-no-stdlib" +version = "2.0.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"cc7bb162ec39d46ab1ca8c77bf72e890535becd1751bb45f64c597edb4c8c6b3" + +[[package]] +name = "alloc-stdlib" +version = "0.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "94fb8275041c72129eb51b7d0322c29b8387a0386127718b096429201a5d6ece" +dependencies = [ + "alloc-no-stdlib", +] + +[[package]] +name = "allocator-api2" +version = "0.2.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "683d7910e743518b0e34f1186f92494becacb047c7b6bf616c96772180fef923" + +[[package]] +name = "android_system_properties" +version = "0.1.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "819e7219dbd41043ac279b19830f2efc897156490d7fd6ea916720117ee66311" +dependencies = [ + "libc", +] + +[[package]] +name = "anyhow" +version = "1.0.102" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c" + +[[package]] +name = "ar_archive_writer" +version = "0.5.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7eb93bbb63b9c227414f6eb3a0adfddca591a8ce1e9b60661bb08969b87e340b" +dependencies = [ + "object", +] + +[[package]] +name = "arrayref" +version = "0.3.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "76a2e8124351fda1ef8aaaa3bbd7ebbcb486bbcd4225aca0aa0d84bb2db8fecb" + +[[package]] +name = "arrayvec" +version = "0.7.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50" + +[[package]] +name = "arrow" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "378530e55cd479eda3c14eb345310799717e6f76d0c332041e8487022166b471" +dependencies = [ + "arrow-arith", + "arrow-array", + "arrow-buffer", + "arrow-cast", + "arrow-csv", + "arrow-data", + "arrow-ipc", + "arrow-json", + "arrow-ord", + "arrow-row", + "arrow-schema", + 
"arrow-select", + "arrow-string", +] + +[[package]] +name = "arrow-arith" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a0ab212d2c1886e802f51c5212d78ebbcbb0bec980fff9dadc1eb8d45cd0b738" +dependencies = [ + "arrow-array", + "arrow-buffer", + "arrow-data", + "arrow-schema", + "chrono", + "num-traits", +] + +[[package]] +name = "arrow-array" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cfd33d3e92f207444098c75b42de99d329562be0cf686b307b097cc52b4e999e" +dependencies = [ + "ahash", + "arrow-buffer", + "arrow-data", + "arrow-schema", + "chrono", + "chrono-tz", + "half", + "hashbrown 0.17.1", + "num-complex", + "num-integer", + "num-traits", +] + +[[package]] +name = "arrow-buffer" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0c6cd424c2693bcdbc150d843dc9d4d137dd2de4782ce6df491ad11a3a0416c0" +dependencies = [ + "bytes", + "half", + "num-bigint", + "num-traits", +] + +[[package]] +name = "arrow-cast" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4c5aefb56a2c02e9e2b30746241058b85f8983f0fcff2ba0c6d09006e1cded7f" +dependencies = [ + "arrow-array", + "arrow-buffer", + "arrow-data", + "arrow-ord", + "arrow-schema", + "arrow-select", + "atoi", + "base64", + "chrono", + "comfy-table", + "half", + "lexical-core", + "num-traits", + "ryu", +] + +[[package]] +name = "arrow-csv" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e94e8cf7e517657a52b91ea1263acf38c4ca62a84655d72458a3359b12ab97de" +dependencies = [ + "arrow-array", + "arrow-cast", + "arrow-schema", + "chrono", + "csv", + "csv-core", + "regex", +] + +[[package]] +name = "arrow-data" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3c88210023a2bfee1896af366309a3028fc3bcbd6515fa29a7990ee1baa08ee0" +dependencies = [ + 
"arrow-buffer", + "arrow-schema", + "half", + "num-integer", + "num-traits", +] + +[[package]] +name = "arrow-ipc" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "238438f0834483703d88896db6fe5a7138b2230debc31b34c0336c2996e3c64f" +dependencies = [ + "arrow-array", + "arrow-buffer", + "arrow-data", + "arrow-schema", + "arrow-select", + "flatbuffers", + "lz4_flex", + "zstd", +] + +[[package]] +name = "arrow-json" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "205ca2119e6d679d5c133c6f30e68f027738d95ed948cf77677ea69c7800036b" +dependencies = [ + "arrow-array", + "arrow-buffer", + "arrow-cast", + "arrow-ord", + "arrow-schema", + "arrow-select", + "chrono", + "half", + "indexmap", + "itoa", + "lexical-core", + "memchr", + "num-traits", + "ryu", + "serde_core", + "serde_json", + "simdutf8", +] + +[[package]] +name = "arrow-ord" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1bffd8fd2579286a5d63bac898159873e5094a79009940bcb42bbfce4f19f1d0" +dependencies = [ + "arrow-array", + "arrow-buffer", + "arrow-data", + "arrow-schema", + "arrow-select", +] + +[[package]] +name = "arrow-row" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "bab5994731204603c73ba69267616c50f80780774c6bb0476f1f830625115e0c" +dependencies = [ + "arrow-array", + "arrow-buffer", + "arrow-data", + "arrow-schema", + "half", +] + +[[package]] +name = "arrow-schema" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f633dbfdf39c039ada1bf9e34c694816eb71fbb7dc78f613993b7245e078a1ed" +dependencies = [ + "serde_core", + "serde_json", +] + +[[package]] +name = "arrow-select" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8cd065c54172ac787cf3f2f8d4107e0d3fdc26edba76fdf4f4cc170258942222" +dependencies = [ + "ahash", + 
"arrow-array", + "arrow-buffer", + "arrow-data", + "arrow-schema", + "num-traits", +] + +[[package]] +name = "arrow-string" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "29dd7cda3ab9692f43a2e4acc444d760cc17b12bb6d8232ddf64e9bab7c06b42" +dependencies = [ + "arrow-array", + "arrow-buffer", + "arrow-data", + "arrow-schema", + "arrow-select", + "memchr", + "num-traits", + "regex", + "regex-syntax", +] + +[[package]] +name = "async-compression" +version = "0.4.42" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e79b3f8a79cccc2898f31920fc69f304859b3bd567490f75ebf51ae1c792a9ac" +dependencies = [ + "compression-codecs", + "compression-core", + "pin-project-lite", + "tokio", +] + +[[package]] +name = "async-recursion" +version = "1.1.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3b43422f69d8ff38f95f1b2bb76517c91589a924d1559a0e935d7c8ce0274c11" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "async-trait" +version = "0.1.89" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9035ad2d096bed7955a320ee7e2230574d28fd3c3a0f186cbea1ff3c7eed5dbb" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "atoi" +version = "2.0.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f28d99ec8bfea296261ca1af174f24225171fea9664ba9003cbebee704810528" +dependencies = [ + "num-traits", +] + +[[package]] +name = "autocfg" +version = "1.5.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f2032f911046de80f0a198e0901378627c33f59ea0ac00e363d481118bd70a53" + +[[package]] +name = "base64" +version = "0.22.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "72b3254f16251a8381aa12e40e3c4d2f0199f8c6508fbecb9d91f575e0fbb8c6" + +[[package]] +name = "bigdecimal" +version = "0.4.10" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "4d6867f1565b3aad85681f1015055b087fcfd840d6aeee6eee7f2da317603695" +dependencies = [ + "autocfg", + "libm", + "num-bigint", + "num-integer", + "num-traits", +] + +[[package]] +name = "bitflags" +version = "2.12.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "84d7ced0ae9557296835c32bf1b1e02b44c746701f898460fb000d7eaa84f00a" + +[[package]] +name = "blake2" +version = "0.10.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "46502ad458c9a52b69d4d4d32775c788b7a1b85e8bc9d482d92250fc0e3f8efe" +dependencies = [ + "digest", +] + +[[package]] +name = "blake3" +version = "1.8.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0aa83c34e62843d924f905e0f5c866eb1dd6545fc4d719e803d9ba6030371fce" +dependencies = [ + "arrayref", + "arrayvec", + "cc", + "cfg-if", + "constant_time_eq", + "cpufeatures 0.3.0", +] + +[[package]] +name = "block-buffer" +version = "0.10.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3078c7629b62d3f0439517fa394996acacc5cbc91c5a20d8c658e77abd503a71" +dependencies = [ + "generic-array", +] + +[[package]] +name = "brotli" +version = "8.0.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8119e4516436f5708bbc474a9d395bf12f1b5395e93a92a56e647ac3388c8610" +dependencies = [ + "alloc-no-stdlib", + "alloc-stdlib", + "brotli-decompressor", +] + +[[package]] +name = "brotli-decompressor" +version = "5.0.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5962523e1b92ce1b5e793d9169b9943eece10d39f62550bc04bb605d75b94924" +dependencies = [ + "alloc-no-stdlib", + "alloc-stdlib", +] + +[[package]] +name = "bumpalo" +version = "3.20.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "72f5acc6cb2ba439de613abc23857ec3d78374d8ed5ac84e9d11336e87da8649" + +[[package]] +name = "byteorder" 
+version = "1.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1fd0f2584146f6f2ef48085050886acf353beff7305ebd1ae69500e27c67f64b" + +[[package]] +name = "bytes" +version = "1.11.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1e748733b7cbc798e1434b6ac524f0c1ff2ab456fe201501e6497c8417a4fc33" + +[[package]] +name = "bzip2" +version = "0.6.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f3a53fac24f34a81bc9954b5d6cfce0c21e18ec6959f44f56e8e90e4bb7c346c" +dependencies = [ + "libbz2-rs-sys", +] + +[[package]] +name = "cc" +version = "1.2.63" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "556e016178bb5662a08681bbe0f00f8e17631781a4dfc8c45e466e4b185ec27f" +dependencies = [ + "find-msvc-tools", + "jobserver", + "libc", + "shlex", +] + +[[package]] +name = "cfg-if" +version = "1.0.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801" + +[[package]] +name = "chrono" +version = "0.4.45" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1aa79e62e7697b8e29b513a68abacf485adcd1fe8284a4316c5ae868e6633327" +dependencies = [ + "iana-time-zone", + "num-traits", + "windows-link", +] + +[[package]] +name = "chrono-tz" +version = "0.10.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a6139a8597ed92cf816dfb33f5dd6cf0bb93a6adc938f11039f371bc5bcd26c3" +dependencies = [ + "chrono", + "phf", +] + +[[package]] +name = "cmake" +version = "0.1.58" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c0f78a02292a74a88ac736019ab962ece0bc380e3f977bf72e376c5d78ff0678" +dependencies = [ + "cc", +] + +[[package]] +name = "comfy-table" +version = "7.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"958c5d6ecf1f214b4c2bbbbf6ab9523a864bd136dcf71a7e8904799acfe1ad47" +dependencies = [ + "unicode-segmentation", + "unicode-width", +] + +[[package]] +name = "compression-codecs" +version = "0.4.38" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ce2548391e9c1929c21bf6aa2680af86fe4c1b33e6cea9ac1cfeec0bd11218cf" +dependencies = [ + "bzip2", + "compression-core", + "flate2", + "liblzma", + "memchr", + "zstd", + "zstd-safe", +] + +[[package]] +name = "compression-core" +version = "0.4.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cc14f565cf027a105f7a44ccf9e5b424348421a1d8952a8fc9d499d313107789" + +[[package]] +name = "const-random" +version = "0.1.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "87e00182fe74b066627d63b85fd550ac2998d4b0bd86bfed477a0ae4c7c71359" +dependencies = [ + "const-random-macro", +] + +[[package]] +name = "const-random-macro" +version = "0.1.16" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f9d839f2a20b0aee515dc581a6172f2321f96cab76c1a38a4c584a194955390e" +dependencies = [ + "getrandom 0.2.17", + "once_cell", + "tiny-keccak", +] + +[[package]] +name = "constant_time_eq" +version = "0.4.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3d52eff69cd5e647efe296129160853a42795992097e8af39800e1060caeea9b" + +[[package]] +name = "core-foundation-sys" +version = "0.8.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "773648b94d0e5d620f64f280777445740e61fe701025087ec8b57f45c791888b" + +[[package]] +name = "cpufeatures" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "59ed5838eebb26a2bb2e58f6d5b5316989ae9d08bab10e0e6d103e656d1b0280" +dependencies = [ + "libc", +] + +[[package]] +name = "cpufeatures" +version = "0.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"8b2a41393f66f16b0823bb79094d54ac5fbd34ab292ddafb9a0456ac9f87d201" +dependencies = [ + "libc", +] + +[[package]] +name = "crc32fast" +version = "1.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9481c1c90cbf2ac953f07c8d4a58aa3945c425b7185c9154d67a65e4230da511" +dependencies = [ + "cfg-if", +] + +[[package]] +name = "crossbeam-utils" +version = "0.8.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28" + +[[package]] +name = "crunchy" +version = "0.2.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "460fbee9c2c2f33933d720630a6a0bac33ba7053db5344fac858d4b8952d77d5" + +[[package]] +name = "crypto-common" +version = "0.1.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "78c8292055d1c1df0cce5d180393dc8cce0abec0a7102adb6c7b1eef6016d60a" +dependencies = [ + "generic-array", + "typenum", +] + +[[package]] +name = "csv" +version = "1.4.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "52cd9d68cf7efc6ddfaaee42e7288d3a99d613d4b50f76ce9827ae0c6e14f938" +dependencies = [ + "csv-core", + "itoa", + "ryu", + "serde_core", +] + +[[package]] +name = "csv-core" +version = "0.1.13" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "704a3c26996a80471189265814dbc2c257598b96b8a7feae2d31ace646bb9782" +dependencies = [ + "memchr", +] + +[[package]] +name = "dashmap" +version = "6.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e6361d5c062261c78a176addb82d4c821ae42bed6089de0e12603cd25de2059c" +dependencies = [ + "cfg-if", + "crossbeam-utils", + "hashbrown 0.14.5", + "lock_api", + "once_cell", + "parking_lot_core", +] + +[[package]] +name = "datafusion" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"93db0e623840612f7f2cd757f7e8a8922064192363732c88692e0870016e141b" +dependencies = [ + "arrow", + "arrow-schema", + "async-trait", + "bytes", + "bzip2", + "chrono", + "datafusion-catalog", + "datafusion-catalog-listing", + "datafusion-common", + "datafusion-common-runtime", + "datafusion-datasource", + "datafusion-datasource-arrow", + "datafusion-datasource-csv", + "datafusion-datasource-json", + "datafusion-datasource-parquet", + "datafusion-execution", + "datafusion-expr", + "datafusion-expr-common", + "datafusion-functions", + "datafusion-functions-aggregate", + "datafusion-functions-nested", + "datafusion-functions-table", + "datafusion-functions-window", + "datafusion-optimizer", + "datafusion-physical-expr", + "datafusion-physical-expr-adapter", + "datafusion-physical-expr-common", + "datafusion-physical-optimizer", + "datafusion-physical-plan", + "datafusion-session", + "datafusion-sql", + "flate2", + "futures", + "itertools", + "liblzma", + "log", + "object_store", + "parking_lot", + "parquet", + "rand", + "regex", + "sqlparser", + "tempfile", + "tokio", + "url", + "uuid", + "zstd", +] + +[[package]] +name = "datafusion-catalog" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "37cefde60b26a7f4ff61e9d2ff2833322f91df2b568d7238afe67bde5bdffb66" +dependencies = [ + "arrow", + "async-trait", + "dashmap", + "datafusion-common", + "datafusion-common-runtime", + "datafusion-datasource", + "datafusion-execution", + "datafusion-expr", + "datafusion-physical-expr", + "datafusion-physical-plan", + "datafusion-session", + "futures", + "itertools", + "log", + "object_store", + "parking_lot", + "tokio", +] + +[[package]] +name = "datafusion-catalog-listing" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "17e112307715d6a7a331111a4c2330ff54bc237183511c319e3708a4cff431fb" +dependencies = [ + "arrow", + "async-trait", + "datafusion-catalog", + "datafusion-common", + 
"datafusion-datasource", + "datafusion-execution", + "datafusion-expr", + "datafusion-physical-expr", + "datafusion-physical-expr-adapter", + "datafusion-physical-expr-common", + "datafusion-physical-plan", + "futures", + "itertools", + "log", + "object_store", +] + +[[package]] +name = "datafusion-common" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d72a11ca44a95e1081870d3abb80c717496e8a7acb467a1d3e932bb636af5cc2" +dependencies = [ + "ahash", + "arrow", + "arrow-ipc", + "chrono", + "half", + "hashbrown 0.16.1", + "indexmap", + "itertools", + "libc", + "log", + "object_store", + "parquet", + "paste", + "recursive", + "sqlparser", + "tokio", + "web-time", +] + +[[package]] +name = "datafusion-common-runtime" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "89f4afaed29670ec4fd6053643adc749fe3f4bc9d1ce1b8c5679b22c67d12def" +dependencies = [ + "futures", + "log", + "tokio", +] + +[[package]] +name = "datafusion-datasource" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e9fb386e1691355355a96419978a0022b7947b44d4a24a6ea99f00b6b485cbb6" +dependencies = [ + "arrow", + "async-compression", + "async-trait", + "bytes", + "bzip2", + "chrono", + "datafusion-common", + "datafusion-common-runtime", + "datafusion-execution", + "datafusion-expr", + "datafusion-physical-expr", + "datafusion-physical-expr-adapter", + "datafusion-physical-expr-common", + "datafusion-physical-plan", + "datafusion-session", + "flate2", + "futures", + "glob", + "itertools", + "liblzma", + "log", + "object_store", + "rand", + "tokio", + "tokio-util", + "url", + "zstd", +] + +[[package]] +name = "datafusion-datasource-arrow" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ffa6c52cfed0734c5f93754d1c0175f558175248bf686c944fb05c373e5fc096" +dependencies = [ + "arrow", + "arrow-ipc", + "async-trait", + 
"bytes", + "datafusion-common", + "datafusion-common-runtime", + "datafusion-datasource", + "datafusion-execution", + "datafusion-expr", + "datafusion-physical-expr-common", + "datafusion-physical-plan", + "datafusion-session", + "futures", + "itertools", + "object_store", + "tokio", +] + +[[package]] +name = "datafusion-datasource-csv" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "503f29e0582c1fc189578d665ff57d9300da1f80c282777d7eb67bb79fb8cdca" +dependencies = [ + "arrow", + "async-trait", + "bytes", + "datafusion-common", + "datafusion-common-runtime", + "datafusion-datasource", + "datafusion-execution", + "datafusion-expr", + "datafusion-physical-expr-common", + "datafusion-physical-plan", + "datafusion-session", + "futures", + "object_store", + "regex", + "tokio", +] + +[[package]] +name = "datafusion-datasource-json" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e33804749abc8d0c8cb7473228483cb8070e524c6f6086ee1b85a64debe2b3d2" +dependencies = [ + "arrow", + "async-trait", + "bytes", + "datafusion-common", + "datafusion-common-runtime", + "datafusion-datasource", + "datafusion-execution", + "datafusion-expr", + "datafusion-physical-expr-common", + "datafusion-physical-plan", + "datafusion-session", + "futures", + "object_store", + "serde_json", + "tokio", + "tokio-stream", +] + +[[package]] +name = "datafusion-datasource-parquet" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "32a8e0365e0e08e8ff94d912f0ababcf9065a1a304018ba90b1fc83c855b4997" +dependencies = [ + "arrow", + "async-trait", + "bytes", + "datafusion-common", + "datafusion-common-runtime", + "datafusion-datasource", + "datafusion-execution", + "datafusion-expr", + "datafusion-functions-aggregate-common", + "datafusion-physical-expr", + "datafusion-physical-expr-adapter", + "datafusion-physical-expr-common", + "datafusion-physical-plan", + 
"datafusion-pruning", + "datafusion-session", + "futures", + "itertools", + "log", + "object_store", + "parking_lot", + "parquet", + "tokio", +] + +[[package]] +name = "datafusion-doc" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8de6ac0df1662b9148ad3c987978b32cbec7c772f199b1d53520c8fa764a87ee" + +[[package]] +name = "datafusion-execution" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c03c7fbdaefcca4ef6ffe425a5fc2325763bfb426599bb0bf4536466efabe709" +dependencies = [ + "arrow", + "arrow-buffer", + "async-trait", + "chrono", + "dashmap", + "datafusion-common", + "datafusion-expr", + "datafusion-physical-expr-common", + "futures", + "log", + "object_store", + "parking_lot", + "rand", + "tempfile", + "url", +] + +[[package]] +name = "datafusion-expr" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "574b9b6977fedbd2a611cbff12e5caf90f31640ad9dc5870f152836d94bad0dd" +dependencies = [ + "arrow", + "async-trait", + "chrono", + "datafusion-common", + "datafusion-doc", + "datafusion-expr-common", + "datafusion-functions-aggregate-common", + "datafusion-functions-window-common", + "datafusion-physical-expr-common", + "indexmap", + "itertools", + "paste", + "recursive", + "serde_json", + "sqlparser", +] + +[[package]] +name = "datafusion-expr-common" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7d7c3adf3db8bf61e92eb90cb659c8e8b734593a8f7c8e12a843c7ddba24b87e" +dependencies = [ + "arrow", + "datafusion-common", + "indexmap", + "itertools", + "paste", +] + +[[package]] +name = "datafusion-functions" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f28aa4e10384e782774b10e72aca4d93ef7b31aa653095d9d4536b0a3dbc51b6" +dependencies = [ + "arrow", + "arrow-buffer", + "base64", + "blake2", + "blake3", + "chrono", + "chrono-tz", + 
"datafusion-common", + "datafusion-doc", + "datafusion-execution", + "datafusion-expr", + "datafusion-expr-common", + "datafusion-macros", + "hex", + "itertools", + "log", + "md-5", + "memchr", + "num-traits", + "rand", + "regex", + "sha2", + "unicode-segmentation", + "uuid", +] + +[[package]] +name = "datafusion-functions-aggregate" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "00aa6217e56098ba84e0a338176fe52f0a84cca398021512c6c8c5eff806d0ad" +dependencies = [ + "ahash", + "arrow", + "datafusion-common", + "datafusion-doc", + "datafusion-execution", + "datafusion-expr", + "datafusion-functions-aggregate-common", + "datafusion-macros", + "datafusion-physical-expr", + "datafusion-physical-expr-common", + "half", + "log", + "num-traits", + "paste", +] + +[[package]] +name = "datafusion-functions-aggregate-common" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b511250349407db7c43832ab2de63f5557b19a20dfd236b39ca2c04468b50d47" +dependencies = [ + "ahash", + "arrow", + "datafusion-common", + "datafusion-expr-common", + "datafusion-physical-expr-common", +] + +[[package]] +name = "datafusion-functions-nested" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ef13a858e20d50f0a9bb5e96e7ac82b4e7597f247515bccca4fdd2992df0212a" +dependencies = [ + "arrow", + "arrow-ord", + "datafusion-common", + "datafusion-doc", + "datafusion-execution", + "datafusion-expr", + "datafusion-expr-common", + "datafusion-functions", + "datafusion-functions-aggregate", + "datafusion-functions-aggregate-common", + "datafusion-macros", + "datafusion-physical-expr-common", + "hashbrown 0.16.1", + "itertools", + "itoa", + "log", + "paste", +] + +[[package]] +name = "datafusion-functions-table" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"72b40d3f5bbb3905f9ccb1ce9485a9595c77b69758a7c24d3ba79e334ff51e7e" +dependencies = [ + "arrow", + "async-trait", + "datafusion-catalog", + "datafusion-common", + "datafusion-expr", + "datafusion-physical-plan", + "parking_lot", + "paste", +] + +[[package]] +name = "datafusion-functions-window" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d4e88ec9d57c9b685d02f58bfee7be62d72610430ddcedb82a08e5d9925dbfb6" +dependencies = [ + "arrow", + "datafusion-common", + "datafusion-doc", + "datafusion-expr", + "datafusion-functions-window-common", + "datafusion-macros", + "datafusion-physical-expr", + "datafusion-physical-expr-common", + "log", + "paste", +] + +[[package]] +name = "datafusion-functions-window-common" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8307bb93519b1a91913723a1130cfafeee3f72200d870d88e91a6fc5470ede5c" +dependencies = [ + "datafusion-common", + "datafusion-physical-expr-common", +] + +[[package]] +name = "datafusion-macros" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2e367e6a71051d0ebdd29b2f85d12059b38b1d1f172c6906e80016da662226bd" +dependencies = [ + "datafusion-doc", + "quote", + "syn", +] + +[[package]] +name = "datafusion-optimizer" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e929015451a67f77d9d8b727b2bf3a40c4445fdef6cdc53281d7d97c76888ace" +dependencies = [ + "arrow", + "chrono", + "datafusion-common", + "datafusion-expr", + "datafusion-expr-common", + "datafusion-physical-expr", + "indexmap", + "itertools", + "log", + "recursive", + "regex", + "regex-syntax", +] + +[[package]] +name = "datafusion-physical-expr" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4b1e68aba7a4b350401cfdf25a3d6f989ad898a7410164afe9ca52080244cb59" +dependencies = [ + "ahash", + "arrow", + "datafusion-common", + 
"datafusion-expr", + "datafusion-expr-common", + "datafusion-functions-aggregate-common", + "datafusion-physical-expr-common", + "half", + "hashbrown 0.16.1", + "indexmap", + "itertools", + "parking_lot", + "paste", + "petgraph", + "recursive", + "tokio", +] + +[[package]] +name = "datafusion-physical-expr-adapter" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ea22315f33cf2e0adc104e8ec42e285f6ed93998d565c65e82fec6a9ee9f9db4" +dependencies = [ + "arrow", + "datafusion-common", + "datafusion-expr", + "datafusion-functions", + "datafusion-physical-expr", + "datafusion-physical-expr-common", + "itertools", +] + +[[package]] +name = "datafusion-physical-expr-common" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b04b45ea8ad3ac2d78f2ea2a76053e06591c9629c7a603eda16c10649ecf4362" +dependencies = [ + "ahash", + "arrow", + "chrono", + "datafusion-common", + "datafusion-expr-common", + "hashbrown 0.16.1", + "indexmap", + "itertools", + "parking_lot", +] + +[[package]] +name = "datafusion-physical-optimizer" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7cb13397809a425918f608dfe8653f332015a3e330004ab191b4404187238b95" +dependencies = [ + "arrow", + "datafusion-common", + "datafusion-execution", + "datafusion-expr", + "datafusion-expr-common", + "datafusion-physical-expr", + "datafusion-physical-expr-common", + "datafusion-physical-plan", + "datafusion-pruning", + "itertools", + "recursive", +] + +[[package]] +name = "datafusion-physical-plan" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5edc023675791af9d5fb4cc4c24abf5f7bd3bd4dcf9e5bd90ea1eff6976dcc79" +dependencies = [ + "ahash", + "arrow", + "arrow-ord", + "arrow-schema", + "async-trait", + "datafusion-common", + "datafusion-common-runtime", + "datafusion-execution", + "datafusion-expr", + "datafusion-functions", + 
"datafusion-functions-aggregate-common", + "datafusion-functions-window-common", + "datafusion-physical-expr", + "datafusion-physical-expr-common", + "futures", + "half", + "hashbrown 0.16.1", + "indexmap", + "itertools", + "log", + "num-traits", + "parking_lot", + "pin-project-lite", + "tokio", +] + +[[package]] +name = "datafusion-pruning" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ac8c76860e355616555081cab5968cec1af7a80701ff374510860bcd567e365a" +dependencies = [ + "arrow", + "datafusion-common", + "datafusion-datasource", + "datafusion-expr-common", + "datafusion-physical-expr", + "datafusion-physical-expr-common", + "datafusion-physical-plan", + "itertools", + "log", +] + +[[package]] +name = "datafusion-session" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5412111aa48e2424ba926112e192f7a6b7e4ccb450145d25ce5ede9f19dc491e" +dependencies = [ + "async-trait", + "datafusion-common", + "datafusion-execution", + "datafusion-expr", + "datafusion-physical-plan", + "parking_lot", +] + +[[package]] +name = "datafusion-sql" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "fa0d133ddf8b9b3b872acac900157f783e7b879fe9a6bccf389abebbfac45ec1" +dependencies = [ + "arrow", + "bigdecimal", + "chrono", + "datafusion-common", + "datafusion-expr", + "datafusion-functions-nested", + "indexmap", + "log", + "recursive", + "regex", + "sqlparser", +] + +[[package]] +name = "datafusion-substrait" +version = "53.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "98494539a5468979cc42d86c7bc5f0f8cb71ee5c742694c26fc34efdd29dd2e5" +dependencies = [ + "async-recursion", + "async-trait", + "chrono", + "datafusion", + "half", + "itertools", + "object_store", + "pbjson-types", + "prost", + "substrait 0.62.2", + "tokio", + "url", +] + +[[package]] +name = "digest" +version = "0.10.7" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "9ed9a281f7bc9b7576e61468ba615a66a5c8cfdff42420a70aa82701a3b1e292" +dependencies = [ + "block-buffer", + "crypto-common", + "subtle", +] + +[[package]] +name = "displaydoc" +version = "0.2.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1ac70aa55017e108007fbaf5aa0f54b021c98f92ff8af59d42eda9da96e3dd4f" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "dyn-clone" +version = "1.0.20" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d0881ea181b1df73ff77ffaaf9c7544ecc11e82fba9b5f27b262a3c73a332555" + +[[package]] +name = "either" +version = "1.16.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "91622ff5e7162018101f2fea40d6ebf4a78bbe5a49736a2020649edf9693679e" + +[[package]] +name = "encoding_rs" +version = "0.8.35" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "75030f3c4f45dafd7586dd6780965a8c7e8e285a5ecb86713e63a79c5b2766f3" +dependencies = [ + "cfg-if", +] + +[[package]] +name = "equivalent" +version = "1.0.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "877a4ace8713b0bcf2a4e7eec82529c029f1d0619886d18145fea96c3ffe5c0f" + +[[package]] +name = "errno" +version = "0.3.14" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "39cab71617ae0d63f51a36d69f866391735b51691dbda63cf6f96d042b63efeb" +dependencies = [ + "libc", + "windows-sys", +] + +[[package]] +name = "fastrand" +version = "2.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9f1f227452a390804cdb637b74a86990f2a7d7ba4b7d5693aac9b4dd6defd8d6" + +[[package]] +name = "find-msvc-tools" +version = "0.1.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5baebc0774151f905a1a2cc41989300b1e6fbb29aff0ceffa1064fdd3088d582" + +[[package]] +name = "fixedbitset" +version = "0.5.7" 
+source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1d674e81391d1e1ab681a28d99df07927c6d4aa5b027d7da16ba32d1d21ecd99" + +[[package]] +name = "flatbuffers" +version = "25.12.19" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "35f6839d7b3b98adde531effaf34f0c2badc6f4735d26fe74709d8e513a96ef3" +dependencies = [ + "bitflags", + "rustc_version", +] + +[[package]] +name = "flate2" +version = "1.1.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "843fba2746e448b37e26a819579957415c8cef339bf08564fe8b7ddbd959573c" +dependencies = [ + "crc32fast", + "miniz_oxide", + "zlib-rs", +] + +[[package]] +name = "foldhash" +version = "0.1.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d9c4f5dac5e15c24eb999c26181a6ca40b39fe946cbe4c263c7209467bc83af2" + +[[package]] +name = "foldhash" +version = "0.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "77ce24cb58228fbb8aa041425bb1050850ac19177686ea6e0f41a70416f56fdb" + +[[package]] +name = "form_urlencoded" +version = "1.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cb4cb245038516f5f85277875cdaa4f7d2c9a0fa0468de06ed190163b1581fcf" +dependencies = [ + "percent-encoding", +] + +[[package]] +name = "futures" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8b147ee9d1f6d097cef9ce628cd2ee62288d963e16fb287bd9286455b241382d" +dependencies = [ + "futures-channel", + "futures-core", + "futures-executor", + "futures-io", + "futures-sink", + "futures-task", + "futures-util", +] + +[[package]] +name = "futures-channel" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "07bbe89c50d7a535e539b8c17bc0b49bdb77747034daa8087407d655f3f7cc1d" +dependencies = [ + "futures-core", + "futures-sink", +] + +[[package]] +name = "futures-core" +version = "0.3.32" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "7e3450815272ef58cec6d564423f6e755e25379b217b0bc688e295ba24df6b1d" + +[[package]] +name = "futures-executor" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "baf29c38818342a3b26b5b923639e7b1f4a61fc5e76102d4b1981c6dc7a7579d" +dependencies = [ + "futures-core", + "futures-task", + "futures-util", +] + +[[package]] +name = "futures-io" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cecba35d7ad927e23624b22ad55235f2239cfa44fd10428eecbeba6d6a717718" + +[[package]] +name = "futures-macro" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e835b70203e41293343137df5c0664546da5745f82ec9b84d40be8336958447b" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "futures-sink" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c39754e157331b013978ec91992bde1ac089843443c49cbc7f46150b0fad0893" + +[[package]] +name = "futures-task" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "037711b3d59c33004d3856fbdc83b99d4ff37a24768fa1be9ce3538a1cde4393" + +[[package]] +name = "futures-util" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "389ca41296e6190b48053de0321d02a77f32f8a5d2461dd38762c0593805c6d6" +dependencies = [ + "futures-channel", + "futures-core", + "futures-io", + "futures-macro", + "futures-sink", + "futures-task", + "memchr", + "pin-project-lite", + "slab", +] + +[[package]] +name = "generic-array" +version = "0.14.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "85649ca51fd72272d7821adaf274ad91c288277713d9c18820d8499a7ff69e9a" +dependencies = [ + "typenum", + "version_check", +] + +[[package]] +name = "getrandom" +version = "0.2.17" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "ff2abc00be7fca6ebc474524697ae276ad847ad0a6b3faa4bcb027e9a4614ad0" +dependencies = [ + "cfg-if", + "libc", + "wasi", +] + +[[package]] +name = "getrandom" +version = "0.3.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "899def5c37c4fd7b2664648c28120ecec138e4d395b459e5ca34f9cce2dd77fd" +dependencies = [ + "cfg-if", + "libc", + "r-efi 5.3.0", + "wasip2", +] + +[[package]] +name = "getrandom" +version = "0.4.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0de51e6874e94e7bf76d726fc5d13ba782deca734ff60d5bb2fb2607c7406555" +dependencies = [ + "cfg-if", + "libc", + "r-efi 6.0.0", + "wasip2", + "wasip3", +] + +[[package]] +name = "glob" +version = "0.3.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0cc23270f6e1808e30a928bdc84dea0b9b4136a8bc82338574f23baf47bbd280" + +[[package]] +name = "half" +version = "2.7.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6ea2d84b969582b4b1864a92dc5d27cd2b77b622a8d79306834f1be5ba20d84b" +dependencies = [ + "cfg-if", + "crunchy", + "num-traits", + "zerocopy", +] + +[[package]] +name = "hashbrown" +version = "0.14.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e5274423e17b7c9fc20b6e7e208532f9b19825d82dfd615708b70edd83df41f1" + +[[package]] +name = "hashbrown" +version = "0.15.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9229cfe53dfd69f0609a49f65461bd93001ea1ef889cd5529dd176593f5338a1" +dependencies = [ + "foldhash 0.1.5", +] + +[[package]] +name = "hashbrown" +version = "0.16.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "841d1cc9bed7f9236f321df977030373f4a4163ae1a7dbfe1a51a2c1a51d9100" +dependencies = [ + "allocator-api2", + "equivalent", + "foldhash 0.2.0", +] + +[[package]] +name = "hashbrown" +version = "0.17.1" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "ed5909b6e89a2db4456e54cd5f673791d7eca6732202bbf2a9cc504fe2f9b84a" + +[[package]] +name = "heck" +version = "0.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2304e00983f87ffb38b55b444b5e3b60a884b5d30c0fca7d82fe33449bbe55ea" + +[[package]] +name = "hex" +version = "0.4.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7f24254aa9a54b5c858eaee2f5bccdb46aaf0e486a595ed5fd8f86ba55232a70" + +[[package]] +name = "http" +version = "1.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8be7462df143984c4598a256ef469b251d7d7f9e271135073e78fc535414f3d0" +dependencies = [ + "bytes", + "itoa", +] + +[[package]] +name = "humantime" +version = "2.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "135b12329e5e3ce057a9f972339ea52bc954fe1e9358ef27f95e89716fbc5424" + +[[package]] +name = "iana-time-zone" +version = "0.1.65" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e31bc9ad994ba00e440a8aa5c9ef0ec67d5cb5e5cb0cc7f8b744a35b389cc470" +dependencies = [ + "android_system_properties", + "core-foundation-sys", + "iana-time-zone-haiku", + "js-sys", + "log", + "wasm-bindgen", + "windows-core", +] + +[[package]] +name = "iana-time-zone-haiku" +version = "0.1.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f31827a206f56af32e590ba56d5d2d085f558508192593743f16b2306495269f" +dependencies = [ + "cc", +] + +[[package]] +name = "icu_collections" +version = "2.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2984d1cd16c883d7935b9e07e44071dca8d917fd52ecc02c04d5fa0b5a3f191c" +dependencies = [ + "displaydoc", + "potential_utf", + "utf8_iter", + "yoke", + "zerofrom", + "zerovec", +] + +[[package]] +name = "icu_locale_core" +version = "2.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" 
+checksum = "92219b62b3e2b4d88ac5119f8904c10f8f61bf7e95b640d25ba3075e6cac2c29" +dependencies = [ + "displaydoc", + "litemap", + "tinystr", + "writeable", + "zerovec", +] + +[[package]] +name = "icu_normalizer" +version = "2.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c56e5ee99d6e3d33bd91c5d85458b6005a22140021cc324cea84dd0e72cff3b4" +dependencies = [ + "icu_collections", + "icu_normalizer_data", + "icu_properties", + "icu_provider", + "smallvec", + "zerovec", +] + +[[package]] +name = "icu_normalizer_data" +version = "2.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "da3be0ae77ea334f4da67c12f149704f19f81d1adf7c51cf482943e84a2bad38" + +[[package]] +name = "icu_properties" +version = "2.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "bee3b67d0ea5c2cca5003417989af8996f8604e34fb9ddf96208a033901e70de" +dependencies = [ + "icu_collections", + "icu_locale_core", + "icu_properties_data", + "icu_provider", + "zerotrie", + "zerovec", +] + +[[package]] +name = "icu_properties_data" +version = "2.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8e2bbb201e0c04f7b4b3e14382af113e17ba4f63e2c9d2ee626b720cbce54a14" + +[[package]] +name = "icu_provider" +version = "2.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "139c4cf31c8b5f33d7e199446eff9c1e02decfc2f0eec2c8d71f65befa45b421" +dependencies = [ + "displaydoc", + "icu_locale_core", + "writeable", + "yoke", + "zerofrom", + "zerotrie", + "zerovec", +] + +[[package]] +name = "id-arena" +version = "2.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3d3067d79b975e8844ca9eb072e16b31c3c1c36928edf9c6789548c524d0d954" + +[[package]] +name = "idna" +version = "1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3b0875f23caa03898994f6ddc501886a45c7d3d62d04d2d90788d47be1b1e4de" +dependencies = [ + 
"idna_adapter", + "smallvec", + "utf8_iter", +] + +[[package]] +name = "idna_adapter" +version = "1.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cb68373c0d6620ef8105e855e7745e18b0d00d3bdb07fb532e434244cdb9a714" +dependencies = [ + "icu_normalizer", + "icu_properties", +] + +[[package]] +name = "incan_core" +version = "0.3.0-rc49" +dependencies = [ + "serde", +] + +[[package]] +name = "incan_derive" +version = "0.3.0-rc49" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "incan_stdlib" +version = "0.3.0-rc49" +dependencies = [ + "incan_core", + "incan_derive", + "serde", + "serde_json", + "tokio", + "xxhash-rust", +] + +[[package]] +name = "indexmap" +version = "2.14.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d466e9454f08e4a911e14806c24e16fba1b4c121d1ea474396f396069cf949d9" +dependencies = [ + "equivalent", + "hashbrown 0.17.1", + "serde", + "serde_core", +] + +[[package]] +name = "inql" +version = "0.3.0-rc49" +dependencies = [ + "blake2", + "blake3", + "byteorder", + "crc32fast", + "datafusion", + "datafusion-common", + "datafusion-expr", + "datafusion-substrait", + "encoding_rs", + "incan_derive", + "incan_stdlib", + "md-5", + "prost", + "prost-types", + "regex", + "rustix", + "serde", + "sha1", + "sha2", + "sha3", + "substrait 0.63.0", + "url", + "xxhash-rust", +] + +[[package]] +name = "integer-encoding" +version = "3.0.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8bb03732005da905c88227371639bf1ad885cc712789c011c31c5fb3ab3ccf02" + +[[package]] +name = "itertools" +version = "0.14.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2b192c782037fadd9cfa75548310488aabdbf3d2da73885b31bd0abd03351285" +dependencies = [ + "either", +] + +[[package]] +name = "itoa" +version = "1.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"8f42a60cbdf9a97f5d2305f08a87dc4e09308d1276d28c869c684d7777685682" + +[[package]] +name = "jobserver" +version = "0.1.34" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9afb3de4395d6b3e67a780b6de64b51c978ecf11cb9a462c66be7d4ca9039d33" +dependencies = [ + "getrandom 0.3.4", + "libc", +] + +[[package]] +name = "js-sys" +version = "0.3.99" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "142bc4740e452c1e57ade0cbc129f139c9093e354346f0872ef985f4f5cf5f11" +dependencies = [ + "cfg-if", + "futures-util", + "once_cell", + "wasm-bindgen", +] + +[[package]] +name = "keccak" +version = "0.1.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cb26cec98cce3a3d96cbb7bced3c4b16e3d13f27ec56dbd62cbc8f39cfb9d653" +dependencies = [ + "cpufeatures 0.2.17", +] + +[[package]] +name = "leb128fmt" +version = "0.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "09edd9e8b54e49e587e4f6295a7d29c3ea94d469cb40ab8ca70b288248a81db2" + +[[package]] +name = "lexical-core" +version = "1.0.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7d8d125a277f807e55a77304455eb7b1cb52f2b18c143b60e766c120bd64a594" +dependencies = [ + "lexical-parse-float", + "lexical-parse-integer", + "lexical-util", + "lexical-write-float", + "lexical-write-integer", +] + +[[package]] +name = "lexical-parse-float" +version = "1.0.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "52a9f232fbd6f550bc0137dcb5f99ab674071ac2d690ac69704593cb4abbea56" +dependencies = [ + "lexical-parse-integer", + "lexical-util", +] + +[[package]] +name = "lexical-parse-integer" +version = "1.0.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9a7a039f8fb9c19c996cd7b2fcce303c1b2874fe1aca544edc85c4a5f8489b34" +dependencies = [ + "lexical-util", +] + +[[package]] +name = "lexical-util" +version = "1.0.7" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "2604dd126bb14f13fb5d1bd6a66155079cb9fa655b37f875b3a742c705dbed17" + +[[package]] +name = "lexical-write-float" +version = "1.0.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "50c438c87c013188d415fbabbb1dceb44249ab81664efbd31b14ae55dabb6361" +dependencies = [ + "lexical-util", + "lexical-write-integer", +] + +[[package]] +name = "lexical-write-integer" +version = "1.0.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "409851a618475d2d5796377cad353802345cba92c867d9fbcde9cf4eac4e14df" +dependencies = [ + "lexical-util", +] + +[[package]] +name = "libbz2-rs-sys" +version = "0.2.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "34b357333733e8260735ba5894eb928c02ecc69c78715f01a8019e7fa7f2db4c" + +[[package]] +name = "libc" +version = "0.2.186" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "68ab91017fe16c622486840e4c83c9a37afeff978bd239b5293d61ece587de66" + +[[package]] +name = "liblzma" +version = "0.4.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b6033b77c21d1f56deeae8014eb9fbe7bdf1765185a6c508b5ca82eeaed7f899" +dependencies = [ + "liblzma-sys", +] + +[[package]] +name = "liblzma-sys" +version = "0.4.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1a60851d15cd8c5346eca4ab8babff585be2ae4bc8097c067291d3ffe2add3b6" +dependencies = [ + "cc", + "libc", + "pkg-config", +] + +[[package]] +name = "libm" +version = "0.2.16" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b6d2cec3eae94f9f509c767b45932f1ada8350c4bdb85af2fcab4a3c14807981" + +[[package]] +name = "linux-raw-sys" +version = "0.12.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "32a66949e030da00e8c7d4434b251670a91556f4144941d37452769c25d58a53" + +[[package]] +name = "litemap" +version = 
"0.8.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "92daf443525c4cce67b150400bc2316076100ce0b3686209eb8cf3c31612e6f0" + +[[package]] +name = "lock_api" +version = "0.4.14" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "224399e74b87b5f3557511d98dff8b14089b3dadafcab6bb93eab67d3aace965" +dependencies = [ + "scopeguard", +] + +[[package]] +name = "log" +version = "0.4.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "953f07c43838f8e6f9758cab68bf5bed85465e7587ebe0b823f1bcd81978ad3a" + +[[package]] +name = "lz4_flex" +version = "0.13.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7ef0d4ed8669f8f8826eb00dc878084aa8f253506c4fd5e8f58f5bce72ddb97e" +dependencies = [ + "twox-hash", +] + +[[package]] +name = "md-5" +version = "0.10.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d89e7ee0cfbedfc4da3340218492196241d89eefb6dab27de5df917a6d2e78cf" +dependencies = [ + "cfg-if", + "digest", +] + +[[package]] +name = "memchr" +version = "2.8.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6b947ae49db0d222b1dbc6b113ce7248a3fc3a6ca21b696717bfc000ba4484d8" + +[[package]] +name = "miniz_oxide" +version = "0.8.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1fa76a2c86f704bdb222d66965fb3d63269ce38518b83cb0575fca855ebb6316" +dependencies = [ + "adler2", + "simd-adler32", +] + +[[package]] +name = "mio" +version = "1.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "02bd0af71c67b473010cbbc60715ee815645a4dc942899111f494b4b737d6fda" +dependencies = [ + "libc", + "wasi", + "windows-sys", +] + +[[package]] +name = "multimap" +version = "0.10.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1d87ecb2933e8aeadb3e3a02b828fed80a7528047e68b4f424523a0981a3a084" + +[[package]] +name = "num-bigint" 
+version = "0.4.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a5e44f723f1133c9deac646763579fdb3ac745e418f2a7af9cd0c431da1f20b9" +dependencies = [ + "num-integer", + "num-traits", +] + +[[package]] +name = "num-complex" +version = "0.4.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "73f88a1307638156682bada9d7604135552957b7818057dcef22705b4d509495" +dependencies = [ + "num-traits", +] + +[[package]] +name = "num-integer" +version = "0.1.46" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7969661fd2958a5cb096e56c8e1ad0444ac2bbcd0061bd28660485a44879858f" +dependencies = [ + "num-traits", +] + +[[package]] +name = "num-traits" +version = "0.2.19" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "071dfc062690e90b734c0b2273ce72ad0ffa95f0c74596bc250dcfd960262841" +dependencies = [ + "autocfg", + "libm", +] + +[[package]] +name = "object" +version = "0.37.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ff76201f031d8863c38aa7f905eca4f53abbfa15f609db4277d44cd8938f33fe" +dependencies = [ + "memchr", +] + +[[package]] +name = "object_store" +version = "0.13.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "622acbc9100d3c10e2ee15804b0caa40e55c933d5aa53814cd520805b7958a49" +dependencies = [ + "async-trait", + "bytes", + "chrono", + "futures-channel", + "futures-core", + "futures-util", + "http", + "humantime", + "itertools", + "parking_lot", + "percent-encoding", + "thiserror", + "tokio", + "tracing", + "url", + "walkdir", + "wasm-bindgen-futures", + "web-time", +] + +[[package]] +name = "once_cell" +version = "1.21.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9f7c3e4beb33f85d45ae3e3a1792185706c8e16d043238c593331cc7cd313b50" + +[[package]] +name = "ordered-float" +version = "2.10.1" +source = "registry+https://github.com/rust-lang/crates.io-index" 
+checksum = "68f19d67e5a2795c94e73e0bb1cc1a7edeb2e28efd39e2e1c9b7a40c1108b11c" +dependencies = [ + "num-traits", +] + +[[package]] +name = "parking_lot" +version = "0.12.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "93857453250e3077bd71ff98b6a65ea6621a19bb0f559a85248955ac12c45a1a" +dependencies = [ + "lock_api", + "parking_lot_core", +] + +[[package]] +name = "parking_lot_core" +version = "0.9.12" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2621685985a2ebf1c516881c026032ac7deafcda1a2c9b7850dc81e3dfcb64c1" +dependencies = [ + "cfg-if", + "libc", + "redox_syscall", + "smallvec", + "windows-link", +] + +[[package]] +name = "parquet" +version = "58.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5dafa7d01085b62a47dd0c1829550a0a36710ea9c4fe358a05a85477cec8a908" +dependencies = [ + "ahash", + "arrow-array", + "arrow-buffer", + "arrow-data", + "arrow-ipc", + "arrow-schema", + "arrow-select", + "base64", + "brotli", + "bytes", + "chrono", + "flate2", + "futures", + "half", + "hashbrown 0.17.1", + "lz4_flex", + "num-bigint", + "num-integer", + "num-traits", + "object_store", + "paste", + "seq-macro", + "simdutf8", + "snap", + "thrift", + "tokio", + "twox-hash", + "zstd", +] + +[[package]] +name = "paste" +version = "1.0.15" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "57c0d7b74b563b49d38dae00a0c37d4d6de9b432382b2892f0574ddcae73fd0a" + +[[package]] +name = "pbjson" +version = "0.8.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "898bac3fa00d0ba57a4e8289837e965baa2dee8c3749f3b11d45a64b4223d9c3" +dependencies = [ + "base64", + "serde", +] + +[[package]] +name = "pbjson-build" +version = "0.8.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "af22d08a625a2213a78dbb0ffa253318c5c79ce3133d32d296655a7bdfb02095" +dependencies = [ + "heck", + "itertools", + "prost", + "prost-types", 
+] + +[[package]] +name = "pbjson-types" +version = "0.8.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8e748e28374f10a330ee3bb9f29b828c0ac79831a32bab65015ad9b661ead526" +dependencies = [ + "bytes", + "chrono", + "pbjson", + "pbjson-build", + "prost", + "prost-build", + "serde", +] + +[[package]] +name = "percent-encoding" +version = "2.3.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9b4f627cb1b25917193a259e49bdad08f671f8d9708acfd5fe0a8c1455d87220" + +[[package]] +name = "petgraph" +version = "0.8.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8701b58ea97060d5e5b155d383a69952a60943f0e6dfe30b04c287beb0b27455" +dependencies = [ + "fixedbitset", + "hashbrown 0.15.5", + "indexmap", + "serde", +] + +[[package]] +name = "phf" +version = "0.12.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "913273894cec178f401a31ec4b656318d95473527be05c0752cc41cdc32be8b7" +dependencies = [ + "phf_shared", +] + +[[package]] +name = "phf_shared" +version = "0.12.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "06005508882fb681fd97892ecff4b7fd0fee13ef1aa569f8695dae7ab9099981" +dependencies = [ + "siphasher", +] + +[[package]] +name = "pin-project-lite" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a89322df9ebe1c1578d689c92318e070967d1042b512afbe49518723f4e6d5cd" + +[[package]] +name = "pkg-config" +version = "0.3.33" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "19f132c84eca552bf34cab8ec81f1c1dcc229b811638f9d283dceabe58c5569e" + +[[package]] +name = "potential_utf" +version = "0.1.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0103b1cef7ec0cf76490e969665504990193874ea05c85ff9bab8b911d0a0564" +dependencies = [ + "zerovec", +] + +[[package]] +name = "ppv-lite86" +version = "0.2.21" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "85eae3c4ed2f50dcfe72643da4befc30deadb458a9b590d720cde2f2b1e97da9" +dependencies = [ + "zerocopy", +] + +[[package]] +name = "prettyplease" +version = "0.2.37" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "479ca8adacdd7ce8f1fb39ce9ecccbfe93a3f1344b3d0d97f20bc0196208f62b" +dependencies = [ + "proc-macro2", + "syn", +] + +[[package]] +name = "proc-macro2" +version = "1.0.106" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8fd00f0bb2e90d81d1044c2b32617f68fcb9fa3bb7640c23e9c748e53fb30934" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "prost" +version = "0.14.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d2ea70524a2f82d518bce41317d0fae74151505651af45faf1ffbd6fd33f0568" +dependencies = [ + "bytes", + "prost-derive", +] + +[[package]] +name = "prost-build" +version = "0.14.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "343d3bd7056eda839b03204e68deff7d1b13aba7af2b2fd16890697274262ee7" +dependencies = [ + "heck", + "itertools", + "log", + "multimap", + "petgraph", + "prettyplease", + "prost", + "prost-types", + "regex", + "syn", + "tempfile", +] + +[[package]] +name = "prost-derive" +version = "0.14.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "27c6023962132f4b30eb4c172c91ce92d933da334c59c23cddee82358ddafb0b" +dependencies = [ + "anyhow", + "itertools", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "prost-types" +version = "0.14.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8991c4cbdb8bc5b11f0b074ffe286c30e523de90fee5ba8132f1399f23cb3dd7" +dependencies = [ + "prost", +] + +[[package]] +name = "protobuf-src" +version = "2.1.1+27.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"6217c3504da19b85a3a4b2e9a5183d635822d83507ba0986624b5c05b83bfc40" +dependencies = [ + "cmake", +] + +[[package]] +name = "psm" +version = "0.1.31" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "645dbe486e346d9b5de3ef16ede18c26e6c70ad97418f4874b8b1889d6e761ea" +dependencies = [ + "ar_archive_writer", + "cc", +] + +[[package]] +name = "quote" +version = "1.0.45" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41f2619966050689382d2b44f664f4bc593e129785a36d6ee376ddf37259b924" +dependencies = [ + "proc-macro2", +] + +[[package]] +name = "r-efi" +version = "5.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "69cdb34c158ceb288df11e18b4bd39de994f6657d83847bdffdbd7f346754b0f" + +[[package]] +name = "r-efi" +version = "6.0.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f8dcc9c7d52a811697d2151c701e0d08956f92b0e24136cf4cf27b57a6a0d9bf" + +[[package]] +name = "rand" +version = "0.9.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "44c5af06bb1b7d3216d91932aed5265164bf384dc89cd6ba05cf59a35f5f76ea" +dependencies = [ + "rand_chacha", + "rand_core", +] + +[[package]] +name = "rand_chacha" +version = "0.9.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d3022b5f1df60f26e1ffddd6c66e8aa15de382ae63b3a0c1bfc0e4d3e3f325cb" +dependencies = [ + "ppv-lite86", + "rand_core", +] + +[[package]] +name = "rand_core" +version = "0.9.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "76afc826de14238e6e8c374ddcc1fa19e374fd8dd986b0d2af0d02377261d83c" +dependencies = [ + "getrandom 0.3.4", +] + +[[package]] +name = "recursive" +version = "0.1.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0786a43debb760f491b1bc0269fe5e84155353c67482b9e60d0cfb596054b43e" +dependencies = [ + "recursive-proc-macro-impl", + "stacker", +] + +[[package]] +name = 
"recursive-proc-macro-impl" +version = "0.1.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "76009fbe0614077fc1a2ce255e3a1881a2e3a3527097d5dc6d8212c585e7e38b" +dependencies = [ + "quote", + "syn", +] + +[[package]] +name = "redox_syscall" +version = "0.5.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ed2bf2547551a7053d6fdfafda3f938979645c44812fbfcda098faae3f1a362d" +dependencies = [ + "bitflags", +] + +[[package]] +name = "regex" +version = "1.12.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e10754a14b9137dd7b1e3e5b0493cc9171fdd105e0ab477f51b72e7f3ac0e276" +dependencies = [ + "aho-corasick", + "memchr", + "regex-automata", + "regex-syntax", +] + +[[package]] +name = "regex-automata" +version = "0.4.14" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6e1dd4122fc1595e8162618945476892eefca7b88c52820e74af6262213cae8f" +dependencies = [ + "aho-corasick", + "memchr", + "regex-syntax", +] + +[[package]] +name = "regex-syntax" +version = "0.8.10" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "dc897dd8d9e8bd1ed8cdad82b5966c3e0ecae09fb1907d58efaa013543185d0a" + +[[package]] +name = "regress" +version = "0.10.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2057b2325e68a893284d1538021ab90279adac1139957ca2a74426c6f118fb48" +dependencies = [ + "hashbrown 0.16.1", + "memchr", +] + +[[package]] +name = "rustc_version" +version = "0.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cfcb3a22ef46e85b45de6ee7e79d063319ebb6594faafcf1c225ea92ab6e9b92" +dependencies = [ + "semver", +] + +[[package]] +name = "rustix" +version = "1.1.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b6fe4565b9518b83ef4f91bb47ce29620ca828bd32cb7e408f0062e9930ba190" +dependencies = [ + "bitflags", + "errno", + "libc", + "linux-raw-sys", + 
"windows-sys", +] + +[[package]] +name = "rustversion" +version = "1.0.22" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b39cdef0fa800fc44525c84ccb54a029961a8215f9619753635a9c0d2538d46d" + +[[package]] +name = "ryu" +version = "1.0.23" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9774ba4a74de5f7b1c1451ed6cd5285a32eddb5cccb8cc655a4e50009e06477f" + +[[package]] +name = "same-file" +version = "1.0.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "93fc1dc3aaa9bfed95e02e6eadabb4baf7e3078b0bd1b4d7b6b0b68378900502" +dependencies = [ + "winapi-util", +] + +[[package]] +name = "schemars" +version = "0.8.22" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3fbf2ae1b8bc8e02df939598064d22402220cd5bbcca1c76f7d6a310974d5615" +dependencies = [ + "dyn-clone", + "schemars_derive", + "serde", + "serde_json", +] + +[[package]] +name = "schemars_derive" +version = "0.8.22" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "32e265784ad618884abaea0600a9adf15393368d840e0222d101a072f3f7534d" +dependencies = [ + "proc-macro2", + "quote", + "serde_derive_internals", + "syn", +] + +[[package]] +name = "scopeguard" +version = "1.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "94143f37725109f92c262ed2cf5e59bce7498c01bcc1502d7b9afe439a4e9f49" + +[[package]] +name = "semver" +version = "1.0.28" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8a7852d02fc848982e0c167ef163aaff9cd91dc640ba85e263cb1ce46fae51cd" +dependencies = [ + "serde", + "serde_core", +] + +[[package]] +name = "seq-macro" +version = "0.3.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1bc711410fbe7399f390ca1c3b60ad0f53f80e95c5eb935e52268a0e2cd49acc" + +[[package]] +name = "serde" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"9a8e94ea7f378bd32cbbd37198a4a91436180c5bb472411e48b5ec2e2124ae9e" +dependencies = [ + "serde_core", + "serde_derive", +] + +[[package]] +name = "serde_core" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41d385c7d4ca58e59fc732af25c3983b67ac852c1a25000afe1175de458b67ad" +dependencies = [ + "serde_derive", +] + +[[package]] +name = "serde_derive" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "serde_derive_internals" +version = "0.29.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "18d26a20a969b9e3fdf2fc2d9f21eda6c40e2de84c9408bb5d3b05d499aae711" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "serde_json" +version = "1.0.150" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e8014e44b4736ed0538adeecded0fce2a272f22dc9578a7eb6b2d9993c74cfb9" +dependencies = [ + "indexmap", + "itoa", + "memchr", + "serde", + "serde_core", + "zmij", +] + +[[package]] +name = "serde_tokenstream" +version = "0.2.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d7c49585c52c01f13c5c2ebb333f14f6885d76daa768d8a037d28017ec538c69" +dependencies = [ + "proc-macro2", + "quote", + "serde", + "syn", +] + +[[package]] +name = "serde_yaml" +version = "0.9.34+deprecated" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6a8b1a1a2ebf674015cc02edccce75287f1a0130d394307b36743c2f5d504b47" +dependencies = [ + "indexmap", + "itoa", + "ryu", + "serde", + "unsafe-libyaml", +] + +[[package]] +name = "sha1" +version = "0.10.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e3bf829a2d51ab4a5ddf1352d8470c140cadc8301b2ae1789db023f01cedd6ba" +dependencies = [ + "cfg-if", + 
"cpufeatures 0.2.17", + "digest", +] + +[[package]] +name = "sha2" +version = "0.10.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a7507d819769d01a365ab707794a4084392c824f54a7a6a7862f8c3d0892b283" +dependencies = [ + "cfg-if", + "cpufeatures 0.2.17", + "digest", +] + +[[package]] +name = "sha3" +version = "0.10.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "77fd7028345d415a4034cf8777cd4f8ab1851274233b45f84e3d955502d93874" +dependencies = [ + "digest", + "keccak", +] + +[[package]] +name = "shlex" +version = "2.0.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f8fadd59c855ef2080decdef8ff161eb6661b86933c9d82e5ba29dc602a55aba" + +[[package]] +name = "simd-adler32" +version = "0.3.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "703d5c7ef118737c72f1af64ad2f6f8c5e1921f818cdcb97b8fe6fc69bf66214" + +[[package]] +name = "simdutf8" +version = "0.1.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e3a9fe34e3e7a50316060351f37187a3f546bce95496156754b601a5fa71b76e" + +[[package]] +name = "siphasher" +version = "1.0.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8ee5873ec9cce0195efcb7a4e9507a04cd49aec9c83d0389df45b1ef7ba2e649" + +[[package]] +name = "slab" +version = "0.4.12" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0c790de23124f9ab44544d7ac05d60440adc586479ce501c1d6d7da3cd8c9cf5" + +[[package]] +name = "smallvec" +version = "1.15.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "67b1b7a3b5fe4f1376887184045fcf45c69e92af734b7aaddc05fb777b6fbd03" + +[[package]] +name = "snap" +version = "1.1.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1b6b67fb9a61334225b5b790716f609cd58395f895b3fe8b328786812a40bc3b" + +[[package]] +name = "socket2" +version = "0.6.4" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "52d1cfed4120b4d927bf7c0f86d2087a4a7d6027c906d9f9d525a80573b9be51" +dependencies = [ + "libc", + "windows-sys", +] + +[[package]] +name = "sqlparser" +version = "0.61.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "dbf5ea8d4d7c808e1af1cbabebca9a2abe603bcefc22294c5b95018d53200cb7" +dependencies = [ + "log", + "recursive", + "sqlparser_derive", +] + +[[package]] +name = "sqlparser_derive" +version = "0.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a6dd45d8fc1c79299bfbb7190e42ccbbdf6a5f52e4a6ad98d92357ea965bd289" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "stable_deref_trait" +version = "1.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6ce2be8dc25455e1f91df71bfa12ad37d7af1092ae736f3a6cd0e37bc7810596" + +[[package]] +name = "stacker" +version = "0.1.24" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "640c8cdd92b6b12f5bcb1803ca3bbf5ab96e5e6b6b96b9ab77dabe9e880b3190" +dependencies = [ + "cc", + "cfg-if", + "libc", + "psm", + "windows-sys", +] + +[[package]] +name = "substrait" +version = "0.62.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "62fc4b483a129b9772ccb9c3f7945a472112fdd9140da87f8a4e7f1d44e045d0" +dependencies = [ + "heck", + "pbjson", + "pbjson-build", + "pbjson-types", + "prettyplease", + "prost", + "prost-build", + "prost-types", + "protobuf-src", + "regress", + "schemars", + "semver", + "serde", + "serde_json", + "serde_yaml", + "syn", + "typify", + "walkdir", +] + +[[package]] +name = "substrait" +version = "0.63.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e620ff4d5c02fd6f7752931aa74b16a26af66a63022cc1ad412c77edbe0bab47" +dependencies = [ + "heck", + "indexmap", + "prettyplease", + "prost", + "prost-build", + "prost-types", + "protobuf-src", + 
"regress", + "schemars", + "semver", + "serde", + "serde_json", + "serde_yaml", + "syn", + "typify", + "walkdir", +] + +[[package]] +name = "subtle" +version = "2.6.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "13c2bddecc57b384dee18652358fb23172facb8a2c51ccc10d74c157bdea3292" + +[[package]] +name = "syn" +version = "2.0.117" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e665b8803e7b1d2a727f4023456bbbbe74da67099c585258af0ad9c5013b9b99" +dependencies = [ + "proc-macro2", + "quote", + "unicode-ident", +] + +[[package]] +name = "synstructure" +version = "0.13.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "728a70f3dbaf5bab7f0c4b1ac8d7ae5ea60a4b5549c8a5914361c99147a709d2" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "tempfile" +version = "3.27.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "32497e9a4c7b38532efcdebeef879707aa9f794296a4f0244f6f69e9bc8574bd" +dependencies = [ + "fastrand", + "getrandom 0.4.2", + "once_cell", + "rustix", + "windows-sys", +] + +[[package]] +name = "thiserror" +version = "2.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4288b5bcbc7920c07a1149a35cf9590a2aa808e0bc1eafaade0b80947865fbc4" +dependencies = [ + "thiserror-impl", +] + +[[package]] +name = "thiserror-impl" +version = "2.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ebc4ee7f67670e9b64d05fa4253e753e016c6c95ff35b89b7941d6b856dec1d5" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "thrift" +version = "0.17.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7e54bc85fc7faa8bc175c4bab5b92ba8d9a3ce893d0e9f42cc455c8ab16a9e09" +dependencies = [ + "byteorder", + "integer-encoding", + "ordered-float", +] + +[[package]] +name = "tiny-keccak" +version = "2.0.2" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "2c9d3793400a45f954c52e73d068316d76b6f4e36977e3fcebb13a2721e80237" +dependencies = [ + "crunchy", +] + +[[package]] +name = "tinystr" +version = "0.8.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c8323304221c2a851516f22236c5722a72eaa19749016521d6dff0824447d96d" +dependencies = [ + "displaydoc", + "zerovec", +] + +[[package]] +name = "tokio" +version = "1.52.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8fc7f01b389ac15039e4dc9531aa973a135d7a4135281b12d7c1bc79fd57fffe" +dependencies = [ + "bytes", + "libc", + "mio", + "pin-project-lite", + "socket2", + "tokio-macros", + "windows-sys", +] + +[[package]] +name = "tokio-macros" +version = "2.7.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "385a6cb71ab9ab790c5fe8d67f1645e6c450a7ce006a33de03daa956cf70a496" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "tokio-stream" +version = "0.1.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "32da49809aab5c3bc678af03902d4ccddea2a87d028d86392a4b1560c6906c70" +dependencies = [ + "futures-core", + "pin-project-lite", + "tokio", + "tokio-util", +] + +[[package]] +name = "tokio-util" +version = "0.7.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9ae9cec805b01e8fc3fd2fe289f89149a9b66dd16786abd8b19cfa7b48cb0098" +dependencies = [ + "bytes", + "futures-core", + "futures-sink", + "pin-project-lite", + "tokio", +] + +[[package]] +name = "tracing" +version = "0.1.44" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "63e71662fa4b2a2c3a26f570f037eb95bb1f85397f3cd8076caed2f026a6d100" +dependencies = [ + "pin-project-lite", + "tracing-attributes", + "tracing-core", +] + +[[package]] +name = "tracing-attributes" +version = "0.1.31" +source = "registry+https://github.com/rust-lang/crates.io-index" 
+checksum = "7490cfa5ec963746568740651ac6781f701c9c5ea257c58e057f3ba8cf69e8da" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "tracing-core" +version = "0.1.36" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "db97caf9d906fbde555dd62fa95ddba9eecfd14cb388e4f491a66d74cd5fb79a" +dependencies = [ + "once_cell", +] + +[[package]] +name = "twox-hash" +version = "2.1.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9ea3136b675547379c4bd395ca6b938e5ad3c3d20fad76e7fe85f9e0d011419c" + +[[package]] +name = "typenum" +version = "1.20.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b6f5e870be6c3b371b77fe0ee0bafb859fa4964b4404c27de1d380043c4dda20" + +[[package]] +name = "typify" +version = "0.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e6d5bcc6f62eb1fa8aa4098f39b29f93dcb914e17158b76c50360911257aa629" +dependencies = [ + "typify-impl", + "typify-macro", +] + +[[package]] +name = "typify-impl" +version = "0.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a1eb359f7ffa4f9ebe947fa11a1b2da054564502968db5f317b7e37693cb2240" +dependencies = [ + "heck", + "log", + "proc-macro2", + "quote", + "regress", + "schemars", + "semver", + "serde", + "serde_json", + "syn", + "thiserror", + "unicode-ident", +] + +[[package]] +name = "typify-macro" +version = "0.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "911c32f3c8514b048c1b228361bebb5e6d73aeec01696e8cc0e82e2ffef8ab7a" +dependencies = [ + "proc-macro2", + "quote", + "schemars", + "semver", + "serde", + "serde_json", + "serde_tokenstream", + "syn", + "typify-impl", +] + +[[package]] +name = "unicode-ident" +version = "1.0.24" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75" + +[[package]] +name = 
"unicode-segmentation" +version = "1.13.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c6f5d3c3b1bf09027a88a6bc961fc00497d651009560b5463668dc81b0fa87a8" + +[[package]] +name = "unicode-width" +version = "0.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b4ac048d71ede7ee76d585517add45da530660ef4390e49b098733c6e897f254" + +[[package]] +name = "unicode-xid" +version = "0.2.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ebc1c04c71510c7f702b52b7c350734c9ff1295c464a03335b00bb84fc54f853" + +[[package]] +name = "unsafe-libyaml" +version = "0.2.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "673aac59facbab8a9007c7f6108d11f63b603f7cabff99fabf650fea5c32b861" + +[[package]] +name = "url" +version = "2.5.8" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ff67a8a4397373c3ef660812acab3268222035010ab8680ec4215f38ba3d0eed" +dependencies = [ + "form_urlencoded", + "idna", + "percent-encoding", + "serde", +] + +[[package]] +name = "utf8_iter" +version = "1.0.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b6c140620e7ffbb22c2dee59cafe6084a59b5ffc27a8859a5f0d494b5d52b6be" + +[[package]] +name = "uuid" +version = "1.23.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d258b83ceec21034727ecee8c382cfa6c3e133699b0742c64571814fb420c9f7" +dependencies = [ + "getrandom 0.4.2", + "js-sys", + "wasm-bindgen", +] + +[[package]] +name = "version_check" +version = "0.9.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a" + +[[package]] +name = "walkdir" +version = "2.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "29790946404f91d9c5d06f9874efddea1dc06c5efe94541a7d6863108e3a5e4b" +dependencies = [ + "same-file", + "winapi-util", +] + 
+[[package]] +name = "wasi" +version = "0.11.1+wasi-snapshot-preview1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ccf3ec651a847eb01de73ccad15eb7d99f80485de043efb2f370cd654f4ea44b" + +[[package]] +name = "wasip2" +version = "1.0.3+wasi-0.2.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "20064672db26d7cdc89c7798c48a0fdfac8213434a1186e5ef29fd560ae223d6" +dependencies = [ + "wit-bindgen 0.57.1", +] + +[[package]] +name = "wasip3" +version = "0.4.0+wasi-0.3.0-rc-2026-01-06" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5428f8bf88ea5ddc08faddef2ac4a67e390b88186c703ce6dbd955e1c145aca5" +dependencies = [ + "wit-bindgen 0.51.0", +] + +[[package]] +name = "wasm-bindgen" +version = "0.2.122" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3ed04576f974d2b2fba0f38c51dbc5518011e38c36bf1143164be765528fd409" +dependencies = [ + "cfg-if", + "once_cell", + "rustversion", + "wasm-bindgen-macro", + "wasm-bindgen-shared", +] + +[[package]] +name = "wasm-bindgen-futures" +version = "0.4.72" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9473dbd2991ae90b6291c3c32c30c6187ac49aa32f9905d1cce280ec1e110b0f" +dependencies = [ + "js-sys", + "wasm-bindgen", +] + +[[package]] +name = "wasm-bindgen-macro" +version = "0.2.122" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "916151b09da36bd82f6615cbf3a419e2f0ba23a03c6160e8e92eb6bd4aa1dec6" +dependencies = [ + "quote", + "wasm-bindgen-macro-support", +] + +[[package]] +name = "wasm-bindgen-macro-support" +version = "0.2.122" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "299047362ccbfce148b67ab7e73349f77748e00c8296f9542adfad2ad82c5c5e" +dependencies = [ + "bumpalo", + "proc-macro2", + "quote", + "syn", + "wasm-bindgen-shared", +] + +[[package]] +name = "wasm-bindgen-shared" +version = "0.2.122" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "9a929b2c61f11ba3e9bc35b50c1f25cb38e0e892c0c231ae2b8cf78d5dad4437" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "wasm-encoder" +version = "0.244.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "990065f2fe63003fe337b932cfb5e3b80e0b4d0f5ff650e6985b1048f62c8319" +dependencies = [ + "leb128fmt", + "wasmparser", +] + +[[package]] +name = "wasm-metadata" +version = "0.244.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "bb0e353e6a2fbdc176932bbaab493762eb1255a7900fe0fea1a2f96c296cc909" +dependencies = [ + "anyhow", + "indexmap", + "wasm-encoder", + "wasmparser", +] + +[[package]] +name = "wasmparser" +version = "0.244.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "47b807c72e1bac69382b3a6fb3dbe8ea4c0ed87ff5629b8685ae6b9a611028fe" +dependencies = [ + "bitflags", + "hashbrown 0.15.5", + "indexmap", + "semver", +] + +[[package]] +name = "web-time" +version = "1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5a6580f308b1fad9207618087a65c04e7a10bc77e02c8e84e9b00dd4b12fa0bb" +dependencies = [ + "js-sys", + "wasm-bindgen", +] + +[[package]] +name = "winapi-util" +version = "0.1.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c2a7b1c03c876122aa43f3020e6c3c3ee5c05081c9a00739faf7503aeba10d22" +dependencies = [ + "windows-sys", +] + +[[package]] +name = "windows-core" +version = "0.62.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b8e83a14d34d0623b51dce9581199302a221863196a1dde71a7663a4c2be9deb" +dependencies = [ + "windows-implement", + "windows-interface", + "windows-link", + "windows-result", + "windows-strings", +] + +[[package]] +name = "windows-implement" +version = "0.60.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"053e2e040ab57b9dc951b72c264860db7eb3b0200ba345b4e4c3b14f67855ddf" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "windows-interface" +version = "0.59.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3f316c4a2570ba26bbec722032c4099d8c8bc095efccdc15688708623367e358" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "windows-link" +version = "0.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f0805222e57f7521d6a62e36fa9163bc891acd422f971defe97d64e70d0a4fe5" + +[[package]] +name = "windows-result" +version = "0.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7781fa89eaf60850ac3d2da7af8e5242a5ea78d1a11c49bf2910bb5a73853eb5" +dependencies = [ + "windows-link", +] + +[[package]] +name = "windows-strings" +version = "0.5.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7837d08f69c77cf6b07689544538e017c1bfcf57e34b4c0ff58e6c2cd3b37091" +dependencies = [ + "windows-link", +] + +[[package]] +name = "windows-sys" +version = "0.61.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ae137229bcbd6cdf0f7b80a31df61766145077ddf49416a728b02cb3921ff3fc" +dependencies = [ + "windows-link", +] + +[[package]] +name = "wit-bindgen" +version = "0.51.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d7249219f66ced02969388cf2bb044a09756a083d0fab1e566056b04d9fbcaa5" +dependencies = [ + "wit-bindgen-rust-macro", +] + +[[package]] +name = "wit-bindgen" +version = "0.57.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1ebf944e87a7c253233ad6766e082e3cd714b5d03812acc24c318f549614536e" + +[[package]] +name = "wit-bindgen-core" +version = "0.51.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ea61de684c3ea68cb082b7a88508a8b27fcc8b797d738bfc99a82facf1d752dc" +dependencies = [ 
+ "anyhow", + "heck", + "wit-parser", +] + +[[package]] +name = "wit-bindgen-rust" +version = "0.51.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b7c566e0f4b284dd6561c786d9cb0142da491f46a9fbed79ea69cdad5db17f21" +dependencies = [ + "anyhow", + "heck", + "indexmap", + "prettyplease", + "syn", + "wasm-metadata", + "wit-bindgen-core", + "wit-component", +] + +[[package]] +name = "wit-bindgen-rust-macro" +version = "0.51.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0c0f9bfd77e6a48eccf51359e3ae77140a7f50b1e2ebfe62422d8afdaffab17a" +dependencies = [ + "anyhow", + "prettyplease", + "proc-macro2", + "quote", + "syn", + "wit-bindgen-core", + "wit-bindgen-rust", +] + +[[package]] +name = "wit-component" +version = "0.244.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9d66ea20e9553b30172b5e831994e35fbde2d165325bec84fc43dbf6f4eb9cb2" +dependencies = [ + "anyhow", + "bitflags", + "indexmap", + "log", + "serde", + "serde_derive", + "serde_json", + "wasm-encoder", + "wasm-metadata", + "wasmparser", + "wit-parser", +] + +[[package]] +name = "wit-parser" +version = "0.244.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ecc8ac4bc1dc3381b7f59c34f00b67e18f910c2c0f50015669dde7def656a736" +dependencies = [ + "anyhow", + "id-arena", + "indexmap", + "log", + "semver", + "serde", + "serde_derive", + "serde_json", + "unicode-xid", + "wasmparser", +] + +[[package]] +name = "writeable" +version = "0.6.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1ffae5123b2d3fc086436f8834ae3ab053a283cfac8fe0a0b8eaae044768a4c4" + +[[package]] +name = "xxhash-rust" +version = "0.8.15" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "fdd20c5420375476fbd4394763288da7eb0cc0b8c11deed431a91562af7335d3" + +[[package]] +name = "yoke" +version = "0.8.3" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "709fe23a0424b6a435d82152b1bd3fdfb0833487d5fa90d05d42762a9891fef5" +dependencies = [ + "stable_deref_trait", + "yoke-derive", + "zerofrom", +] + +[[package]] +name = "yoke-derive" +version = "0.8.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "de844c262c8848816172cef550288e7dc6c7b7814b4ee56b3e1553f275f1858e" +dependencies = [ + "proc-macro2", + "quote", + "syn", + "synstructure", +] + +[[package]] +name = "zerocopy" +version = "0.8.50" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3b065d4f0e55f82fae73202e189638116a87c55ab6b8e6c2721e13dd9d854ad1" +dependencies = [ + "zerocopy-derive", +] + +[[package]] +name = "zerocopy-derive" +version = "0.8.50" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0b631b19d36a892ab55420c92dbc83ccd79274f25be714855d3074aa71cab639" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "zerofrom" +version = "0.1.8" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0ec05a11813ea801ff6d75110ad09cd0824ddba17dfe17128ea0d5f68e6c5272" +dependencies = [ + "zerofrom-derive", +] + +[[package]] +name = "zerofrom-derive" +version = "0.1.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "11532158c46691caf0f2593ea8358fed6bbf68a0315e80aae9bd41fbade684a1" +dependencies = [ + "proc-macro2", + "quote", + "syn", + "synstructure", +] + +[[package]] +name = "zerotrie" +version = "0.2.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0f9152d31db0792fa83f70fb2f83148effb5c1f5b8c7686c3459e361d9bc20bf" +dependencies = [ + "displaydoc", + "yoke", + "zerofrom", +] + +[[package]] +name = "zerovec" +version = "0.11.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "90f911cbc359ab6af17377d242225f4d75119aec87ea711a880987b18cd7b239" +dependencies = [ + "yoke", 
+ "zerofrom", + "zerovec-derive", +] + +[[package]] +name = "zerovec-derive" +version = "0.11.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "625dc425cab0dca6dc3c3319506e6593dcb08a9f387ea3b284dbd52a92c40555" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "zlib-rs" +version = "0.6.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3be3d40e40a133f9c916ee3f9f4fa2d9d63435b5fbe1bfc6d9dae0aa0ada1513" + +[[package]] +name = "zmij" +version = "1.0.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa" + +[[package]] +name = "zstd" +version = "0.13.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e91ee311a569c327171651566e07972200e76fcfe2242a4fa446149a3881c08a" +dependencies = [ + "zstd-safe", +] + +[[package]] +name = "zstd-safe" +version = "7.2.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8f49c4d5f0abb602a93fb8736af2a4f4dd9512e36f7f570d66e65ff867ed3b9d" +dependencies = [ + "zstd-sys", +] + +[[package]] +name = "zstd-sys" +version = "2.0.16+zstd.1.5.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "91e19ebc2adc8f83e43039e79776e3fda8ca919132d68a1fed6a5faca2683748" +dependencies = [ + "cc", + "pkg-config", +] +""" diff --git a/examples/advanced_retail_query_blocks/incan.toml b/examples/advanced_retail_query_blocks/incan.toml new file mode 100644 index 0000000..7215eb7 --- /dev/null +++ b/examples/advanced_retail_query_blocks/incan.toml @@ -0,0 +1,9 @@ +[project] +name = "advanced_retail_query_blocks" +version = "0.1.0" + +[dependencies] +inql = { path = "../.." 
} + +[project.scripts] +main = "src/main.incn" diff --git a/examples/advanced_retail_query_blocks/src/main.incn b/examples/advanced_retail_query_blocks/src/main.incn new file mode 100644 index 0000000..09f03ab --- /dev/null +++ b/examples/advanced_retail_query_blocks/src/main.incn @@ -0,0 +1,251 @@ +""" +Advanced retail analytics over InQL's dependency-activated query-block surface. + +Run from this directory with: + + incan run src/main.incn +""" + +import pub::inql +from pub::inql import ( + DataFrame, + LazyFrame, + Session, + SessionError, + array, + array_distinct, + avg, + check_json, + col, + count, + count_distinct, + date_part, + desc, + eq, + explode, + gt, + json_array_length, + json_extract_path_text, + lower, + max, + min, + modulo, + mul, + parse_url, + regexp_extract, + regexp_like, + round, + row_number, + sha256, + sub, + sum, + trim, + upper, + window, +) + + +@derive(Clone) +pub model RetailOrder: + pub order_id: int + pub customer_id: str + pub region: str + pub status: str + pub quantity: int + pub unit_price: float + pub discount_pct: float + pub created_at: str + pub product_url: str + pub event_json: str + + +@derive(Clone) +pub model RetailHighValueOrder: + pub order_id: int + pub customer_id: str + pub region_norm: str + pub gross_amount: float + pub net_amount: float + pub order_bucket: int + pub order_year: int + pub customer_hash: str + pub json_valid: bool + pub event_type: str + pub channel: str + pub tag_count: int + pub campaign: str + pub product_page: str + pub sku_family: str + pub status_looks_clean: bool + + +@derive(Clone) +pub model RetailRollup: + pub region_norm: str + pub channel: str + pub total_net_amount: float + pub avg_net_amount: float + pub min_net_amount: float + pub max_net_amount: float + pub order_count: int + pub customer_count: int + + +@derive(Clone) +pub model RetailGeneratedTag: + pub order_id: int + pub customer_id: str + pub status_norm: str + pub campaign: str + pub channel: str + pub 
customer_order_rank: int + pub derived_tag: str + + +const ADVANCED_RETAIL_CSV_FIXTURE: str = "../../tests/fixtures/advanced_retail_orders.csv" + + +def main() -> None: + mut session = Session.default() + match run_query_block_spike(session): + Ok(_) => println("advanced retail query-block spike complete") + Err(err) => println(err.error_message()) + + +def run_query_block_spike(mut session: Session) -> Result[None, SessionError]: + # Load a typed CSV once, then reuse the same lazy source across separate query-block pipelines. + orders: LazyFrame[RetailOrder] = session.read_csv("advanced_retail_orders_query", ADVANCED_RETAIL_CSV_FIXTURE)? + + _print_collected(session.clone(), "query-block high-value orders", high_value_orders(orders.clone()))? + _print_collected(session.clone(), "query-block paid rollup by region and channel", paid_rollup(orders.clone()))? + _print_collected( + session.clone(), + "query-block generated tags with customer order rank", + generated_ranked_tags(orders), + )? + return Ok(None) + + +def high_value_orders(orders: LazyFrame[RetailOrder]) -> LazyFrame[RetailHighValueOrder]: + # Do row-level scalar cleanup in the fluent API before the relational query block. 
+ enriched = orders + .with_column("region_norm", upper(trim(col("region")))) + .with_column("status_norm", lower(trim(col("status")))) + .with_column("gross_amount", round(mul(col("quantity"), col("unit_price")), 2)) + .with_column( + "net_amount", + round(mul(round(mul(col("quantity"), col("unit_price")), 2), sub(1.0, col("discount_pct"))), 2), + ) + .with_column("order_bucket", modulo(col("order_id"), 10)) + .with_column("order_year", date_part("year", col("created_at"))) + .with_column("customer_hash", sha256(col("customer_id"))) + .with_column("json_valid", check_json(col("event_json"))) + .with_column("event_type", json_extract_path_text(col("event_json"), "$.type")) + .with_column("channel", json_extract_path_text(col("event_json"), "$.channel")) + .with_column("tag_count", json_array_length(json_extract_path_text(col("event_json"), "$.tags"))) + .with_column("campaign", parse_url(col("product_url"), "campaign")) + .with_column("product_page", parse_url(col("product_url"), "page")) + .with_column("sku_family", regexp_extract(parse_url(col("product_url"), "sku"), "^SKU-([A-Z]+)", 1)) + .with_column("status_looks_clean", regexp_like(col("status_norm"), "^[a-z]+$")) + paid = enriched.filter(eq(col("status_norm"), "paid")) + # The query block can reference ordinary local Incan values, so `paid` stays a normal lazy frame binding. 
+ return query { + FROM paid + SELECT + .order_id as order_id + .customer_id as customer_id + .region_norm as region_norm + .gross_amount as gross_amount + .net_amount as net_amount + .order_bucket as order_bucket + .order_year as order_year + .customer_hash as customer_hash + .json_valid as json_valid + .event_type as event_type + .channel as channel + .tag_count as tag_count + .campaign as campaign + .product_page as product_page + .sku_family as sku_family + .status_looks_clean as status_looks_clean + WHERE .net_amount > 100 + ORDER BY desc(.net_amount) + LIMIT 8 + } + + +def paid_rollup(orders: LazyFrame[RetailOrder]) -> LazyFrame[RetailRollup]: + enriched = orders + .with_column("region_norm", upper(trim(col("region")))) + .with_column("status_norm", lower(trim(col("status")))) + .with_column("channel", json_extract_path_text(col("event_json"), "$.channel")) + .with_column( + "net_amount", + round(mul(round(mul(col("quantity"), col("unit_price")), 2), sub(1.0, col("discount_pct"))), 2), + ) + # Keep reusable derivations outside the aggregate query so the grouping block stays focused on relational shape. + return query { + FROM enriched + WHERE eq(.status_norm, "paid") + GROUP BY + .region_norm, + .channel + SELECT + # Aggregate aliases define the typed output model fields. + .region_norm as region_norm + .channel as channel + sum(.net_amount) as total_net_amount + avg(.net_amount) as avg_net_amount + min(.net_amount) as min_net_amount + max(.net_amount) as max_net_amount + count() as order_count + count_distinct(.customer_id) as customer_count + ORDER BY desc(.total_net_amount) + } + + +def generated_ranked_tags(orders: LazyFrame[RetailOrder]) -> LazyFrame[RetailGeneratedTag]: + # This block demonstrates generator and window composition inside the query vocabulary. + return query { + FROM orders + WHERE eq(check_json(.event_json), true) + EXPLODE + # Build derived tags from scalar helpers, then expand them into one row per tag. 
+ array_distinct( + array( + [lower(trim(.status)), parse_url(.product_url, "campaign"), json_extract_path_text( + .event_json, + "$.channel", + )], + ), + ) as derived_tag + WINDOW BY + # Window output is available to the later SELECT list like any other derived column. + customer_order_rank = row_number().over(window().partition_by([.customer_id]).order_by([desc(.unit_price)])) + SELECT + .order_id as order_id + .customer_id as customer_id + lower(trim(.status)) as status_norm + parse_url(.product_url, "campaign") as campaign + json_extract_path_text(.event_json, "$.channel") as channel + .customer_order_rank as customer_order_rank + .derived_tag as derived_tag + ORDER BY + .customer_id, + .customer_order_rank + LIMIT 12 + } + + +def _print_collected[T with Clone](mut session: Session, label: str, frame: LazyFrame[T]) -> Result[None, SessionError]: + df = session.collect(frame)? + _print_data_frame(label, df) + return Ok(None) + + +def _print_data_frame[T with Clone](label: str, df: DataFrame[T]) -> None: + println("") + println(label) + println(f"columns: {df.resolved_columns():?}") + println(f"rows: {df.row_count()}") + println(df.preview_text()) diff --git a/examples/session_read_transform_write_order_lines_csv.incn b/examples/session_read_transform_write_order_lines_csv.incn index 0c8b9ad..6b7a953 100644 --- a/examples/session_read_transform_write_order_lines_csv.incn +++ b/examples/session_read_transform_write_order_lines_csv.incn @@ -33,10 +33,10 @@ def _read_transform_collect_display_write( def open_eur_lines_with_discount(lines: LazyFrame[OrderLine]) -> LazyFrame[OrderLine]: - return lines.filter(eq(col("status"), lit("open"))).filter(eq(col("currency"), lit("EUR"))).with_column( - "discounted_unit_price", - mul(col("unit_price"), lit(0.9)), - ) + return lines + .filter(eq(col("status"), lit("open"))) + .filter(eq(col("currency"), lit("EUR"))) + .with_column("discounted_unit_price", mul(col("unit_price"), lit(0.9))) def print_lazy_schema(label: str, 
frame: LazyFrame[OrderLine]) -> None: diff --git a/src/window_builders.incn b/src/window_builders.incn index 7598154..e525df4 100644 --- a/src/window_builders.incn +++ b/src/window_builders.incn @@ -406,10 +406,10 @@ module tests: def _call_ranking_null_treatment() -> None: row_number().ignore_nulls() def test_window_spec_builders_preserve_partition_order_and_frame() -> None: - spec = (window().partition_by([col("customer_id")]).order_by([col("amount")]).rows_between( - unbounded_preceding(), - current_row(), - )) + spec = (window() + .partition_by([col("customer_id")]) + .order_by([col("amount")]) + .rows_between(unbounded_preceding(), current_row())) assert len(spec.partition_columns) == 1 assert column_expr_name(spec.partition_columns[0]) == "customer_id" assert len(spec.sort_columns) == 1 diff --git a/tests/fixtures/advanced_retail_orders.csv b/tests/fixtures/advanced_retail_orders.csv new file mode 100644 index 0000000..93031ae --- /dev/null +++ b/tests/fixtures/advanced_retail_orders.csv @@ -0,0 +1,101 @@ +order_id,customer_id,region,status,quantity,unit_price,discount_pct,created_at,product_url,event_json +1001,C001,North,paid,2,79.99,0.10,2026-05-01T09:15:00,https://shop.example/products?campaign=spring&page=1&sku=SKU-CHAIR-01,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""vip""]}" +1002,C002,South,open,1,249.50,0.00,2026-05-01T10:05:00,https://shop.example/products?campaign=spring&page=1&sku=SKU-DESK-02,"{""type"":""view"",""channel"":""web"",""tags"":[""open"",""research""]}" +1003,C003,East,paid,4,19.95,0.05,2026-05-01T10:30:00,https://shop.example/products?campaign=spring&page=2&sku=SKU-LAMP-11,"{""type"":""checkout"",""channel"":""app"",""tags"":[""paid"",""mobile""]}" +1004,C004,West,cancelled,3,34.25,0.00,2026-05-01T11:10:00,https://shop.example/products?campaign=spring&page=2&sku=SKU-MAT-07,"{""type"":""refund"",""channel"":""web"",""tags"":[""cancelled"",""support""]}" 
+1005,C005,North,paid,2,199.00,0.15,2026-05-01T12:20:00,https://shop.example/products?campaign=summer&page=3&sku=SKU-MONITOR-24,"{""type"":""checkout"",""channel"":""partner"",""tags"":[""paid"",""b2b""]}" +1006,C006,South,paid,5,15.75,0.00,2026-05-01T13:00:00,https://shop.example/products?campaign=summer&page=3&sku=SKU-CABLE-03,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""accessory""]}" +1007,C007,East,open,1,399.00,0.05,2026-05-01T14:40:00,https://shop.example/products?campaign=summer&page=4&sku=SKU-PHONE-08,"{""type"":""cart"",""channel"":""app"",""tags"":[""open"",""mobile""]}" +1008,C008,West,paid,3,64.25,0.10,2026-05-01T15:15:00,https://shop.example/products?campaign=summer&page=4&sku=SKU-CASE-10,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""bundle""]}" +1009,C009,North,paid,1,129.00,0.00,2026-05-02T09:00:00,https://shop.example/products?campaign=retention&page=5&sku=SKU-ROUTER-04,"{""type"":""checkout"",""channel"":""email"",""tags"":[""paid"",""retention""]}" +1010,C010,South,cancelled,2,99.00,0.05,2026-05-02T09:45:00,https://shop.example/products?campaign=retention&page=5&sku=SKU-PRINTER-09,"{""type"":""refund"",""channel"":""web"",""tags"":[""cancelled"",""retention""]}" +1011,C011,East,paid,6,11.50,0.00,2026-05-02T10:10:00,https://shop.example/products?campaign=retention&page=6&sku=SKU-PAPER-01,"{""type"":""checkout"",""channel"":""partner"",""tags"":[""paid"",""office""]}" +1012,C012,West,open,2,149.99,0.20,2026-05-02T11:35:00,https://shop.example/products?campaign=retention&page=6&sku=SKU-TABLET-05,"{""type"":""view"",""channel"":""app"",""tags"":[""open"",""premium""]}" +1013,C013,North,paid,1,899.00,0.10,2026-05-02T12:25:00,https://shop.example/products?campaign=enterprise&page=7&sku=SKU-LAPTOP-14,"{""type"":""checkout"",""channel"":""partner"",""tags"":[""paid"",""enterprise""]}" 
+1014,C014,South,paid,2,44.00,0.00,2026-05-02T13:15:00,https://shop.example/products?campaign=enterprise&page=7&sku=SKU-MOUSE-06,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""office""]}" +1015,C015,East,open,1,59.95,0.05,2026-05-02T14:05:00,https://shop.example/products?campaign=enterprise&page=8&sku=SKU-KEYBOARD-12,"{""type"":""cart"",""channel"":""web"",""tags"":[""open"",""office""]}" +1016,C016,West,paid,4,22.50,0.00,2026-05-02T15:00:00,https://shop.example/products?campaign=enterprise&page=8&sku=SKU-NOTEBOOK-02,"{""type"":""checkout"",""channel"":""email"",""tags"":[""paid"",""school""]}" +1017,C017,North,cancelled,1,549.00,0.00,2026-05-03T09:20:00,https://shop.example/products?campaign=spring&page=9&sku=SKU-CAMERA-17,"{""type"":""refund"",""channel"":""partner"",""tags"":[""cancelled"",""premium""]}" +1018,C018,South,paid,3,72.00,0.10,2026-05-03T10:00:00,https://shop.example/products?campaign=spring&page=9&sku=SKU-TRIPOD-19,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""photo""]}" +1019,C019,East,paid,2,310.00,0.15,2026-05-03T11:30:00,https://shop.example/products?campaign=summer&page=10&sku=SKU-WATCH-20,"{""type"":""checkout"",""channel"":""app"",""tags"":[""paid"",""wearable""]}" +1020,C020,West,open,5,18.00,0.00,2026-05-03T12:45:00,https://shop.example/products?campaign=summer&page=10&sku=SKU-BAND-21,"{""type"":""view"",""channel"":""app"",""tags"":[""open"",""wearable""]}" +1021,C001,North,paid,3,88.00,0.05,2026-05-03T13:20:00,https://shop.example/products?campaign=retention&page=11&sku=SKU-SHELF-15,"{""type"":""checkout"",""channel"":""email"",""tags"":[""paid"",""home""]}" +1022,C002,South,paid,1,179.00,0.10,2026-05-03T14:10:00,https://shop.example/products?campaign=retention&page=11&sku=SKU-CABINET-18,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""home""]}" 
+1023,C003,East,cancelled,2,27.50,0.00,2026-05-03T15:05:00,https://shop.example/products?campaign=enterprise&page=12&sku=SKU-PEN-22,"{""type"":""refund"",""channel"":""partner"",""tags"":[""cancelled"",""office""]}" +1024,C004,West,paid,8,8.75,0.00,2026-05-03T16:00:00,https://shop.example/products?campaign=enterprise&page=12&sku=SKU-STICKER-23,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""promo""]}" +1025,C005,North,open,1,699.00,0.20,2026-05-03T17:30:00,https://shop.example/products?campaign=spring&page=13&sku=SKU-TV-25,"{""type"":""cart"",""channel"":""web"",""tags"":[""open"",""premium""]}" +1026,C006,South,paid,2,79.99,0.10,2026-05-04T09:15:00,https://shop.example/products?campaign=spring&page=1&sku=SKU-CHAIR-01,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""vip""]}" +1027,C007,East,open,1,249.50,0.00,2026-05-04T10:05:00,https://shop.example/products?campaign=spring&page=1&sku=SKU-DESK-02,"{""type"":""view"",""channel"":""web"",""tags"":[""open"",""research""]}" +1028,C008,West,paid,4,19.95,0.05,2026-05-04T10:30:00,https://shop.example/products?campaign=spring&page=2&sku=SKU-LAMP-11,"{""type"":""checkout"",""channel"":""app"",""tags"":[""paid"",""mobile""]}" +1029,C009,North,cancelled,3,34.25,0.00,2026-05-04T11:10:00,https://shop.example/products?campaign=spring&page=2&sku=SKU-MAT-07,"{""type"":""refund"",""channel"":""web"",""tags"":[""cancelled"",""support""]}" +1030,C010,South,paid,2,199.00,0.15,2026-05-04T12:20:00,https://shop.example/products?campaign=summer&page=3&sku=SKU-MONITOR-24,"{""type"":""checkout"",""channel"":""partner"",""tags"":[""paid"",""b2b""]}" +1031,C011,East,paid,5,15.75,0.00,2026-05-04T13:00:00,https://shop.example/products?campaign=summer&page=3&sku=SKU-CABLE-03,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""accessory""]}" 
+1032,C012,West,open,1,399.00,0.05,2026-05-04T14:40:00,https://shop.example/products?campaign=summer&page=4&sku=SKU-PHONE-08,"{""type"":""cart"",""channel"":""app"",""tags"":[""open"",""mobile""]}" +1033,C013,North,paid,3,64.25,0.10,2026-05-04T15:15:00,https://shop.example/products?campaign=summer&page=4&sku=SKU-CASE-10,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""bundle""]}" +1034,C014,South,paid,1,129.00,0.00,2026-05-05T09:00:00,https://shop.example/products?campaign=retention&page=5&sku=SKU-ROUTER-04,"{""type"":""checkout"",""channel"":""email"",""tags"":[""paid"",""retention""]}" +1035,C015,East,cancelled,2,99.00,0.05,2026-05-05T09:45:00,https://shop.example/products?campaign=retention&page=5&sku=SKU-PRINTER-09,"{""type"":""refund"",""channel"":""web"",""tags"":[""cancelled"",""retention""]}" +1036,C016,West,paid,6,11.50,0.00,2026-05-05T10:10:00,https://shop.example/products?campaign=retention&page=6&sku=SKU-PAPER-01,"{""type"":""checkout"",""channel"":""partner"",""tags"":[""paid"",""office""]}" +1037,C017,North,open,2,149.99,0.20,2026-05-05T11:35:00,https://shop.example/products?campaign=retention&page=6&sku=SKU-TABLET-05,"{""type"":""view"",""channel"":""app"",""tags"":[""open"",""premium""]}" +1038,C018,South,paid,1,899.00,0.10,2026-05-05T12:25:00,https://shop.example/products?campaign=enterprise&page=7&sku=SKU-LAPTOP-14,"{""type"":""checkout"",""channel"":""partner"",""tags"":[""paid"",""enterprise""]}" +1039,C019,East,paid,2,44.00,0.00,2026-05-05T13:15:00,https://shop.example/products?campaign=enterprise&page=7&sku=SKU-MOUSE-06,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""office""]}" +1040,C020,West,open,1,59.95,0.05,2026-05-05T14:05:00,https://shop.example/products?campaign=enterprise&page=8&sku=SKU-KEYBOARD-12,"{""type"":""cart"",""channel"":""web"",""tags"":[""open"",""office""]}" 
+1041,C001,North,paid,4,22.50,0.00,2026-05-05T15:00:00,https://shop.example/products?campaign=enterprise&page=8&sku=SKU-NOTEBOOK-02,"{""type"":""checkout"",""channel"":""email"",""tags"":[""paid"",""school""]}" +1042,C002,South,cancelled,1,549.00,0.00,2026-05-06T09:20:00,https://shop.example/products?campaign=spring&page=9&sku=SKU-CAMERA-17,"{""type"":""refund"",""channel"":""partner"",""tags"":[""cancelled"",""premium""]}" +1043,C003,East,paid,3,72.00,0.10,2026-05-06T10:00:00,https://shop.example/products?campaign=spring&page=9&sku=SKU-TRIPOD-19,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""photo""]}" +1044,C004,West,paid,2,310.00,0.15,2026-05-06T11:30:00,https://shop.example/products?campaign=summer&page=10&sku=SKU-WATCH-20,"{""type"":""checkout"",""channel"":""app"",""tags"":[""paid"",""wearable""]}" +1045,C005,North,open,5,18.00,0.00,2026-05-06T12:45:00,https://shop.example/products?campaign=summer&page=10&sku=SKU-BAND-21,"{""type"":""view"",""channel"":""app"",""tags"":[""open"",""wearable""]}" +1046,C006,South,paid,3,88.00,0.05,2026-05-06T13:20:00,https://shop.example/products?campaign=retention&page=11&sku=SKU-SHELF-15,"{""type"":""checkout"",""channel"":""email"",""tags"":[""paid"",""home""]}" +1047,C007,East,paid,1,179.00,0.10,2026-05-06T14:10:00,https://shop.example/products?campaign=retention&page=11&sku=SKU-CABINET-18,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""home""]}" +1048,C008,West,cancelled,2,27.50,0.00,2026-05-06T15:05:00,https://shop.example/products?campaign=enterprise&page=12&sku=SKU-PEN-22,"{""type"":""refund"",""channel"":""partner"",""tags"":[""cancelled"",""office""]}" +1049,C009,North,paid,8,8.75,0.00,2026-05-06T16:00:00,https://shop.example/products?campaign=enterprise&page=12&sku=SKU-STICKER-23,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""promo""]}" 
+1050,C010,South,open,1,699.00,0.20,2026-05-06T17:30:00,https://shop.example/products?campaign=spring&page=13&sku=SKU-TV-25,"{""type"":""cart"",""channel"":""web"",""tags"":[""open"",""premium""]}" +1051,C011,East,paid,2,79.99,0.10,2026-05-07T09:15:00,https://shop.example/products?campaign=spring&page=1&sku=SKU-CHAIR-01,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""vip""]}" +1052,C012,West,open,1,249.50,0.00,2026-05-07T10:05:00,https://shop.example/products?campaign=spring&page=1&sku=SKU-DESK-02,"{""type"":""view"",""channel"":""web"",""tags"":[""open"",""research""]}" +1053,C013,North,paid,4,19.95,0.05,2026-05-07T10:30:00,https://shop.example/products?campaign=spring&page=2&sku=SKU-LAMP-11,"{""type"":""checkout"",""channel"":""app"",""tags"":[""paid"",""mobile""]}" +1054,C014,South,cancelled,3,34.25,0.00,2026-05-07T11:10:00,https://shop.example/products?campaign=spring&page=2&sku=SKU-MAT-07,"{""type"":""refund"",""channel"":""web"",""tags"":[""cancelled"",""support""]}" +1055,C015,East,paid,2,199.00,0.15,2026-05-07T12:20:00,https://shop.example/products?campaign=summer&page=3&sku=SKU-MONITOR-24,"{""type"":""checkout"",""channel"":""partner"",""tags"":[""paid"",""b2b""]}" +1056,C016,West,paid,5,15.75,0.00,2026-05-07T13:00:00,https://shop.example/products?campaign=summer&page=3&sku=SKU-CABLE-03,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""accessory""]}" +1057,C017,North,open,1,399.00,0.05,2026-05-07T14:40:00,https://shop.example/products?campaign=summer&page=4&sku=SKU-PHONE-08,"{""type"":""cart"",""channel"":""app"",""tags"":[""open"",""mobile""]}" +1058,C018,South,paid,3,64.25,0.10,2026-05-07T15:15:00,https://shop.example/products?campaign=summer&page=4&sku=SKU-CASE-10,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""bundle""]}" 
+1059,C019,East,paid,1,129.00,0.00,2026-05-08T09:00:00,https://shop.example/products?campaign=retention&page=5&sku=SKU-ROUTER-04,"{""type"":""checkout"",""channel"":""email"",""tags"":[""paid"",""retention""]}" +1060,C020,West,cancelled,2,99.00,0.05,2026-05-08T09:45:00,https://shop.example/products?campaign=retention&page=5&sku=SKU-PRINTER-09,"{""type"":""refund"",""channel"":""web"",""tags"":[""cancelled"",""retention""]}" +1061,C001,North,paid,6,11.50,0.00,2026-05-08T10:10:00,https://shop.example/products?campaign=retention&page=6&sku=SKU-PAPER-01,"{""type"":""checkout"",""channel"":""partner"",""tags"":[""paid"",""office""]}" +1062,C002,South,open,2,149.99,0.20,2026-05-08T11:35:00,https://shop.example/products?campaign=retention&page=6&sku=SKU-TABLET-05,"{""type"":""view"",""channel"":""app"",""tags"":[""open"",""premium""]}" +1063,C003,East,paid,1,899.00,0.10,2026-05-08T12:25:00,https://shop.example/products?campaign=enterprise&page=7&sku=SKU-LAPTOP-14,"{""type"":""checkout"",""channel"":""partner"",""tags"":[""paid"",""enterprise""]}" +1064,C004,West,paid,2,44.00,0.00,2026-05-08T13:15:00,https://shop.example/products?campaign=enterprise&page=7&sku=SKU-MOUSE-06,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""office""]}" +1065,C005,North,open,1,59.95,0.05,2026-05-08T14:05:00,https://shop.example/products?campaign=enterprise&page=8&sku=SKU-KEYBOARD-12,"{""type"":""cart"",""channel"":""web"",""tags"":[""open"",""office""]}" +1066,C006,South,paid,4,22.50,0.00,2026-05-08T15:00:00,https://shop.example/products?campaign=enterprise&page=8&sku=SKU-NOTEBOOK-02,"{""type"":""checkout"",""channel"":""email"",""tags"":[""paid"",""school""]}" +1067,C007,East,cancelled,1,549.00,0.00,2026-05-09T09:20:00,https://shop.example/products?campaign=spring&page=9&sku=SKU-CAMERA-17,"{""type"":""refund"",""channel"":""partner"",""tags"":[""cancelled"",""premium""]}" 
+1068,C008,West,paid,3,72.00,0.10,2026-05-09T10:00:00,https://shop.example/products?campaign=spring&page=9&sku=SKU-TRIPOD-19,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""photo""]}" +1069,C009,North,paid,2,310.00,0.15,2026-05-09T11:30:00,https://shop.example/products?campaign=summer&page=10&sku=SKU-WATCH-20,"{""type"":""checkout"",""channel"":""app"",""tags"":[""paid"",""wearable""]}" +1070,C010,South,open,5,18.00,0.00,2026-05-09T12:45:00,https://shop.example/products?campaign=summer&page=10&sku=SKU-BAND-21,"{""type"":""view"",""channel"":""app"",""tags"":[""open"",""wearable""]}" +1071,C011,East,paid,3,88.00,0.05,2026-05-09T13:20:00,https://shop.example/products?campaign=retention&page=11&sku=SKU-SHELF-15,"{""type"":""checkout"",""channel"":""email"",""tags"":[""paid"",""home""]}" +1072,C012,West,paid,1,179.00,0.10,2026-05-09T14:10:00,https://shop.example/products?campaign=retention&page=11&sku=SKU-CABINET-18,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""home""]}" +1073,C013,North,cancelled,2,27.50,0.00,2026-05-09T15:05:00,https://shop.example/products?campaign=enterprise&page=12&sku=SKU-PEN-22,"{""type"":""refund"",""channel"":""partner"",""tags"":[""cancelled"",""office""]}" +1074,C014,South,paid,8,8.75,0.00,2026-05-09T16:00:00,https://shop.example/products?campaign=enterprise&page=12&sku=SKU-STICKER-23,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""promo""]}" +1075,C015,East,open,1,699.00,0.20,2026-05-09T17:30:00,https://shop.example/products?campaign=spring&page=13&sku=SKU-TV-25,"{""type"":""cart"",""channel"":""web"",""tags"":[""open"",""premium""]}" +1076,C016,West,paid,2,79.99,0.10,2026-05-10T09:15:00,https://shop.example/products?campaign=spring&page=1&sku=SKU-CHAIR-01,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""vip""]}" 
+1077,C017,North,open,1,249.50,0.00,2026-05-10T10:05:00,https://shop.example/products?campaign=spring&page=1&sku=SKU-DESK-02,"{""type"":""view"",""channel"":""web"",""tags"":[""open"",""research""]}" +1078,C018,South,paid,4,19.95,0.05,2026-05-10T10:30:00,https://shop.example/products?campaign=spring&page=2&sku=SKU-LAMP-11,"{""type"":""checkout"",""channel"":""app"",""tags"":[""paid"",""mobile""]}" +1079,C019,East,cancelled,3,34.25,0.00,2026-05-10T11:10:00,https://shop.example/products?campaign=spring&page=2&sku=SKU-MAT-07,"{""type"":""refund"",""channel"":""web"",""tags"":[""cancelled"",""support""]}" +1080,C020,West,paid,2,199.00,0.15,2026-05-10T12:20:00,https://shop.example/products?campaign=summer&page=3&sku=SKU-MONITOR-24,"{""type"":""checkout"",""channel"":""partner"",""tags"":[""paid"",""b2b""]}" +1081,C001,North,paid,5,15.75,0.00,2026-05-10T13:00:00,https://shop.example/products?campaign=summer&page=3&sku=SKU-CABLE-03,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""accessory""]}" +1082,C002,South,open,1,399.00,0.05,2026-05-10T14:40:00,https://shop.example/products?campaign=summer&page=4&sku=SKU-PHONE-08,"{""type"":""cart"",""channel"":""app"",""tags"":[""open"",""mobile""]}" +1083,C003,East,paid,3,64.25,0.10,2026-05-10T15:15:00,https://shop.example/products?campaign=summer&page=4&sku=SKU-CASE-10,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""bundle""]}" +1084,C004,West,paid,1,129.00,0.00,2026-05-11T09:00:00,https://shop.example/products?campaign=retention&page=5&sku=SKU-ROUTER-04,"{""type"":""checkout"",""channel"":""email"",""tags"":[""paid"",""retention""]}" +1085,C005,North,cancelled,2,99.00,0.05,2026-05-11T09:45:00,https://shop.example/products?campaign=retention&page=5&sku=SKU-PRINTER-09,"{""type"":""refund"",""channel"":""web"",""tags"":[""cancelled"",""retention""]}" 
+1086,C006,South,paid,6,11.50,0.00,2026-05-11T10:10:00,https://shop.example/products?campaign=retention&page=6&sku=SKU-PAPER-01,"{""type"":""checkout"",""channel"":""partner"",""tags"":[""paid"",""office""]}" +1087,C007,East,open,2,149.99,0.20,2026-05-11T11:35:00,https://shop.example/products?campaign=retention&page=6&sku=SKU-TABLET-05,"{""type"":""view"",""channel"":""app"",""tags"":[""open"",""premium""]}" +1088,C008,West,paid,1,899.00,0.10,2026-05-11T12:25:00,https://shop.example/products?campaign=enterprise&page=7&sku=SKU-LAPTOP-14,"{""type"":""checkout"",""channel"":""partner"",""tags"":[""paid"",""enterprise""]}" +1089,C009,North,paid,2,44.00,0.00,2026-05-11T13:15:00,https://shop.example/products?campaign=enterprise&page=7&sku=SKU-MOUSE-06,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""office""]}" +1090,C010,South,open,1,59.95,0.05,2026-05-11T14:05:00,https://shop.example/products?campaign=enterprise&page=8&sku=SKU-KEYBOARD-12,"{""type"":""cart"",""channel"":""web"",""tags"":[""open"",""office""]}" +1091,C011,East,paid,4,22.50,0.00,2026-05-11T15:00:00,https://shop.example/products?campaign=enterprise&page=8&sku=SKU-NOTEBOOK-02,"{""type"":""checkout"",""channel"":""email"",""tags"":[""paid"",""school""]}" +1092,C012,West,cancelled,1,549.00,0.00,2026-05-12T09:20:00,https://shop.example/products?campaign=spring&page=9&sku=SKU-CAMERA-17,"{""type"":""refund"",""channel"":""partner"",""tags"":[""cancelled"",""premium""]}" +1093,C013,North,paid,3,72.00,0.10,2026-05-12T10:00:00,https://shop.example/products?campaign=spring&page=9&sku=SKU-TRIPOD-19,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""photo""]}" +1094,C014,South,paid,2,310.00,0.15,2026-05-12T11:30:00,https://shop.example/products?campaign=summer&page=10&sku=SKU-WATCH-20,"{""type"":""checkout"",""channel"":""app"",""tags"":[""paid"",""wearable""]}" 
+1095,C015,East,open,5,18.00,0.00,2026-05-12T12:45:00,https://shop.example/products?campaign=summer&page=10&sku=SKU-BAND-21,"{""type"":""view"",""channel"":""app"",""tags"":[""open"",""wearable""]}" +1096,C016,West,paid,3,88.00,0.05,2026-05-12T13:20:00,https://shop.example/products?campaign=retention&page=11&sku=SKU-SHELF-15,"{""type"":""checkout"",""channel"":""email"",""tags"":[""paid"",""home""]}" +1097,C017,North,paid,1,179.00,0.10,2026-05-12T14:10:00,https://shop.example/products?campaign=retention&page=11&sku=SKU-CABINET-18,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""home""]}" +1098,C018,South,cancelled,2,27.50,0.00,2026-05-12T15:05:00,https://shop.example/products?campaign=enterprise&page=12&sku=SKU-PEN-22,"{""type"":""refund"",""channel"":""partner"",""tags"":[""cancelled"",""office""]}" +1099,C019,East,paid,8,8.75,0.00,2026-05-12T16:00:00,https://shop.example/products?campaign=enterprise&page=12&sku=SKU-STICKER-23,"{""type"":""checkout"",""channel"":""web"",""tags"":[""paid"",""promo""]}" +1100,C020,West,open,1,699.00,0.20,2026-05-12T17:30:00,https://shop.example/products?campaign=spring&page=13&sku=SKU-TV-25,"{""type"":""cart"",""channel"":""web"",""tags"":[""open"",""premium""]}" diff --git a/tests/test_dataset.incn b/tests/test_dataset.incn index d28abd3..67e299d 100644 --- a/tests/test_dataset.incn +++ b/tests/test_dataset.incn @@ -565,10 +565,10 @@ def test_lazy_frame__deeper_independent_roots_still_lower_with_stable_shapes() - right_base: LazyFrame[Order] = lazy_frame_named_table("orders_archive") # -- Act -- - right_joined: LazyFrame[Order] = right_base.filter(always_false()).order_by([col("id")]).join( - right_base.filter(always_false()).order_by([col("id")]), - always_true(), - ) + right_joined: LazyFrame[Order] = right_base + .filter(always_false()) + .order_by([col("id")]) + .join(right_base.filter(always_false()).order_by([col("id")]), always_true()) joined: LazyFrame[Order] = left.join(right_joined, always_true()) plan = 
joined.to_substrait_plan() diff --git a/tests/test_prism.incn b/tests/test_prism.incn index fc777a9..f3e337d 100644 --- a/tests/test_prism.incn +++ b/tests/test_prism.incn @@ -338,9 +338,10 @@ def test_prism__rewrite_collapses_adjacent_limits_projects_and_order_by() -> Non def test_prism__rewrite_collapses_adjacent_aggregates_by_merging_measures() -> None: # -- Arrange -- _register_projection_test_schema(str("orders")) - aggregated: PrismCursor[Order] = prism_cursor_named_table(str("orders")).group_by([col("id")]).agg([sum(col("id"))]).agg( - [count()], - ) + aggregated: PrismCursor[Order] = prism_cursor_named_table(str("orders")) + .group_by([col("id")]) + .agg([sum(col("id"))]) + .agg([count()]) # -- Act -- output_cols = prism_cursor_output_columns(aggregated) @@ -361,9 +362,9 @@ def test_prism__rewrite_collapses_adjacent_aggregates_by_merging_measures() -> N def test_prism__aggregate_output_columns_include_approximate_measures() -> None: # -- Arrange -- _register_projection_test_schema(str("orders")) - aggregated: PrismCursor[Order] = prism_cursor_named_table(str("orders")).group_by([col("id")]).agg( - [approx_count_distinct(col("id")), approx_percentile(col("id"), 0.5)], - ) + aggregated: PrismCursor[Order] = prism_cursor_named_table(str("orders")) + .group_by([col("id")]) + .agg([approx_count_distinct(col("id")), approx_percentile(col("id"), 0.5)]) # -- Act -- output_cols = prism_cursor_output_columns(aggregated) @@ -380,10 +381,9 @@ def test_prism__rewrite_collapses_adjacent_compatible_windows() -> None: spec = window().order_by([col("id")]) # -- Act -- - windowed: PrismCursor[Order] = prism_cursor_named_table(str("orders")).with_window_column( - "first_row_num", - row_number().over(spec), - ).with_window_column("second_row_num", row_number().over(spec)) + windowed: PrismCursor[Order] = prism_cursor_named_table(str("orders")) + .with_window_column("first_row_num", row_number().over(spec)) + .with_window_column("second_row_num", row_number().over(spec)) plan = 
windowed.to_substrait_plan() # -- Assert -- @@ -400,10 +400,9 @@ def test_prism__with_column_tracks_output_columns_and_collapses_adjacent_project base: PrismCursor[Order] = prism_cursor_named_table(str("orders_projection_prism")) # -- Act -- - projected: PrismCursor[Order] = base.with_column("double_id", mul(col("id"), 2)).with_column( - "triple_id", - mul(col("id"), 3), - ) + projected: PrismCursor[Order] = base + .with_column("double_id", mul(col("id"), 2)) + .with_column("triple_id", mul(col("id"), 3)) output_cols = prism_cursor_output_columns(projected) plan = projected.to_substrait_plan() diff --git a/tests/test_session_aggregates.incn b/tests/test_session_aggregates.incn index 847d2fb..b255d03 100644 --- a/tests/test_session_aggregates.incn +++ b/tests/test_session_aggregates.incn @@ -105,9 +105,13 @@ def test_session_aggregates__grouped_collect_executes_core_aggregates() -> None: session.read_csv("aggregate_orders", AGGREGATE_ORDERS_CSV_FIXTURE), "aggregate orders fixture should load", ) - grouped = lazy.group_by([col("customer_id")]).agg( - [sum(col("amount")), count(), count(col("amount")), avg(col("amount")), min(col("amount")), max(col("amount"))], - ) + grouped = lazy + .group_by([col("customer_id")]) + .agg( + [sum(col("amount")), count(), count(col("amount")), avg(col("amount")), min(col("amount")), max( + col("amount"), + )], + ) df = assert_is_ok(session.collect(grouped), "mixed grouped aggregate collect should execute") payload = df.preview_text() resolved = df.resolved_columns() @@ -157,9 +161,9 @@ def test_session_aggregates__grouped_collect_executes_distinct_and_filter_modifi "aggregate modifiers fixture should load", ) paid = eq(col("status"), str_lit("paid")) - grouped = lazy.group_by([col("customer_id")]).agg( - [count_distinct(col("product_id")), count_if(paid.clone()), sum(col("amount")).filter(paid)], - ) + grouped = lazy + .group_by([col("customer_id")]) + .agg([count_distinct(col("product_id")), count_if(paid.clone()), 
sum(col("amount")).filter(paid)]) df = _collect_modifier_or_fail(session, grouped) payload = df.preview_text() resolved = df.resolved_columns() @@ -209,11 +213,13 @@ def test_session_aggregates__grouped_collect_executes_approximate_aggregates() - "aggregate modifiers fixture should load", ) paid = eq(col("status"), str_lit("paid")) - grouped = lazy.group_by([col("customer_id")]).agg( - [approx_count_distinct(col("product_id")).filter(paid.clone()), approx_percentile(col("amount"), 0.0).filter( - paid, - )], - ) + grouped = lazy + .group_by([col("customer_id")]) + .agg( + [approx_count_distinct(col("product_id")).filter(paid.clone()), approx_percentile(col("amount"), 0.0).filter( + paid, + )], + ) df = _collect_modifier_or_fail(session, grouped) payload = df.preview_text() resolved = df.resolved_columns() diff --git a/tests/test_session_filters.incn b/tests/test_session_filters.incn index 5e3571b..dd3ff58 100644 --- a/tests/test_session_filters.incn +++ b/tests/test_session_filters.incn @@ -160,10 +160,10 @@ def test_session_filters__mixed_scalar_query_shape_executes_end_to_end() -> None session.read_csv("order_lines", ORDER_LINES_CSV_FIXTURE), "order lines fixture should load", ) - query_like = lazy.filter(and_(gt(col("qty"), 2), in_(col("status"), [lit("open"), lit("closed")]))).with_column( - "line_total", - mul(cast(col("unit_price"), float), cast(col("qty"), float)), - ).order_by([desc(col("line_total"))]) + query_like = lazy + .filter(and_(gt(col("qty"), 2), in_(col("status"), [lit("open"), lit("closed")]))) + .with_column("line_total", mul(cast(col("unit_price"), float), cast(col("qty"), float))) + .order_by([desc(col("line_total"))]) plan_root = root_rel(query_like.to_substrait_plan()) df = assert_is_ok(session.collect(query_like), "mixed scalar query shape should collect") payload = df.preview_text() diff --git a/tests/test_session_projection.incn b/tests/test_session_projection.incn index f40dac6..5c4aea7 100644 --- a/tests/test_session_projection.incn +++ 
b/tests/test_session_projection.incn @@ -246,19 +246,16 @@ def test_session_projection__collect_executes_core_scalar_projection_functions() session.read_csv("aggregate_orders", AGGREGATE_ORDERS_CSV_FIXTURE), "aggregate orders fixture should load", ) - projected = lazy.with_column("amount_minus_one", sub(col("amount"), 1)).with_column( - "amount_div_five", - div(col("amount"), 5), - ).with_column("amount_mod_two", modulo(col("amount"), 2)).with_column("negative_amount", neg(col("amount"))).with_column( - "customer_or_unknown", - coalesce([col("customer_id"), lit("unknown")]), - ).with_column("customer_null_if_a", nullif(col("customer_id"), lit("A"))).with_column( - "amount_text", - cast(col("amount"), str), - ).with_column("customer_try_int", try_cast(col("customer_id"), int)).with_column( - "amount_bucket", - case_when([gt(col("amount"), 10)], [lit("large")], lit("small")), - ) + projected = lazy + .with_column("amount_minus_one", sub(col("amount"), 1)) + .with_column("amount_div_five", div(col("amount"), 5)) + .with_column("amount_mod_two", modulo(col("amount"), 2)) + .with_column("negative_amount", neg(col("amount"))) + .with_column("customer_or_unknown", coalesce([col("customer_id"), lit("unknown")])) + .with_column("customer_null_if_a", nullif(col("customer_id"), lit("A"))) + .with_column("amount_text", cast(col("amount"), str)) + .with_column("customer_try_int", try_cast(col("customer_id"), int)) + .with_column("amount_bucket", case_when([gt(col("amount"), 10)], [lit("large")], lit("small"))) df = _collect_or_fail(session, projected) payload = df.preview_text() resolved = df.resolved_columns() @@ -287,28 +284,30 @@ def test_session_projection__collect_executes_common_math_scalar_projection_func session.read_csv("aggregate_orders", AGGREGATE_ORDERS_CSV_FIXTURE), "aggregate orders fixture should load", ) - projected = lazy.with_column("abs_delta", abs(sub(5, col("amount")))).with_column( - "ceil_quarter", - ceil(div(col("amount"), 4.0)), - 
).with_column("floor_quarter", floor(div(col("amount"), 4.0))).with_column( - "round_quarter", - round(div(col("amount"), 4.0)), - ).with_column("round_one_place", round(2.25, 1)).with_column("sqrt_sixteen", sqrt(16.0)).with_column( - "power_two_three", - power(2.0, 3.0), - ).with_column("exp_zero", exp(0.0)).with_column("ln_one", ln(1.0)).with_column("log_ten_hundred", log(10.0, 100.0)).with_column( - "log10_hundred", - log10(100.0), - ).with_column("sign_negative", sign(-5)).with_column("least_value", least(7, 3, 5)).with_column( - "greatest_value", - greatest(7, 3, 5), - ).with_column("sin_zero", sin(0.0)).with_column("cos_zero", cos(0.0)).with_column("tan_zero", tan(0.0)).with_column( - "asin_zero", - asin(0.0), - ).with_column("acos_one", acos(1.0)).with_column("atan_zero", atan(0.0)).with_column("atan2_zero", atan2(0.0, 1.0)).with_column( - "degrees_zero", - degrees(0.0), - ).with_column("radians_zero", radians(0.0)) + projected = lazy + .with_column("abs_delta", abs(sub(5, col("amount")))) + .with_column("ceil_quarter", ceil(div(col("amount"), 4.0))) + .with_column("floor_quarter", floor(div(col("amount"), 4.0))) + .with_column("round_quarter", round(div(col("amount"), 4.0))) + .with_column("round_one_place", round(2.25, 1)) + .with_column("sqrt_sixteen", sqrt(16.0)) + .with_column("power_two_three", power(2.0, 3.0)) + .with_column("exp_zero", exp(0.0)) + .with_column("ln_one", ln(1.0)) + .with_column("log_ten_hundred", log(10.0, 100.0)) + .with_column("log10_hundred", log10(100.0)) + .with_column("sign_negative", sign(-5)) + .with_column("least_value", least(7, 3, 5)) + .with_column("greatest_value", greatest(7, 3, 5)) + .with_column("sin_zero", sin(0.0)) + .with_column("cos_zero", cos(0.0)) + .with_column("tan_zero", tan(0.0)) + .with_column("asin_zero", asin(0.0)) + .with_column("acos_one", acos(1.0)) + .with_column("atan_zero", atan(0.0)) + .with_column("atan2_zero", atan2(0.0, 1.0)) + .with_column("degrees_zero", degrees(0.0)) + 
.with_column("radians_zero", radians(0.0)) df = _collect_or_fail(session, projected) payload = df.preview_text() resolved = df.resolved_columns() @@ -338,25 +337,27 @@ def test_session_projection__collect_executes_common_string_scalar_projection_fu session.read_csv("aggregate_orders_strings", AGGREGATE_ORDERS_CSV_FIXTURE), "aggregate orders fixture should load", ) - projected = lazy.with_column("len_abc", char_length("abc")).with_column("octets_abc", octet_length("abc")).with_column( - "upper_abc", - upper("abc"), - ).with_column("lower_abc", lower("ABC")).with_column("trimmed", trim(" hello ")).with_column( - "left_trimmed", - ltrim(" hello"), - ).with_column("right_trimmed", rtrim("hello ")).with_column("middle", substring("abcdef", 2, 3)).with_column( - "position_cd", - position("abcdef", "cd"), - ).with_column("overlayed", overlay("abcdef", "ZZ", 3, 2)).with_column("joined", concat("a", "b")).with_column( - "joined_ws", - concat_ws("-", "a", "b"), - ).with_column("replaced", replace("banana", "na", "NA")).with_column("translated", translate("abc", "ab", "xy")).with_column( - "repeated", - repeat("ha", 3), - ).with_column("left_two", left("abcdef", 2)).with_column("right_two", right("abcdef", 2)).with_column( - "left_padded", - lpad("x", 3, "0"), - ).with_column("right_padded", rpad("x", 3, "0")).with_column("split_second", split_part("a,b,c", ",", 2)) + projected = lazy + .with_column("len_abc", char_length("abc")) + .with_column("octets_abc", octet_length("abc")) + .with_column("upper_abc", upper("abc")) + .with_column("lower_abc", lower("ABC")) + .with_column("trimmed", trim(" hello ")) + .with_column("left_trimmed", ltrim(" hello")) + .with_column("right_trimmed", rtrim("hello ")) + .with_column("middle", substring("abcdef", 2, 3)) + .with_column("position_cd", position("abcdef", "cd")) + .with_column("overlayed", overlay("abcdef", "ZZ", 3, 2)) + .with_column("joined", concat("a", "b")) + .with_column("joined_ws", concat_ws("-", "a", "b")) + 
.with_column("replaced", replace("banana", "na", "NA")) + .with_column("translated", translate("abc", "ab", "xy")) + .with_column("repeated", repeat("ha", 3)) + .with_column("left_two", left("abcdef", 2)) + .with_column("right_two", right("abcdef", 2)) + .with_column("left_padded", lpad("x", 3, "0")) + .with_column("right_padded", rpad("x", 3, "0")) + .with_column("split_second", split_part("a,b,c", ",", 2)) df = _collect_or_fail(session, projected) payload = df.preview_text() resolved = df.resolved_columns() @@ -384,16 +385,16 @@ def test_session_projection__collect_executes_common_encoding_and_regex_projecti session.read_csv("aggregate_orders_encoding_regex", AGGREGATE_ORDERS_CSV_FIXTURE), "aggregate orders fixture should load", ) - projected = lazy.with_column("encoded_base64", encode("abc", "base64")).with_column( - "decoded_base64", - decode("YWJj", "base64"), - ).with_column("base64_abc", base64("abc")).with_column("unbase64_abc", unbase64("YWJj")).with_column( - "hex_abc", - hex("abc"), - ).with_column("unhex_abc", unhex("616263")).with_column("regex_like", regexp_like("order-42", "^order-[0-9]+$")).with_column( - "regex_replaced", - regexp_replace("order-42", "[0-9]+", "99"), - ).with_column("regex_extracted", regexp_extract("order-42", "order-([0-9]+)", 1)) + projected = lazy + .with_column("encoded_base64", encode("abc", "base64")) + .with_column("decoded_base64", decode("YWJj", "base64")) + .with_column("base64_abc", base64("abc")) + .with_column("unbase64_abc", unbase64("YWJj")) + .with_column("hex_abc", hex("abc")) + .with_column("unhex_abc", unhex("616263")) + .with_column("regex_like", regexp_like("order-42", "^order-[0-9]+$")) + .with_column("regex_replaced", regexp_replace("order-42", "[0-9]+", "99")) + .with_column("regex_extracted", regexp_extract("order-42", "order-([0-9]+)", 1)) df = _collect_or_fail(session, projected) payload = df.preview_text() resolved = df.resolved_columns() @@ -418,31 +419,25 @@ def 
test_session_projection__collect_executes_common_datetime_projection_functio session.read_csv("aggregate_orders_datetime", AGGREGATE_ORDERS_CSV_FIXTURE), "aggregate orders fixture should load", ) - projected = lazy.with_column("year_part", date_part("year", to_date("2026-05-30"))).with_column( - "month_trunc", - date_trunc("month", to_timestamp("2026-05-30T12:34:56")), - ).with_column("hour_trunc", time_trunc("hour", "12:34:56")).with_column( - "plus_two_days", - date_add("day", 2, "2026-05-30"), - ).with_column("minus_one_day", date_sub("day", 1, "2026-05-30")).with_column( - "day_diff", - date_diff("day", "2026-05-30", "2026-06-02"), - ).with_column("second_diff", timestamp_diff("second", "1970-01-01T00:00:00", "1970-01-01T00:00:10")).with_column( - "date_value", - to_date("2026-05-30T12:34:56"), - ).with_column("time_value", to_time("12:34:56")).with_column("timestamp_value", to_timestamp("2026-05-30T12:34:56")).with_column( - "from_unix", - from_unixtime(10), - ).with_column("unix_s", unix_seconds("1970-01-01T00:00:10")).with_column( - "unix_ms", - unix_millis("1970-01-01T00:00:01"), - ).with_column("unix_us", unix_micros("1970-01-01T00:00:01")).with_column("date_made", make_date(2026, 5, 30)).with_column( - "time_made", - make_time(12, 34, 56), - ).with_column("timestamp_made", make_timestamp(2026, 5, 30, 12, 34, 56)).with_column( - "month_last_day", - last_day("2026-02-10"), - ) + projected = lazy + .with_column("year_part", date_part("year", to_date("2026-05-30"))) + .with_column("month_trunc", date_trunc("month", to_timestamp("2026-05-30T12:34:56"))) + .with_column("hour_trunc", time_trunc("hour", "12:34:56")) + .with_column("plus_two_days", date_add("day", 2, "2026-05-30")) + .with_column("minus_one_day", date_sub("day", 1, "2026-05-30")) + .with_column("day_diff", date_diff("day", "2026-05-30", "2026-06-02")) + .with_column("second_diff", timestamp_diff("second", "1970-01-01T00:00:00", "1970-01-01T00:00:10")) + .with_column("date_value", 
to_date("2026-05-30T12:34:56")) + .with_column("time_value", to_time("12:34:56")) + .with_column("timestamp_value", to_timestamp("2026-05-30T12:34:56")) + .with_column("from_unix", from_unixtime(10)) + .with_column("unix_s", unix_seconds("1970-01-01T00:00:10")) + .with_column("unix_ms", unix_millis("1970-01-01T00:00:01")) + .with_column("unix_us", unix_micros("1970-01-01T00:00:01")) + .with_column("date_made", make_date(2026, 5, 30)) + .with_column("time_made", make_time(12, 34, 56)) + .with_column("timestamp_made", make_timestamp(2026, 5, 30, 12, 34, 56)) + .with_column("month_last_day", last_day("2026-02-10")) df = _collect_or_fail(session, projected) payload = df.preview_text() resolved = df.resolved_columns() @@ -489,10 +484,10 @@ def test_session_projection__typed_value_or_column_rejects_schema_mismatch_befor Ok(_) => return fail_t("encode should reject an integer source column before DataFusion execution") projected_result = session.collect( - lazy.with_column("normalized_customer", lower(col("customer_id"))).order_by([col("amount")]).with_column( - "bad_projected_date", - make_date(col("normalized_customer"), 5, 30), - ), + lazy + .with_column("normalized_customer", lower(col("customer_id"))) + .order_by([col("amount")]) + .with_column("bad_projected_date", make_date(col("normalized_customer"), 5, 30)), ) match projected_result: Err(err) => @@ -513,13 +508,15 @@ def test_session_projection__collect_executes_format_hashing_projection_function session.read_csv("aggregate_orders", AGGREGATE_ORDERS_CSV_FIXTURE), "aggregate orders fixture should load", ) - projected = lazy.with_column("md5_abc", md5("abc")).with_column("sha1_abc", sha1("abc")).with_column( - "crc32_abc", - crc32("abc"), - ).with_column("xxhash64_abc", xxhash64("abc")).with_column("sha224_abc", sha224("abc")).with_column( - "sha2_256_abc", - sha2("abc", 256), - ).with_column("sha384_abc", sha384("abc")).with_column("sha512_abc", sha512("abc")) + projected = lazy + .with_column("md5_abc", 
md5("abc")) + .with_column("sha1_abc", sha1("abc")) + .with_column("crc32_abc", crc32("abc")) + .with_column("xxhash64_abc", xxhash64("abc")) + .with_column("sha224_abc", sha224("abc")) + .with_column("sha2_256_abc", sha2("abc", 256)) + .with_column("sha384_abc", sha384("abc")) + .with_column("sha512_abc", sha512("abc")) df = _collect_or_fail(session, projected) payload = df.preview_text() resolved = df.resolved_columns() @@ -555,28 +552,24 @@ def test_session_projection__collect_executes_json_url_and_csv_format_functions( ) json_payload = "{\"type\":\"click\",\"tags\":[\"paid\",\"web\"],\"user\":{\"id\":7}}" csv_payload = "42,paid" - projected = lazy.with_column("json_valid", check_json(json_payload)).with_column( - "json_type", - json_extract_path_text(json_payload, "$.type"), - ).with_column("json_object", get_json_object(json_payload, "$.user")).with_column( - "json_tags", - json_array_length("[\"paid\",\"web\"]"), - ).with_column("json_keys", json_object_keys(json_payload)).with_column("json_schema", schema_of_json(json_payload)).with_column( - "json_normalized", - parse_json(json_payload), - ).with_column("json_from", from_json[JsonProjectionSchema](json_payload)).with_column( - "json_try_invalid", - try_from_json[JsonProjectionSchema]("{"), - ).with_column("json_string", to_json("paid")).with_column( - "url_page", - parse_url("https://example.com/orders?page=2&id=7#top", "page"), - ).with_column("url_encoded", url_encode("a b")).with_column("url_decoded", url_decode("a%20b")).with_column( - "url_try_invalid", - try_url_decode("%zz"), - ).with_column("csv_schema", schema_of_csv(csv_payload)).with_column( - "csv_map", - from_csv[CsvProjectionSchema](csv_payload), - ).with_column("csv_line", to_csv(lit("[\"42\",\"paid\"]"))) + projected = lazy + .with_column("json_valid", check_json(json_payload)) + .with_column("json_type", json_extract_path_text(json_payload, "$.type")) + .with_column("json_object", get_json_object(json_payload, "$.user")) + 
.with_column("json_tags", json_array_length("[\"paid\",\"web\"]")) + .with_column("json_keys", json_object_keys(json_payload)) + .with_column("json_schema", schema_of_json(json_payload)) + .with_column("json_normalized", parse_json(json_payload)) + .with_column("json_from", from_json[JsonProjectionSchema](json_payload)) + .with_column("json_try_invalid", try_from_json[JsonProjectionSchema]("{")) + .with_column("json_string", to_json("paid")) + .with_column("url_page", parse_url("https://example.com/orders?page=2&id=7#top", "page")) + .with_column("url_encoded", url_encode("a b")) + .with_column("url_decoded", url_decode("a%20b")) + .with_column("url_try_invalid", try_url_decode("%zz")) + .with_column("csv_schema", schema_of_csv(csv_payload)) + .with_column("csv_map", from_csv[CsvProjectionSchema](csv_payload)) + .with_column("csv_line", to_csv(lit("[\"42\",\"paid\"]"))) df = _collect_or_fail(session, projected) payload = df.preview_text() resolved = df.resolved_columns() diff --git a/tests/test_session_windows.incn b/tests/test_session_windows.incn index d0955f7..d7cf93e 100644 --- a/tests/test_session_windows.incn +++ b/tests/test_session_windows.incn @@ -110,10 +110,9 @@ def test_session_windows__collect_executes_ranking_window_functions() -> None: spec = _customer_amount_window() # -- Act -- - windowed = (_orders(session, "aggregate_orders_ranking").with_window_column("row_num", row_number().over(spec)).with_window_column( - "amount_rank", - rank().over(spec), - )) + windowed = (_orders(session, "aggregate_orders_ranking") + .with_window_column("row_num", row_number().over(spec)) + .with_window_column("amount_rank", rank().over(spec))) df = _collect_or_fail(session, windowed) payload = df.preview_text() resolved = df.resolved_columns() @@ -255,10 +254,10 @@ def test_session_windows__collect_executes_scalar_and_distinct_window_arguments( distinct_arg = _modifier_orders(session, "aggregate_modifiers_window_distinct").with_window_column( 
"running_distinct_products", count_distinct(col("product_id")).over( - window().partition_by([col("customer_id")]).order_by([desc(col("amount"))]).rows_between( - unbounded_preceding_bound(), - current_row_bound(), - ), + window() + .partition_by([col("customer_id")]) + .order_by([desc(col("amount"))]) + .rows_between(unbounded_preceding_bound(), current_row_bound()), ), ) scalar_df = _collect_or_fail(session, scalar_arg) @@ -289,9 +288,9 @@ def test_session_windows__collect_executes_window_below_filter() -> None: spec = _customer_amount_window() # -- Act -- - filtered = _orders(session, "aggregate_orders_nested_filter").with_window_column("amount_rank", rank().over(spec)).filter( - eq(col("amount_rank"), 1), - ) + filtered = _orders(session, "aggregate_orders_nested_filter") + .with_window_column("amount_rank", rank().over(spec)) + .filter(eq(col("amount_rank"), 1)) df = _collect_or_fail(session, filtered) payload = df.preview_text() resolved = df.resolved_columns() diff --git a/tests/test_window_functions.incn b/tests/test_window_functions.incn index a983a52..bb0aa55 100644 --- a/tests/test_window_functions.incn +++ b/tests/test_window_functions.incn @@ -26,10 +26,10 @@ from window_builders import ( def test_window_builders__spec_preserves_partition_and_order_columns() -> None: # -- Arrange / Act -- - spec = (window().partition_by([col("customer_id")]).order_by([col("amount")]).rows_between( - unbounded_preceding(), - current_row(), - )) + spec = (window() + .partition_by([col("customer_id")]) + .order_by([col("amount")]) + .rows_between(unbounded_preceding(), current_row())) # -- Assert -- assert len(spec.partition_columns) == 1, "window partition should record explicit partition expressions" diff --git a/vocab_companion/src/desugar.rs b/vocab_companion/src/desugar.rs index a46bf8b..7f0f411 100644 --- a/vocab_companion/src/desugar.rs +++ b/vocab_companion/src/desugar.rs @@ -632,5 +632,8 @@ fn expr_is_aggregate(expr: &IncanExpr) -> bool { } fn 
is_aggregate_name(name: &str) -> bool { - matches!(name, "sum" | "count" | "avg" | "min" | "max") + matches!( + name, + "sum" | "count" | "count_distinct" | "avg" | "min" | "max" + ) } diff --git a/vocab_companion/src/lib.rs b/vocab_companion/src/lib.rs index 04c4801..722db96 100644 --- a/vocab_companion/src/lib.rs +++ b/vocab_companion/src/lib.rs @@ -21,6 +21,7 @@ const HELPER_EXPORTS: &[&str] = &[ "col", "lit", "array", + "array_distinct", "add", "sub", "mul", @@ -40,10 +41,23 @@ const HELPER_EXPORTS: &[&str] = &[ "desc", "explode", "sum", + "count_distinct", "count", "avg", "min", "max", + "lower", + "upper", + "trim", + "round", + "date_part", + "sha256", + "check_json", + "json_extract_path_text", + "json_array_length", + "parse_url", + "regexp_extract", + "regexp_like", "aggregate_as", "with_column_assignment", "window", From 55726bb2dbd5a0c1785642d652875316a365d4fb Mon Sep 17 00:00:00 2001 From: Danny Meijer Date: Thu, 4 Jun 2026 20:20:58 +0200 Subject: [PATCH 03/11] chore - expect Incan rc49 in CI (#4) --- .github/workflows/ci.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 968a747..70a8e18 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -19,8 +19,8 @@ concurrency: env: CARGO_TERM_COLOR: always - INCAN_REF: feature/750-primitive-type-tokens - EXPECTED_INCAN_VERSION: 0.3.0-rc47 + INCAN_REF: bugfix/755-dependency-union-public-boundary + EXPECTED_INCAN_VERSION: 0.3.0-rc49 RUST_BACKTRACE: 1 INCAN_NO_BANNER: 1 INCAN_GENERATED_CARGO_TARGET_DIR: ${{ github.workspace }}/.incan-generated-cargo-target From c0899c2aee3286d294eec679da54c2d24a4b686e Mon Sep 17 00:00:00 2001 From: Danny Meijer Date: Thu, 4 Jun 2026 22:58:40 +0200 Subject: [PATCH 04/11] chore - polish RFC003 closeout review (#4) --- .github/workflows/ci.yml | 2 +- Makefile | 11 +- docs/language/reference/dataset_methods.md | 4 +- docs/language/reference/query_blocks.md | 8 +- 
docs/release_notes/v0_1.md | 9 +- docs/rfcs/003_inql_query_blocks.md | 7 +- docs/rfcs/013_function_catalog_program.md | 2 +- examples/README.md | 9 +- src/session/datafusion_backend.incn | 32 +- src/substrait/expr_lowering.incn | 25 +- src/substrait/relations.incn | 2 +- vocab_companion/src/desugar.rs | 10 +- vocab_companion/src/lib.rs | 333 +++++++++++++++++++-- 13 files changed, 364 insertions(+), 90 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 70a8e18..0a09fe7 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -20,7 +20,7 @@ concurrency: env: CARGO_TERM_COLOR: always INCAN_REF: bugfix/755-dependency-union-public-boundary - EXPECTED_INCAN_VERSION: 0.3.0-rc49 + EXPECTED_INCAN_VERSION: 0.3.0-rc50 RUST_BACKTRACE: 1 INCAN_NO_BANNER: 1 INCAN_GENERATED_CARGO_TARGET_DIR: ${{ github.workspace }}/.incan-generated-cargo-target diff --git a/Makefile b/Makefile index 6ad1bbb..8bccca8 100644 --- a/Makefile +++ b/Makefile @@ -39,6 +39,11 @@ test: ## Run package tests (`incan test tests`) @echo "\033[1mRunning InQL tests...\033[0m" @$(INCAN) test $(INQL_TEST_DIR) +.PHONY: vocab-companion-test +vocab-companion-test: ## Run Rust tests for the query-block vocabulary companion + @echo "\033[1mRunning query-block vocabulary companion tests...\033[0m" + @cargo test --manifest-path vocab_companion/Cargo.toml + .PHONY: test-style test-style: ## Validate test style markers (Arrange / Act / Assert) across `tests/*.incn` @echo "\033[1mChecking test style markers...\033[0m" @@ -105,15 +110,15 @@ fmt-check: ## Check formatting without writing (`incan fmt --check` per director # ============================================================================= .PHONY: check -check: fmt-check test-style registry-metadata build test ## Format check, style gate, metadata check, build, and test +check: fmt-check test-style vocab-companion-test registry-metadata build test ## Format check, style gate, metadata check, build, and test @echo 
"\033[32m✓ check passed\033[0m" .PHONY: pre-commit -pre-commit: fmt-check test-style registry-metadata build test ## Fast gate before commit (same as `check`) +pre-commit: fmt-check test-style vocab-companion-test registry-metadata build test ## Fast gate before commit (same as `check`) @echo "\033[32m✓ pre-commit gate passed\033[0m" .PHONY: ci -ci: fmt-check test-style registry-metadata build test smoke-consumer ## Same steps as GitHub Actions `inql` job +ci: fmt-check test-style vocab-companion-test registry-metadata build test smoke-consumer ## Same steps as GitHub Actions `inql` job @echo "\033[32m✓ ci gate passed\033[0m" .PHONY: verify diff --git a/docs/language/reference/dataset_methods.md b/docs/language/reference/dataset_methods.md index 4215a9d..5c49147 100644 --- a/docs/language/reference/dataset_methods.md +++ b/docs/language/reference/dataset_methods.md @@ -14,7 +14,7 @@ The Substrait helper surface behind these methods is split by semantic role: | Method | Signature | Meaning | | ------------- | ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------- | | `filter` | `def filter(self, predicate: ColumnExpr) -> Self` | Restrict rows by a boolean scalar expression. | -| `join` | `def join(self, other: Self, on: bool) -> Self` | Combine with another same-carrier relation using the package's boolean join predicate surface. | +| `join` | `def join(self, other: Self, on: ColumnExpr) -> Self` | Combine with another same-carrier relation using the package's scalar predicate surface. | | `select` | `def select[U](self, assignments: list[ProjectionAssignment] = []) -> SameCarrier[U]` | Project an output row shape while preserving the carrier kind. | | `with_column` | `def with_column(self, name: str, expr: ColumnExpr) -> Self` | Add or replace one projected column using a scalar expression. 
| | `group_by` | `def group_by(self, columns: list[ColumnExpr]) -> Self` | Define grouping keys using scalar expressions. | @@ -70,7 +70,7 @@ def enrich(orders: LazyFrame[Order]) -> LazyFrame[Order]: ## Capability notes -- `join(...)` is constrained to same-carrier inputs and the boolean join predicate surface shown in the signature. +- `join(...)` is constrained to same-carrier inputs and the `ColumnExpr` predicate surface shown in the signature. - `select(...)` is the schema-changing projection boundary used by query blocks. Identity `select()` preserves the current row model through its surrounding expected type, while explicit assignments can retarget to a new row model. - `generate(...)` preserves all input columns and appends generated output aliases for `explode`, `explode_outer`, `posexplode`, `posexplode_outer`, `inline`, `inline_outer`, `flatten`, and `stack` generator applications. Alias collisions are rejected during planning/lowering. - `with_window_column(...)` supports placed ranking, distribution, offset, value, and aggregate-over-window helpers over explicit window specs. Portable helpers lower through Substrait window relations and execute through the DataFusion session adapter. diff --git a/docs/language/reference/query_blocks.md b/docs/language/reference/query_blocks.md index 06c6157..68074b4 100644 --- a/docs/language/reference/query_blocks.md +++ b/docs/language/reference/query_blocks.md @@ -20,6 +20,10 @@ def summarize_orders(orders: DataFrame[Order]) -> DataFrame[OrderSummary]: } ``` +The `OrderSummary` type parameter documents the intended output row model. The v0.1 implementation checks query +schema evolution and selected aliases through the carrier planning surface; full field/type compatibility validation +against annotated output models is tracked as schema-validation follow-up work. 
+ InQL also accepts the colon spelling in expression position: ```incan @@ -56,4 +60,6 @@ The implemented v0.1 query-block surface supports: - `SELECT` aliases become the output schema for later clauses. - A `SELECT` alias may be reused by later expressions in the same `SELECT` list. -Query blocks lower into the same Dataset, Prism, Substrait, and Session adapter path as equivalent method-chain code. +Query blocks desugar into the same carrier method calls available to ordinary InQL code before lowering through the +current carrier planning path. `LazyFrame` flows are Prism-backed; concrete `DataFrame` and `DataStream` flows still use +their documented carrier paths before converging at the Substrait boundary. diff --git a/docs/release_notes/v0_1.md b/docs/release_notes/v0_1.md index 1fef3fd..304d701 100644 --- a/docs/release_notes/v0_1.md +++ b/docs/release_notes/v0_1.md @@ -9,10 +9,11 @@ Entries will be filled in as work lands (link RFCs and PRs when applicable). - **Language:** Foundational InQL syntax and semantics (naming, query schema, layer boundaries). - **Carriers:** `DataSet[T]` hierarchy including bounded vs unbounded traits and concrete frame/stream types. - **Plans:** Apache Substrait as the logical interchange contract. -- **Authoring:** method-chain lowering and RFC 003 `query {}` blocks share the same InQL logical planning path and - Substrait boundary. Query blocks support the brace spelling and expression-position `query:` spelling, including - SELECT aliases, lateral alias reuse, grouped aggregates, `SELECT DISTINCT`, post-SELECT filters, ordering, limits, - inner and left joins, generator clauses, and named window expressions. +- **Authoring:** `LazyFrame` method chains are Prism-backed, and RFC 003 `query {}` blocks desugar into the same + carrier calls before lowering through the current carrier planning paths and Substrait boundary. 
Query blocks support + the brace spelling and expression-position `query:` spelling, including SELECT aliases, lateral alias reuse, grouped + aggregates, `SELECT DISTINCT`, post-SELECT filters, ordering, limits, inner and left joins, generator clauses, and + named window expressions. - **Aggregates:** builder-based `col`, `sum`, `count`, `count_expr`, `count_distinct`, `count_if`, `avg`, `min`, and `max` helpers now lower grouped and global aggregates through Prism, Substrait, and Session execution. `count()` counts rows, `count(expr)` counts non-null expression values, `count_expr(expr)` remains a compatibility spelling, and the first aggregate modifier slice supports `DISTINCT` plus aggregate-local `FILTER` where valid. - **Scalar expressions:** RFC 012 unifies filter predicates, computed projection values, grouping keys, and aggregate inputs around one `ColumnExpr` surface with canonical `lit(...)` and typed literal helpers. - **Core scalar functions:** RFC 015 adds registry-backed scalar function applications and the first core helper slice for casts, comparisons, boolean logic, null/NaN predicates, arithmetic, conditionals, membership/range predicates, and ordering expressions. Primitive cast targets can use source-level type tokens such as `cast(col("amount_text"), float)`, while explicit string target spellings remain available for compatibility aliases such as `int64` and `float64`. Implemented helpers lower to Substrait IR through registry metadata, built-in Rex shapes, or structural sort-field lowering; DataFusion remains the first execution adapter rather than the semantic boundary. 
diff --git a/docs/rfcs/003_inql_query_blocks.md b/docs/rfcs/003_inql_query_blocks.md index 9928d89..5cf69c2 100644 --- a/docs/rfcs/003_inql_query_blocks.md +++ b/docs/rfcs/003_inql_query_blocks.md @@ -8,7 +8,7 @@ - InQL RFC 001 (dataset types — **prerequisite**; `FROM` sources must conform to `DataSet[T]`) - InQL RFC 002 (Apache Substrait — **normative `Rel`-level contract** for lowering) - **Issue:** [InQL #4](https://github.com/dannys-code-corner/InQL/issues/4) -- **RFC PR:** - +- **RFC PR:** [InQL #59](https://github.com/dannys-code-corner/InQL/pull/59) - **Written against:** Incan v0.3 - **Shipped in:** InQL v0.1 @@ -56,7 +56,10 @@ def summarize_orders(orders: DataFrame[Order]) -> DataFrame[OrderSummary]: } ``` -The compiler checks `.status`, `.amount`, `GROUP BY` / `SELECT` consistency, and output compatibility with `DataFrame[OrderSummary]` (or structural rules this RFC finalizes). The checked tree lowers to Substrait (InQL RFC 002); execution uses the execution context. +The compiler checks `.status`, `.amount`, and `GROUP BY` / `SELECT` consistency. The `DataFrame[OrderSummary]` return +type records the intended output row model; full field/type compatibility validation against annotated output models is +tracked as schema-validation follow-up work. The checked tree lowers to Substrait (InQL RFC 002); execution uses the +execution context. 
## Reference-level explanation diff --git a/docs/rfcs/013_function_catalog_program.md b/docs/rfcs/013_function_catalog_program.md index a6733f4..d13e85f 100644 --- a/docs/rfcs/013_function_catalog_program.md +++ b/docs/rfcs/013_function_catalog_program.md @@ -19,7 +19,7 @@ - InQL RFC 025 (typed sketch logical values) - InQL RFC 026 (semi-structured variant logical values) - **Issue:** [InQL #30](https://github.com/dannys-code-corner/InQL/issues/30) -- **RFC PR:** — +- **RFC PR:** [InQL #59](https://github.com/dannys-code-corner/InQL/pull/59) - **Written against:** Incan v0.2 - **Shipped in:** InQL v0.1 diff --git a/examples/README.md b/examples/README.md index ba2d66b..188d73b 100644 --- a/examples/README.md +++ b/examples/README.md @@ -18,7 +18,8 @@ ordinary Incan code. - `session_grouped_aggregate_csv.incn` — Grouped aggregate over `LazyFrame[AggregateOrder]` using `col(...)`, `sum(...)`, and `count()` - `session_with_column_csv.incn` — Derived-column example over `LazyFrame[AggregateOrder]` using `with_column(...)`, `mul(...)`, and `lit(...)` - `advanced_retail_analytics.incn` — Larger 100-row retail method-chain spike covering scalar functions, JSON, URL parsing, hashing, aggregates, windows, and generators -- `advanced_retail_query_blocks/` — Dependency-consumer query-block version of the retail spike, covering RFC 003 vocab over the same 100-row fixture +- `advanced_retail_query_blocks/` — Dependency-consumer query-block version of the retail spike, covering the + query-block vocabulary over the same 100-row fixture - `models.incn` — Shared `@derive(Clone)` row models for examples ## Running examples @@ -48,8 +49,7 @@ quoted JSON event payloads. It materializes three outputs: - a generated tag view that composes window ranking with `explode(...)` `advanced_retail_query_blocks/` is the same fixture exercised from a standalone dependency consumer. 
It imports -`pub::inql` and runs real RFC 003 query blocks for the high-value projection, grouped rollup, and generated-tag window -view: +`pub::inql` and runs query blocks for the high-value projection, grouped rollup, and generated-tag window view: ```incan high_value = query { @@ -88,7 +88,8 @@ These examples document the API patterns for the InQL dataset and Session surfac 2. Carrier transformations remain typed Incan functions rather than stringly runtime scripts 3. Builder-based aggregation runs through `col(...)`, `sum(...)`, and `count()` 4. Builder-based scalar expressions run through `col(...)`, `lit(...)`, `eq(...)`, `gt(...)`, `add(...)`, and `mul(...)` -5. Query blocks activate through `pub::inql` in dependency consumers and lower into the same Dataset/Prism/Substrait path +5. Query blocks activate through `pub::inql` in dependency consumers, desugar into carrier calls, and meet the rest of + InQL at the Substrait boundary 6. Session execution provides `collect`, `display`, and write sinks over DataFusion They serve three purposes: diff --git a/src/session/datafusion_backend.incn b/src/session/datafusion_backend.incn index 9bbd2ec..49159fe 100644 --- a/src/session/datafusion_backend.incn +++ b/src/session/datafusion_backend.incn @@ -372,6 +372,8 @@ async def _dataframe_from_window_rel( mut current_window = window_rel mut current_output_columns = output_columns mut child = _window_input_rel(current_window.clone())? + # DataFusion's typed window API is easiest to apply from the innermost relation outward. First peel the nested + # Substrait window chain into a stack, then execute the base child and replay each window on the resulting frame. 
while true: child_columns = relation_output_columns(child.clone()) pending_windows.append( @@ -725,25 +727,17 @@ def _replace_fetch_generator_child(original: Rel, fetch: FetchRel, path: str) -> def _replace_window_generator_child(original: Rel, window: ConsistentPartitionWindowRel, path: str) -> Rel: """Rebuild a WindowRel with its input child rewritten when present.""" - match window.input: - Some(child) => - return Rel( - rel_type=Some( - RelType.Window( - Box.new( - ConsistentPartitionWindowRel( - common=window.common, - input=Some(_rewritten_generator_child(child, f"{path}_window")), - window_functions=window.window_functions, - partition_expressions=window.partition_expressions, - sorts=window.sorts, - advanced_extension=window.advanced_extension, - ), - ), - ), - ), - ) - None => return original + if let Some(child) = window.input: + rewritten_window = ConsistentPartitionWindowRel( + common=window.common, + input=Some(_rewritten_generator_child(child, f"{path}_window")), + window_functions=window.window_functions, + partition_expressions=window.partition_expressions, + sorts=window.sorts, + advanced_extension=window.advanced_extension, + ) + return Rel(rel_type=Some(RelType.Window(Box.new(rewritten_window)))) + return original def _replace_set_generator_children(set_rel: SetRel, path: str) -> Rel: diff --git a/src/substrait/expr_lowering.incn b/src/substrait/expr_lowering.incn index 06db5b0..551b918 100644 --- a/src/substrait/expr_lowering.incn +++ b/src/substrait/expr_lowering.incn @@ -794,24 +794,17 @@ pub def lower_select_project_for_columns( output_index = input_column_count + len(expressions) - 1 mapping_indexes.append(output_index) existing_idx = _index_of_binding(bindings, assignment.output_name) + resolved_projection_binding = ResolvedProjectionBinding( + name=assignment.output_name, + expr=resolved_expr, + output_index=output_index, + kind=output_kind, + nullable=true, + ) if existing_idx >= 0: - bindings[existing_idx] = ResolvedProjectionBinding( - 
name=assignment.output_name, - expr=resolved_expr, - output_index=output_index, - kind=output_kind, - nullable=true, - ) + bindings[existing_idx] = resolved_projection_binding else: - bindings.append( - ResolvedProjectionBinding( - name=assignment.output_name, - expr=resolved_expr, - output_index=output_index, - kind=output_kind, - nullable=true, - ), - ) + bindings.append(resolved_projection_binding) return Ok( LoweredProject( expressions=[resolved.expr.clone() for resolved in expressions], diff --git a/src/substrait/relations.incn b/src/substrait/relations.incn index 172828d..91093ad 100644 --- a/src/substrait/relations.incn +++ b/src/substrait/relations.incn @@ -939,7 +939,7 @@ pub def try_select_project_rel_for_columns( pub def project_rel_with_expressions(input: Rel, expressions: list[Expression]) -> Rel: """Append already-lowered Substrait expressions to a relation.""" - # The DataFusion adapter uses this to evaluate generator arguments into temporary columns before unnesting. + # Execution adapters can use this to materialize pre-lowered expressions before a relation-shaping operation. return _rel_project( ProjectRel( common=Some(_direct_common()), diff --git a/vocab_companion/src/desugar.rs b/vocab_companion/src/desugar.rs index 7f0f411..b4c4584 100644 --- a/vocab_companion/src/desugar.rs +++ b/vocab_companion/src/desugar.rs @@ -42,6 +42,8 @@ fn lower_query(declaration: &VocabDeclaration) -> Result = None; + // Query clauses are lowered left-to-right into the same carrier method calls that authors can write manually. + // `JOIN` is staged until the following `ON` clause so relation naming and predicate lowering stay together. for item in &declaration.body { let VocabBodyItem::Clause(clause) = item else { continue; @@ -426,6 +428,8 @@ fn lower_expression_list(clause: &VocabClause) -> Result, Desugar } fn lower_column_expr(expr: &IncanExpr) -> Result { + // The vocab AST distinguishes query-only field shorthands from ordinary Incan expressions. 
Normalize every + // supported query expression into the public InQL helper surface before the compiler typechecks the generated call. match expr { IncanExpr::ScopedSurface(surface) if surface.descriptor_key == QUERY_FIELD_DESCRIPTOR => { if let IncanScopedSurfacePayload::LeadingDotPath { segments, .. } = &surface.payload { @@ -473,7 +477,7 @@ fn lower_column_expr(expr: &IncanExpr) -> Result { .collect::, _>>()?, )), _ => Err(DesugarError::new( - "query expression form is not part of the RFC003 grammar", + "query blocks do not support this expression form", )), } } @@ -532,7 +536,7 @@ fn binary_helper(op: IncanBinaryOp) -> Result<&'static str, DesugarError> { IncanBinaryOp::And => Ok("and_"), IncanBinaryOp::Or => Ok("or_"), _ => Err(DesugarError::new( - "query binary operator is not part of the RFC003 grammar", + "query blocks do not support this binary operator", )), } } @@ -542,7 +546,7 @@ fn unary_helper(op: IncanUnaryOp) -> Result<&'static str, DesugarError> { IncanUnaryOp::Not => Ok("not_"), IncanUnaryOp::Neg => Ok("neg"), _ => Err(DesugarError::new( - "query unary operator is not part of the RFC003 grammar", + "query blocks do not support this unary operator", )), } } diff --git a/vocab_companion/src/lib.rs b/vocab_companion/src/lib.rs index 722db96..c81d910 100644 --- a/vocab_companion/src/lib.rs +++ b/vocab_companion/src/lib.rs @@ -17,11 +17,200 @@ pub const NAMESPACE: &str = "inql"; pub const QUERY_KW: &str = "query"; pub const QUERY_FIELD_DESCRIPTOR: &str = "inql.query.field"; -const HELPER_EXPORTS: &[&str] = &[ +// Incan's current vocab manifest API requires query desugarers to name helper bindings explicitly. Keep this list as +// the query-expression helper surface, and keep the tests below in lock-step with `src/lib.incn` public function exports +// so query blocks do not silently drift away from ordinary `pub::inql` helper calls. 
+const QUERY_BLOCK_HELPER_EXPORTS: &[&str] = &[ "col", "lit", + "always_false", + "always_true", + "bool_expr", + "bool_lit", + "float_expr", + "int_expr", + "int_lit", + "str_expr", + "str_lit", + "count", + "count_expr", + "count_distinct", + "count_if", + "sum", + "avg", + "min", + "max", + "approx_count_distinct", + "approx_percentile", + "hll_deserialize", + "hll_estimate", + "hll_merge", + "hll_serialize", + "hll_sketch", + "is_array", + "is_boolean", + "is_float", + "is_integer", + "is_null_value", + "is_object", + "is_string", + "is_timestamp", + "parse_variant_json", + "try_parse_variant_json", + "typeof", + "variant_get", + "abs", + "acos", + "asin", + "atan", + "atan2", + "ceil", + "cos", + "degrees", + "exp", + "floor", + "greatest", + "least", + "ln", + "log", + "log10", + "power", + "radians", + "round", + "sign", + "sin", + "sqrt", + "tan", + "char_length", + "concat", + "concat_ws", + "lcase", + "left", + "lower", + "lpad", + "ltrim", + "octet_length", + "overlay", + "position", + "repeat", + "replace", + "right", + "rpad", + "rtrim", + "split_part", + "substr", + "substring", + "translate", + "trim", + "ucase", + "upper", + "base64", + "decode", + "encode", + "hex", + "unbase64", + "unhex", + "regexp_extract", + "regexp_like", + "regexp_replace", + "current_date", + "current_time", + "current_timestamp", + "date_add", + "date_diff", + "date_part", + "date_sub", + "date_trunc", + "dateadd", + "datediff", + "extract", + "from_unixtime", + "last_day", + "make_date", + "make_time", + "make_timestamp", + "time_trunc", + "timestamp_diff", + "to_date", + "to_time", + "to_timestamp", + "unix_micros", + "unix_millis", + "unix_seconds", "array", + "array_contains", "array_distinct", + "array_except", + "array_flatten", + "array_intersect", + "array_join", + "array_position", + "array_range", + "array_reverse", + "array_slice", + "array_sort", + "array_union", + "arrays_overlap", + "cardinality", + "element_at", + "map_contains_key", + "map_entries", + 
 "map_extract", + "map_from_arrays", + "map_keys", + "map_values", + "named_struct", + "explode", + "explode_outer", + "flatten", + "inline", + "inline_outer", + "posexplode", + "posexplode_outer", + "stack", + "window", + "row_number", + "rank", + "dense_rank", + "percent_rank", + "cume_dist", + "ntile", + "lag", + "lead", + "first_value", + "last_value", + "nth_value", + "current_row", + "following", + "preceding", + "unbounded_following", + "unbounded_preceding", + "md5", + "crc32", + "sha1", + "sha2", + "sha224", + "sha256", + "sha384", + "sha512", + "xxhash64", + "parse_url", + "try_url_decode", + "url_decode", + "url_encode", + "check_json", + "from_json", + "get_json_object", + "json_array_length", + "json_extract_path_text", + "json_object_keys", + "parse_json", + "schema_of_json", + "to_json", + "try_from_json", + "from_csv", + "schema_of_csv", + "to_csv", "add", "sub", "mul", @@ -37,41 +226,27 @@ const HELPER_EXPORTS: &[&str] = &[ "or_", "not_", "neg", + "equal_null", + "cast", + "try_cast", + "safe_cast", + "between", + "in_", + "is_nan", + "is_not_nan", + "is_not_null", + "is_null", + "case_when", + "coalesce", + "nullif", "asc", + "asc_nulls_first", + "asc_nulls_last", "desc", - "explode", - "sum", - "count_distinct", - "count", - "avg", - "min", - "max", - "lower", - "upper", - "trim", - "round", - "date_part", - "sha256", - "check_json", - "json_extract_path_text", - "json_array_length", - "parse_url", - "regexp_extract", - "regexp_like", + "desc_nulls_first", + "desc_nulls_last", "aggregate_as", "with_column_assignment", - "window", - "row_number", - "rank", - "dense_rank", - "percent_rank", - "cume_dist", - "ntile", - "lag", - "lead", - "first_value", - "last_value", - "nth_value", ]; #[must_use] @@ -141,7 +316,7 @@ pub fn library_vocab() -> VocabRegistration { } fn helper_bindings() -> Vec<HelperBinding> { - HELPER_EXPORTS + QUERY_BLOCK_HELPER_EXPORTS .iter() .map(|name| HelperBinding { key: (*name).to_string(), @@ -151,3 +326,95 @@ fn helper_bindings() -> Vec<HelperBinding> { 
 } incan_vocab::export_wasm_desugarer!(InqlQueryDesugarer); + +#[cfg(test)] +mod tests { + use std::collections::BTreeSet; + + use super::*; + + const PUBLIC_FACADE: &str = include_str!("../../src/lib.incn"); + const NON_QUERY_EXPRESSION_FUNCTION_EXPORTS: &[&str] = &[ + "display", + "function_registry", + "function_registry_canonical_names", + "function_registry_entries", + "function_registry_entry", + "function_registry_entry_by_name", + "function_registry_entry_count", + "function_registry_function_refs", + "registered_substrait_mapped_function_refs", + ]; + + #[test] + fn helper_manifest_matches_declared_query_block_exports() { + let registration = library_vocab(); + let manifest_bindings = &registration.metadata().library_manifest.helper_bindings; + let manifest_keys: Vec<&str> = manifest_bindings + .iter() + .map(|binding| binding.key.as_str()) + .collect(); + + assert_eq!(manifest_keys, QUERY_BLOCK_HELPER_EXPORTS); + for binding in manifest_bindings { + assert_eq!(binding.key, binding.exported_name); + } + } + + #[test] + fn query_block_exports_cover_public_function_helpers() { + let helper_names: BTreeSet<&str> = QUERY_BLOCK_HELPER_EXPORTS.iter().copied().collect(); + assert_eq!(helper_names.len(), QUERY_BLOCK_HELPER_EXPORTS.len()); + + for exported_name in public_function_exports() { + if NON_QUERY_EXPRESSION_FUNCTION_EXPORTS.contains(&exported_name.as_str()) { + continue; + } + assert!( + helper_names.contains(exported_name.as_str()), + "`{exported_name}` is exported from `src/lib.incn` but is not available to query-block expressions", + ); + } + } + + fn public_function_exports() -> BTreeSet<String> { + public_imports_with_prefix(PUBLIC_FACADE, "pub from functions.") + } + + fn public_imports_with_prefix(source: &str, prefix: &str) -> BTreeSet<String> { + let mut names = BTreeSet::new(); + let mut in_matching_import = false; + for line in source.lines() { + let trimmed = line.trim(); + if in_matching_import { + if trimmed == ")" { + in_matching_import = false; + } else { + 
 add_import_symbols(trimmed, &mut names); + } + continue; + } + if !trimmed.starts_with(prefix) { + continue; + } + let Some((_, imported)) = trimmed.split_once(" import ") else { + continue; + }; + if imported == "(" { + in_matching_import = true; + } else { + add_import_symbols(imported, &mut names); + } + } + names + } + + fn add_import_symbols(segment: &str, names: &mut BTreeSet<String>) { + for raw_name in segment.trim_end_matches(',').split(',') { + let name = raw_name.trim(); + if !name.is_empty() { + names.insert(name.to_string()); + } + } + } +} From 3eb7f935a6372b291b66a9b7e1e48ced18d7e18d Mon Sep 17 00:00:00 2001 From: Danny Meijer Date: Fri, 5 Jun 2026 12:12:15 +0200 Subject: [PATCH 05/11] chore - use Incan release branch in CI (#4) --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 0a09fe7..66424c4 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -19,7 +19,7 @@ concurrency: env: CARGO_TERM_COLOR: always - INCAN_REF: bugfix/755-dependency-union-public-boundary + INCAN_REF: release/v0.3 EXPECTED_INCAN_VERSION: 0.3.0-rc50 RUST_BACKTRACE: 1 INCAN_NO_BANNER: 1 From 7d412610366c4aa5cbf593573dc56378ed48040f Mon Sep 17 00:00:00 2001 From: Danny Meijer Date: Fri, 5 Jun 2026 13:34:08 +0200 Subject: [PATCH 06/11] chore - install vocab wasm target in CI (#4) --- .github/workflows/ci.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 66424c4..3a6cb49 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -48,6 +48,8 @@ jobs: - name: Install Rust toolchain uses: dtolnay/rust-toolchain@stable + with: + targets: wasm32-wasip1 - name: Cache Incan build artifacts uses: Swatinem/rust-cache@v2 From c74cdfd01017494c790d92e1e2e393ae38535409 Mon Sep 17 00:00:00 2001 From: Danny Meijer Date: Fri, 5 Jun 2026 13:37:39 +0200 Subject: [PATCH 07/11] chore - use Incan install action in CI 
(#4) --- .github/workflows/ci.yml | 26 +++++++++----------------- 1 file changed, 9 insertions(+), 17 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 3a6cb49..37bde96 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -1,8 +1,8 @@ # InQL CI — Incan library package # -# Builds the Incan compiler from source in CI, then runs the InQL package -# checks against that local binary. Keeping this workflow self-contained avoids -# a hard dependency on a remote composite action path staying in sync. +# Checks out the pinned Incan compiler source, installs it through Incan's +# downstream install action, then runs the InQL package checks against that +# local binary. name: CI @@ -46,15 +46,14 @@ jobs: ref: ${{ env.INCAN_REF }} path: incan - - name: Install Rust toolchain - uses: dtolnay/rust-toolchain@stable + - name: Install Incan compiler + uses: ./incan/.github/actions/install-incan with: - targets: wasm32-wasip1 + profile: debug + cache-shared-key: inql-incan-${{ runner.os }}-${{ env.EXPECTED_INCAN_VERSION }} - - name: Cache Incan build artifacts - uses: Swatinem/rust-cache@v2 - with: - workspaces: incan -> target + - name: Install Incan vocab target + run: rustup target add wasm32-wasip1 - name: Cache generated InQL Cargo artifacts uses: actions/cache@v4 @@ -78,13 +77,6 @@ jobs: inql-rust-inspect-${{ runner.os }}-incan-${{ env.EXPECTED_INCAN_VERSION }}-${{ hashFiles('incan.lock', 'incan.toml') }}- inql-rust-inspect-${{ runner.os }}-incan-${{ env.EXPECTED_INCAN_VERSION }}- - - name: Build Incan compiler - working-directory: incan - run: cargo build --locked --bin incan - - - name: Expose local Incan binary on PATH - run: echo "$GITHUB_WORKSPACE/incan/target/debug" >> "$GITHUB_PATH" - - name: Show toolchain run: | incan --version From 6e6af9f244ac282f4bc7c14885d9a393c68252ce Mon Sep 17 00:00:00 2001 From: Danny Meijer Date: Fri, 5 Jun 2026 13:48:27 +0200 Subject: [PATCH 08/11] chore - validate Incan install action 
targets (#4) --- .github/workflows/ci.yml | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 37bde96..3b1abce 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -19,7 +19,7 @@ concurrency: env: CARGO_TERM_COLOR: always - INCAN_REF: release/v0.3 + INCAN_REF: bugfix/188-install-incan-targets EXPECTED_INCAN_VERSION: 0.3.0-rc50 RUST_BACKTRACE: 1 INCAN_NO_BANNER: 1 @@ -52,9 +52,6 @@ jobs: profile: debug cache-shared-key: inql-incan-${{ runner.os }}-${{ env.EXPECTED_INCAN_VERSION }} - - name: Install Incan vocab target - run: rustup target add wasm32-wasip1 - - name: Cache generated InQL Cargo artifacts uses: actions/cache@v4 with: From 6d20215c2a4df0855d501222683933ca7e27182f Mon Sep 17 00:00:00 2001 From: Danny Meijer Date: Fri, 5 Jun 2026 15:09:34 +0200 Subject: [PATCH 09/11] chore - use fixed Incan release action in CI (#4) --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 3b1abce..96a6b7e 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -19,7 +19,7 @@ concurrency: env: CARGO_TERM_COLOR: always - INCAN_REF: bugfix/188-install-incan-targets + INCAN_REF: release/v0.3 EXPECTED_INCAN_VERSION: 0.3.0-rc50 RUST_BACKTRACE: 1 INCAN_NO_BANNER: 1 From 7b4b749b2230aad1ef8d1b5fe7b780eee7f679b3 Mon Sep 17 00:00:00 2001 From: Danny Meijer Date: Fri, 5 Jun 2026 15:23:06 +0200 Subject: [PATCH 10/11] docs - document query-block operator lowering (#4) --- docs/language/reference/query_blocks.md | 19 +++++++++++++++++++ docs/rfcs/003_inql_query_blocks.md | 13 +++++++++++++ .../src/main.incn | 4 ++-- 3 files changed, 34 insertions(+), 2 deletions(-) diff --git a/docs/language/reference/query_blocks.md b/docs/language/reference/query_blocks.md index 68074b4..bd41f22 100644 --- a/docs/language/reference/query_blocks.md +++ b/docs/language/reference/query_blocks.md @@ 
-53,6 +53,25 @@ The implemented v0.1 query-block surface supports: `ORDER BY` uses InQL ordering helpers such as `asc(...)` and `desc(...)`; postfix SQL spellings such as `.amount DESC` are not part of the v0.1 query-block grammar. +## Expressions + +Query-block expressions use Incan expression operators and desugar to the same InQL helper calls available in ordinary +method-chain code: + +| Query expression | Helper equivalent | +| ---------------- | ----------------- | +| `.status == "paid"` | `eq(.status, "paid")` | +| `.status != "paid"` | `ne(.status, "paid")` | +| `.amount < 100` | `lt(.amount, 100)` | +| `.amount <= 100` | `lte(.amount, 100)` | +| `.amount > 100` | `gt(.amount, 100)` | +| `.amount >= 100` | `gte(.amount, 100)` | + +The comparison helper names use `lte` and `gte` for inclusive bounds; `le` and `ge` are not public helper names. +Arithmetic operators lower the same way: `+` to `add`, `-` to `sub`, `*` to `mul`, `/` to `div`, and `%` to +`modulo`. Boolean and unary operators lower to their helper forms as well, such as `and_`, `or_`, `not_`, and `neg`. +Use `==` for equality; a single `=` remains assignment/binding syntax, not a query predicate. + ## Resolution - `.column` refers to the primary `FROM` relation or the current query schema after a projection boundary. diff --git a/docs/rfcs/003_inql_query_blocks.md b/docs/rfcs/003_inql_query_blocks.md index 5cf69c2..1aea08e 100644 --- a/docs/rfcs/003_inql_query_blocks.md +++ b/docs/rfcs/003_inql_query_blocks.md @@ -90,6 +90,19 @@ Inside relational expression positions (`WHERE`, `JOIN ON`, `GROUP BY`, `ORDER B 2. `relation.column` → named join relation. 3. Bare identifier → current query schema first, then lexical Incan binding where permitted. +### Expression operators + +Relational expression bodies use ordinary Incan expression operators and lower them into InQL's public helper surface. 
+Implementations must treat `left == right`, `left != right`, `left < right`, `left <= right`, `left > right`, and +`left >= right` as equivalent to `eq(left, right)`, `ne(left, right)`, `lt(left, right)`, `lte(left, right)`, +`gt(left, right)`, and `gte(left, right)` respectively. Arithmetic operators lower through `add`, `sub`, `mul`, +`div`, and `modulo`; boolean and unary operators lower through their helper equivalents such as `and_`, `or_`, +`not_`, and `neg`. + +Inclusive comparison helpers are named `lte` and `gte`; `le` and `ge` are not part of the public helper surface. +Single `=` is not a predicate equality operator in query expressions. Equality uses `==`; `=` remains reserved for +assignment/binding positions such as named window declarations. + ### `SELECT` and alias publication - `SELECT` defines a projection boundary; output columns become the schema for later clauses in the block. diff --git a/examples/advanced_retail_query_blocks/src/main.incn b/examples/advanced_retail_query_blocks/src/main.incn index 09f03ab..74f5f64 100644 --- a/examples/advanced_retail_query_blocks/src/main.incn +++ b/examples/advanced_retail_query_blocks/src/main.incn @@ -186,7 +186,7 @@ def paid_rollup(orders: LazyFrame[RetailOrder]) -> LazyFrame[RetailRollup]: # Keep reusable derivations outside the aggregate query so the grouping block stays focused on relational shape. return query { FROM enriched - WHERE eq(.status_norm, "paid") + WHERE .status_norm == "paid" GROUP BY .region_norm, .channel @@ -208,7 +208,7 @@ def generated_ranked_tags(orders: LazyFrame[RetailOrder]) -> LazyFrame[RetailGen # This block demonstrates generator and window composition inside the query vocabulary. return query { FROM orders - WHERE eq(check_json(.event_json), true) + WHERE check_json(.event_json) == true EXPLODE # Build derived tags from scalar helpers, then expand them into one row per tag. 
array_distinct( From 3362a6eb98dcdba6025eac6e2506a74a90a34352 Mon Sep 17 00:00:00 2001 From: Danny Meijer Date: Fri, 5 Jun 2026 15:37:33 +0200 Subject: [PATCH 11/11] docs - reflow InQL markdown prose (#4) --- docs/language/reference/dataset_methods.md | 4 +- .../reference/functions/approximate.md | 22 ++----- docs/language/reference/functions/format.md | 14 ++-- .../reference/functions/generators.md | 9 +-- docs/language/reference/functions/sketches.md | 16 ++--- docs/language/reference/functions/variants.md | 14 +--- docs/language/reference/functions/windows.md | 13 +--- docs/language/reference/query_blocks.md | 22 ++----- docs/rfcs/003_inql_query_blocks.md | 16 +---- docs/rfcs/004_inql_execution_context.md | 6 +- docs/rfcs/021_generator_table_functions.md | 3 +- docs/rfcs/023_approximate_sketch_functions.md | 43 ++++-------- docs/rfcs/025_typed_sketch_logical_values.md | 65 +++++-------------- .../026_semi_structured_variant_values.md | 36 +++------- examples/README.md | 10 +-- 15 files changed, 74 insertions(+), 219 deletions(-) diff --git a/docs/language/reference/dataset_methods.md b/docs/language/reference/dataset_methods.md index 5c49147..b51b334 100644 --- a/docs/language/reference/dataset_methods.md +++ b/docs/language/reference/dataset_methods.md @@ -24,9 +24,7 @@ The Substrait helper surface behind these methods is split by semantic role: | `order_by` | `def order_by(self, columns: list[ColumnExpr]) -> Self` | Sort rows by scalar expressions or ordering helpers such as `asc(...)` and `desc(...)`. | | `limit` | `def limit(self, n: int) -> Self` | Cap row count. | -`SameCarrier[U]` means `DataFrame[U]` for `DataFrame[T]`, `LazyFrame[U]` for `LazyFrame[T]`, and `DataStream[U]` -for `DataStream[T]`. The root `DataSet[T]` trait remains the common plan/schema contract; schema-changing -projection is expressed on concrete carriers until Incan grows native trait type-family support. 
+`SameCarrier[U]` means `DataFrame[U]` for `DataFrame[T]`, `LazyFrame[U]` for `LazyFrame[T]`, and `DataStream[U]` for `DataStream[T]`. The root `DataSet[T]` trait remains the common plan/schema contract; schema-changing projection is expressed on concrete carriers until Incan grows native trait type-family support. ## `with_column` diff --git a/docs/language/reference/functions/approximate.md b/docs/language/reference/functions/approximate.md index f2c1647..bb95d02 100644 --- a/docs/language/reference/functions/approximate.md +++ b/docs/language/reference/functions/approximate.md @@ -1,7 +1,6 @@ # Approximate Functions (Reference) -Approximate helpers are explicit opt-in functions. InQL does not silently replace exact aggregates with approximate -execution because a backend can do so. +Approximate helpers are explicit opt-in functions. InQL does not silently replace exact aggregates with approximate execution because a backend can do so. The portable RFC 023 aggregate surface is: @@ -23,21 +22,10 @@ summary = ( ) ``` -`approx_count_distinct` is registered as an approximate aggregate with HyperLogLog-family metadata. The portable author -contract is an approximate non-null distinct-count estimate. It does not expose a user-tunable relative-error parameter -because the registered InQL Substrait extension mapping for this function is unary. Backend adapters must keep this -approximation visible in capability/error handling rather than redefining exact `count_distinct` semantics. +`approx_count_distinct` is registered as an approximate aggregate with HyperLogLog-family metadata. The portable author contract is an approximate non-null distinct-count estimate. It does not expose a user-tunable relative-error parameter because the registered InQL Substrait extension mapping for this function is unary. Backend adapters must keep this approximation visible in capability/error handling rather than redefining exact `count_distinct` semantics. 
-`approx_percentile` is registered as an approximate aggregate with t-digest-family metadata. `percentile` must be between -`0.0` and `1.0` inclusive. `accuracy` must be positive and is carried as an explicit aggregate argument so backend -capability handling can accept, emulate, or reject the requested approximation instead of silently changing semantics. -Generated aggregate output names include the percentile and accuracy arguments. +`approx_percentile` is registered as an approximate aggregate with t-digest-family metadata. `percentile` must be between `0.0` and `1.0` inclusive. `accuracy` must be positive and is carried as an explicit aggregate argument so backend capability handling can accept, emulate, or reject the requested approximation instead of silently changing semantics. Generated aggregate output names include the percentile and accuracy arguments. -Both helpers lower through registered InQL Substrait aggregate extension names. The DataFusion adapter maps -`approx_count_distinct` to DataFusion's `approx_distinct` implementation and maps `approx_percentile` to -`approx_percentile_cont` at the backend boundary. +Both helpers lower through registered InQL Substrait aggregate extension names. The DataFusion adapter maps `approx_count_distinct` to DataFusion's `approx_distinct` implementation and maps `approx_percentile` to `approx_percentile_cont` at the backend boundary. -Sketch-state construction, merge, estimate, serialization, and deserialization are implemented by -[Sketch functions](sketches.md). Those helpers use typed sketch logical values with sketch family, value domain, merge -compatibility, and serialized format identity. Exposing sketch state as strings or binary payloads would violate the RFC -023 type-safety requirement. +Sketch-state construction, merge, estimate, serialization, and deserialization are implemented by [Sketch functions](sketches.md). 
Those helpers use typed sketch logical values with sketch family, value domain, merge compatibility, and serialized format identity. Exposing sketch state as strings or binary payloads would violate the RFC 023 type-safety requirement. diff --git a/docs/language/reference/functions/format.md b/docs/language/reference/functions/format.md index 44ca5b2..7f737ab 100644 --- a/docs/language/reference/functions/format.md +++ b/docs/language/reference/functions/format.md @@ -1,7 +1,6 @@ # Format Functions (Reference) -Format functions transform scalar values that are already present in a relation. Source discovery, file reads, and -relation reshaping belong to the session and relational APIs rather than this function family. +Format functions transform scalar values that are already present in a relation. Source discovery, file reads, and relation reshaping belong to the session and relational APIs rather than this function family. The format catalog includes deterministic hashes, URL helpers, JSON helpers, and CSV helpers: @@ -55,13 +54,8 @@ projected = ( ) ``` -Hash helpers operate on UTF-8 string bytes and return lowercase hexadecimal strings. `sha2(...)` accepts `224`, `256`, -`384`, and `512`; other digest lengths are rejected during expression construction. +Hash helpers operate on UTF-8 string bytes and return lowercase hexadecimal strings. `sha2(...)` accepts `224`, `256`, `384`, and `512`; other digest lengths are rejected during expression construction. -JSON helpers validate, normalize, and project payload text. CSV parsing returns logical map values instead of JSON text. -Explicit-schema JSON and CSV helpers derive their schema from Incan model type parameters. These helpers do not read -external files or return typed variant values. Use [Variant functions](variants.md) when a plan needs semi-structured -kind inspection. +JSON helpers validate, normalize, and project payload text. CSV parsing returns logical map values instead of JSON text. 
Explicit-schema JSON and CSV helpers derive their schema from Incan model type parameters. These helpers do not read external files or return typed variant values. Use [Variant functions](variants.md) when a plan needs semi-structured kind inspection. -The DataFusion adapter executes the full RFC 022 catalog with native DataFusion functions where available and -Incan-authored adapter callbacks for helpers that DataFusion does not expose natively. +The DataFusion adapter executes the full RFC 022 catalog with native DataFusion functions where available and Incan-authored adapter callbacks for helpers that DataFusion does not expose natively. diff --git a/docs/language/reference/functions/generators.md b/docs/language/reference/functions/generators.md index cccbd55..b543c47 100644 --- a/docs/language/reference/functions/generators.md +++ b/docs/language/reference/functions/generators.md @@ -1,7 +1,6 @@ # Generator and Table-Valued Functions (Reference) -Generators are relation-shaping operations. They are registry-backed like scalar and aggregate helpers, but they return -`GeneratorApplication` values and must be applied through a relation method such as `generate(...)`. +Generators are relation-shaping operations. They are registry-backed like scalar and aggregate helpers, but they return `GeneratorApplication` values and must be applied through a relation method such as `generate(...)`. ```incan from pub::inql import LazyFrame @@ -32,8 +31,6 @@ The explicit generator surface currently includes: | `flatten(expr, as_)` | one value column | Portable table-valued flatten for one array expression. | | `stack(row_count, values, output_columns)` | declared output columns | Emits `row_count` generated rows from row-major scalar values. | -Generator applications preserve input columns and append generated columns in declaration order. Generated aliases are -required, must be non-empty, and must not collide with existing input columns. 
+Generator applications preserve input columns and append generated columns in declaration order. Generated aliases are required, must be non-empty, and must not collide with existing input columns. -Nested scalar helpers such as `array_flatten(...)` remain scalar expressions. They do not expand rows and are documented -on the [nested data functions](nested.md) page. The relation-shaping `flatten(...)` helper is intentionally separate. +Nested scalar helpers such as `array_flatten(...)` remain scalar expressions. They do not expand rows and are documented on the [nested data functions](nested.md) page. The relation-shaping `flatten(...)` helper is intentionally separate. diff --git a/docs/language/reference/functions/sketches.md b/docs/language/reference/functions/sketches.md index e08f7a7..efc5ccd 100644 --- a/docs/language/reference/functions/sketches.md +++ b/docs/language/reference/functions/sketches.md @@ -1,7 +1,6 @@ # Sketch Functions (Reference) -Sketch helpers model approximate state as typed logical values, not as ordinary strings or binary payloads. The first -portable family is HyperLogLog. +Sketch helpers model approximate state as typed logical values, not as ordinary strings or binary payloads. The first portable family is HyperLogLog. | Function | Meaning | | --- | --- | @@ -36,15 +35,8 @@ reported = monthly.with_column( ) ``` -Sketch compatibility is structural. HyperLogLog sketches can merge only when family, value domain, precision, and -serialization format match. `hll_deserialize(...)` requires those facts because they cannot be inferred from a payload -alone. +Sketch compatibility is structural. HyperLogLog sketches can merge only when family, value domain, precision, and serialization format match. `hll_deserialize(...)` requires those facts because they cannot be inferred from a payload alone. 
-The public helper surface follows the typed value-or-column conventions used by the rest of the function catalog: -`hll_sketch(...)` accepts primitive values or scalar expressions, while `hll_deserialize(...)` accepts string payload -values or scalar expressions. +The public helper surface follows the typed value-or-column conventions used by the rest of the function catalog: `hll_sketch(...)` accepts primitive values or scalar expressions, while `hll_deserialize(...)` accepts string payload values or scalar expressions. -RFC 025 helpers lower through InQL-owned Substrait extension mappings and carry sketch metadata in function options. The -DataFusion adapter reports a backend planning diagnostic for typed sketch execution because it has no sketch runtime -implementation. That rejection is an adapter capability boundary; the InQL plan remains typed and -backend-neutral. +RFC 025 helpers lower through InQL-owned Substrait extension mappings and carry sketch metadata in function options. The DataFusion adapter reports a backend planning diagnostic for typed sketch execution because it has no sketch runtime implementation. That rejection is an adapter capability boundary; the InQL plan remains typed and backend-neutral. diff --git a/docs/language/reference/functions/variants.md b/docs/language/reference/functions/variants.md index c562587..a729481 100644 --- a/docs/language/reference/functions/variants.md +++ b/docs/language/reference/functions/variants.md @@ -1,8 +1,6 @@ # Variant Functions (Reference) -Variant helpers model semi-structured payloads as typed logical values, not as ordinary JSON strings. Use RFC 022 JSON -helpers when you want text validation or normalized payload strings. Use variant helpers when you need kind-aware -inspection while preserving the distinction between SQL null and a present semi-structured null value. +Variant helpers model semi-structured payloads as typed logical values, not as ordinary JSON strings. 
Use RFC 022 JSON helpers when you want text validation or normalized payload strings. Use variant helpers when you need kind-aware inspection while preserving the distinction between SQL null and a present semi-structured null value. | Function | Meaning | | --- | --- | @@ -37,12 +35,6 @@ projected = ( ) ``` -`typeof(...)` accepts a `VariantExpr` value and returns a `StringColumnExpr`. Variant predicates accept -`VariantExpr` values and return `BoolColumnExpr` values. They do not parse strings directly. Parse helpers accept -`StrValueOrColumn` inputs; that keeps parsing, variant inspection, and RFC 022 JSON text helpers separate without -forcing authors to wrap literal payloads in `lit(...)`. +`typeof(...)` accepts a `VariantExpr` value and returns a `StringColumnExpr`. Variant predicates accept `VariantExpr` values and return `BoolColumnExpr` values. They do not parse strings directly. Parse helpers accept `StrValueOrColumn` inputs; that keeps parsing, variant inspection, and RFC 022 JSON text helpers separate without forcing authors to wrap literal payloads in `lit(...)`. -RFC 026 helpers lower through InQL-owned Substrait extension mappings and carry variant metadata in function options. -The DataFusion adapter currently reports a backend planning diagnostic for typed variant execution because it has no -variant runtime implementation. That rejection is an adapter capability boundary; the InQL plan remains typed and -backend-neutral. +RFC 026 helpers lower through InQL-owned Substrait extension mappings and carry variant metadata in function options. The DataFusion adapter currently reports a backend planning diagnostic for typed variant execution because it has no variant runtime implementation. That rejection is an adapter capability boundary; the InQL plan remains typed and backend-neutral. 
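Note on the variants.md text above: the distinction between SQL null and a present semi-structured null can be sketched with a small, language-neutral model. This is only an illustration in plain Python, not InQL or Incan code. `SQL_NULL`, `Variant`, `model_try_parse`, `model_typeof`, and `model_is_null_value` are hypothetical names introduced here, and propagating SQL null through the predicate is an assumption standing in for InQL's scalar null behavior.

```python
import json

# Assumption: None stands in for SQL null (a missing input).
SQL_NULL = None


class Variant:
    """Hypothetical stand-in for a present semi-structured value."""

    def __init__(self, value):
        self.value = value


def model_try_parse(payload):
    """Recoverable parse: malformed text yields SQL null,
    while a JSON `null` payload yields a present variant null."""
    if payload is SQL_NULL:
        return SQL_NULL
    try:
        return Variant(json.loads(payload))
    except json.JSONDecodeError:
        return SQL_NULL


def model_typeof(variant):
    """Return a lowercase kind name for a present variant value."""
    if variant is SQL_NULL:
        return SQL_NULL
    value = variant.value
    if value is None:
        return "null"
    # bool must be checked before int because bool is an int subclass in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def model_is_null_value(variant):
    """True only for a present variant null; SQL null input stays SQL null (assumed)."""
    if variant is SQL_NULL:
        return SQL_NULL
    return variant.value is None
```

In this model a parsed `"null"` payload is a present value whose kind is `null`, while a malformed payload or a missing input never makes the null predicate return true. A plain JSON string that looks like a date is still reported as `string`, because the model has no schema or parse option that would produce a typed timestamp.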
diff --git a/docs/language/reference/functions/windows.md b/docs/language/reference/functions/windows.md index a41d9e3..600c673 100644 --- a/docs/language/reference/functions/windows.md +++ b/docs/language/reference/functions/windows.md @@ -1,8 +1,6 @@ # Window Functions (Reference) -Window helpers are relation-aware. A window function application produces one output value per input row while reading a -partition of related rows. It is not an ordinary scalar expression and must be placed through a projection-like dataset -method. +Window helpers are relation-aware. A window function application produces one output value per input row while reading a partition of related rows. It is not an ordinary scalar expression and must be placed through a projection-like dataset method. ```incan from pub::inql import LazyFrame @@ -39,11 +37,6 @@ The window helper surface includes: | `first_value(expr)`, `last_value(expr)`, `nth_value(expr, n)` | Read a value from the current frame. | Use `.over(window().order_by(...))`, then `with_window_column(...)`; value calls may use `.ignore_nulls()` or `.respect_nulls()` before `.over(...)`. | | `sum(...)`, `count(...)`, `avg(...)`, `min(...)`, `max(...)` | Reuse aggregate helpers over a window frame. | Call `.over(window_spec)` on the aggregate measure, then `with_window_column(...)`. | -`WindowSpec.partition_by(...)` replaces the partition expressions. `WindowSpec.order_by(...)` replaces the ordering -expressions. `WindowSpec.rows_between(...)` and `WindowSpec.range_between(...)` replace the frame. Ranking, -distribution, offset, and value helpers require explicit ordering; missing ordering is rejected during logical lowering. +`WindowSpec.partition_by(...)` replaces the partition expressions. `WindowSpec.order_by(...)` replaces the ordering expressions. `WindowSpec.rows_between(...)` and `WindowSpec.range_between(...)` replace the frame. 
Ranking, distribution, offset, and value helpers require explicit ordering; missing ordering is rejected during logical lowering. -`with_window_column(name, application)` preserves input columns and adds or replaces `name` using add-or-replace -projection semantics. Compatible adjacent window projections lower through Substrait `ConsistentPartitionWindowRel` with -registry-backed function anchors, frame bounds, invocation metadata, null-treatment options, and output aliases. The -DataFusion session backend executes the portable window helpers through the Substrait adapter boundary. +`with_window_column(name, application)` preserves input columns and adds or replaces `name` using add-or-replace projection semantics. Compatible adjacent window projections lower through Substrait `ConsistentPartitionWindowRel` with registry-backed function anchors, frame bounds, invocation metadata, null-treatment options, and output aliases. The DataFusion session backend executes the portable window helpers through the Substrait adapter boundary. diff --git a/docs/language/reference/query_blocks.md b/docs/language/reference/query_blocks.md index bd41f22..0d4d97c 100644 --- a/docs/language/reference/query_blocks.md +++ b/docs/language/reference/query_blocks.md @@ -1,7 +1,6 @@ # Query blocks (Reference) -Query blocks are dependency-activated InQL expressions. Import `pub::inql` to make the vocabulary and helper surface -available in a downstream Incan package. +Query blocks are dependency-activated InQL expressions. Import `pub::inql` to make the vocabulary and helper surface available in a downstream Incan package. ```incan from pub::inql import DataFrame, count, desc, sum @@ -20,9 +19,7 @@ def summarize_orders(orders: DataFrame[Order]) -> DataFrame[OrderSummary]: } ``` -The `OrderSummary` type parameter documents the intended output row model. 
The v0.1 implementation checks query -schema evolution and selected aliases through the carrier planning surface; full field/type compatibility validation -against annotated output models is tracked as schema-validation follow-up work. +The `OrderSummary` type parameter documents the intended output row model. The v0.1 implementation checks query schema evolution and selected aliases through the carrier planning surface; full field/type compatibility validation against annotated output models is tracked as schema-validation follow-up work. InQL also accepts the colon spelling in expression position: @@ -50,13 +47,11 @@ The implemented v0.1 query-block surface supports: - `EXPLODE as ` - `WINDOW BY = ` -`ORDER BY` uses InQL ordering helpers such as `asc(...)` and `desc(...)`; postfix SQL spellings such as -`.amount DESC` are not part of the v0.1 query-block grammar. +`ORDER BY` uses InQL ordering helpers such as `asc(...)` and `desc(...)`; postfix SQL spellings such as `.amount DESC` are not part of the v0.1 query-block grammar. ## Expressions -Query-block expressions use Incan expression operators and desugar to the same InQL helper calls available in ordinary -method-chain code: +Query-block expressions use Incan expression operators and desugar to the same InQL helper calls available in ordinary method-chain code: | Query expression | Helper equivalent | | ---------------- | ----------------- | @@ -67,10 +62,7 @@ method-chain code: | `.amount > 100` | `gt(.amount, 100)` | | `.amount >= 100` | `gte(.amount, 100)` | -The comparison helper names use `lte` and `gte` for inclusive bounds; `le` and `ge` are not public helper names. -Arithmetic operators lower the same way: `+` to `add`, `-` to `sub`, `*` to `mul`, `/` to `div`, and `%` to -`modulo`. Boolean and unary operators lower to their helper forms as well, such as `and_`, `or_`, `not_`, and `neg`. -Use `==` for equality; a single `=` remains assignment/binding syntax, not a query predicate. 
+The comparison helper names use `lte` and `gte` for inclusive bounds; `le` and `ge` are not public helper names. Arithmetic operators lower the same way: `+` to `add`, `-` to `sub`, `*` to `mul`, `/` to `div`, and `%` to `modulo`. Boolean and unary operators lower to their helper forms as well, such as `and_`, `or_`, `not_`, and `neg`. Use `==` for equality; a single `=` remains assignment/binding syntax, not a query predicate. ## Resolution @@ -79,6 +71,4 @@ Use `==` for equality; a single `=` remains assignment/binding syntax, not a que - `SELECT` aliases become the output schema for later clauses. - A `SELECT` alias may be reused by later expressions in the same `SELECT` list. -Query blocks desugar into the same carrier method calls available to ordinary InQL code before lowering through the -current carrier planning path. `LazyFrame` flows are Prism-backed; concrete `DataFrame` and `DataStream` flows still use -their documented carrier paths before converging at the Substrait boundary. +Query blocks desugar into the same carrier method calls available to ordinary InQL code before lowering through the current carrier planning path. `LazyFrame` flows are Prism-backed; concrete `DataFrame` and `DataStream` flows still use their documented carrier paths before converging at the Substrait boundary. diff --git a/docs/rfcs/003_inql_query_blocks.md b/docs/rfcs/003_inql_query_blocks.md index 1aea08e..d89b88f 100644 --- a/docs/rfcs/003_inql_query_blocks.md +++ b/docs/rfcs/003_inql_query_blocks.md @@ -56,10 +56,7 @@ def summarize_orders(orders: DataFrame[Order]) -> DataFrame[OrderSummary]: } ``` -The compiler checks `.status`, `.amount`, and `GROUP BY` / `SELECT` consistency. The `DataFrame[OrderSummary]` return -type records the intended output row model; full field/type compatibility validation against annotated output models is -tracked as schema-validation follow-up work. The checked tree lowers to Substrait (InQL RFC 002); execution uses the -execution context. 
+The compiler checks `.status`, `.amount`, and `GROUP BY` / `SELECT` consistency. The `DataFrame[OrderSummary]` return type records the intended output row model; full field/type compatibility validation against annotated output models is tracked as schema-validation follow-up work. The checked tree lowers to Substrait (InQL RFC 002); execution uses the execution context. ## Reference-level explanation @@ -92,16 +89,9 @@ Inside relational expression positions (`WHERE`, `JOIN ON`, `GROUP BY`, `ORDER B ### Expression operators -Relational expression bodies use ordinary Incan expression operators and lower them into InQL's public helper surface. -Implementations must treat `left == right`, `left != right`, `left < right`, `left <= right`, `left > right`, and -`left >= right` as equivalent to `eq(left, right)`, `ne(left, right)`, `lt(left, right)`, `lte(left, right)`, -`gt(left, right)`, and `gte(left, right)` respectively. Arithmetic operators lower through `add`, `sub`, `mul`, -`div`, and `modulo`; boolean and unary operators lower through their helper equivalents such as `and_`, `or_`, -`not_`, and `neg`. +Relational expression bodies use ordinary Incan expression operators and lower them into InQL's public helper surface. Implementations must treat `left == right`, `left != right`, `left < right`, `left <= right`, `left > right`, and `left >= right` as equivalent to `eq(left, right)`, `ne(left, right)`, `lt(left, right)`, `lte(left, right)`, `gt(left, right)`, and `gte(left, right)` respectively. Arithmetic operators lower through `add`, `sub`, `mul`, `div`, and `modulo`; boolean and unary operators lower through their helper equivalents such as `and_`, `or_`, `not_`, and `neg`. -Inclusive comparison helpers are named `lte` and `gte`; `le` and `ge` are not part of the public helper surface. -Single `=` is not a predicate equality operator in query expressions. 
Equality uses `==`; `=` remains reserved for -assignment/binding positions such as named window declarations. +Inclusive comparison helpers are named `lte` and `gte`; `le` and `ge` are not part of the public helper surface. Single `=` is not a predicate equality operator in query expressions. Equality uses `==`; `=` remains reserved for assignment/binding positions such as named window declarations. ### `SELECT` and alias publication diff --git a/docs/rfcs/004_inql_execution_context.md b/docs/rfcs/004_inql_execution_context.md index 922ce49..1858af2 100644 --- a/docs/rfcs/004_inql_execution_context.md +++ b/docs/rfcs/004_inql_execution_context.md @@ -348,8 +348,7 @@ Non-normative: the reference implementation **should** use DataFusion's `Session ## Implementation plan and checklist (non-normative) -This section tracks the implementation path for this RFC. It is intentionally operational and does not change the -normative semantics above. +This section tracks the implementation path for this RFC. It is intentionally operational and does not change the normative semantics above. ### Plan @@ -377,5 +376,4 @@ normative semantics above. ### Exit criteria for RFC status change -RFC 004 can move from `In Progress` to `Implemented` when all checklist items above are complete and the InQL CI gate -is green on the target release branch. +RFC 004 can move from `In Progress` to `Implemented` when all checklist items above are complete and the InQL CI gate is green on the target release branch. diff --git a/docs/rfcs/021_generator_table_functions.md b/docs/rfcs/021_generator_table_functions.md index 764ed91..1866665 100644 --- a/docs/rfcs/021_generator_table_functions.md +++ b/docs/rfcs/021_generator_table_functions.md @@ -42,8 +42,7 @@ InQL already has an unnest/explode design direction through its Substrait work. ## Guide-level explanation (how authors think about it) -Authors should use generators when one input row may become multiple output rows. 
In the current builder surface, -generators are constructed as explicit applications and then applied to a relation: +Authors should use generators when one input row may become multiple output rows. In the current builder surface, generators are constructed as explicit applications and then applied to a relation: ```incan from pub::inql.functions import col, explode diff --git a/docs/rfcs/023_approximate_sketch_functions.md b/docs/rfcs/023_approximate_sketch_functions.md index 6fdd11c..f09d516 100644 --- a/docs/rfcs/023_approximate_sketch_functions.md +++ b/docs/rfcs/023_approximate_sketch_functions.md @@ -17,20 +17,13 @@ ## Summary -This RFC defines the portable approximate aggregate boundary for InQL and records the sketch-state policy decision. InQL -exposes explicit approximate aggregates for distinct counts and percentiles. It delegates sketch-state construction, -merge, estimate, serialization, and deserialization helpers to InQL RFC 025 because those helpers require typed sketch -logical values rather than ordinary string or binary payloads. +This RFC defines the portable approximate aggregate boundary for InQL and records the sketch-state policy decision. InQL exposes explicit approximate aggregates for distinct counts and percentiles. It delegates sketch-state construction, merge, estimate, serialization, and deserialization helpers to InQL RFC 025 because those helpers require typed sketch logical values rather than ordinary string or binary payloads. ## Motivation -Spark exposes many approximate and sketch functions because large-scale analytics often trades exactness for bounded -memory or faster execution. InQL should support the portable part of that direction, but sketch functions require more -than names: they need accuracy parameters, merge semantics, serialization formats, determinism rules, and typed opaque -state values. 
+Spark exposes many approximate and sketch functions because large-scale analytics often trades exactness for bounded memory or faster execution. InQL should support the portable part of that direction, but sketch functions require more than names: they need accuracy parameters, merge semantics, serialization formats, determinism rules, and typed opaque state values. -If sketches are added as ordinary functions returning untyped bytes, InQL will not be able to reason about compatibility, -aggregation state, or cross-backend behavior. +If sketches are added as ordinary functions returning untyped bytes, InQL will not be able to reason about compatibility, aggregation state, or cross-backend behavior. ## Goals @@ -68,26 +61,15 @@ The function names and arguments should make it clear that results are approxima ## Reference-level explanation (precise rules) -Approximate aggregate functions must be registered as approximate. Their registry entries must declare accuracy -parameters, deterministic behavior for fixed inputs and parameters, mergeability, and result interpretation. +Approximate aggregate functions must be registered as approximate. Their registry entries must declare accuracy parameters, deterministic behavior for fixed inputs and parameters, mergeability, and result interpretation. -`approx_count_distinct(expr)` returns an approximate cardinality estimate over non-null expression values. It is a -HyperLogLog-family aggregate. The portable helper intentionally has no relative-error parameter because the registered -InQL Substrait extension mapping is unary; backend-specific precision controls must not be smuggled into the portable -helper contract. +`approx_count_distinct(expr)` returns an approximate cardinality estimate over non-null expression values. It is a HyperLogLog-family aggregate. 
The portable helper intentionally has no relative-error parameter because the registered InQL Substrait extension mapping is unary; backend-specific precision controls must not be smuggled into the portable helper contract. -`approx_percentile(expr, percentile, accuracy=10000)` returns an approximate percentile estimate over numeric non-null -expression values. `percentile` is a literal fraction in the inclusive range `[0.0, 1.0]`. `accuracy` is a positive -integer approximation hint carried as a normal aggregate argument. The portable contract is an approximate percentile -estimate, not a bit-for-bit promise of one backend's interpolation or sketch representation. +`approx_percentile(expr, percentile, accuracy=10000)` returns an approximate percentile estimate over numeric non-null expression values. `percentile` is a literal fraction in the inclusive range `[0.0, 1.0]`. `accuracy` is a positive integer approximation hint carried as a normal aggregate argument. The portable contract is an approximate percentile estimate, not a bit-for-bit promise of one backend's interpolation or sketch representation. -Sketch-construction functions are reserved for InQL RFC 025 and are not lowerable in this RFC. When they are introduced, -they must return typed sketch values, not untyped binary blobs. Sketch values may have opaque runtime representation, but -their logical type must identify the sketch family and value domain. +Sketch-construction functions are reserved for InQL RFC 025 and are not lowerable in this RFC. When they are introduced, they must return typed sketch values, not untyped binary blobs. Sketch values may have opaque runtime representation, but their logical type must identify the sketch family and value domain. -Sketch union, intersection, estimation, serialization, and deserialization functions are likewise reserved until InQL -can accept only compatible sketch types and reject incompatible sketch-family or value-domain combinations before -execution. 
+Sketch union, intersection, estimation, serialization, and deserialization functions are likewise reserved until InQL can accept only compatible sketch types and reject incompatible sketch-family or value-domain combinations before execution. If serialized sketch formats are exposed, format versioning and cross-version compatibility must be specified. @@ -95,20 +77,17 @@ If serialized sketch formats are exposed, format versioning and cross-version co ### Syntax -This RFC permits ordinary function-call syntax for approximate aggregate functions. Reserved sketch helpers will use the -same call style if InQL RFC 025 admits them. RFC 023 does not require special query syntax. +This RFC permits ordinary function-call syntax for approximate aggregate functions. Reserved sketch helpers will use the same call style if InQL RFC 025 admits them. RFC 023 does not require special query syntax. ### Semantics -Approximate functions must be opt-in by name or explicit option. InQL must not silently replace an exact aggregate with -an approximate aggregate because a backend prefers it. +Approximate functions must be opt-in by name or explicit option. InQL must not silently replace an exact aggregate with an approximate aggregate because a backend prefers it. Sketch merge functions must define whether they are associative, commutative, idempotent, or order-sensitive. ### Interaction with other InQL surfaces -Approximate aggregates may appear anywhere aggregate measures are valid if their registry entry supports the position. -Sketch scalar helpers are not exposed until sketch expressions have typed logical values. +Approximate aggregates may appear anywhere aggregate measures are valid if their registry entry supports the position. Sketch scalar helpers are not exposed until sketch expressions have typed logical values. 
### Compatibility / migration diff --git a/docs/rfcs/025_typed_sketch_logical_values.md b/docs/rfcs/025_typed_sketch_logical_values.md index 416f617..394591f 100644 --- a/docs/rfcs/025_typed_sketch_logical_values.md +++ b/docs/rfcs/025_typed_sketch_logical_values.md @@ -16,9 +16,7 @@ ## Summary -This RFC defines typed sketch logical values for InQL. Sketch helpers must not be modeled as ordinary strings or binary -blobs; they must produce and consume logical sketch values that record sketch family, input value domain, -parameterization, merge compatibility, and serialization format identity. +This RFC defines typed sketch logical values for InQL. Sketch helpers must not be modeled as ordinary strings or binary blobs; they must produce and consume logical sketch values that record sketch family, input value domain, parameterization, merge compatibility, and serialization format identity. ## Core model @@ -32,14 +30,9 @@ parameterization, merge compatibility, and serialization format identity. ## Motivation -Approximate aggregates such as `approx_count_distinct(...)` and `approx_percentile(...)` are useful when authors only -need a scalar result. Sketch state is different: authors may want to materialize a sketch, merge sketches across -partitions or files, estimate from a stored sketch later, or serialize sketch state for transport. Those operations -require compatibility rules that cannot be represented by `bytes` or `str` alone. +Approximate aggregates such as `approx_count_distinct(...)` and `approx_percentile(...)` are useful when authors only need a scalar result. Sketch state is different: authors may want to materialize a sketch, merge sketches across partitions or files, estimate from a stored sketch later, or serialize sketch state for transport. Those operations require compatibility rules that cannot be represented by `bytes` or `str` alone. 
-If InQL accepts untyped sketch blobs, it cannot reject invalid operations such as merging a HyperLogLog sketch with a KLL -sketch, merging sketches over different value domains, or deserializing a payload with an incompatible format version. -That would push semantic validation into backend-specific runtime failures and weaken the Substrait boundary. +If InQL accepts untyped sketch blobs, it cannot reject invalid operations such as merging a HyperLogLog sketch with a KLL sketch, merging sketches over different value domains, or deserializing a payload with an incompatible format version. That would push semantic validation into backend-specific runtime failures and weaken the Substrait boundary. ## Goals @@ -61,9 +54,7 @@ That would push semantic validation into backend-specific runtime failures and w ## Guide-level explanation (how authors think about it) -Authors should think of a sketch as a typed summary value. It can be produced by an aggregate, stored as a column when the -carrier supports it, merged with compatible sketches, and estimated later. RFC 025 ships the first concrete family: -HyperLogLog. +Authors should think of a sketch as a typed summary value. It can be produced by an aggregate, stored as a column when the carrier supports it, merged with compatible sketches, and estimated later. RFC 025 ships the first concrete family: HyperLogLog. ```incan from pub::inql.functions import col, hll_sketch @@ -83,8 +74,7 @@ reported = monthly.with_column( ) ``` -The important part is that `users_hll` is not a `str` or `bytes` column. It is a sketch logical value with a family, -value domain, precision, and serialized format contract. +The important part is that `users_hll` is not a `str` or `bytes` column. It is a sketch logical value with a family, value domain, precision, and serialized format contract. 
## Reference-level explanation (precise rules) @@ -97,54 +87,35 @@ A sketch logical value must carry at least: - format identity and version when the value can be serialized; - nullability and ordinary column-position metadata needed by existing InQL expression and relation surfaces. -Sketch construction helpers that summarize rows must be aggregate measures. They must declare approximate-result metadata -and sketch-output metadata in the function registry. They must not appear where row-level scalar expressions are required -unless the helper is explicitly a scalar transformation over an existing sketch value. +Sketch construction helpers that summarize rows must be aggregate measures. They must declare approximate-result metadata and sketch-output metadata in the function registry. They must not appear where row-level scalar expressions are required unless the helper is explicitly a scalar transformation over an existing sketch value. -Sketch merge helpers must validate sketch family compatibility before lowering. InQL must reject merges between different -families, incompatible value domains, incompatible parameter sets, or incompatible format versions unless the specific -family declares a safe coercion or union rule. +Sketch merge helpers must validate sketch family compatibility before lowering. InQL must reject merges between different families, incompatible value domains, incompatible parameter sets, or incompatible format versions unless the specific family declares a safe coercion or union rule. -Sketch estimate helpers must declare the scalar result they produce. HyperLogLog-style estimates should return -approximate cardinality results. KLL-style quantile helpers must require explicit percentile or rank arguments. -Count-min-style lookup helpers must require an item expression whose domain is compatible with the sketch domain. +Sketch estimate helpers must declare the scalar result they produce. 
HyperLogLog-style estimates should return approximate cardinality results. KLL-style quantile helpers must require explicit percentile or rank arguments. Count-min-style lookup helpers must require an item expression whose domain is compatible with the sketch domain. -Sketch serialization helpers must be explicit. InQL must not implicitly coerce a sketch value to `str` or `bytes`. -Deserialization must require enough type metadata to identify family, domain, parameters, and format version before the -value can participate in merge or estimate operations. +Sketch serialization helpers must be explicit. InQL must not implicitly coerce a sketch value to `str` or `bytes`. Deserialization must require enough type metadata to identify family, domain, parameters, and format version before the value can participate in merge or estimate operations. -Substrait lowering must preserve sketch logical type identity through extension type metadata or must reject the -operation before execution. A backend adapter may map the sketch operation to a native implementation only when it can -preserve the InQL sketch contract. +Substrait lowering must preserve sketch logical type identity through extension type metadata or must reject the operation before execution. A backend adapter may map the sketch operation to a native implementation only when it can preserve the InQL sketch contract. ## Design details ### Syntax -This RFC does not require new language syntax. Sketch helpers may use ordinary function-call syntax and aggregate-builder -syntax. Type annotations for deserialization may require a future surface if ordinary helper arguments cannot carry -enough type information ergonomically. +This RFC does not require new language syntax. Sketch helpers may use ordinary function-call syntax and aggregate-builder syntax. Type annotations for deserialization may require a future surface if ordinary helper arguments cannot carry enough type information ergonomically. 
### Semantics -Sketch values are opaque to ordinary scalar operators. Equality, ordering, string operations, binary operations, and -arithmetic must not accept sketch values unless a later RFC defines a specific semantic rule. Sketch values may be -grouped, projected, stored, merged, estimated, or serialized only through helpers that declare sketch-aware metadata. +Sketch values are opaque to ordinary scalar operators. Equality, ordering, string operations, binary operations, and arithmetic must not accept sketch values unless a later RFC defines a specific semantic rule. Sketch values may be grouped, projected, stored, merged, estimated, or serialized only through helpers that declare sketch-aware metadata. -Sketch families must define whether merge is associative, commutative, idempotent, or order-sensitive. Families must -define which parameters are part of merge compatibility and which parameters are execution hints. +Sketch families must define whether merge is associative, commutative, idempotent, or order-sensitive. Families must define which parameters are part of merge compatibility and which parameters are execution hints. ### Interaction with other InQL surfaces -Dataframe method chains and future query-block syntax must resolve to the same sketch logical value model. A sketch -helper that is rejected in one authoring surface must not become valid in another. +Dataframe method chains and future query-block syntax must resolve to the same sketch logical value model. A sketch helper that is rejected in one authoring surface must not become valid in another. -Prism may preserve sketch type metadata and may use it for validation, projection pruning, and rewrite safety. Prism must -not rewrite sketch operations in ways that drop family, domain, parameter, or serialization metadata. +Prism may preserve sketch type metadata and may use it for validation, projection pruning, and rewrite safety. 
Prism must not rewrite sketch operations in ways that drop family, domain, parameter, or serialization metadata. -The Substrait boundary must remain between InQL semantics and backend execution. DataFusion or any other backend may be -the first implementation target, but backend-native sketch names and payload formats do not define the portable InQL -type. +The Substrait boundary must remain between InQL semantics and backend execution. DataFusion or any other backend may be the first implementation target, but backend-native sketch names and payload formats do not define the portable InQL type. ### Implementation @@ -168,9 +139,7 @@ The implemented first family is HyperLogLog: ### Compatibility / migration -This RFC is additive. RFC 023 approximate scalar-result aggregates remain valid. Existing string or binary columns are -not retroactively treated as sketch values. If a backend or existing dataset stores sketch bytes, authors must use -explicit deserialization with the required sketch type metadata. +This RFC is additive. RFC 023 approximate scalar-result aggregates remain valid. Existing string or binary columns are not retroactively treated as sketch values. If a backend or existing dataset stores sketch bytes, authors must use explicit deserialization with the required sketch type metadata. ## Alternatives considered diff --git a/docs/rfcs/026_semi_structured_variant_values.md b/docs/rfcs/026_semi_structured_variant_values.md index b0de9a4..4fd245c 100644 --- a/docs/rfcs/026_semi_structured_variant_values.md +++ b/docs/rfcs/026_semi_structured_variant_values.md @@ -70,40 +70,21 @@ Authors who only need text validation or normalized JSON strings should keep usi ## Reference-level explanation (precise rules) -InQL defines `VariantLogicalType` and `VariantExpr`. A variant logical type records a `VariantKind` and `VariantEncoding`. -The implemented portable kind set is `any`, `null`, `boolean`, `integer`, `float`, `string`, `timestamp`, `array`, and -`object`. 
The first implemented encoding is JSON. +InQL defines `VariantLogicalType` and `VariantExpr`. A variant logical type records a `VariantKind` and `VariantEncoding`. The implemented portable kind set is `any`, `null`, `boolean`, `integer`, `float`, `string`, `timestamp`, `array`, and `object`. The first implemented encoding is JSON. Variant predicates must accept variant expressions. They must not accept ordinary `str` expressions as an implicit parse-and-inspect shortcut. Authors must use an explicit variant parse or cast helper when starting from JSON text. `typeof(expr)` must return a stable lowercase kind name for a non-null variant value. It must distinguish at least `null`, `boolean`, `integer`, `float`, `string`, `timestamp`, `array`, and `object`. It must not report `timestamp` for a plain JSON string unless an explicit schema or parse option produced a typed timestamp variant. -`is_null_value(expr)`, `is_boolean(expr)`, `is_integer(expr)`, `is_float(expr)`, `is_string(expr)`, -`is_timestamp(expr)`, `is_array(expr)`, and `is_object(expr)` must inspect the variant kind. `is_integer(...)` must be -true only for integer variant values, not floating point values whose runtime value happens to have no fractional -component. `is_null_value(...)` must be true only for semi-structured null values. +`is_null_value(expr)`, `is_boolean(expr)`, `is_integer(expr)`, `is_float(expr)`, `is_string(expr)`, `is_timestamp(expr)`, `is_array(expr)`, and `is_object(expr)` must inspect the variant kind. `is_integer(...)` must be true only for integer variant values, not floating point values whose runtime value happens to have no fractional component. `is_null_value(...)` must be true only for semi-structured null values. SQL null must remain distinct from variant null. If a predicate input is SQL null rather than a present variant value, the predicate result must follow InQL's scalar null behavior for missing inputs rather than returning true for `is_null_value(...)`. 
-`parse_variant_json(payload)` is the strict JSON-to-variant helper. `try_parse_variant_json(payload)` is the -recoverable form. Their payload parameter accepts `StrValueOrColumn`, so authors may pass a string literal, a string -column reference, or a string-producing expression without wrapping primitive values in `lit(...)`. Strict parse helpers -must fail malformed payloads according to registry error metadata. Recoverable parse helpers must return SQL null or -another explicitly documented recoverable result for malformed payloads. A JSON `null` payload must produce a present -variant null, not SQL null. - -`variant_get(expr, path)` accesses a variant path. Literal path strings are validated as beginning with `$`. String -columns or string-producing expressions are accepted as dynamic paths and are validated at execution time by the -implementation that evaluates the expression. The current path spelling matches the RFC 022 JSON path helper spelling so -authors do not learn two root-marker conventions, but it remains a variant operation rather than a JSON-text extraction -shortcut. Variant field/path access must preserve whether a missing path produced SQL null, variant null, or a present -value. If a backend cannot preserve that distinction, the adapter must reject the operation or require an explicit -compatibility mode. - -Substrait lowering preserves variant logical type identity in registry metadata and scalar function options. A backend -adapter may map variant values and predicates to native functions only when it preserves the InQL variant contract. The -DataFusion adapter currently reports a backend planning diagnostic for typed variant execution because it has no variant -runtime implementation. +`parse_variant_json(payload)` is the strict JSON-to-variant helper. `try_parse_variant_json(payload)` is the recoverable form. 
Their payload parameter accepts `StrValueOrColumn`, so authors may pass a string literal, a string column reference, or a string-producing expression without wrapping primitive values in `lit(...)`. Strict parse helpers must fail malformed payloads according to registry error metadata. Recoverable parse helpers must return SQL null or another explicitly documented recoverable result for malformed payloads. A JSON `null` payload must produce a present variant null, not SQL null. + +`variant_get(expr, path)` accesses a variant path. Literal path strings are validated as beginning with `$`. String columns or string-producing expressions are accepted as dynamic paths and are validated at execution time by the implementation that evaluates the expression. The current path spelling matches the RFC 022 JSON path helper spelling so authors do not learn two root-marker conventions, but it remains a variant operation rather than a JSON-text extraction shortcut. Variant field/path access must preserve whether a missing path produced SQL null, variant null, or a present value. If a backend cannot preserve that distinction, the adapter must reject the operation or require an explicit compatibility mode. + +Substrait lowering preserves variant logical type identity in registry metadata and scalar function options. A backend adapter may map variant values and predicates to native functions only when it preserves the InQL variant contract. The DataFusion adapter currently reports a backend planning diagnostic for typed variant execution because it has no variant runtime implementation. ## Design details @@ -142,8 +123,7 @@ The implemented public model is: `is_boolean(...)`, `is_integer(...)`, `is_float(...)`, `is_string(...)`, `is_timestamp(...)`, `is_array(...)`, and `is_object(...)` return `BoolColumnExpr`. -Each public helper is registry-backed with explicit variant policy metadata. 
Variant helpers lower through InQL-owned -Substrait extension mappings and carry variant kind, encoding, and parse mode as scalar function options where needed. +Each public helper is registry-backed with explicit variant policy metadata. Variant helpers lower through InQL-owned Substrait extension mappings and carry variant kind, encoding, and parse mode as scalar function options where needed. ## Alternatives considered diff --git a/examples/README.md b/examples/README.md index 188d73b..97ff540 100644 --- a/examples/README.md +++ b/examples/README.md @@ -4,9 +4,7 @@ Examples demonstrating InQL model-shaped dataset types, scalar-expression builde ## Overview -The examples are split between compile-safe API shape examples and executable Session flows. Together they show how model -types, typed carriers, scalar expressions, grouped aggregates, reads, writes, collection, and display fit together in -ordinary Incan code. +The examples are split between compile-safe API shape examples and executable Session flows. Together they show how model types, typed carriers, scalar expressions, grouped aggregates, reads, writes, collection, and display fit together in ordinary Incan code. ## Example structure @@ -40,16 +38,14 @@ incan run examples/advanced_retail_analytics.incn ## Advanced spike -`advanced_retail_analytics.incn` reads `tests/fixtures/advanced_retail_orders.csv`, a 100-row CSV fixture with -quoted JSON event payloads. It materializes three outputs: +`advanced_retail_analytics.incn` reads `tests/fixtures/advanced_retail_orders.csv`, a 100-row CSV fixture with quoted JSON event payloads. 
It materializes three outputs: - an enriched high-value order view with string cleanup, math, date extraction, JSON validation/extraction, URL query extraction, hashing, regex, and nested array helpers - a grouped paid-order rollup using `sum`, `avg`, `min`, `max`, `count`, and `count_distinct` - a generated tag view that composes window ranking with `explode(...)` -`advanced_retail_query_blocks/` is the same fixture exercised from a standalone dependency consumer. It imports -`pub::inql` and runs query blocks for the high-value projection, grouped rollup, and generated-tag window view: +`advanced_retail_query_blocks/` is the same fixture exercised from a standalone dependency consumer. It imports `pub::inql` and runs query blocks for the high-value projection, grouped rollup, and generated-tag window view: ```incan high_value = query {