Merged
27 changes: 9 additions & 18 deletions .github/workflows/ci.yml
@@ -1,8 +1,8 @@
# InQL CI — Incan library package
#
-# Builds the Incan compiler from source in CI, then runs the InQL package
-# checks against that local binary. Keeping this workflow self-contained avoids
-# a hard dependency on a remote composite action path staying in sync.
+# Checks out the pinned Incan compiler source, installs it through Incan's
+# downstream install action, then runs the InQL package checks against that
+# local binary.

name: CI

@@ -19,8 +19,8 @@ concurrency:

env:
CARGO_TERM_COLOR: always
-INCAN_REF: feature/750-primitive-type-tokens
-EXPECTED_INCAN_VERSION: 0.3.0-rc47
+INCAN_REF: release/v0.3
+EXPECTED_INCAN_VERSION: 0.3.0-rc50
RUST_BACKTRACE: 1
INCAN_NO_BANNER: 1
INCAN_GENERATED_CARGO_TARGET_DIR: ${{ github.workspace }}/.incan-generated-cargo-target
@@ -46,13 +46,11 @@ jobs:
ref: ${{ env.INCAN_REF }}
path: incan

-- name: Install Rust toolchain
-uses: dtolnay/rust-toolchain@stable
-
-- name: Cache Incan build artifacts
-uses: Swatinem/rust-cache@v2
+- name: Install Incan compiler
+uses: ./incan/.github/actions/install-incan
with:
-workspaces: incan -> target
+profile: debug
+cache-shared-key: inql-incan-${{ runner.os }}-${{ env.EXPECTED_INCAN_VERSION }}

- name: Cache generated InQL Cargo artifacts
uses: actions/cache@v4
@@ -76,13 +74,6 @@ jobs:
inql-rust-inspect-${{ runner.os }}-incan-${{ env.EXPECTED_INCAN_VERSION }}-${{ hashFiles('incan.lock', 'incan.toml') }}-
inql-rust-inspect-${{ runner.os }}-incan-${{ env.EXPECTED_INCAN_VERSION }}-

-- name: Build Incan compiler
-working-directory: incan
-run: cargo build --locked --bin incan
-
-- name: Expose local Incan binary on PATH
-run: echo "$GITHUB_WORKSPACE/incan/target/debug" >> "$GITHUB_PATH"

- name: Show toolchain
run: |
incan --version
1 change: 1 addition & 0 deletions .gitignore
@@ -2,6 +2,7 @@
# will have compiled files and executables
debug
target
+incan

# These are backup files generated by rustfmt
**/*.rs.bk
33 changes: 25 additions & 8 deletions Makefile
@@ -39,6 +39,11 @@ test: ## Run package tests (`incan test tests`)
@echo "\033[1mRunning InQL tests...\033[0m"
@$(INCAN) test $(INQL_TEST_DIR)

+.PHONY: vocab-companion-test
+vocab-companion-test: ## Run Rust tests for the query-block vocabulary companion
+@echo "\033[1mRunning query-block vocabulary companion tests...\033[0m"
+@cargo test --manifest-path vocab_companion/Cargo.toml

.PHONY: test-style
test-style: ## Validate test style markers (Arrange / Act / Assert) across `tests/*.incn`
@echo "\033[1mChecking test style markers...\033[0m"
@@ -62,21 +67,27 @@ test-locked: ## Run tests with `--locked`
@$(INCAN) test $(INQL_TEST_DIR) --locked

# =============================================================================
-# Formatting (Incan source — package only)
+# Formatting (Incan source)
# =============================================================================
#
-# Scope to `src/`, `tests/`, and `examples/` only. CI checks out the Incan
-# compiler under `./incan/`; formatting `.` would walk that tree and fail on
-# stdlib snapshots and test fixtures that are not meant for `incan fmt`.
+# Scope to InQL-owned source paths. CI checks out the Incan compiler under
+# `./incan/`; formatting `.` would walk that tree and fail on stdlib snapshots
+# and test fixtures that are not meant for `incan fmt`. Standalone example
+# packages are listed by source directory so generated `target/` output stays
+# outside the formatting walk.

-INQL_FMT_DIRS := src tests examples
+INQL_FMT_DIRS := src tests examples/advanced_retail_query_blocks/src
+INQL_FMT_FILES := examples/*.incn

.PHONY: fmt
fmt: ## Format package `.incn` sources (`incan fmt` per directory)
@echo "\033[1mFormatting Incan sources (package dirs)...\033[0m"
@for d in $(INQL_FMT_DIRS); do \
if [ -d "$$d" ]; then $(INCAN) fmt "$$d"; fi; \
done
+@for f in $(INQL_FMT_FILES); do \
+if [ -f "$$f" ]; then $(INCAN) fmt "$$f"; fi; \
+done

.PHONY: fmt-check
fmt-check: ## Check formatting without writing (`incan fmt --check` per directory)
@@ -87,21 +98,27 @@ fmt-check: ## Check formatting without writing (`incan fmt --check` per directory)
$(INCAN) fmt --check "$$d" || exit $$?; \
fi; \
done
+@for f in $(INQL_FMT_FILES); do \
+if [ -f "$$f" ]; then \
+echo "\033[1m -> $$f\033[0m"; \
+$(INCAN) fmt --check "$$f" || exit $$?; \
+fi; \
+done

# =============================================================================
# Aggregates (local gates)
# =============================================================================

.PHONY: check
-check: fmt-check test-style registry-metadata build test ## Format check, style gate, metadata check, build, and test
+check: fmt-check test-style vocab-companion-test registry-metadata build test ## Format check, style gate, metadata check, build, and test
@echo "\033[32m✓ check passed\033[0m"

.PHONY: pre-commit
-pre-commit: fmt-check test-style registry-metadata build test ## Fast gate before commit (same as `check`)
+pre-commit: fmt-check test-style vocab-companion-test registry-metadata build test ## Fast gate before commit (same as `check`)
@echo "\033[32m✓ pre-commit gate passed\033[0m"

.PHONY: ci
-ci: fmt-check test-style registry-metadata build test smoke-consumer ## Same steps as GitHub Actions `inql` job
+ci: fmt-check test-style vocab-companion-test registry-metadata build test smoke-consumer ## Same steps as GitHub Actions `inql` job
@echo "\033[32m✓ ci gate passed\033[0m"

.PHONY: verify
2 changes: 2 additions & 0 deletions docs/language/README.md
@@ -11,6 +11,7 @@ This section documents the current InQL package surface.

- [Dataset carriers (Reference)][dataset-reference]
- [Dataset carriers (Explanation)][dataset-explanation]
+- [Query blocks (Reference)][query-blocks-reference]

### Execution and materialization

@@ -29,6 +30,7 @@ This section documents the current InQL package surface.
[explanation]: explanation/
[dataset-reference]: reference/dataset_carriers.md
[dataset-explanation]: explanation/dataset_carriers.md
+[query-blocks-reference]: reference/query_blocks.md
[execution-reference]: reference/execution_context.md
[execution-explanation]: explanation/execution_context.md
[substrait-read-root]: reference/substrait/read_root_binding_contract.md
2 changes: 1 addition & 1 deletion docs/language/reference/dataset_carriers.md
@@ -37,7 +37,7 @@ Deferred logical pipeline. Always bounded.

### `DataStream[T]`

-Streaming specialization. Shares the `DataSet[T]` API while carrying unbounded semantics.
+Streaming specialization. Shares the carrier method surface while carrying unbounded semantics.

## Related reference pages

16 changes: 9 additions & 7 deletions docs/language/reference/dataset_methods.md
@@ -1,6 +1,6 @@
# Dataset methods (Reference)

-This page documents the current `DataSet[T]` method surface. Builder-function details live under `reference/builders/`.
+This page documents the current carrier method surface. Builder-function details live under `reference/builders/`.

The Substrait helper surface behind these methods is split by semantic role:

@@ -9,21 +9,23 @@ The Substrait helper surface behind these methods is split by semantic role:
- `src/substrait/inspect.incn` owns relation/plan inspection and output-column inference
- `src/schema_registry.incn` owns logical named-table schema binding

-## Shared method surface
+## Carrier method surface

| Method | Signature | Meaning |
| ------------- | ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------- |
| `filter` | `def filter(self, predicate: ColumnExpr) -> Self` | Restrict rows by a boolean scalar expression. |
-| `join` | `def join(self, other: Self, on: bool) -> Self` | Combine with another same-carrier relation using the package's boolean join predicate surface. |
-| `select` | `def select(self) -> Self` | Preserve the current projection shape as an identity projection. |
+| `join` | `def join(self, other: Self, on: ColumnExpr) -> Self` | Combine with another same-carrier relation using the package's scalar predicate surface. |
+| `select` | `def select[U](self, assignments: list[ProjectionAssignment] = []) -> SameCarrier[U]` | Project an output row shape while preserving the carrier kind. |
| `with_column` | `def with_column(self, name: str, expr: ColumnExpr) -> Self` | Add or replace one projected column using a scalar expression. |
| `group_by` | `def group_by(self, columns: list[ColumnExpr]) -> Self` | Define grouping keys using scalar expressions. |
| `agg` | `def agg(self, measures: list[AggregateMeasure]) -> Self` | Apply aggregate measures over the current relation or current grouping. |
| `generate` | `def generate(self, generator: GeneratorApplication) -> Self` | Apply a relation-shaping generator such as `explode(...)` with explicit output aliases. |
-| `with_window_column` | `def with_window_column(self, name: str, application: WindowFunctionApplication) -> Self` | Add or replace one projected column using a placed window function. |
+| `with_window_column` | `def with_window_column(self, name: str, application: WindowFunctionApplication) -> Self` | Add or replace one projected column using a named window function. |
| `order_by` | `def order_by(self, columns: list[ColumnExpr]) -> Self` | Sort rows by scalar expressions or ordering helpers such as `asc(...)` and `desc(...)`. |
| `limit` | `def limit(self, n: int) -> Self` | Cap row count. |

+`SameCarrier[U]` means `DataFrame[U]` for `DataFrame[T]`, `LazyFrame[U]` for `LazyFrame[T]`, and `DataStream[U]` for `DataStream[T]`. The root `DataSet[T]` trait remains the common plan/schema contract; schema-changing projection is expressed on concrete carriers until Incan grows native trait type-family support.

## `with_column`

### Signature
@@ -66,8 +68,8 @@ def enrich(orders: LazyFrame[Order]) -> LazyFrame[Order]:

## Capability notes

-- `join(...)` is constrained to same-carrier inputs and the boolean join predicate surface shown in the signature.
-- `select(...)` preserves projection shape; explicit projection lists are represented today through `with_column(...)` and scalar-expression builders.
+- `join(...)` is constrained to same-carrier inputs and the `ColumnExpr` predicate surface shown in the signature.
+- `select(...)` is the schema-changing projection boundary used by query blocks. Identity `select()` preserves the current row model through its surrounding expected type, while explicit assignments can retarget to a new row model.
- `generate(...)` preserves all input columns and appends generated output aliases for `explode`, `explode_outer`, `posexplode`, `posexplode_outer`, `inline`, `inline_outer`, `flatten`, and `stack` generator applications. Alias collisions are rejected during planning/lowering.
- `with_window_column(...)` supports placed ranking, distribution, offset, value, and aggregate-over-window helpers over explicit window specs. Portable helpers lower through Substrait window relations and execute through the DataFusion session adapter.
- `DataFrame[T]` exposes materialized metadata and preview text; row-level accessors belong to the materialized DataFrame API surface.
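
Illustration only, not part of this change: the `SameCarrier[U]` rule in the new `select` signature can be modeled in Python as "the row shape may change, the carrier kind may not". Every name below (`Carrier`, the `columns` field, the dict-based assignments) is a hypothetical stand-in chosen for the sketch and is not the InQL API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Carrier:
    """Hypothetical stand-in for an InQL carrier; `columns` models the row shape."""

    columns: tuple

    def select(self, assignments: Optional[dict] = None) -> "Carrier":
        # Identity select(): the current row shape is preserved as-is.
        if not assignments:
            return self
        # Explicit assignments retarget the row shape. type(self) is the
        # concrete subclass, so the carrier kind is preserved.
        return type(self)(columns=tuple(assignments))


class DataFrame(Carrier):
    """Materialized carrier (stand-in)."""


class LazyFrame(Carrier):
    """Deferred carrier (stand-in)."""


class DataStream(Carrier):
    """Streaming carrier (stand-in)."""


orders = LazyFrame(columns=("id", "amount", "region"))
summary = orders.select({"region": "region", "total": "amount"})
```

Here `summary` is still a `LazyFrame`, but with the output columns named by the assignments, which mirrors the mapping `LazyFrame[T]` to `LazyFrame[U]` described in the table.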
22 changes: 5 additions & 17 deletions docs/language/reference/functions/approximate.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
# Approximate Functions (Reference)

-Approximate helpers are explicit opt-in functions. InQL does not silently replace exact aggregates with approximate
-execution because a backend can do so.
+Approximate helpers are explicit opt-in functions. InQL does not silently replace exact aggregates with approximate execution because a backend can do so.

The portable RFC 023 aggregate surface is:

@@ -23,21 +22,10 @@ summary = (
)
```

-`approx_count_distinct` is registered as an approximate aggregate with HyperLogLog-family metadata. The portable author
-contract is an approximate non-null distinct-count estimate. It does not expose a user-tunable relative-error parameter
-because the registered InQL Substrait extension mapping for this function is unary. Backend adapters must keep this
-approximation visible in capability/error handling rather than redefining exact `count_distinct` semantics.
+`approx_count_distinct` is registered as an approximate aggregate with HyperLogLog-family metadata. The portable author contract is an approximate non-null distinct-count estimate. It does not expose a user-tunable relative-error parameter because the registered InQL Substrait extension mapping for this function is unary. Backend adapters must keep this approximation visible in capability/error handling rather than redefining exact `count_distinct` semantics.

-`approx_percentile` is registered as an approximate aggregate with t-digest-family metadata. `percentile` must be between
-`0.0` and `1.0` inclusive. `accuracy` must be positive and is carried as an explicit aggregate argument so backend
-capability handling can accept, emulate, or reject the requested approximation instead of silently changing semantics.
-Generated aggregate output names include the percentile and accuracy arguments.
+`approx_percentile` is registered as an approximate aggregate with t-digest-family metadata. `percentile` must be between `0.0` and `1.0` inclusive. `accuracy` must be positive and is carried as an explicit aggregate argument so backend capability handling can accept, emulate, or reject the requested approximation instead of silently changing semantics. Generated aggregate output names include the percentile and accuracy arguments.

-Both helpers lower through registered InQL Substrait aggregate extension names. The DataFusion adapter maps
-`approx_count_distinct` to DataFusion's `approx_distinct` implementation and maps `approx_percentile` to
-`approx_percentile_cont` at the backend boundary.
+Both helpers lower through registered InQL Substrait aggregate extension names. The DataFusion adapter maps `approx_count_distinct` to DataFusion's `approx_distinct` implementation and maps `approx_percentile` to `approx_percentile_cont` at the backend boundary.

-Sketch-state construction, merge, estimate, serialization, and deserialization are implemented by
-[Sketch functions](sketches.md). Those helpers use typed sketch logical values with sketch family, value domain, merge
-compatibility, and serialized format identity. Exposing sketch state as strings or binary payloads would violate the RFC
-023 type-safety requirement.
+Sketch-state construction, merge, estimate, serialization, and deserialization are implemented by [Sketch functions](sketches.md). Those helpers use typed sketch logical values with sketch family, value domain, merge compatibility, and serialized format identity. Exposing sketch state as strings or binary payloads would violate the RFC 023 type-safety requirement.
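
Illustration only, not part of this change: the argument rules that this page states for `approx_percentile` (`percentile` within `0.0` to `1.0` inclusive, `accuracy` positive) can be sketched as a small Python check. The function name is hypothetical, and treating NaN as invalid is an assumption of the sketch; the page does not say how InQL itself handles it.

```python
def validate_approx_percentile_args(percentile: float, accuracy: float) -> None:
    """Hypothetical validator mirroring the documented argument rules."""
    # The chained comparison is False for NaN, so NaN is rejected (assumption).
    if not 0.0 <= percentile <= 1.0:
        raise ValueError(f"percentile must be in [0.0, 1.0], got {percentile!r}")
    # `not accuracy > 0` rejects zero, negatives, and NaN (assumption for NaN).
    if not accuracy > 0:
        raise ValueError(f"accuracy must be positive, got {accuracy!r}")


# Boundary values are allowed because the range is inclusive.
validate_approx_percentile_args(0.0, 100)
validate_approx_percentile_args(1.0, 0.5)
```

Rejecting bad arguments up front matches the stated goal that a backend should accept, emulate, or reject an approximation explicitly rather than change its meaning silently.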
14 changes: 4 additions & 10 deletions docs/language/reference/functions/format.md
@@ -1,7 +1,6 @@
# Format Functions (Reference)

-Format functions transform scalar values that are already present in a relation. Source discovery, file reads, and
-relation reshaping belong to the session and relational APIs rather than this function family.
+Format functions transform scalar values that are already present in a relation. Source discovery, file reads, and relation reshaping belong to the session and relational APIs rather than this function family.

The format catalog includes deterministic hashes, URL helpers, JSON helpers, and CSV helpers:

@@ -55,13 +54,8 @@ projected = (
)
```

-Hash helpers operate on UTF-8 string bytes and return lowercase hexadecimal strings. `sha2(...)` accepts `224`, `256`,
-`384`, and `512`; other digest lengths are rejected during expression construction.
+Hash helpers operate on UTF-8 string bytes and return lowercase hexadecimal strings. `sha2(...)` accepts `224`, `256`, `384`, and `512`; other digest lengths are rejected during expression construction.

-JSON helpers validate, normalize, and project payload text. CSV parsing returns logical map values instead of JSON text.
-Explicit-schema JSON and CSV helpers derive their schema from Incan model type parameters. These helpers do not read
-external files or return typed variant values. Use [Variant functions](variants.md) when a plan needs semi-structured
-kind inspection.
+JSON helpers validate, normalize, and project payload text. CSV parsing returns logical map values instead of JSON text. Explicit-schema JSON and CSV helpers derive their schema from Incan model type parameters. These helpers do not read external files or return typed variant values. Use [Variant functions](variants.md) when a plan needs semi-structured kind inspection.

-The DataFusion adapter executes the full RFC 022 catalog with native DataFusion functions where available and
-Incan-authored adapter callbacks for helpers that DataFusion does not expose natively.
+The DataFusion adapter executes the full RFC 022 catalog with native DataFusion functions where available and Incan-authored adapter callbacks for helpers that DataFusion does not expose natively.
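
Illustration only, not part of this change: the documented `sha2(...)` contract (UTF-8 input bytes, lowercase hexadecimal output, digest lengths limited to `224`, `256`, `384`, and `512`) can be reproduced with Python's standard `hashlib`. The helper name `sha2_hex` is hypothetical. The sketch validates eagerly at call time, whereas InQL is described as rejecting bad lengths during expression construction.

```python
import hashlib

# The four digest lengths the page lists as accepted.
_SHA2_BY_BITS = {
    224: hashlib.sha224,
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}


def sha2_hex(text: str, bits: int) -> str:
    """Hypothetical helper: SHA-2 of the UTF-8 bytes as lowercase hex."""
    if bits not in _SHA2_BY_BITS:
        raise ValueError(f"unsupported SHA-2 digest length: {bits}")
    # hexdigest() already returns lowercase hexadecimal text.
    return _SHA2_BY_BITS[bits](text.encode("utf-8")).hexdigest()


digest = sha2_hex("abc", 256)
```

Keeping the accepted lengths in one table makes the rejection rule a single membership test, so an unsupported length such as `128` fails before any hashing happens.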
9 changes: 3 additions & 6 deletions docs/language/reference/functions/generators.md
@@ -1,7 +1,6 @@
# Generator and Table-Valued Functions (Reference)

-Generators are relation-shaping operations. They are registry-backed like scalar and aggregate helpers, but they return
-`GeneratorApplication` values and must be applied through a relation method such as `generate(...)`.
+Generators are relation-shaping operations. They are registry-backed like scalar and aggregate helpers, but they return `GeneratorApplication` values and must be applied through a relation method such as `generate(...)`.

```incan
from pub::inql import LazyFrame
@@ -32,8 +31,6 @@ The explicit generator surface currently includes:
| `flatten(expr, as_)` | one value column | Portable table-valued flatten for one array expression. |
| `stack(row_count, values, output_columns)` | declared output columns | Emits `row_count` generated rows from row-major scalar values. |

-Generator applications preserve input columns and append generated columns in declaration order. Generated aliases are
-required, must be non-empty, and must not collide with existing input columns.
+Generator applications preserve input columns and append generated columns in declaration order. Generated aliases are required, must be non-empty, and must not collide with existing input columns.

-Nested scalar helpers such as `array_flatten(...)` remain scalar expressions. They do not expand rows and are documented
-on the [nested data functions](nested.md) page. The relation-shaping `flatten(...)` helper is intentionally separate.
+Nested scalar helpers such as `array_flatten(...)` remain scalar expressions. They do not expand rows and are documented on the [nested data functions](nested.md) page. The relation-shaping `flatten(...)` helper is intentionally separate.
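
Illustration only, not part of this change: the alias rules this page states for generator applications (aliases required, non-empty, no collision with input columns, generated columns appended in declaration order) can be sketched in Python. The function name and the plain string lists are hypothetical; InQL performs these checks during planning/lowering on its own types.

```python
def generated_output_columns(input_columns: list, aliases: list) -> list:
    """Hypothetical model of the output column order after `generate(...)`."""
    if not aliases:
        raise ValueError("generator output aliases are required")
    existing = set(input_columns)
    for alias in aliases:
        if not alias:
            raise ValueError("generator output aliases must be non-empty")
        if alias in existing:
            raise ValueError(f"alias {alias!r} collides with an input column")
    # Input columns are preserved; generated columns follow in declaration order.
    return list(input_columns) + list(aliases)


columns = generated_output_columns(["order_id", "tags"], ["pos", "tag"])
```

With these inputs `columns` keeps `order_id` and `tags` first and appends `pos` and `tag` after them, while an alias such as `tags` would be rejected as a collision.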