xorq is a deferred computation toolchain that brings the replicability and performance of declarative pipelines to the Python ML ecosystem. It lets you write pandas-style transformations that never run out of memory, automatically cache intermediate results, and move seamlessly between SQL engines and Python UDFs, all while keeping your pipelines reproducible. xorq is built on top of Ibis and DataFusion.
| Feature | Description |
|---|---|
| Declarative expressions | xorq lets you define transformations as Ibis expressions, so you are not tied to a specific execution engine. The `.into_backend()` method in xorq enables seamless transitions between engines within a single pipeline. |
| Built-in caching | xorq automatically tracks the computational graph of your pipeline and caches intermediate results when the cache operator is invoked, minimizing repeated work. |
| Multi-engine | Create unified ML workflows that leverage the strengths of different data engines in a single pipeline. xorq orchestrates data movement between engines (e.g., Snowflake for initial extraction, DuckDB for transformations, and Python for ML model training). |
| Serializable pipelines | All pipeline definitions, including UDFs, are serialized to YAML, enabling robust version control, reproducibility, and CI/CD integration. This serialization captures the complete execution graph, ensuring consistent results across environments and making it easy to track changes over time. |
| Portable UDFs | xorq supports user-defined functions and their variants, such as aggregates, window functions, and transformations. The DataFusion-based embedded engine provides a portable runtime for UDF execution. |
| Arrow-native architecture | Built on Apache Arrow's columnar memory format and Arrow Flight transport layer, xorq achieves high-performance data transfer without cumbersome serialization overhead. |
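The deferred-plus-caching behavior can be pictured with a toy sketch in plain Python. This is a hypothetical illustration of the idea only, not xorq's implementation: expressions build a computation graph up front, and each node computes once and reuses its result on later executions.

```python
# Hypothetical toy sketch of deferred execution with caching -- not xorq's
# actual machinery, just the concept: build a graph now, run (and cache) later.
class Deferred:
    def __init__(self, fn, *parents):
        self.fn = fn            # the transformation this node applies
        self.parents = parents  # upstream nodes whose outputs feed this one
        self._cache = None      # memoized result, filled on first execute()

    def execute(self):
        if self._cache is None:  # compute once, reuse afterwards
            args = [p.execute() for p in self.parents]
            self._cache = self.fn(*args)
        return self._cache

# Nothing runs while the graph is being defined...
source = Deferred(lambda: list(range(5)))
doubled = Deferred(lambda xs: [x * 2 for x in xs], source)

# ...work happens only on demand, and repeated calls hit the cache.
print(doubled.execute())  # [0, 2, 4, 6, 8]
```

In xorq the same separation holds at a much larger scale: the graph nodes are Ibis expressions, execution is delegated to a backend, and the cache operator persists intermediate results between runs.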
xorq functions as both an interactive library for building expressions and a command-line interface. This dual nature enables a seamless transition from exploratory research to production-ready artifacts. The steps below will guide you through using both the CLI and library components to get started.
Caution
This library does not currently have a stable release. Both the API and implementation are subject to change, and future updates may not be backward compatible.
xorq is available as `xorq` on PyPI:
```shell
pip install xorq
```
Note
We are changing the name from LETSQL to xorq.
```python
# your_pipeline.py
import xorq as xo

# connect to two backends: Postgres (from environment variables) and DuckDB
pg = xo.postgres.connect_env()
db = xo.duckdb.connect()

batting = pg.table("batting")
awards_players = xo.examples.awards_players.fetch(backend=db)

left = batting.filter(batting.yearID == 2015)
right = (
    awards_players.filter(awards_players.lgID == "NL")
    .drop("yearID", "lgID")
    .into_backend(pg, "filtered")  # move the DuckDB result into Postgres
)
expr = (
    left.join(right, ["playerID"], how="semi")
    .cache()  # cache the joined intermediate result
    .select(["yearID", "stint"])
)
result = expr.execute()  # nothing runs until execute() is called
```
xorq provides a CLI that enables you to build serialized artifacts from expressions, making your pipelines reproducible and deployable:
```shell
# Build an expression from a Python script
xorq build your_pipeline.py -e "expr" --target-dir builds
```
This will create a build artifact directory named by its expression hash:
```
builds
└── fce90c2d4bb8
    ├── abe2c934f4fe.sql
    ├── cec2eb9706bc.sql
    ├── deferred_reads.yaml
    ├── expr.yaml
    ├── metadata.json
    ├── profiles.yaml
    └── sql.yaml
```
The CLI converts Ibis expressions into serialized artifacts that capture the complete execution graph, ensuring consistent results across environments. More info can be found in the tutorial Building with xorq.
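Content-addressed naming is what makes these artifacts stable: a hash of the serialized expression yields the directory name, so the same pipeline always maps to the same artifact. A small stdlib-only sketch of the idea (hypothetical; xorq's actual hashing scheme may differ):

```python
import hashlib

# Hypothetical illustration of content-addressed build naming. The helper
# name and 12-character length are assumptions for the sketch, not xorq's API.
def artifact_dir_name(serialized_expr: str, length: int = 12) -> str:
    digest = hashlib.sha256(serialized_expr.encode("utf-8")).hexdigest()
    return digest[:length]

# Identical serialized expressions always produce the same directory name,
# which is what makes builds reproducible and diffable across environments.
name = artifact_dir_name("join(filter(batting), filter(awards_players))")
print(name)  # a stable hex prefix, analogous to builds/fce90c2d4bb8
```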
For more examples of how to use xorq, check the examples directory. Note that to run some of the scripts there, you need to install the library with the examples extra:
```shell
pip install 'xorq[examples]'
```
Contributions are welcome and highly appreciated. To get started, check out the contributing guidelines.
This project heavily relies on Ibis and DataFusion.
This repository is licensed under the Apache License