MarkQL is a C++20 SQL-style query engine for HTML. It treats HTML elements as rows and lets you query them with familiar SELECT ... FROM ... WHERE ... syntax.
Prerequisites:
- CMake 3.16+
- A C++20 compiler
- Boost (multiprecision); set -DXSQL_ENABLE_KHMER_NUMBER=OFF to skip Boost
- Optional dependencies: libxml2, curl, nlohmann_json, arrow/parquet
Ubuntu/Debian/WSL (minimal packages):
sudo apt update
sudo apt install -y \
git ca-certificates pkg-config \
build-essential cmake ninja-build \
libboost-dev
Optional feature packages:
sudo apt install -y libxml2-dev libcurl4-openssl-dev nlohmann-json3-dev
Arrow/Parquet packages (often missing on older distros):
sudo apt install -y libarrow-dev libparquet-dev
macOS (Homebrew):
xcode-select --install
brew install cmake ninja pkg-config boost
Optional feature packages:
brew install libxml2 curl nlohmann-json
Arrow/Parquet:
brew install apache-arrow
Build (project default):
./build.sh
Minimal build when optional dependencies are unavailable:
cmake -S . -B build \
-DXSQL_WITH_LIBXML2=OFF \
-DXSQL_WITH_CURL=OFF \
-DXSQL_WITH_ARROW=OFF \
-DXSQL_WITH_NLOHMANN_JSON=OFF
cmake --build build
To build without Boost, add -DXSQL_ENABLE_KHMER_NUMBER=OFF to the first cmake command.
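The flags above can be combined selectively rather than all at once. The following is a minimal sketch, assuming a POSIX shell; markql_disable_flags and its short feature names (libxml2, curl, arrow, json, boost) are hypothetical conveniences and not part of MarkQL. Only the -D options themselves come from this README.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MarkQL): print one CMake option per
# line for each optional feature that should be switched off.
markql_disable_flags() {
  for feature in "$@"; do
    case "$feature" in
      libxml2) printf '%s\n' '-DXSQL_WITH_LIBXML2=OFF' ;;
      curl)    printf '%s\n' '-DXSQL_WITH_CURL=OFF' ;;
      arrow)   printf '%s\n' '-DXSQL_WITH_ARROW=OFF' ;;
      json)    printf '%s\n' '-DXSQL_WITH_NLOHMANN_JSON=OFF' ;;
      boost)   printf '%s\n' '-DXSQL_ENABLE_KHMER_NUMBER=OFF' ;;
      *)       printf 'unknown feature: %s\n' "$feature" >&2; return 1 ;;
    esac
  done
}

# Example usage (commented out so the sketch has no side effects):
# configure without Arrow/Parquet and without Boost, then build.
# cmake -S . -B build $(markql_disable_flags arrow boost)
# cmake --build build
```

The command substitution is left unquoted on purpose so that each printed option becomes a separate argument to cmake.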
Run one query:
./build/markql --query "SELECT div FROM doc LIMIT 5;" --input ./data/index.html
Run interactive REPL:
./build/markql --interactive --input ./data/index.html
- Primary CLI binary is ./build/markql.
- Legacy compatibility binary ./build/xsql is still generated.
- doc and document are both valid sources in FROM.
- If --input is omitted, the CLI reads HTML from stdin.
- URL sources (FROM 'https://...') require XSQL_WITH_CURL=ON.
- TO PARQUET(...) requires XSQL_WITH_ARROW=ON.
- INNER_HTML(...) returns minified HTML by default. Use RAW_INNER_HTML(...) for unmodified raw output.
- TO TABLE(...) supports explicit trimming/sparse options: TRIM_EMPTY_ROWS, TRIM_EMPTY_COLS, EMPTY_IS, STOP_AFTER_EMPTY_ROWS, FORMAT, SPARSE_SHAPE, and HEADER_NORMALIZE.
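Because the CLI reads HTML from stdin when --input is omitted, a document can be piped in directly. The sketch below assumes the binary was built at ./build/markql; the MARKQL_BIN variable and the inline HTML snippet are illustrative only, and the output format is not shown because it depends on the build.

```shell
#!/bin/sh
# MARKQL_BIN is a hypothetical override for this sketch; it defaults to the
# primary CLI binary named in this README.
MARKQL_BIN="${MARKQL_BIN:-./build/markql}"
QUERY='SELECT div FROM doc LIMIT 5;'

if [ -x "$MARKQL_BIN" ]; then
  # No --input flag is given, so the HTML document is read from stdin.
  printf '%s\n' '<html><body><div>one</div><div>two</div></body></html>' \
    | "$MARKQL_BIN" --query "$QUERY"
else
  echo "markql not found at $MARKQL_BIN; build it first" >&2
fi
```

The same pattern works with any command that writes HTML to stdout in place of printf.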
C++ tests:
cmake --build build --target xsql_tests
ctest --test-dir build --output-on-failure
Benchmark harness (inner_html minified vs raw):
./build/markql_bench_inner_html 10000
Python package/tests (optional):
./install_python.sh
./test_python.sh
- Book (chapter path + verified examples): docs/book/SUMMARY.md
- Canonical tutorial: docs/markql-tutorial.md
- CLI guide: docs/markql-cli-guide.md
- Docs index: docs/README.md
- Changelog: CHANGELOG.md
Apache License 2.0. See LICENSE.