data activation tool (dat) is an open-source Python library for creating and running data activation (reverse ETL) pipelines with ease.
- Deploy dat Open Source or set up dat Cloud to start fetching unstructured data, generating embeddings, and loading them to vector databases.
- Explore popular use cases in our tutorials.
Getting started with dat takes only a few steps! This page guides you through the initial setup; the following pages show you how to set up your first connection.
When self-managing dat, your data never leaves your premises. Get started immediately by deploying locally using Docker.
- Install Docker Engine and the Docker Compose plugin on your workstation (see instructions).
- After Docker is installed, you can immediately get started locally by running:
```shell
# clone dat from GitHub
git clone --depth=1 https://github.com/dat-labs/dat-core.git

# switch into the dat directory
cd dat-core

# start dat
./run-dat-platform.sh
```

The dat Protocol describes a series of structs and interfaces for building data pipelines. The Protocol article describes those interfaces in language-agnostic pseudocode; this article transcribes them into Docker commands. dat's implementation of the protocol is done entirely in Docker, so this reference is helpful for getting a more concrete look at how the Protocol is used. It can also be used as a reference for interacting with dat's implementation of the Protocol.
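Since every protocol action is ultimately a process that reads and writes JSON, an orchestrator's side of the exchange can be sketched in a few lines. The sketch below is a hedged illustration, not dat's actual runner: a tiny Python process stands in for a real invocation such as `docker run --rm -i <image> spec`, so it runs without Docker or a built image.

```python
import json
import subprocess
import sys

# Hedged sketch: each protocol action is a subprocess that writes JSON to
# STDOUT. A real invocation would be ["docker", "run", "--rm", "-i", image,
# "spec"]; a tiny Python process stands in so the sketch runs without Docker.
def run_action(command):
    """Run a connector action and parse the JSON it writes to STDOUT."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

# Stand-in for a connector's `spec` entrypoint (hypothetical output shape).
stand_in_spec = [sys.executable, "-c",
                 "print('{\"documentationUrl\": \"https://example.com\"}')"]
spec = run_action(stand_in_spec)
print(spec["documentationUrl"])
```

The same pattern applies to `check`, `discover`, and the streaming actions; only the arguments and the shape of the JSON differ.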
A source exposes the following interface:

```
spec() -> ConnectorSpecification
check(Config) -> DatConnectionStatus
discover(Config) -> DatCatalog
read(Config, DatCatalog, State) -> Stream<DatRecordMessage | DatStateMessage>
```
These methods map to the following Docker commands:

```shell
docker run --rm -i <source-image-name> spec
docker run --rm -i <source-image-name> check --config <config-file-path>
docker run --rm -i <source-image-name> discover --config <config-file-path>
docker run --rm -i <source-image-name> read --config <config-file-path> --catalog <catalog-file-path> [--state <state-file-path>] > message_stream.json
```

The read command will emit a stream of records to STDOUT.
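A consumer of that stream has to tell record messages apart from state messages. The sketch below shows one way to do that in Python; the `type` discriminator and field names are assumptions about the DatMessage envelope for illustration, not the protocol's confirmed schema.

```python
import json

# Hedged sketch: split a `read` output stream into records and state
# messages. The "type" field and message shapes are illustrative
# assumptions, not confirmed from the dat Protocol spec.
def partition_messages(lines):
    """Partition newline-delimited JSON messages by their message type."""
    records, states = [], []
    for line in lines:
        msg = json.loads(line)
        if msg.get("type") == "RECORD":
            records.append(msg)
        elif msg.get("type") == "STATE":
            states.append(msg)
    return records, states

stream = [
    '{"type": "RECORD", "record": {"data": {"id": 1}}}',
    '{"type": "STATE", "state": {"cursor": "1"}}',
]
records, states = partition_messages(stream)
print(len(records), len(states))  # prints: 1 1
```

In a real pipeline, `lines` would be the contents of `message_stream.json` produced by the `read` command above.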
A generator exposes the following interface:

```
spec() -> ConnectorSpecification
check(Config) -> DatConnectionStatus
generate(Config, Stream<DatMessage>(stdin)) -> Stream<DatStateMessage>
```
These methods map to the following Docker commands:

```shell
docker run --rm -i <generator-image-name> spec
docker run --rm -i <generator-image-name> check --config <config-file-path>
cat <&0 | docker run --rm -i <generator-image-name> generate --config <config-file-path>
```

The generate command will consume DatMessages from STDIN and emit a stream of records to STDOUT.
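The `cat <&0 |` idiom above simply forwards the caller's STDIN into the container. The hedged sketch below does the same hand-off from Python: it streams DatMessages into a child process's STDIN, with a small Python echo process standing in for the Docker entrypoint (`docker run --rm -i <image> generate --config ...`), and the `"type"`/`"record"` field names assumed for illustration.

```python
import json
import subprocess
import sys

# Hedged sketch: feed DatMessages to a connector's STDIN, as `generate`
# expects. A Python echo process stands in for the Docker entrypoint so
# this runs anywhere; message field names are illustrative assumptions.
messages = [{"type": "RECORD", "record": {"data": {"text": "hello"}}}]
stand_in = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]
proc = subprocess.run(
    stand_in,
    input="\n".join(json.dumps(m) for m in messages),
    capture_output=True,
    text=True,
    check=True,
)
print(proc.stdout)
```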
A destination exposes the following interface:

```
spec() -> ConnectorSpecification
check(Config) -> DatConnectionStatus
write(Config, DatCatalog, Stream<DatMessage>(stdin)) -> Stream<DatStateMessage>
```
These methods map to the following Docker commands:

```shell
docker run --rm -i <destination-image-name> spec
docker run --rm -i <destination-image-name> check --config <config-file-path>
cat <&0 | docker run --rm -i <destination-image-name> write --config <config-file-path> --catalog <catalog-file-path>
```

The write command will consume DatMessages from STDIN.
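Putting the two ends together, a complete hand-off is just one process's STDOUT piped into another's STDIN. The hedged sketch below wires that pipe up in Python with stand-in processes for both ends; a real pipeline would chain the actual `docker run ... read` and `docker run ... write` commands shown above.

```python
import json
import subprocess
import sys

# Hedged sketch of the end-to-end hand-off: a source's `read` STDOUT piped
# straight into a destination's `write` STDIN. Both ends are Python
# stand-ins; the message shape is an illustrative assumption.
emit_code = ("import json, sys; sys.stdout.write(json.dumps("
             "{'type': 'RECORD', 'record': {'data': {'id': 1}}}))")
consume_code = "import sys; sys.stdout.write(sys.stdin.read())"

source = subprocess.Popen([sys.executable, "-c", emit_code],
                          stdout=subprocess.PIPE)
dest = subprocess.run([sys.executable, "-c", consume_code],
                      stdin=source.stdout, capture_output=True, text=True)
source.stdout.close()
source.wait()
print(dest.stdout.strip())
```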
- Connectors receive arguments on the command line via JSON files, e.g. `--catalog catalog.json`.
- They read DatMessages from STDIN. The generator `generate` and destination `write` actions are the commands that consume DatMessages.
- They emit DatMessages on STDOUT.
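From the connector's side, honoring that contract means writing each message to STDOUT as a single JSON line. The sketch below is a hedged illustration of that convention only; the field names are assumptions, not the protocol's confirmed schema.

```python
import json
import sys

# Hedged sketch of the connector side of the contract: every DatMessage is
# one JSON document written to STDOUT, one per line. Field names here are
# illustrative assumptions, not the protocol's confirmed schema.
def emit(message):
    """Write one DatMessage to STDOUT as a newline-delimited JSON line."""
    sys.stdout.write(json.dumps(message) + "\n")
    sys.stdout.flush()

emit({"type": "RECORD", "record": {"data": {"id": 1}}})
emit({"type": "STATE", "state": {"cursor": "1"}})
```

Flushing after each message matters when the output is a pipe: it lets the downstream actor (and any checkpointing logic watching for state messages) see records as they are produced rather than in one buffered burst.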