
dat-core

data activation tool (dat) is an open source Python library for creating and running data activation (reverse ELT) pipelines with ease

Run tests

coverage run -m pytest

Generate a test coverage report in the terminal

coverage report

Generate a test coverage report in HTML

coverage html

Getting started

  • Deploy dat Open Source or set up dat Cloud to start fetching unstructured data, generating embeddings, and loading them into vector databases.
  • Explore popular use cases in our tutorials.

Getting started with dat takes only a few steps! This page guides you through the initial setup, and the following pages show you how to set up your first connection.

When self-managing dat, your data never leaves your premises. Get started immediately by deploying locally using Docker.

Docker steps (placeholder)

  • Install Docker Engine and the Docker Compose plugin on your workstation (see instructions).
  • After Docker is installed, you can immediately get started locally by running:
# clone dat from GitHub
git clone --depth=1 https://github.com/dat-labs/dat-core.git

# switch into dat directory
cd dat-core

# start dat
./run-dat-platform.sh

dat Protocol Docker Interface

Summary

The dat Protocol describes a series of structs and interfaces for building data pipelines. The Protocol article describes those interfaces in language-agnostic pseudocode; this article transcribes them into docker commands. dat's implementation of the protocol is done entirely in docker, so this reference gives a concrete look at how the Protocol is used. It can also be used as a reference for interacting with dat's implementation of the Protocol.

Source

Pseudocode:

spec() -> ConnectorSpecification
check(Config) -> DatConnectionStatus
discover(Config) -> DatCatalog
read(Config, DatCatalog, State) -> Stream<DatRecordMessage | DatStateMessage>

Docker:

docker run --rm -i <source-image-name> spec
docker run --rm -i <source-image-name> check --config <config-file-path>
docker run --rm -i <source-image-name> discover --config <config-file-path>
docker run --rm -i <source-image-name> read --config <config-file-path> --catalog <catalog-file-path> [--state <state-file-path>] > message_stream.json

The read command will emit a stream of records to STDOUT.
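The stream emitted by read interleaves record and state messages. Below is a minimal Python sketch of splitting such a newline-delimited JSON stream into records and state checkpoints; the message shape (a top-level "type" field of RECORD or STATE, with "record" and "state" payloads) is an illustrative assumption, not the exact dat wire format.

```python
import json

def parse_message_stream(lines):
    """Split a newline-delimited JSON message stream into records and states.

    The "type"/"record"/"state" field names are assumptions for
    illustration, not the exact dat wire format.
    """
    records, states = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("type") == "RECORD":
            records.append(msg)
        elif msg.get("type") == "STATE":
            states.append(msg)
    return records, states

# Example: two records followed by a state checkpoint
stream = [
    '{"type": "RECORD", "record": {"data": {"id": 1}}}',
    '{"type": "RECORD", "record": {"data": {"id": 2}}}',
    '{"type": "STATE", "state": {"cursor": "2"}}',
]
records, states = parse_message_stream(stream)
print(len(records), len(states))
```

In practice the input would be the STDOUT of the `docker run ... read` command (or the `message_stream.json` file it was redirected to).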

Generator

Pseudocode:

spec() -> ConnectorSpecification
check(Config) -> DatConnectionStatus
generate(Config, Stream<DatMessage>(stdin)) -> Stream<DatStateMessage>

Docker:

docker run --rm -i <destination-image-name> spec
docker run --rm -i <destination-image-name> check --config <config-file-path>
cat <&0 | docker run --rm -i <destination-image-name> generate --config <config-file-path>

The generate command will consume DatMessages from STDIN and emit a stream of state messages to STDOUT.
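The generate contract can be sketched as a filter: read DatMessages from an STDIN-like source, process the records, and emit a DatStateMessage when done. The field names below are illustrative assumptions, not the exact dat wire format.

```python
import io
import json

def generate(message_lines, out):
    """Minimal sketch of the generate contract: consume DatMessages and
    emit a state message summarizing progress. Field names here are
    illustrative assumptions, not the exact dat wire format."""
    processed = 0
    for line in message_lines:
        msg = json.loads(line)
        if msg.get("type") == "RECORD":
            # a real generator would compute embeddings for the record here
            processed += 1
    out.write(json.dumps(
        {"type": "STATE", "state": {"records_processed": processed}}) + "\n")

out = io.StringIO()
generate(['{"type": "RECORD", "record": {"data": {"id": 1}}}'], out)
print(out.getvalue().strip())
```

A real generator would stream state incrementally rather than only at the end, so a crash loses less progress.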

Destination

Pseudocode:

spec() -> ConnectorSpecification
check(Config) -> DatConnectionStatus
write(Config, DatCatalog, Stream<DatMessage>(stdin)) -> Stream<DatStateMessage>

Docker:

docker run --rm -i <destination-image-name> spec
docker run --rm -i <destination-image-name> check --config <config-file-path>
cat <&0 | docker run --rm -i <destination-image-name> write --config <config-file-path> --catalog <catalog-file-path>

The write command will consume DatMessages from STDIN.
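Chaining a source's read into a destination's write amounts to a Unix pipe between two docker invocations. The sketch below builds the argv lists and wires them together with subprocess; the image names and file paths are caller-supplied assumptions.

```python
import subprocess

def build_read_cmd(source_image, config, catalog, state=None):
    """argv for `docker run --rm -i <source-image> read ...`;
    image name and paths are caller-supplied assumptions."""
    cmd = ["docker", "run", "--rm", "-i", source_image, "read",
           "--config", config, "--catalog", catalog]
    if state:
        cmd += ["--state", state]
    return cmd

def build_write_cmd(dest_image, config, catalog):
    """argv for `docker run --rm -i <destination-image> write ...`"""
    return ["docker", "run", "--rm", "-i", dest_image, "write",
            "--config", config, "--catalog", catalog]

def run_pipeline(source_image, dest_image, src_config, dest_config, catalog):
    """Chain source `read` into destination `write` through a pipe,
    mirroring `docker run ... read | docker run ... write`."""
    reader = subprocess.Popen(
        build_read_cmd(source_image, src_config, catalog),
        stdout=subprocess.PIPE)
    writer = subprocess.Popen(
        build_write_cmd(dest_image, dest_config, catalog),
        stdin=reader.stdout)
    reader.stdout.close()  # writer sees EOF once the reader exits
    writer.communicate()
    return writer.returncode
```

Closing the parent's copy of the reader's stdout is what lets the writer observe EOF when the source container exits.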

I/O:

  • Connectors receive their arguments on the command line as paths to JSON files, e.g. --catalog catalog.json
  • They read DatMessages from STDIN. The generator generate and destination write actions are the only commands that consume DatMessages.
  • They emit DatMessages on STDOUT.
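The first convention above can be shown end to end: serialize a config to a JSON file, then pass its path on the command line. The config fields below are hypothetical; real connectors define their own fields via spec().

```python
import json
import tempfile

# A hypothetical connector config; real connectors define their own
# fields via spec().
config = {"api_key": "XXXX"}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(config, f)
    config_path = f.name

# argv for `docker run --rm -i <source-image-name> check --config <path>`
check_cmd = ["docker", "run", "--rm", "-i", "<source-image-name>",
             "check", "--config", config_path]
print(check_cmd[-2:])
```

Running check_cmd with subprocess would then have the connector validate the config and report a DatConnectionStatus.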
