
dat-core

data activation tool (dat) is an open source Python library for creating and running data activation (reverse ELT) pipelines with ease

Run tests

coverage run -m pytest

Generate a test coverage report in the terminal

coverage report

Generate a test coverage report in HTML

coverage html

Getting started

  • Deploy dat Open Source or set up dat Cloud to start fetching unstructured data, generating embeddings, and loading them into vector databases.
  • Explore popular use cases in our tutorials.

Getting started with dat takes only a few steps! This page guides you through the initial setup, and the following pages show you how to set up your first connection.

When self-managing dat, your data never leaves your premises. Get started immediately by deploying locally using Docker.

Docker steps (placeholder)

  • Install Docker Engine and the Docker Compose plugin on your workstation (see instructions).
  • After Docker is installed, you can immediately get started locally by running:
# clone dat from GitHub
git clone --depth=1 https://github.com/dat-labs/dat-core.git

# switch into dat directory
cd dat-core

# start dat
./run-dat-platform.sh

dat Protocol Docker Interface

Summary

The dat Protocol describes a series of structs and interfaces for building data pipelines. The Protocol article describes those interfaces in language-agnostic pseudocode; this article transcribes them into docker commands. dat's implementation of the protocol is done entirely in docker, so this reference gives a concrete look at how the Protocol is used. It can also be used as a reference for interacting with dat's implementation of the Protocol.

Source

Pseudocode:

spec() -> ConnectorSpecification
check(Config) -> DatConnectionStatus
discover(Config) -> DatCatalog
read(Config, DatCatalog, State) -> Stream<DatRecordMessage | DatStateMessage>

Docker:

docker run --rm -i <source-image-name> spec
docker run --rm -i <source-image-name> check --config <config-file-path>
docker run --rm -i <source-image-name> discover --config <config-file-path>
docker run --rm -i <source-image-name> read --config <config-file-path> --catalog <catalog-file-path> [--state <state-file-path>] > message_stream.json

The read command will emit a stream of records to STDOUT.
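The stream emitted by read interleaves record and state messages. Below is a minimal Python sketch of splitting such a newline-delimited JSON stream into records and state checkpoints; the message shape (a top-level "type" field of RECORD or STATE, with "record" and "state" payloads) is an illustrative assumption, not the exact dat wire format.

```python
import json

def parse_message_stream(lines):
    """Split a newline-delimited JSON message stream into records and states.

    The "type"/"record"/"state" field names are assumptions for
    illustration, not the exact dat wire format.
    """
    records, states = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("type") == "RECORD":
            records.append(msg)
        elif msg.get("type") == "STATE":
            states.append(msg)
    return records, states

# Example: two records followed by a state checkpoint
stream = [
    '{"type": "RECORD", "record": {"data": {"id": 1}}}',
    '{"type": "RECORD", "record": {"data": {"id": 2}}}',
    '{"type": "STATE", "state": {"cursor": "2"}}',
]
records, states = parse_message_stream(stream)
print(len(records), len(states))
```

In practice the input would be the STDOUT of the `docker run ... read` command (or the `message_stream.json` file it was redirected to).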

Generator

Pseudocode:

spec() -> ConnectorSpecification
check(Config) -> DatConnectionStatus
generate(Config, Stream<DatMessage>(stdin)) -> Stream<DatStateMessage>

Docker:

docker run --rm -i <destination-image-name> spec
docker run --rm -i <destination-image-name> check --config <config-file-path>
cat <&0 | docker run --rm -i <destination-image-name> generate --config <config-file-path>

The generate command will consume DatMessages from STDIN and emit a stream of state messages to STDOUT.
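The generate contract can be sketched as a filter: read DatMessages from an STDIN-like source, process the records, and emit a DatStateMessage when done. The field names below are illustrative assumptions, not the exact dat wire format.

```python
import io
import json

def generate(message_lines, out):
    """Minimal sketch of the generate contract: consume DatMessages and
    emit a state message summarizing progress. Field names here are
    illustrative assumptions, not the exact dat wire format."""
    processed = 0
    for line in message_lines:
        msg = json.loads(line)
        if msg.get("type") == "RECORD":
            # a real generator would compute embeddings for the record here
            processed += 1
    out.write(json.dumps(
        {"type": "STATE", "state": {"records_processed": processed}}) + "\n")

out = io.StringIO()
generate(['{"type": "RECORD", "record": {"data": {"id": 1}}}'], out)
print(out.getvalue().strip())
```

A real generator would stream state incrementally rather than only at the end, so a crash loses less progress.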

Destination

Pseudocode:

spec() -> ConnectorSpecification
check(Config) -> DatConnectionStatus
write(Config, DatCatalog, Stream<DatMessage>(stdin)) -> Stream<DatStateMessage>

Docker:

docker run --rm -i <destination-image-name> spec
docker run --rm -i <destination-image-name> check --config <config-file-path>
cat <&0 | docker run --rm -i <destination-image-name> write --config <config-file-path> --catalog <catalog-file-path>

The write command will consume DatMessages from STDIN.
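Chaining a source's read into a destination's write amounts to a Unix pipe between two docker invocations. The sketch below builds the argv lists and wires them together with subprocess; the image names and file paths are caller-supplied assumptions.

```python
import subprocess

def build_read_cmd(source_image, config, catalog, state=None):
    """argv for `docker run --rm -i <source-image> read ...`;
    image name and paths are caller-supplied assumptions."""
    cmd = ["docker", "run", "--rm", "-i", source_image, "read",
           "--config", config, "--catalog", catalog]
    if state:
        cmd += ["--state", state]
    return cmd

def build_write_cmd(dest_image, config, catalog):
    """argv for `docker run --rm -i <destination-image> write ...`"""
    return ["docker", "run", "--rm", "-i", dest_image, "write",
            "--config", config, "--catalog", catalog]

def run_pipeline(source_image, dest_image, src_config, dest_config, catalog):
    """Chain source `read` into destination `write` through a pipe,
    mirroring `docker run ... read | docker run ... write`."""
    reader = subprocess.Popen(
        build_read_cmd(source_image, src_config, catalog),
        stdout=subprocess.PIPE)
    writer = subprocess.Popen(
        build_write_cmd(dest_image, dest_config, catalog),
        stdin=reader.stdout)
    reader.stdout.close()  # writer sees EOF once the reader exits
    writer.communicate()
    return writer.returncode
```

Closing the parent's copy of the reader's stdout is what lets the writer observe EOF when the source container exits.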

I/O:

  • Connectors receive their arguments on the command line as paths to JSON files, e.g. --catalog catalog.json
  • They read DatMessages from STDIN. The generator generate and destination write actions are the only commands that consume DatMessages.
  • They emit DatMessages on STDOUT.
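The first convention above can be shown end to end: serialize a config to a JSON file, then pass its path on the command line. The config fields below are hypothetical; real connectors define their own fields via spec().

```python
import json
import tempfile

# A hypothetical connector config; real connectors define their own
# fields via spec().
config = {"api_key": "XXXX"}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(config, f)
    config_path = f.name

# argv for `docker run --rm -i <source-image-name> check --config <path>`
check_cmd = ["docker", "run", "--rm", "-i", "<source-image-name>",
             "check", "--config", config_path]
print(check_cmd[-2:])
```

Running check_cmd with subprocess would then have the connector validate the config and report a DatConnectionStatus.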
