It is recommended to read through these instructions in their entirety before attempting to build or deploy Datawave. However, if you just want to get started and use this document as a reference, here's the short version (although we recommend checking out the prereqs at a minimum):
# from the base datawave project directory check out the microservice submodules
git submodule update --init --recursive
# build docker images for datawave and all of the microservices
# optionally include '-Dquickstart-maven' to download accumulo/zookeeper/hadoop/maven tarballs from the maven repository
mvn -Pcompose -Dmicroservice-docker -Dquickstart-docker -Ddeploy -Dtar -Ddist -DskipTests clean install
# bootstrap the services, and bring them up using docker compose
cd docker
./bootstrap.sh
docker compose up -d
# run some queries to ensure everything is working
cd scripts
./testAll.sh
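Before running the short version, it can help to confirm that the tools it calls are installed. The following is a minimal sketch, not part of the Datawave scripts: the function name missing_tools is hypothetical, and the tool list (git, mvn, docker) is an assumption drawn from the commands above.

```shell
#!/usr/bin/env bash
# Print the name of every listed tool that is not found on the PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The tool list is an assumption based on the commands in the short version.
missing="$(missing_tools git mvn docker)"
if [ -n "$missing" ]; then
  printf 'Missing required tools:\n%s\n' "$missing" >&2
fi
```

If nothing is printed, all three tools were found.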
Datawave Quickstart is a self-contained hadoop, zookeeper, and accumulo deployment prepopulated with data.
Consul v1.15.4 is a prepackaged docker image used for discovery between the various services.
RabbitMQ v3.12.4 is a prepackaged docker image used for messaging between the various services.
Datawave Config Service is Datawave's customized Spring Cloud config service.
Sample configuration files can be found in the config folder.
You will need to build the docker image for this service on your local machine following the instructions in the config service README.
Datawave Hazelcast Service is Datawave's customized Hazelcast In-Memory Data Grid.
You will need to build the docker image for this service on your local machine following the instructions in the hazelcast cache service README.
Datawave Authorization Service provides basic authorization for the Datawave microservices.
You will need to build the docker image for this service on your local machine following the instructions in the authorization service README.
Datawave Audit Service provides query audit capabilities for Datawave.
You will need to build the docker image for this service on your local machine following the instructions in the audit service README.
Datawave Query Metric Service provides metrics caching, storage, and retrieval capabilities for Datawave.
You will need to build the docker image for this service on your local machine following the instructions in the query metrics service README.
Zookeeper is a prepackaged docker image used for distributed synchronization.
Kafka is a prepackaged docker image used for messaging between the various services.
Datawave Query Service v1.0-SNAPSHOT is a user-facing interface for Datawave query.
This microservice is in development, and can be found in this repo.
You will need to build the docker image for this service on your local machine following the instructions in the config service README.
Datawave Executor Service v1.0-SNAPSHOT is the back-end worker for Datawave queries.
This microservice is in development, and can be found in this repo.
You will need to build the docker image for this service on your local machine following the instructions in the config service README.
The following service is enabled via the 'pool2' or 'full' profile:
Datawave Executor Service v1.0-SNAPSHOT is the back-end worker for Datawave queries.
This microservice is in development, and can be found in this repo.
You will need to build the docker image for this service on your local machine following the instructions in the config service README.
The following service is enabled via the 'storage' or 'full' profile:
Datawave Query Storage Service v1.0-SNAPSHOT is a utility service used to inspect the storage cache.
This microservice is in development, and can be found in this repo.
You will need to build the docker image for this service on your local machine following the instructions in the config service README.
The following service is enabled via the 'management' or 'full' profile:
Kafdrop is a prepackaged docker image used for kafka cluster management.
The following service is enabled via the 'management' or 'full' profile:
Hazelcast Management Center v4.2021.06 is a prepackaged docker image used for hazelcast cluster management.
The following service is enabled via the 'dictionary' or 'full' profile:
Datawave Dictionary Service provides access to the data dictionary and edge dictionary for Datawave.
You will need to build the docker image for this service on your local machine following the instructions in the dictionary service README.
Please read through these instructions in their entirety before attempting to build or deploy Datawave.
In order for the following bootstrap step to work properly, you should ensure that your /etc/hosts file looks similar to the following:
<your ip address> <your fqdn> <your hostname>
127.0.0.1 localhost
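A quick way to check this is to see which address your hostname resolves to. The sketch below assumes that the hostname should resolve to a non-loopback address, which is what the sample /etc/hosts layout suggests; the function name is_loopback is hypothetical, and getent is assumed to be available (it is on most Linux systems).

```shell
#!/usr/bin/env bash
# Succeed when the given address is an IPv4 or IPv6 loopback address.
is_loopback() {
  case "$1" in
    127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# Look up the first address that the local hostname resolves to.
host_ip="$(getent hosts "$(hostname)" 2>/dev/null | awk 'NR==1 {print $1}')"
if [ -z "$host_ip" ]; then
  echo "WARNING: hostname does not resolve; check /etc/hosts" >&2
elif is_loopback "$host_ip"; then
  echo "WARNING: hostname resolves to loopback ($host_ip); check /etc/hosts" >&2
else
  echo "OK: hostname resolves to $host_ip"
fi
```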
These services have been successfully deployed using the following versions of docker and docker compose.
$> docker --version
Docker version 24.0.6, build ed223bc
$> docker compose version
Docker Compose version v2.21.0
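To compare your installed docker version against the one listed above, a small check like the following may be useful. This is only a sketch: treating 24.0.6 as a baseline is an assumption (the text only says that version is known to work), the function names are hypothetical, and 'sort -V' requires GNU coreutils.

```shell
#!/usr/bin/env bash
# Succeed when version $1 is greater than or equal to version $2.
# Relies on 'sort -V' (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract "24.0.6" from input such as "Docker version 24.0.6, build ed223bc".
parse_docker_version() {
  sed -E 's/^Docker version ([0-9]+(\.[0-9]+)*).*/\1/'
}

if command -v docker >/dev/null 2>&1; then
  installed="$(docker --version | parse_docker_version)"
  if version_ge "$installed" 24.0.6; then
    echo "docker $installed is at least the tested version"
  else
    echo "docker $installed is older than the tested version 24.0.6" >&2
  fi
fi
```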
Prior to starting docker compose, you need to use the Datawave Quickstart to deploy working instances of Hadoop, Zookeeper, and Accumulo, along with some sample datasets for query.
There are two methods for deploying the Datawave Quickstart.
- default: Deploys the Datawave Quickstart as a docker container within docker compose.
- hybrid: Deploys the Datawave Quickstart directly on your host system.
Build the Datawave Quickstart docker image using the following build command:
# To build the quickstart docker image, and all of the microservice images, run this
mvn -Pcompose -Dmicroservice-docker -Dquickstart-docker -Ddeploy -Dtar -Ddist -DskipTests clean install -T1C
# To build just the quickstart docker image, run this
mvn -Pcompose -DskipServices -Dquickstart-docker -Ddeploy -Dtar -Ddist -DskipTests clean install -T1C
Note that the 'quickstart-docker' property is set. This property is a shortcut which activates the 'docker' and 'quickstart' profiles without activating the 'docker' profile for the microservices.
For this command, the build profile is set to 'compose'. This profile contains all of the properties needed to make the quickstart work as part of the docker compose deployment. The use of any other build profile with docker compose is unsupported.
The second command also prevents the microservices from building by setting '-DskipServices'. This is an optional setting which skips the microservice builds entirely, saving you some time if you only want to build or rebuild the Datawave Quickstart.
If you ever need to rebuild the Datawave Quickstart docker image but don't want to ingest the sample data, you can add '-DskipIngest' to your build command. This can save you some time, since the docker compose configuration stores ingested data in a persistent volume.
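The build variants described above differ only in a couple of properties, so they can be assembled by a small helper. This is a sketch rather than a script shipped with Datawave: the function name dw_build_cmd and its option names are hypothetical, while the Maven properties are the ones listed above. It prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Print the mvn command for the requested build variant.
# Usage: dw_build_cmd [--quickstart-only] [--skip-ingest]
dw_build_cmd() {
  local services="-Dmicroservice-docker"
  local ingest=""
  local arg
  for arg in "$@"; do
    case "$arg" in
      --quickstart-only) services="-DskipServices" ;;   # skip the microservice builds
      --skip-ingest)     ingest=" -DskipIngest" ;;      # skip sample data ingest
      *) echo "unknown option: $arg" >&2; return 1 ;;
    esac
  done
  echo "mvn -Pcompose ${services} -Dquickstart-docker${ingest} -Ddeploy -Dtar -Ddist -DskipTests clean install -T1C"
}

# Print the command for rebuilding only the quickstart image without ingest.
dw_build_cmd --quickstart-only --skip-ingest
```

To run the printed command, you could pass it to the shell, for example with eval "$(dw_build_cmd --skip-ingest)".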
If desired, you can start and test the wildfly deployment embedded in the Datawave Quickstart by running the following command:
docker run -m 8g datawave/quickstart-compose datawave-bootstrap.sh --test
Before running the quickstart setup, you need to edit your ~/.bashrc to include the following export:
export DW_BIND_HOST=0.0.0.0
This will ensure that Hadoop binds to all interfaces, and that Accumulo binds to the hostname/IP address. This is required to connect to the host Accumulo instance from a docker container.
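If you script this step, it is worth making it idempotent so that repeated runs do not append the export more than once. A minimal sketch follows; the function name ensure_line is hypothetical, and the demonstration writes to a scratch file rather than your real ~/.bashrc.

```shell
#!/usr/bin/env bash
# Append a line to a file only if that exact line is not already present.
ensure_line() {
  local file="$1" line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a scratch file: the second call is a no-op.
scratch="$(mktemp)"
ensure_line "$scratch" 'export DW_BIND_HOST=0.0.0.0'
ensure_line "$scratch" 'export DW_BIND_HOST=0.0.0.0'
cat "$scratch"
rm -f "$scratch"
```

To apply it for real, call ensure_line ~/.bashrc 'export DW_BIND_HOST=0.0.0.0'.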
What follows is a brief description of how to set up and run the Datawave Quickstart. For more detailed information, see the DataWave Quickstart Readme.
# Add the quickstart env.sh to your .bashrc
# DW_SOURCE is your local path to the datawave source code; export it as an environment variable, or replace ${DW_SOURCE} below with the literal path
printf 'activateDW() {\n  source "${DW_SOURCE}/contrib/datawave-quickstart/bin/env.sh"\n}\n' >> ~/.bashrc
# Source .bashrc to kick off the quickstart build
source ~/.bashrc
# Activate DataWave
activateDW
# Install Datawave and its dependencies
allInstall
# Start Accumulo and its dependencies
accumuloStart
# At this point, you are ready to deploy and test the query microservices via docker compose
# If desired, start the wildfly webservice, and run some diagnostic tests
datawaveWebStart && datawaveWebTest
# Make sure to stop the wildfly webservice before starting the query microservices via docker compose
datawaveWebStop
If you haven't done so already, you can build the Datawave Microservice docker images using the following build command:
mvn -Pcompose -Dmicroservice-docker -Ddist -DskipTests clean install -T1C
Note that the 'microservice-docker' property is set. This property is a shortcut which activates the 'docker' profile for just the microservices.
This command can be combined with the default Datawave Quickstart build command to build everything at once.
The audit, dictionary, query executor, and query metric services all need to connect to Zookeeper, Hadoop and/or Accumulo. In order to make that work, there are some environment variables which need to be configured.
For the default deployment, bootstrap by running:
./bootstrap.sh
This will produce a '.env' file containing the following:
# If set to quickstart, enables the quickstart container
# Note: More than one profile may be set.
COMPOSE_PROFILES=""
# These environment variables are used to create extra hosts which
# allow containers to route to the host quickstart deployment.
# The extra hosts aren't used when deploying the docker quickstart,
# but the variables still need to be set for the compose file to be valid.
DW_HOSTNAME="<Your hostname>"
DW_HOST_FQDN="<Your host FQDN>"
DW_HOST_IP="<Your host IP Address>"
# These environment variables must be set when running the quickstart
# from the host machine in hybrid mode.
DW_ZOOKEEPER_HOST="<Your hostname>"
DW_HADOOP_HOST="<Your hostname>"
For the hybrid deployment, bootstrap by running:
./bootstrap.sh hybrid
This will produce a '.env' file containing the following:
# If set to quickstart, enables the quickstart container
# Note: More than one profile may be set.
COMPOSE_PROFILES=""
# These environment variables are used to create extra hosts which
# allow containers to route to the host quickstart deployment.
# The extra hosts aren't used when deploying the docker quickstart,
# but the variables still need to be set for the compose file to be valid.
DW_HOSTNAME="<Your hostname>"
DW_HOST_FQDN="<Your host FQDN>"
DW_HOST_IP="<Your host IP Address>"
# These environment variables must be set when running the quickstart
# from the host machine in hybrid mode.
DW_ZOOKEEPER_HOST="<Your hostname>"
DW_HADOOP_HOST="<Your hostname>"
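After bootstrapping, you may want to confirm that the generated file actually has values for the host variables shown above. The following sketch reports any that are missing or empty; the function name unset_env_vars is hypothetical, and it assumes the simple NAME="value" layout shown above.

```shell
#!/usr/bin/env bash
# Print every listed variable that is missing or empty in the given env file.
# COMPOSE_PROFILES is deliberately not checked, since an empty value is valid.
unset_env_vars() {
  local file="$1" name value
  shift
  for name in "$@"; do
    # Take the last assignment of the variable and strip surrounding double quotes.
    value="$(sed -n "s/^${name}=//p" "$file" | tail -n 1 | sed 's/^"\(.*\)"$/\1/')"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Check the .env file in the current directory, if there is one.
if [ -f .env ]; then
  unset_env_vars .env DW_HOSTNAME DW_HOST_FQDN DW_HOST_IP DW_ZOOKEEPER_HOST DW_HADOOP_HOST
fi
```

No output means every listed variable has a value.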
Start the default services (with Kafka as the backend):
docker compose up -d
Start the default services (with RabbitMQ as the backend):
BACKEND=rabbitmq docker compose up -d
Start the default services (with Hazelcast as the backend):
BACKEND=hazelcast docker compose up -d
Start the default services, and the dictionary service:
docker compose --profile quickstart --profile dictionary up -d
Start the default services, the kafka services, and the dictionary service:
docker compose --profile quickstart --profile dictionary --profile kafka up -d
Start all services:
docker compose --profile quickstart --profile full up -d
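The startup commands above all follow the same pattern: an optional BACKEND variable, then zero or more --profile flags. The sketch below assembles such a command and prints it without running it. The function name dw_up_cmd is hypothetical, and combining a non-default BACKEND with extra profiles is an assumption, since the examples above only show them separately.

```shell
#!/usr/bin/env bash
# Print the docker compose command for a backend and a list of profiles.
# Usage: dw_up_cmd <kafka|rabbitmq|hazelcast> [profile...]
dw_up_cmd() {
  local backend="$1" prefix="" flags="" profile
  shift
  case "$backend" in
    kafka)              prefix="" ;;   # Kafka is the default, so BACKEND is left unset
    rabbitmq|hazelcast) prefix="BACKEND=${backend} " ;;
    *) echo "unsupported backend: $backend" >&2; return 1 ;;
  esac
  for profile in "$@"; do
    flags="${flags}--profile ${profile} "
  done
  echo "${prefix}docker compose ${flags}up -d"
}

# Print the command for RabbitMQ with the quickstart and dictionary profiles.
dw_up_cmd rabbitmq quickstart dictionary
```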
View logs for all services:
docker compose logs -f
For a specific service:
docker compose logs -f audit
Stop the configured services
docker compose down
Stop the configured services, and delete all volumes
docker compose down -v
Stop all services, including ones that are no longer enabled
docker compose down --remove-orphans
Stop, remove, and recreate a single service (for example, audit):
docker compose stop audit
docker compose rm -f audit
docker compose up -d
Restart a service without recreating it:
docker compose restart <servicename>
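The stop, remove, and recreate sequence above can be wrapped in a small function so it works for any service name. This is a sketch: the function name recreate_service and the DRY_RUN variable are hypothetical conveniences, not part of the Datawave scripts.

```shell
#!/usr/bin/env bash
# Stop, remove, and recreate one compose service.
# With DRY_RUN=1 the commands are printed instead of executed.
recreate_service() {
  local service="$1" run=""
  if [ -z "$service" ]; then
    echo "usage: recreate_service <servicename>" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  # Each step runs only if the previous one succeeded.
  $run docker compose stop "$service" &&
    $run docker compose rm -f "$service" &&
    $run docker compose up -d
}

# Show what would be run for the audit service.
DRY_RUN=1 recreate_service audit
```

Drop DRY_RUN=1 to run the commands for real from the directory containing the compose file.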