We provide a Docker environment for testing and development that contains the following containers:
`docker-client-1`
: The client environment. This is where Spark applications are submitted/executed from.

`docker-hdfs-1`
: The HDFS node acting as the staging file store.

`docker-jupyter-1`
: A Jupyter Notebook that can be used to test the Spark Connector.

`docker-minio-1`
: A MinIO file-store container acting as an S3 file store.

`docker-spark-1`
: A Spark driver for our standalone cluster.

`docker-spark-worker-*`
: Spark workers for our standalone cluster, created in swarm mode.

`docker-vertica-1`
: Our Vertica database.
When using the Kerberos configuration the container names are simplified (`client`, `hdfs`, and `vertica`, plus one for the Kerberos key server, `kdc`). The Spark and MinIO containers are not currently deployed for Kerberos.
Note that the container names listed here may not be exactly the same in all environments. For example, under Docker Compose V1 the container might be named `docker_client_1`, and under V2 `docker-client-1`.
The most convenient method of using Docker (and Docker Compose) is to install Docker Desktop. It is recommended to install a recent version, with Docker Engine version 20.10.0+ and Docker Compose version 1.29.0+.
To check the installed versions:
```shell
docker version
docker-compose version
```
In most cases, `docker-compose` commands can be substituted with `docker compose` commands: `docker-compose` is the standalone Compose CLI, while `docker compose` is the Compose functionality that now ships as part of Docker itself.
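For example, on a recent install these two commands are interchangeable:

```shell
docker-compose up -d
docker compose up -d
```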
For some Linux distros, if you get an error about `docker-credential-secretservice` not being installed, install that package with your system's package manager. For example, on Ubuntu:

```shell
sudo apt install golang-docker-credential-helpers
```
To start, make sure you have Docker and sbt installed, and that the Docker daemon is running.
Then, clone the repository if you haven't already:
```shell
git clone https://github.com/vertica/spark-connector.git
```
Navigate to the `docker` folder:

```shell
cd spark-connector/docker
```
Then use Docker Compose to start the containers:
```shell
docker-compose up -d
```
Or, for the Kerberos configuration:
```shell
docker-compose -f docker-compose-kerberos.yml up -d
```
Once started, you can visit the Spark standalone cluster Master web UI, as well as the first Spark context UI at localhost:4040 (the context UI won't be available until a Spark context is created).
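One way to bring up the context UI is to start a Spark shell from the client container. This is a sketch only: it assumes `spark-shell` is on the client container's PATH and that the standalone master is reachable from the client at `spark://spark:7077` (7077 is the Spark standalone default port, and the `spark` hostname is assumed from the `docker-spark-1` container name):

```shell
# Assumed master URL; adjust to match your compose network
docker exec -it docker-client-1 spark-shell --master spark://spark:7077
```

While that shell session is open, localhost:4040 serves the UI for its context.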
You can access a container shell as follows:
```shell
docker exec -it <container-name> bash

# Client container example:
docker exec -it docker-client-1 bash
# or
docker exec -it krb-client bash
```
You can exit the container's shell with `exit`.
Note that it can take a few seconds for some of the services, notably Vertica and HDFS, to initialize. Before running any tests, ensure that the services are up by looking for the "<service> container is now running" message at the end of the logs:
```shell
docker logs docker-vertica-1
docker logs docker-hdfs-1
```
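If you would rather script the wait than watch the logs by hand, here is a minimal sketch (it assumes the ready message above is written to the container logs, and uses the Vertica container as the example):

```shell
# Poll the Vertica container logs until the ready message appears
until docker logs docker-vertica-1 2>&1 | grep -q "container is now running"; do
  sleep 5
done
```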
To use a specific version, set the following environment variables in your environment, such as in your Docker `.env` file:

`VERTICA_VERSION`
: By default, we use the latest Vertica image. For example, to use Vertica 10.1.1-0 set `VERTICA_VERSION=10.1.1-0`.

`SPARK_INSTALL`
: By default, we use the 3.3.2 Bitnami Spark image. For example, to use Spark 3.1.3 set `SPARK_INSTALL=3.1.3`.
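For example, a `.env` file in the `docker` folder pinning both of the versions mentioned above would look like this:

```shell
# docker/.env
VERTICA_VERSION=10.1.1-0
SPARK_INSTALL=3.1.3
```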
By default, we start a single Spark worker. To start multiple Spark workers, for example three:

```shell
docker-compose up -d --scale spark-worker=3
```
For other Spark configuration parameters, refer to the available options for the Bitnami Spark image and set the appropriate environment variables in `docker-compose.yml` or your `.env` file.
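As an illustration, worker resources can be set through the Bitnami environment variables (`SPARK_WORKER_MEMORY` and `SPARK_WORKER_CORES` are documented Bitnami Spark options; the `spark-worker` service name matches the scale command above):

```yaml
# docker-compose.yml fragment (illustrative)
spark-worker:
  environment:
    - SPARK_WORKER_MEMORY=2g  # memory available to each worker
    - SPARK_WORKER_CORES=2    # CPU cores available to each worker
```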
By default, the Jupyter Notebook container is not started, as the image is quite large and it is only needed when testing the Spark Connector in a Jupyter Notebook.
To start (or stop) the Jupyter Notebook, specify the "jupyter" profile when managing the containers:
```shell
docker-compose --profile jupyter up -d
docker-compose --profile jupyter down
```
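If the notebook asks for a token when you first open it, the token URL is printed in the Jupyter container's logs:

```shell
docker logs docker-jupyter-1
```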
Once you're finished running the example, exit the interactive terminal by running `exit`.
Make sure you are still in the `docker` directory, then run:

```shell
docker-compose down
```
Or, if using the Kerberos configuration:
```shell
docker-compose -f docker-compose-kerberos.yml down
```
This will shut down and remove the containers safely.
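If you also want to remove any named volumes the environment created, `docker-compose down` accepts a `-v` flag (note that this deletes the data stored in those volumes):

```shell
docker-compose down -v
```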