diff --git a/spark-on-docker-swarm/README.md b/docker-swarm-spark-simple/README.md
similarity index 83%
rename from spark-on-docker-swarm/README.md
rename to docker-swarm-spark-simple/README.md
index 5321d09..10ce6f5 100644
--- a/spark-on-docker-swarm/README.md
+++ b/docker-swarm-spark-simple/README.md
@@ -7,10 +7,15 @@
 First, edit the following items as needed for your swarm:
 1. `configured-spark-node -> spark-conf -> spark-env.sh`: Adjust the environment variables as appropriate for your cluster's nodes, most notably `SPARK_WORKER_MEMORY` and `SPARK_WORKER_CORES`. Leave 1-2 cores and at least 10% of RAM for other processes.
 2. `configured-spark-node -> spark-conf -> spark-defaults.conf`: Adjust the memory and core settings for the executors and driver. Each executor should have about 5 cores (if possible), and `spark.executor.cores` should divide evenly into `SPARK_WORKER_CORES`. Spark will launch as many executors as `SPARK_WORKER_CORES` divided by `spark.executor.cores`. Reserve about 7-8% of `SPARK_WORKER_MEMORY` for overhead when setting `spark.executor.memory`.
-3. `build-images.sh`: Adjust the IP address for your local Docker registry. You can use a domain name if all nodes in your swarm can resolve it. This is needed as it allows all nodes in the swarm to pull the locally built Docker images.
+3. `build-images.sh`: Set the address of your local Docker registry to one that every node in your swarm can reach. You can use a domain name if all nodes in your swarm can resolve it. This is needed so that all nodes in the swarm can pull the locally built Docker images.
 4. `deploy-spark-swarm.yml`: Adjust all image names to use the local Docker registry address from the prior step. Also adjust the resource limits for each of the services. Setting a `cpus` limit here that is smaller than the number of cores on your node gives your process a fraction of each core's capacity; consider this if your swarm hosts other services or does not handle sustained 100% CPU load well (e.g., overheats). Finally, set the `replicas` count for the `spark-worker` service to at most the number of nodes in your swarm.
 
-This set up depend son have a GlusterFS volume mounted at `/mnt/gfs` on all nodes and a directory `/mnt/gfs/jupyter-notbooks` exists on it. Then, to start up the Spark cluster in your Docker swarm, `cd` into this project's directory and:
+This setup depends on having a GlusterFS volume mounted at `/mnt/gfs` on all nodes, with the following directories existing on it:
+
+* `/mnt/gfs/jupyter-notbooks`
+* `/mnt/gfs/data`
+
+Then, to start up the Spark cluster in your Docker swarm, `cd` into this project's directory and:
 ```
 ./build-images.sh
 docker stack deploy -c deploy-spark-swarm.yml spark
diff --git a/spark-on-docker-swarm/build-images.sh b/docker-swarm-spark-simple/build-images.sh
similarity index 100%
rename from spark-on-docker-swarm/build-images.sh
rename to docker-swarm-spark-simple/build-images.sh
diff --git a/spark-on-docker-swarm/configured-spark-node/Dockerfile b/docker-swarm-spark-simple/configured-spark-node/Dockerfile
similarity index 100%
rename from spark-on-docker-swarm/configured-spark-node/Dockerfile
rename to docker-swarm-spark-simple/configured-spark-node/Dockerfile
diff --git a/spark-on-docker-swarm/configured-spark-node/spark-conf/spark-defaults.conf b/docker-swarm-spark-simple/configured-spark-node/spark-conf/spark-defaults.conf
similarity index 100%
rename from spark-on-docker-swarm/configured-spark-node/spark-conf/spark-defaults.conf
rename to docker-swarm-spark-simple/configured-spark-node/spark-conf/spark-defaults.conf
diff --git a/spark-on-docker-swarm/configured-spark-node/spark-conf/spark-env.sh b/docker-swarm-spark-simple/configured-spark-node/spark-conf/spark-env.sh
similarity index 100%
rename from spark-on-docker-swarm/configured-spark-node/spark-conf/spark-env.sh
rename to docker-swarm-spark-simple/configured-spark-node/spark-conf/spark-env.sh
diff --git a/spark-on-docker-swarm/deploy-spark-swarm.yml b/docker-swarm-spark-simple/deploy-spark-swarm.yml
similarity index 100%
rename from spark-on-docker-swarm/deploy-spark-swarm.yml
rename to docker-swarm-spark-simple/deploy-spark-swarm.yml
diff --git a/spark-on-docker-swarm/spark-jupyter-notebook/Dockerfile b/docker-swarm-spark-simple/spark-jupyter-notebook/Dockerfile
similarity index 100%
rename from spark-on-docker-swarm/spark-jupyter-notebook/Dockerfile
rename to docker-swarm-spark-simple/spark-jupyter-notebook/Dockerfile
diff --git a/spark-on-docker-swarm/spark-jupyter-notebook/start-jupyter.sh b/docker-swarm-spark-simple/spark-jupyter-notebook/start-jupyter.sh
similarity index 100%
rename from spark-on-docker-swarm/spark-jupyter-notebook/start-jupyter.sh
rename to docker-swarm-spark-simple/spark-jupyter-notebook/start-jupyter.sh
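A few sketches to accompany the README steps above. For the sizing arithmetic in items 1 and 2, here is a minimal example of how the worker and executor settings relate; the 12-core, 64 GB node and every number below are hypothetical, so substitute your own hardware:

```
# spark-env.sh -- hypothetical 12-core, 64 GB worker node:
# leave 2 cores and ~12% of RAM for other processes (item 1).
export SPARK_WORKER_CORES=10
export SPARK_WORKER_MEMORY=56g

# spark-defaults.conf equivalents, shown as comments since it is a
# properties file rather than shell (item 2):
# spark.executor.cores   5     # 10 / 5 = 2 executors per worker
# spark.executor.memory  26g   # (56g - ~7-8% overhead) / 2 executors
```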
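For item 3, any registry that every node can pull from will do. A rough sketch, assuming a hypothetical registry address of `192.168.1.10:5000` and that `build-images.sh` tags the image after its directory name (`configured-spark-node`); both are assumptions, not values taken from the script:

```
# Run a throwaway registry on the manager node if you don't already have one:
docker run -d -p 5000:5000 --restart=always --name registry registry:2

# Tag and push the locally built image so every swarm node can pull it:
docker build -t 192.168.1.10:5000/configured-spark-node ./configured-spark-node
docker push 192.168.1.10:5000/configured-spark-node
```

If the registry is served over plain HTTP, each node's Docker daemon also needs the address listed under `insecure-registries` in `/etc/docker/daemon.json`.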
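Item 4's limits belong in `deploy-spark-swarm.yml`, but equivalent values can be tried out on a running stack from the CLI. This assumes the stack was deployed under the name `spark` as shown above, so the worker service is named `spark_spark-worker`:

```
# Cap each worker at two cores' worth of CPU time (fractions like 1.5 work
# too) and 4 GB of RAM:
docker service update --limit-cpu 2 --limit-memory 4GB spark_spark-worker

# Keep the worker count at or below the number of nodes in the swarm:
docker service scale spark_spark-worker=3
```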
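Finally, the GlusterFS directories have to exist before the stack is deployed. A sketch; since GlusterFS replicates the volume, the `mkdir` only needs to run on one node:

```
# Create the directories the stack expects on the shared volume:
mkdir -p /mnt/gfs/jupyter-notbooks /mnt/gfs/data

# After deploying, confirm every service reports its expected replica count:
docker stack services spark
```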