
Commit 6163821
rearranged directory and fixed typos
michaelkamprath committed Sep 22, 2019
1 parent 37e19e4 commit 6163821
Showing 8 changed files with 5 additions and 9 deletions.
@@ -10,10 +10,10 @@ First, edit the following items as needed for your swarm:
3. `build-images.sh`: Adjust the IP address for your local Docker registry that all nodes in your cluster can access. You can use a domain name if all nodes in your swarm can resolve it. This is needed as it allows all nodes in the swarm to pull the locally built Docker images.
4. `spark-deploy.yml`: Adjust all image names for the updated local Docker registry address you used in the prior step. Also, adjust the resource limits for each of the services. Setting a `cpus` limit here that is smaller than the number of cores on your node has the effect of giving your process a fraction of each core's capacity. You might consider doing this if your swarm hosts other services or does not handle long term 100% CPU load well (e.g., overheats). Also adjust the `replicas` count for the `spark-worker` service to be equal to the number of nodes in your swarm (or less).
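As a rough illustration of the step 4 adjustments, the `deploy:` block for the `spark-worker` service might end up looking like the sketch below. The registry address `10.0.0.10:5000`, the CPU and memory numbers, and the replica count are placeholders to adapt to your own swarm, not values taken from this repository.

```
spark-worker:
  image: 10.0.0.10:5000/configured-spark-node:latest  # placeholder local registry address
  deploy:
    replicas: 3             # match the number of worker nodes in your swarm (or fewer)
    resources:
      limits:
        cpus: "2.0"         # e.g. 2 of a 4-core node, leaving headroom for other services
        memory: 4G
```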

-This set up depends on have a GlusterFS volume mounted at `/mnt/gfs` on all nodes and the directories exist on it:
+This set up depends on have a GlusterFS volume mounted at `/mnt/gfs` on all nodes and the following directories exist on it:

-* `/mnt/gfs/jupyter-notbooks`
-* `/mnt/gfs/data`
+* `/mnt/gfs/jupyter-notbooks` - used to persist the Jupyter notebooks.
+* `/mnt/gfs/data` - This is where data to analyze with spark gets placed.
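If these directories do not exist yet, they can be created once from any node that has the GlusterFS volume mounted, since GlusterFS makes them visible to the rest of the swarm. A minimal sketch, using the `jupyter-notebooks` spelling that the updated `spark-deploy.yml` below actually binds:

```
# run on any node with the GlusterFS volume mounted at /mnt/gfs
mkdir -p /mnt/gfs/jupyter-notebooks /mnt/gfs/data
```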

Then, to start up the Spark cluster in your Docker swarm, `cd` into this project's directory and:
```
@@ -23,9 +23,5 @@
docker stack deploy -c deploy-spark-swarm.yml spark

Point your development computer's browser at `http://swarm-public-ip:7777/` to load the Jupyter notebook.
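If the notebook does not come up, the standard swarm commands will show whether the services started; `spark` here is the stack name used in the deploy command above.

```
# list the stack's services and how many replicas of each are running
docker stack services spark
# show where each task was scheduled and whether any have failed
docker stack ps spark
```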

-## TODO
-This cluster is a work in progress. Currently, the following items are missing:
-* A distributed file system, such as HDFS or QFS. Currently there is no way to ingest data into the cluster except through network transfers, such as through `curl`, set up in a Jupyter notebook.
-
## Acknowledgements
The docker configuration leverages the [`gettyimages/spark`](https://hub.docker.com/r/gettyimages/spark/) Docker image as a starting point.
@@ -2,7 +2,7 @@

set -e

-#build images
+# build images
docker build -t configured-spark-node:latest ./configured-spark-node
docker build -t spark-jupyter-notebook:latest ./spark-jupyter-notebook

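The rest of `build-images.sh` (collapsed in this diff) presumably handles the registry address from step 3: the freshly built images have to be tagged with that address and pushed so every node in the swarm can pull them. A rough sketch of the usual pattern, with `10.0.0.10:5000` as a stand-in for your registry, not the repository's exact script:

```
# illustrative only; replace 10.0.0.10:5000 with your registry's address or domain name
docker tag configured-spark-node:latest 10.0.0.10:5000/configured-spark-node:latest
docker push 10.0.0.10:5000/configured-spark-node:latest
docker tag spark-jupyter-notebook:latest 10.0.0.10:5000/spark-jupyter-notebook:latest
docker push 10.0.0.10:5000/spark-jupyter-notebook:latest
```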
@@ -83,7 +83,7 @@ services:
- 4040:4040
volumes:
- type: bind
-source: /mnt/gfs/jupyter-notbooks
+source: /mnt/gfs/jupyter-notebooks
target: /home/jupyter/notebooks
- type: bind
source: /mnt/gfs/data
