How Spark Submit works

Kevin Wallimann edited this page Apr 13, 2022 · 2 revisions

The following explanation mainly covers how YARN jobs are submitted in cluster mode. A spark-submit call ultimately starts a JVM that runs org.apache.spark.deploy.SparkSubmit, for example:

/usr/lib/jvm/java-1.8.0-openjdk/jre/bin/java -cp /opt/spark/conf/:/opt/spark/jars/*:/opt/hadoop/ \
-Dscala.usejavacp=true org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode cluster \
--conf spark.executor.memory=1g --conf spark.driver.memory=1g \
--class za.co.absa.hyperdrive.driver.drivers.CommandLineIngestionDriver --name Hyperdrive \
--jars spark-jobs-current.jar hyperdrive-release-latest.jar arg1 arg2
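
To make the structure of this command easier to see, here is a minimal sketch that rebuilds it from its parts and only prints it, without running anything. The helper name build_cmd and the variable names are hypothetical and were chosen for illustration. The guess that /opt/hadoop/ is the Hadoop configuration directory is an assumption too. Everything after org.apache.spark.deploy.SparkSubmit is the argument list one would normally pass to spark-submit itself: --jars takes a comma-separated list of additional jars, the first positional argument (hyperdrive-release-latest.jar) is the application jar, and arg1 arg2 are handed to the main class.

```shell
#!/bin/sh
# Sketch: rebuild the launch command from its parts and print it (nothing is executed).

# Paths copied from the command above; adjust for your installation.
JAVA_BIN="/usr/lib/jvm/java-1.8.0-openjdk/jre/bin/java"
SPARK_DIR="/opt/spark"
# Third classpath entry; presumably the Hadoop configuration directory (assumption).
EXTRA_CP_DIR="/opt/hadoop/"

# Quoted so the shell leaves "jars/*" unexpanded; the JVM expands the
# classpath wildcard to all jar files in that directory itself.
CP_ARG="$SPARK_DIR/conf/:$SPARK_DIR/jars/*:$EXTRA_CP_DIR"

# build_cmd (hypothetical helper): prefixes the spark-submit arguments
# with the JVM launcher part and prints the result as a single line.
build_cmd() {
  printf '%s' "$JAVA_BIN"
  printf ' %s' -cp "$CP_ARG" -Dscala.usejavacp=true \
    org.apache.spark.deploy.SparkSubmit "$@"
  printf '\n'
}

build_cmd \
  --master yarn --deploy-mode cluster \
  --conf spark.executor.memory=1g --conf spark.driver.memory=1g \
  --class za.co.absa.hyperdrive.driver.drivers.CommandLineIngestionDriver \
  --name Hyperdrive \
  --jars spark-jobs-current.jar \
  hyperdrive-release-latest.jar arg1 arg2
```

On a machine with a full Spark installation, setting the environment variable SPARK_PRINT_LAUNCH_COMMAND=1 before calling spark-submit should print the actual launch command, which can be compared against the output of this sketch.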

Questions

  • Is it really necessary for spark-submit to have the full $SPARK_HOME or are only specific files required?