Spark Jar

xuwenyihust edited this page Jan 8, 2024 · 2 revisions

Summary

DataPulse supports submitting Spark applications packaged as jar files. The application jar must be built into a Docker image, and that image must be pushed to the Docker registry before the application is submitted.

QuickStart

Prepare Application

  • Prepare a Docker image that contains the application jar
    • Either build your own Docker image
    • Or use one of the pre-built examples in examples/
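
If you build your own image, the steps might look like the sketch below. The registry, image name, and tag are hypothetical placeholders, and the sketch assumes a Dockerfile in the current directory that copies the application jar into the image; none of these come from DataPulse itself. The `run` helper is also just an illustration: it prints each command by default so you can review it before running anything.

```shell
# Hypothetical values: replace with your own registry, image name, and tag.
REGISTRY="registry.example.com/my-team"
APP_NAME="word-count"
VERSION="0.1.0"
APP_IMAGE="${REGISTRY}/${APP_NAME}:${VERSION}"

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Assumes ./Dockerfile copies the application jar into the image.
run docker build -t "${APP_IMAGE}" .
run docker push "${APP_IMAGE}"
```

Tagging the image with the application version keeps the image reference and the version argument used at submission time easy to match up.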

Deploy Application

  • Deploy the application with the following command:
    source bin/submit_spark_app.sh \
      --version VERSION \
      --image APP_IMAGE \
      --name APP_NAME \
      --main MAIN_CLASS \
      --jar JAR_FILE \
      --input INPUT_PATH \
      --output OUTPUT_PATH
  • Arguments
    • version: The version of the application
    • image: The docker image of the application
    • name: The name of the application
    • main: The main class of the application
    • jar: The location of the application jar file in the docker image
    • input: The input path of the application under GCS
    • output: The output path of the application under GCS
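
As an illustration of how the arguments fit together, here is a sketch with every placeholder filled in. All values are hypothetical, including the main class, the jar location inside the image, and the bucket name. The `gs://` form of the input and output paths is an assumption; check the script for the path format it actually expects. The command is printed with `echo` so it can be reviewed first.

```shell
# All values below are hypothetical placeholders.
VERSION="0.1.0"
APP_IMAGE="registry.example.com/my-team/word-count:${VERSION}"
APP_NAME="word-count"
MAIN_CLASS="com.example.WordCount"
JAR_FILE="/opt/app/word-count.jar"        # assumed location of the jar inside the image
INPUT_PATH="gs://my-bucket/input/"        # assumed GCS path format
OUTPUT_PATH="gs://my-bucket/output/"      # assumed GCS path format

# Print the full command for review; remove the leading `echo` to submit.
echo source bin/submit_spark_app.sh \
  --version "${VERSION}" \
  --image "${APP_IMAGE}" \
  --name "${APP_NAME}" \
  --main "${MAIN_CLASS}" \
  --jar "${JAR_FILE}" \
  --input "${INPUT_PATH}" \
  --output "${OUTPUT_PATH}"
```

Keeping the values in variables makes it easy to reuse the same version string for both the image tag and the `--version` argument.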