Generates one or more Spark projects from existing or customized templates, driven by a YAML configuration file.
The advantages of this application are:
- Creates Spark applications with an sbt build, a Maven build, or both.
- Through a single configuration file (`config.yaml` or `config_all_apps.yaml`) we can create any number of Spark applications (see the sketch after this list).
- Supports various Spark templates such as Hive, HBase, Kudu, and several file formats.
- Generates both Scala- and Java-based code.
- Generates a run script to launch the Spark application.
- Documents the deployment steps in the generated README.md file.
- Includes built-in Scala test code.
- If Kerberos, SSL, or both are enabled on your cluster, it generates applications configured accordingly.
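As a rough illustration of the single-configuration idea, one file could declare several applications at once. The key layout below is a minimal sketch assumed for illustration only; the authoritative schema is the bundled `src/main/resources/config_all_apps.yaml`:

```yaml
# Hypothetical excerpt -- the real schema lives in src/main/resources/config_all_apps.yaml.
projectDetails:
  - projectName: SparkHiveIntegration     # would generate an app like spark-hive-integration
    templateName: HIVE
    description: Spark Hive Integration
  - projectName: SparkKafkaIntegration    # would generate an app like spark-kafka-integration
    templateName: KAFKA
    description: Spark Kafka Integration
```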
The following Spark templates are supported:
Template Name | Template Description | Scala Code | Java Code | Python Code | Test Code | Sample Code Link |
---|---|---|---|---|---|---|
DEFAULT | Spark Hello World Integration | ✓ | ✓ | ⤫ | ✓ | Code |
HBASE | Spark HBase Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
HIVE | Spark Hive Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
KAFKA | Spark Kafka Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
PHOENIX | Spark Phoenix Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
KUDU | Spark Kudu Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
HWC | Spark Hive Warehouse Connector Integration | ✓ | ✓ | ⤫ | ⤫ | Code |
ORC | Spark ORC File Format Integration | ✓ | ✓ | ⤫ | ⤫ | Code |
AVRO | Spark Avro File Format Integration | ✓ | ✓ | ⤫ | ⤫ | Code |
PARQUET | Spark Parquet File Format Integration | ✓ | ✓ | ⤫ | ⤫ | Code |
S3 | Spark AWS S3 Storage Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
GCS | Spark Google Cloud Storage Integration | ✓ | ✓ | ⤫ | ⤫ | Code |
CASSANDRA | Spark Cassandra Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
DELTA | Spark Delta Lake Integration | ✓ | ⤫ | ⤫ | ⤫ | Code |
git clone https://github.com/rangareddy/spark_project_template_generator.git
cd spark_project_template_generator
Update the Spark application details in the `config.yaml` or `config_all_apps.yaml` file to create Spark project(s).
Note:
- Using the `config.yaml` configuration file creates a single project by default.
- Using the `config_all_apps.yaml` configuration file creates multiple projects by default.
Open the configuration file and update it to match your cluster, for example the Java, Spark, and Scala versions.
Single-project template configuration file:
vi src/main/resources/config.yaml
Multi-project template configuration file:
vi src/main/resources/config_all_apps.yaml
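For example, the cluster-specific properties might be set as follows; the values shown are simply the documented defaults from the table below:

```yaml
# Cluster-specific settings -- adjust to match your environment.
javaVersion: 1.8
scalaVersion: 2.12.10
sbtVersion: 0.13.17
secureCluster: false   # set to true on a Kerberized cluster
sslCluster: false      # set to true if SSL is enabled on the cluster
```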
Property Name | Property Description | Default Value |
---|---|---|
baseProjectDir | Base Project Template Directory | User Home Directory - System.getProperty("user.home") |
basePackageName | Base Package Name for your project | com.ranga |
baseDeployJarPath | Base deploy path used to deploy the application on the cluster | /apps/spark/ |
buildTools | Supported Build tools: maven, sbt | maven |
jarVersion | Jar Version for your project | 1.0.0-SNAPSHOT |
scalaVersion | Scala Version for your project | 2.12.10 |
javaVersion | Java Version for your project | 1.8 |
sbtVersion | SBT Build tool Version for your project | 0.13.17 |
scope | Global dependency scope for the Spark jars | compile |
secureCluster | Set to true if your cluster is Kerberized | false |
sslCluster | Set to true if SSL is enabled on your cluster | false |
author | Author name | Ranga Reddy |
authorEmail | Author email | |
projectDetails | Project details such as projectName, templateName, and description | |
componentVersions | Component name, version, and scope for each dependency; if no scope is specified, the global scope is used | |
templates | The jar files required by each template | |
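Putting several of these properties together, a configuration fragment might look roughly like the sketch below. The nesting and the component version shown are illustrative assumptions, not the exact schema; consult the bundled configuration files for the authoritative layout:

```yaml
# Illustrative fragment only -- key nesting and versions are assumptions.
author: Ranga Reddy
basePackageName: com.ranga
buildTools: maven               # maven, sbt, or both
scope: compile                  # global scope, overridable per component
projectDetails:
  - projectName: SparkHiveIntegration
    templateName: HIVE
    description: Spark Hive Integration
componentVersions:
  spark:
    version: 2.4.0              # hypothetical component version
    scope: provided             # overrides the global 'scope' property
templates:
  HIVE:
    - spark-hive                # jar(s) required by the HIVE template
```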
Note: Update the configuration file carefully; otherwise you will run into configuration errors.
Build the project:
$ mvn clean package
Creating a single project using `config.yaml`:
$ java -jar target/spark-project-template-generator-1.0.0-SNAPSHOT.jar
or
$ java -jar target/spark-project-template-generator-1.0.0-SNAPSHOT.jar src/main/resources/config.yaml
Creating multiple projects using `src/main/resources/config_all_apps.yaml`:
$ java -jar target/spark-project-template-generator-1.0.0-SNAPSHOT.jar src/main/resources/config_all_apps.yaml
Sample output:
Application <spark-hello-world-integration> created successfully.
Application <spark-hive-integration> created successfully.
Application <spark-hbase-integration> created successfully.
Application <spark-hwc-integration> created successfully.
Application <spark-kafka-integration> created successfully.
Application <spark-phoenix-integration> created successfully.
Application <spark-kudu-integration> created successfully.
Application <spark-orc-integration> created successfully.
Application <spark-avro-integration> created successfully.
Application <spark-parquet-integration> created successfully.
Application <spark-cassandra-integration> created successfully.
Application <spark-s3-integration> created successfully.
Application <spark-gcs-integration> created successfully.
Application <spark-delta-lake-integration> created successfully.
Using this application, I have created most of the Spark applications in the following GitHub repository:
https://github.com/rangareddy/ranga_spark_experiments
Pull requests to keep this project up to date are welcome.