This POC consists on a simple library which will provide elementary functionalities to handle your jobs management.
This management is made by delegation, using one of our partner's software :
- Knime
- Trifacta
- Dataiku
- Nifi
- AWS Glue
This project can be logically splitted into three parts :
-
The business logic (
domain) : Defines all informations required to define what a job is (Job= name, project, id, status), and
how we can manage it (JobManager). -
The implementation parts (
infra.right) : Contains all API requests and DTO to make it run with our job managers. -
The demo apps (
infra.left) : Simple apps to manipulate all available commands.
- Jobs : A Job is a specific action which will produces an output result on a given data input.
- Dataset : A dataset is used to store a specified quantity of data. A dataset is usually created from an import process,
or as the result of a given jobs's execution. (In some apps, jobs are related to datasets as their source.) - Project : A project is a group of datasets and jobs, and describes how they're related on to another, in order to produce
the solution to given problem.
Describes the matching between a given app's concepts and ours.
| Concept | Dataiku | Trifacta | Knime | Nifi | AWS Glue |
|---|---|---|---|---|---|
| Project | Project | --- | Workflow | ProcessGroup | --- |
| Dataset | Dataset | WrangledDataset | --- | --- | --- |
| Job | Job | JobGroup | Job | Processor | Job |
Describes which methods are currently available for each app.
| Functionnality | Dataiku | Trifacta | Knime | Nifi | AWS Glue |
|---|---|---|---|---|---|
| Retrieve all projects | OK | -- | OK | OK | -- |
| Retrieve all datasets for a given project | OK | OK | -- | -- | -- |
| Retrieve all jobs for a given project | OK | OK | OK | OK | OK |
| Retrieve a job with a specific ID | OK | OK | OK | OK | OK |
| Retrieve a job's current status | OK | OK | OK | OK | OK |
| Start a specific job | OK | OK | OK | OK | OK |
| Stop a given job | OK | -- | -- | OK | OK |
| Import a job | -- | -- | OK | OK | -- |
| Export a job | -- | OK | OK | OK | OK |
| Import a project | -- | -- | -- | OK | -- |
| Export a project | -- | -- | -- | OK | -- |
| Security profile | basic |
basic |
basic |
none |
none* |
*Securization base on a specific system, not by the use of an option decided by profile at runtime.
By using a correct spring profile, you can select which demonstration tool to use, for a rapid test of the functionalities :
demo: Consists of an interactive demo which commands are described below.starter: An automatic execution of all library's methods, with a simple display of the results.
Note that it will require two additionnal parameters as environment variable (PREDEFINED_PROJECTandPREDIFINED_JOB) to function.
And to select your app, you can add (only one of them) : dataiku, trifacta or knime.
To modify the profile at launch, you should use a command like :
java -Dspring.profiles.active=dataiku,demo -jar {YOUR_JAR} (In this example, we'd use the demo and dataiku profile)
In this demo, every job or project will be displayed with a specific number like : $ID : $OBJECT.
This number will define a local ID, which is required by the local demo tool to realize some of the other functionalities
At any time - projects : Displays a list of all projects registered on the selected platform. (In Trifacta, as the 'project' notion doesn't exist, only the value DEFAULT will be displayed.)
After setting the project - use $PROJECT_ID : Set which project to use for the next commands.
download $PROJECT_ID: Retrieve the selected project as a JSON file ready to be imported elsewhere.upload $PROJECT_ID: Create a new project based on the previously downloaded project. Only usable after a first download !datasets: Displays a list of all datasets.jobs: Displays a list of all jobs and store their informations in order to manipulate them later.
After setting the project and getting all jobs - status $JOB_ID : Give the current status of the selected job
start $JOB_ID: Starts the specified job.stop $JOB_ID: Stops the given job if currently running. Useless if the job's already done.export $JOB_ID: Retrieve the selected job as a JSON file ready to be imported in an other project.import $JOB_ID: Create a new job based on the previously exported job. Only usable after a first job export !
-
In IntelliJ, open
Run>Edit configurations -
Add a new Spring Boot configuration (+), and specify the main class of the app
io.saagie.poc.infra.AppKt -
(Optional, if specified at launch) Change the current app's action by changing the
Active profilesattribute. -
Update your environment varaibles as needed by your app in the
Override parametersmenu. :
-
a)
Most of the time,you'll only need to change the service's URL
(Please check thesrc/main/resources/application.ymlfor the exact syntax). -
b) Select with a profile argument (
spring.profile.active), which kind of security to use :none: No security, I hope you know what you're doing :)basic: You'll have to inform your username and password under thecommonsection of theapplication.yml
-
token: You'll have to inform your URL under thecommonsection of theapplication.yml,
and eventually your (username, password) if the token's request is secured with basic auth.
- Run it through your IDE.
Build and run
Build the app by using mvn clean package, and then edit the provided utility script start.sh, by adding/updating, all environment variables to overload before executing the jar.
Then, you'll be able to launch the app by using ./start.sh $PATH_TO_YOUR_JAR.