[DSIP-79][Task] Add datavines task for data quality #16863

Open
wants to merge 10 commits into
base: dev
30 changes: 30 additions & 0 deletions docs/docs/en/guide/task/datavines.md
@@ -0,0 +1,30 @@
# Datavines

## Overview

Use `Datavines Task` to create a Datavines-type task that runs a data quality job in Datavines. When the worker executes a `Datavines Task`,
it calls the `Datavines API` to trigger the Datavines job. Click [here](https://datavane.github.io/datavines-website/) for details about `Datavines`.
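
The trigger step described above could look roughly like the following. This is a minimal sketch, not the plugin's actual code: it uses the JDK's `java.net.http` request builder (Java 11+) instead of the HTTP library the plugin depends on, and the endpoint path `/api/v1/job/execute/`, the `Authorization: Bearer` header scheme, and the class and method names are all assumptions made for illustration. Check the Datavines API documentation for the real route and authentication scheme.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class DatavinesTriggerSketch {

    // Hypothetical endpoint path, assumed for illustration only.
    static final String HYPOTHETICAL_EXECUTE_PATH = "/api/v1/job/execute/";

    // Builds (but does not send) a POST request that would trigger the given job.
    static HttpRequest buildTriggerRequest(String address, String jobId, String token) {
        // Drop a single trailing slash so the joined URL has no double slash.
        String base = address.endsWith("/")
                ? address.substring(0, address.length() - 1)
                : address;

        HttpRequest.Builder builder = HttpRequest
                .newBuilder(URI.create(base + HYPOTHETICAL_EXECUTE_PATH + jobId))
                .POST(HttpRequest.BodyPublishers.noBody());

        // Assumed header scheme: attach the token only when one is configured.
        if (token != null && !token.isEmpty()) {
            builder.header("Authorization", "Bearer " + token);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildTriggerRequest("http://localhost:5600/", "42", "my-token");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Building the request separately from sending it keeps the URL and header logic easy to check without a running Datavines service.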

## Create Task

- Click Project Management-Project Name-Workflow Definition, and click the "Create Workflow" button to enter the DAG editing page.
- Drag <img src="../../../../img/tasks/icons/datavines.png" width="15"/> from the toolbar to the canvas.

## Task Parameter

- Please refer to [DolphinScheduler Task Parameters Appendix](appendix.md) `Default Task Parameters` section for default parameters.

| **Parameter** | **Description** |
|-------------------|-------------------------------------------------------------------------------------------------------|
| Datavines Address | The URL of the Datavines service, e.g., `http://localhost:5600`. |
| Datavines Job ID | The unique ID of the Datavines job to trigger. |
| Datavines token | The access token for the Datavines service. It can be obtained through token management on the Datavines service. |
| Block on Failure | When turned on, the task result is set to failed if the data quality check fails. |
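
The effect of `Block on Failure` can be pictured as a small decision function. The sketch below is illustrative only: the class, enum, and method names are hypothetical, and it assumes that with the switch turned off a failed check still leaves the task result as successful, which the table implies but does not state.

```java
final class BlockOnFailureSketch {

    enum TaskResult { SUCCESS, FAILURE }

    // checkPassed: outcome of the data quality check reported by Datavines.
    // failureBlock: value of the "Block on Failure" switch.
    static TaskResult resolve(boolean checkPassed, boolean failureBlock) {
        if (!checkPassed && failureBlock) {
            return TaskResult.FAILURE;
        }
        // Assumption: a failed check is tolerated when the switch is off.
        return TaskResult.SUCCESS;
    }

    public static void main(String[] args) {
        System.out.println(resolve(false, true));  // prints FAILURE
        System.out.println(resolve(false, false)); // prints SUCCESS
        System.out.println(resolve(true, true));   // prints SUCCESS
    }
}
```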

## Task Example

This example illustrates how to create a datavines task node.

![demo-datavines](../../../../img/tasks/demo/datavines_task.png)

![demo-get-datavines-job-id](../../../../img/tasks/demo/datavines_job_id.png)
31 changes: 31 additions & 0 deletions docs/docs/zh/guide/task/datavines.md
@@ -0,0 +1,31 @@
# Datavines

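## Overview

The `Datavines` task type creates and runs a `Datavines`-type task that executes a data quality check job in Datavines. When the worker executes the task, it triggers the `Datavines job` through the `Datavines API`.
Click [here](https://datavane.github.io/datavines-website/) for more information about `Datavines`.

## Create Task

- Click Project Management-Project Name-Workflow Definition, then click the "Create Workflow" button to enter the DAG editing page.
- Drag <img src="../../../../img/tasks/icons/datavines.png" width="15"/> from the toolbar onto the canvas to create the task.

## Task Parameters

- For default parameters, refer to the `Default Task Parameters` section of the [DolphinScheduler Task Parameters Appendix](appendix.md).

| **Parameter** | **Description** |
|-----------------|------------------------------------------------------|
| Datavines Address | The URL of the Datavines service, e.g., `http://localhost:5600`. |
| Datavines Job ID | The unique ID of the Datavines job. |
| Datavines token | The access token for the Datavines service, available from token management on the Datavines service. |
| Block on Check Failure | When turned on, the task result is set to failed if the data quality check result is a failure. |

## Example

This example shows how to create a Datavines task node: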

![demo-datavines](../../../../img/tasks/demo/datavines_task.png)

![demo-get-datavines-job-id](../../../../img/tasks/demo/datavines_job_id.png)

Binary file added docs/img/tasks/demo/datavines_job_id.png
Binary file added docs/img/tasks/demo/datavines_task.png
Binary file added docs/img/tasks/icons/datavines.png
@@ -46,6 +46,8 @@ task:
    - 'SEATUNNEL'
    - 'DATAX'
    - 'SQOOP'
  dataQuality:
    - 'DATAVINES'
  machineLearning:
    - 'JUPYTER'
    - 'MLFLOW'
@@ -148,6 +148,12 @@
            <version>${project.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.dolphinscheduler</groupId>
            <artifactId>dolphinscheduler-task-datavines</artifactId>
            <version>${project.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.dolphinscheduler</groupId>
            <artifactId>dolphinscheduler-task-java</artifactId>
@@ -0,0 +1,72 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.apache.dolphinscheduler</groupId>
        <artifactId>dolphinscheduler-task-plugin</artifactId>
        <version>dev-SNAPSHOT</version>
    </parent>

    <artifactId>dolphinscheduler-task-datavines</artifactId>
    <packaging>jar</packaging>

    <properties>
        <plugin.name>task.datavines</plugin.name>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.dolphinscheduler</groupId>
            <artifactId>dolphinscheduler-task-api</artifactId>
            <version>${project.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.dolphinscheduler</groupId>
            <artifactId>dolphinscheduler-common</artifactId>
            <version>${project.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpcore</artifactId>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <phase>package</phase>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>
@@ -0,0 +1,52 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.dolphinscheduler.plugin.task.datavines;

import org.apache.dolphinscheduler.plugin.task.api.model.ResourceInfo;
import org.apache.dolphinscheduler.plugin.task.api.parameters.AbstractParameters;

import org.apache.commons.lang3.StringUtils;

import java.util.Collections;
import java.util.List;

import lombok.Data;
import lombok.EqualsAndHashCode;

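@EqualsAndHashCode(callSuper = true)
@Data
public class DatavinesParameters extends AbstractParameters {

    // Base URL of the Datavines service, e.g. http://localhost:5600
    private String address;

    // ID of the Datavines job to trigger
    private String jobId;

    // Access token for the Datavines service
    private String token;

    // When true, a failed data quality check marks the task as failed
    private boolean failureBlock;

    // Only address and jobId are required; token is not validated here
    @Override
    public boolean checkParameters() {
        return StringUtils.isNotEmpty(this.address) && StringUtils.isNotEmpty(this.jobId);
    }

    // This task type uses no resource files
    @Override
    public List<ResourceInfo> getResourceFilesList() {
        return Collections.emptyList();
    }
}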
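
To see what `checkParameters()` accepts, here is a self-contained stand-in that mirrors the validation rule of `DatavinesParameters` without Lombok, commons-lang3, or the DolphinScheduler base class. The class name `DatavinesParametersSketch` and the hand-written setters are hypothetical; only the rule itself (non-empty `address` and `jobId`, nothing else checked) is taken from the class in this PR.

```java
final class DatavinesParametersSketch {

    private String address;
    private String jobId;
    private String token;
    private boolean failureBlock;

    void setAddress(String address) { this.address = address; }
    void setJobId(String jobId) { this.jobId = jobId; }
    void setToken(String token) { this.token = token; }
    void setFailureBlock(boolean failureBlock) { this.failureBlock = failureBlock; }

    // Same rule as the PR's checkParameters(): address and jobId must be non-empty.
    boolean checkParameters() {
        return isNotEmpty(address) && isNotEmpty(jobId);
    }

    // Plain-JDK equivalent of StringUtils.isNotEmpty: not null and length > 0.
    private static boolean isNotEmpty(String value) {
        return value != null && !value.isEmpty();
    }

    public static void main(String[] args) {
        DatavinesParametersSketch params = new DatavinesParametersSketch();
        params.setAddress("http://localhost:5600");
        System.out.println(params.checkParameters()); // prints false: jobId is missing

        params.setJobId("1");
        System.out.println(params.checkParameters()); // prints true: token is not required
    }
}
```

Note that a missing token passes validation under this rule, so an authentication problem would only surface when the Datavines service is actually called.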