|
| 1 | +# General Configuration |
| 2 | + |
| 3 | +## Core Concepts |
| 4 | + |
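| 5 | +* Row is one logical record in seatunnel and the basic unit of data processing. When a Filter processes data, every record is mapped to a Row. |
| 6 | + |
| 7 | +* Field is a field of a Row. A Row can contain nested fields. |
| 8 | + |
| 9 | +* raw_message refers to the `raw_message` field of a Row, which holds the data read from the input. |
| 10 | + |
| 11 | +* `__root__` refers to the same level as the top-level fields of a Row. It is commonly used to specify that new fields generated during processing are stored at the top level of the Row (top level field). |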
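The relationship between a Row, its nested fields, and `__root__` can be sketched in Python. This is only an illustration under stated assumptions: the dict-based Row, the `place_fields` helper, and the `info` field name are hypothetical and are not part of seatunnel.

```python
# Hypothetical model: a Row is represented as a nested dict.
# This is NOT seatunnel's actual Row type, only a sketch of the concept.
ROOT = "__root__"


def place_fields(row, new_fields, target=ROOT):
    """Return a copy of row with new_fields stored either at the top level
    (target == "__root__") or nested under the field named by target."""
    result = dict(row)
    if target == ROOT:
        # New fields become siblings of the existing top-level fields.
        result.update(new_fields)
    else:
        # New fields are grouped under one nested field.
        nested = dict(result.get(target, {}))
        nested.update(new_fields)
        result[target] = nested
    return result


row = {"raw_message": "Hello World, InterestingLab"}
parsed = {"msg": "Hello World", "name": "InterestingLab"}

at_root = place_fields(row, parsed)                    # stored at the top level
nested = place_fields(row, parsed, target="info")      # stored under "info"
print(at_root)
print(nested)
```

With `__root__` as the target, `msg` and `name` sit next to `raw_message`; with any other target they end up one level deeper.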
| 12 | + |
| 13 | + |
| 14 | +--- |
| 15 | + |
| 16 | +## Configuration File |
| 17 | + |
| 18 | +A complete seatunnel configuration contains four sections, `spark`, `input`, `filter`, and `output`, that is: |
| 19 | + |
| 20 | +``` |
| 21 | +spark { |
| 22 | + ... |
| 23 | +} |
| 24 | + |
| 25 | +input { |
| 26 | + ... |
| 27 | +} |
| 28 | + |
| 29 | +filter { |
| 30 | + ... |
| 31 | +} |
| 32 | + |
| 33 | +output { |
| 34 | + ... |
| 35 | +} |
| 36 | + |
| 37 | +``` |
| 38 | + |
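| 39 | +* `spark` holds the Spark-related configuration. |
| 40 | + |
| 41 | +For the configurable Spark parameters, see |
| 42 | +[Spark Configuration](https://spark.apache.org/docs/latest/configuration.html#available-properties). |
| 43 | +The two parameters master and deploy-mode cannot be set here; they must be specified in the seatunnel startup script. |
| 44 | + |
| 45 | +* `input` can contain any input plugins and their parameters; the specific parameters vary from one input plugin to another. |
| 46 | + |
| 47 | +* `filter` can contain any filter plugins and their parameters; the specific parameters vary from one filter plugin to another. |
| 48 | + |
| 49 | +The plugins in `filter` form a data processing pipeline in the order in which they are configured: the output of one filter is the input of the next. |
| 50 | + |
| 51 | +* `output` can contain any output plugins and their parameters; the specific parameters vary from one output plugin to another. |
| 52 | + |
| 53 | +Data that has been processed by `filter` is sent to every plugin configured in `output`. |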
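The data flow described above (filters applied in configured order, then the result delivered to every output) can be sketched as follows. This is a minimal illustration, not seatunnel's implementation: `run_pipeline`, `add_length`, `flag_long`, and the list-based outputs are all hypothetical names introduced for this sketch.

```python
def run_pipeline(rows, filters, outputs):
    """Apply each filter in configured order, then hand the final rows
    to every output. All names here are hypothetical."""
    for apply_filter in filters:
        # The output of one filter is the input of the next.
        rows = [apply_filter(row) for row in rows]
    for write in outputs:
        # Every configured output receives the same processed rows.
        write(rows)
    return rows


def add_length(row):
    # Hypothetical filter: adds the length of raw_message.
    return {**row, "length": len(row["raw_message"])}


def flag_long(row):
    # Hypothetical filter: depends on the field produced by add_length,
    # so it only works when configured after it.
    return {**row, "is_long": row["length"] > 10}


collected_a, collected_b = [], []  # stand-ins for two output plugins
result = run_pipeline(
    [{"raw_message": "Hello World, InterestingLab"}],
    filters=[add_length, flag_long],
    outputs=[collected_a.extend, collected_b.extend],
)
print(result)
```

Swapping the two filters would raise a `KeyError`, which is why the order of plugins inside `filter` matters.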
| 54 | + |
| 55 | + |
| 56 | +--- |
| 57 | + |
| 57 | +## Configuration File Example |
| 58 | + |
| 59 | +An example follows: |
| 60 | + |
| 61 | +> In the configuration, lines starting with `#` are comments. |
| 62 | + |
| 64 | +``` |
| 65 | +spark { |
| 66 | + # You can set spark configuration here |
| 67 | + # seatunnel defined streaming batch duration in seconds |
| 68 | + spark.streaming.batchDuration = 5 |
| 69 | + |
| 70 | + # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties |
| 71 | + spark.app.name = "seatunnel" |
| 72 | + spark.executor.instances = 2 |
| 73 | + spark.executor.cores = 1 |
| 74 | + spark.executor.memory = "1g" |
| 75 | +} |
| 76 | + |
| 77 | +input { |
| 78 | + # This is an example input plugin, **only for testing and demonstrating the input plugin feature** |
| 79 | + fakestream { |
| 80 | + content = ["Hello World, InterestingLab"] |
| 81 | + rate = 1 |
| 82 | + } |
| 83 | + |
| 84 | + |
| 85 | + # If you would like to get more information about how to configure seatunnel and see full list of input plugins, |
| 86 | + # please go to https://interestinglab.github.io/seatunnel-docs/#/zh-cn/v1/configuration/base |
| 87 | +} |
| 88 | + |
| 89 | +filter { |
| 90 | + split { |
| 91 | + fields = ["msg", "name"] |
| 92 | + delimiter = "," |
| 93 | + } |
| 94 | + |
| 95 | + # If you would like to get more information about how to configure seatunnel and see full list of filter plugins, |
| 96 | + # please go to https://interestinglab.github.io/seatunnel-docs/#/zh-cn/v1/configuration/base |
| 97 | +} |
| 98 | + |
| 99 | +output { |
| 100 | + stdout {} |
| 101 | + |
| 102 | + |
| 103 | + # If you would like to get more information about how to configure seatunnel and see full list of output plugins, |
| 104 | + # please go to https://interestinglab.github.io/seatunnel-docs/#/zh-cn/v1/configuration/base |
| 105 | +} |
| 106 | +``` |
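To make the `split` step of the example concrete, here is a rough Python sketch of what splitting the sample content on `,` into `msg` and `name` could look like. It assumes that the filter reads from `raw_message` and writes the new fields at the top level of the Row; `split_row` is a hypothetical helper, and the real plugin may differ (for example in whether it trims whitespace).

```python
def split_row(row, fields, delimiter, source_field="raw_message"):
    """Hypothetical stand-in for the split filter: cut the source field on
    the delimiter and assign the parts to the given field names."""
    parts = row[source_field].split(delimiter)
    # zip pairs each field name with one part; extra parts are ignored.
    return {**row, **dict(zip(fields, parts))}


row = {"raw_message": "Hello World, InterestingLab"}
out = split_row(row, fields=["msg", "name"], delimiter=",")
print(out)
```

In this sketch `name` keeps the leading space that follows the comma, because nothing is trimmed.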
| 107 | + |
| 108 | +For other configurations, see: |
| 109 | + |
| 110 | +[Configuration example 1: Streaming computation](https://github.com/InterestingLab/seatunnel/blob/master/config/streaming.conf.template) |
| 111 | + |
| 112 | +[Configuration example 2: Batch offline processing](https://github.com/InterestingLab/seatunnel/blob/master/config/batch.conf.template) |
| 113 | + |
| 114 | +[Configuration example 3: A flexible processing flow with multiple data streams](https://github.com/InterestingLab/seatunnel/blob/master/config/complex.conf.template) |