FakeSource connector
FakeSource is a virtual data source that randomly generates rows matching a user-defined schema. It is intended for test cases such as type conversion checks or testing new connector features.
Name | Type | Required | Default | Description |
---|---|---|---|---|
tables_configs | list | no | - | Define multiple FakeSource tables; each item can contain the whole FakeSource config described below |
schema | config | yes | - | Define Schema information |
rows | config | no | - | The list of rows output per degree of parallelism; see the "Options rows Case" section |
row.num | int | no | 5 | The total number of rows generated per degree of parallelism |
split.num | int | no | 1 | The number of splits generated by the enumerator for each degree of parallelism |
split.read-interval | long | no | 1 | The interval (in milliseconds) between two split reads in a reader |
map.size | int | no | 5 | The size of map values the connector generates |
array.size | int | no | 5 | The size of array values the connector generates |
bytes.length | int | no | 5 | The length of bytes values the connector generates |
string.length | int | no | 5 | The length of string values the connector generates |
string.fake.mode | string | no | range | The mode for generating string data. Supports range and template; the default is range. If set to template, the string.template option must also be configured |
string.template | list | no | - | The template list for the string type. If configured, the connector randomly selects an item from the list |
tinyint.fake.mode | string | no | range | The mode for generating tinyint data. Supports range and template; the default is range. If set to template, the tinyint.template option must also be configured |
tinyint.min | tinyint | no | 0 | The minimum value of tinyint data the connector generates |
tinyint.max | tinyint | no | 127 | The maximum value of tinyint data the connector generates |
tinyint.template | list | no | - | The template list for the tinyint type. If configured, the connector randomly selects an item from the list |
smallint.fake.mode | string | no | range | The mode for generating smallint data. Supports range and template; the default is range. If set to template, the smallint.template option must also be configured |
smallint.min | smallint | no | 0 | The minimum value of smallint data the connector generates |
smallint.max | smallint | no | 32767 | The maximum value of smallint data the connector generates |
smallint.template | list | no | - | The template list for the smallint type. If configured, the connector randomly selects an item from the list |
int.fake.mode | string | no | range | The mode for generating int data. Supports range and template; the default is range. If set to template, the int.template option must also be configured |
int.min | int | no | 0 | The minimum value of int data the connector generates |
int.max | int | no | 0x7fffffff | The maximum value of int data the connector generates |
int.template | list | no | - | The template list for the int type. If configured, the connector randomly selects an item from the list |
bigint.fake.mode | string | no | range | The mode for generating bigint data. Supports range and template; the default is range. If set to template, the bigint.template option must also be configured |
bigint.min | bigint | no | 0 | The minimum value of bigint data the connector generates |
bigint.max | bigint | no | 0x7fffffffffffffff | The maximum value of bigint data the connector generates |
bigint.template | list | no | - | The template list for the bigint type. If configured, the connector randomly selects an item from the list |
float.fake.mode | string | no | range | The mode for generating float data. Supports range and template; the default is range. If set to template, the float.template option must also be configured |
float.min | float | no | 0 | The minimum value of float data the connector generates |
float.max | float | no | 0x1.fffffeP+127 | The maximum value of float data the connector generates |
float.template | list | no | - | The template list for the float type. If configured, the connector randomly selects an item from the list |
double.fake.mode | string | no | range | The mode for generating double data. Supports range and template; the default is range. If set to template, the double.template option must also be configured |
double.min | double | no | 0 | The minimum value of double data the connector generates |
double.max | double | no | 0x1.fffffffffffffP+1023 | The maximum value of double data the connector generates |
double.template | list | no | - | The template list for the double type. If configured, the connector randomly selects an item from the list |
common-options | | no | - | Source plugin common parameters; see Source Common Options for details |
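As an illustration of the split options in the table above, the following is a minimal sketch of how split.num and split.read-interval might be combined with row.num. The field names and the values 2 and 500 are illustrative assumptions, not recommended settings.

```hocon
source {
  FakeSource {
    # 10 rows in total per degree of parallelism
    row.num = 10
    # Illustrative value: the enumerator generates 2 splits per degree of parallelism
    split.num = 2
    # Illustrative value: 500 ms between two split reads in a reader
    split.read-interval = 500
    schema = {
      fields {
        # Hypothetical field names
        id = bigint
        name = string
      }
    }
  }
}
```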
This example randomly generates data of the specified types:
schema = {
fields {
c_map = "map<string, array<int>>"
c_array = "array<int>"
c_string = string
c_boolean = boolean
c_tinyint = tinyint
c_smallint = smallint
c_int = int
c_bigint = bigint
c_float = float
c_double = double
c_decimal = "decimal(30, 8)"
c_null = "null"
c_bytes = bytes
c_date = date
c_timestamp = timestamp
c_row = {
c_map = "map<string, map<string, string>>"
c_array = "array<int>"
c_string = string
c_boolean = boolean
c_tinyint = tinyint
c_smallint = smallint
c_int = int
c_bigint = bigint
c_float = float
c_double = double
c_decimal = "decimal(30, 8)"
c_null = "null"
c_bytes = bytes
c_date = date
c_timestamp = timestamp
}
}
}
In this example, 16 rows matching the schema types are randomly generated:
source {
# This is an example input plugin, **only for testing and demonstrating the input plugin feature**
FakeSource {
row.num = 16
schema = {
fields {
c_map = "map<string, string>"
c_array = "array<int>"
c_string = string
c_boolean = boolean
c_tinyint = tinyint
c_smallint = smallint
c_int = int
c_bigint = bigint
c_float = float
c_double = double
c_decimal = "decimal(30, 8)"
c_null = "null"
c_bytes = bytes
c_date = date
c_timestamp = timestamp
}
}
result_table_name = "fake"
}
}
This example defines the source data explicitly: each row specifies its change kind (insert, update, or delete) and the value stored in each field.
source {
FakeSource {
schema = {
fields {
c_map = "map<string, string>"
c_array = "array<int>"
c_string = string
c_boolean = boolean
c_tinyint = tinyint
c_smallint = smallint
c_int = int
c_bigint = bigint
c_float = float
c_double = double
c_decimal = "decimal(30, 8)"
c_null = "null"
c_bytes = bytes
c_date = date
c_timestamp = timestamp
}
}
rows = [
{
kind = INSERT
fields = [{"a": "b"}, [101], "c_string", true, 117, 15987, 56387395, 7084913402530365000, 1.23, 1.23, "2924137191386439303744.39292216", null, "bWlJWmo=", "2023-04-22", "2023-04-22T23:20:58"]
}
{
kind = UPDATE_BEFORE
fields = [{"a": "c"}, [102], "c_string", true, 117, 15987, 56387395, 7084913402530365000, 1.23, 1.23, "2924137191386439303744.39292216", null, "bWlJWmo=", "2023-04-22", "2023-04-22T23:20:58"]
}
{
kind = UPDATE_AFTER
fields = [{"a": "e"}, [103], "c_string", true, 117, 15987, 56387395, 7084913402530365000, 1.23, 1.23, "2924137191386439303744.39292216", null, "bWlJWmo=", "2023-04-22", "2023-04-22T23:20:58"]
}
{
kind = DELETE
fields = [{"a": "f"}, [104], "c_string", true, 117, 15987, 56387395, 7084913402530365000, 1.23, 1.23, "2924137191386439303744.39292216", null, "bWlJWmo=", "2023-04-22", "2023-04-22T23:20:58"]
}
]
}
}
This case specifies the number of rows generated and the size or length of the generated values:
FakeSource {
row.num = 10
map.size = 10
array.size = 10
bytes.length = 10
string.length = 10
schema = {
fields {
c_map = "map<string, array<int>>"
c_array = "array<int>"
c_string = string
c_boolean = boolean
c_tinyint = tinyint
c_smallint = smallint
c_int = int
c_bigint = bigint
c_float = float
c_double = double
c_decimal = "decimal(30, 8)"
c_null = "null"
c_bytes = bytes
c_date = date
c_timestamp = timestamp
c_row = {
c_map = "map<string, map<string, string>>"
c_array = "array<int>"
c_string = string
c_boolean = boolean
c_tinyint = tinyint
c_smallint = smallint
c_int = int
c_bigint = bigint
c_float = float
c_double = double
c_decimal = "decimal(30, 8)"
c_null = "null"
c_bytes = bytes
c_date = date
c_timestamp = timestamp
}
}
}
}
Values are randomly selected from the specified templates
Using template
FakeSource {
row.num = 5
string.fake.mode = "template"
string.template = ["tyrantlucifer", "hailin", "kris", "fanjia", "zongwen", "gaojun"]
tinyint.fake.mode = "template"
tinyint.template = [1, 2, 3, 4, 5, 6, 7, 8, 9]
smallint.fake.mode = "template"
smallint.template = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
int.fake.mode = "template"
int.template = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
bigint.fake.mode = "template"
bigint.template = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
float.fake.mode = "template"
float.template = [40.0, 41.0, 42.0, 43.0]
double.fake.mode = "template"
double.template = [44.0, 45.0, 46.0, 47.0]
schema {
fields {
c_string = string
c_tinyint = tinyint
c_smallint = smallint
c_int = int
c_bigint = bigint
c_float = float
c_double = double
}
}
}
Values are randomly generated within the specified ranges:
FakeSource {
row.num = 5
string.template = ["tyrantlucifer", "hailin", "kris", "fanjia", "zongwen", "gaojun"]
tinyint.min = 1
tinyint.max = 9
smallint.min = 10
smallint.max = 19
int.min = 20
int.max = 29
bigint.min = 30
bigint.max = 39
float.min = 40.0
float.max = 43.0
double.min = 44.0
double.max = 47.0
schema {
fields {
c_string = string
c_tinyint = tinyint
c_smallint = smallint
c_int = int
c_bigint = bigint
c_float = float
c_double = double
}
}
}
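Because each type has its own fake.mode option, template and range generation can presumably be mixed within one source. The sketch below assumes this is supported: strings are drawn from a template list while int values fall back to the default range mode. The field names are hypothetical.

```hocon
FakeSource {
  row.num = 5
  # Strings are picked from the template list
  string.fake.mode = "template"
  string.template = ["tyrantlucifer", "hailin", "kris"]
  # int.fake.mode is left at its default (range), so int.min and int.max apply
  int.min = 20
  int.max = 29
  schema {
    fields {
      # Hypothetical field names
      c_string = string
      c_int = int
    }
  }
}
```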
This example generates data for multiple tables, test.table1 and test.table2:
FakeSource {
tables_configs = [
{
row.num = 16
schema {
table = "test.table1"
fields {
c_string = string
c_tinyint = tinyint
c_smallint = smallint
c_int = int
c_bigint = bigint
c_float = float
c_double = double
}
}
},
{
row.num = 17
schema {
table = "test.table2"
fields {
c_string = string
c_tinyint = tinyint
c_smallint = smallint
c_int = int
c_bigint = bigint
c_float = float
c_double = double
}
}
}
]
}
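Since each tables_configs item can contain a whole FakeSource config, generation options can in principle differ per table. The following is a sketch under that assumption, with hypothetical field names and values: the first table draws strings from a template, the second restricts its int range.

```hocon
FakeSource {
  tables_configs = [
    {
      row.num = 3
      # Applies only to this table (assumption based on the tables_configs description)
      string.fake.mode = "template"
      string.template = ["a", "b", "c"]
      schema {
        table = "test.table1"
        fields {
          c_string = string
        }
      }
    },
    {
      row.num = 5
      # Default range mode with a narrowed int range for this table
      int.min = 1
      int.max = 10
      schema {
        table = "test.table2"
        fields {
          c_int = int
        }
      }
    }
  ]
}
```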
rows = [
{
kind = INSERT
fields = [1, "A", 100]
},
{
kind = UPDATE_BEFORE
fields = [1, "A", 100]
},
{
kind = UPDATE_AFTER
fields = [1, "A_1", 100]
},
{
kind = DELETE
fields = [1, "A_1", 100]
}
]
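The rows fragment above needs a schema whose fields line up with each fields array. A minimal sketch of a complete source block follows; the field names pk_id, name, and score and their types are assumptions chosen to fit the three values in each row.

```hocon
source {
  FakeSource {
    schema = {
      fields {
        # Hypothetical names and types matching [1, "A", 100]
        pk_id = bigint
        name = string
        score = int
      }
    }
    rows = [
      {
        kind = INSERT
        fields = [1, "A", 100]
      },
      {
        kind = UPDATE_BEFORE
        fields = [1, "A", 100]
      },
      {
        kind = UPDATE_AFTER
        fields = [1, "A_1", 100]
      },
      {
        kind = DELETE
        fields = [1, "A_1", 100]
      }
    ]
  }
}
```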
FakeSource {
table-names = ["test.table1", "test.table2"]
schema = {
table = "database.schema.table"
...
}
...
}
- Add FakeSource Source Connector
- [Improve] Supports direct definition of data values(row) (2839)
- [Improve] Improve fake source connector: (2944)
  - Support user-defined map size
  - Support user-defined array size
  - Support user-defined string length
  - Support user-defined bytes length
- [Improve] Support multiple splits for fake source connector (2974)
- [Improve] Supports setting the number of splits per parallelism and the reading interval between two splits (3098)