This PR adds support for streaming changes from a table in parallel
using multiple tasks, provided the user supplies the hash_code ranges
each task should stream. The following changes have been introduced in this PR:
1. New configurations:
a. `streaming.mode`: This value takes `default` or `parallel` as input
and decides whether or not parallel streaming mode is supposed to be
used.
b. `slot.names`: A comma-separated list of slot names, one for each
task.
c. `publication.names`: A comma-separated list of publication names,
one for each task.
d. `slot.ranges`: A list of **semi-colon** separated values for slot
ranges in the format `a,b;b,c;c,d`.
2. Validations in the class `YBValidate` have been introduced:
a. To validate that the complete hash range is provided by the user and
nothing is missing.
b. To validate that the number of slot names is equal to the number of
publication names as well as the number of slot ranges.
c. To ensure that there's only one table provided in the
`table.include.list` as parallel streaming will not work with multiple
tables.
3. Support for snapshot when `streaming.mode` is `parallel`.
a. This requires providing the hash part of the primary key columns
in the configuration parameter `primary.key.hash.columns`.
4. The `PostgresPartition` object will now also use the slot name to
uniquely identify the source partition.
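
The three validations listed under item 2 can be sketched as follows. This is an illustrative Python sketch only, not the `YBValidate` implementation: the function names, the error messages, the table name `public.orders`, the bounds `HASH_MIN`/`HASH_MAX` (taken from the usage example's `0` and `65536`), and the decision to sort the ranges before checking coverage are all assumptions.

```python
from typing import Dict, List, Tuple

# Assumed bounds of the full hash range, inferred from the usage example.
HASH_MIN = 0
HASH_MAX = 65536


def parse_ranges(slot_ranges: str) -> List[Tuple[int, int]]:
    """Parse 'a,b;b,c' into [(a, b), (b, c)]."""
    ranges = []
    for part in slot_ranges.split(";"):
        start, end = part.split(",")
        ranges.append((int(start), int(end)))
    return ranges


def validate_parallel_config(config: Dict[str, str]) -> None:
    """Raise ValueError if the parallel streaming config is inconsistent."""
    slots = config["slot.names"].split(",")
    publications = config["publication.names"].split(",")
    ranges = parse_ranges(config["slot.ranges"])

    # 2b: slot names, publication names and slot ranges must have equal counts.
    if not (len(slots) == len(publications) == len(ranges)):
        raise ValueError("slot.names, publication.names and slot.ranges differ in length")

    # 2c: parallel streaming works with exactly one table.
    if len(config["table.include.list"].split(",")) != 1:
        raise ValueError("parallel streaming requires exactly one table")

    # 2a: the ranges must cover the full hash range with no gap or overlap.
    # Sorting first is an assumption of this sketch.
    expected_start = HASH_MIN
    for start, end in sorted(ranges):
        if start != expected_start or end <= start:
            raise ValueError("hash ranges have a gap, an overlap or an empty range")
        expected_start = end
    if expected_start != HASH_MAX:
        raise ValueError("hash ranges do not reach the end of the hash space")


config = {
    "streaming.mode": "parallel",
    "slot.names": "rs1,rs2",
    "publication.names": "pb1,pb2",
    "slot.ranges": "0,32768;32768,65536",
    "table.include.list": "public.orders",  # hypothetical table name
}
validate_parallel_config(config)  # passes silently for a complete configuration
```

A configuration such as `"slot.ranges":"0,32768;40000,65536"` would be rejected by this sketch because the range `32768,40000` is missing.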
### Usage example
If the connector configuration contains the following properties:
```
{
...
"streaming.mode":"parallel",
"slot.names":"rs1,rs2",
"publication.names":"pb1,pb2",
"slot.ranges":"0,32768;32768,65536"
...
}
```
then we will have 2 tasks created:
1. `task 0`: `slot=rs1 publication=pb1 hash_range=0,32768`
2. `task 1`: `slot=rs2 publication=pb2 hash_range=32768,65536`
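
The positional pairing behind this task list can be sketched as below. This is a minimal Python illustration of picking the values sequentially, not the connector's actual task-assignment code; the function name `build_task_configs` and the output keys are hypothetical.

```python
from typing import Dict, List


def build_task_configs(config: Dict[str, str]) -> List[Dict[str, object]]:
    """Pair the i-th slot, publication and hash range into task i."""
    slots = config["slot.names"].split(",")
    publications = config["publication.names"].split(",")
    ranges = config["slot.ranges"].split(";")  # ranges are semi-colon separated

    tasks = []
    for index, (slot, publication, hash_range) in enumerate(
        zip(slots, publications, ranges)
    ):
        tasks.append(
            {
                "task": index,
                "slot": slot,
                "publication": publication,
                "hash_range": hash_range,
            }
        )
    return tasks


example = {
    "streaming.mode": "parallel",
    "slot.names": "rs1,rs2",
    "publication.names": "pb1,pb2",
    "slot.ranges": "0,32768;32768,65536",
}
for task in build_task_configs(example):
    print(task)
```

Because the pairing is purely positional in this sketch, reordering only one of the three lists pairs a slot with a different publication or hash range.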
### Note:
It is currently the user's responsibility to provide full hash ranges
and maintain the order given in the configs for `slot.names`,
`publication.names` and `slot.ranges` as the values will be picked
sequentially and divided into tasks. Thus, in order to ensure that the
task with a slot gets the same hash_range every time, the user needs to
be careful with the order provided.
This closes yugabyte/yugabyte-db#26107.