Unexpected Behaviour shown in PUB-SUB pattern #911

Open
@rajputChaturya

I am using the pub-sub pattern to fan out a stream from 1 publisher to 25 subscribers.

The payload size of a single message is 256 bytes.
We publish 40k messages/sec (as a batch of 4k messages every 100 ms).
The publisher uses 5 IO threads.
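
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch of the load they imply. It assumes the publisher sends a separate copy of every message to each subscriber and counts payload bytes only (no framing or transport overhead); the function and key names are made up for illustration.

```python
# Figures taken from the description above.
PAYLOAD_BYTES = 256
MESSAGES_PER_SEC = 40_000
BATCH_INTERVAL_MS = 100
SUBSCRIBERS = 25


def load_estimate(payload_bytes, messages_per_sec, batch_interval_ms, subscribers):
    """Estimate payload-only load (hypothetical helper, not part of any library)."""
    batches_per_sec = 1000 // batch_interval_ms
    messages_per_batch = messages_per_sec // batches_per_sec
    # Bytes per second delivered to a single subscriber.
    per_subscriber = payload_bytes * messages_per_sec
    # Assumption: the publisher sends one copy per subscriber.
    total = per_subscriber * subscribers
    return {
        "messages_per_batch": messages_per_batch,
        "per_subscriber_bytes_per_sec": per_subscriber,
        "total_bytes_per_sec": total,
        "total_gbit_per_sec": total * 8 / 1e9,
    }


estimate = load_estimate(PAYLOAD_BYTES, MESSAGES_PER_SEC, BATCH_INTERVAL_MS, SUBSCRIBERS)
for key, value in estimate.items():
    print(f"{key}: {value}")
```

Under these assumptions the publisher pushes about 10.24 MB/s of payload to each subscriber and about 256 MB/s (roughly 2 Gbit/s) in total.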

I ran the publisher on a 32-core, 126 GB RAM Azure VM.
Each of the 25 subscribers ran on a 4-core, 16 GB RAM Azure VM.

The test ran for 12 hours; the results are shown in the table below.
The percentile columns (p50 through p999) are latencies in milliseconds.
mean_rate is the mean number of messages received per second on the subscriber side.
name is the name of the VM on which the subscriber was running.
count is the total number of messages sent to the subscriber in 12 hours.
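
For reference, a minimal sketch of how columns like these could be derived from raw per-message latencies. This is not the exact code used for the test: the nearest-rank percentile method and the helper names `percentile` and `summarize` are assumptions, and each latency is assumed to be receive time minus send time in milliseconds, which only makes sense if the publisher and subscriber clocks are synchronised.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list, with 0 < p <= 100."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms, duration_sec):
    """Summarise latencies (ms) collected over duration_sec seconds."""
    ordered = sorted(latencies_ms)
    summary = {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": sum(ordered) / len(ordered),
        "count": len(ordered),
        # Mean number of messages received per second.
        "mean_rate": len(ordered) / duration_sec,
    }
    for name, p in [("p50", 50), ("p75", 75), ("p95", 95),
                    ("p98", 98), ("p99", 99), ("p999", 99.9)]:
        summary[name] = percentile(ordered, p)
    return summary


# Tiny made-up sample, only to show the shape of the output.
sample = summarize([1, 1, 2, 3, 10], duration_sec=0.5)
print(sample)
```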

   name  min  max      mean  p50   p75   p95   p98   p99  p999       count     mean_rate
0   vm.27   -1    0 -0.792223 -1.0  -1.0   0.0   0.0   0.0   0.0  1729096000  39999.223763
1   vm.29   -1   30 -0.206549  0.0   0.0   0.0   0.0   0.0  24.0  1729326000  40000.012745
2   vm.28   -1    1 -0.221198  0.0   0.0   0.0   1.0   1.0   1.0  1729312000  39999.594228
3   vm.30   -1    2  0.187631  0.0   0.0   1.0   1.0   1.0   2.0  1729322000  39999.994097
4    vm.9    0    4  0.560987  1.0   1.0   1.0   1.0   1.0   4.0  1727722675  39999.800001
5   vm.18    0    2  0.568149  1.0   1.0   1.0   1.0   1.0   1.0  1728531783  40000.006029
6   vm.17    0   11  0.587446  1.0   1.0   1.0   1.0   1.0   5.0  1728532568  40000.017977
7    vm.8   -1   15  0.348731  0.0   1.0   1.0   1.0   2.0  12.0  1727736000  39999.957408
8   vm.13    0   11  0.998037  1.0   1.0   1.0   2.0   2.0  11.0  1728132000  39999.955226
9   vm.16    0    3  0.764761  1.0   1.0   2.0   2.0   2.0   3.0  1728338585  39999.986043
10  vm.15    0   28  1.243683  1.0   1.0   2.0   2.0   2.0  20.0  1728332000  39999.965091
11  vm.14    1    3  1.393507  1.0   2.0   2.0   2.0   2.0   3.0  1728138000  40000.040671
12  vm.31   -1    4  0.294211  0.0   0.0   1.0   2.0   3.0   4.0  1729526000  39999.992003
13  vm.32   -1    5  0.425758  0.0   1.0   1.0   3.0   3.0   5.0  1729530000  40000.014967
14  vm.10    0   11  0.594815  1.0   1.0   1.0   1.0   4.0  10.0  1727932000  39999.629982
15  vm.25    0    6  0.766227  1.0   1.0   2.0   3.0   4.0   6.0  1729106000  39999.389766
16  vm.24    0   28  0.881269  1.0   1.0   2.0   2.0   6.0  28.0  1728920000  39999.583785
17  vm.12    0   19  0.741611  1.0   1.0   1.0   1.0   7.0  16.0  1728137085  39999.860449
18  vm.26    0   14  3.304725  2.0   5.0   9.0  10.0  11.0  14.0  1729144000  39999.952394
19  vm.20    0   15  4.277812  3.0   6.0  10.0  11.0  12.0  14.0  1728737203  39999.659962
20  vm.19    1   33  4.406672  3.0   6.0  10.0  11.0  15.0  33.0  1728544732  39999.944749
21  vm.11    0   35  1.825326  1.0   2.0   3.0  11.0  19.0  34.0  1727944035  40000.008251
22  vm.21    0   25  7.798344  7.0  11.0  18.0  20.0  20.0  25.0  1728747832  39999.996823
23  vm.23    1   26  7.826982  7.0  11.0  18.0  20.0  21.0  26.0  1728910503  39999.060496
24  vm.22    1   28  8.625836  8.0  11.0  19.0  21.0  21.0  28.0  1728748000  39999.967383

As the table shows, some subscribers received messages in under 5 ms while others took around 20 ms (p99 latencies).

Why are the latencies so skewed? Is there an explanation for why the pub-sub pattern behaves this way?

In theory, every subscriber should receive messages at nearly the same time.

Is it possible to avoid this behaviour and get nearly equal latencies at each subscriber?
