Description
I am using the pub-sub pattern to fan out a stream to 25 subscribers, so data is transferred from 1 publisher to 25 subscribers.
The payload size of a single message is 256 bytes.
We publish 40k messages/sec (as a batch of 4k messages every 100 ms).
The publisher was configured with 5 I/O threads.
The publisher ran on a 32-core, 126 GB RAM Azure VM.
Each of the 25 subscribers ran on a 4-core, 16 GB RAM Azure VM.
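For reference, here is a minimal sketch of the publisher loop described above, assuming a ZeroMQ-style PUB socket via pyzmq (the question does not name the messaging library, so the API, port, and payload contents are my assumptions, not the actual test code):

import time
import zmq

BATCH_SIZE = 4_000          # 4k messages per batch
BATCH_INTERVAL = 0.1        # one batch every 100 ms -> 40k msg/s overall
PAYLOAD = bytes(256)        # 256-byte message body (placeholder contents)

ctx = zmq.Context(io_threads=5)   # 5 I/O threads, as in the test setup
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5556")          # placeholder port

while True:
    start = time.monotonic()
    for _ in range(BATCH_SIZE):
        pub.send(PAYLOAD)
    # Sleep out the remainder of the 100 ms window, if any time is left
    elapsed = time.monotonic() - start
    if elapsed < BATCH_INTERVAL:
        time.sleep(BATCH_INTERVAL - elapsed)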
The test was run for 12 hours and the results are shown in the table below.
The latency columns (min, max, mean, p50, p75, p95, p98, p99, p999) are in milliseconds.
mean_rate is the mean number of messages received per second at the subscriber.
name is the name of the VM on which the subscriber was running.
count is the total number of messages received by that subscriber over the 12 hours (a quick sanity check of this figure follows the table).
idx name min max mean p50 p75 p95 p98 p99 p999 count mean_rate
0 vm.27 -1 0 -0.792223 -1.0 -1.0 0.0 0.0 0.0 0.0 1729096000 39999.223763
1 vm.29 -1 30 -0.206549 0.0 0.0 0.0 0.0 0.0 24.0 1729326000 40000.012745
2 vm.28 -1 1 -0.221198 0.0 0.0 0.0 1.0 1.0 1.0 1729312000 39999.594228
3 vm.30 -1 2 0.187631 0.0 0.0 1.0 1.0 1.0 2.0 1729322000 39999.994097
4 vm.9 0 4 0.560987 1.0 1.0 1.0 1.0 1.0 4.0 1727722675 39999.800001
5 vm.18 0 2 0.568149 1.0 1.0 1.0 1.0 1.0 1.0 1728531783 40000.006029
6 vm.17 0 11 0.587446 1.0 1.0 1.0 1.0 1.0 5.0 1728532568 40000.017977
7 vm.8 -1 15 0.348731 0.0 1.0 1.0 1.0 2.0 12.0 1727736000 39999.957408
8 vm.13 0 11 0.998037 1.0 1.0 1.0 2.0 2.0 11.0 1728132000 39999.955226
9 vm.16 0 3 0.764761 1.0 1.0 2.0 2.0 2.0 3.0 1728338585 39999.986043
10 vm.15 0 28 1.243683 1.0 1.0 2.0 2.0 2.0 20.0 1728332000 39999.965091
11 vm.14 1 3 1.393507 1.0 2.0 2.0 2.0 2.0 3.0 1728138000 40000.040671
12 vm.31 -1 4 0.294211 0.0 0.0 1.0 2.0 3.0 4.0 1729526000 39999.992003
13 vm.32 -1 5 0.425758 0.0 1.0 1.0 3.0 3.0 5.0 1729530000 40000.014967
14 vm.10 0 11 0.594815 1.0 1.0 1.0 1.0 4.0 10.0 1727932000 39999.629982
15 vm.25 0 6 0.766227 1.0 1.0 2.0 3.0 4.0 6.0 1729106000 39999.389766
16 vm.24 0 28 0.881269 1.0 1.0 2.0 2.0 6.0 28.0 1728920000 39999.583785
17 vm.12 0 19 0.741611 1.0 1.0 1.0 1.0 7.0 16.0 1728137085 39999.860449
18 vm.26 0 14 3.304725 2.0 5.0 9.0 10.0 11.0 14.0 1729144000 39999.952394
19 vm.20 0 15 4.277812 3.0 6.0 10.0 11.0 12.0 14.0 1728737203 39999.659962
20 vm.19 1 33 4.406672 3.0 6.0 10.0 11.0 15.0 33.0 1728544732 39999.944749
21 vm.11 0 35 1.825326 1.0 2.0 3.0 11.0 19.0 34.0 1727944035 40000.008251
22 vm.21 0 25 7.798344 7.0 11.0 18.0 20.0 20.0 25.0 1728747832 39999.996823
23 vm.23 1 26 7.826982 7.0 11.0 18.0 20.0 21.0 26.0 1728910503 39999.060496
24 vm.22 1 28 8.625836 8.0 11.0 19.0 21.0 21.0 28.0 1728748000 39999.967383
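As promised above, a quick sanity check on the count column: at the stated publish rate, each subscriber should see roughly 1.728 billion messages over 12 hours, which matches the counts in the table.

rate = 40_000                 # messages per second
duration = 12 * 3600          # 12 hours in seconds
expected = rate * duration
print(expected)               # 1_728_000_000, close to the ~1.729e9 counts above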
As you can see in the table, some subscribers received messages in less than 5 ms, while others took around 20 ms (looking at the p99 latencies).
Why is there so much skew in the latencies? Is there an explanation for why the pub-sub pattern shows this behaviour?
In theory, every subscriber should receive each message at nearly the same time.
Is it possible to avoid this kind of behaviour and get nearly equal latencies at each subscriber?
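For context, this is a sketch of how such per-message latency could be measured on the subscriber side, assuming the publisher prepends an 8-byte send timestamp to each payload and the machines' clocks are synchronized. It is an illustration of one possible instrumentation, not necessarily how the numbers above were collected:

import struct
import time
import zmq

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://publisher-host:5556")   # placeholder address
sub.setsockopt(zmq.SUBSCRIBE, b"")         # subscribe to all messages

while True:
    msg = sub.recv()
    recv_ms = time.time() * 1000.0
    (sent_ms,) = struct.unpack_from("!d", msg, 0)  # timestamp in first 8 bytes
    latency_ms = recv_ms - sent_ms
    # feed latency_ms into a histogram / percentile tracker here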