Unexpected Behaviour shown in PUB-SUB pattern #911
Comments
The test results image is broken. I'm assuming you're using TCP, where the behaviour is to fan out every message to each subscriber, so that may explain the latency you are seeing. You may want to look at multicast (epgm) if you don't want this behaviour. Unfortunately, it's not supported in JeroMQ, but we're more than happy to accept any PRs that add it.
Hi @trevorbernard, thanks for the quick reply. I have fixed the broken image, so you should be able to see the results now. As for multicast, we can only use unicast for our application, so it's not an option for us.
Code for publisher: [code not preserved]
Code for subscriber: [code not preserved]
Because of Java's GC, the high percentile values can be surprising. You can have a look at issue #723, where I track occasional high-latency events. I explained my methodology at #723 (comment).
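One inexpensive way to check whether GC activity lines up with the latency spikes is to sample the JVM's collector counters around each measurement window. This is only a sketch using the standard `java.lang.management` API. It is not the methodology described in #723, and the class name `GcSnapshot` is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: sums the counters of all collectors in this JVM.
final class GcSnapshot {
    final long collections;      // total number of collections so far
    final long collectionMillis; // accumulated collection time in ms

    private GcSnapshot(long collections, long collectionMillis) {
        this.collections = collections;
        this.collectionMillis = collectionMillis;
    }

    static GcSnapshot take() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined; treat that as 0.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        return new GcSnapshot(count, millis);
    }

    // GC activity that happened between an earlier snapshot and this one.
    GcSnapshot since(GcSnapshot earlier) {
        return new GcSnapshot(collections - earlier.collections,
                              collectionMillis - earlier.collectionMillis);
    }

    public static void main(String[] args) {
        GcSnapshot before = take();
        byte[][] garbage = new byte[100_000][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[256]; // allocate short-lived 256-byte buffers
        }
        GcSnapshot delta = take().since(before);
        System.out.println("collections=" + delta.collections
                + " collectionMillis=" + delta.collectionMillis);
    }
}
```

Logging such a delta next to each window's p99 would show whether the slow windows coincide with collections. Note that the reported collection time is the collector's accumulated elapsed time, which for concurrent collectors is not the same as application pause time.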
Are you sure your chrony/NTP setup is working perfectly? Does Azure guarantee network latency?
Yes, the chrony setup is working fine; we can assume an error of ±1 ms in the results due to synchronization. Moreover, I ran the tests many times and the results are always skewed, but the VMs showing high values differ every time, which supports the conclusion that synchronization is not the problem here.
Here are the results of another run. As you can see, the VMs showing high values are different; this was a 2-hour test run.
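The argument above can be made concrete. Assume latency is measured one-way: the publisher embeds its wall-clock send time in each message and the subscriber subtracts it from its own wall-clock receive time. The issue does not show the measurement code, so this is an assumption, and all names below are hypothetical. Under that assumption, a clock error of at most ±1 ms per VM can shift the difference between two subscribers by at most 2 ms.

```java
import java.nio.ByteBuffer;
import java.time.Instant;

// Hypothetical sketch of one-way latency measurement with synchronized clocks.
final class OneWayLatency {
    static final int PAYLOAD_BYTES = 256; // message size stated in the issue

    // Current wall-clock time as nanoseconds since the Unix epoch.
    static long nowEpochNanos() {
        Instant now = Instant.now();
        return now.getEpochSecond() * 1_000_000_000L + now.getNano();
    }

    // Publisher side: build a payload whose first 8 bytes carry the send time.
    static byte[] stamp(long sendEpochNanos) {
        ByteBuffer buf = ByteBuffer.allocate(PAYLOAD_BYTES);
        buf.putLong(0, sendEpochNanos);
        return buf.array();
    }

    // Subscriber side: receive time minus embedded send time, in milliseconds.
    static double latencyMillis(byte[] payload, long receiveEpochNanos) {
        long sent = ByteBuffer.wrap(payload).getLong(0);
        return (receiveEpochNanos - sent) / 1_000_000.0;
    }

    // True if the gap between two subscribers fits within clock error alone.
    static boolean explainableByClockError(double latencyA, double latencyB,
                                           double clockErrorMillis) {
        return Math.abs(latencyA - latencyB) <= 2 * clockErrorMillis;
    }

    public static void main(String[] args) {
        byte[] msg = stamp(nowEpochNanos());
        System.out.println("latency ms = " + latencyMillis(msg, nowEpochNanos()));
        // p99 values of about 5 ms and 20 ms, with +-1 ms clock error:
        System.out.println(explainableByClockError(5.0, 20.0, 1.0));
    }
}
```

A 15 ms gap between p99 values is well outside the 2 ms that clock error could account for, which is consistent with the conclusion that synchronization is not the cause.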
I am using the pub-sub pattern to fan out a stream to 25 subscribers, so data is transferred from 1 publisher to 25 subscribers.
The payload size of a single message is 256 bytes.
We publish 40k messages/sec (as a batch of 4k messages every 100 ms).
The publisher uses 5 IO threads.
I ran the publisher on an Azure VM (32 cores, 126 GB RAM).
Each of the 25 subscribers ran on an Azure VM (4 cores, 16 GB RAM).
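For scale, the setup above implies the following payload volume on the publisher side. This is a back-of-the-envelope sketch: the class and method names are hypothetical, protocol overhead (ZMTP and TCP/IP headers) is ignored, and the 10 Gbit/s link speed is an assumption, not a figure from this issue.

```java
// Hypothetical helper for estimating publisher-side load under unicast fan-out.
final class FanOutLoad {
    // Payload bytes per second the publisher sends to all subscribers combined.
    static long egressBytesPerSecond(int subscribers, long messagesPerSecond, int payloadBytes) {
        return (long) subscribers * messagesPerSecond * payloadBytes;
    }

    // Payload bytes in one burst: one batch, copied once per subscriber.
    static long burstBytes(int subscribers, int batchSize, int payloadBytes) {
        return (long) subscribers * batchSize * payloadBytes;
    }

    // Milliseconds of wire time to drain one burst at the given link speed.
    static double burstDrainMillis(long burstBytes, double linkGbitPerSecond) {
        double bytesPerMilli = linkGbitPerSecond * 1e9 / 8.0 / 1000.0;
        return burstBytes / bytesPerMilli;
    }

    public static void main(String[] args) {
        long egress = egressBytesPerSecond(25, 40_000, 256);
        long burst = burstBytes(25, 4_000, 256);
        System.out.println("egress bytes/s = " + egress);
        System.out.println("burst bytes    = " + burst);
        // 10 Gbit/s is an assumed link speed, not taken from the issue.
        System.out.println("drain ms       = " + burstDrainMillis(burst, 10.0));
    }
}
```

With these inputs the publisher has to push 256,000,000 payload bytes per second in total, and each 100 ms burst is 25,600,000 bytes. At the assumed 10 Gbit/s, one burst would take roughly 20 ms of wire time, so whichever subscribers are served last within a burst could plausibly see latencies of that order.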
The test ran for 12 hours; the results are shown in the image below.
p95, p99, ... are in milliseconds
mean_rate is the mean number of messages received per second on the subscriber side
name is the name of the VM on which the subscriber was running
count is the total number of messages sent to the subscriber in 12 hours
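For reference, columns like these can be derived from raw per-message latencies roughly as follows. This is a minimal sketch using the nearest-rank percentile definition. The issue does not say which metrics library or percentile method produced the table, so the class `LatencyStats` and its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper for summarizing subscriber-side measurements.
final class LatencyStats {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    // Mean number of messages received per second over the whole run.
    static double meanRate(long count, double elapsedSeconds) {
        return count / elapsedSeconds;
    }

    public static void main(String[] args) {
        double[] samples = {1.2, 0.9, 4.8, 1.1, 19.5, 1.0, 1.3, 0.8, 1.4, 1.0};
        System.out.println("p95 ms = " + percentile(samples, 95));
        System.out.println("p99 ms = " + percentile(samples, 99));
        System.out.println("mean_rate = " + meanRate(1_728_000_000L, 12 * 3600.0));
    }
}
```

At 40k messages/sec, a 12-hour run corresponds to 40,000 × 43,200 = 1,728,000,000 messages per subscriber, which is the count one would expect if nothing is dropped.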
As you can see in the image, some subscribers received the messages in less than 5 ms, but some took around 20 ms (p99 latencies).
Why is there so much skew in the latencies? Is there an explanation for why the pub-sub pattern shows such behaviour?
In theory, every subscriber should receive messages at nearly the same time.
Is it possible to avoid this behaviour and get nearly equal latencies at each subscriber?