Description
The socket option ZMQ_XPUB_NODROP (69) is supposed to toggle the behavior of the socket when SENDHWM is reached. If 0/false (the default), messages are silently dropped. If 1/true, sending a message will instead return an error.
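To make the two behaviours concrete, here is a toy model in Python. It is not NetMQ or libzmq code: `ToyPubPipe`, `try_send` and `dropped` are made-up names, and the queue stands in for a single outbound pipe. It only shows why a send call can report success while the message is lost in the default mode.

```python
from collections import deque


class ToyPubPipe:
    """Toy stand-in for one outbound pipe with a send high-water mark.

    Hypothetical model for illustration only; not the NetMQ implementation.
    """

    def __init__(self, hwm: int, nodrop: bool = False):
        self.hwm = hwm          # maximum number of queued messages
        self.nodrop = nodrop    # models ZMQ_XPUB_NODROP = 1 when True
        self.queue = deque()
        self.dropped = 0        # counter that a real caller would never see

    def try_send(self, msg) -> bool:
        if len(self.queue) < self.hwm:
            self.queue.append(msg)
            return True
        if self.nodrop:
            # Pipe is full: report the failure so the caller can retry.
            return False
        # Default mode: the message is discarded, yet the call still
        # reports success, so the caller cannot detect the loss.
        self.dropped += 1
        return True


default_pipe = ToyPubPipe(hwm=2)
default_results = [default_pipe.try_send(i) for i in range(4)]

strict_pipe = ToyPubPipe(hwm=2, nodrop=True)
strict_results = [strict_pipe.try_send(i) for i in range(4)]
```

In this model the default pipe returns `True` for all four sends while silently discarding two of them, whereas the no-drop pipe returns `False` for the last two, which is the signal our sending logic relies on.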
We send with TrySendMultipartMessage, and our logic assumes that it returns false whenever the message could not be sent. That does not always seem to hold, otherwise we would not be seeing lost messages. I checked the NetMQ (v4, master) source code but could not work out what actually happens in NetMQ when the socket is full.
The motivation for requesting this is that we see occasional message loss between publishers and subscribers. This is not the slow-joiner case: it happens while the system is already running. All subscribers always lose the same number of messages, so I suspect the problem is on the publisher side. We verify sequence numbers both when we send a message and when we receive it. The publisher code does not report any sequence-number gaps, but the subscribers do. We've raised the SENDHWM from the default 1_000 to 100_000 and also increased the buffer, but the problem still occurs during message spikes.
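For reference, the kind of sequence-number check described above can be sketched like this in Python. This is not our production code: `SequenceGapDetector` and `observe` are hypothetical names, and the dropped range in the usage part is made up to show the symptom.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SequenceGapDetector:
    """Remembers the highest sequence number seen and records gaps."""

    last_seen: Optional[int] = None
    gaps: List[Tuple[int, int]] = field(default_factory=list)

    def observe(self, seq: int) -> int:
        """Record seq and return how many numbers were skipped before it."""
        missing = 0
        if self.last_seen is not None and seq > self.last_seen + 1:
            missing = seq - self.last_seen - 1
            # Store the inclusive range of missing sequence numbers.
            self.gaps.append((self.last_seen + 1, seq - 1))
        if self.last_seen is None or seq > self.last_seen:
            self.last_seen = seq
        return missing


# Hypothetical run: the publisher sends 1..10, and messages 4..6 are lost
# somewhere between the send call and the subscriber.
publisher_side = SequenceGapDetector()
subscriber_side = SequenceGapDetector()
for seq in range(1, 11):
    publisher_side.observe(seq)
    if seq not in (4, 5, 6):
        subscriber_side.observe(seq)
```

Here `publisher_side.gaps` stays empty while `subscriber_side.gaps` contains `(4, 6)`, which matches the pattern we observe: nothing looks wrong at the point of sending, but every subscriber reports the same missing range.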
We're going to try raising the HWM even further, but that feels like a workaround rather than a solution.