My demo consists of one server and one client using PUB-SUB mode on the same CentOS machine. Each of them runs 2 threads sharing one zmq ctx (one thread for zmq_send and the other for zmq_msg_recv). The client sends a request to the server, and the server receives it and then sends a response back:

    client            server
    c.PUB  -req->   s.SUB
    c.SUB  <-resp-  s.PUB
About 90% of the time the latency is around 80us, but sometimes it suddenly jumps to 30 milliseconds (peak latency can reach 100-200ms) and then slowly decreases to the normal level over several hundred packets. This phenomenon is observed periodically during the test. The latency is measured just before and after zmq_msg_recv is called (both server and client can observe the issue). But since zmq_send is just an enqueue operation here, it's hard to say whether the send is late or the recv is stuck. I'm wondering if libzmq has any mechanism that would explain such behavior.
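A minimal sketch of this kind of measurement (not the actual demo code) could look like the following. It timestamps with CLOCK_MONOTONIC and keeps a p90, a maximum, and a spike count. The names `now_us`, `lat_stats`, `LAT_MAX_SAMPLES`, and the spike threshold are all hypothetical and only serve the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define LAT_MAX_SAMPLES 8192  /* hypothetical cap, chosen for this sketch */

typedef struct {
    uint64_t samples[LAT_MAX_SAMPLES];
    size_t   count;   /* samples stored so far (stops growing at the cap) */
    size_t   spikes;  /* samples above the caller's spike threshold */
    uint64_t max_us;  /* largest sample seen */
} lat_stats;

/* Monotonic clock reading in microseconds. */
uint64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

void lat_stats_init(lat_stats *s)
{
    memset(s, 0, sizeof *s);
}

/* Record one latency sample in microseconds. */
void lat_stats_record(lat_stats *s, uint64_t us, uint64_t spike_threshold_us)
{
    assert(s != NULL);
    if (s->count < LAT_MAX_SAMPLES)
        s->samples[s->count++] = us;
    if (us > s->max_us)
        s->max_us = us;
    if (us > spike_threshold_us)
        s->spikes++;
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile over the stored samples; returns 0 if empty. */
uint64_t lat_stats_percentile(const lat_stats *s, unsigned pct)
{
    if (s->count == 0 || pct > 100)
        return 0;
    uint64_t *copy = malloc(s->count * sizeof *copy);
    if (copy == NULL)
        return 0;
    memcpy(copy, s->samples, s->count * sizeof *copy);
    qsort(copy, s->count, sizeof *copy, cmp_u64);
    size_t rank = (s->count * pct + 99) / 100;  /* ceil(count * pct / 100) */
    if (rank == 0)
        rank = 1;
    uint64_t value = copy[rank - 1];
    free(copy);
    return value;
}
```

In the receiving thread, the idea would be to take `now_us()` at the two measurement points around zmq_msg_recv and pass the difference to `lat_stats_record` with a threshold such as 1000us, so each spike can be correlated with a wall-clock time and a packet count.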
More details: the NONBLOCK flag is only set in zmq_send, the TPS is only around 4000, and the req/resp packet size is 512 bytes. I found that the traffic goes over the localhost (loopback) interface, and hardware resources should not be the cause. libzmq 4.3.4 is used here, and TCP Nagle is disabled by default in the zmq source code.
I've tried setting more threads on the zmq_ctx and using affinity to separate pub and sub on both server and client, but it doesn't help. Any suggestions on what could be going wrong or how to debug further are welcome. Thanks in advance!