I have two processes written in C that set up PUSH/PULL
ZeroMQ sockets, and two threads in a Python process that mirror the PUSH/PULL
sockets. There are roughly 80-300 lightweight (<30 byte) messages per second being sent from the C process to the Python process, and 10-30 similar messages going from the Python process to the C process.
I was running these services on 64-bit ARMv8 (Ubuntu based) and AMD64 (Ubuntu 18.04) systems with no noticeable latency. I then ran the exact same services on a 32-bit Linux based system and was shocked to see messages coming through over 30 seconds behind, even after killing the C services. When I checked CPU usage it was a fairly flat 30-40% and didn't appear to be the bottleneck.
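One way to pin down where those 30 seconds go is to stamp each message with the sender's monotonic clock and compute the one-way delay on receipt; both processes run on the same machine, so the monotonic clock is shared. This is a diagnostic sketch, not part of the actual services, and the `stamp`/`latency_of` helper names are made up:

```python
import struct
import time

def stamp(payload):
    """Prefix a small payload with the current monotonic time (8 bytes)."""
    return struct.pack("<d", time.monotonic()) + payload

def latency_of(msg):
    """Return (seconds since the message was stamped, original payload)."""
    sent_at, = struct.unpack_from("<d", msg)
    return time.monotonic() - sent_at, msg[8:]

# Receiver side: how old was this message when recv() handed it to us?
age, body = latency_of(stamp(b"sensor:42"))
```

On the C side the equivalent stamp would come from `clock_gettime(CLOCK_MONOTONIC, ...)`, which is what Python's `time.monotonic()` uses on Linux.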
My ZeroMQ socket settings didn't change between systems: I set LINGER
to 0, I tried RCVTIMEO
values between 0 and 100 ms, and I varied BACKLOG
between 0 and 50, with no difference either way. I also tried using multiple I/O threads and setting socket thread affinity, to no avail. The PUSH
sockets connect on tcp://localhost:#####
and the PULL
sockets bind to tcp://*:#####
. I also tried ipc:///tmp/...
; messages were sent and received, but the latency on the 32-bit system remained.
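For reference, the I/O-thread and affinity experiments described above look roughly like this in pyzmq (the affinity bitmask value and the ipc path are illustrative; the real endpoints are elided above):

```python
import zmq

# Four I/O threads in the context, matching zmq.Context(4) in the service.
ctx = zmq.Context(io_threads=4)

sink = ctx.socket(zmq.PULL)
# ZMQ_AFFINITY is a bitmask over the context's I/O threads; 0b0001 ties
# this socket's work to I/O thread 0. (Illustrative value.)
sink.setsockopt(zmq.AFFINITY, 1)
sink.setsockopt(zmq.LINGER, 0)
sink.setsockopt(zmq.RCVTIMEO, 100)   # milliseconds
sink.setsockopt(zmq.BACKLOG, 0)
# The ipc transport variant we also tried; this path is made up for the sketch.
sink.bind("ipc:///tmp/sink_example")
```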
I investigated the other Python steps between receiving messages, and none of them appears to take more than a millisecond. But when I time socket.recv(0)
itself, it takes as long as 0.02 seconds even when RCVTIMEO
is set to 0 for that socket.
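One way to tell whether that 0.02 s lives in the socket call itself or in delivery is to time recv() in isolation over inproc, so TCP and the other process are out of the picture. A minimal self-contained sketch (endpoint name is made up):

```python
import time
import zmq

ctx = zmq.Context()
pull = ctx.socket(zmq.PULL)
pull.bind("inproc://bench")        # for inproc, bind must happen before connect
push = ctx.socket(zmq.PUSH)
push.connect("inproc://bench")

push.send(b"x" * 30)               # a lightweight message like ours
t0 = time.monotonic()
msg = pull.recv()                  # message is already queued locally
elapsed = time.monotonic() - t0
```

If recv() of an already-queued message is fast here but slow in the real service, the time is going to delivery or scheduling, not the call.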
Any suggestions as to why I would see this behaviour on the new 32-bit platform and not on the other platforms? Am I possibly looking in all the wrong places?
Here's a bit of code to help explain:
The connection and the _recv()
class-method are roughly depicted below:
```python
def _connect(self):
    self.context = zmq.Context(4)
    self.sink = self.context.socket(zmq.PULL)
    self.sink.setsockopt(zmq.LINGER, 0)
    self.sink.setsockopt(zmq.RCVTIMEO, 100)
    self.sink.setsockopt(zmq.BACKLOG, 0)
    self.sink.bind("tcp://*:55755")

def _recv(self):
    while True:
        msg = None
        try:
            msg = self.sink.recv(0)  # blocking or zmq.NOBLOCK, still appears to be slow
        except zmq.ZMQError:
            ...  # meaningful exception handling here
        # This last step, when timed, usually takes less than a millisecond
        if msg:
            msg_dict = utils.bytestream_to_dict(msg)  # unpacking step (negligible)
            if msg_dict:
                self.parser.parse(msg_dict)  # parser is a dict of callbacks, also negligible
```
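Given that messages kept arriving for 30 seconds after the C services were killed, the delay looks like queue buildup rather than per-message transit time. A quick way to check is to drain everything already queued with a non-blocking loop (a diagnostic sketch, not part of the service; `drain` is a name I made up):

```python
import zmq

def drain(sock):
    """Receive every message already queued on `sock` without blocking.

    The length of the returned list is the backlog that had accumulated;
    a large number here points at the consumer falling behind, not at
    the transport being slow.
    """
    backlog = []
    while True:
        try:
            backlog.append(sock.recv(zmq.NOBLOCK))
        except zmq.Again:
            return backlog
```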
On the C process side:

```c
zmq_init(4);
void *context = zmq_ctx_new();

/* Connect the sender */
void *vent = zmq_socket(context, ZMQ_PUSH);

int timeo = 0;
int timeo_ret = zmq_setsockopt(vent, ZMQ_SNDTIMEO, &timeo, sizeof(timeo));
if (timeo_ret != 0)
    error("Failed to set ZMQ send timeout because %s", zmq_strerror(errno));

int linger = 100;
int linger_ret = zmq_setsockopt(vent, ZMQ_LINGER, &linger, sizeof(linger));
if (linger_ret != 0)
    error("Failed to set ZMQ linger because %s", zmq_strerror(errno));

if (zmq_connect(vent, vent_port) == 0)
    info("Successfully initialized ZeroMQ ventilator on %s", vent_port);
else {
    error("Failed to initialize %s ZeroMQ ventilator with error %s",
          sink_port, zmq_strerror(errno));
    ret = 1;
}

...

/* When a message needs to be sent it instantly hits this, where msg is a char* */
ret = zmq_send(vent, msg, msg_len, ZMQ_NOBLOCK);
```
On Docker running on the target 32-bit system, `lstopo - -v --no-io` reports:

```
Machine (P#0 local=1019216KB total=1019216KB HardwareName="Freescale i.MX6 Quad/DualLite (Device Tree)" HardwareRevision=0000 HardwareSerial=0000000000000000 Backend=Linux LinuxCgroup=/docker/d2b0a3b3a5eedb7e10fc89fdee6e8493716a359597ac61350801cc302d79b8c0 OSName=Linux OSRelease=3.10.54-dey+g441c8d4 OSVersion="#1 SMP PREEMPT RT Tue Jan 28 12:11:37 CST 2020" HostName=db1docker Architecture=armv7l hwlocVersion=1.11.12 ProcessName=lstopo)
  Package L#0 (P#0 CPUModel="ARMv7 Processor rev 10 (v7l)" CPUImplementer=0x41 CPUArchitecture=7 CPUVariant=0x2 CPUPart=0xc09 CPURevision=10)
    Core L#0 (P#0)
      PU L#0 (P#0)
    Core L#1 (P#1)
      PU L#1 (P#1)
    Core L#2 (P#2)
      PU L#2 (P#2)
    Core L#3 (P#3)
      PU L#3 (P#3)
depth 0: 1 Machine (type #1)
 depth 1: 1 Package (type #3)
  depth 2: 4 Core (type #5)
   depth 3: 4 PU (type #6)
```
EDIT:
We were able to make the latency disappear on our target machine by disabling nearly all other worker threads.
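Since disabling worker threads is what made the latency disappear, scheduler contention on the four-core i.MX6 is the likely culprit. On Linux a similar effect can be had without killing the workers by pinning the receiver to its own core; a sketch, assuming Linux (`pin_to_core` is a name I made up):

```python
import os

def pin_to_core(core):
    """Restrict the calling process to a single CPU core.

    Leaving the remaining cores to the other worker threads approximates
    the effect we got by disabling them outright.
    """
    os.sched_setaffinity(0, {core})   # 0 = the calling process
    return os.sched_getaffinity(0)
```

The C side could do the same with `pthread_setaffinity_np`, keeping its worker threads off the core serving the ZeroMQ receiver.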