I really love ZeroMQ. We used it at LogNormal for all of the back end RPC. We di...

augustl · on July 22, 2013

I've never come across this. I'm able to kill my server (the one with the REP sockets), submit a form on the web page, have the HTTP request in the browser hang (that under the hood proxies to ZeroMQ), start up the server, and have the request in the browser get replied to almost immediately. But I've only been running ZeroMQ in topologies with two nodes at most, no advanced routing etc.

bluesmoon · on July 23, 2013

When you say kill the server, do you mean kill the server process or kill the box that the server runs on? I have no problem with the former, but I do have a problem with the latter, and I can't tell why. It's pretty consistent.

rumcajz · on July 23, 2013

It's a TCP issue. When the other end of the connection goes away silently (power goes off, cable is cut, etc.), TCP doesn't inform ZeroMQ that the connection is broken, so ZeroMQ keeps using it without trying to handle the error.

jzwinck · on July 23, 2013

It's only an issue with TCP if you forget to enable keep-alive (http://www.gnugk.org/keepalive.html). Keep-alive solves this problem, and it solves other problems too, like when your network equipment decides a connection is no longer needed and disables it. Or, you can implement application-level heartbeats--whenever you send you have the chance to recognize that your peer has gone away.
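For illustration, here is a minimal sketch of enabling keep-alive on a plain TCP socket with Python's standard library. The helper name `enable_keepalive` and the timing values (60 s idle, 10 s interval, 5 probes) are assumptions picked for the example, not recommendations, and the per-socket tuning constants are only set where the platform exposes them. ZeroMQ has its own socket options for the same purpose (`ZMQ_TCP_KEEPALIVE` and the related `_IDLE`, `_INTVL` and `_CNT` options), which this sketch does not use.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keep-alive probes for sock (hypothetical helper).

    idle, interval and count are example values, not tuned defaults.
    """
    # Ask the kernel to probe an idle connection instead of trusting it forever.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs are platform-specific, so only set the ones that exist.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock

sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
# A non-zero value means keep-alive is enabled on this socket.
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With settings like these, a peer that vanishes silently should be detected after roughly idle + interval * count seconds, rather than never.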

bluesmoon · on July 25, 2013

That's an interesting thought. I need to look into whether I have TCP keepalive turned on for my private interfaces.

mh- · on July 23, 2013

I'd call that a characteristic of TCP, not an 'issue'.

augustl · on July 23, 2013

I mean "kill -9" on the server process. I'm not under that much load, though. From reading the other comments, my suspicion is REQ/REP socket lockstep, which I've never encountered, but which I guess would happen if something dies while the REQ socket waits for a reply that it never gets. Do you have timeout handling in your system?

bluesmoon · on July 25, 2013

I have no problem with kill -9. I use PUB/SUB sockets. I only have problems with reboots.

MetaCosm · on July 22, 2013

I suspect it is something in your setup -- that is a use case that ZMQ is specifically very good at... the fact that you can start the client before or after you start the server (doesn't apply to inproc) is awesome.

bluesmoon · on July 23, 2013

Yeah, starting/stopping the server works without a problem. Rebooting the VM on which the server runs is what causes the problem.

nitrogen · on July 23, 2013

Have you compared the traffic using Wireshark or tcpdump?

bluesmoon · on July 25, 2013

Unfortunately I can't do that while the VM is booting up, and after bootup there is no traffic, so there's nothing to look at.

micktwomey · on July 23, 2013

Without knowing the details, this sounds very much like the stuck clients issue Armin Ronacher complains about here: http://lucumr.pocoo.org/2012/6/26/disconnects-are-good-for-y... (discussed here: https://news.ycombinator.com/item?id=4161073)

In a nutshell, if the server goes away while a client is in recv, the client never finds out and waits forever.
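The usual way out of a recv that never returns is to put a deadline on it. Below is a minimal sketch using plain sockets from Python's standard library rather than ZeroMQ itself. The helper name `recv_with_timeout` and the 0.2 s deadline are assumptions for illustration. With ZeroMQ, the comparable tools would be polling the socket with a timeout or setting `ZMQ_RCVTIMEO`.

```python
import socket

def recv_with_timeout(sock, timeout_s, bufsize=4096):
    """Return received bytes, or None if nothing arrives within timeout_s.

    Hypothetical helper: a None result lets the caller decide whether to
    retry, reconnect, or give up instead of blocking forever.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        return None

# A connected pair stands in for client and server in this sketch.
client, server = socket.socketpair()

# The "server" stays silent, so the client gives up after the deadline.
silent = recv_with_timeout(client, 0.2)

# Once the "server" replies, the same call returns the data.
server.sendall(b"pong")
answered = recv_with_timeout(client, 0.2)

client.close()
server.close()
```

What to do after a timeout is application-specific. A client might close the socket and reconnect, which is roughly what an application-level heartbeat scheme does when a peer misses its deadline.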