
I really love ZeroMQ. We used it at LogNormal for all of the back end RPC. We did run into one issue though. If your 0MQ servers (the ones that call bind) run on a virtual host, as do your clients (the ones that call connect), and if that virtual host reboots, then the clients do not automatically reconnect to the server. I didn't quite have the time to investigate this, but from what I could tell, it only happens with virtual hosts across a reboot. It may have to do with iptables rules that we have in place (we have an IP whitelist), but I can't be certain of this.


I've never come across this. I'm able to kill my server (the one with the REP sockets), submit a form on the web page, watch the HTTP request in the browser hang (under the hood it proxies to ZeroMQ), start the server back up, and have the request in the browser get its reply almost immediately. But I've only been running ZeroMQ in topologies with two nodes at most, with no advanced routing etc.


When you say kill the server, do you mean kill the server process or kill the box that the server runs on? I have no problem with the former, but I have a problem with the latter, and I can't tell why. It's pretty consistent.


It's a TCP issue. When the other end of the connection goes away silently (power goes off, cable is cut, etc.), TCP doesn't inform ZeroMQ that the connection is broken, so ZeroMQ keeps using it without trying to handle the error.


It's only an issue with TCP if you forget to enable keep-alive (http://www.gnugk.org/keepalive.html). Keep-alive solves this problem, and it solves other problems too, like when your network equipment decides a connection is no longer needed and drops it. Alternatively, you can implement application-level heartbeats: whenever you send, you have the chance to recognize that your peer has gone away.
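
For anyone who wants to see what enabling keep-alive looks like, here is a minimal sketch on a plain TCP socket using only the Python standard library. The helper name `enable_keepalive` and the timing values are made up for illustration, and the three tuning options are platform-specific, so the code skips any that the platform does not expose. libzmq has its own equivalents (the `ZMQ_TCP_KEEPALIVE*` socket options), which this sketch does not touch.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keep-alive probes on an existing TCP socket.

    idle/interval/count are illustrative values, not recommendations.
    """
    # Ask the kernel to probe the peer when the connection is idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Tuning knobs are not available on every platform, so set only
    # the ones this Python build exposes.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keep-alive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With probes enabled, a peer that vanished without closing the connection is eventually reported as an error on the socket instead of looking alive forever.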


That's an interesting thought. I need to look into whether I have TCP keepalive turned on for my private interfaces.


I'd call that a characteristic of TCP, not an 'issue'.


I mean "kill -9" on the server process. I'm not under that much load, though. From reading the other comments, my suspicion is REQ/REP socket lockstep, which I've never encountered, but which I guess would happen if something dies while the REQ socket waits for a reply that it never gets. Do you have timeout handling in your system?


I have no problem with kill -9. I use PUB/SUB sockets. I only have problems with reboots.


I suspect it is something in your setup. That is a use case that ZMQ is specifically very good at... the fact that you can start the client before or after you start the server (doesn't apply to inproc) is awesome.


Yeah, starting/stopping the server works without a problem. Rebooting the VM on which the server runs is what causes the problem.


Have you compared the traffic using Wireshark or tcpdump?


Unfortunately I can't do that while the VM is booting up, and after bootup there is no traffic, so nothing to look at.


Without knowing the details, this sounds very much like the stuck-clients issue Armin Ronacher complains about here: http://lucumr.pocoo.org/2012/6/26/disconnects-are-good-for-y... (discussed here: https://news.ycombinator.com/item?id=4161073)

In a nutshell, if the server goes away while a client is in recv, the client never finds out and waits forever.
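
A rough illustration of the usual workaround, which is to never block in recv without a deadline. This is a sketch on plain sockets with the Python standard library, not ZeroMQ code; the helper name `recv_with_timeout` is hypothetical. With ZeroMQ the same idea is usually expressed by polling with a timeout or by setting a receive timeout on the socket (`ZMQ_RCVTIMEO`).

```python
import socket


def recv_with_timeout(sock, timeout_s, bufsize=4096):
    """Receive once, giving up after timeout_s seconds.

    Returns the received bytes, b"" if the peer closed cleanly,
    or None if nothing arrived before the deadline.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        # The peer is silent: it may be slow, or it may be gone.
        # The caller decides whether to retry, reconnect, or fail.
        return None


if __name__ == "__main__":
    client, server = socket.socketpair()

    # The "server" says nothing, so the client gives up instead of hanging.
    print(recv_with_timeout(client, 0.1))

    # Once the "server" answers, the same call returns the reply.
    server.sendall(b"pong")
    print(recv_with_timeout(client, 1.0))

    client.close()
    server.close()
```

On a timeout the caller can close and reopen the connection, which is what gets a stuck client moving again after the server's box has rebooted.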



