I really love ZeroMQ. We used it at LogNormal for all of the back end RPC. We did run into one issue though. If your 0MQ servers (the ones that call bind) run on a virtual host, as do your clients (the ones that call connect), and if that virtual host reboots, then the clients do not automatically reconnect to the server. I didn't quite have the time to investigate this, but from what I could tell, it only happens with virtual hosts across a reboot. It may have to do with iptables rules that we have in place (we have an IP whitelist), but I can't be certain of this.
I've never come across this. I'm able to kill my server (the one with the REP sockets), submit a form on the web page, watch the HTTP request in the browser hang (under the hood it proxies to ZeroMQ), start the server back up, and have the request in the browser get replied to almost immediately. But I've only been running ZeroMQ in topologies with two nodes at most, no advanced routing etc.
When you say kill the server, do you mean kill the server process or kill the box that the server runs on? I have no problem with the former, but I have a problem with the latter, and I can't tell why. It's pretty consistent.
It's a TCP issue. When the other end of the connection goes away silently (power goes off, a cable is cut, etc.), TCP doesn't inform ZeroMQ that the connection is broken, so ZeroMQ keeps using it without trying to handle the error.
It's only an issue with TCP if you forget to enable keep-alive (http://www.gnugk.org/keepalive.html). Keep-alive solves this problem, and it solves other problems too, like when your network equipment decides a connection is no longer needed and disables it. Or, you can implement application-level heartbeats--whenever you send you have the chance to recognize that your peer has gone away.
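To make the keep-alive suggestion concrete, here is a minimal sketch using Python's standard socket module on a plain TCP socket. The helper name enable_keepalive and the timing values are assumptions for illustration, not anything from this thread. The idle/interval/count knobs are platform-specific (they exist on Linux), so the sketch only sets them when the constants are available. Newer libzmq releases expose equivalent per-socket options (ZMQ_TCP_KEEPALIVE, ZMQ_TCP_KEEPALIVE_IDLE, ZMQ_TCP_KEEPALIVE_INTVL, ZMQ_TCP_KEEPALIVE_CNT), so check whether your version has them before relying on this.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keep-alive probes for an existing socket.

    Hypothetical helper; the default timings are illustrative only.
    """
    # Portable part: ask the kernel to probe idle connections.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Platform-specific tuning (present on Linux; guarded elsewhere).
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print("keep-alive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With these example values, a peer that vanished silently would be noticed after roughly idle_s + interval_s * probes seconds of silence, instead of waiting on the much longer OS defaults.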
I mean "kill -9" on the server process. I'm not under that much load, though. From reading the other comments, my suspicion is REQ/REP socket lockstep, which I've never encountered, but which I guess would happen if something dies while the REQ side waits for a reply that it never gets. Do you have timeout handling in your system?
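For anyone wondering what timeout handling on the REQ side could look like, here is a rough sketch with pyzmq (assumed to be installed). The function name request_with_retry, the endpoint, and the timeout and retry values are all made up for illustration. The idea is to poll for the reply instead of blocking in recv, and to throw the REQ socket away after a timeout, because a REQ socket that has sent a request cannot send again until it has received a reply.

```python
import zmq


def request_with_retry(endpoint, payload, timeout_ms=500, retries=3):
    """Send one request and wait for one reply, retrying on timeout.

    Hypothetical helper: returns the reply bytes, or None if every
    attempt timed out.
    """
    ctx = zmq.Context.instance()
    for _ in range(retries):
        # Fresh REQ socket per attempt: a timed-out REQ socket is stuck
        # in its send/recv lockstep and cannot be reused for a new send.
        sock = ctx.socket(zmq.REQ)
        # Drop unsent messages on close so a dead peer cannot block us.
        sock.setsockopt(zmq.LINGER, 0)
        sock.connect(endpoint)
        try:
            sock.send(payload)
            # poll() returns a non-zero event mask if a reply is ready.
            if sock.poll(timeout_ms, zmq.POLLIN):
                return sock.recv()
        finally:
            sock.close()
    return None


if __name__ == "__main__":
    # The endpoint is an assumption; point it at your own REP server.
    reply = request_with_retry("tcp://127.0.0.1:5555", b"ping")
    print("reply:", reply)
```

Note that a retry may deliver the same request twice if the server was merely slow rather than dead, so this only suits requests that are safe to repeat.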
I suspect it is something in your setup -- that is a use case that ZMQ is specifically very good at... the fact that you can start the client before or after you start the server (doesn't apply to inproc) is awesome.