Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

If hung SSH connections are common it's likely due to CGNAT which use aggressively low TCP timeouts. e.g. I've found all UK mobile carriers set their TCP timeout as low as 5 minutes. The "default" is supposed to be 2 hours, you could literally sleep your computer, zero packets, and an SSH connection would continue to work an hour later, and generally speaking this is still true unless CGNAT is in the way.

If you are interested there are a few ways you can fix this:

Easiest is to use a VPN, because the VPN's exit node becomes the effective NAT they usually have normal TCP timeouts due to being less resource constrained. Another nice benefit of this method is you can move between physical networks and your connection doesn't die... If you use Tailscale then you already have this in a more direct way.

Another is to tune the tcp_keepalive kernel parameters. Lowering the keepalive timeout to be less than the CGNAT timeout will cause keepalive probes to prevent CGNAT from dropping the connection even while your SSH connection is technically idle. For Linux I pop these into /etc/sysctl.d/z.conf, I have no idea for Windows or Mac:

  # Keepalive frequently to survive CGNAT
  net.ipv4.tcp_keepalive_time   = 240 
  net.ipv4.tcp_keepalive_intvl  = 60
  net.ipv4.tcp_keepalive_probes = 120
This is really a misuse of these settings, they are supposed to be for checking TCP connections are still alive and clearing them up from the local routing table. Instead the idea is to exploit the probes by sending them more frequently to force idle connections to stay alive in a CGNAT environment (dont worry the probes are tiny and still very infrequent).

_time=240 will send a probe after 4 mins of idle connection instead of the default 2 hours, undercutting the CGNAT timeout. _intvl=60 and _probes=120 mean it will send 120 probes 60 seconds apart (2 hours worth) before considering the connection dead. This will keep it alive for at least 2 hours, but also allows us to have the best of both worlds so that under a nice NAT it keeps the old behaviour, e.g if I temporarily lose my network the SSH connection is still valid after 2 hours, but under CGNAT it will at least not drop the connection after 5 mins so long as I keep my computer on and don't lose the network.

There are also some SSH client keepalive settings but I'm less familiar with them.



Check Mosh. It supports these kind of cuts and it will reconnect seamlessly. It will use far less bandwidth too. I successfully tried it with a 2.7 KBPS connection.


> you could literally sleep your computer,

Depends on whether your sockets survive that, though. Especially on Wi-Fi, many implementations will reset your interface when sleeping, and sockets usually don't survive that.

Even if they do, if the remote side has heartbeats/keepalive enabled (at the TCP or SSH level), your connection might be torn down from the server side.


Yes, by generally I really mean all the defaults are pretty permissive, but I understand some people tune both TCP and SSH on their servers to drop connections faster because they are worried about resource exhaustion.

But if you throw up a default Linux install for your SSH box and have a not-horrible wifi router with a not-horrible internet provider then IME you can sleep your machine and keep an SSH connection alive for quite some time... I appreciate that might be too many "not-horrible" requirements for the real world today though.


Not on a Mac


    Host *
        ServerAliveInterval 25


Yes, this makes your connection more likely not survive client suspends. (ClientAliveInterval, which makes the server ping the client, will make it fail almost certainly, since the server will be active while the client is sleeping.)


Note this is only an issue if not using IPv6.

CGNAT is for access to legacy IPv4 only.


Well, for different reasons, but you have similar issues with IPv6 as well. If your client uses temporary addresses (most likely since they're enabled by default on most OS), OpenSSH will pick one of them over the stable address and when they're rotated the connection breaks.

For some reason, OpenSSH devs refuse to fix this issue, so I have to patch it myself:

    --- a/sshconnect.c
    +++ b/sshconnect.c
    @@ -26,6 +26,7 @@
     #include <net/if.h>
     #include <netinet/in.h>
     #include <arpa/inet.h>
    +#include <linux/ipv6.h>
     
     #include <ctype.h>
     #include <errno.h>
    @@ -370,6 +371,11 @@ ssh_create_socket(struct addrinfo *ai)
      if (options.ip_qos_interactive != INT_MAX)
        set_sock_tos(sock, options.ip_qos_interactive);
     
    + if (ai->ai_family == AF_INET6 && options.bind_address == NULL) {
    +  int val = IPV6_PREFER_SRC_PUBLIC;
    +  setsockopt(sock, IPPROTO_IPV6, IPV6_ADDR_PREFERENCES, &val, sizeof(val));
    + }
    +
      /* Bind the socket to an alternative local IP address */
      if (options.bind_address == NULL && options.bind_interface == NULL)
        return sock;


The temporary address doesn't stay active while there's a connection on it? I think that would be the actual "fix".


I think it does, but that's not the issue: if the interface goes down all the temporary address are gone for good, not just "expired".


If you're on a stable address, and the interface goes down, will it let your connection/socket continue to exist?

Because if the connection/socket gets lost either way, I don't really care if the IP changes too.


I'm not sure what happens to the socket, maybe it's closed and reopened, but with this patch I have SSH sessions lasting for days with no issues. Without it, even roaming between two access points can break the session.


Interesting! Is there anywhere a discussion around their refusal to include your fix?


See this, for example: https://groups.google.com/g/opensshunixdev/c/FVv_bK16ADM/m/R...

It boilds down to using a Linux-specific API, though it's really BSD that is lacking support for a standard (RFC 5014).


It would also seem to break address privacy (usually not much of a concern if you authenticate yourself via SSH anyway, but still, it leaks your Ethernet or Wi-Fi interface's MAC address in many older setups).


This is a good argument for not making it the default, but it would be nice to have it as a command line switch.


Well, yss, but SSH is hardly ever anonymous and this could simply be a cli option.


Not anonymous, but it's pretty unexpected for different servers with potentially different identities for each to learn your MAC address (if you're using the default EUI-64 method for SLAAC).


This is a very common misconception. The issue is not IPv4 or CGNAT, it's stateful middleboxes... of which IPv6 has plenty.

The largest IPv6 deployments in the world are mobile carriers, which are full of stateful firewalls, DPI, and mid-path translation. The difference is that when connections drop it gets blamed on the wireless rather than the network infrastructure.

Also, fun fact: net.ipv4.tcp_keepalive_* applies to IPv6 too. The "ipv4" is just a naming artifact.


Mobile carriers usually have stateful firewalls for IPv6 as well (otherwise you can get a lot of random noise on the air interface, draining both your battery and data plan), so it's an issue just the same.

The constrained resource there is only firewall-side memory, though, as opposed to that plus (IP, port) tuples for CG-NAT.


> otherwise you can get a lot of random noise on the air interface, draining both your battery and data plan

I highly doubt you get "random" data over ipv6. There are more ipv6 addresses than there are atoms on the planet.


Yes, but they're not randomly distributed across the entire number space.

For example, receiving traffic from a given address is a pretty good indicator that there's somebody there possibly worth port scanning.

And where there has once been somebody, there or in the same neighborhood (subnet) might be somebody else, now or in the future.


Then it isn't random noise. It is determined by your own actions.


Or my predecessor/address space neighbor, or that of somebody using my wireless hotspot once, or that of me clicking a random link once and connecting to 671 affiliated advertisers's analytics servers...

I think a default policy of "no inbound connections" does makes sense for most mobile users. It should obviously be configurable.


putty is sending packets for network up since like forever




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: