Basically, if there are multiple IPs behind a hostname, connect a non-blocking socket to each, then do a select/poll/epoll/kqueue on all of them. Then, as soon as one of them connects, immediately close the rest and use the newly established connection.
This has three nice side effects. First, if one of the hosts is down, it will never be selected, unlike now, when there is a 1 in N chance that you'll be stuck waiting for a very long time. Second, you don't need to explicitly check whether the IPv4 and IPv6 stacks are operational. The connection that is returned to you is the one that works, regardless of the underlying protocol. Third, this provides crude load balancing. Presumably the host that connected the fastest is also the fastest to process your request. See my blog post for some numbers.
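For anyone who wants to try the idea, here is a minimal Python sketch of the race described above. It is an illustration under some assumptions, not what any browser actually does: `connect_race` is a made-up name, the addresses are assumed to be numeric IP literals that have already been resolved, and failure detection through write-readiness plus `SO_ERROR` is assumed to behave as it does on POSIX-like systems.

```python
import errno
import selectors
import socket
import time


def connect_race(addresses, timeout=5.0):
    """Start a non-blocking connect to every (ip, port) pair and return
    (sock, address) for the first one that succeeds.

    Raises OSError if no address connects before the timeout.
    """
    sel = selectors.DefaultSelector()  # picks epoll/kqueue/poll/select
    pending = {}                       # socket -> address still in flight
    try:
        for addr in addresses:
            # Assumption: a ':' in the host part means an IPv6 literal.
            family = socket.AF_INET6 if ":" in addr[0] else socket.AF_INET
            sock = socket.socket(family, socket.SOCK_STREAM)
            sock.setblocking(False)
            err = sock.connect_ex(addr)
            if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
                sock.close()           # failed immediately, skip it
                continue
            # A connecting socket becomes writable once the attempt
            # has either succeeded or failed.
            sel.register(sock, selectors.EVENT_WRITE)
            pending[sock] = addr

        deadline = time.monotonic() + timeout
        while pending:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            for key, _ in sel.select(remaining):
                sock = key.fileobj
                sel.unregister(sock)
                addr = pending.pop(sock)
                # SO_ERROR tells success (0) apart from a failed attempt.
                if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0:
                    sock.setblocking(True)
                    return sock, addr  # winner; losers are closed below
                sock.close()
        raise OSError("no address accepted a connection")
    finally:
        # Close every attempt that did not win the race.
        for sock in pending:
            sock.close()
        sel.close()
```

You would call it with the already-resolved list, e.g. `sock, addr = connect_race([(ip, 443) for ip in ips])`. Note that this opens N connections at once, which is exactly the extra load and complexity trade-off being argued about in this thread.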
Yup! In fact, I didn't cover it in the writeup, but Chrome and other browsers already do this kind of thing on multiple layers: across IPv4 and IPv6 (see Happy Eyeballs), and to recover from lost SYNs.
Interesting. One of the apps I run is split between two servers. When I was bringing everything up, if one of the servers was down there was always a 50% chance of about a 40-second blank screen before the other server was picked up. If the server that was chosen was in the DNS cache, there was a 100% chance of a blank screen. Admittedly, this was about 2 years ago, so things might have changed, but the reliability properties didn't seem to be there at the time.
We've considered this before, and the performance gains do not appear worthwhile (we've run much more extensive tests than you have, on top websites, to evaluate potential gains). It also adds complexity, it is more of a Hampering Eyeballs approach than a Happy Eyeballs approach, it has potential web compat issues (many sites will not expect you to connect to IPs other than the first), etc.
Sorry, but I can't tell from your profile who "we" is. Are any of your results public? Also, what do you mean by "many sites will not expect you to connect to IPs other than the first"? My OS's DNS resolver randomizes the A/AAAA records I get on every resolution. The website operator cannot control which IP I will connect to. If you mean that websites don't expect two simultaneous connections, then I would be curious how two different servers would coordinate the fact that each got a TCP connection from the same IP at any scale. How does that interact with NAT? For example, I believe some IBM campuses are behind giant NATs with a single IP address per building or some such. Which websites get confused by this?
Sorry, I'm a Chromium dev, and in particular I'm the net maintainer who authored the majority of the current connection management code you are discussing. Our results are not public. The web compat concerns are with more exotic, enterprise configurations, not public ones. They are not the most compelling concerns though.