Interesting, thanks for that; it's pretty much what I guessed (especially the bit about the supervision tree and hot-code-upgrade advantages being mooted by your infrastructure.)
On a tangent, though:
> What it does not have is type safety
I've tried to work this out before (I'm designing a new language for Erlang's VM), but as far as I can tell, type safety is in practice incompatible with VM-supported hot code upgrade.
If you have two services, A and B, and you need to upgrade them both, but you can't "stop the world" to do an atomic upgrade of both A and B together (because you're running a distributed soft real-time system, after all), then you need to switch out A, and then switch out B.
So, at some point, on some nodes, A will be running a version with an ABI incompatible with B. In a strongly-typed system, the VM wouldn't allow A's new code to load, since it refers to functions in B with type signatures that don't exist.
On the other hand, in a system with pattern-matching and a "let it crash" philosophy, you just let A's new code start up and repeatedly try-and-fail to communicate with B for a while, until B's code gets upgraded as well--and now the types are compatible again.
> type safety is in practice incompatible with VM-supported hot code upgrade.
That's not true.
First, it's very easy to hot reload changes that have been made to the code that are backward compatible. The JVM spec describes in very specific details what that means (adding or removing a method is not backward compatible, modifying a body is, etc...).
This is how Hotswap works, the JVM has been using it for years.
As for changes that are backward incompatible, you can still manage them with application level techniques, such as rolling out servers or simply allow two different versions of the class to exist at the same time (JRebel does that, as do other a few other products in the JVM ecosystem).
Erlang doesn't really have any advantages over statically typed systems in the hot reload area, and its lack of static typing is a deal breaker for pretty much any serious production deployment.
> lack of static typing is a deal breaker for pretty much any serious production deployment.
Are you talking about Google only where they made it a mandate or in general? There are serious production deployments on Python, Ruby, Erlang and Javascript.
I will trade expressiveness and less lines of code with a strong but dynamically typed language + tests over more a static typed language with more lines of code all being equal.
Or put it another way, if strong typing is the main thing that protects against lack of faults and crashes in production, there is a serious issue that needs to be addressed (just my 2 cents).
> As for changes that are backward incompatible, you can still manage them with application level techniques, such as rolling out servers or simply allow two different versions of the class to exist at the same time (JRebel does that, as do other a few other products in the JVM ecosystem).
Neither of these allow for the whole reason Erlang has hot code upgrade in the first place: allowing to upgrade the code on one side of a TCP connection without dropping the connection to the other side. Tell me how to do that with a static type system :)
Tomcat (and other app servers) has support for doing hot reloads of Java web apps while not reloading the HTTP layer (and not dropping TCP connections).
I have implemented a similar system for JRuby apps running inside a Servlet container. There are many caveats. I don't actually recommend it because for a while you're using nearly twice the memory (and JRuby is particularly memory hungry). Also there are many ways to leak the old class definitions such that they are not GC'd (e.g. thread locals). But it's certainly possible.
I suspect that Erlang, Java, and all languages are in the same boat: some parts can be upgraded live in the VM while other parts require a full restart (maybe coordinating with multiple nodes and a load balancer to achieve zero-downtime).
Out of curiosity, where/why would such an exotic feature be needed in today's internet architectures where you always front a group of servers with a load balancer ?
Not all Internet protocols are HTTP. If you're running a service where long-lived connections are the norm, "simply fronting a bunch of servers with a load balancer" can require a pretty smart load balancer. E.g. IMAP connections often last hours or even days, and are required to maintain a degree of statefulness.
Not everything is a website! Also, not everything is stateless. Consider writing a chat application for the web for example and letting users on one page communicate with another one.
On a tangent, though:
> What it does not have is type safety
I've tried to work this out before (I'm designing a new language for Erlang's VM), but as far as I can tell, type safety is in practice incompatible with VM-supported hot code upgrade.
If you have two services, A and B, and you need to upgrade them both, but you can't "stop the world" to do an atomic upgrade of both A and B together (because you're running a distributed soft real-time system, after all), then you need to switch out A, and then switch out B.
So, at some point, on some nodes, A will be running a version with an ABI incompatible with B. In a strongly-typed system, the VM wouldn't allow A's new code to load, since it refers to functions in B with type signatures that don't exist.
On the other hand, in a system with pattern-matching and a "let it crash" philosophy, you just let A's new code start up and repeatedly try-and-fail to communicate with B for a while, until B's code gets upgraded as well--and now the types are compatible again.
It's an interesting problem.