In K8s you get load balancing and rolling deploys for free. What you do is update the Deployment with a new Docker image, and yes, the pods are replaced immutably rather than patched in place. In the case of an HTTP service, k8s waits until a new pod (container) reports healthy (via its readiness probe, if you have one configured) before it is put into rotation. Then it steps down the old pods and replaces them according to your rolling update settings. You define those yourself, e.g. a maximum number of pods allowed during the rollout and a minimum number of pods that must stay up (set through maxSurge and maxUnavailable).
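To make the max/min idea concrete, here is a rough Python sketch of the arithmetic behind `maxSurge` and `maxUnavailable`. The function names are made up for illustration and this is not Kubernetes code; it only assumes the documented rounding (percentage `maxSurge` rounds up, percentage `maxUnavailable` rounds down) and the 25% defaults, and it ignores edge cases.

```python
def resolve(value, replicas, round_up):
    """Turn an absolute number or a percentage string into a pod count.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas  # integer math avoids float rounding
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (min_available, max_total) pods during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas - unavailable, replicas + surge


if __name__ == "__main__":
    # 10 replicas with the defaults: at least 8 ready, at most 13 in total.
    print(rollout_bounds(10))
    # 3 replicas, one extra pod allowed, none may go down.
    print(rollout_bounds(3, max_surge=1, max_unavailable=0))
```

So with `maxSurge: 1` and `maxUnavailable: 0` the rollout never drops below your desired replica count, at the cost of briefly running one extra pod.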