if your application needs to be up a lot then you will also care about memory fragmentation - as far as I know most benchmarks do not focus on such problems. You still must test it under real live conditions! No way around that.
So true. Inside Google, it used to be the case that every N months someone had to figure how to build their Python daemon with Blaze (Bazel) using tcmalloc. It was painful and I hope it's easier nowadays. Without it, after a few weeks you'd see memory leaks whose actual root cause was fragmentation. I can't remember how they interacted with the allocator in Python, but I always suspected that heavy use of protocol buffers made the problem a lot more acute.