When I tried something similar, I found that Google's profile-guided compilation seemed to really affect the performance of their builds. With my own build I couldn't match their performance, presumably because I didn't have as good a profile, but maybe because they had some spiffy compiler optimizations they were keeping close to their chest.
Also, a GPU is an economic requirement: it doesn't make a huge difference to user-perceived performance, but it makes a big difference to CPU usage, since all the render worker threads suddenly drop from 100% CPU to near zero. That lets you put more users on the same VM. I was also experimenting with some ideas for 'remote rendering', i.e. sending the rendering work to a remote machine with a GPU. I got it working for basic image/text rendering, but not for WebGL, canvas, or video encode/decode.
In general, a single GPU can handle hundreds of browsing sessions, whereas a CPU can only handle tens of sessions, so costs can be cut quite a lot by decoupling them. Decoupling also gives you better bin-packing.
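To put rough numbers on that, here's a back-of-the-envelope sketch. The per-host and per-GPU session counts are made up purely for illustration (just "tens" and "hundreds"), not measurements, and the function name is mine:

```python
import math

# Hypothetical capacities, chosen only to match "tens" vs "hundreds".
SESSIONS_PER_CPU_HOST = 20   # assumption: sessions one CPU host can serve
SESSIONS_PER_GPU = 400       # assumption: sessions one GPU can serve


def units_needed(sessions: int, capacity: int) -> int:
    """Smallest number of units of the given capacity that fit all sessions."""
    return math.ceil(sessions / capacity)


sessions = 1000

# CPU hosts are needed either way.
cpu_hosts = units_needed(sessions, SESSIONS_PER_CPU_HOST)

# Coupled: every CPU host carries its own GPU, so GPU count follows host count.
coupled_gpus = cpu_hosts

# Decoupled: GPUs sit in a shared pool sized by GPU capacity alone.
decoupled_gpus = units_needed(sessions, SESSIONS_PER_GPU)

print(cpu_hosts, coupled_gpus, decoupled_gpus)  # prints: 50 50 3
```

With these invented figures you'd pay for 50 GPUs when they're tied to the CPU hosts and only 3 when they're pooled, which is where the saving comes from. The real ratio depends entirely on the workload.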