Not even remotely an expert on chip design, but deep learning dataflow is a lot more predictable and linear than what a CPU or even a GPU doing actual graphics needs to do. Speed of light latency is probably not an issue since the relevant distances are not one side of the die to the other but rather the distance between adjacent components.
Or is the game plan to make them bigger & nobody cares cause no heat & fast?