The benefit of CNNs is like the benefits of SVMs -- they generalize all the great old techniques so you don't have to understand them all, you just throw more CPU at the optimization problem.
I don't think that's true, particularly in this case.
This paper is pointing out that you can encode a structural prior in a CNN, but knowing the "great old techniques" will help you design the right network architecture to do that.
SVMs were a surprise when they came out, not so much a generalization as a challenge (at least at first).