Intuitive explanation: if you interpret a time series as the coefficients of a polynomial, the Fourier transform is just polynomial multiplication (this makes sense because polynomial multiplication is just convolution). Now we can write base b numbers as polynomials with single-digit coefficients, evaluated at x = b. Then after multiplying two such numbers/polynomials the coefficients are too large. The only thing remaining is too carry the overflows in the digits to the next digit (much like the school algorithm for adding numbers).
The Fourier transform isn't polynomial multiplication. (It operates on one thing, not two.) What the Fourier transform does is to translate between two representations of a polynomial: (1) its coefficients, (2) its values at a suitable set of points. And then the thing that makes FFT multiplication work is that pointwise multiplication of values is cheaper and simpler than the convolution operation that gives you the coefficients of the product of two polynomials.
Every once in a while I pick up "The Scientist and Engineer's Guide to Digital Signal Processing" (freely downloadable: http://www.dspguide.com/) and it's always so satisfying to see the different relations between domains and how you can think about it. Your example is no different.
But since I never get to actually use that stuff in my just, I always forget the details.
Thanks for that alternate way of thinking about it. As an engineer, half of my intuition about the FFT comes from the notion of the "frequency domain", and the other half from "change of basis".
The explanation you gave, of FFT as an alternative representation of a polynomial through its value at selected points, gives that nice intuition about pointwise multiplication, without appealing (directly) to the "convolution-turns-into-multiplication" catchphrase.