Its comparing 1 dimensional space to 2 demential space. So any comparison is going to be a little hand-wavy. But temporally, I believe the comparison is apt.
> Its comparing 1 dimensional space to 2 demential space
Although I love the coinage "2 demential space", I think you mean "comparing one-dimensional space [audio] to three-dimensional space [video]". A two-dimensional signal might be a still image or a temporal sequence of samples from a one-dimensional array of sensors, such as those in a single slice of a CT machine or a linear MIMO antenna array. A video signal is three-dimensional, not two-dimensional, and probably not "2 demential" either.
No, wavelength is just one more dimension along which intensity may vary (in addition to X, and Y, and time), not five or six more dimensions, so a multi-band image is only three-dimensional, regardless of whether there are three wavelength bands (like RGB or YCbCr), four (like RGBA), 8 (like Landsat), or 210 (like HYDICE, AVIRIS, and other imaging spectrometers).
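To make the counting concrete, here is a rough NumPy sketch. The array names and the tiny sizes are made up for illustration, and the axis order is just one possible convention. The point is only that adding bands makes one axis longer; it does not add axes.

```python
import numpy as np

# Hypothetical toy signals; sizes are kept tiny on purpose.
audio = np.zeros(16)                  # axis: time
gray_still = np.zeros((4, 6))         # axes: Y, X
rgb_still = np.zeros((4, 6, 3))       # axes: Y, X, wavelength band
spectral_still = np.zeros((4, 6, 210))  # same three axes, 210 bands
gray_video = np.zeros((5, 4, 6))      # axes: time, Y, X

signals = {
    "audio": audio,
    "gray_still": gray_still,
    "rgb_still": rgb_still,
    "spectral_still": spectral_still,
    "gray_video": gray_video,
}

# Number of axes along which intensity may vary, per signal.
ndims = {name: arr.ndim for name, arr in signals.items()}
print(ndims)
```

Three bands or 210, the multi-band still has the same number of axes; only the length of the last one changes.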
My point is, I am comparing temporal dimension to temporal dimension, regardless of how many spatial dimensions there are. And I don’t understand the argument that an audio sample is more analogous to a pixel than it is to a frame on a timeline.
In particular, in data processing, all dimensionality is equivalent, since an infinite set S has the same cardinality as S^n for any positive whole number n, and any finite set is smaller than the 1-dimensional set of naturals.
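For the countable case with n = 2, the cardinality claim can be checked by hand with the Cantor pairing function, which maps pairs of naturals one-to-one onto the naturals. This is only a sketch of that one case; the function names `pair` and `unpair` are my own.

```python
from math import isqrt

def pair(x, y):
    """Cantor pairing: map a pair of naturals to a single natural."""
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    """Inverse of pair: recover (x, y) from a single natural."""
    w = (isqrt(8 * z + 1) - 1) // 2   # index of the diagonal x + y = w
    t = w * (w + 1) // 2              # first code on that diagonal
    y = z - t
    x = w - y
    return x, y

# Round trip over a small grid of pairs.
round_trip_ok = all(
    unpair(pair(x, y)) == (x, y) for x in range(50) for y in range(50)
)

# The first 10 diagonals (x + y <= 9) hit every natural below 55 exactly once.
codes = sorted(pair(x, y) for x in range(10) for y in range(10) if x + y <= 9)
print(round_trip_ok, codes == list(range(55)))
```

Applying `pair` repeatedly extends the same idea from pairs to n-tuples.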
Yeah, at least if Hilbert spaces can fuck off, which is why we can approximate signal processing on digital computers at all. And, because of space-filling curves, in some sense ℝⁿ is equivalent to ℝ. But, to understand signal processing, a much more useful point of view is that ℝⁿ is significantly different for different values of n, but not completely unrelated; and ℤⁿ is a useful approximation of ℝⁿ, as is (ℤ/mℤ)ⁿ.
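As one small illustration of why (ℤ/mℤ)ⁿ is a handy stand-in, here is a sketch for n = 1 and m = 8 (both picked arbitrarily): convolution with indices taken mod m agrees with pointwise multiplication under the discrete Fourier transform. The helper `circular_convolve` is hypothetical, written out the slow way so the modular indexing is visible.

```python
import numpy as np

def circular_convolve(a, b):
    """Convolve two length-m sequences with indices taken mod m."""
    m = len(a)
    return np.array(
        [sum(a[j] * b[(i - j) % m] for j in range(m)) for i in range(m)]
    )

m = 8  # arbitrary modulus, chosen only for illustration
rng = np.random.default_rng(0)
a = rng.standard_normal(m)
b = rng.standard_normal(m)

direct = circular_convolve(a, b)

# Same result through the DFT on Z/mZ: multiply spectra, transform back.
via_fft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

print(np.allclose(direct, via_fft))
```

The wraparound at the edges is the price of pretending the signal lives on a cycle rather than on all of ℤ.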