I don't think GP is asserting that the multimodal encoding is "more rich" or "more accurate". I think they are saying that the felt modality is an entirely different thing from the text modality, and that the former isn't contained in the latter.
Language encodes what people need it to encode to be useful. One example I've heard of is color: some languages don't even have a separate word for blue.
Huh, text definitely encodes multimodal experiences; it's just not as accurate or as rich an encoding as the encodings of real sensations.