As an analogy, what if all text were served to you as .png files, with the writer guessing what text size you want?
This is roughly equivalent: it is difficult to go from image -> text, just as it is difficult (or impossible) to go from low dynamic range -> high dynamic range. The reverse direction is cheap (and always possible).
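To make the "impossible" case concrete, here is a minimal sketch using a brick-wall limiter, the extreme form of compression. The function name `hard_limit` and the sample values are illustrative assumptions, not taken from any real codec. Two different inputs come out identical, so no decoder can tell afterwards which one was the original:

```python
def hard_limit(samples: list[float], ceiling: float) -> list[float]:
    """Clamp each sample to [-ceiling, ceiling] (a brick-wall limiter)."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


# Two signals that differ only in how loud their peak is.
quiet_peak = [0.1, 0.9, -0.2]
loud_peak = [0.1, 1.0, -0.2]

ceiling = 0.8
print(hard_limit(quiet_peak, ceiling))  # [0.1, 0.8, -0.2]
print(hard_limit(loud_peak, ceiling))   # [0.1, 0.8, -0.2]
```

Applying the limiter is a one-liner (the cheap direction); undoing it would require recovering information that the mapping has thrown away.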
In fact, some audio codecs even specify well-defined dynamic range compression curves for the decoder to apply, so your codec isn't just "bitstream in, PCM out"; it is "bitstream + listening environment in, PCM out".
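A rough sketch of the "bitstream + listening environment in" idea, under heavy assumptions: the profile names, thresholds and ratios below are hypothetical (real codecs define their own curves), bitstream decoding is skipped entirely, and levels in dBFS stand in for PCM samples. A real decoder would also smooth gain changes with attack/release times, which this static curve omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionCurve:
    threshold_db: float  # levels above this are compressed
    ratio: float         # 1.0 means no compression

    def apply(self, level_db: float) -> float:
        """Static compression curve: above the threshold, divide the excess by the ratio."""
        if level_db <= self.threshold_db:
            return level_db
        return self.threshold_db + (level_db - self.threshold_db) / self.ratio


# Hypothetical listening-environment profiles (not from any codec spec).
PROFILES = {
    "cinema": CompressionCurve(threshold_db=0.0, ratio=1.0),
    "living_room": CompressionCurve(threshold_db=-20.0, ratio=2.0),
    "late_night": CompressionCurve(threshold_db=-30.0, ratio=4.0),
}


def render(levels_db: list[float], environment: str) -> list[float]:
    """Shape already-decoded levels with the curve chosen for the environment."""
    curve = PROFILES[environment]
    return [curve.apply(level) for level in levels_db]


def dynamic_range(levels_db: list[float]) -> float:
    return max(levels_db) - min(levels_db)


decoded = [-40.0, -20.0, -6.0]  # quiet dialogue, music, explosion (dBFS)
for env in PROFILES:
    print(env, render(decoded, env), dynamic_range(render(decoded, env)))
# Ranges printed: cinema 34.0, living_room 27.0, late_night 16.0
```

The same decoded content yields a narrower range the more constrained the listening environment is, which is why the environment has to be an input to the decoder rather than something baked into the stream.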
Isn't it more like the artist wrote the text in all kinds of font sizes, so the reader has to keep zooming in and out to read it comfortably? Compression would narrow the range of font sizes so that one zoom level works reasonably well for the whole text.