It's a perfectly pragmatic engineering choice. Blocking is visible only when the compression is too heavy. When degradation is imperceptible, the block edges are imperceptible too, and the problem doesn't need to be solved (in JPEG, "imperceptible" still means roughly a 10:1 reduction in data size).
Later compression algorithms were focused on video, where the aim was to have good-enough low-quality approximations.
Deblocking is an inelegant hack.
Deblocking hurts high-quality compression of still images, because it makes it harder for codecs to precisely reproduce the original image. The blur removes detail that the per-block decode produced, so the codec has to either disable deblocking or compensate with exaggerated contrast (which is still only an approximation). It also adds a dependency across blocks, which turns the problem from an independent per-block computation into a search for a global optimum that keeps flipping between the frequency domain and pixel-level hacks. It's no longer a neat mathematical transform with a closed-form solution, but a pile of iterative guesswork (or it's simply not taken into account at all, and the codec wins PSNR benchmarks and looks good in side-by-side comparisons at the 10% quality level, but is an auto-airbrushing, texture-destroying annoyance when used on real images).
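To make the coupling concrete, here's a toy deblocking pass (a sketch only: fixed 8-pixel blocks and a constant blend strength, nothing like the adaptive in-loop filters real codecs use, and `deblock_rows` is just an illustrative name). Blending pixels across every block edge is exactly what smears detail near the boundary and makes each block's result depend on its neighbour.

```python
import numpy as np

# Toy deblocking pass over a greyscale image: blend the two pixels on either
# side of every vertical 8-pixel block edge. Real in-loop filters are
# adaptive, but the cross-block dependency they introduce is the same.
def deblock_rows(img, block=8, strength=0.5):
    out = img.astype(np.float64).copy()
    w = out.shape[1]
    for x in range(block, w, block):          # every interior block edge
        left = out[:, x - 1].copy()
        right = out[:, x].copy()
        avg = (left + right) / 2.0
        out[:, x - 1] = (1 - strength) * left + strength * avg
        out[:, x] = (1 - strength) * right + strength * avg
    return out
```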
The Daala project tried to reinvent it with better mathematical foundations (lapped transforms), but in the end a post-processing pass of blurring the pixels has won.
I only recently learned that JPEG and MPEG-1 were designed for near-lossless compression, so the massive bitrate reductions which came further down the road had nothing to do with the original design.
"Inelegant" is the right word; it's hard to shake off the feeling that we might have missed something important. I suspect the next big breakthrough might be waiting for researchers to focus on lower-quality compression specifically, rather than requiring every new codec to improve the state of the art in near-lossless compression.
> for researchers to focus on lower-quality compression specifically
JPEG XL already does this: its VarDCT mode (variable-size DCT) uses adaptive block sizes from 2×2 up to 256×256. Large smooth areas get huge blocks and fine detail gets small ones, so JXL spends bits where your eyes care most instead of spreading them evenly across the image. It also has a number of techniques aimed specifically at keeping edges sharp.
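A hypothetical sketch of the adaptive-block-size idea (this is not JPEG XL's actual decision logic, which does a rate-distortion search; here local variance just stands in for "detail", and `choose_blocks` is a made-up helper):

```python
import numpy as np

# Recursively split a square region while it contains too much detail,
# so smooth areas end up as big blocks and busy areas as small ones.
def choose_blocks(img, x, y, size, var_threshold=50.0, min_size=8, out=None):
    if out is None:
        out = []
    tile = img[y:y + size, x:x + size]
    if size > min_size and tile.var() > var_threshold:
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                choose_blocks(img, x + dx, y + dy, half,
                              var_threshold, min_size, out)
    else:
        out.append((x, y, size))   # quiet (or minimum-size) region: one block
    return out

# Example: a flat 256x256 image with one noisy patch. The flat areas stay as
# large blocks; the patch gets split down towards 8x8.
img = np.zeros((256, 256))
img[96:160, 96:160] = np.random.default_rng(0).normal(0, 40, (64, 64))
blocks = choose_blocks(img, 0, 0, 256)
```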
JPEG XL achieves about half the bitrate of an equal-quality JPEG, even at lower quality levels. That's a real achievement, but the complexity cost is high; I'd estimate that JPEG XL decoders are at least ten times more complex than JPEG decoders. Modern lossy image codecs are "JPEG, with three decades of patch notes" :-)
I think we're badly in need of an entirely new image compression technique; the block-based DCT has serious flaws, such as its high coding cost for edges and its tendency to create block artefacts. The modern hardware landscape is quite different from 1992, so it's plausible that the original researchers might have missed something important, all those years ago.
The really big problem with blocking is that it introduces very visible artifacts in dark backgrounds, and they're the kind of artifact that draws your attention. Part of the problem is that 8-bit sRGB isn't quite sufficient to prevent visible banding in dark regions without dithering, so when you add blocking artifacts on top of already slightly visible banding, the result is a jagged, attention-grabbing mess.
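Back-of-the-envelope numbers for the banding part (a rough sketch; dark adaptation makes real perception messier than a single Weber fraction):

```python
# Relative jump in linear light between adjacent 8-bit sRGB code values:
# much bigger near black than at mid-grey, which is why dark gradients band.
def srgb_to_linear(code):
    v = code / 255.0
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

for c in (10, 128):
    lo, hi = srgb_to_linear(c), srgb_to_linear(c + 1)
    print(f"code {c} -> {c + 1}: relative step {100 * (hi - lo) / lo:.1f}%")
# Roughly 10% per step near black vs roughly 2% at mid-grey.
```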
Deblocking is inelegant, but blur is a much less noticeable artifact than blocks. That said, the best answer turns out to be having the input image in 10 bit and having encoders/decoders work at higher internal bit depths, which allows the encoder to make smarter choices about what detail is real and gives the decoder some information from which it can more intelligently dither the decoded image.
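A minimal sketch of the dithering part (plain random dither, not any particular decoder's method): quantising a dark 10-bit ramp straight to 8 bits bands visibly, while adding sub-LSB noise before rounding trades the bands for fine grain.

```python
import numpy as np

rng = np.random.default_rng(0)
grad10 = np.tile(np.linspace(0, 64, 1024), (64, 1))   # dark end of a 10-bit ramp

naive    = np.round(grad10 / 4.0)                       # 10-bit -> 8-bit, visible steps
dithered = np.clip(np.round(grad10 / 4.0 +
                            rng.uniform(-0.5, 0.5, grad10.shape)), 0, 255)
```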
IIUC AV2 is trying to resurrect the Daala deblocking work. I think JPEG XL also has some good stuff here (but I don't remember exactly what).