
Shouldn’t really matter too much if the response is being compressed.

If you’re rendering the table in the DOM, the response size is the least of your issues.



This is the real answer. All of the other answers are suggesting various changes to the JSON structure to eliminate key repetition, but this is irrelevant under compression.

Where it becomes relevant is if each record is stored as a separate document, so you can't compress them all together. Compressing each record separately won't eliminate the duplication, so you're better off with either a columnar format (like a typical database) or a schema-based format (like protobuf).
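To make the "irrelevant under compression" point concrete, here's a rough sketch (Python, with made-up field names) comparing gzip sizes of the repeated-key form against an array-of-arrays form:

```python
import gzip
import json

# 1,000 identical records with repeated keys, then the same data as arrays.
records = [{"key1": "a", "key2": "b"} for _ in range(1000)]
as_objects = json.dumps(records).encode()
as_arrays = json.dumps([["a", "b"]] * 1000).encode()

# Uncompressed, the object form is a few times larger; compressed,
# gzip's dictionary collapses the repeated keys almost entirely.
print(len(as_objects), len(gzip.compress(as_objects)))
print(len(as_arrays), len(gzip.compress(as_arrays)))
```

The uncompressed object form is roughly 2-3x the size of the array form, but the compressed sizes land in the same ballpark, which is the point being made above.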


To parse it, you need to check the keys. If there can't be any other keys, you can just use an array (array order is stable in JSON) and save on the keys.

So you just have an array of arrays. Or even a huge array and every X elements, it’s a new record.

If each one has 2 keys,

    [
        {
            "key1": "a",
            "key2": "b"
        },
        {
            "key1": "a",
            "key2": "b"
        }
    ]
Can become,

    [
        [
            "a",
            "b"
        ],
        [
            "a",
            "b"
        ]
    ]
Or, with every 2 elements forming a new record,

    [
        "a",
        "b",
        "a",
        "b"
    ]
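For what it's worth, converting between these shapes is mechanical as long as both sides agree on a fixed key order. A quick sketch (Python, with the hypothetical keys from the example above):

```python
KEYS = ["key1", "key2"]  # fixed, agreed-upon key order

objects = [{"key1": "a", "key2": "b"}, {"key1": "a", "key2": "b"}]

# object form -> array-of-arrays -> flat array
rows = [[obj[k] for k in KEYS] for obj in objects]
flat = [value for row in rows for value in row]

# flat array -> object form (every len(KEYS) elements is one record)
rebuilt = [dict(zip(KEYS, flat[i:i + len(KEYS)]))
           for i in range(0, len(flat), len(KEYS))]
assert rebuilt == objects
```

The flat form only works if every record has exactly the same fields in the same order, which is the "there can't be other keys" condition mentioned above.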


But why? Why save on keys when compression will nearly eliminate them for you?


Compression mainly helps with transmission.

I was trying to point out that the original structure allows for more flexibility.

If you only cared about space, this compresses better anyway, and uncompressed it still occupies less space.


Sometimes you fetch a large dataset and only show one page at a time in the DOM, or render it as a line in a chart or something. At a previous workplace we had CSV responses in the hundreds of megabytes.


70 GB CSV files aren't uncommon at my work. It's not really a problem since CSV streams well.


That sounds incredibly inefficient.

What was the rationale for such enormous single payloads?


Without knowing more about the application, I'd guess caching and/or scaling. If you only need one payload, it can be statically generated and cached in your CDN, which in turn reduces your dependence on the web servers, so fewer nodes are required and/or you can scale the site more easily to demand. Compute time is also more expensive than CDN bandwidth, so there may well be some cost savings too.


This was basically it. The dataset was the same across users, so caching was simple and efficient, and the front-end had no difficulty handling that much data (paging client-side was also snappier than requesting anew each time).



