
Shouldn’t really matter too much if the response is being compressed.

If you’re rendering the table in the DOM, the response size is the least of your issues.



This is the real answer. All of the other answers are suggesting various changes to the JSON structure to eliminate key repetition, but this is irrelevant under compression.

Where it becomes relevant is if each record is stored as a separate document, so you can't compress them all together. Compressing each record separately won't eliminate the duplication, so you're better off with either a columnar format (like a typical database) or a schema-based format (like protobuf).
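To make the "irrelevant under compression" point concrete, here's a rough sketch (Python, with made-up field names) comparing gzip sizes of the repeated-key form against an array-of-arrays form:

```python
import gzip
import json

# 1,000 identical records with repeated keys, then the same data as arrays.
records = [{"key1": "a", "key2": "b"} for _ in range(1000)]
as_objects = json.dumps(records).encode()
as_arrays = json.dumps([["a", "b"]] * 1000).encode()

# Uncompressed, the object form is a few times larger; compressed,
# gzip's dictionary collapses the repeated keys almost entirely.
print(len(as_objects), len(gzip.compress(as_objects)))
print(len(as_arrays), len(gzip.compress(as_arrays)))
```

The uncompressed object form is roughly 2-3x the size of the array form, but the compressed sizes land in the same ballpark, which is the point being made above.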


To parse it, you need to check the keys. If there can't be any other keys, you can just use an array (array order is stable in JSON) and save on the keys.

So you just have an array of arrays. Or even a huge array and every X elements, it’s a new record.

If each one has 2 keys,

    [
        {
            "key1": "a",
            "key2": "b"
        },
        {
            "key1": "a",
            "key2": "b"
        }
    ]
Can become,

    [
        [
            "a",
            "b"
        ],
        [
            "a",
            "b"
        ]
    ]
Or, with every 2 elements forming a new record,

    [
        "a",
        "b",
        "a",
        "b"
    ]
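For what it's worth, converting between these shapes is mechanical as long as both sides agree on a fixed key order. A quick sketch (Python, with the hypothetical keys from the example above):

```python
KEYS = ["key1", "key2"]  # fixed, agreed-upon key order

objects = [{"key1": "a", "key2": "b"}, {"key1": "a", "key2": "b"}]

# object form -> array-of-arrays -> flat array
rows = [[obj[k] for k in KEYS] for obj in objects]
flat = [value for row in rows for value in row]

# flat array -> object form (every len(KEYS) elements is one record)
rebuilt = [dict(zip(KEYS, flat[i:i + len(KEYS)]))
           for i in range(0, len(flat), len(KEYS))]
assert rebuilt == objects
```

The flat form only works if every record has exactly the same fields in the same order, which is the "there can't be other keys" condition mentioned above.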


But why? Why save on keys when compression will nearly eliminate them for you?


Compression mainly helps with transmission.

I was trying to point out that the original structure allows for more flexibility.

If you only cared about space, this compresses better anyway, and uncompressed it still occupies less space.


Sometimes you fetch a large dataset and only show one page at a time in the DOM, or render it as a line in a chart or something. At a previous workplace we had CSV responses in the hundreds of megabytes.


70 GB CSV files aren't uncommon at my work. It's not really a problem since CSV streams well.


That sounds incredibly inefficient.

What was the rationale for such enormous single payloads?


Without knowing more about the application, I'd guess caching and/or scaling. If you only need one payload, it can be statically generated and cached in your CDN, which in turn reduces your dependence on the web servers, so fewer nodes are required and/or you can scale the site more easily to demand. Compute time is also more expensive than CDN bandwidth, so there may well be some cost savings too.


This was basically it. The dataset was the same across users, so caching was simple and efficient, and the front-end had no difficulty handling that much data (paging client-side was also snappier than requesting anew each time).



