Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> When will PostgreSQL support column-oriented tables?

Citus or Timescale extension?

- Timescale: https://news.ycombinator.com/item?id=21412596

- Citus: https://news.ycombinator.com/item?id=26369305

-----------

"Citus 10 brings columnar compression to Postgres"

https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-...

"The following options are available: compression: [none|pglz|zstd|lz4|lz4hc] - set the compression type for newly-inserted data. Existing data will not be recompressed/decompressed. The default value is zstd (if support has been compiled in). compression_level: <integer> - Sets compression level. Valid settings are from 1 through 19. If the compression method does not support the level chosen, the closest level will be selected instead. stripe_row_limit: <integer> - the maximum number of rows per stripe for newly-inserted data. Existing stripes of data will not be changed and may have more rows than this maximum value. The default value is 150000. chunk_group_row_limit: <integer> - the maximum number of rows per chunk for newly-inserted data. Existing chunks of data will not be changed and may have more rows than this maximum value. The default value is 10000."

read more: https://github.com/citusdata/citus/tree/master/src/backend/c...



The columnar implementations are cool and all, but they don't really do what you'd expect columnar storage to do—i.e. make each column independently retrievable / make each column effectively its own "single data column plus shared CTID" table for the dataset. Instead, when you fetch one column of a row from e.g. a Citus "USING columnar" table, that still bakes down to the query engine fetching the entire row-tuple for that row (and then discarding most of it), which in turn means that the columnar storage engine must read in + decompress the equivalent compressed blocks for every column participating in that row.

So, unlike in a columnar database, you still have to worry about splitting up your narrow+hot "fact" columns from your wide+cold "dimension" columns, and joining them back together later only when you need them, in order to not be polluting your page cache with the wide+cold column data.


Maybe I'm misunderstanding the feature that you would like to have, but I think Citus columnar supports what you want. Citus columnar storage has projection pushdown, so it only fetches the columns that you requested in your query.

To quote from the blogpost[1]:

What are the Benefits of Columnar?

- Compression reduces storage requirements - Compression reduces the IO needed to scan the table - Projection Pushdown means that queries can skip over the columns that they don’t need, further reducing IO - Chunk Group Filtering allows queries to skip over Chunk Groups of data if the metadata indicates that none of the data in the chunk group will match the predicate. In other words, for certain kinds of queries and data sets, it can skip past a lot of the data quickly, without even decompressing it!

Disclaimer: I work on the Citus team at Microsoft, but have not worked on the columnar storage.

[1]: https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: