> When will PostgreSQL support column-oriented tables? Citus or Timescale extens...

derefr · on Nov 8, 2021

The columnar implementations are cool and all, but they don't really do what you'd expect columnar storage to do—i.e. make each column independently retrievable / make each column effectively its own "single data column plus shared CTID" table for the dataset. Instead, when you fetch one column of a row from e.g. a Citus "USING columnar" table, that still bakes down to the query engine fetching the entire row-tuple for that row (and then discarding most of it), which in turn means that the columnar storage engine must read in + decompress the equivalent compressed blocks for every column participating in that row.

So, unlike in a columnar database, you still have to worry about splitting up your narrow+hot "fact" columns from your wide+cold "dimension" columns, and joining them back together later only when you need them, in order to not be polluting your page cache with the wide+cold column data.

JelteF · on Nov 8, 2021

Maybe I'm misunderstanding the feature that you would like to have, but I think Citus columnar supports what you want. Citus columnar storage has projection pushdown, so it only fetches the columns that you requested in your query.

To quote from the blogpost[1]:

What are the Benefits of Columnar?

- Compression reduces storage requirements - Compression reduces the IO needed to scan the table - Projection Pushdown means that queries can skip over the columns that they don’t need, further reducing IO - Chunk Group Filtering allows queries to skip over Chunk Groups of data if the metadata indicates that none of the data in the chunk group will match the predicate. In other words, for certain kinds of queries and data sets, it can skip past a lot of the data quickly, without even decompressing it!

Disclaimer: I work on the Citus team at Microsoft, but have not worked on the columnar storage.

[1]: https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-...