Hacker News | jhd3's comments

[The Baked Data architectural pattern](https://simonwillison.net/2021/Jul/28/baked-data/)


Emery Berger's thoughts on the topic:

How to Have Real-World Impact: Five “Easy” Pieces - https://emeryberger.medium.com/how-to-have-real-world-impact...


There is also the sequel, by Michael Stonebraker and Andrew Pavlo - https://db.cs.cmu.edu/papers/2024/whatgoesaround-sigmodrec20...


… also commented here on HN: https://news.ycombinator.com/item?id=40846883


Yeah, the sequel is a great read as well!


This plays really well with datasette:

grab the loadable SQLite extension from https://github.com/turbot/steampipe-plugin-hackernews/releas...

  datasette --load-extension steampipe_sqlite_hackernews.so

and then, in datasette's SQL interface in your browser:

  select
    *, 'https://news.ycombinator.com/item?id=' || id as link
  from
    hackernews_item
  where
    id = 38713046;


> Databases are much more powerful than we think

and data has mass. One example of bringing the work to the data is https://madlib.apache.org/ (works on Postgres and Greenplum)
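As a flavor of what "bringing the work to the data" looks like, here is a sketch of training a linear regression model entirely inside the database with MADlib (the houses table and its columns are illustrative; madlib.linregr_train is the actual function name):

  -- coefficients land in a new table, houses_model;
  -- no data leaves the database
  SELECT madlib.linregr_train(
    'houses',                     -- source table
    'houses_model',               -- output table
    'price',                      -- dependent variable
    'ARRAY[1, sqft, bedrooms]'    -- independent variables
  );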

[Disclaimer - former employee of Pivotal]



I was wondering if anyone had thought about using this to experiment with the planner.

The engineering and support teams at Greenplum, a fork of Postgres, have a tool (minirepro [0]) which, given a SQL query, grabs a minimal set of DDLs plus the associated statistics for the tables involved in the query; these can then be loaded into a "local" GPDB instance. Having just the DDL and the statistics meant the team was able to debug issues in the optimizer (example: [1]) without having access to the full data set. If my understanding is correct, this Postgres WASM capability could enable the same approach entirely in the browser.
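From memory, the workflow looks roughly like this (flag names are a sketch; check the repo linked in [0] for the exact invocation):

  # dump minimal DDL + optimizer statistics for the
  # tables referenced by the query in query.sql
  minirepro mydb -q query.sql -f repro_dump.sql

  # replay the dump into a local instance and debug
  # the plan there, with no access to the real data
  psql localdb -f repro_dump.sql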

[0] https://github.com/greenplum-db/gpdb/blob/6X_STABLE/gpMgmt/b...

[1] https://github.com/greenplum-db/gpdb/issues/5740#issuecommen... (has an example output)


