Just that deeply inside that forest of functions you just wrote is the inner joi...

calvinmorrison · on Feb 5, 2024

More like, when it comes to complex data structures and logic, I will do that outside of SQL. I'll do a join with SQL no problem, but by the time we're doing multiple inner joins I usually prefer to just do multiple SQL queries. I don't care about performance that badly.

ako · on Feb 5, 2024

That’ll often not scale to millions of records. Letting the database optimizer find the optimal execution path instead of doing it procedurally elsewhere might result in “finishes in 5 minutes”, versus “doesn’t fit in a night”.
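To make the trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables and both function names are made up for illustration. The first function hands the whole problem to the database in one query; the second does the same work procedurally with one extra query per customer, which is where the cost piles up once the outer table gets large.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")


def totals_with_join(conn):
    # One statement: the database decides how to execute the join.
    return conn.execute("""
        SELECT c.name, SUM(o.total)
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
    """).fetchall()


def totals_with_separate_queries(conn):
    # Procedural version: one query for the customers,
    # then one more query per customer row.
    result = []
    customers = conn.execute(
        "SELECT id, name FROM customers ORDER BY name"
    ).fetchall()
    for customer_id, name in customers:
        (total,) = conn.execute(
            "SELECT SUM(total) FROM orders WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        result.append((name, total))
    return result


print(totals_with_join(conn))
print(totals_with_separate_queries(conn))
```

With three rows both give the same answer and nobody notices the difference. The second version issues one query per customer though, so the number of round trips grows with the table.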

threeseed · on Feb 5, 2024

This isn’t the 90s. Most hardware is way over-specced for the data sizes most people are dealing with.

The number of use cases which are too heavy to finish in hours but small enough to fit in a single instance is pretty limited.

ako · on Feb 6, 2024

Cost is another reason to optimize queries: long-running, inefficient queries will be a lot more expensive on things like Snowflake than efficient ones.

fifilura · on Feb 5, 2024

SQL is popular because it can be run on a map/reduce backend. So once you have written your code it can run on any number of machines.

threeseed · on Feb 5, 2024

a) SQL is not that popular on map/reduce backends. Most people are doing it in code.

b) Only basic SQL works on any database, and even then there are major differences in how they treat things like NULLs, type coercion, etc.

fifilura · on Feb 6, 2024

BigQuery? Athena/Redshift?

fifilura · on Feb 5, 2024

I usually only do one join at a time. But I separate them with CTEs ("WITH"). I can agree that many joins at once can make you grow grey hair.
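A rough sketch of what that style can look like, run through Python's sqlite3 module so it is self-contained. The `users`, `products`, and `orders` tables and the CTE names are invented for the example. Each CTE does exactly one join and the next one builds on its result.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, product_id INTEGER);
    INSERT INTO users VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO products VALUES (1, 'keyboard'), (2, 'mouse');
    INSERT INTO orders VALUES (1, 1, 2), (2, 2, 1), (3, 1, 1);
""")

QUERY = """
WITH orders_with_users AS (
    -- Step 1: one join, orders to users.
    SELECT o.id AS order_id, o.product_id, u.name AS user_name
    FROM orders o
    JOIN users u ON u.id = o.user_id
),
orders_with_products AS (
    -- Step 2: one join, previous step to products.
    SELECT ou.order_id, ou.user_name, p.title AS product_title
    FROM orders_with_users ou
    JOIN products p ON p.id = ou.product_id
)
SELECT user_name, product_title
FROM orders_with_products
ORDER BY order_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

It is still a single statement sent to the database, so the optimizer sees the whole thing, but each step can be read (and debugged by selecting from it) on its own.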