
If I had a dollar for every senior engineer I've worked with who has never heard of SQLite...


Truth be told, I have yet to see a reason to use an in-memory database. Data structures, maps/trees/sets - yes. Concurrent/lock-free/skip lists/whatever - all great. I don't need a relational database when I can use objects/structs/etc.


I think that depends on what you're doing with the data. If you're just grabbing one thing and working with it, or looping through and processing everything, maybe not.

But if you're doing more complicated query-like stuff, especially if you want to allow for queries you haven't thought of yet, then the DB might be useful.

Sometimes a hybrid of query-able metadata in a DB along with plain old data files is good.

That depends very much on your data, how much things key to each other, and what you're doing with it.
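To make the "queries you haven't thought of yet" point concrete, here is a minimal sketch using Python's stdlib `sqlite3` with an in-memory database; the table and column names are made up for the example.

```python
import sqlite3

# Illustrative sketch: once data sits in an in-memory SQLite table,
# ad-hoc questions you hadn't planned for are one SELECT away.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT, owner TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [("a.txt", "alice", 10), ("b.txt", "bob", 30), ("c.txt", "alice", 5)],
)

# An "unplanned" question: total size per owner, largest first.
rows = conn.execute(
    "SELECT owner, SUM(size) FROM files GROUP BY owner ORDER BY 2 DESC"
).fetchall()
print(rows)  # [('bob', 30), ('alice', 15)]
```

Answering the same question with plain data structures would mean hand-writing the grouping and sorting each time.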


>doing more complicated query-like stuff

That's some kind of fallacy - standard data structures would totally destroy any SQL-like thing when it comes to performance (and memory footprint). I guess it does depend on one's background when it comes to convenience - or how people tend to see their data. However, like I said, for close to three decades I have not seen a single reason to do so. On the contrary, I've had cases where an optimization of three orders of magnitude was possible.
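For a sense of what's being compared, here is a hedged sketch of the same point lookup done both ways: a plain dict versus an in-memory SQLite table. This is an illustration, not a benchmark; the dict path skips SQL parsing, query planning, and row (de)serialization entirely.

```python
import sqlite3

# The same small dataset, held two ways.
users = {1: "alice", 2: "bob"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", users.items())

# Plain data structure: one hash lookup.
name_from_dict = users[2]

# SQLite: parse, plan, and execute a query for the same answer.
name_from_db = conn.execute(
    "SELECT name FROM users WHERE id = ?", (2,)
).fetchone()[0]

assert name_from_dict == name_from_db == "bob"
```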


It's easier to find devs who know basic SQL than it is to find devs who know pandas or whatever your language specific SQL-like library is. And the more complicated the queries, the more the gulf widens.


I don't think pandas was the proposal here. I think "standard data structures" refers to arrays, hash tables, trees, and the like.


Performance is not god. It is not the altar at which we sacrifice all other considerations.


> for close to 3 decades I have not seen a single reason to do so

Evidently you don't have a dataset far exceeding the amount of RAM you can afford.

For a good example look at LMDB.


ACID transactions, validations & constraints, and the ability to debug/log by dumping your data to disk which can then easily be queried with SQL.

All of the same reasons you would store relational data in a dbms...


>ACID transactions, validations & constraints

There is no D from ACID. For the D to happen, you need transaction logs + a write barrier (to non-volatile memory). Doing atomic, consistent, and isolated in memory is trivial (esp. in a GC setup), and a lot faster: no locks needed.

Validations and constraints are simple if-statements; I'd never think of them as SQL.


It sounds like you're talking about toy databases which don't run at a lot of TPS. Let me point out some features missing from your simple load-a-map-in-memory architecture.

First, you have to do backup and recovery, and for that you need to write to disk, which becomes the big bottleneck, since backup and checkpointing are the only reasons an in-memory database ever writes to disk at all.

Then, you have to know that even in an in-memory database, data needs to be queried, and for that you need special data structures like a cache-aware B+tree. Implementing one is non-trivial.

Thirdly, doing an atomic, consistent, and isolated transaction is certainly trivial in a toy example, but in an actual database with a high number of transactions it's a lot harder. For example, when you have multiple cores, you will certainly have resource contention, and then you do need locks.

And one last thing about GC: again, GC is great, but a database needs a custom GC. You need to make sure the in-memory transaction log is flushed before committing. And malloc is also very slow.

I'd suggest reading more into in-memory database research to understand this better. But an in-memory DB is certainly not the same as a disk DB with a cache, or a simple hashmap/B+tree structure.


> And malloc is also very slow.

Isn't one of the advantages of a GC environment that malloc is basically free? AFAIK the implementation of malloc_in_gc comes down to

    // bump-pointer allocation: no free list to search
    result_address = first_free_address;
    first_free_address += requested_bytes;
    return result_address;
It's the actual garbage collection that might be expensive, but since that process deals with the fragmentation, there is no need to keep a data structure with available blocks of memory around.

That's also the reason why, depending on the patterns of memory usage, a GC can be faster than malloc+free.
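The bump-pointer scheme above can be modeled in a few lines; this is an illustrative sketch in Python (the class and its names are made up, not a real allocator API), with the GC step stubbed out.

```python
# A minimal sketch of bump-pointer allocation: the entire "free list"
# is a single offset that only ever moves forward.
class BumpAllocator:
    def __init__(self, heap_size: int):
        self.heap_size = heap_size
        self.first_free = 0  # next free offset into the heap

    def alloc(self, requested_bytes: int) -> int:
        """Return the start offset of the new block."""
        if self.first_free + requested_bytes > self.heap_size:
            # A real GC would collect, compact to defragment, and retry here.
            raise MemoryError("heap exhausted; GC would run now")
        result = self.first_free
        self.first_free += requested_bytes
        return result

heap = BumpAllocator(1024)
a = heap.alloc(64)   # offset 0
b = heap.alloc(128)  # offset 64
assert (a, b) == (0, 64)
```

All the bookkeeping that malloc does per call is deferred to the (occasional) collection, which is where the cost shows up instead.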


>It sounds like you're talking about toy databases which don't run at a lot of TPS.

The original talk was explicitly about SQLite and in-memory databases; no idea where you got the rest from.


Correct. So we're talking about in-memory databases like MongoDB, and all of the things I listed here are true of MongoDB. For example, MongoDB migrated its database memory manager away from mmap and towards a custom memory manager (the point being that GC and memory management for databases is not something you can just use JVM or operating-system constructs for).

https://docs.rocket.chat/installation/docker-containers/mong...

I'm happy to justify every single point I made with research papers.

Lastly, I know I came off as a bit condescending. Just having a bad day, nothing personal. But you should read more about in-memory DBs.


You _can_ have forms of durability if you wish to. You can get "good enough" (actually fairly impressive...) performance for most problems (vs. purely in-memory) with SQLite by making memory the temp store and turning on WAL with relaxed synchronous. Then fsync only gets called at checkpoints and you have durability up to the last checkpoint.
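A sketch of that configuration with Python's stdlib `sqlite3`, under the assumption of an on-disk file (the filename here is illustrative; `:memory:` would drop durability entirely):

```python
import sqlite3

# WAL journaling with synchronous=NORMAL: SQLite fsyncs at WAL
# checkpoints rather than on every commit.
conn = sqlite3.connect("example.db")
conn.execute("PRAGMA journal_mode = WAL")
conn.execute("PRAGMA synchronous = NORMAL")  # no fsync per commit
conn.execute("PRAGMA temp_store = MEMORY")   # temp tables/indices in RAM

conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT OR REPLACE INTO kv VALUES ('a', '1')")
conn.commit()  # fast: the write goes to the WAL, not yet fsynced

# Force a checkpoint; everything committed before this is now durable.
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
```

The trade-off is exactly as stated: a crash can lose commits made since the last checkpoint, but not corrupt the database.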



