>ACID transactions, validations & constraints There is no D from the ACID. For t...

piggubiggu · on Jan 24, 2021

It sounds like you're talking about toy databases that don't run at high TPS. Let me point out some features missing from your simple "load a map into memory" architecture.

First, you have to do backup and recovery. For that you need to write to disk, and that becomes a big bottleneck, since backup and checkpointing are the only reasons an in-memory database ever writes to disk.

Second, even in an in-memory database the data needs to be queried, and for that you need specialized data structures like a cache-aware B+tree. Implementing one is non-trivial.

Third, doing atomic, consistent and isolated transactions is certainly trivial in a toy example, but in an actual database with a high number of transactions it's a lot harder. For example, with multiple cores you will certainly have resource contention, and then you do need locks.

And one last thing, about GC: again, GC is great, but a database needs a custom one. You need to make sure the in-memory transaction log is flushed before committing. And malloc is also very slow.
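The "flush the log before committing" requirement can be sketched roughly as follows. This is a minimal illustration, not how any particular database does it; `TinyLog`, `replay`, and the record format are hypothetical names made up for the example.

```python
import os
import tempfile


class TinyLog:
    """Hypothetical append-only transaction log, for illustration only."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, "ab")

    def commit(self, record: bytes) -> None:
        # Append the record, then force it to stable storage
        # before the commit is reported as done.
        self.file.write(record + b"\n")
        self.file.flush()              # push Python's buffer to the OS
        os.fsync(self.file.fileno())   # ask the OS to push it to the device

    def close(self) -> None:
        self.file.close()


def replay(path):
    # Recovery: read back every record that reached the log.
    with open(path, "rb") as f:
        return [line.rstrip(b"\n") for line in f]


log_path = os.path.join(tempfile.mkdtemp(), "tx.log")
log = TinyLog(log_path)
log.commit(b"SET a=1")
log.commit(b"SET b=2")
log.close()
recovered = replay(log_path)
```

The `fsync` on every commit is exactly the disk bottleneck mentioned above; real systems usually batch several commits per sync.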

I'd suggest reading more of the in-memory database research to understand this better. But an in-memory DB is certainly not the same as a disk DB with a cache, or a simple HashMap/B+tree structure.

sobani · on Jan 25, 2021

> And malloc is also very slow.

Isn't one of the advantages of a GC environment that malloc is basically free? Afaik the implementation of malloc_in_gc comes down to

    result_address = first_free_address;
    first_free_address += requested_bytes;
    return result_address;

It's the actual garbage collection that might be expensive, but since that process deals with the fragmentation, there is no need to keep a data structure with available blocks of memory around.

That's also the reason why, depending on the patterns of memory usage, a GC can be faster than malloc+free.
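For anyone who wants to poke at it, here is the pointer-bump idea as a small runnable sketch. It only models offsets into a fixed `bytearray`; `BumpAllocator` and its methods are hypothetical names, and a real GC would compact or collect instead of raising when the heap is full.

```python
class BumpAllocator:
    """Toy bump-pointer allocator over a fixed-size heap (illustration only)."""

    def __init__(self, capacity: int):
        self.heap = bytearray(capacity)
        self.first_free = 0  # offset of the first unused byte

    def alloc(self, requested_bytes: int) -> int:
        # Allocation is one bounds check plus one addition.
        if self.first_free + requested_bytes > len(self.heap):
            raise MemoryError("heap exhausted; a real GC would collect here")
        result = self.first_free
        self.first_free += requested_bytes
        return result

    def reset(self) -> None:
        # Stand-in for a collection that frees everything at once.
        self.first_free = 0


arena = BumpAllocator(64)
a = arena.alloc(16)
b = arena.alloc(8)
```

There is no free list to search or update, which is where the "basically free" comes from.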

xxs · on Jan 24, 2021

>It sounds like you're talking about toy databases which don't run at a lot of TPS.

The original talk was explicitly about SQLite and in-memory databases, no idea where you got the rest from.

piggubiggu · on Jan 24, 2021

Correct. So we're talking about in-memory databases like MongoDB, and all of the things I listed here are true of MongoDB. For example, MongoDB migrated its database memory manager away from mmap and towards a custom memory manager (the point being that GC and memory management for databases are not something you can just leave to JVM or operating system constructs).

https://docs.rocket.chat/installation/docker-containers/mong...

I'm happy to justify every single point I made with research papers.

Lastly, I know I came off as a bit condescending. Just having a bad day, nothing personal. But you should read more about in-memory DBs.

busterarm · on Jan 24, 2021

You _can_ have forms of durability if you wish to. You can get "good enough" (actually fairly impressive...) performance for most problems (vs only in-memory) with SQLite by making memory the temp store and turning on synchronous and WAL. Then fsync only gets called at checkpoints and you have durability at the checkpoint.
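A rough sketch of that setup using Python's built-in `sqlite3` module. The comment above does not name a `synchronous` level; `NORMAL` is an assumption here, chosen because in WAL mode it is the level at which SQLite syncs around checkpoints rather than on every commit. The table and file names are made up for the example.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(db_path)

# Write-ahead logging; the pragma returns the journal mode now in effect.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
# Assumed level: with WAL + NORMAL, syncs happen at checkpoints.
conn.execute("PRAGMA synchronous=NORMAL")
# Keep temporary tables and indices in memory.
conn.execute("PRAGMA temp_store=MEMORY")

conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
with conn:  # commits on success
    conn.execute("INSERT INTO kv VALUES (?, ?)", ("a", "1"))

# Force a checkpoint: WAL contents are moved into the main database file.
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")

synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
temp_store = conn.execute("PRAGMA temp_store").fetchone()[0]
rows = conn.execute("SELECT k, v FROM kv").fetchall()
conn.close()
```

The trade-off is that a power loss can roll back commits made since the last checkpoint, though the database itself stays consistent.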