RethinkDB vs. MongoDB? They seem comparable. What is the big picture difference?

orthecreedence · on June 13, 2013

Although I haven't used Rethink in large-scale production yet, I have to say I like it about 10x better than Mongo. The query language alone beats Mongo's, and the server maintenance/clustering side of things is far better thought out.

Rethink has an air of being rock-solid about it, too. The team worked really hard on making the core great at storing data, then built features around that. Having used Mongo for many years in both production and personal projects, I find it seems like it's all slapped together. The early technical decisions they made to get it out the door are still biting them in the ass to this day.

That's not to say there aren't some things Mongo is probably better at (especially considering how early-stage Rethink is) but overall I prefer Rethink hands down.

coffeemug · on June 13, 2013

Hey, we'd actually really appreciate it if you could take a few minutes and share your thoughts on what Mongo is better at. Beating Mongo in everything isn't our explicit goal (we're forging our own path), but if there are things we can fix, we'd love to hear about them.

orthecreedence · on June 13, 2013

Now that you ask, it's hard to say. Mongo seems to have been built for "fire and forget" quick writes, but due to the global write lock, it fails at this considerably once you start sending it any real data. The answer, which they spend every waking breath promoting, is to shard...but sharding/rebalancing/etc. is far from automatic, and the one time I really had to do it in a pinch, I ended up doing manual sharding on top of Mongo because it couldn't figure out how to distribute the data correctly. This was about two years ago, though, so things may have changed with the clustering. On top of this, the only real way to get any performance is to have the entire dataset fit into memory. Once again, this may be an older restriction.

Reading over your guys' documentation and history, it seems that you have succeeded in every way Mongo fails: MVCC for high-write, homogeneous clustering (no config servers or route services), actual auto-sharding that doesn't make you want to gouge your eyes out, joins, a useful management interface baked into the DB...those are the main ones I can think of.

I'm currently working with a lot of crypto data, so a binary storage format would be really nice in Rethink. I guess Mongo wins there (but I'm still using Rethink and just Base64-encoding everything).
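A minimal sketch of that Base64 workaround, assuming a Node.js client: binary values are encoded into plain strings before being written into a JSON document and decoded on the way back out. The helper names `toStorable` and `fromStorable` and the `key` field are hypothetical, not part of any RethinkDB driver.

```javascript
// Hypothetical helpers: store raw bytes as Base64 text inside a JSON-friendly
// document, as a workaround for the lack of a native binary type.
function toStorable(id, bytes) {
  // Buffer.from copies the bytes; toString("base64") yields an ASCII string
  return { id: id, key: Buffer.from(bytes).toString("base64") };
}

function fromStorable(doc) {
  // Decode the Base64 string back into the original bytes
  return Buffer.from(doc.key, "base64");
}

const raw = Buffer.from([0x00, 0xff, 0x10, 0x80]);
const doc = toStorable("k1", raw);
console.log(doc.key); // prints AP8QgA==
```

The cost of this approach is size: Base64 emits 4 characters for every 3 input bytes, so stored values grow by about a third compared with a native binary type.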

There are one or two things I really do like about Mongo. For example, the find-and-modify command on top of the built-in data structures makes it dead simple to build something like a queuing system. It can't actually handle the writes a real queuing system needs, but for lower-write stuff it's useful. For instance, if you wanted to have a set of servers run shared cron jobs...they could all "check in" and atomically see whether another server is already running that job (and, if not, grab the job in the same operation that checks it, ensuring that no two servers are running the same job at the same time). I'm not sure if Rethink supports anything like this at the moment (please correct me if I'm wrong).
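The check-in pattern described above can be sketched in JavaScript. This is an in-memory model, not a database client: `claimJob`, `releaseJob`, the `jobs` map, and the `runningOn` field are all hypothetical names, and Node.js's single-threaded execution stands in for the atomicity that a database-side find-and-modify would provide.

```javascript
// In-memory stand-in for a collection of job documents. NOT a database
// client; it only illustrates an atomic "find and modify" used as a lock.
const jobs = new Map([
  ["nightly-report", { name: "nightly-report", runningOn: null }],
]);

// Hypothetical claimJob: find the job and, in the same step, mark it as taken
// if (and only if) no other server holds it. The function is synchronous, so
// no other JavaScript can run between the check and the write.
function claimJob(name, serverId) {
  const job = jobs.get(name);
  if (!job || job.runningOn !== null) {
    return null; // missing, or already claimed by another server
  }
  job.runningOn = serverId;
  return { ...job }; // return a copy of the modified document
}

// Hypothetical releaseJob: only the server holding the job may free it.
function releaseJob(name, serverId) {
  const job = jobs.get(name);
  if (job && job.runningOn === serverId) {
    job.runningOn = null;
  }
}

// Three "servers" check in for the same job; exactly one should win.
const winners = ["web-1", "web-2", "web-3"].filter(
  (id) => claimJob("nightly-report", id) !== null
);
console.log(winners); // prints [ 'web-1' ]
```

In a real deployment the check and the write have to happen inside the database as one atomic operation; reading the job and then writing it in two separate round trips would let two servers both see it as free.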

Mongo also has GridFS (admittedly, I've never used it), but if I were going to build some sort of clustered filesystem, I'd use Riak, definitely not Mongo. Also, I believe Mongo now has full-text search, but once again, I've given up on primary databases having real/useful full-text search and would much rather use ElasticSearch to supplement the primary DB.

Once Rethink has a query optimizer (I know you are in the process of figuring this out and to what extent it will work), I don't think Mongo will have anything on Rethink. I'm certainly not ever going to choose Mongo over Rethink for another project. I may choose Postgres/MySQL for strictly relational data, but Mongo is pretty much dead to me at this point.

I'd rather suffer the consequences and growing pains of a newer DB that wins at just about everything it does than use something slightly more stable that has burned me a number of times.

Plus, I like your team better. I know none of the Mongo devs, but I've talked to many members of the Rethink team on many occasions and you've all been really helpful, supportive, and quick to fix any issues. I've said this before, but you guys aren't running around screaming about how great your DB is and how it will solve ALL OF YOUR PROBLEMS, get you a promotion, improve your sex life, etc etc. I feel like a lot of the times I've been burned by Mongo were because 10gen just wasn't straight with me or the teams I worked with when using Mongo. They'd much rather sell a support contract than actually see you succeed, or so it felt.

Sorry for the book, TL;DR:

- Atomic "find and modify" is very useful in Mongo. Rethink may already support it via the QL.

- Binary storage type would be nice.

- Query optimizer (I know you're in the process of figuring this out).

- Packaged linux/mac Rethink binaries so people can download the latest version without a recompile (I do like this about Mongo)

I think that's about it. There may be many technical things I'm missing out of sheer ignorance of the internal workings of Rethink and Mongo. Maybe there are things Mongo is great at but Rethink isn't, but it would be news to me.

coffeemug · on June 13, 2013

This is a great writeup and helps tremendously, thank you! Now, wrt specifics.

> the find-and-modify command

This has been baked into ReQL from day one and doesn't require a special command. Here's an example:

  // 'jobs' table contains documents of the form
  // { type: 'type', jobs: ['job1', 'job2'] }
  r.table('jobs').filter({type: 'printer'}).update(function(row) {
    return r.branch(
      row('jobs').contains('job1'),             // if there is job1 in the array
      {jobs: row('jobs').difference(['job1'])}, // atomically remove job1
      null                                      // else don't modify the object
    );
  })

This isn't limited to arrays -- you can do all sorts of atomic find and modify this way within a single document. We clearly need to do a better job documenting this. I'll make sure that happens.
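To make the semantics of that `update` explicit, here is the same per-document logic modelled as an ordinary JavaScript function over a plain object. It only illustrates what the branch computes, not how the server executes it; `applyJobUpdate` is a hypothetical name, and treating a returned `null` as "leave the document unchanged" follows the comment in the query above.

```javascript
// Plain-JavaScript model of the per-document function passed to update()
// in the ReQL query above. It does not talk to a database.
function applyJobUpdate(doc, jobName) {
  // r.branch(cond, a, b) behaves like cond ? a : b
  const patch = doc.jobs.includes(jobName)              // row('jobs').contains(...)
    ? { jobs: doc.jobs.filter((j) => j !== jobName) }   // row('jobs').difference([...])
    : null;                                             // null: no modification
  // update() merges the returned object into the document; a shallow spread
  // is enough here because only the top-level 'jobs' field changes.
  return patch === null ? doc : { ...doc, ...patch };
}

const before = { type: "printer", jobs: ["job1", "job2"] };
const after = applyJobUpdate(before, "job1");
console.log(after.jobs); // prints [ 'job2' ]
```

Running it a second time on `after` returns the document untouched, which mirrors the "else don't modify the object" branch.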

> Packaged linux/mac Rethink binaries

That's been available for a while -- http://rethinkdb.com/docs/install/. You can download a binary pkg for OS X if you don't want to use brew, and we support apt-get on Ubuntu/Debian via a private PPA. We'll be adding binary packages for more distros soon.

> binary storage format

Mongo wins on this one, but it's definitely on the horizon -- https://github.com/rethinkdb/rethinkdb/issues/137. The storage engine bits for this have been done for a long time, we just have to figure out a sane ReQL API (which is harder than it seems).

> Query optimizer

This is a somewhat longer-term project (i.e. we probably won't get to it by mid-fall), but we're definitely thinking about it. So far, though, we haven't seen a real use case where its absence is a problem.

JulianMorrison · on June 15, 2013

A suggestion for a binary ReQL API: accept binary input as a no-argument generator function (called repeatedly, it returns a series of binary blobs and then null as EOF), or, where the language permits, accept a language-specific abstraction of a read-once stream (such as an IO in Ruby) as a higher-level wrapper around that. Return binary data in the same form.
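A minimal sketch of the proposed input shape, assuming a Node.js client. `makeChunkSource` and `drainSource` are hypothetical names for illustration only; neither is part of any RethinkDB driver, and the ReQL API this would feed into is still undecided per the discussion above.

```javascript
// Hypothetical producer in the proposed style: a no-argument function that
// returns the next binary chunk on each call and null at EOF.
function makeChunkSource(chunks) {
  let i = 0;
  return () => (i < chunks.length ? chunks[i++] : null);
}

// Hypothetical consumer a driver might use: call the source until it
// returns null, then join the chunks into a single Buffer.
function drainSource(source) {
  const parts = [];
  for (let chunk = source(); chunk !== null; chunk = source()) {
    parts.push(chunk);
  }
  return Buffer.concat(parts);
}

const source = makeChunkSource([Buffer.from("he"), Buffer.from("llo")]);
const blob = drainSource(source);
console.log(blob.toString()); // prints hello
```

Buffering everything with `Buffer.concat` keeps the sketch short; a real implementation could instead forward each chunk to the server as it arrives, which is the point of a pull-based source.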

coffeemug · on June 18, 2013

Thanks, this is cool. I posted this suggestion as a comment on the issue (https://github.com/rethinkdb/rethinkdb/issues/137).

coffeemug · on June 13, 2013

Take a look at these two writeups:

* A biased/big picture one -- http://rethinkdb.com/blog/mongodb-biased-comparison/

* An unbiased/technical one -- http://rethinkdb.com/docs/comparisons/mongodb/

andrewflnr · on June 14, 2013

The biased comparison still says secondary indexes are in development. Might want to update that.

mglukhovsky · on June 14, 2013

Thanks for noticing this, we'll update it shortly.

albiabia · on June 13, 2013

Perfect. Thank you.