Full-Text Search with MongoDB — Flowdock Style

rb2k_ · on Sept 18, 2012

It's a nice hack, but I'd rather just use Elasticsearch and try out the MongoDB river (https://github.com/richardwilly98/elasticsearch-river-mongod...). There is a reason why Lucene exists. You'll gain insertion speed, use less RAM for your active working set with MongoDB, and gain TONS of features.

mattbillenstein · on Sept 18, 2012

It sure seems like by doing a little more technical due diligence up front and finding a proper search library [1] that could fit in their stack, they could have saved a lot of operational burden here.

Growing their main database indexes for this one feature seems like a loser.

[1] Whoosh, Xapian, Lucene, etc.

Argorak · on Sept 18, 2012

From what I can glean from the post, this is a "poor man's" full-text search. It can search by keywords, but it does not seem to do stemming, word similarity, document similarity and all the nice things you love about Lucene and others. As far as the post explains, it also does not take term counts into account (which seems okay, as they don't have large documents). Also not sure how well it handles things like German umlauts (does "über" match "ueber"?).
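To make the limitation concrete, here is a minimal Python sketch of this kind of boolean keyword matching. It is an illustration under my own assumptions, not the code from the post: the names `keywords` and `search` and the sample messages are hypothetical, and an in-memory dict stands in for the database.

```python
import re

def keywords(text):
    """Lowercase the text and return its set of distinct word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def search(docs, query):
    """Return ids of documents containing every query term (boolean AND).

    No stemming, no ranking, no term counts: a document either
    contains all the exact tokens or it does not match.
    """
    terms = keywords(query)
    return [doc_id for doc_id, text in docs.items() if terms <= keywords(text)]

# Hypothetical sample messages, keyed by id.
docs = {
    1: "Deploying the search index today",
    2: "Searching is slow",
    3: "Index rebuild finished, search works",
}

print(search(docs, "search index"))  # [1, 3]
print(search(docs, "searching"))     # [2]
```

Note that "search" and "searching" are treated as unrelated terms, which is exactly the missing stemming mentioned above.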

lautis · on Sept 18, 2012

There are libraries to handle stemming and Unicode equivalence which are easy to add into this kind of boolean search. Ranking documents, though, would definitely mean that some other approach, e.g. a vector space model, should be used.

https://github.com/aurelian/ruby-stemmer http://unicode-utils.rubyforge.org
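As a rough idea of what the Unicode side could look like, here is a sketch in Python using only the standard library rather than the Ruby libraries linked above. The function name `fold` and the umlaut transliteration table are my own assumptions; stemming is left out because it needs a real stemmer.

```python
import unicodedata

# Assumed transliteration for German umlauts; casefold() already maps "ß" to "ss".
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})

def fold(term):
    """Normalize a term so that equivalent spellings index to the same key."""
    # Compose first so "u" + combining diaeresis becomes a single "ü".
    term = unicodedata.normalize("NFC", term).casefold()
    term = term.translate(UMLAUTS)
    # Decompose and drop any remaining combining marks (e.g. "é" becomes "e").
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(fold("über") == fold("ueber"))  # True
print(fold("Straße"))                 # strasse
```

Applying the same fold to both the indexed keywords and the query terms keeps the search a plain exact-match lookup.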

tedchs · on Sept 18, 2012

Apparently HN is ingesting a feed of:

select url from all_worldwide_blog_posts where content like '%MongoDB%'