Full-Text Search with MongoDB — Flowdock Style (nodeta.com)
13 points by khangtoh on Sept 18, 2012 | 5 comments


It's a nice hack, but I'd rather just use Elasticsearch and try out the MongoDB river (https://github.com/richardwilly98/elasticsearch-river-mongod...). There is a reason why Lucene exists. You'll gain insertion speed, use less RAM for your active working set with MongoDB, and gain TONS of features.


It sure seems like by doing a little more technical due diligence up front and finding a proper search library [1] that could fit in their stack, they could have saved themselves a lot of operational burden here.

Growing their main database indexes for this one feature seems like a losing trade.

1. Whoosh, Xapian, Lucene, etc?


From what I can glean from the post, this is a "poor man's" full-text search. It can search by keywords, but it does not seem to do stemming, word similarity, document similarity and all the nice things you love about Lucene and others. As far as the post explains, it also does not take term counts into account (which seems okay, as they don't have large documents). Also not sure how well it handles things like German umlauts (does "über" match "ueber"?).


There are libraries to handle stemming and Unicode equivalence, and they are easy to add to this kind of boolean search. Ranking documents, though, would definitely mean that some other approach, e.g. a vector space model, should be used.

https://github.com/aurelian/ruby-stemmer http://unicode-utils.rubyforge.org
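To make the idea concrete, here is a rough sketch (in Python rather than Ruby, and not taken from the Flowdock post) of how folding and stemming could sit in front of a boolean keyword match. Everything in it is an assumption for illustration: the `TRANSLITERATIONS` table, the `fold`, `toy_stem`, `keywords` and `matches` helpers, and especially the suffix stripper, which is only a stand-in for a real stemming library.

```python
import re
import unicodedata

# Assumed German-style transliterations, applied before generic accent stripping.
TRANSLITERATIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def fold(word):
    """Lowercase, transliterate umlauts, then drop any remaining combining marks."""
    word = "".join(TRANSLITERATIONS.get(ch, ch) for ch in word.lower())
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def toy_stem(word):
    """Toy suffix stripper; a real stemmer (e.g. Snowball) would replace this."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def keywords(text):
    """Turn text into the set of normalized keywords that would be indexed."""
    # Compose first so that 'u' + combining diaeresis is tokenized as one letter.
    composed = unicodedata.normalize("NFC", text)
    return {toy_stem(fold(w)) for w in re.findall(r"\w+", composed)}

def matches(query, document):
    """Boolean AND: every query keyword must appear among the document keywords."""
    return keywords(query) <= keywords(document)
```

The same `keywords` function would have to run at both index time and query time, otherwise "ueber" and "über" end up as different terms again. Note that this still gives no ranking at all; it only widens what counts as a match.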


Apparently HN is ingesting a feed of:

select url from all_worldwide_blog_posts where content like '%MongoDB%'



