Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Even leaving aside the cost of storing an index, if you're only going to use that dataset for a small number of queries then it's cheaper to just perform them directly rather than construct an index first. As the paper points out at the end, Hadoop is fundamentally batchy; it works best when you have discrete chunks of data (e.g. weekly) and you want to reduce each one into some summarized form, then never use the raw data again.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: