Even leaving aside the cost of storing an index, if you're only going to run a small number of queries against that dataset, it's cheaper to just perform them directly than to construct an index first. As the paper points out at the end, Hadoop is fundamentally batchy; it works best when your data arrives in discrete chunks (e.g. weekly) and you want to reduce each one into some summarized form, then never use the raw data again.
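
To put rough numbers on the index-versus-scan tradeoff, here is a minimal sketch of a break-even calculation. The cost model and all the figures are made up for illustration (they are not from the paper): it assumes every direct query costs one full scan, every indexed query costs one cheap lookup, and the index is built once up front. Storage cost is ignored, as above.

```python
# Hypothetical cost model, in arbitrary units (e.g. machine-minutes).
# None of these numbers come from the paper; they only illustrate the shape
# of the tradeoff.

def full_scan_cost(num_queries, scan_cost):
    """Total cost of answering every query by scanning the raw data."""
    return num_queries * scan_cost


def indexed_cost(num_queries, build_cost, lookup_cost):
    """Total cost of building the index once, then answering by lookup."""
    return build_cost + num_queries * lookup_cost


def break_even_queries(scan_cost, build_cost, lookup_cost):
    """Smallest query count at which the index is strictly cheaper.

    Returns None if a lookup is no cheaper than a scan, in which case the
    index never pays for itself.
    """
    saving_per_query = scan_cost - lookup_cost
    if saving_per_query <= 0:
        return None
    # Need build_cost < n * saving_per_query, i.e. n > build_cost / saving.
    return int(build_cost // saving_per_query) + 1


if __name__ == "__main__":
    scan, build, lookup = 100, 250, 1  # assumed costs
    n = break_even_queries(scan, build, lookup)
    print("index pays off from query", n)
    for q in (1, 2, 3, 10):
        print(q, full_scan_cost(q, scan), indexed_cost(q, build, lookup))
```

With these assumed costs the index only wins from the third query onward, so a chunk of data that gets summarized once and then discarded never reaches the break-even point.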