
Where does the index come from?


Indexing can be done on the fly, when adding or modifying entries.
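As a rough illustration of what "on the fly" could mean, here is a minimal in-memory sketch. The class name `IncrementalIndex` and its methods are made up for this example, and whitespace tokenization is an assumption; nothing here comes from a particular database.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that is updated on every add or modify (hypothetical)."""

    def __init__(self):
        self.docs = {}                    # doc_id -> current text
        self.postings = defaultdict(set)  # word -> ids of docs containing it

    def _words(self, text):
        # Assumption: lowercase, whitespace-separated tokens.
        return set(text.lower().split())

    def put(self, doc_id, text):
        # Adding and modifying are the same operation: drop the old
        # postings for this id (if any), then index the new text.
        self.remove(doc_id)
        self.docs[doc_id] = text
        for word in self._words(text):
            self.postings[word].add(doc_id)

    def remove(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is None:
            return
        for word in self._words(old):
            self.postings[word].discard(doc_id)
            if not self.postings[word]:
                del self.postings[word]

    def search(self, word):
        return sorted(self.postings.get(word.lower(), ()))
```

Note that this only works because every write goes through `put` or `remove`, which is exactly the condition the reply below questions.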


Only if your data is stored in such a way that you can monitor adds and modifications. MapReduce is often used on huge bags of data pulled from random places or as one pass in a massive sequence of passes (think pagerank), where any index will be destroyed by each pass.

MapReduce isn't a database. It avoids mutable state such as an index. It's more like a command line tool such as grep.

You could also use MapReduce to build your index.
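Roughly, that could look like the single-process sketch below. It only imitates the map, shuffle, and reduce phases in plain Python; the function names are invented for illustration and this is not Hadoop's API.

```python
from collections import defaultdict


def map_doc(doc_id, text):
    # Map: emit one (word, doc_id) pair per distinct word in the document.
    for word in set(text.lower().split()):
        yield word, doc_id


def reduce_word(word, doc_ids):
    # Reduce: collapse all ids seen for a word into one sorted posting list.
    return word, sorted(doc_ids)


def build_index(docs):
    # Shuffle: group the mapped pairs by key. A real framework does this
    # across machines; here it is just a dict.
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        for word, mapped_id in map_doc(doc_id, text):
            groups[word].append(mapped_id)
    return dict(reduce_word(word, ids) for word, ids in groups.items())
```

For example, `build_index({1: "map reduce", 2: "map index"})` groups both documents under `"map"`.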

Finally, it is a simple example. Like how people use fib to demo parallelism.


Thanks for your answer. My point was exactly what you're implying: wouldn't it be cheaper to insert the data into an ad hoc "database" instead of using Hadoop?


What kind of database are you going to insert into if you have 1000 TB of data? Let alone an open source one. What kind of database is going to allow you to set everything up and strip it all down just for one of a million passes of your data? Do you have a simple database that you can distribute over thousands of nodes?

Searching isn't really a MapReduce problem anyway - think, what is the map? What is the reduce? They're not really any kind of computation, are they?

If you want to understand why MapReduce exists, find a better motivating example than search. PageRank is the classic.

Have you read the paper? If not, that would be the best start.

J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004.


It seems like a pretty straightforward MapReduce, though I agree that if you're doing searches repeatedly or your data is small enough, you should use a database.

(map = search a partition and return its top k results, reduce = combine the n result lists of k items each into a single top-k list)
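A minimal sketch of that split, run in one process. The scoring function (count of exact word matches) and all function names are assumptions made for the example; a real job would run `map_partition` on separate nodes.

```python
import heapq


def score(query, text):
    # Assumed relevance measure: how many times the query word occurs.
    return text.lower().split().count(query.lower())


def map_partition(query, partition, k):
    # Map: scan one partition and keep only its k best (score, doc_id) hits.
    scored = ((score(query, text), doc_id) for doc_id, text in partition)
    return heapq.nlargest(k, (hit for hit in scored if hit[0] > 0))


def reduce_results(result_lists, k):
    # Reduce: merge the per-partition lists into one global top-k list.
    return heapq.nlargest(k, (hit for hits in result_lists for hit in hits))


def search(query, partitions, k):
    return reduce_results([map_partition(query, p, k) for p in partitions], k)
```

Each partition returns at most k hits, so the reduce step handles at most n*k items no matter how large the partitions are.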


That's what I was telling you. A lot of people use MapReduce for simple search tasks, hence my remark.



