
Indeed, reduce is the difficult part. OTOH, I think this limitation shows up in many algorithms at a fairly fundamental level, and is not just an artefact of MR. The only alternative framework I can think of for dealing with really large datasets in a distributed manner is sampling-based methods built on one-pass (or mostly one-pass) algorithms.
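
To make "one-pass, sampling-based" concrete, here is a minimal sketch of reservoir sampling (Algorithm R). It is only one example of that family, picked for illustration; the function name `reservoir_sample` and its parameters are hypothetical. It keeps a uniform random sample of `k` items from a stream of unknown length while reading each item exactly once.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items drawn uniformly at random from stream, in one pass."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # The stream is consumed once and never held in memory as a whole.
    print(reservoir_sample(iter(range(1_000_000)), 5))
```

Memory use is O(k) regardless of stream length. Combining reservoirs built on several machines needs extra care (weighting each by how many items it saw), which this sketch does not attempt.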

