My personal favorite is BashReduce (~120 lines of shell script vs. ~600k lines of Java code in Hadoop): http://blog.last.fm/2009/04/06/mapreduce-bash-script
If you're in bioinformatics you might be interested in this talk on handling ridiculous amounts of data (PyCon 2011): http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2011-han...