
In my opinion, CERN is one of the few groups out there that can call their data set "big data" without talking out of their ass. (If it can fit in RAM, it ain't Big Data. Multiple TBs fit in RAM) Their detectors produce something on the order of a petabyte per second which is then pared back immensely to become something that's actually storable. Most of the machine learning I've heard of involving CERN is in reducing that data stream and then highlighting "interesting" things for researchers to take a look at.
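For intuition only, here's a toy sketch of the kind of reduction step being described: keep only events that clear an energy cut, drop the rest. This is not CERN's actual trigger code; the event shape, channel count, and threshold are all made up, and real triggers are multi-stage (hardware plus software farms).

    import random

    # Toy "event": a list of per-channel energy readings (made-up numbers).
    def make_event(n_channels=100):
        return [random.expovariate(1.0) for _ in range(n_channels)]

    # Toy trigger: keep an event only if its summed energy clears a cut.
    def trigger(event, threshold=120.0):
        return sum(event) > threshold

    events = (make_event() for _ in range(100_000))
    kept = [e for e in events if trigger(e)]
    print(f"kept {len(kept)} of 100000 events")

The point is just the ratio: the vast majority of events never get written out, which is how a petabyte-per-second detector output becomes something storable.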


Although the LHC experiments are toward the top of the heap in terms of data rates, many particle-physics and astronomy experiments produce "big data". One quarter of the DUNE experiment will acquire about 50 EB/year (that's exabytes), writing about 10 PB/year to tape. LSST will produce data in the few-PB/year range.
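Back-of-envelope, using the numbers above (decimal units assumed):

    SECONDS_PER_YEAR = 365 * 24 * 3600

    acquired_eb_per_year = 50      # DUNE (one quarter), as quoted above
    to_tape_pb_per_year = 10

    acquired_gb_per_s = acquired_eb_per_year * 1e9 / SECONDS_PER_YEAR    # EB -> GB
    reduction_factor = acquired_eb_per_year * 1000 / to_tape_pb_per_year # EB -> PB

    print(f"~{acquired_gb_per_s:.0f} GB/s acquired, ~{reduction_factor:.0f}x reduction to tape")

That works out to roughly 1.6 TB/s sustained acquisition and about a 5000x reduction before anything hits tape.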


Not to belittle your examples, but in historical context the LHC occupies a special position. Remember that the ATLAS/CMS/ALICE/LHCb experiments started recording data at 10 GB/s back in 2008. Now, ten years later, it is only natural that large data rates are becoming the norm.


> If it can fit in RAM, it ain't Big Data. Multiple TBs fit in RAM

To be fair, the price for that RAM starts to steepen at (if not before) the 4TB mark, and 1.5TB might have been the practical ceiling for a frugally built machine's main memory as recently as a year and a half ago.

OTOH, SSDs are very fast even at low cost, so I'd argue that if it fits in directly-attached storage whose aggregate bandwidth is comparable to the RAM's, it ain't Big Data either. Though even a single PB might be big enough, if the CPUs are too slow.
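A rough sanity check on that bandwidth argument. All the figures here are assumptions for illustration (roughly 3 GB/s sequential read per commodity NVMe drive, roughly 100 GB/s of memory bandwidth per socket, 4TB drives), not vendor specs:

    # Assumed figures, for illustration only.
    nvme_gb_per_s = 3.0    # sequential read of one commodity NVMe SSD
    ram_gb_per_s = 100.0   # rough per-socket DDR4 memory bandwidth
    drive_tb = 4.0         # capacity per drive
    n_drives = 32

    aggregate_ssd_bw = n_drives * nvme_gb_per_s   # ~96 GB/s
    total_capacity_tb = n_drives * drive_tb       # 128 TB

    print(f"{n_drives} drives: ~{aggregate_ssd_bw:.0f} GB/s over {total_capacity_tb:.0f} TB "
          f"vs ~{ram_gb_per_s:.0f} GB/s of RAM bandwidth")

With a few dozen drives the aggregate SSD bandwidth is in the same ballpark as one socket's RAM bandwidth, while holding two orders of magnitude more data, which is the crux of the argument.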




