The author quickly dismisses hard drives because at the time of the Glacier laun...

batbomb · on April 25, 2014

This is probably close, at least for launch.

The problem with this theory, however, is that tape still would have been cheaper with roughly the same footprint, and tape has the benefit of often being forward-compatible too: as the drives improve, so does the storage capacity of the tape.

The author dismisses the idea that Amazon was using tape, but I haven't seen much evidence to support that dismissal.

Maybe the reality is that they use a bit of everything: a robotic disk library, a robotic tape library, maybe a robotic optical library. Maybe they're secretly ahead of the rest of current tape technology and are getting 20TB out of a single tape. They could be using custom hard drives, or even have a robotic platter library.

One thing's for sure: it's not active disk drives. I also don't believe they keep all the data on optical disks alone, given that optical disks degrade much more quickly than magnetic storage.

funkyy · on April 25, 2014

I don't understand how tape infrastructure would be cheaper than free used disks that are on only 4-5 times a day for mostly very short periods of time (actually some disks would be off for months). Disks are easy to use, they are not that easy to damage, and you can have each copy of data synchronised across 2-3 different DCs easily using the Amazon network at night (low network usage).

wmf · on April 25, 2014

The server and network to host those disks end up costing much more than the disks, so even MAID built from free disks isn't that cheap.

funkyy · on April 25, 2014

Wouldn't it be possible to just stack them in custom/modified racks, connect them to the tape and custom switch? Then get very cheap servers to switch through them? The hard drives don't generate much heat anyway, so only basic ventilation would be needed. Using enterprise servers for Glacier sounds like an expensive luxury. Most disks would be mostly offline anyway (small amount of changes) and I bet more than half the disks would be used less than once a week so cost of. Then if you want to increase the speed of the whole structure, put in a 1U server with 4 x 4TB SATA3 disks in RAID 10 for caching the most used/changed parts, connect the server to a 1Gbps line and you are all done on budget. The cost is just 1 simple server, modifications to the racks (which would cost very little), tapes for disks and a custom HDD switch.

seanjensengrey · on April 25, 2014

Disks internally have lots of extra space to handle errors and are not pushed to the limits of their physical storage. Amazon could have partnered with a disk manufacturer or written their own firmware to have disks that are somewhat lossy but hold 3x what reliable drives provide.

KaiserPro · on April 25, 2014

It's only now that HDDs are close to the density of tape (4TB); however, LTO-6 is half the price and designed for the job.

HDDs are not designed for long term archive. They simply fail hard when being moved or switched off for a long time.

acdha · on April 25, 2014

> HDDs are not designed for long term archive. They simply fail hard when being moved or switched off for a long time.

This is also surprisingly easy to do with tape - simply leaving an LTO tape on its side can push failure rates to significant levels within a year.

If you care about data, there's no substitute for multiple copies which are regularly verified. The idea that you can leave something on the shelf and expect to reliably read it is a dangerous myth. If you care about archival, build a system with the staffing and procedures needed to make that happen. This is enormously easier and cheaper to do with spinning disk below a certain scale, but if you have enough data the lower media cost of tape will balance out the increased overhead.
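To make "multiple copies which are regularly verified" concrete, here is a minimal Python sketch of a verification pass. It assumes each copy is a plain file and that a SHA-256 checksum was recorded when the data was archived; the function names `sha256_of` and `verify_copies` are made up for illustration and say nothing about how any real archive system does it.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large archives don't need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(paths, expected_hex):
    """Return the copies that are unreadable or whose checksum does not match.

    `expected_hex` is the checksum recorded at archive time (an assumption
    of this sketch; it has to be stored somewhere other than the media).
    """
    bad = []
    for path in paths:
        try:
            if sha256_of(path) != expected_hex:
                bad.append(path)
        except OSError:
            # Missing or unreadable media counts as a failed copy.
            bad.append(path)
    return bad
```

Run on a schedule, anything this returns would be re-created from a copy that still verifies, which is the part that needs the staffing and procedures.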

garblegarble · on April 26, 2014

> simply leaving an LTO tape on its side can push failure rates to significant levels within a year.

Do you have a source for that? I'd not heard of that before, and I'm now interested to know what % the failure rate increases by.

acdha · on April 26, 2014

Thank TheCondor for posting something more official.

I'd never heard about that before, either (not having had tape storage as a primary job), but a colleague was quite surprised by 20-30% failure rates for tapes first used within a year and asked our drive vendor about it. At the time, there was nothing mentioned on the media we bought and the tape vendor didn't have any official docs but the drive technician we dealt with said that it was well known among the support group - apparently a lot of customers either didn't know or didn't assume it'd be so dramatic.

TheCondor · on April 26, 2014

http://www.clir.org/pubs/reports/pub54/5premature_degrade.ht...

They don't mention the rates though

garblegarble · on April 26, 2014

Thanks!

skrause · on April 25, 2014

> HDDs are not designed for long term archive. They simply fail hard when being moved or switched off for a long time.

A single disk certainly is bad for long term archive. But when you have thousands of disks and use multiple copies for all data with some fancy error correction you can still get extremely high reliability; you just need to factor in the expected failure rate.
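As a rough illustration of factoring in the expected failure rate, here is a back-of-the-envelope Python sketch. Everything in it is an assumption for illustration: failures are treated as independent, the year is split into repair windows of `repair_days`, and data counts as lost only if every copy fails inside the same window. It models plain replication only, not the "fancy error correction", and the name `annual_loss_probability` is made up.

```python
import math


def annual_loss_probability(afr, copies, repair_days=7.0):
    """Approximate chance per year of losing every copy of one object.

    afr         -- annual failure rate of a single disk, e.g. 0.05 for 5%
    copies      -- number of independent copies
    repair_days -- assumed time to notice a dead copy and rebuild it
    """
    if not 0.0 <= afr <= 1.0:
        raise ValueError("afr must be between 0 and 1")
    if copies < 1 or repair_days <= 0:
        raise ValueError("copies must be >= 1 and repair_days must be > 0")
    windows_per_year = 365.0 / repair_days
    # Chance that one given disk fails within a single repair window.
    p_fail_in_window = afr / windows_per_year
    # Chance that all copies fail within the same window (independence assumed).
    p_all_fail = p_fail_in_window ** copies
    if p_all_fail >= 1.0:
        return 1.0
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(windows_per_year * math.log1p(-p_all_fail))
```

Comparing `annual_loss_probability(0.05, 1)`, `annual_loss_probability(0.05, 2)` and `annual_loss_probability(0.05, 3)` shows how quickly each extra copy pushes the loss probability down, as long as dead copies are actually detected and rebuilt within the window.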

KaiserPro · on April 25, 2014

High failure == high cost.

craigyk · on April 25, 2014

Which is why I would build an HD robot that doesn't manipulate the disks, but the connector.

simonster · on April 25, 2014

You don't even need a robot. The electronics needed to digitally "switch" which disk is connected to the server would be extremely inexpensive. I'm not sure why the original article even considers the idea of a disk drive robot.

KaiserPro · on April 26, 2014

The connectors are only rated for <1000 insertions. Tape is where it's at.

The best part about a robot is that you can take things out of a library and move them somewhere without power or epic cooling.
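The connector rating is easy to turn into a lifetime estimate. The Python sketch below takes the "<1000 insertions" figure as an upper bound and borrows, purely as an assumption, the "4-5 times a day" access rate suggested earlier in the thread, pretending each access costs one plug/unplug cycle. The function name is made up.

```python
def days_until_connector_worn(rated_insertions, insertions_per_day):
    """Days of use before a connector reaches its rated insertion count."""
    if insertions_per_day <= 0:
        raise ValueError("insertions_per_day must be positive")
    return rated_insertions / insertions_per_day


# Assumed numbers: 1000 rated insertions, 5 plug cycles per day.
print(days_until_connector_worn(1000, 5))  # 200.0 days
```

Under those assumptions a connector hits its rating in well under a year, which is the objection to plugging and unplugging drives mechanically; an electronic switch that never unplugs anything would not wear the connector this way.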

xyzzy123 · on April 26, 2014

The main problem I see with this is that if you have old disks the last thing you want to be doing is spinning them up and down all the time.

The drives and firmware you want for 24/7 use are not the same as what you want for intermittent use. On big old RAID arrays, the scariest thing was powering them down because, for sure, some of the disks would not come back again.

alexkiritz · on April 26, 2014

I was under the impression that old disks generally fail by getting stuck. If they spin them up a few times a day that would seem to solve that problem.

They could even have them just plugged into power and lying in racks with custom firmware that triggers drives to automatically spin up every few hours/days. Then they could just manually plug them in to retrieve data when data from a drive is requested.

UntitledNo4 · on April 26, 2014

There's an interesting dead comment to this:

secretamznsqrl 7 hours ago | link [dead]

AWS does not re-use disks under any circumstance. It is strictly forbidden, to protect customer data. Disks also do not leave the datacenter until they have been degaussed AND crushed.

GalacticDomin8r · on April 26, 2014

How is that interesting? Like when CNN speculates MH370 was sucked in by a mini black hole or dolphins emit pings?

secretamznsqrl · on April 25, 2014

AWS does not re-use disks under any circumstance. It is strictly forbidden, to protect customer data. Disks also do not leave the datacenter until they have been degaussed AND crushed.

jeffers_hanging · on April 25, 2014

Your personal theory is very correct.