But presuming this isn't data with low-latency access requirements (tape is pretty useless for that, so we wouldn't be making the comparison), what's the inflection point where the CapEx of even having your own "nearline" + archival storage cluster becomes worth it, vs. just using Somebody Else's Computer, i.e. an object-storage or backup service provider?
To me, 1PB is also where I'd draw that line. Which I'd interpret as: it's never really worth going to local drives for these storage modalities; you start on cloud storage, then move to local tape once you're big enough.
(Heck, AFAIK the origin storage for Netflix is still S3. Possibly not because it's the lowest-OpEx option, though, but rather because their video rendering pipeline is itself on AWS, so that's just where the data naturally ends up at the end of that pipeline — and it'd cost more to ship it all elsewhere than to just serve it from where it is. They do have their self-hosted CDN cache nodes to reduce those serving costs, though.)
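To make that "1PB is where I'd draw the line" claim concrete, here's a back-of-envelope break-even sketch. Every dollar figure here is a made-up assumption for illustration, not a quote from any provider or vendor:

```python
# Rough break-even sketch: at what capacity does owning a tape library beat
# paying a cloud archival tier over some amortization horizon?
# All prices below are illustrative assumptions, not real quotes.

CLOUD_PER_TB_MONTH = 4.0          # assumed archival-tier price, $/TB-month
LIBRARY_CAPEX = 250_000.0         # assumed library + drives + media, $
LIBRARY_OPEX_PER_TB_MONTH = 0.5   # assumed power/space/admin, $/TB-month
HORIZON_MONTHS = 60               # amortize the CapEx over 5 years

def cloud_cost(tb: float) -> float:
    return CLOUD_PER_TB_MONTH * tb * HORIZON_MONTHS

def library_cost(tb: float) -> float:
    return LIBRARY_CAPEX + LIBRARY_OPEX_PER_TB_MONTH * tb * HORIZON_MONTHS

# Find the smallest capacity where owning wins over the horizon.
tb = 1
while library_cost(tb) > cloud_cost(tb):
    tb += 1

print(f"break-even at roughly {tb} TB")  # → break-even at roughly 1191 TB
```

With these assumed numbers the crossover lands at ~1.2PB, which is at least in the same ballpark as the 1PB gut feeling above; tweak the constants to match real quotes and the break-even moves accordingly.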
On the other hand, with either tape or hard drives, you can leave it on a shelf for 10 years and the data has a decent chance of still being intact. Proper procedure would dictate more frequent maintenance, but if for whatever reason it gets neglected, there's graceful degradation. With AWS, if you don't pay your bills for a few months, your data goes poof. Other companies might have more friendly policies, but they also might go out of business in that span of time.
I think someone else mentioned in this very comments section that hard drives "rot" while spun down — not the platters, but the grease in their spindle-motor bearings (or something like it) degrades, so that when you go to use them again, they die the first time you plug them in. So you don't want to use offlined HDDs for archival storage.
(Offlined SSDs would probably be fine, if those ever became competitively affordable per GB. Disk packs (https://en.wikipedia.org/wiki/Disk_pack) would also work, given that they're just the [stable] platters, not the [unstable] mechanism; that is, they'd work if anyone still made them, and if you could still get [or repair] a mechanism to feed them into come restore time. For archival purposes, disk packs were basically outmoded by LTO tape, where "the mechanism" is at least standardized and you can likely find a working drive to read your old tape decades later.)
Even LTO tape is kind of scary to "leave on a shelf" for decades, though, if that shelf isn't itself in some kind of lead-lined bunker, given that stray EM can gradually demagnetize it. If you're keeping your tapes in an attic — or in a basement in an area with radon — then you'd better have encoded the files on there as a parity set!
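The "parity set" idea above, in its most minimal form: XOR parity lets you rebuild any one lost chunk from the survivors. Real archival tooling (e.g. par2) uses Reed-Solomon codes and can survive multiple losses; this sketch just shows the principle:

```python
# Minimal XOR-parity sketch: N data chunks plus one parity chunk; any ONE
# chunk that "demagnetizes" can be rebuilt by XORing everything that's left.

def make_parity(chunks: list[bytes]) -> bytes:
    size = max(len(c) for c in chunks)
    parity = bytearray(size)  # shorter chunks are implicitly zero-padded
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving: list[bytes], parity: bytes) -> bytes:
    # XOR of the parity with every surviving chunk yields the missing one.
    return make_parity(surviving + [parity])

chunks = [b"alpha", b"bravo", b"charl"]
parity = make_parity(chunks)
lost = chunks.pop(1)                    # chunk "bravo" rots on the shelf
assert recover(chunks, parity) == lost  # rebuilt from parity alone
```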
I think, right now, the long-term archival media of choice is optical, e.g. https://www.verbatim.com/subcat/optical-media/m-disc/. All you need to really guarantee that that'll survive 50 years, is a cool, dry warehouse that won't ever get flooded or burnt down or bombed [or go out of business!] — something like https://www.deepstore.com/.
But if you're dealing with personal data rather than giant gobs of commercial data, and you really want your photo album to survive the next 50 years, then honestly the only cost-efficient archival strategy right now is to keep it onlined, e.g. on a NAS running in RAID5. That way, as disks in the system inevitably begin to die or suffer readback checksum failures, monitoring in the system can alert you to it, and you can reactively replace the "rotting" parts of the physical substrate while the digital data itself remains intact. (Companies with LTO tape libraries do the same by keeping a couple of redundant copies of each tape; having their system periodically online tapes to checksum them; and, if any fail, erasing and overwriting the bad-checksum tape from a good-checksum copy of the same data — as the tape itself hasn't gone bad, just the data on it has.)
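That scrub-and-repair loop can be sketched in a few lines: keep N copies, hash each one periodically, and rewrite any copy whose hash disagrees with the majority. The names and structure here are purely illustrative, not any vendor's actual API:

```python
# Sketch of a scrub pass over redundant copies: hash every replica, treat
# the majority digest as canonical, and rewrite any replica that disagrees.

import hashlib
from collections import Counter

def scrub(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Return copies with any corrupted replica rewritten from a good one."""
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in copies.items()}
    good_digest, _ = Counter(digests.values()).most_common(1)[0]
    good_data = next(d for n, d in copies.items() if digests[n] == good_digest)
    return {name: (data if digests[name] == good_digest else good_data)
            for name, data in copies.items()}

copies = {
    "tape-A": b"photo album",
    "tape-B": b"photo a1bum",   # this replica has silently bit-rotted
    "tape-C": b"photo album",
}
repaired = scrub(copies)
assert repaired["tape-B"] == b"photo album"  # rewritten from a good copy
```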
Paying an object-storage or backup service provider is just paying someone else to do that same active bitrot-preventive maintenance you'd otherwise be doing yourself. (And they have the scale to take advantage of shifting canonical-fallback storage to optical-disk-in-a-cave-somewhere format — which reduces their long-term "coldline" storage costs.)
Instead, you're just left with the need to do the much rarer "active maintenance" of moving between object-storage providers as they "bit-rot" — i.e. go out of business. As there are programs that auto-sync between cloud storage providers, this is IMHO a lot less work. Especially if you're redundantly archiving to multiple services to begin with; then there's no rush to get things copied over when any one service announces its shutdown.
That’s a really good point. For us (near that 150-300TB inflection point for archival storage), it made more sense to put the data on S3 Glacier. Partly because the data is originally transferred through S3 anyway, but mainly because Glacier hits the same archival requirements as tape, at a pretty compelling cost.