
THE ROAD TO AWS RE:INVENT 2018 – WEEKLY PREDICTIONS, PART 2: DATA 2.0

Dan Greene on November 15, 2018

Originally published here. Last week I made the easy prediction that at re:Invent, AWS would announce more so-called ‘serverless’ capabilities. It...
 
Thomas H Jones II

One of my pet peeves with S3 lifecycle management is that moving from Standard to the Infrequent Access storage class has nothing to do with how frequently a file is actually accessed. While I imagine the underlying architecture of an object store makes tracking that genuinely difficult, it would provide a much-needed metric for making storage decisions.
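For reference, here's a minimal boto3 sketch of what today's age-based tiering looks like (bucket name and rule ID are hypothetical placeholders); the only trigger you can express is days since creation, never access frequency:

```python
# Sketch: S3 lifecycle rules key off object age, not read frequency.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "age-based-tiering",  # hypothetical rule ID
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to every object
                "Transitions": [
                    # The only available trigger is elapsed days since
                    # creation; STANDARD_IA also enforces a 30-day minimum.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 60, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```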

S3's lifecycle management generally leaves much to be desired. I mean, it's great that you can have a multi-stage lifecycle for data. But the fact that your only choice for sub-30-day policies is to go straight to Glacier is kind of dreadful. S3 is potentially great as a repository for nearline/offline storage (i.e., backups), but it currently lacks the useful lifecycle capabilities you get used to in legacy products like NetBackup. And, even aside from the loss of POSIX attributes when you simply sync a filesystem to S3, the performance of such a sync is dreadful due to the common-key-prefix issue (keys that share a prefix land on the same S3 partition). Both the POSIX-attribute and common-key problems are solvable, but sorting out the programmatic logic is painful.
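To illustrate the kind of programmatic logic involved, here's a hedged sketch that works around both problems at once: it stashes POSIX attributes in object metadata and salts each key with a short hash so uploads spread across key-prefix partitions (bucket and paths are hypothetical):

```python
# Sketch: work around two S3-as-backup-target pain points.
# 1. POSIX attributes are lost on upload, so stash them in object metadata.
# 2. Keys sharing a common prefix land on the same S3 partition, so salt
#    each key with a short digest to spread the write load.
import hashlib
import os

import boto3

s3 = boto3.client("s3")

def backup_file(local_path: str, bucket: str) -> str:
    st = os.stat(local_path)
    # Short hash prefix spreads keys across partitions.
    salt = hashlib.md5(local_path.encode()).hexdigest()[:4]
    key = f"{salt}/{local_path.lstrip('/')}"
    s3.upload_file(
        local_path,
        bucket,
        key,
        ExtraArgs={
            "Metadata": {  # POSIX attributes ride along as object metadata
                "posix-mode": oct(st.st_mode),
                "posix-uid": str(st.st_uid),
                "posix-gid": str(st.st_gid),
                "posix-mtime": str(st.st_mtime),
            }
        },
    )
    return key
```

Restoring is the mirror image: read the metadata back and replay it with os.chmod()/os.chown(), which is exactly the bookkeeping you'd rather the service handled for you.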

Overall, it has the feel of "you guys have been pestering us, here's something to shut you up for a while," but not a fully realized HSM (hierarchical storage management) system.

Maybe what AWS will introduce is an actual HSM-style interface to S3, or a service overlay?

 
Thomas H Jones II

Also, I would hope that they're opting to flesh out the EFS offering. Things like:

  • More/better pre-selected performance tiers. It would be great to have a shared filesystem that was useful for busy applications without large data-size requirements:
    • The default performance tier has decent latency, but throughput depends on how much data you're storing. It sucks to have to store more data (especially dummy data) just to get better baseline performance.
    • The "Max I/O" performance mode is better for throughput, but the penalty is increased latency.
  • An actual, built-in backup capability. Yeah, EFS itself is durable, but it doesn't really offer "oops" protection; today it's like relying on RAID as your only data-protection method. While you can jury-rig backups (see the sketch after this list), doing so will blow out your daily I/O credits.
  • An actual, built-in region-to-region replication capability. EFS is great within a (supported) region, but if a region gets knocked off the air (or you otherwise need to do an off-region migration of your services), your EFS-hosted data is offline or otherwise not easily available. As with backups, you can jury-rig region-to-region replication, but doing so will blow out your daily I/O credits.
  • A Windows/CIFS interface would be great. Lack of CIFS support limits the ability to use EFS in Windows-based clustered deployments. I'd assume they're working toward this to enhance their WorkSpaces service, anyway.
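
For illustration, here's a bare-bones sketch of the jury-rigged backup mentioned above: walk one EFS mount and copy new or changed files into a dated snapshot tree on a second mount. Mount points are hypothetical, and every file it reads burns the burst credits called out above:

```python
# Sketch: a jury-rigged EFS "backup" -- copy changed files from one EFS
# mount into a dated snapshot tree on another. Every read consumes EFS
# burst credits, which is the drawback noted in the list above.
import os
import shutil
from datetime import date

SOURCE = "/mnt/efs-prod"                  # hypothetical source EFS mount
DEST = f"/mnt/efs-backup/{date.today()}"  # hypothetical backup EFS mount

for dirpath, _dirnames, filenames in os.walk(SOURCE):
    for name in filenames:
        src = os.path.join(dirpath, name)
        dst = os.path.join(DEST, os.path.relpath(src, SOURCE))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        # copy2 preserves mtimes and permission bits
        if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
            shutil.copy2(src, dst)
```

The same walk-and-copy pattern is what you'd jury-rig for region-to-region replication (pointed at a mount in another region over a VPN or peered VPC), with the same credit-burn penalty.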