That Cloud Expert

How can the AWS Well-Architected Framework improve your storage layer?

We don’t talk much about the storage aspects of the AWS Well-Architected Framework, but that’s a big oversight. The storage layer has a direct effect on every pillar in the framework, from reliability and security to cost and performance.

In this blog post we’ll take a look at the AWS Well-Architected Framework and see how the best practices it introduces at the storage layer can make a big difference for AWS-based file shares and NAS migrations to AWS.

What is the AWS Well-Architected Framework?

The AWS Well-Architected Framework was designed by AWS to guide customers in building secure, high-performing, resilient, and efficient workloads and operations on AWS. It’s a set of best practices arranged in six pillars—Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability—which address key aspects of your AWS architecture.

Aligning with the framework can help you design an AWS architecture that runs more efficiently and follows industry-leading practices.

File Sharing and the AWS Well-Architected pillars

The AWS Well-Architected Framework provides best practices that address storage challenges that can arise when planning NAS migrations or designing AWS architectures to work with file shares.

There are some important storage aspects to consider in each pillar.

Operational Excellence and Performance Efficiency

The Operational Excellence and Performance Efficiency pillars are pivotal in the AWS Well-Architected Framework. They help organizations develop and operate workloads efficiently by continuously driving process improvements that stay connected to both system requirements and business value.

These pillars address intricate storage challenges faced by evolving cloud architectures. Data-heavy applications are a great example, such as artificial intelligence/machine learning (AI/ML) experiments or data transformation and analytics pipelines. These workloads are often deployed as containerized microservices that demand persistent and highly available attached data volumes. These architectures and their storage layers require finding a delicate balance between durability and cost without compromising performance.

How we store and access data in a given workload is also key to performance. A practical example is leveraging native file sharing capabilities with a container orchestration service such as Amazon EKS or Amazon ECS instead of using ephemeral local storage. In theory, such setups would be less performant due to higher network latency, but more durable and resilient. In practice, we know that the performance impact is negligible and the benefits of this architecture far outweigh the disadvantages.
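To make this concrete, here’s a minimal boto3 sketch (assuming configured AWS credentials; subnet and security group IDs are hypothetical placeholders) that provisions a shared Amazon EFS file system a container workload could mount instead of local storage:

```python
import time

import boto3

# A minimal sketch: provision a shared, encrypted Amazon EFS file system
# that EKS or ECS workloads can mount instead of ephemeral local storage.
# Subnet and security group IDs are hypothetical placeholders.
efs = boto3.client("efs")

fs = efs.create_file_system(
    CreationToken="shared-app-data",  # idempotency token
    PerformanceMode="generalPurpose",
    Encrypted=True,  # encryption at rest
    Tags=[{"Key": "Name", "Value": "shared-app-data"}],
)
fs_id = fs["FileSystemId"]

# Wait until the file system is available before adding mount targets.
while (
    efs.describe_file_systems(FileSystemId=fs_id)["FileSystems"][0]["LifeCycleState"]
    != "available"
):
    time.sleep(5)

# One mount target per AZ keeps access local and highly available.
for subnet_id in ["subnet-aaa111", "subnet-bbb222"]:
    efs.create_mount_target(
        FileSystemId=fs_id,
        SubnetId=subnet_id,
        SecurityGroups=["sg-0123456789abcdef0"],
    )
```

In a Kubernetes setup, the resulting file system ID would then be referenced from an EFS CSI driver-backed PersistentVolume, giving pods durable shared storage across nodes and AZs.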

Another point: as user bases expand, storage requirements typically grow, leading to increased operational costs. So it’s important to maintain storage agility in step with the speed of application development, without losing sight of cost efficiency in the pay-as-you-go model. Managing these types of stateful workloads requires storage agility in both development and operations. Consider a scenario that requires hundreds of data copies, such as testing cycles, backup and disaster recovery, or migrations across environments.

Another key part of operational excellence is strong data isolation between tenants and deployment environments. This segmentation safeguards both individual and organizational interests, and it demands a real understanding of business requirements and expectations. One example is segmentation between different customers, which needs to take into account not only the customer organizations themselves but also their industries (with differing legal requirements) and geographical locations (e.g., EU or US).

Security and Reliability

The Security and Reliability pillars play a crucial role in mitigating risks, protecting data, and enabling business continuity. These practices enable a workload to deliver value consistently while meeting compliance requirements, even in the face of unexpected events. Every enterprise file share and storage architecture needs to maintain secure and continuous data availability—that’s non-negotiable.

Great software architectures should be designed with storage solutions that are not only efficient but also resilient and secure. The best practices in the Security and Reliability pillars reinforce data persistence even in evolving workloads, safeguarding against potential disruptions that could compromise the integrity of stored information.

Data lifecycle management, combined with data protection, is a paramount aspect to take into account. In practice, this means identifying and categorizing your data, with a special focus on sensitive data, and making sure it’s stored accordingly.
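As a hedged sketch of what that can look like in practice, the boto3 example below applies an S3 lifecycle rule to data that has already been categorized via a hypothetical "classification" object tag (the bucket name is also a placeholder):

```python
import boto3

# A hedged sketch of lifecycle management on Amazon S3: once data has been
# categorized (here via a hypothetical "classification" object tag), rules
# move colder data to cheaper tiers and expire it when it ages out.
s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-internal-data",
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "classification", "Value": "internal"}},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 2555},  # roughly 7 years of retention
            }
        ]
    },
)
```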

Another good example is leveraging native functionality to implement data encryption, both in transit and at rest. You also want to enforce encryption and other security measures through automation and security control policies, which will improve your overall security posture.
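For instance, a minimal boto3 sketch of enforcing encryption at rest by setting default server-side encryption on an S3 bucket might look like this (the bucket name and KMS key ARN are hypothetical placeholders):

```python
import boto3

# A minimal sketch of enforcing encryption at rest through automation:
# default server-side encryption with a customer-managed KMS key.
s3 = boto3.client("s3")

s3.put_bucket_encryption(
    Bucket="example-data-bucket",  # hypothetical bucket name
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:111122223333:key/example",
                },
                "BucketKeyEnabled": True,  # reduces KMS request costs
            }
        ]
    },
)
```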

Consider how these best practices are applied through cloud guardrails and storage policies. Both can be applied to the entire organization (or a subset of accounts) and substantially improve the reliability and security posture of workloads.
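A hedged sketch of such a guardrail, assuming AWS Organizations is in use: a service control policy (SCP) that denies creation of unencrypted EBS volumes across the accounts it’s attached to (the policy name is illustrative):

```python
import json

import boto3

# A hedged sketch of an organization-wide storage guardrail: a service
# control policy (SCP) that denies creating unencrypted EBS volumes.
# Attach the resulting policy to an OU or the organization root as needed.
org = boto3.client("organizations")

guardrail = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedEbsVolumes",
            "Effect": "Deny",
            "Action": "ec2:CreateVolume",
            "Resource": "*",
            "Condition": {"Bool": {"ec2:Encrypted": "false"}},
        }
    ],
}

org.create_policy(
    Content=json.dumps(guardrail),
    Name="deny-unencrypted-ebs",  # illustrative name
    Description="Storage guardrail: all new EBS volumes must be encrypted",
    Type="SERVICE_CONTROL_POLICY",
)
```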

As data footprints expand with increased demand and usage, having a robust data infrastructure in place is a must. The ability to quickly recover from unexpected events such as cyberattacks or regional outages is fundamental. The most elemental data protection step is to set up regular backups and craft a disaster recovery strategy.
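As a minimal illustration, a user-initiated FSx backup via boto3 can be as simple as the sketch below (the file system ID is a hypothetical placeholder; in practice you’d schedule backups or use AWS Backup plans):

```python
import boto3

# A minimal sketch of the most basic protection step: a user-initiated
# FSx backup. The ID is a hypothetical placeholder. (For FSx for ONTAP,
# backups are taken per volume, so you'd pass VolumeId instead.)
fsx = boto3.client("fsx")

backup = fsx.create_backup(
    FileSystemId="fs-0123456789abcdef0",
    Tags=[{"Key": "purpose", "Value": "pre-migration-checkpoint"}],
)
print(backup["Backup"]["BackupId"], backup["Backup"]["Lifecycle"])
```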

However, that only takes you so far. A modern data infrastructure should include functionality that makes it easier to implement and fulfill strict recovery point objective (RPO) and recovery time objective (RTO) requirements, so you can withstand failures in a secure, reliable, and cost-efficient manner.

Cost Optimization and Sustainability

The Cost Optimization and Sustainability pillars are instrumental in addressing two key aspects of operating on AWS:

  1. Responsibly using cloud resources.
  2. Building an IT culture of thriftiness and efficiency.

While one pillar focuses on best practices that help deliver business value at the lowest cost, the other offers guidance to minimize environmental impacts by using fit-for-purpose resources with efficient energy consumption.

When an AWS workload uses data intensively or grows at a fast pace, sustainability and cost optimization are easily overlooked. Efficient storage helps keep operational costs under control. The typical pay-as-you-go cloud cost model demands a delicate equilibrium between responsive storage solutions and cost-conscious practices.

The sustainability pillar encompasses the overall environmental impact of your entire solution infrastructure. Following the best practices this pillar recommends—such as selecting AWS Regions with smaller carbon footprints or storing and processing data geographically closer to end users—can help drive eco-friendly practices that can contribute to a greener IT approach.

You can align with AWS Well-Architected best practices using Amazon FSx for NetApp ONTAP

Amazon FSx for NetApp ONTAP is a fully managed AWS service built on NetApp® ONTAP® software that can serve as an indispensable component in aligning with the storage best practices in the AWS Well-Architected Framework.

FSx for ONTAP addresses the intricate storage challenges encountered by AWS customers, enabling those best practices to become a reality by offering a suite of advanced features:

- Multi-Availability Zone deployment aligns with the Reliability pillar

Using the Multi-Availability Zone (AZ) configuration option, FSx for ONTAP mirrors your application data across two nodes located in separate AZs. If an AZ fails, an automatic and seamless failover takes place, with the node in the functional AZ taking on the workload. Once the impacted AZ recovers, a non-disruptive failback to the normal dual-node configuration takes place.

This level of resilience mitigates risk and allows you to design with an RPO of zero and an RTO of under 60 seconds.
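For illustration, here’s a hedged boto3 sketch of provisioning such a Multi-AZ file system (subnet IDs, capacity, and throughput values are hypothetical placeholders):

```python
import boto3

# A hedged sketch of creating a Multi-AZ FSx for NetApp ONTAP file system.
# Subnet IDs, sizing, and throughput are hypothetical placeholders.
fsx = boto3.client("fsx")

response = fsx.create_file_system(
    FileSystemType="ONTAP",
    StorageCapacity=1024,  # GiB of SSD storage
    SubnetIds=["subnet-aaa111", "subnet-bbb222"],  # one subnet per AZ
    OntapConfiguration={
        "DeploymentType": "MULTI_AZ_1",  # active/standby pair across two AZs
        "PreferredSubnetId": "subnet-aaa111",  # where the active node lives
        "ThroughputCapacity": 256,  # MB/s
        "AutomaticBackupRetentionDays": 7,
    },
    Tags=[{"Key": "Name", "Value": "wa-ontap-multi-az"}],
)
print(response["FileSystem"]["FileSystemId"])
```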

- NetApp Snapshot™ and cloning technologies provide benefits for all pillars

With FSx for ONTAP, point-in-time volume copies are created at lightning speed, using only pointers to the dataset at a specific time. This is great from both a performance perspective and from a cost optimization and sustainability standpoint since the actual data usage is kept at a minimum.

Similarly, the FlexClone® technology creates thin-clone data copies. These clone copies leverage the same pointers that Snapshot copies use, so they only consume storage capacity for changes made to the cloned copy, instead of consuming storage for an entire copy of the dataset.
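To give a feel for the workflow, here’s a hedged sketch against the ONTAP REST API that FSx for ONTAP exposes on its management endpoint: take a Snapshot copy, then create a FlexClone from it. The endpoint, credentials, SVM, and volume names are all hypothetical placeholders:

```python
import requests

# A hedged sketch against the ONTAP REST API of an FSx for ONTAP file
# system: take a pointer-based Snapshot copy, then spin up a thin
# FlexClone from it. Endpoint, credentials, and names are placeholders.
BASE = "https://management.fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com"
AUTH = ("fsxadmin", "example-password")

# Look up the parent volume's UUID. verify=False is for brevity only;
# handle the management endpoint's certificate properly in real code.
vols = requests.get(
    f"{BASE}/api/storage/volumes",
    params={"name": "app_data", "svm.name": "svm1"},
    auth=AUTH,
    verify=False,
).json()
vol_uuid = vols["records"][0]["uuid"]

# Snapshot creation is near-instant: only pointers, no data copied.
requests.post(
    f"{BASE}/api/storage/volumes/{vol_uuid}/snapshots",
    json={"name": "pre-test-snap"},
    auth=AUTH,
    verify=False,
)

# A FlexClone consumes capacity only for changes made after cloning.
requests.post(
    f"{BASE}/api/storage/volumes",
    json={
        "name": "app_data_testclone",
        "svm": {"name": "svm1"},
        "clone": {
            "is_flexclone": True,
            "parent_volume": {"name": "app_data"},
            "parent_snapshot": {"name": "pre-test-snap"},
        },
    },
    auth=AUTH,
    verify=False,
)
```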

These technologies are game changers, enabling better business outcomes while simultaneously lowering storage footprint and driving efficiencies.

- Cross-region replication bolsters the Reliability and Security pillars

Cross-region replication, powered by NetApp SnapMirror® data replication technology, enhances backup and disaster recovery capabilities. It enables incremental data replication between regions, achieving an impressive RPO of less than 5 minutes and an RTO of less than 10 minutes.

This makes rapid recovery possible in a consistent manner even in the face of unexpected events such as accidental deletion due to human error or regional outages, providing a very practical way to address the recommendations in the Reliability and Security pillars, as well as the Performance Efficiency and Cost Optimization pillars.
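As a rough sketch, a SnapMirror relationship can be established through the ONTAP REST API of the destination file system, assuming cluster and SVM peering are already in place (endpoints, credentials, and volume paths below are hypothetical placeholders):

```python
import requests

# A hedged sketch of cross-region replication with SnapMirror, configured
# via the ONTAP REST API of the *destination* file system. Peering between
# the two file systems is assumed to already exist.
DEST = "https://management.fs-0fedcba9876543210.fsx.us-west-2.amazonaws.com"
AUTH = ("fsxadmin", "example-password")

requests.post(
    f"{DEST}/api/snapmirror/relationships",
    json={
        "source": {"path": "svm1:app_data"},  # primary, e.g. us-east-1
        "destination": {"path": "svm1_dr:app_data_dr"},  # DR copy, us-west-2
    },
    auth=AUTH,
    verify=False,  # brevity only; handle TLS properly in real code
)
# The relationship is then initialized to start the baseline transfer,
# after which SnapMirror replicates incrementally.
```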

- Features that address the Security and Operational Excellence pillars

Security and compliance are bolstered through features like Write-Once, Read-Many (WORM) storage using NetApp SnapLock®, protecting against ransomware attacks. Additional security measures, including Vscan and NetApp FPolicy, coupled with encryption at rest and in transit, fortify the overall data security for workloads and applications with stringent compliance requirements.
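For example, here’s a hedged boto3 sketch of creating a WORM-protected SnapLock volume on an existing SVM (IDs and retention values are hypothetical placeholders):

```python
import boto3

# A hedged sketch of creating a WORM-protected SnapLock volume on an
# FSx for ONTAP SVM. IDs and retention values are placeholders.
fsx = boto3.client("fsx")

fsx.create_volume(
    VolumeType="ONTAP",
    Name="audit_records",
    OntapConfiguration={
        "StorageVirtualMachineId": "svm-0123456789abcdef0",
        "JunctionPath": "/audit_records",
        "SizeInMegabytes": 102400,  # 100 GiB
        "SnaplockConfiguration": {
            "SnaplockType": "COMPLIANCE",  # strictest mode: no privileged delete
            "RetentionPeriod": {
                "DefaultRetention": {"Type": "YEARS", "Value": 7},
                "MinimumRetention": {"Type": "DAYS", "Value": 30},
                "MaximumRetention": {"Type": "YEARS", "Value": 10},
            },
        },
    },
)
```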

These advanced features mitigate storage deployment and management risks, making it easier to implement the recommendations from the Security and Operational Excellence pillars.

- Automation and efficiency features address Operational Excellence, Sustainability, and Cost Optimization pillars

With FSx for ONTAP, continuous cost optimization is achieved through thin provisioning; storage efficiency features including data compression, deduplication, and compaction; automated data tiering; and thin cloning.
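As a brief illustration, the hedged boto3 sketch below creates a volume with storage efficiency enabled and an automatic tiering policy (the SVM ID is a hypothetical placeholder):

```python
import boto3

# A minimal sketch of the efficiency features on a new FSx for ONTAP
# volume: StorageEfficiencyEnabled turns on compression, deduplication,
# and compaction, while the tiering policy moves cold blocks to cheaper
# capacity-pool storage automatically. The SVM ID is a placeholder.
fsx = boto3.client("fsx")

fsx.create_volume(
    VolumeType="ONTAP",
    Name="project_share",
    OntapConfiguration={
        "StorageVirtualMachineId": "svm-0123456789abcdef0",
        "JunctionPath": "/project_share",
        "SizeInMegabytes": 512000,
        "StorageEfficiencyEnabled": True,
        "TieringPolicy": {"Name": "AUTO", "CoolingPeriod": 31},  # days until cold
    },
)
```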

Multi-protocol data access in file sharing also plays a part in reducing costs, since it allows your data to be accessed no matter which file protocol your applications use. That avoids the duplicate storage expense and the synchronization complexities involved in running separate services for different file access protocols, such as SMB and NFS.

Plus, FSx for ONTAP leverages NetApp storage efficiencies that reduce storage footprint and costs. These aspects collectively translate into lower overall monthly storage costs, positioning FSx for ONTAP to address several recommendations from the Sustainability, Operational Excellence, and Cost Optimization pillars.

Operational Excellence is also achieved through the integration with the popular automation tools Terraform, CloudFormation, and Ansible.

What would Well-Architected change about your storage environment?

Navigating the storage landscape within AWS requires a holistic approach, and the Well-Architected Framework's pillars can be your guide. From addressing data-heavy workload challenges to ensuring robust security and reliability, and optimizing costs while embracing sustainability, each pillar contributes to a well-rounded storage strategy.

These best practices highlight the need for agility in storage solutions, rigorous security measures, continuous reliability, cost-conscious practices, and a commitment to sustainability. Organizations can leverage these pillars to build storage architectures that not only meet today's challenges but also remain resilient and adaptable in the ever-evolving cloud.
