DEV Community

Artur Schneider for AWS Community Builders

Migrating large amounts of data to Amazon FSx for Windows File Server with AWS DataSync

Introduction

In the evolving landscape of technology and business, companies often find themselves navigating through a myriad of choices. One such choice is the decision between maintaining on-premises infrastructure or shifting to cloud-based solutions. This dilemma was recently faced by one of our clients - a global container shipment company.

Their challenge was an aging fleet of local Windows file servers. As the hardware aged, issues of inefficiency, decreased performance, and increased maintenance costs became prevalent. Faced with the significant investment of time and resources required to refresh their local servers, they began to explore alternative options.

Their search led them to AWS, and specifically, Amazon FSx for Windows File Server. Amazon FSx for Windows File Server offers fully managed Microsoft Windows file servers, providing the compatibility and features that their organization needed, all with the flexibility, scalability, and cost benefits of a cloud-based solution. It was an obvious choice, not only from a technological perspective but also for a smoother transition in terms of operating environment.

The only obstacle that remained was the formidable task of migrating vast amounts of data from their local servers to the cloud. The complexity and potential risks of such a migration should not be underestimated. Yet, AWS once again provided the solution - AWS DataSync.

AWS DataSync is a data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect. Leveraging DataSync, we aimed to streamline the file migration process from one single dashboard, ensuring a seamless and efficient transition.

In this blog post, we'll delve into our experience of this migration. I'll guide you through the process we followed, the challenges we faced, the solutions we devised, and most importantly, the success we achieved. Strap in for a deep dive into our journey from local Windows servers to Amazon FSx, orchestrated by AWS DataSync.

Requirements

Before we dive into the migration, it's essential to understand the necessary prerequisites for using AWS DataSync and Amazon FSx for Windows.

AWS DataSync requirements

Let's take a closer look at the DataSync requirements.

Network Connectivity: AWS DataSync requires network connectivity between your source servers and the AWS Region where your destination is located. The traffic can run over the public internet or over a private connection such as AWS Direct Connect or a VPN; in our case, it went through VPC endpoints. The "AWS DataSync network requirements" page in the official AWS documentation is worth a close read.

DataSync Agent: The DataSync agent must be installed on a virtual machine or physical server with at least 4 vCPUs, 16 GB of RAM, and network connectivity to both your source storage and AWS. You can deploy your agent on a VMware ESXi, Linux Kernel-based Virtual Machine (KVM), or Microsoft Hyper-V hypervisor. For storage in a virtual private cloud (VPC) in AWS, you can deploy an agent even on an Amazon EC2 instance.

Outbound Network Access: The DataSync agent requires outbound network access over port 80 for HTTP or port 443 for HTTPS. These ports must be open on your firewall.

File System Compatibility: If you are transferring data from a Windows-based file system, it should support SMB (Server Message Block) protocol. For Unix or Linux source data, your file system must support NFS (Network File System).

AWS Permissions: In order to transfer data, the DataSync service needs the necessary permissions in AWS Identity and Access Management (IAM). This requires an IAM role that DataSync can assume to access resources.
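The outbound network access requirement above can be checked from a machine in the same network segment as the agent before deploying anything. The following is a minimal sketch using only the Python standard library; the function name `can_connect` is hypothetical, and a successful TCP connection only shows that the port is open, not that every DataSync network requirement is met.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A call such as `can_connect("datasync.eu-central-1.amazonaws.com", 443)` (the region here is only an example) gives a quick yes/no answer on whether the firewall lets HTTPS traffic out.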

Amazon FSx for Windows File Server requirements

Now let's look at the requirements specific to using Amazon FSx for Windows File Server.

VPC and Security Group Configuration: Amazon FSx requires an Amazon VPC with the necessary security group rules. Your security groups must allow inbound traffic over the SMB port (usually port 445) from your clients, and the clients must be able to route traffic to the Amazon FSx file system.

Active Directory Integration: Amazon FSx must be joined to an AWS Managed Microsoft AD or self-managed AD. Your Windows users and groups must be part of this Active Directory.

Minimum Storage Capacity: The minimum storage capacity for Amazon FSx for Windows File Server is 32 GiB when using SSD storage.

Backup and Maintenance Preferences: You need to choose your preferred daily backup window, weekly maintenance window, and whether to enable automatic backup.

Network Connectivity: The FSx file server must have network connectivity to your workloads either within AWS (for workloads running on EC2 instances) or on-premises (for workloads running on your local servers).
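To make the security group requirement above concrete, here is a small sketch that builds an inbound rule for SMB (TCP port 445). The helper name `smb_ingress_rule` and the example CIDR are assumptions for illustration; the dictionary follows the `IpPermissions` shape used by the EC2 API for security group rules.

```python
import ipaddress


def smb_ingress_rule(client_cidr: str) -> dict:
    """Build an inbound rule that allows SMB (TCP 445) from client_cidr."""
    # ip_network raises ValueError for malformed CIDRs or set host bits.
    network = ipaddress.ip_network(client_cidr)
    return {
        "IpProtocol": "tcp",
        "FromPort": 445,
        "ToPort": 445,
        "IpRanges": [
            {"CidrIp": str(network), "Description": "SMB from file share clients"}
        ],
    }
```

With boto3 installed, the rule could be passed to `ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=[smb_ingress_rule("10.20.0.0/16")])`; the group ID and the CIDR depend on your environment.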

Setting up Amazon FSx for Windows File Server

Before you can proceed with setting up AWS DataSync, your destination - in this case, Amazon FSx for Windows File Server - needs to be correctly set up and ready to receive data.

Step 1: Prepare for Amazon FSx

Before creating your Amazon FSx for Windows File Server, make sure you meet the following prerequisites:

  • Have an Active Directory ready for FSx to join, and make sure it is reachable from your AWS environment. This can be an AWS Managed Microsoft AD or a self-managed Active Directory.
  • Ensure you have a VPC and Security Group ready to be associated with FSx.

Step 2: Create Your Amazon FSx for Windows File System

  • Navigate to the Amazon FSx console and select "Create file system".
  • Choose "FSx for Windows File Server" as the file system type and select "Next".

Step 3: Specify File System Details

  • Name your file system for easy identification and proceed to configure it.
  • Choose your desired storage capacity and performance level.
  • Configure the network & security settings by selecting the desired VPC, the preferred subnet, and the security groups.
  • In the Windows authentication section, select the Microsoft Active Directory that you've set up. The Amazon FSx for Windows File Server will be joined to this AD.

Step 4: Configure Optional Settings

You can leave these settings at their defaults or customize them according to your needs:

  • Maintenance preferences: You can specify a weekly time window when automatic maintenance activities occur.
  • Data deduplication: This can reduce your storage costs if you have redundant data.
  • Encryption: FSx data is encrypted at rest using keys you manage through AWS Key Management Service (KMS). You can use the default key, or choose a key you created.
  • Throughput capacity: You can manually specify a throughput capacity, or you can accept the value that Amazon FSx recommends based on the storage capacity you selected.

Step 5: Review and Create

  • Review your settings to make sure everything is correct.
  • Finally, click "Create file system".
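The console steps above map roughly onto a single API request. The sketch below only assembles the request dictionary and does not call AWS. The function name, the default values, and the choice of an AWS Managed Microsoft AD (`ActiveDirectoryId`) are assumptions; a self-managed AD needs a different configuration block, so treat this as an illustration rather than the setup we used.

```python
def build_fsx_windows_request(subnet_id, security_group_ids, directory_id,
                              storage_gib=1024, throughput_mbps=64,
                              kms_key_id=None):
    """Assemble parameters for creating an FSx for Windows file system."""
    if storage_gib < 32:
        # 32 GiB is the minimum storage capacity named in the requirements.
        raise ValueError("storage capacity must be at least 32 GiB")
    request = {
        "FileSystemType": "WINDOWS",
        "StorageType": "SSD",
        "StorageCapacity": storage_gib,
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": list(security_group_ids),
        "WindowsConfiguration": {
            "ActiveDirectoryId": directory_id,       # assumes AWS Managed AD
            "ThroughputCapacity": throughput_mbps,   # MB/s
            "WeeklyMaintenanceStartTime": "7:02:00",  # day:HH:MM, 7 = Sunday
            "DailyAutomaticBackupStartTime": "01:00",
            "AutomaticBackupRetentionDays": 7,
        },
    }
    if kms_key_id:
        # Without KmsKeyId, the default service key is used.
        request["KmsKeyId"] = kms_key_id
    return request
```

With boto3, the result could be passed as `boto3.client("fsx").create_file_system(**request)`. Keeping the request as plain data makes it easy to review before anything is created.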

Your Amazon FSx for Windows File Server should now be set up and ready to serve as a destination for your data transfer task in AWS DataSync. Note the DNS name and Windows Remote Management (WinRM) port of your newly created FSx for Windows file system - you'll need these when configuring your data transfer task in AWS DataSync.

Now that your Amazon FSx is all set up, you can proceed with defining your target in AWS DataSync, and start transferring your data!

Setting up AWS DataSync

Step 1: Setting up VPC Endpoints

Before setting up AWS DataSync, ensure you've created a VPC endpoint to communicate with AWS DataSync securely within your Amazon VPC.

  • Navigate to the VPC Dashboard in your AWS Management Console.
  • Under the "Virtual Private Cloud" section, click on "Endpoints".
  • Click "Create Endpoint", and in the "Service category" choose "AWS services".
  • For the service name, select "com.amazonaws.[your-region].datasync" where [your-region] should be replaced with your AWS region (like us-east-1, us-west-2, etc.).
  • In the VPC section, choose the VPC where you want to create the endpoint.
  • Select the appropriate subnets, security group, and other options as per your requirements and click "Create endpoint".
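The endpoint from the steps above can also be described as request data. This sketch derives the service name from the region exactly as in the console step and assembles the remaining parameters; the function name and all IDs are placeholders, and nothing is sent to AWS.

```python
def datasync_endpoint_request(region, vpc_id, subnet_ids, security_group_ids):
    """Assemble parameters for an interface VPC endpoint for DataSync."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Same pattern as in the console: com.amazonaws.[your-region].datasync
        "ServiceName": f"com.amazonaws.{region}.datasync",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }
```

With boto3, the dictionary could be passed to `boto3.client("ec2").create_vpc_endpoint(**request)`. The endpoint ID in the response is the value needed later when configuring the agent.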

Step 2: Deploying the DataSync Agent

  • Visit the AWS DataSync console in your AWS account and select "Get started".
  • Choose "Deploy a new agent", and follow the instructions to download and deploy the agent in your local environment.
  • You can connect to the DataSync agent's local console from your hypervisor, for example to test the connection to the VPC endpoint or to retrieve the activation key.

[Figure: AWS DataSync agent local console view]

Step 3: Configuring the DataSync Agent

After deploying the agent, you'll need to configure it to connect to your on-premises file system and your AWS account.

  • In the AWS DataSync console, select your newly deployed agent.
  • Click "Configure agent", and then enter the IP address or hostname of your on-premises file system.
  • Enter your AWS account ID, choose the AWS region where you want to transfer data, and input the VPC endpoint ID that you created in Step 1.
  • Choose a method to authenticate the agent to your AWS account (either by creating an IAM role or by entering your AWS access keys).
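If you prefer to register the agent through the API instead of the console, the activation key retrieved from the agent's local console and the VPC endpoint ID from Step 1 are the main inputs. The sketch below assembles such a request; the helper names and all IDs are hypothetical, and it assumes the agent is attached to one subnet and one security group.

```python
def ec2_arn(region, account_id, resource_type, resource_id):
    """Build an EC2 resource ARN, e.g. for a subnet or security group."""
    return f"arn:aws:ec2:{region}:{account_id}:{resource_type}/{resource_id}"


def build_agent_request(activation_key, agent_name, region, account_id,
                        vpc_endpoint_id, subnet_id, security_group_id):
    """Assemble parameters for activating a DataSync agent via a VPC endpoint."""
    return {
        "ActivationKey": activation_key,
        "AgentName": agent_name,
        "VpcEndpointId": vpc_endpoint_id,
        "SubnetArns": [ec2_arn(region, account_id, "subnet", subnet_id)],
        "SecurityGroupArns": [
            ec2_arn(region, account_id, "security-group", security_group_id)
        ],
    }
```

With boto3, this could be passed to `boto3.client("datasync").create_agent(**request)`, which returns the ARN of the activated agent.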

Step 4: Create a Location for your Data

  • In the DataSync console, choose "Create location".
  • Select the location type based on where your data is stored. For on-premises locations, choose "NFS" or "SMB".
  • Depending on the location type, provide the required information which includes the Agent, server hostname, and mount path for NFS or Agent, domain, user name, password, server hostname, and share name for SMB.
  • Similarly, create a location for your destination data in FSx.
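The two locations from the steps above can be sketched as request data as well: an SMB source served by the agent and an FSx for Windows destination. The function names, the default `/share` subdirectory, and every ARN or credential are placeholders; in practice the password should come from a secret store rather than from source code.

```python
def smb_source_location(agent_arn, server, share, user, domain, password):
    """Assemble parameters for an on-premises SMB source location."""
    return {
        "AgentArns": [agent_arn],
        "ServerHostname": server,
        # Normalise the share name to a forward-slash path such as "/projects".
        "Subdirectory": "/" + share.strip("/\\"),
        "User": user,
        "Domain": domain,
        "Password": password,
    }


def fsx_destination_location(fsx_arn, security_group_arn, user, domain,
                             password, subdirectory="/share"):
    """Assemble parameters for an FSx for Windows destination location."""
    return {
        "FsxFilesystemArn": fsx_arn,
        "SecurityGroupArns": [security_group_arn],
        "Subdirectory": subdirectory,
        "User": user,
        "Domain": domain,
        "Password": password,
    }
```

With boto3, these dictionaries correspond to `create_location_smb(**source)` and `create_location_fsx_windows(**destination)` on the DataSync client; each call returns a location ARN for use in the task.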

Step 5: Create a Task to Transfer Data

  • After setting up locations, you can now create a data transfer task.
  • In the DataSync console, choose "Create task".
  • Select the source location and destination location that you created in Step 4.
  • Configure your data transfer settings like options for handling metadata, data verification, etc., and set a schedule if you want the task to run on a schedule.
  • Finally, start the data transfer task.
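As an illustration of the transfer settings mentioned above, the following sketch assembles a task request with a few commonly used options. The function name and the chosen option values are assumptions, not the exact settings from our migration; which values fit depends on whether you need full verification, ACL copying, or deletion handling.

```python
def build_task_request(source_arn, destination_arn, name, cron=None):
    """Assemble parameters for a DataSync task between two locations."""
    request = {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": name,
        "Options": {
            "VerifyMode": "ONLY_FILES_TRANSFERRED",  # verify only what was copied
            "TransferMode": "CHANGED",               # copy only changed data
            "PreserveDeletedFiles": "PRESERVE",      # keep files deleted at source
            "SecurityDescriptorCopyFlags": "OWNER_DACL",  # copy owner and ACLs
            "LogLevel": "BASIC",
        },
    }
    if cron:
        # Example expression: "cron(0 2 * * ? *)" for a nightly run at 02:00 UTC.
        request["Schedule"] = {"ScheduleExpression": cron}
    return request
```

With boto3, the request could be passed to `datasync.create_task(**request)`, and the returned task ARN to `datasync.start_task_execution(TaskArn=...)` to start the transfer.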

You should monitor your data transfer tasks on the AWS DataSync console. This is where you can view the progress of the task, check for any errors, and view performance metrics.

Please remember that this is a simplified overview of the process and does not include every possible configuration option. Always refer to the official AWS documentation for the most accurate and up-to-date information.
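Progress can also be tracked outside the console. The sketch below turns a task execution description into a one-line summary; the function name is hypothetical, and the field names follow the response of the DataSync `DescribeTaskExecution` API.

```python
def summarize_execution(execution: dict) -> str:
    """Summarise a task execution description as a short progress line."""
    status = execution.get("Status", "UNKNOWN")
    files = execution.get("FilesTransferred", 0)
    done = execution.get("BytesTransferred", 0)
    total = execution.get("EstimatedBytesToTransfer", 0)
    # Guard against division by zero while the estimate is still missing.
    percent = 100.0 * done / total if total else 0.0
    return f"{status}: {files} files, {percent:.1f}% of estimated bytes"
```

With boto3, the input would be the dictionary returned by `datasync.describe_task_execution(TaskExecutionArn=...)`. Calling the function in a loop with a short sleep gives a simple progress log for long-running transfers.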

Challenges

During our migration we ran into several challenges, which I can summarize as follows:

Network Configuration Challenges: Establishing secure and efficient network connections between the local environment and AWS was one of the major challenges. Working with the existing VPN connection and the VPC endpoints, while ensuring that all network requirements were met and the firewalls were configured appropriately, proved quite daunting.

Insufficient Local Compute Resources: The AWS DataSync agent needs a certain level of compute resources in the local environment. Where the minimum requirements were not met, we could still proceed with the migration, but had to expect performance issues. In such cases, falling back to traditional tools such as robocopy was an alternative.

IP Address Allocation: AWS DataSync creates elastic network interfaces (ENIs) in the chosen VPC for each task, each with its own IP address. It was crucial to ensure that the subnets had enough free IP addresses to accommodate these ENIs and to prevent IP address exhaustion.

Permissions in Active Directory Group: Because Amazon FSx joined the existing domain, we had to choose carefully which AD group would receive administrative permissions. This decision was critical because the group could not be modified after the FSx file system had been created. The AD groups that would manage FSx in the future therefore deserved careful consideration up front.
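For the IP address allocation challenge above, a rough capacity check before creating tasks can prevent surprises. This is a minimal sketch; the function name is hypothetical, and the number of ENIs per task is deliberately left as a parameter, so check the DataSync documentation for the value that applies to your setup. AWS reserves five addresses in every subnet, which the calculation subtracts.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one


def remaining_addresses(subnet_cidr, addresses_in_use, tasks, enis_per_task):
    """Return free addresses left after planned tasks (negative = shortfall)."""
    total = ipaddress.ip_network(subnet_cidr).num_addresses
    usable = total - AWS_RESERVED_PER_SUBNET
    return usable - addresses_in_use - tasks * enis_per_task
```

A negative result means the subnet is too small for the planned number of tasks, so either a larger subnet or fewer parallel tasks would be needed.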

Conclusion

Reflecting on our migration journey, we've come to recognize several crucial takeaways and successes.

AWS DataSync proved to be an exceptional tool for orchestrating large-scale data migrations from multiple on-premises locations to AWS. Being able to consolidate all migration tasks in a single dashboard, regardless of geographical location, gave us a high level of organization and visibility. Combined with comprehensive logging and monitoring capabilities, this allowed us to maintain tight control over our migration process.

Choosing Amazon FSx for Windows File Server was indeed a strategic decision that aligned with our client's requirements and expectations. FSx provided a seamless transition from their familiar local Windows File Servers to the cloud. This ensured minimal disruption to their business operations, and allowed them to continue using file shares just as they were accustomed to, but with the added advantages of AWS' scalability, reliability, and robust security.

Our most significant achievement was successfully migrating vast amounts of data across the globe to AWS with near-zero downtime. This feat, executed effectively with a keen eye on minimizing business impact, marks a pivotal moment in our client's digital transformation journey.

Moving forward, we are confident in our ability to leverage AWS DataSync and Amazon FSx for Windows File Server to assist other businesses in their migration endeavors. As with any ambitious project, challenges are inevitable. Still, we have shown that through careful planning, robust technology, and a strong understanding of our infrastructure, even the most daunting hurdles can be overcome. Our experience stands testament to the power of cloud technology in transforming business landscapes, and we're excited to continue helping organizations navigate their unique paths to the cloud.
