What's new at AWS 📢
☑ Amazon EKS support in Amazon SageMaker HyperPod to scale foundation model development
☑ This launch lets customers run and manage their Kubernetes workloads on SageMaker HyperPod, purpose-built infrastructure for foundation model (FM) development that reduces model training time by up to 40%.
☑ Many customers use Kubernetes to orchestrate their ML workflows because of its portability, scalability, and rich ecosystem of tools. However, handling hardware failures is not automated.
☑ With this launch, customers can run deep health checks during cluster creation and have hardware failures detected automatically during ML training and fine-tuning.
☑ In addition, HyperPod automatically replaces faulty nodes (self-healing, performant clusters) and resumes training from the last checkpoint on both AWS Trainium and NVIDIA GPUs, at a scale of more than a thousand accelerators.
☑ EKS-orchestrated HyperPod clusters also integrate with Amazon CloudWatch Container Insights to provide out-of-the-box observability, including health status checks and visual dashboards.
☑ Customers can use the HyperPod CLI, or their preferred tools, to submit, manage, and monitor workloads.
☑ What is Amazon EKS:
➰ An AWS-managed service for running Kubernetes in the AWS Cloud and in on-premises data centers.
➰ It automatically manages the availability and scalability of the Kubernetes control plane nodes, which handle key tasks such as scheduling containers and storing cluster data.
➰ As an added advantage, Amazon EKS integrates with AWS services such as Elastic Load Balancing, IAM, VPC, and CloudTrail.
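To make the "resumes training from the last checkpoint" idea above more concrete, here is a minimal Python sketch of the general checkpoint-and-resume pattern. It is not HyperPod's implementation or API: the function names (`save_checkpoint`, `load_checkpoint`, `train`), the JSON checkpoint file, and the simulated failure are all hypothetical and only illustrate how a training loop can pick up where it left off after a node is replaced.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it into place,
    so a crash mid-write never leaves a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(path, total_steps, checkpoint_every=10, fail_at=None):
    """Hypothetical training loop: starts from the last checkpoint and
    saves progress every `checkpoint_every` steps."""
    state = load_checkpoint(path)
    for step in range(state["step"] + 1, total_steps + 1):
        if fail_at is not None and step == fail_at:
            # Stand-in for a hardware failure (assumption for the demo).
            raise RuntimeError(f"simulated hardware failure at step {step}")
        # Placeholder for a real training step.
        state = {"step": step, "loss": 1.0 / step}
        if step % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state


# Demo: the first run fails partway, the second run resumes from the checkpoint.
ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
try:
    train(ckpt, total_steps=50, fail_at=37)
except RuntimeError as err:
    print("first run stopped:", err)
print("resuming from step", load_checkpoint(ckpt)["step"])
print("final state:", train(ckpt, total_steps=50))
```

In this sketch the work done between the last checkpoint and the failure is lost and redone, which is why checkpoint frequency is a trade-off between I/O overhead and repeated work.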
📌 Explore more about EKS: https://aws.amazon.com/eks/
📌 Explore more about SageMaker HyperPod: https://aws.amazon.com/blogs/aws/amazon-sagemaker-hyperpod-introduces-amazon-eks-support/