I've been using Amazon SageMaker to train machine learning models, but the unstable connection to Weights & Biases (WandB), the tool I was using for experiment management, made this inconvenient.
I thought the Experiments feature in SageMaker Studio might solve this problem, so I decided to try it.
Setting up Studio required me to configure basic networking components such as a VPC and subnets. As someone new to cloud technology, I found this quite challenging, so I've put together this introductory guide to walk other beginners through the process.
If you notice any mistakes, I would appreciate your feedback in the comments.
Apologies that the text in the diagrams is in Japanese!
Follow the order below to configure the network and SageMaker Studio settings:
- VPC (Virtual Private Cloud)
- Private Subnet
- VPC Endpoint
- Route Table
- Security Group
- SageMaker Studio Domain
- User Profile
The VPC where the application is deployed may later be connected to other VPCs via VPC peering, and peering is not possible between VPCs whose CIDR blocks overlap. We therefore choose a CIDR block that does not overlap with that of any other VPC.
RFC 1918 defines three private IPv4 address ranges:
- 10.0.0.0 - 10.255.255.255 (10.0.0.0/8)
- 172.16.0.0 - 172.31.255.255 (172.16.0.0/12)
- 192.168.0.0 - 192.168.255.255 (192.168.0.0/16)
Since 10.0.0.0/16 is already used by an existing VPC, this time we use a different block, 10.1.0.0/24, within the same 10.0.0.0/8 range.
*Any other non-overlapping block within 10.0.0.0/8, such as 10.2.0.0/16, would also work.
*It's also fine to use a different private address range, such as 172.16.0.0/16 or 192.168.0.0/16.
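The two conditions above (stay inside an RFC 1918 range, never overlap another VPC) are easy to check mechanically with Python's standard `ipaddress` module. This is a minimal sketch, not an AWS tool; it also checks the /16 to /28 size range AWS allows for a VPC's IPv4 block. The existing-VPC list is just the one VPC mentioned in this article.

```python
import ipaddress
from typing import List

# The three private IPv4 ranges defined by RFC 1918
RFC1918_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(cidr: str) -> bool:
    """True if the block lies entirely inside one RFC 1918 range."""
    net = ipaddress.ip_network(cidr)
    return any(net.subnet_of(r) for r in RFC1918_RANGES)


def overlapping_blocks(candidate: str, existing: List[str]) -> List[str]:
    """Return the existing VPC CIDR blocks that overlap the candidate."""
    cand = ipaddress.ip_network(candidate)
    return [c for c in existing if cand.overlaps(ipaddress.ip_network(c))]


def check_vpc_cidr(candidate: str, existing: List[str]) -> List[str]:
    """Collect problems with a candidate VPC CIDR (empty list = usable)."""
    problems = []
    net = ipaddress.ip_network(candidate)
    if not 16 <= net.prefixlen <= 28:
        problems.append("VPC IPv4 blocks must be between /16 and /28")
    if not is_rfc1918(candidate):
        problems.append("not inside an RFC 1918 range")
    for other in overlapping_blocks(candidate, existing):
        problems.append(f"overlaps existing VPC {other}")
    return problems


existing_vpcs = ["10.0.0.0/16"]  # the VPC that already exists in this setup
for candidate in ["10.1.0.0/24", "10.2.0.0/16", "172.16.0.0/16", "10.0.128.0/24"]:
    print(candidate, check_vpc_cidr(candidate, existing_vpcs) or "OK")
```

The first three candidates pass; 10.0.128.0/24 is rejected because it sits inside the existing 10.0.0.0/16.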
Later in the procedure we will create interface VPC endpoints for the ECR API and the ECR Docker Registry. Creating them with private DNS enabled fails with an error unless DNS is enabled on the VPC.
To enable private DNS within the VPC, set both of the following attributes to true on the Edit VPC settings screen: DNS resolution (enableDnsSupport) and DNS hostnames (enableDnsHostnames).
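If you prefer scripting to the console, the VPC creation and the two DNS attributes map onto the EC2 API roughly as follows. This is a minimal sketch assuming a boto3 EC2 client is passed in; the function name and the default CIDR are illustrative choices, not part of the console procedure.

```python
from typing import Any


def create_studio_vpc(ec2: Any, cidr_block: str = "10.1.0.0/24") -> str:
    """Create the VPC and turn on the two DNS attributes that interface
    endpoints with private DNS need. `ec2` is assumed to be a boto3 EC2 client."""
    vpc_id = ec2.create_vpc(CidrBlock=cidr_block)["Vpc"]["VpcId"]
    # ModifyVpcAttribute changes only one attribute per call, hence two calls.
    ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsSupport={"Value": True})
    ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={"Value": True})
    return vpc_id
```

Usage would look like `create_studio_vpc(boto3.client("ec2", region_name="ap-northeast-1"))`. Taking the client as a parameter keeps the function easy to try against a stub before touching a real account.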
In the VPC dashboard, create a new private subnet where Studio will be deployed, and specify its CIDR block as 10.1.0.0/25.
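As a quick worked example of what 10.1.0.0/25 means inside the 10.1.0.0/24 VPC, the `ipaddress` module can split the block and count the usable addresses. The sketch assumes only the AWS rule that five addresses in every subnet are reserved (the first four and the last one).

```python
import ipaddress

vpc = ipaddress.ip_network("10.1.0.0/24")

# Splitting the /24 VPC block into /25s gives two candidate subnets;
# this guide uses the first one for Studio.
halves = list(vpc.subnets(new_prefix=25))
studio_subnet = halves[0]

# AWS reserves 5 addresses in every subnet (the first four and the last one).
usable_hosts = studio_subnet.num_addresses - 5

print(halves)        # [IPv4Network('10.1.0.0/25'), IPv4Network('10.1.0.128/25')]
print(usable_hosts)  # 123
```

The second half, 10.1.0.128/25, stays free for a future subnet. With boto3, the subnet itself would be created with `ec2.create_subnet(VpcId=..., CidrBlock="10.1.0.0/25")`.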
Studio needs to reach S3, which stores the training data and receives the trained model, and ECR, which hosts the Docker images. From the VPC dashboard, create a VPC endpoint for each and associate it with the new private subnet.
Create the VPC endpoint sagemaker-studio-s3.
To keep access to S3 private from within the VPC, we use a gateway VPC endpoint. It is associated with the S3 endpoint service (here, com.amazonaws.ap-northeast-1.s3).
Unlike interface VPC endpoints, a gateway VPC endpoint for S3 is attached to the VPC itself, so you do not associate subnets or security groups with it directly. Instead, it works through an entry added to the VPC's route table: a route whose destination is the S3 prefix list and whose target is the endpoint.
The route table is configured in a later step, so you can leave the route table selection empty when creating the endpoint.
Note that two services named com.amazonaws.ap-northeast-1.s3 appear in the list; select the one with Type: Gateway (the other is Type: Interface).
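The console choices for the S3 endpoint translate into the arguments of the EC2 `CreateVpcEndpoint` call. The sketch below only builds the keyword arguments (so it can be inspected without AWS access); the helper name is hypothetical, and the `Name` tag mirrors the endpoint name used in this guide.

```python
from typing import Any, Dict, List, Optional


def s3_gateway_endpoint_params(
    vpc_id: str,
    region: str = "ap-northeast-1",
    route_table_ids: Optional[List[str]] = None,
    name: str = "sagemaker-studio-s3",
) -> Dict[str, Any]:
    """Keyword arguments for ec2.create_vpc_endpoint() for the S3 gateway endpoint."""
    params: Dict[str, Any] = {
        # "Gateway", not "Interface": the same service name exists for both types.
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "TagSpecifications": [
            {"ResourceType": "vpc-endpoint", "Tags": [{"Key": "Name", "Value": name}]}
        ],
    }
    # No SubnetIds or SecurityGroupIds: gateway endpoints attach at the VPC level.
    # Route tables can be left out now and associated later.
    if route_table_ids:
        params["RouteTableIds"] = route_table_ids
    return params
```

With a boto3 client this would be used as `ec2.create_vpc_endpoint(**s3_gateway_endpoint_params(vpc_id))`.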
Create the VPC endpoints sagemaker-studio-ecr-api and sagemaker-studio-ecr-dkr.
For ECR, we use interface VPC endpoints (only S3 and DynamoDB use gateway VPC endpoints).
We create one interface VPC endpoint for each of ECR's two endpoint services:
- For ECR API (com.amazonaws.ap-northeast-1.ecr.api)
- For ECR Docker Registry (com.amazonaws.ap-northeast-1.ecr.dkr)
An interface VPC endpoint is implemented as an ENI (Elastic Network Interface) placed in a subnet. Therefore, unlike a gateway VPC endpoint, it requires you to select a subnet when you create it.
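For comparison with the gateway endpoint, here is a sketch of the arguments for the two ECR interface endpoints. It is a hypothetical helper that only builds `create_vpc_endpoint` keyword arguments. The differences show up directly: `SubnetIds` (where the ENIs go), optional `SecurityGroupIds`, and `PrivateDnsEnabled`, which relies on the VPC DNS attributes set earlier.

```python
from typing import Any, Dict, List, Optional


def ecr_interface_endpoint_params(
    vpc_id: str,
    subnet_ids: List[str],
    security_group_ids: Optional[List[str]] = None,
    region: str = "ap-northeast-1",
) -> List[Dict[str, Any]]:
    """Keyword arguments for the ecr.api and ecr.dkr interface endpoints."""
    services = [
        ("ecr.api", "sagemaker-studio-ecr-api"),
        ("ecr.dkr", "sagemaker-studio-ecr-dkr"),
    ]
    all_params = []
    for suffix, name in services:
        params: Dict[str, Any] = {
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{suffix}",
            # The endpoint's ENIs are created in these subnets.
            "SubnetIds": subnet_ids,
            # Needs enableDnsSupport and enableDnsHostnames on the VPC.
            "PrivateDnsEnabled": True,
            "TagSpecifications": [
                {"ResourceType": "vpc-endpoint", "Tags": [{"Key": "Name", "Value": name}]}
            ],
        }
        # If omitted, AWS uses the VPC's default security group; a dedicated
        # group can also be attached afterwards, as this guide does.
        if security_group_ids:
            params["SecurityGroupIds"] = security_group_ids
        all_params.append(params)
    return all_params
```

Each element would be passed to `ec2.create_vpc_endpoint(**params)`.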
Each subnet is associated with exactly one route table, and its outbound traffic is forwarded according to that route table's routes. Since the VPC has only one subnet here, we configure its route table to control how traffic to destinations outside the VPC is routed.
Interface VPC endpoints need no additional routes in the route table, because they work through DNS. With private DNS enabled, AWS creates private DNS entries so that the service's default hostname resolves to the endpoint's private IP addresses within the VPC, and that traffic is covered by the VPC's local route.
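One way to confirm that private DNS is working is to resolve the ECR hostnames from inside the VPC (for example, from a Studio terminal) and check that every address falls within the VPC CIDR. This is a sketch under the assumptions of this guide (region ap-northeast-1, VPC 10.1.0.0/24); the account ID in the usage line is a placeholder.

```python
import ipaddress
import socket
from typing import Iterable, List


def ecr_hostnames(account_id: str, region: str = "ap-northeast-1") -> List[str]:
    """Default ECR hostnames that private DNS maps to the endpoint ENIs."""
    return [
        f"api.ecr.{region}.amazonaws.com",               # ECR API
        f"{account_id}.dkr.ecr.{region}.amazonaws.com",  # this account's registry
    ]


def resolve_ipv4(hostname: str) -> List[str]:
    """IPv4 addresses the local resolver returns for hostname."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def all_private(addresses: Iterable[str], vpc_cidr: str = "10.1.0.0/24") -> bool:
    """True if at least one address was given and all fall inside the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    addrs = list(addresses)
    return bool(addrs) and all(ipaddress.ip_address(a) in vpc for a in addrs)
```

Run inside the VPC, `all(all_private(resolve_ipv4(h)) for h in ecr_hostnames("123456789012"))` should be true; outside the VPC the same names resolve to public addresses.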
For the gateway VPC endpoint, associate it with the private subnet's route table. This adds a route that sends traffic destined for S3 to the gateway endpoint.
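To make the routing behavior concrete, the sketch below models the subnet's route table as a list of (destination, target) pairs and picks the most specific matching route, which is how a VPC route table selects a route. The entries are hypothetical: 198.51.100.0/24 is a documentation range standing in for the AWS-managed S3 prefix list, and "vpce-s3" stands in for the endpoint ID.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical route table for the private subnet.
ROUTES: List[Tuple[str, str]] = [
    ("10.1.0.0/24", "local"),        # VPC CIDR: the implicit local route
    ("198.51.100.0/24", "vpce-s3"),  # stand-in for S3 prefix list -> gateway endpoint
]


def next_hop(dest_ip: str, routes: List[Tuple[str, str]] = ROUTES) -> Optional[str]:
    """Pick the target of the most specific route that matches dest_ip."""
    ip = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the packet is dropped
    # Longest prefix (largest prefixlen) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(next_hop("10.1.0.42"))     # local (e.g. an ECR endpoint ENI)
print(next_hop("198.51.100.7"))  # vpce-s3
print(next_hop("8.8.8.8"))       # None
```

This is why the ECR interface endpoints need no extra route (their ENIs live inside the VPC CIDR), while S3 traffic needs the gateway endpoint's route. With boto3, an existing gateway endpoint can be added to a route table with `ec2.modify_vpc_endpoint(VpcEndpointId=..., AddRouteTableIds=[...])`.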
Create a new security group for the ECR interface VPC endpoints.
Then, on the endpoint screen, attach the security group you created to the ECR endpoints.
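The guide does not list the security group's rules. A common choice, assumed here, is to allow inbound HTTPS (TCP 443) from the VPC CIDR, since clients reach the ECR endpoints over HTTPS. The sketch takes a boto3 EC2 client as a parameter; the group name is hypothetical.

```python
from typing import Any, List


def create_ecr_endpoint_sg(
    ec2: Any, vpc_id: str, endpoint_ids: List[str], vpc_cidr: str = "10.1.0.0/24"
) -> str:
    """Create a security group for the ECR interface endpoints and attach it.
    `ec2` is assumed to be a boto3 EC2 client."""
    group_id = ec2.create_security_group(
        GroupName="sagemaker-studio-ecr-endpoint",  # hypothetical name
        Description="HTTPS from the Studio VPC to the ECR interface endpoints",
        VpcId=vpc_id,
    )["GroupId"]
    # Assumption: allow inbound HTTPS (TCP 443) from the whole VPC CIDR.
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[
            {
                "IpProtocol": "tcp",
                "FromPort": 443,
                "ToPort": 443,
                "IpRanges": [{"CidrIp": vpc_cidr, "Description": "Studio VPC"}],
            }
        ],
    )
    # Attach the group to each ECR interface endpoint (the console step above).
    for endpoint_id in endpoint_ids:
        ec2.modify_vpc_endpoint(VpcEndpointId=endpoint_id, AddSecurityGroupIds=[group_id])
    return group_id
```

Restricting the source to the VPC CIDR rather than 0.0.0.0/0 keeps the endpoints reachable only from inside the VPC.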
When creating the SageMaker Studio domain, attach an execution role that includes the AmazonSageMakerFullAccess policy, and select the VPC and subnet you created earlier.
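The domain settings correspond to the SageMaker `CreateDomain` API. The sketch below only builds its keyword arguments. The domain name, role ARN, and IAM authentication mode are assumptions, and the network access mode is a parameter because the guide does not say which mode was selected.

```python
from typing import Any, Dict, List


def studio_domain_params(
    vpc_id: str,
    subnet_ids: List[str],
    execution_role_arn: str,
    domain_name: str = "sagemaker-studio",  # hypothetical name
    app_network_access_type: str = "PublicInternetOnly",
) -> Dict[str, Any]:
    """Keyword arguments for sagemaker.create_domain()."""
    return {
        "DomainName": domain_name,
        # Assumption: IAM authentication rather than IAM Identity Center (SSO).
        "AuthMode": "IAM",
        # Role with AmazonSageMakerFullAccess; used by every user profile
        # that does not set its own ExecutionRole.
        "DefaultUserSettings": {"ExecutionRole": execution_role_arn},
        "VpcId": vpc_id,
        "SubnetIds": subnet_ids,
        # Assumption: "VpcOnly" would route all Studio traffic through the VPC
        # and typically needs further interface endpoints (SageMaker API, etc.).
        "AppNetworkAccessType": app_network_access_type,
    }
```

With a boto3 client this would be `boto3.client("sagemaker").create_domain(**studio_domain_params("vpc-...", ["subnet-..."], "arn:aws:iam::123456789012:role/ExampleStudioRole"))`, where the IDs and ARN are placeholders.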
If multiple users are using SageMaker Studio, it is recommended to create a separate user profile for each user. This allows each user to access their own workspace and resources, reducing the likelihood of conflicts with other users.
Creating a user profile has the following advantages:
- Each user can have their own development environment
- Resources can be allocated and usage limits can be set for each user
- Access permissions can be managed for each user

Once a user profile is created, it can be selected when opening Studio.
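Creating one profile per user can also be scripted with the SageMaker `CreateUserProfile` API. In this sketch, per-user permissions come from giving each profile its own execution role; a profile created without `UserSettings` falls back to the domain's default settings. The user names, role ARNs, and domain ID in the test are hypothetical.

```python
from typing import Any, Dict, List


def create_user_profiles(sm: Any, domain_id: str, users: Dict[str, str]) -> List[str]:
    """Create one Studio user profile per entry in `users`
    (profile name -> execution role ARN). `sm` is assumed to be a boto3
    SageMaker client."""
    created = []
    for name, role_arn in users.items():
        sm.create_user_profile(
            DomainId=domain_id,
            UserProfileName=name,
            # A per-user role lets each user's permissions differ.
            UserSettings={"ExecutionRole": role_arn},
        )
        created.append(name)
    return created
```

Per-user roles are one design option; if everyone needs the same permissions, omitting `UserSettings` and relying on the domain default keeps things simpler.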