Have you ever come across a situation where Amazon ECS takes a long time to start a new task from an ECR image? This mostly happens when your Docker image is large, for example more than 200 MiB.
In this blog, I will explain a new technology that helps eliminate that worry. I got to know this concept at AWS Community Day Pune 2023, when Mayur Bhagia (Principal Solutions Architect at Amazon Web Services) gave a talk on "Efficient scaling: Seekable OCI for faster container startup for Amazon ECS and Fargate". The topic prompted me to study it further, so here is a detailed blog about "SOCI", Seekable OCI.
Seekable OCI is an open-source technology developed by AWS that can launch containers faster by lazily loading the container image. It is abbreviated as "SOCI" and pronounced "so-CHEE". SOCI works by creating an index of the files within the container image, called the SOCI index.
Most of the time, launching a container means downloading the entire container image from ECR or another registry before the container starts. Waiting for all of that data is unnecessary when only a small portion of it is needed for the startup process. A research paper published at USENIX says that pulling the image takes 76% of container start time, but only 6.4% of that data is needed to start the process. So pulling the Docker image from the registry is the obvious area to improve, and that is where SOCI works to reduce the start-up time of the container.
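To make those two percentages concrete, here is a rough back-of-the-envelope sketch. It assumes (and this is my simplification, not something the paper or AWS states) that pull time scales linearly with the number of bytes pulled, and the 100-second baseline is a made-up figure purely for illustration.

```python
def estimate_lazy_start(total_start_s, pull_fraction=0.76, needed_fraction=0.064):
    """Idealised estimate of start time if only the needed data were pulled.

    Assumption: pull time is proportional to bytes pulled. Real lazy loading
    adds its own overhead, so treat the result as a best case, not a prediction.
    """
    pull_s = total_start_s * pull_fraction      # time spent pulling the image
    other_s = total_start_s - pull_s            # everything else in startup
    lazy_pull_s = pull_s * needed_fraction      # pulling only the needed share
    return other_s + lazy_pull_s


if __name__ == "__main__":
    baseline_s = 100.0  # hypothetical start time, for illustration only
    estimate_s = estimate_lazy_start(baseline_s)
    print(f"{baseline_s:.1f}s baseline -> about {estimate_s:.1f}s in the ideal case")
```

The point is only that most of the start time is spent fetching data the container does not need right away.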
There are various ways to solve this problem, including reducing the size of container images with a multi-stage Docker build and pre-fetching container images into local storage. But as I mentioned above, SOCI supports lazy loading. Lazy loading is an approach in which data is downloaded in the background while start-up proceeds in parallel. Container images are stored as an ordered list of layers, and layers are most often stored as gzipped tar files.
SOCI's approach is to eliminate the need to download the entire image before launching the container, and instead to lazily load data on demand and prefetch the rest in the background. That is why SOCI is a good fit for this problem.
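The idea of loading on demand and prefetching in the background can be sketched in a few lines of Python. This is only a toy model, not how SOCI is implemented: `FakeRegistry` and `LazyFilesystem` are hypothetical names I made up, and the "registry" is just an in-memory dictionary that counts fetches.

```python
import threading


class FakeRegistry:
    """Hypothetical stand-in for a remote registry; counts fetches."""

    def __init__(self, files):
        self._files = files
        self._lock = threading.Lock()
        self.fetch_count = 0

    def fetch(self, path):
        with self._lock:
            self.fetch_count += 1
        return self._files[path]


class LazyFilesystem:
    """Fetches a file the first time it is read, then serves it from a cache."""

    def __init__(self, registry, paths):
        self._registry = registry
        self._paths = list(paths)
        self._cache = {}
        self._lock = threading.Lock()

    def read(self, path):
        # On-demand load: only go to the registry on a cache miss.
        with self._lock:
            if path not in self._cache:
                self._cache[path] = self._registry.fetch(path)
            return self._cache[path]

    def prefetch_all(self):
        # Background prefetch: warm the cache while the "container" runs.
        def worker():
            for path in self._paths:
                self.read(path)

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread


if __name__ == "__main__":
    registry = FakeRegistry({"/app/main.py": b"print('hi')", "/usr/share/docs.txt": b"x" * 1024})
    fs = LazyFilesystem(registry, ["/app/main.py", "/usr/share/docs.txt"])
    fs.read("/app/main.py")          # the container can start with just this file
    print(registry.fetch_count)      # 1
    fs.prefetch_all().join()         # the rest arrives in the background
    print(registry.fetch_count)      # 2
```

The container's entrypoint only needs the first read to succeed; everything else is fetched later without blocking start-up.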
containerd is a container runtime that manages the lifecycle of a container on a physical or virtual machine (a host). In containerd, the component that manages the container filesystem is called a snapshotter. The main job of a snapshotter is to create a folder that containerd can use to unpack a layer.
The default snapshotter pulls and decompresses the entire container image before the container gets started. With a lazy-loading snapshotter, the container starts without downloading the entire image content, and files are lazily loaded from Amazon ECR (or any other OCI-compatible registry). Because the container starts without waiting for its content to be fully downloaded, the start time is shorter.
Before the SOCI snapshotter can lazily load a container image, it needs the image metadata, meaning which files are in each layer of the image. Only then can the SOCI snapshotter lazily load the container image. The SOCI index provides all the metadata required for lazy loading.
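To get a feel for what "an index of the files in a layer" means, here is a minimal sketch using Python's standard `tarfile` module. It is a simplification: it works on an uncompressed tar built in memory, whereas real layers are usually gzipped and the real SOCI index format is more involved. The helper names (`build_layer`, `build_index`, `read_file`) are my own, not part of SOCI.

```python
import io
import tarfile


def build_layer(files):
    """Build an uncompressed tar archive in memory from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def build_index(layer_bytes):
    """Map each regular file to the (offset, size) of its data inside the layer."""
    index = {}
    with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:") as archive:
        for member in archive:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_file(layer_bytes, index, name):
    """Read one file by seeking straight to its bytes, without unpacking the layer."""
    offset, size = index[name]
    return layer_bytes[offset:offset + size]


if __name__ == "__main__":
    layer = build_layer({"etc/config": b"key=value", "bin/app": b"\x00" * 2048})
    index = build_index(layer)
    print(read_file(layer, index, "etc/config"))  # b'key=value'
```

With such an index, a reader that knows the offset and size of a file can ask for just that byte range instead of downloading and unpacking the whole layer, which is the "seekable" part of the name.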
AWS Fargate support for SOCI is available at no additional cost; you are only charged for storing the SOCI indexes in Amazon ECR.
Only tasks that run on Linux platform version 1.4.0 can use SOCI indexes. Tasks that run Windows containers on Fargate aren't supported.
Try lazy loading with container images greater than 250 MiB compressed in size. You are less likely to see a reduction in load time with smaller images.
Faster container startup times
Reduced bandwidth usage
Improved performance of containerised workloads
I hope this information excites you to give SOCI a try and reduce your container start time on Amazon ECS and Fargate. Feel free to reach out to me on my Twitter handle @AvinashDalvi_ or comment on the blog.
After reading multiple blogs, I have consolidated my learning from Mayur's talk and my own research. The references below give more details about the whole execution.