Disclaimer: These are mostly notes and summaries I made while reading the article "Borg, Omega, and Kubernetes: Lessons learned from three container-management systems over a decade".
I strongly recommend that everyone using container systems give it a read.
The FreeBSD (Berkeley Software Distribution) jail mechanism is an implementation of OS-level virtualisation that lets us partition a FreeBSD-derived computer system into several independent mini-systems that share the same kernel, with very little overhead.
DRAM (Dynamic Random Access Memory) is a type of semiconductor memory that holds the program code and data a computer needs to function. It gives the processor quick access to large amounts of that information.
The L3 cache is a larger, slower cache that backs up the L1 and L2 caches to improve their effective performance. In multi-core processors the L3 cache is usually shared among cores, while L1 and L2 are dedicated to each core.
Containerization started with chroot, which isolates a process's view of the root file system. Then FreeBSD jails extended the idea to additional namespaces such as process IDs. Then Solaris pioneered many enhancements, and Linux control groups (cgroups) adopted many of these ideas.
0) The resource isolation that containers provide improved utilization at Google.
Borg uses containers to co-locate batch jobs with latency-sensitive, user-facing jobs on the same physical machines. User-facing jobs reserve more resources than they usually need so that they can absorb load spikes and fail-over, and the mostly unused headroom can be reclaimed to run batch jobs.
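To make the utilization argument concrete, here is a rough arithmetic sketch. The function name, the safety margin and all the numbers are hypothetical and only illustrate the idea of lending out reserved-but-idle memory; they are not how Borg actually computes anything.

```python
def reclaimable(reserved_gib, used_gib, safety_margin_gib):
    """Memory that could be lent to batch jobs while keeping a safety margin free."""
    return max(0, reserved_gib - used_gib - safety_margin_gib)


# Hypothetical user-facing job: reserves 16 GiB but typically uses 6 GiB.
spare = reclaimable(reserved_gib=16, used_gib=6, safety_margin_gib=4)

# Hypothetical job that is already close to its reservation: nothing to lend.
none_left = reclaimable(reserved_gib=8, used_gib=7, safety_margin_gib=4)

print(f"spare for batch work: {spare} GiB, {none_left} GiB")
```

When the user-facing job spikes, the batch work is what gets squeezed out, which is why the two kinds of jobs can share a machine.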
This isolation is not perfect: containers cannot prevent interference in resources that the OS kernel doesn't manage, such as the L3 cache and memory bandwidth. Containers also need an additional security layer (such as virtual machines) to guard against malicious actors in the cloud.
Containers are now more than an isolation mechanism: they also include an image, the files that make up the application running inside the container. Google uses the Midas Package Manager (MPM) to build and deploy container images. The same relationship between the isolation mechanism and MPM packages can be seen between the Docker daemon and the Docker image registry.
1) A container is essentially the runtime isolation and the image.
Containerization transforms the data center from being machine oriented to application oriented. It abstracts the details of the machine and OS away from the application developer and the deployment infrastructure.
Shifting the management API from machine oriented to application oriented improves application deployment and introspection.
Decoupling the image from the OS makes it possible to provide the same environment in development and production.
2) To make this abstraction work, we need a container image that encapsulates almost all of an application's dependencies into a package that can be deployed into the container. That way the only local external dependency is the Linux kernel system-call interface.
The Linux kernel is the core interface between the computer hardware and its processes. System calls are how a program enters the kernel to request a service.
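As a small illustration of what "entering the kernel" looks like from a program, the sketch below uses Python's os module, whose functions are thin wrappers around system calls. The message text is just an example.

```python
import os

# os.getpid() asks the kernel for this process's ID (the getpid system call).
pid = os.getpid()

# os.pipe(), os.write(), os.read() and os.close() wrap the pipe, write,
# read and close system calls respectively.
read_fd, write_fd = os.pipe()
written = os.write(write_fd, b"hello kernel")
data = os.read(read_fd, written)
os.close(read_fd)
os.close(write_fd)

print(pid, written, data)
```

Everything an application in a container does with files, sockets or processes eventually goes through calls like these, which is why this interface is the one dependency an image cannot package.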
Combining 1 and 2, we can say that chroot, cgroups and namespaces prevent data leaks between workloads on a machine and improve resource utilization, while container images isolate the app from the OS.
3) This isolation is not perfect. Applications can still be exposed to churn in the OS via socket options, arguments to ioctl calls and /proc.
A socket is one endpoint of a two-way communication link between two programs running on the network. A socket is bound to a port number so that the TCP layer can identify the application that the data is destined for.
Endpoint = IP address + Port number.
A server runs on a specific computer and has a socket bound to a port number. The server waits, listening on the socket, for a client to make a connection request.
The client tries to rendezvous with the server on the server's machine and port. To identify itself to the server, the client binds to a local port number, assigned by the system, that it uses for the duration of the connection. On acceptance, the server gets a new socket bound to the same local port, with its remote endpoint set to the address and port of the client. The server needs this new socket so that it can keep listening on the original socket for other connection requests while serving the connected client. On the client side, a socket is created that the client uses to talk to the server. The client and server can now communicate by writing to and reading from their sockets.
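The handshake described above can be tried out on a single machine. This is a minimal sketch using Python's socket module on the loopback address; the "ping"/"pong" messages are made up, and port 0 simply asks the OS to pick a free port.

```python
import socket

# Server side: bind a listening socket to a port and wait for clients.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))          # port 0 = let the OS choose
listener.listen(1)
server_port = listener.getsockname()[1]

# Client side: connect; the OS assigns the client a local port.
client = socket.create_connection(("127.0.0.1", server_port))

# On acceptance the server gets a NEW socket for this one client.
conn, client_addr = listener.accept()

same_local_port = conn.getsockname()[1] == server_port
remote_is_client = conn.getpeername() == client.getsockname()

# Both sides now talk by writing to and reading from their sockets.
client.sendall(b"ping")
request = conn.recv(4)
conn.sendall(b"pong")
reply = client.recv(4)

for s in (conn, client, listener):
    s.close()

print(same_local_port, remote_is_client, request, reply)
```

The listening socket stays open the whole time, so further clients could connect while conn is busy with the first one.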
Read more on sockets.
The Docker daemon can listen for Docker Engine API requests via three types of socket: unix, tcp and fd.
The ioctl() (input/output control) system call manipulates the underlying device parameters of special files such as terminals. ioctl() receives the following arguments:
- fd must be an open file descriptor.
- arg2 is a device dependent request code.
- arg3 is an untyped memory pointer.
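The three arguments listed above can be seen in a small sketch using Python's fcntl.ioctl on a Unix-like system. It uses the FIONREAD request, which asks how many bytes are waiting to be read; the helper name bytes_waiting is made up for this example, and a pipe stands in for a device.

```python
import fcntl
import os
import struct
import termios


def bytes_waiting(fd):
    """Ask the kernel, via ioctl, how many unread bytes are queued on fd."""
    buf = struct.pack("i", 0)                         # arg3: memory for the answer
    result = fcntl.ioctl(fd, termios.FIONREAD, buf)   # arg2: request code
    return struct.unpack("i", result)[0]


read_fd, write_fd = os.pipe()          # fd: must be an open file descriptor
os.write(write_fd, b"hello")
pending = bytes_waiting(read_fd)
os.close(read_fd)
os.close(write_fd)

print(pending)
```

Because the request code and the layout of the memory behind the pointer depend on the device and kernel, ioctl arguments are one of the places where an application can still feel changes in the OS.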
/proc is a virtual file system that is created on the fly when the system boots and is dissolved at shutdown.
If you run: stat /proc
you will see it doesn’t have any size.
Same for all its sub directories obviously.
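The same check can be done from code. This sketch, which assumes a Linux machine with /proc mounted, compares the size that stat reports for a /proc file with the number of bytes you actually get when you read it; the helper name is made up.

```python
import os


def inspect_proc_file(path="/proc/self/status"):
    """Return (size reported by stat, number of bytes actually read)."""
    reported = os.stat(path).st_size
    with open(path, "rb") as f:
        actual = len(f.read())
    return reported, actual


if os.path.exists("/proc/self/status"):   # only present on Linux
    reported, actual = inspect_proc_file()
    print(f"stat reports {reported} bytes, read() returned {actual} bytes")
```

The content is generated by the kernel at the moment you read it, which is why its format can change between kernel versions and surprise an application.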
Despite the flaws mentioned in 3, isolation and dependency minimization have been effective for Google, and containers are the only runnable entity its infrastructure supports. As a consequence, Google has only a small number of OS versions deployed at any one time and needs only a small staff to maintain them.
There are many ways to achieve these images. In Borg, program binaries are statically linked at build time to known-good library versions hosted in the company-wide repository. This is still not airtight: applications share a base image that is installed once on the machine rather than packaged in each container, so upgrades to the base image can affect running applications.
Binaries are compiled code: they allow programs to be installed without having to compile the source code.
Libraries are collections of standard programs and subroutines that are available for reuse.
More modern container image formats require an explicit user command to share image data between containers, which removes implicit dependencies on the host OS.
Remote Procedure Call (RPC) is a software communication protocol that one program can use to request a service from a program located on another computer on a network without having to understand the network's details, i.e. it lets a program execute a procedure in a different address space without explicitly coding the details of the remote interaction.
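To get a feel for RPC, here is a minimal sketch using XML-RPC from Python's standard library. This is not the RPC system Google uses; the add procedure is made up, and the server runs in a background thread on the loopback address only so the example fits in one file.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """The procedure that lives on the 'remote' side."""
    return a + b


# Server: expose add() over the network (port 0 = let the OS pick a port).
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client: call add() as if it were a local function.
proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)

server.shutdown()
server.server_close()
print(result)
```

The client never opens a socket or builds a request by hand: the proxy object hides those details, which is the whole point of RPC.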
The Borg name service (BNS) name for each task includes the cell name, job name and task number. Borg writes the task's hostname and port into a highly available file in Chubby under this name, which the RPC system then uses to find the task's endpoint.
Chubby is a highly available and persistent distributed lock service and file system. It manages locks for resources and stores configuration information for various distributed services in Google's cluster environments.
Chubby's distributed lock service exposes a file-system-like interface that provides reliable, low-volume storage for loosely coupled distributed systems. It provides coarse-grained locking, clients reach it through a client library, and its design favours availability and reliability over high throughput.
A Chubby cell consists of a small set of servers (typically five) called replicas. The replicas elect a master using a distributed consensus protocol (Paxos in the Chubby paper, rather than a scheme such as round-robin or Berkeley's algorithm).
The master lease is the interval during which replicas that have voted for a master promise not to elect a different one.
Files and directories are known as nodes and are contained in the Chubby namespace. Every node has its own metadata.
Handles are analogous to file descriptors for opened nodes. A handle includes check digits, a handle sequence number and mode information.
Chubby implements reader-writer locks: each node can act as one.
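The reader-writer idea itself is easy to sketch locally: many readers may hold the lock at once, but a writer needs it exclusively. The class below is a single-process, non-blocking illustration with made-up method names; it is not Chubby's API and does nothing distributed.

```python
import threading


class ReaderWriterLock:
    """Many concurrent readers OR one writer, never both."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the two fields below
        self._readers = 0
        self._writer = False

    def try_acquire_read(self):
        with self._guard:
            if self._writer:
                return False             # a writer holds the lock
            self._readers += 1
            return True

    def release_read(self):
        with self._guard:
            self._readers -= 1

    def try_acquire_write(self):
        with self._guard:
            if self._writer or self._readers:
                return False             # someone else holds the lock
            self._writer = True
            return True

    def release_write(self):
        with self._guard:
            self._writer = False


lock = ReaderWriterLock()
two_readers = lock.try_acquire_read() and lock.try_acquire_read()
writer_blocked = not lock.try_acquire_write()
lock.release_read()
lock.release_read()
writer_ok = lock.try_acquire_write()
reader_blocked = not lock.try_acquire_read()
lock.release_write()

print(two_readers, writer_blocked, writer_ok, reader_blocked)
```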
Distributed locking is complex, and it would be costly to add sequence numbers to every interaction, so Chubby only adds them to interactions that use locks. A lock holder can request a sequencer, an opaque byte string that describes the state of the lock.
Round-robin is an arrangement in which all elements of a group are chosen equally, i.e. they take turns in a fixed logical order, such as top to bottom.
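Taking turns in a fixed order is a one-liner with itertools. The names in the list are hypothetical.

```python
from itertools import cycle, islice

members = ["replica-a", "replica-b", "replica-c"]   # hypothetical names

# cycle() repeats the list forever; islice() takes the first seven turns.
turns = list(islice(cycle(members), 7))
print(turns)
```

Every member is picked equally often, and after the last one the order simply wraps around to the first.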
Berkeley's algorithm is a method of clock synchronization in a distributed system that does not rely on an external reference clock. It assumes that no node has an accurate time source, so a master polls the other nodes, averages the clock values and tells each node how much to adjust its clock.
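A simplified sketch of the averaging step usually associated with Berkeley's algorithm: a coordinator collects clock readings, averages them (its own included) and sends each node a correction. The function name and the clock values are made up, and the sketch ignores network delay and the filtering of outlier clocks that a full implementation would handle.

```python
def berkeley_adjustments(master_time, follower_times):
    """Return the correction for each clock, master first, in the same units."""
    clocks = [master_time] + list(follower_times)
    average = sum(clocks) / len(clocks)
    return [average - clock for clock in clocks]


# Clocks expressed as minutes since midnight: 3:00, 3:25 and 2:50.
adjustments = berkeley_adjustments(180, [205, 170])
print(adjustments)
```

After applying the corrections all three clocks agree with each other, even though none of them is known to show the true time.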
Happy Learning, I hope you have fun reading the paper.