DIY Service Mesh
This is a Do-It-Yourself Service Mesh, a simple tutorial for understanding
the internals of a service mesh. This project aims to provide a simple,
easy-to-understand reference implementation of a service mesh, which can be used
to learn about the various concepts and technologies used by a service mesh like Linkerd.
What are you going to learn?
- Build a simple proxy and add service mesh features to it.
- Use Netfilter to intercept and modify network packets.
- Create a simple control plane to manage the service mesh.
- Use gRPC to communicate between the proxy and the control plane.
- Create an Admission Controller to validate and mutate Kubernetes resources.
- Certificate generation flow and mTLS between the services.
- How HTTP/2 works and how to use it with gRPC to balance the traffic between the services.
- Add useful features like circuit breaking, retries, timeouts, and load balancing.
- Add metrics and tracing to the service mesh with OpenTelemetry.
- Canary deployments.
Some considerations
- Only for learning propose, not a production-ready service mesh.
- The proxy will print many logs to understand what is happening.
- Use IPTables instead of Nftables for simplicity.
- Keep the code as simple as possible to make it easy to understand.
- Some Golang errors are ignored for simplicity.
- Everything will be in the same repository to make it easier to understand the project.
What is going to be built?
The following components are going to be built:
- proxy-init: Configure the network namespace of the pod.
- proxy: This is the data plane of the service mesh, which is responsible for intercepting and modifying network packets.
- controller: This is the control plane of the service mesh, which is responsible for configuring the data plane.
- injector: This is an Admission Controller for Kubernetes, which mutates each pod that needs to be meshed.
- samples apps: Four simple applications will communicate with each other. (http-client, http-server, grpc-client, grpc-server)
Tools and Running the project
- kind to create a Kubernetes cluster locally.
- Tilt to run the project and watch for changes.
- Buf to lint and generate the Protobuf/gRPC code.
- Docker to build the Docker images.
- k9s to interact with the Kubernetes cluster. (Optional)
To start all the components, run the following command:
kind create cluster
tilt up
Tilt will build all the images and deploy all the components to the Kubernetes cluster.
All the images are created by the Dockerfile
in the root directory.
The main branch contains the WIP version of the project.
Architecture
The architecture of the service mesh is composed of the following components:
Creating the HTTP applications
-
http-client: This application is going to be called the
http-server
service. -
http-server: This application will be called by the
http-client
service.
In the next steps, grpc-client
and grpc-server
will be created.
http-client:
func main() {
ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
defer cancel()
n := 0
endpoint := os.Getenv("ENDPOINT")
if endpoint == "" {
endpoint = "http://http-server.http-server.svc.cluster.local./hello"
}
httpClient := &http.Client{
Timeout: 5 * time.Second,
}
// This application will call the endpoint every second
ticker := time.NewTicker(time.Second)
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
if err != nil {
panic(err)
}
resp, err := httpClient.Do(req)
if err != nil {
panic(err)
}
dump, err := httputil.DumpResponse(resp, true)
if err != nil {
panic(err)
}
resp.Body.Close()
n++
fmt.Printf("Response #%d\n", n)
fmt.Println(string(dump))
}
}
}
http-server:
func main() {
ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
defer cancel()
failRate, _ := strconv.Atoi(os.Getenv("FAIL_RATE"))
n := uint64(0)
hostname := os.Getenv("HOSTNAME")
version := os.Getenv("VERSION")
var b bytes.Buffer
b.WriteString("Hello from the http-server service! Hostname: ")
b.WriteString(hostname)
b.WriteString(" Version: ")
b.WriteString(version)
mux := http.NewServeMux()
mux.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
// Dump the request
dump, _ := httputil.DumpRequest(r, true)
fmt.Printf("Request #%d\n", atomic.AddUint64(&n, 1))
fmt.Println(string(dump))
fmt.Println("---")
// Simulate failure
if failRate > 0 {
// Get a random number between 0 and 100
n := rand.Intn(100)
if n < failRate {
http.Error(w, "Internal server error", http.StatusInternalServerError)
fmt.Println("Failed to process request")
return
}
}
w.Header().Set("Content-Type", "text/plain")
w.WriteHeader(http.StatusOK)
w.Write(b.Bytes())
})
server := &http.Server{
Addr: ":8080",
Handler: mux,
}
g, ctx := errgroup.WithContext(ctx)
g.Go(func() error {
<-ctx.Done()
return server.Shutdown(context.Background())
})
g.Go(func() error {
return server.ListenAndServe()
})
if err := g.Wait(); err != nil {
if err != http.ErrServerClosed {
panic(err)
}
}
}
The http-server
service is going to respond with a message that contains the
hostname and the version of the service.
Failures will be simulated by setting the FAIL_RATE environment variable.
Each service is going to be deployed in a different namespace:
- http-client:
http-client
Deployment in thehttp-client
namespace: http-client.yaml - http-server:
http-server
Deployment in thehttp-server
namespace: http-server.yaml
Testing the service mesh
Logs for http-client
and http-server
pods to see the communication between the services.
http-client logs:
Response #311
HTTP/1.1 200 OK
Content-Length: 71
Content-Type: text/plain
Date: Sat, 08 Jun 2024 19:38:27 GMT
Hello from http-server service! Hostname: http-server-799c77dc9b-56lmd Version: 1.0
http-server logs:
Request #171
GET /hello HTTP/1.1
Host: http-server.http-server.svc.cluster.local.
Accept-Encoding: gzip
User-Agent: Go-http-client/1.1
Implementing the proxy to intercept the HTTP/1.1 requests and responses.
Why need a proxy?
The proxy will intercept all of the inbound and outbound traffic of the services. (except explicitly ignored)
Linkerd has a Rust based proxy called linkerd2-proxy
, and Istio has a C++ based proxy called Envoy
,
which is a very powerful proxy with a lot of features.
The proxy code is going to be very simple and will be similar to linkerd2-proxy
functionalities.
For now, the proxy will listen on two ports: one for inbound traffic and another for outbound traffic.
- 4000 for the inbound traffic.
- 5000 for the outbound traffic.
This is a basic proxy implementation that intercepts HTTP requests and HTTP responses.
func main() {
ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
defer cancel()
g, ctx := errgroup.WithContext(ctx)
// Inbound connection
g.Go(func() error {
return listen(ctx, ":4000", handleInboundConnection)
})
// Outbound connection
g.Go(func() error {
return listen(ctx, ":5000", handleOutboundConnection)
})
if err := g.Wait(); err != nil {
panic(err)
}
}
func listen(ctx context.Context, addr string, accept func(net.Conn)) error {
l, err := net.Listen("tcp", addr)
if err != nil {
return fmt.Errorf("failed to listen: %w", err)
}
defer l.Close()
go func() {
<-ctx.Done()
l.Close()
}()
for {
conn, err := l.Accept()
if err != nil {
return fmt.Errorf("failed to accept: %w", err)
}
go accept(conn)
}
}
The listen
function listens on the port and calls the accept
function when a connection is established.
handleInboundConnection
All the inbound traffic is going to be intercepted by the proxy, print the request, forward the request
to the local destination port and print the response.
func handleInboundConnection(c net.Conn) {
defer c.Close()
// Get the original destination
_, port, err := getOriginalDestination(c)
if err != nil {
fmt.Printf("Failed to get original destination: %v\n", err)
return
}
fmt.Printf("Inbound connection from %s to port: %d\n", c.RemoteAddr(), port)
// Read the request
req, err := http.ReadRequest(bufio.NewReader(c))
if err != nil {
fmt.Printf("Failed to read request: %v\n", err)
return
}
// Call the local service port
upstream, err := net.Dial("tcp", fmt.Sprintf("127.0.0.1:%d", port))
if err != nil {
fmt.Printf("Failed to dial: %v\n", err)
return
}
defer upstream.Close()
// Write the request
if err := req.Write(io.MultiWriter(upstream, os.Stdout)); err != nil {
fmt.Printf("Failed to write request: %v\n", err)
return
}
// Read the response
resp, err := http.ReadResponse(bufio.NewReader(upstream), req)
if err != nil {
fmt.Printf("Failed to read response: %v\n", err)
return
}
defer resp.Body.Close()
// Write the response
if err := resp.Write(io.MultiWriter(c, os.Stdout)); err != nil {
fmt.Printf("Failed to write response: %v\n", err)
return
}
// Add a newline for better readability
fmt.Println()
}
The handleInboundConnection
function first reads the destination port to our service,
iptables will set the destination port using the SO_ORIGINAL_DST
socket option.
The function getOriginalDestination
returns the original destination of the TCP connection,
check the code to see how it works. (This is a Linux specific feature)
After that, read the request, forward the request to the local service port, read
the response and send it back to the client.
For visibility, print the request and response using io.MultiWriter
to write to the connection and stdout.
handleOutboundConnection
The outbound look very similar to the inbound, but forward the request to the target service.
func handleOutboundConnection(c net.Conn) {
defer c.Close()
// Get the original destination
ip, port, err := getOriginalDestination(c)
if err != nil {
fmt.Printf("Failed to get original destination: %v\n", err)
return
}
fmt.Printf("Outbound connection to %s:%d\n", ip, port)
// Read the request
req, err := http.ReadRequest(bufio.NewReader(c))
if err != nil {
fmt.Printf("Failed to read request: %v\n", err)
return
}
// Call the external service ip:port
upstream, err := net.Dial("tcp", fmt.Sprintf("%s:%d", ip, port))
if err != nil {
fmt.Printf("Failed to dial: %v\n", err)
return
}
defer upstream.Close()
// Write the request
if err := req.Write(io.MultiWriter(upstream, os.Stdout)); err != nil {
fmt.Printf("Failed to write request: %v\n", err)
return
}
// Read the response
resp, err := http.ReadResponse(bufio.NewReader(upstream), req)
if err != nil {
fmt.Printf("Failed to read response: %v\n", err)
return
}
defer resp.Body.Close()
// Write the response
if err := resp.Write(io.MultiWriter(c, os.Stdout)); err != nil {
fmt.Printf("Failed to write response: %v\n", err)
return
}
// Add a newline for better readability
fmt.Println()
}
As can be seen, the only difference is in how the external service is called.
// Call the external service ip:port
upstream, err := net.Dial("tcp", fmt.Sprintf("%s:%d", ip, port))
if err != nil {
fmt.Printf("Failed to dial: %v\n", err)
return
}
defer upstream.Close()
It is important to note that the service resolves the DNS, so only the IP and the port need to be provided.
How are the connections intercepted?
Kubernetes Pod Networking Understanding
Each kubernetes pod shares the same network between the containers, so the localhost
is the same for all the containers in the pod.
initContainers
Kubernetes has a feature called initContainers
, which is a container that runs before the main containers starts. These containers need to finish before the main containers starts.
iptables
The iptables
is a powerful tool to manage Netfilter in Linux, it can be used to intercept and modify the network packets.
Before our http-client and http-server containers starts, the proxy-init is going to configure the Netfilter
to redirect
all the traffic to the proxy inbounds and outbounds ports.
func main() {
// Configure the proxy
commands := []*exec.Cmd{
// Default accept for all nat chains
exec.Command("iptables", "-t", "nat", "-P", "PREROUTING", "ACCEPT"),
exec.Command("iptables", "-t", "nat", "-P", "INPUT", "ACCEPT"),
exec.Command("iptables", "-t", "nat", "-P", "OUTPUT", "ACCEPT"),
exec.Command("iptables", "-t", "nat", "-P", "POSTROUTING", "ACCEPT"),
// Create custom chains so is possible jump to them
exec.Command("iptables", "-t", "nat", "-N", "PROXY_INBOUND"),
exec.Command("iptables", "-t", "nat", "-N", "PROXY_OUTBOUND"),
// Jump to custom chains, if something is not matched, will return to the default chains.
exec.Command("iptables", "-t", "nat", "-A", "PREROUTING", "-p", "tcp", "-j", "PROXY_INBOUND"),
exec.Command("iptables", "-t", "nat", "-A", "OUTPUT", "-p", "tcp", "-j", "PROXY_OUTBOUND"),
// Set rules for custom chains: PROXY_INBOUND, redirect all inbound traffic to port 4000
exec.Command("iptables", "-t", "nat", "-A", "PROXY_INBOUND", "-p", "tcp", "-j", "REDIRECT", "--to-port", "4000"),
// Set rules for custom chains: PROXY_OUTBOUND
// Ignore traffic between the containers.
exec.Command("iptables", "-t", "nat", "-A", "PROXY_OUTBOUND", "-o", "lo", "-j", "RETURN"),
exec.Command("iptables", "-t", "nat", "-A", "PROXY_OUTBOUND", "-d", "127.0.0.1/32", "-j", "RETURN"),
// Ignore outbound traffic from the proxy container.
exec.Command("iptables", "-t", "nat", "-A", "PROXY_OUTBOUND", "-m", "owner", "--uid-owner", "1337", "-j", "RETURN"),
// Redirect all the outbound traffic to port 5000
exec.Command("iptables", "-t", "nat", "-A", "PROXY_OUTBOUND", "-p", "tcp", "-j", "REDIRECT", "--to-port", "5000"),
}
for _, cmd := range commands {
if err := cmd.Run(); err != nil {
panic(fmt.Sprintf("failed to run command %s: %v\n", cmd.String(), err))
}
}
fmt.Println("Proxy initialized successfully!")
}
Some important points:
- To allow outbound traffic from the proxy container, the
iptables
option--uid-owner
is used. - The use of custom chains is to make it easier to understand the rules and allow to return to the default chains if the rule is not matched.
- The
REDIRECT
option is used to redirect the traffic to the proxy and is the responsable to add theSO_ORIGINAL_DST
information to the socket.
Adding the proxy and proxy-init containers to the deployments:
spec:
initContainers:
- name: proxy-init
image: diy-sm-proxy-init
imagePullPolicy: IfNotPresent
securityContext:
capabilities:
add:
- NET_ADMIN
- NET_RAW
drop:
- ALL
containers:
- name: proxy
image: diy-sm-proxy
imagePullPolicy: IfNotPresent
securityContext:
runAsUser: 1337
- name: http-client
image: diy-sm-http-client
imagePullPolicy: IfNotPresent
The same configuration is going to be applied to the http-server
deployment.
Some important points:
- proxy-init run as init container that is going to configure the network namespace of the pod and exit.
-
NET_ADMIN
andNET_RAW
are linux capabilities that are necessary to configure theNetfilter
, without these capabilitiesiptables
can't call the system calls to configure theNetfilter
. - Using the
runAsUser: 1337
in theproxy
container is very important so the proxy traffic to outside of the pod is allowed.
Logs output for the proxy and the applications
http-server logs:
proxy Inbound connection from 10.244.0.115:60296 to port: 8080
http-server Request #13
http-server GET /hello HTTP/1.1
http-server Host: http-server.http-server.svc.cluster.local.
http-server Accept-Encoding: gzip
http-server User-Agent: Go-http-client/1.1
proxy GET /hello HTTP/1.1
proxy Host: http-server.http-server.svc.cluster.local.
proxy User-Agent: Go-http-client/1.1
proxy Accept-Encoding: gzip
proxy
proxy HTTP/1.1 200 OK
proxy Content-Length: 86
proxy Content-Type: text/plain
proxy Date: Mon, 17 Jun 2024 20:58:46 GMT
proxy
proxy Hello from the http-server service! Hostname: http-server-c6f4776bb-mmw2d Version: 1.0
http-client logs:
proxy Outbound connection to 10.96.182.169:80
proxy GET /hello HTTP/1.1
proxy Host: http-server.http-server.svc.cluster.local.
proxy User-Agent: Go-http-client/1.1
proxy Accept-Encoding: gzip
proxy
proxy HTTP/1.1 200 OK
proxy Content-Length: 86
proxy Content-Type: text/plain
proxy Date: Mon, 17 Jun 2024 21:04:53 GMT
proxy
proxy Hello from the http-server service! Hostname: http-server-c6f4776bb-slpdf Version: 1.0
http-client Response #16
http-client HTTP/1.1 200 OK
http-client Content-Length: 86
http-client Content-Type: text/plain
http-client Date: Mon, 17 Jun 2024 21:04:53 GMT
http-client
http-client Hello from the http-server service! Hostname: http-server-c6f4776bb-slpdf Version: 1.0
Adding manually the proxy-init and proxy containers?
Doing this is not so practical, right?
Kubernetes has a feature called Admission Controller
, which is a webhook that can validate and mutate the objects before they are persisted in the etcd.
Learn more about it here.
But in the next steps, a Mutation Admission Controller will be created to inject the proxy-init and proxy containers into the pods that need to be meshed.
Next
For the next part, the focus will be to create the Admission Controller to inject the proxy-init and the proxy.
Contact and Github Project
The github project contains the WIP of the project.
Feel free to contact me: https://www.linkedin.com/in/ramonberrutti/
Top comments (0)