DEV Community

Hitesh Pattanayak

gRPC dynamic loadbalancing

gRPC

gRPC has many benefits, like:

  1. It multiplexes many requests over the same connection.
  2. It supports the typical client-server request-response model as well as duplex streaming.
  3. It uses a fast, very light, binary protocol with structured data as the communication medium among services.

More about gRPC

All of the above make gRPC very attractive, but it comes with some considerations, particularly around load balancing.

The issue

Let's delve into the issue.

For this we need a setup, which includes:

  • a gRPC server, which we call Greet Server.
  • a client that acts as a REST gateway and is internally a gRPC client as well. We call it Greet Client.

We are also using Kubernetes for the demonstration, hence there are a bunch of YAML manifest files. Let me explain them below:

greetserver-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: greetserver-deploy
spec:
  replicas: 3
  selector:
    matchLabels:
      run: greetserver
  template:
    metadata:
      labels:
        run: greetserver
    spec:
      containers:
        - image: hiteshpattanayak/greet-server:1.0
          imagePullPolicy: IfNotPresent
          name: greetserver
          ports:
            - containerPort: 50051
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name

The above is a deployment manifest of Greet Server that spins up 3 replicas of Greet Server.
The Greet Server uses the hiteshpattanayak/greet-server:1.0 image.
Each pod of the deployment exposes port 50051.
The environment variables POD_IP and POD_NAME are injected into the pods.

What does each pod in the above server do?

Each pod exposes an rpc that expects a first_name and a last_name, and in response returns a message in this format:
reponse from Greet rpc: Hello, <first_name> <last_name> from pod: name(<pod_name>), ip(<pod_ip>).

From the response, we can deduce which pod our request landed on.
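
To make the response format concrete, here is a minimal Go sketch of how such a message could be assembled. The function name greetMessage is hypothetical and is not taken from the actual Greet Server code; only the message layout and the POD_NAME / POD_IP variables come from the manifest above.

```go
package main

import (
	"fmt"
	"os"
)

// greetMessage formats a reply in the layout shown in the article.
// The name and signature are hypothetical, for illustration only.
func greetMessage(firstName, lastName, podName, podIP string) string {
	return fmt.Sprintf(
		"reponse from Greet rpc: Hello, %s %s from pod: name(%s), ip(%s).",
		firstName, lastName, podName, podIP,
	)
}

func main() {
	// POD_NAME and POD_IP are injected by the Deployment through fieldRef.
	// Outside the cluster they are simply empty strings.
	fmt.Println(greetMessage("Hitesh", "Pattanayak", os.Getenv("POD_NAME"), os.Getenv("POD_IP")))
}
```

Because the pod name and IP are part of every reply, the reply itself tells us which replica served the call.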

greet.svc.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    run: greetserver
  name: greetserver
  namespace: default
spec:
  ports:
    - name: grpc
      port: 50051
      protocol: TCP
      targetPort: 50051
  selector:
    run: greetserver

The above is the service manifest for Greet Server. It basically acts as a proxy to the Greet Server pods above.

The selector section of the service matches with the labels section of each pod.
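
The matching rule can be sketched in Go as follows. This is only an illustration of equality-based label matching, not Kubernetes code; the function name selectorMatches is made up for the example.

```go
package main

import "fmt"

// selectorMatches reports whether every key/value pair of the selector
// is present in the pod's labels. Extra pod labels are ignored.
func selectorMatches(selector, podLabels map[string]string) bool {
	for key, want := range selector {
		got, ok := podLabels[key]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"run": "greetserver"}

	serverPod := map[string]string{"run": "greetserver", "pod-template-hash": "7595ccbdd5"}
	clientPod := map[string]string{"run": "greetclient"}

	fmt.Println("server pod selected:", selectorMatches(selector, serverPod))
	fmt.Println("client pod selected:", selectorMatches(selector, clientPod))
}
```

So the greetserver service picks up all three Greet Server replicas and nothing else.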

greetclient-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: greetclient-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      run: greetclient
  template:
    metadata:
      labels:
        run: greetclient
    spec:
      containers:
        - image: hiteshpattanayak/greet-client:4.0
          name: greetclient
          ports:
            - containerPort: 9091
          env:
            - name: GRPC_SERVER_HOST
              value: greetserver.default.svc.cluster.local
            - name: GRPC_SVC
              value: greetserver
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace

The above is a deployment manifest of Greet Client that spins up 1 replica of Greet Client.

As mentioned above, the pod runs an application that acts as a REST gateway and reaches out to Greet Server to process each request.

This deployment is using hiteshpattanayak/greet-client:4.0 image.

The 4.0 tagged image has the load balancing issue.

Also the pod(s) expose port 9091.

greetclient-svc.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    run: greetclient
  name: greetclient
  namespace: default
spec:
  ports:
    - name: restgateway
      port: 9091
      protocol: TCP
      targetPort: 9091
  selector:
    run: greetclient
  type: LoadBalancer

The above service is just to redirect traffic to the Greet Client pods.

greet-ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: greet-ingress
  namespace: default
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
    - host: greet.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: greetclient
                port:
                  name: restgateway

The above ingress exposes the Greet Client service outside the cluster.

Note:
minikube does not have ingress enabled by default.

  • check enabled or not: minikube addons list
  • enable ingress addon: minikube addons enable ingress

greet-clusterrole.yaml

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: service-reader
rules:
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "watch", "list"]

greet-clusterrolebinding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: service-reader-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: service-reader
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default

The cluster role and cluster role binding are required because the default service account does not have permission to fetch service details.
The Greet Client pod fetches service details internally, hence the binding is required.

Create the setup in the following sequence:


kubectl create -f greet-clusterrole.yaml

kubectl create -f greet-clusterrolebinding.yaml

kubectl create -f greetserver-deploy.yaml

kubectl get po -l 'run=greetserver' -o wide
<<com
NAME                                  READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
greetserver-deploy-7595ccbdd5-67bmd   1/1     Running   0          91s   172.17.0.4   minikube   <none>           <none>
greetserver-deploy-7595ccbdd5-k6zbl   1/1     Running   0          91s   172.17.0.3   minikube   <none>           <none>
greetserver-deploy-7595ccbdd5-l8kmv   1/1     Running   0          91s   172.17.0.2   minikube   <none>           <none>
com

kubectl create -f greet.svc.yaml
kubectl get svc
<<com
NAME          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
greetserver   ClusterIP   None         <none>        50051/TCP   77s
com

kubectl create -f greetclient-deploy.yaml
kubectl get po -l 'run=greetclient' -o wide
<<com
NAME                                 READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
greetclient-deploy-6bddb94df-jwr25   1/1     Running   0          35s   172.17.0.6   minikube   <none>           <none>
com

kubectl create -f greetclient-svc.yaml
kubectl get svc
<<com
NAME          TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
greetclient   LoadBalancer   10.110.255.115   <pending>     9091:32713/TCP   22s
greetserver   ClusterIP      None             <none>        50051/TCP        5m14s
com

kubectl create -f greet-ingress.yaml
kubectl get ingress
<<com
NAME            CLASS   HOSTS       ADDRESS        PORTS   AGE
greet-ingress   nginx   greet.com   192.168.49.2   80      32s
com

Since we have exposed the Greet Client outside the cluster via greet-ingress, the endpoint can be accessed at http://greet.com/greet.
So when we make a curl request:

Request#1

curl --request POST \
  --url http://greet.com/greet \
  --header 'Content-Type: application/json' \
  --data '{
    "first_name": "Hitesh",
    "last_name": "Pattanayak"
}'

<<com
Response

reponse from Greet rpc: Hello, Hitesh Pattanayak from pod: name(greetserver-deploy-7595ccbdd5-l8kmv), ip(172.17.0.2).
com

Request#2

curl --request POST \
  --url http://greet.com/greet \
  --header 'Content-Type: application/json' \
  --data '{
    "first_name": "Hitesh",
    "last_name": "Pattanayak"
}'

<<com
Response

reponse from Greet rpc: Hello, Hitesh Pattanayak from pod: name(greetserver-deploy-7595ccbdd5-l8kmv), ip(172.17.0.2).
com

Request#3

curl --request POST \
  --url http://greet.com/greet \
  --header 'Content-Type: application/json' \
  --data '{
    "first_name": "Hitesh",
    "last_name": "Pattanayak"
}'

<<com
Response

reponse from Greet rpc: Hello, Hitesh Pattanayak from pod: name(greetserver-deploy-7595ccbdd5-l8kmv), ip(172.17.0.2).
com

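So the ISSUE is that no matter how many requests I make, they all land on the same server pod. This happens because of the sticky nature of HTTP/2: the gRPC client opens one long-lived connection, the backend is chosen once when that connection is established, and every later request is multiplexed over the same connection.
The advantage of gRPC becomes its own peril.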
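
The stickiness can be simulated in a few lines of Go. This is only a toy model, not real gRPC or Kubernetes code: the types conn and connLevelBalancer are made up for illustration, and the balancer hands out backends in order purely to keep the behaviour deterministic. The point it shows is that the backend is picked when dialing, not when sending.

```go
package main

import "fmt"

// conn models one long-lived HTTP/2 connection. The backend is fixed
// at dial time and never changes afterwards.
type conn struct{ backend string }

// connLevelBalancer models a balancer that works per connection:
// it chooses a pod for each NEW connection, not for each request.
type connLevelBalancer struct {
	backends []string
	next     int
}

// dial opens a new connection pinned to the next backend in the list.
func (b *connLevelBalancer) dial() *conn {
	c := &conn{backend: b.backends[b.next%len(b.backends)]}
	b.next++
	return c
}

// send returns the backend that serves the request, which is always
// the one the connection is pinned to.
func (c *conn) send(request string) string { return c.backend }

func main() {
	lb := &connLevelBalancer{backends: []string{"172.17.0.2", "172.17.0.3", "172.17.0.4"}}

	c := lb.dial() // the gRPC client dials once and keeps reusing the connection
	for i := 1; i <= 3; i++ {
		fmt.Printf("request %d served by %s\n", i, c.send("greet"))
	}
}
```

All three requests report the same backend, which mirrors the three identical curl responses above. Only a second dial would reach a different pod.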

The codebase to replicate the issue can be found here.

gRPC Client side load balancing

Earlier we discussed one of the challenges with gRPC: load balancing.

That happens due to the sticky nature of gRPC connections.

Now we shall discuss how to resolve the issue.

This particular solution is quite simple.

The onus of load balancing falls on the client itself.

To be precise, client here does not mean the end user. gRPC services are usually fronted by a REST gateway, and end users talk to that gateway.

This is because browsers, although they speak HTTP/2 (the protocol gRPC runs on), do not give web pages the low-level control over HTTP/2 requests that a gRPC client needs.

Hence the REST gateway acts as a gRPC client to the gRPC servers. That is also why gRPC is mostly used for internal communication.

Earlier we used the hiteshpattanayak/greet-client:4.0 image for Greet Client, which had the plain gRPC setup without client-side load balancing.
The code can be found here.

Changes

Code changes

For this solution we use the hiteshpattanayak/greet-client:11.0 image. The codebase has the following changes:

Updated client deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: greetclient-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      run: greetclient
  template:
    metadata:
      labels:
        run: greetclient
    spec:
      containers:
        - image: hiteshpattanayak/greet-client:11.0
          name: greetclient
          ports:
            - containerPort: 9091
          env:
            - name: GRPC_SVC
              value: greetserver
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace

  • configuring the load balancing policy while dialing the server.
  • setting grpc.WithBlock() so that the dial call blocks until the connection to the server is established.
a.conn, err = grpc.Dial(
    servAddr,
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
    grpc.WithBlock(),
    opts,
)

  • the server address used while dialing needs to be the DNS name of the server's service.
var serverHost string
if host := kubernetes.GetServiceDnsName(client, os.Getenv("GRPC_SVC"), os.Getenv("POD_NAMESPACE")); len(host) > 0 {
    serverHost = host
}

servAddr := fmt.Sprintf("%s:%s", serverHost, serverPort)
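
What GetServiceDnsName returns is not shown here, so the following is only a sketch of what a suitable dial target could look like. It assumes the default cluster.local cluster domain and uses the dns:/// scheme of gRPC name resolution; as far as I know, grpc.Dial in grpc-go treats a target without a scheme as passthrough, so the explicit scheme is what asks the DNS resolver to return every A record for round_robin to rotate over. The function name headlessTarget is hypothetical.

```go
package main

import "fmt"

// headlessTarget builds a gRPC target for a Kubernetes service.
// Assumptions: the cluster domain is the default "cluster.local" and the
// client should use gRPC's DNS resolver (the "dns:///" scheme).
func headlessTarget(service, namespace, port string) string {
	return fmt.Sprintf("dns:///%s.%s.svc.cluster.local:%s", service, namespace, port)
}

func main() {
	// Values taken from the manifests in this article.
	fmt.Println(headlessTarget("greetserver", "default", "50051"))
}
```

The resulting string would be passed as the first argument of grpc.Dial in place of servAddr.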

Headless service

  • earlier, while replicating the issue, the service (greetserver) we created for the Greet Server pods was a normal ClusterIP service. This solution requires a headless ClusterIP service instead.
apiVersion: v1
kind: Service
metadata:
  labels:
    run: greetserver
  name: greetserver
  namespace: default
spec:
  ports:
    - name: grpc
      port: 50051
      protocol: TCP
      targetPort: 50051
  selector:
    run: greetserver
  clusterIP: None

One significant thing to notice here is that this is a special kind of ClusterIP service called a headless service.

In this manifest the type of the service is not specified, so it defaults to ClusterIP, which means the service is reachable only from within the cluster.

You can set .spec.clusterIP, if you already have an existing DNS entry that you wish to reuse.

If you set .spec.clusterIP to None, the service becomes headless: when a client does a DNS lookup for a headless Service, it gets back the IPs of all the Pods that the Service represents (in this case, the ones with the label run: greetserver).

Kubernetes allows clients to discover pod IPs through DNS lookups. Usually, when you perform a DNS lookup for a service, the DNS server returns a single IP — the service’s cluster IP. But if you tell Kubernetes you don’t need a cluster IP for your service (you do this by setting the clusterIP field to None in the service specification ), the DNS server will return the pod IPs instead of the single service IP. Instead of returning a single DNS A record, the DNS server will return multiple A records for the service, each pointing to the IP of an individual pod backing the service at that moment. Clients can therefore do a simple DNS A record lookup and get the IPs of all the pods that are part of the service. The client can then use that information to connect to one, many, or all of them.

Basically, the Service now lets the client decide on how it wants to connect to the Pods.
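
From Go, the same lookup can be done with the standard library. The sketch below is not part of the Greet Client code; backendAddrs and lookupFunc are names made up for the example, and the lookup function is injectable only so the logic can be exercised without a cluster.

```go
package main

import (
	"fmt"
	"net"
)

// lookupFunc has the same signature as net.LookupHost, so a fake can be
// swapped in when no cluster DNS is available.
type lookupFunc func(host string) ([]string, error)

// backendAddrs resolves a (headless) service name and returns one
// host:port string per IP that DNS hands back.
func backendAddrs(lookup lookupFunc, host, port string) ([]string, error) {
	ips, err := lookup(host)
	if err != nil {
		return nil, fmt.Errorf("resolve %s: %w", host, err)
	}
	addrs := make([]string, 0, len(ips))
	for _, ip := range ips {
		addrs = append(addrs, net.JoinHostPort(ip, port))
	}
	return addrs, nil
}

func main() {
	// Inside the cluster, a headless service should yield one address per pod.
	// Outside the cluster the lookup is expected to fail.
	addrs, err := backendAddrs(net.LookupHost, "greetserver.default.svc.cluster.local", "50051")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```

With a normal ClusterIP service the same code would return a single address, the cluster IP, and the client would have nothing to choose between.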

Verify headless service DNS lookup

Create the headless service:

kubectl create -f greet.svc.yaml

Create a utility pod:

kubectl run dnsutils --image=tutum/dnsutils --command -- sleep infinity

Verify by running the nslookup command in the pod:

kubectl exec dnsutils --  nslookup greetserver

<<com
Result

Server:         10.96.0.10
Address:        10.96.0.10#53
Name:   greetserver.default.svc.cluster.local
Address: 172.17.0.4
Name:   greetserver.default.svc.cluster.local
Address: 172.17.0.3
Name:   greetserver.default.svc.cluster.local
Address: 172.17.0.2
com

As you can see, the headless service resolves to the IP addresses of all the pods behind the service.

Contrast this with the output returned for a non-headless service.

kubectl exec dnsutils --  nslookup greetclient

<<com
Server:     10.96.0.10
Address:    10.96.0.10#53

Name:   greetclient.default.svc.cluster.local
Address: 10.110.255.115
com

Now let's test the changes by making curl requests to the exposed ingress.

Request#1

curl --request POST \
  --url http://greet.com/greet \
  --header 'Content-Type: application/json' \
  --data '{
    "first_name": "Hitesh",
    "last_name": "Pattanayak"
}'

<<com
Response

reponse from Greet rpc: Hello, Hitesh Pattanayak from pod: name(greetserver-deploy-7595ccbdd5-k6zbl), ip(172.17.0.3).
com

Request#2

curl --request POST \
  --url http://greet.com/greet \
  --header 'Content-Type: application/json' \
  --data '{
    "first_name": "Hitesh",
    "last_name": "Pattanayak"
}'

<<com
Response

reponse from Greet rpc: Hello, Hitesh Pattanayak from pod: name(greetserver-deploy-7595ccbdd5-67bmd), ip(172.17.0.4).
com

Request#3

curl --request POST \
  --url http://greet.com/greet \
  --header 'Content-Type: application/json' \
  --data '{
    "first_name": "Hitesh",
    "last_name": "Pattanayak"
}'

<<com
Response

reponse from Greet rpc: Hello, Hitesh Pattanayak from pod: name(greetserver-deploy-7595ccbdd5-l8kmv), ip(172.17.0.2).
com

The issue no longer exists.

But what we are losing here is the capability of gRPC to retain connections for a longer period of time and multiplex several requests through them thereby reducing latency.

gRPC lookaside load balancing

Earlier we discussed:

  • Load balancing challenge with gRPC
  • How to address the above challenge via client side load balancing

We were able to resolve the load balancing issue, but we traded off one of the major advantages of gRPC: long-lived connections.

So in this post we would like to achieve load balancing (still on the client side) without trading off that advantage.

I would like to reiterate that when I say the onus of load balancing falls on the client, client does not mean the end user. gRPC services are usually fronted by a REST gateway that end users call. They are not exposed to browsers directly, because browsers cannot act as native gRPC clients.

Lookaside load balancer

The purpose of this load balancer is to decide which gRPC server the client should connect to.

At the moment it supports two routing types: round robin and random.

The load balancer is itself gRPC based, and since its load is not going to be high, a single pod suffices.

It exposes a service called lookaside with an rpc called Resolve, which expects the routing type along with some details about the gRPC servers, namely the Kubernetes service name and the namespace they live in.

Using the service name and namespace, it fetches the Kubernetes Endpoints object associated with the service. The server IPs are read from that Endpoints object.
Those IPs are kept in memory and refreshed periodically, based on a configured interval. For every Resolve request, it picks an IP according to the routing type in the request.
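
The rotation logic described above could look roughly like the sketch below. It is not the actual lookaside code: the type picker, its methods and the routing constants are hypothetical names, and fetching the Endpoints object is left out so that only the in-memory part is shown.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sync"
)

type routing int

const (
	roundRobin routing = iota
	random
)

// picker keeps the last known server IPs in memory and hands one out per call.
type picker struct {
	mu   sync.Mutex
	ips  []string
	next int
}

// refresh replaces the cached IPs, e.g. after re-reading the Endpoints object.
func (p *picker) refresh(ips []string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.ips = append([]string(nil), ips...) // copy so the caller cannot mutate our state
	p.next = 0
}

// resolve returns one IP according to the requested routing type.
func (p *picker) resolve(r routing) (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.ips) == 0 {
		return "", errors.New("no servers known")
	}
	if r == random {
		return p.ips[rand.Intn(len(p.ips))], nil
	}
	ip := p.ips[p.next%len(p.ips)]
	p.next++
	return ip, nil
}

func main() {
	p := &picker{}
	p.refresh([]string{"172.17.0.2", "172.17.0.3", "172.17.0.4"})
	for i := 0; i < 4; i++ {
		ip, _ := p.resolve(roundRobin)
		fmt.Println(ip) // wraps around to the first IP on the fourth call
	}
}
```

The mutex matters because Resolve requests and the periodic refresh would run concurrently in a real server.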

Code for lookaside load balancer can be found here.

We are using the image hiteshpattanayak/lookaside:9.0 for lookaside pod.

The pod manifest would be like this:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: lookaside
  name: lookaside
  namespace: default
spec:
  containers:
    - image: hiteshpattanayak/lookaside:9.0
      name: lookaside
      ports:
        - containerPort: 50055
      env:
        - name: LB_PORT
          value: "50055"

Since it is also a gRPC server, it exposes a port: 50055.

The service manifest that exposes the pod is as below:

apiVersion: v1
kind: Service
metadata:
  labels:
    run: lookaside-svc
  name: lookaside-svc
  namespace: default
spec:
  ports:
    - port: 50055
      protocol: TCP
      targetPort: 50055
  selector:
    run: lookaside
  clusterIP: None

I chose a headless service here as well, although it is not required.

The ClusterRole is updated to also allow fetching endpoints and pod details:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: service-reader
rules:
  - apiGroups: [""]
    resources: ["services", "pods", "endpoints"]
    verbs: ["get", "watch", "list"]

Changes with Greet Client

Greet Client is now integrated with the lookaside load balancer.

The client is set to use the RoundRobin routing type, but this can be made configurable via a ConfigMap or environment variables.

The load-balancing policy and the WithBlock option (which only makes the dial call wait until the connection is up) are no longer set while dialing.

from

conn, err := grpc.Dial(
    servAddr,
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
    grpc.WithBlock(),
    opts,
)

to

conn, err := grpc.Dial(
    servAddr,
    opts,
)

So how does this solve the earlier problem, where we traded away long-lived connections for the sake of load balancing?

The client stores the connection it opens to each server and reuses it, while rotating between the servers for each request.

if c, ok := a.greetClients[host]; !ok {
    servAddr := fmt.Sprintf("%s:%s", host, serverPort)

    fmt.Println("dialing greet server", servAddr)

    conn, err := grpc.Dial(
        servAddr,
        opts,
    )
    if err != nil {
        log.Printf("could not connect greet server: %v", err)
        return err
    }

    a.conn[host] = conn

    a.currentGreetClient = proto.NewGreetServiceClient(conn)
    a.greetClients[host] = a.currentGreetClient
} else {
    a.currentGreetClient = c
}
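
The same caching idea, isolated from gRPC, can be sketched as below. This is an illustration under assumptions rather than the Greet Client code: conn stands in for *grpc.ClientConn so the example compiles with the standard library only, and connCache, dialFunc and get are hypothetical names. The mutex is an addition of mine, on the assumption that a REST gateway handles requests concurrently.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a stand-in for *grpc.ClientConn.
type conn struct{ addr string }

// dialFunc opens a new connection to addr.
type dialFunc func(addr string) (*conn, error)

// connCache keeps one long-lived connection per server address.
type connCache struct {
	mu    sync.Mutex
	conns map[string]*conn
	dial  dialFunc
}

func newConnCache(dial dialFunc) *connCache {
	return &connCache{conns: make(map[string]*conn), dial: dial}
}

// get returns the cached connection for addr and dials only on first use.
func (c *connCache) get(addr string) (*conn, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if existing, ok := c.conns[addr]; ok {
		return existing, nil
	}
	created, err := c.dial(addr)
	if err != nil {
		return nil, err
	}
	c.conns[addr] = created
	return created, nil
}

func main() {
	dials := 0
	cache := newConnCache(func(addr string) (*conn, error) {
		dials++
		return &conn{addr: addr}, nil
	})

	// The lookaside balancer would supply these addresses, one per request.
	for _, addr := range []string{"172.17.0.2:50051", "172.17.0.3:50051", "172.17.0.2:50051"} {
		if _, err := cache.get(addr); err != nil {
			fmt.Println("dial failed:", err)
		}
	}
	fmt.Println("requests: 3, dials:", dials)
}
```

Three requests need only two dials here, because the third request reuses the connection opened by the first.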

Conclusion

gRPC is a great solution for internal communication between microservices because of its efficiency, speed and parity. But its long-lived connections, though an advantage, make load balancing tricky. In this article we looked at ways to handle that.

Service meshes like Linkerd and Istio can handle this as well. But it is handy to have solutions for cases where a service mesh is not set up.

Folks, if you like my content, would you consider following me on LinkedIn?
