Daniele Polencic

Posted on May 2, 2023

How etcd works in Kubernetes

#kubernetes #devops

If you've ever interacted with a Kubernetes cluster in any way, chances are it was powered by etcd under the hood.

But even though etcd is at the heart of how Kubernetes works, you rarely interact with it directly.

In this article, you will explore how it works!

Architecturally speaking, the Kubernetes API server is a CRUD application that stores manifests and serves data.

Hence, it needs a database to store its persisted data, which is where etcd fits into the picture.
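To make the "CRUD application plus database" idea concrete, here is a minimal sketch of a key-value store holding manifests. The `TinyStore` class and the sample values are hypothetical. The keys assume the `/registry/<resource>/<namespace>/<name>` layout that the API server uses by default, and real values are serialized objects rather than Python dictionaries.

```python
class TinyStore:
    """Hypothetical in-memory stand-in for a key-value database such as etcd."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Create or update the value stored under a key.
        self._data[key] = value

    def get(self, key):
        # Read a single key; returns None when the key is missing.
        return self._data.get(key)

    def delete(self, key):
        # Remove a key; returns True only if the key existed.
        return self._data.pop(key, None) is not None

    def list_prefix(self, prefix):
        # Return every key (sorted) that starts with the given prefix.
        return {k: v for k, v in sorted(self._data.items()) if k.startswith(prefix)}


store = TinyStore()
store.put("/registry/pods/default/nginx", {"kind": "Pod", "image": "nginx"})
store.put("/registry/pods/kube-system/coredns", {"kind": "Pod", "image": "coredns"})
store.put("/registry/deployments/default/web", {"kind": "Deployment", "replicas": 3})

# Listing the pods of one namespace becomes a prefix query.
pods_in_default = store.list_prefix("/registry/pods/default/")
print(list(pods_in_default))
```

Because the keys are hierarchical, "list all pods in a namespace" needs no index: it is just a scan over a key prefix.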

According to its website, etcd is a strongly consistent, distributed key-value store.

In addition, etcd has another feature that Kubernetes extensively uses: change notifications.

Etcd allows clients to subscribe to changes to a particular key or set of keys.
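A rough sketch of how change notifications on a set of keys could work is shown below. The `WatchableStore` class is hypothetical and runs in a single process; it only borrows two ideas from etcd: events are typed as `PUT` or `DELETE`, and every change bumps a store-wide revision counter.

```python
class WatchableStore:
    """Hypothetical key-value store that notifies subscribers about changes."""

    def __init__(self):
        self._data = {}
        self._watchers = []  # list of (prefix, callback) pairs
        self._revision = 0   # incremented on every change to the store

    def watch(self, prefix, callback):
        # Subscribe to changes for every key that starts with `prefix`.
        self._watchers.append((prefix, callback))

    def put(self, key, value):
        self._data[key] = value
        self._notify("PUT", key, value)

    def delete(self, key):
        # Deleting a missing key changes nothing and emits no event.
        if key in self._data:
            del self._data[key]
            self._notify("DELETE", key, None)

    def _notify(self, event_type, key, value):
        self._revision += 1
        event = {"type": event_type, "key": key, "value": value,
                 "revision": self._revision}
        for prefix, callback in self._watchers:
            if key.startswith(prefix):
                callback(event)


events = []
store = WatchableStore()
store.watch("/registry/pods/", events.append)

store.put("/registry/pods/default/nginx", "v1")    # matches the watch
store.put("/registry/services/default/web", "v1")  # different prefix, ignored
store.delete("/registry/pods/default/nginx")       # matches the watch

for event in events:
    print(event["revision"], event["type"], event["key"])
```

The watcher receives only the two pod events, and the gap in revision numbers (1, then 3) shows that a change it did not subscribe to happened in between.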

The Raft algorithm is the secret behind etcd's balance of strong consistency and high availability.

Raft solves a particular problem: how can multiple processes decide on a single value for something?

Raft works by electing a leader and forcing all write requests to go to it.

How does the Leader get elected, though?

First, all nodes start in the Follower state.

If followers don't hear from a leader, they can become candidates and request votes from other nodes.

Nodes reply with their vote.

The candidate with the majority of the votes becomes the Leader.

Changes are then replicated from the Leader to all other nodes; if the Leader ever goes offline, a new election is held, and a new leader is chosen.
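The election steps above can be sketched in a few lines. This is a simplified model, not etcd's implementation: the `Node` class and `run_election` function are hypothetical, and the sketch leaves out parts of Raft such as randomized election timeouts and the check that a candidate's log is up to date.

```python
class Node:
    """Hypothetical Raft node that tracks only what an election needs."""

    def __init__(self, name):
        self.name = name
        self.state = "follower"  # every node starts as a follower
        self.term = 0
        self.voted_for = None

    def handle_vote_request(self, candidate_name, term):
        # A newer term resets this node's vote and makes it a follower.
        if term > self.term:
            self.term = term
            self.voted_for = None
            self.state = "follower"
        # Grant at most one vote per term.
        if term == self.term and self.voted_for in (None, candidate_name):
            self.voted_for = candidate_name
            return True
        return False


def run_election(candidate, reachable_peers, cluster_size):
    # The candidate starts a new term and votes for itself.
    candidate.state = "candidate"
    candidate.term += 1
    candidate.voted_for = candidate.name
    votes = 1
    for peer in reachable_peers:
        if peer.handle_vote_request(candidate.name, candidate.term):
            votes += 1
    # A strict majority of the whole cluster is required to win.
    if votes > cluster_size // 2:
        candidate.state = "leader"
    return votes


# Three nodes, all reachable: the candidate wins with 3 votes.
a, b, c = Node("a"), Node("b"), Node("c")
votes_small = run_election(a, [b, c], cluster_size=3)
print(a.state, votes_small)

# Five nodes, but the candidate can reach only one peer: 2 votes is not a majority.
v, w = Node("v"), Node("w")
votes_partitioned = run_election(v, [w], cluster_size=5)
print(v.state, votes_partitioned)
```

Note that the majority is counted against the full cluster size, not against the nodes that happen to be reachable, which is why the second candidate cannot promote itself.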

What happens when you want to write a value in the database?

First, all write requests are redirected to the Leader.

The Leader appends the request to its log but doesn't commit it yet.

Instead, the Leader replicates the value to the follower nodes.

Finally, the Leader waits until a majority of nodes have written the entry and commits the value.

At this point, the state of the database contains the value.

Once the write succeeds, an acknowledgement is sent back to the client.
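Here is a minimal sketch of that write path, assuming a single synchronous round of replication. The `Leader` and `Follower` classes are hypothetical, and a real Raft leader would keep retrying followers that did not answer instead of giving up immediately.

```python
class Follower:
    """Hypothetical follower that stores replicated entries in its log."""

    def __init__(self, online=True):
        self.online = online
        self.log = []

    def append(self, entry):
        # An offline follower never acknowledges the entry.
        if not self.online:
            return False
        self.log.append(entry)
        return True


class Leader:
    """Hypothetical leader that commits an entry once a majority has stored it."""

    def __init__(self, followers):
        self.followers = followers
        self.log = []
        self.commit_index = 0  # number of committed log entries
        self.state = {}        # the database state built from committed entries

    def write(self, key, value):
        entry = (key, value)
        self.log.append(entry)  # recorded, but not committed yet
        acks = 1                # the leader counts as one copy
        for follower in self.followers:
            if follower.append(entry):
                acks += 1
        cluster_size = len(self.followers) + 1
        if acks <= cluster_size // 2:
            return False        # no majority: the entry stays uncommitted
        self.commit_index = len(self.log)
        self.state[key] = value  # apply the committed entry
        return True              # acknowledge the client


followers = [Follower(), Follower(online=False)]
leader = Leader(followers)

# Two of three nodes hold the entry, so the write is committed.
first_write = leader.write("replicas", 3)
print(first_write, leader.state)

# With both followers offline, the leader alone is not a majority.
followers[0].online = False
second_write = leader.write("replicas", 5)
print(second_write, leader.state)
```

The second write lands in the leader's log but never reaches the database state, which is the difference between an entry being recorded and being committed.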

A new election is held if the cluster leader goes offline for any reason.

In practice, this means that etcd will remain available as long as a majority of nodes are online.
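The majority rule is simple enough to check with a few lines of arithmetic. The function names below are made up for this sketch.

```python
def quorum(cluster_size):
    # Smallest number of nodes that forms a strict majority.
    return cluster_size // 2 + 1


def is_available(cluster_size, online_nodes):
    # The cluster can elect a leader and accept writes only with a quorum online.
    return online_nodes >= quorum(cluster_size)


print(is_available(3, 2))  # one of three nodes is down
print(is_available(3, 1))  # two of three nodes are down
print(is_available(4, 2))  # two of four nodes are down
```

The last case shows why even-sized clusters are unusual: a fourth node raises the quorum to 3, so the cluster still survives only a single failure.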

How many nodes should an etcd cluster have to achieve "good enough" availability?

It depends.

To help you answer that question, let me ask another question!

Why stop at 3 etcd nodes? Why not have a cluster with 9, 21 or more nodes?

Hint: check out the replication part.

The Leader has to wait for a quorum before the value is committed.

The more followers there are in the cluster, the longer it takes to reach a consensus.

In other words, you trade write speed for availability.
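To put numbers on the trade-off, the sketch below compares a few cluster sizes. It only counts acknowledgements and says nothing about real latency, which depends on your network and disks; the `cluster_profile` helper is hypothetical.

```python
def cluster_profile(cluster_size):
    quorum = cluster_size // 2 + 1
    return {
        "nodes": cluster_size,
        # Followers that must confirm a write before the leader can commit it.
        "follower_acks_needed": quorum - 1,
        # Nodes that can fail while the cluster keeps a majority.
        "tolerated_failures": cluster_size - quorum,
    }


profiles = [cluster_profile(size) for size in (3, 5, 9, 21)]
for profile in profiles:
    print(profile)
```

Going from 3 to 21 nodes raises the tolerated failures from 1 to 10, but every single write now has to be confirmed by 10 followers instead of 1.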

