Ethan J. Jackson

The Dark Side of Microservices

There is an endless supply of blog posts, white papers, and slide decks evangelizing the virtues of microservices. They talk about how microservices “increase agility,” are “more scalable,” and promise that when you make the switch, engineers will be pounding at your office door looking for a job.

Let’s be clear: though occasionally exaggerated, the benefits of microservices can be real in some cases. Particularly for large organizations with many teams, microservices can make a lot of sense. However, microservices aren’t magic -- for all of their benefits, they come with significant drawbacks. In this post, I’ll describe how the distributed nature of microservices makes them inherently more complex.

Distributed Systems Are Hard

A distributed system is any collection of computers that work together to perform a task. Microservices are simply a type of distributed system designed for delivering the backend of a web service.

Since the early days of distributed systems research, going back to the 70s, we’ve known that distributed systems are hard. From a theoretical perspective, the difficulty mostly arises from two key areas: consensus and partial failure.

Consensus

From a theoretical perspective, the fundamental issue with building workable distributed systems comes down to the problem of consensus – agreement on distributed state. Nearly all distributed systems research attempts to grapple with this problem in some way. Paxos, Raft, Vector Clocks, ACID, Eventual Consistency, MapReduce, Spark, Spanner, and most other significant advances in this area are all fiddling with the tradeoff between strong consistency and performance in some way.

To better understand the problem of distributed consensus, let’s illustrate with an example. Suppose Bob asks Server_1 to write x=5 while, concurrently, Jill asks Server_2 to write x=6. Does x equal 5 or 6? Naively, one could look at the time x=5 occurred and the time x=6 occurred, and choose whichever happened last. But how do you determine the time the writes happened? Look at a clock? Whose clock? How do you know that clock is accurate? How do you know Bob, Jill, Server_1, and Server_2 agree with that clock? Clocks are notoriously out of sync, and (as Albert Einstein taught us) that’s not fixable[1]. For that matter, does everyone really need to agree on the value of x? If so, how much agreement? How long should the agreement take? What if Bob dies while trying to reach agreement? It gets complicated.
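To make the clock problem concrete, here’s a minimal sketch (the servers and skews are hypothetical, purely for illustration) of how naive wall-clock “last write wins” picks the wrong winner as soon as clocks drift:

```python
import time

# Hypothetical clock skews: Server_1 runs 2 seconds fast, Server_2 runs 3 seconds slow.
SKEW = {"Server_1": +2.0, "Server_2": -3.0}

def local_timestamp(server: str) -> float:
    """Each server stamps writes with its own (skewed) wall clock."""
    return time.time() + SKEW[server]

# Bob writes x=5 on Server_1; half a second later, Jill writes x=6 on Server_2.
write_bob = {"value": 5, "ts": local_timestamp("Server_1")}
time.sleep(0.5)
write_jill = {"value": 6, "ts": local_timestamp("Server_2")}

# Naive last-write-wins: trust whichever timestamp is larger.
winner = max([write_bob, write_jill], key=lambda w: w["ts"])

# Jill's write really happened last, but Server_2's slow clock stamped it
# earlier, so Bob's x=5 "wins" anyway.
print(f"x = {winner['value']}")
```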

So, given that distributed consensus is hard, how does this problem manifest in the context of microservices? Good microservice implementations tend to sidestep the issue altogether by simply disallowing shared state. In the case of our above example, there exists no x such that two microservices need to agree on the value of x at any particular point in time. Instead, all shared state in the system is punted to an external database or the container orchestrator.
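As a rough illustration of what “disallowing shared state” looks like in practice (the store below is a stub standing in for Postgres, Redis, etc.), the service keeps nothing in memory between requests and delegates every read and write:

```python
class ExternalStore:
    """Stand-in for an external database -- the component that actually
    has to get consensus right in a real deployment."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


store = ExternalStore()

def handle_write(key: str, value: int) -> None:
    # The microservice is a stateless pass-through: if this replica dies,
    # any other replica can serve the next request, because nothing lives here.
    store.set(key, value)

def handle_read(key: str) -> int:
    return store.get(key)

handle_write("x", 5)
handle_write("x", 6)
print(handle_read("x"))  # whichever value the *store* says won -- not the service
```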

This approach both does and doesn’t solve the consensus problem. It doesn’t solve it in the sense that, from a theoretical perspective, there is still shared state that requires management – you’ve just moved it. By the way, this is why Kubernetes and databases are so darn complicated.

The approach does solve the problem in that, from a practical perspective, Kubernetes and databases are better at managing shared state than most microservices. Those systems are designed by engineers who spend all day every day thinking about these issues. As a result, they’re more likely to get consensus right.

Partial Failure

Consider an HTTP request serviced by a monolith. When the request is received, a single server handles the transaction from beginning to end. If there is a problem, be it a software bug or hardware failure, the entire monolith crashes – every failure is a total failure.

Now consider the same HTTP request coming into a microservice. That microservice may send new requests to other microservices which, in turn, may generate more requests going to yet more microservices. Now suppose one of those microservices fails. What then? One or more microservices are depending on the data that the failed microservice was preparing. What should they do? Wait for a while? How long? Try again? Try someone else? Who else? Give up and do their best with the data they’ve got? Microservices must be engineered to handle these issues, again making them more challenging to develop.
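To give a feel for the extra engineering this implies, here’s a hedged sketch of the retry-with-timeout-and-fallback logic every caller suddenly needs (the URL, service name, and fallback are made up; a real system might add a circuit breaker on top):

```python
import time

import requests  # third-party HTTP client: pip install requests

# Hypothetical downstream microservice.
RECOMMENDATIONS_URL = "http://recommendations:8080/api/v1/recs"

def fetch_recommendations(user_id: str, retries: int = 3, timeout_s: float = 0.5):
    """Answers the questions above explicitly: how long to wait (timeout_s),
    whether to try again (retries), and what to do if the downstream service
    never answers (degrade to an empty list)."""
    for attempt in range(retries):
        try:
            resp = requests.get(RECOMMENDATIONS_URL,
                                params={"user": user_id},
                                timeout=timeout_s)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            time.sleep(0.1 * 2 ** attempt)  # exponential backoff between tries
    # Partial failure: give up and do our best with the data we've got.
    return []
```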

Partial failure has been described as an unqualified good thing. The thinking goes, by supporting partial failure, an application becomes more resilient – small problems can be papered over gracefully. In my opinion, the benefits are small, rarely obtained in practice, and come at the expense of vastly increased implementation complexity.

More Moving Pieces

In addition to the theoretical challenges of microservices, there are also just a lot of them. Having so many moving pieces complicates nearly every part of the stack and every part of the software development lifecycle.

Development
You can typically run a monolith directly on your laptop. Getting microservices to work on a local machine requires more specialized tools such as docker-compose and minikube. Furthermore, microservices are CPU and memory intensive, making them painfully slow on a laptop. (Note: check out Kelda, and specifically our whitepaper, for a detailed description of this problem.)

Debugging
Everything in a monolith happens in a single process. You can attach the debugger of your choice, and you’re off to the races. With microservices, a single request may be spread across dozens of different processes. Distributed tracing tools like Jaeger may help, but it’s still a challenge.
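Short of a full tracing system, one low-tech trick is to propagate a request ID through every hop, so logs from dozens of processes can be stitched back together afterwards. A minimal sketch (the header name is a common convention, not a standard, and nothing here is Jaeger-specific):

```python
import uuid

REQUEST_ID_HEADER = "X-Request-ID"

def ensure_request_id(incoming_headers: dict) -> str:
    """Reuse the caller's request ID if present; otherwise mint one at the edge."""
    return incoming_headers.get(REQUEST_ID_HEADER) or str(uuid.uuid4())

def outbound_headers(headers: dict, request_id: str) -> dict:
    """Every call to another microservice forwards the same ID, so one user
    request can be grepped for across every service's logs."""
    forwarded = dict(headers)
    forwarded[REQUEST_ID_HEADER] = request_id
    return forwarded

rid = ensure_request_id({})  # no inbound ID, so we mint one
print(outbound_headers({"Accept": "application/json"}, rid))
```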

Logging
With a monolith, you can store logs in a file and grab them when needed. With microservices, you need a tool like Splunk or the ELK stack to handle this for you.
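The usual first step toward making those tools happy is to stop writing log files at all and instead emit one structured line per event on stdout, where the platform’s log shipper can pick them up. A minimal sketch (the field names are arbitrary):

```python
import json
import sys
import time

def log(level: str, message: str, **fields):
    """Emit one JSON object per line to stdout; a collector (Fluentd,
    Logstash, etc.) ships these to Splunk or ELK instead of a local file."""
    record = {"ts": time.time(), "level": level, "msg": message, **fields}
    sys.stdout.write(json.dumps(record) + "\n")

log("info", "order placed", service="checkout", order_id="1234", request_id="abc-123")
```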

Monitoring
Simple on-server monitoring tools like Nagios don’t scale when you’ve got hundreds of microservices. Again, better tools (Prometheus/Datadog/Sysdig, etc.) make the problem tractable, but it’s still hard.
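For flavor, here’s roughly what the pull-based model looks like with the Prometheus Python client (the metric name and port are arbitrary): each service exposes its own metrics endpoint, and one Prometheus server scrapes all of them, instead of anyone configuring per-server checks by hand.

```python
import random
import time

from prometheus_client import Counter, start_http_server  # pip install prometheus-client

# Each service exposes its own counters; Prometheus scrapes hundreds of these.
REQUESTS = Counter("http_requests_total", "Total HTTP requests", ["status"])

if __name__ == "__main__":
    start_http_server(9100)  # metrics served at http://localhost:9100/metrics
    while True:  # simulate traffic so the counter moves
        REQUESTS.labels(status=random.choice(["200", "500"])).inc()
        time.sleep(1)
```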

Deployment
Tools like Chef and Puppet are good enough for deploying a monolith, but for microservices, you need something much more sophisticated like Kubernetes.

Networking
Monoliths can be handled with a simple load balancer. Microservices have many more endpoints, all of which require load balancing, service discovery, consistent security policy, etc. I suppose a service mesh can help with this (I’m not convinced, but that’s a topic for a future post).

Microservices Make Sense, Sometimes

From a technical perspective, microservices are strictly more difficult than monoliths. However, from a human perspective, microservices can improve the efficiency of a large organization. They allow different teams within a large company to deploy software independently. This means that teams can move quickly without waiting for the slowest team to get its code QA’d and ready for release. It also means that there’s less coordination overhead between engineers, teams, and divisions within a large software engineering organization.

And while microservices can make sense, the key point here is that they aren’t magic. Like nearly everything in computer science, there are tradeoffs — in this case, technical complexity in exchange for organizational efficiency. That’s a reasonable trade, but you’d better be sure you need that organizational efficiency for the technical challenges to be worth it.

[1]: Yes, of course, most clocks on earth aren’t moving anywhere near the speed of light. Furthermore, several modern distributed systems (notably Spanner) rely on this fact, using extremely accurate atomic clocks to sidestep the consensus issue. Still, these systems are, themselves, extremely complicated, proving my point: distributed consensus is hard.

Kelda makes microservices easier for developers on Kubernetes.

Join our Slack community!

Top comments (5)

Timothy McGrath

Good post!
A microservices approach is hard. We are somewhere in the middle right now: we run multiple backend services because the scale of our app is large, and some services are request/response from a client while others run continuous processes based on other inputs.

The multiple services help a lot with smaller, faster deployments, more specific responsibility, and being able to load balance more intensive services.

But it does add complexity to debugging and isolating run-time issues. It is also harder for new developers to get up to speed if they are used to just executing a monolith with a web app attached to it.

We considered moving closer to a pure microservice solution, but complexity immediately ratcheted up as we experimented with things like Docker, Kubernetes/Service Fabric, and total isolation of services via data redundancy.

All of the microservice concepts make a lot of sense, but they all introduce trade-offs. Our current plan is to use microservice patterns to solve specific issues that our existing architecture has instead of trying to be a microservice purist.

Ethan J. Jackson

You said something in your post here that really stuck out to me, btw. You mention that you’re “considering moving closer to a pure microservice solution”. I think that’s really key ... folks (myself included) write about microservices as if it’s an all or nothing thing. But in reality, you can partially adopt microservices where it makes sense, and not where it doesn’t. I suspect the place where most folks should start is to shove their monolith in a container and add microservices around it slowly as needed, rather than going through a huge rearchitecture.

Jordan Moore

Ethan brings up a good point about it not being an all or nothing effort... In fact, there are recommended patterns for refactoring monoliths into separate services over at microservices.io/refactoring/index...

Docker is mostly the tip of the iceberg, in my mind, and Kubernetes is not the only option for running those containers. For example, k3s or Swarm offer a simpler mental model to run in a small environment.

As far as observability and debugging are concerned, if you clearly define the scope of each service and its dependencies, and additionally decouple them with the help of message brokers rather than all point-to-point communication, it should be possible to easily inject log gathering, metrics, and distributed tracing into almost every component as part of a "chassis" on which each component of your infrastructure runs. Once you define the build pipelines and runtime environments for each service, and document and distribute knowledge about those decisions, it'll ultimately help democratize the deployment and development of each business unit, with the cost amortized over the time it takes to gain knowledge in these areas.

And of course, there are companies out there that'll teach you or otherwise make that burden more manageable.

Erebos Manannán

It seems you fail to make a clear and valid point in the complaints about microservices being "difficult".

Consensus

This has nothing to do with microservices. If you have a web server, it likely accepts more than 1 connection at the same time, now you have to deal with the race condition problems. Doesn't matter AT ALL if you're building a monolithic or a microservice architecture. Use distributed locks (e.g. etcd), trustworthy update operations (count += 1 vs count = old_count + 1), and make it clear when the response value is not guaranteed to be exact.
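To spell out that update-operation point with a toy in-process store standing in for etcd or Redis (illustrative only), the read-modify-write version races while the atomic server-side increment does not:

```python
import threading

class KV:
    """Toy key-value store standing in for Redis, etcd, etc."""

    def __init__(self):
        self._data = {"count": 0}
        self._lock = threading.Lock()

    def get(self, key):
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key):
        # The store applies the increment atomically (cf. Redis INCR or an
        # etcd transaction), so concurrent callers can't lose updates.
        with self._lock:
            self._data[key] += 1

def racy_increment(kv: KV):
    old = kv.get("count")     # two clients can read the same old value...
    kv.set("count", old + 1)  # ...and both write old + 1, losing an update

def safe_increment(kv: KV):
    kv.incr("count")          # one atomic server-side operation
```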

Your claim of "disallowing shared state" is also just plain wrong. There is often a database, etc. for sharing state – what is disallowed is one service poking at another service's state by bypassing its API.

Partial Failure

Again, a complete misrepresentation. "If there is a problem ... the entire monolith crashes – every failure is a total failure" is flat out false. There are often bugs that cause partial failures in monolith applications. Some parts of the API might work while others do not. If it does multiple operations during one API call, some might succeed (e.g. taking money away from a bank account) while others do not (depositing the money to another bank account), leading to the exact same problems having to be solved in the case of a monolithic application as well.

The benefit of microservices here is that if your search API or other such less important bits and pieces crash and burn, the critical parts of your system might still work fine.

Development

"You can typically run a monolith directly on your laptop" – whoah, if you do that, there are much bigger issues that you have than monolith vs. microservice. Learn to use controlled development environments; Vagrant, Docker, etc. will help you. If it takes more than a couple of clear commands from your README.md, akin to vagrant up, you're doing it wrong.

Claims about them being "CPU intensive" are just random FUD as well. Get a laptop from this century, which has decent amounts of RAM, a modern CPU, and most importantly an SSD, and you won't have any issues. If your software has "hundreds of microservices", then if it was built as a monolithic application instead, you also couldn't run it, and it would be incomprehensible to everyone. With microservices you'd at least allow for people to build simple mocks and other such things for the parts that are less important to run when developing the pieces they are working on.

If you don't know which service you're having a problem with before playing around with a debugger, you also have bigger issues to worry about.

On logging, jesus – "With a monolith, you can store logs in a file and grab them when needed", just no. You can't. Have you ever heard of e.g. hardware failure? Regardless of how you deploy your application, if you are sane, you will set up the same logging processes to collect the logs from various sources. Also, when building "monolithic", you will run multiple instances of the API to handle high availability, etc. needs, and now you already don't have a single log to "grab when needed".

When building microservices on Kubernetes though, guess what: no need for complicated machinery to fetch your logs from multiple machines – you log to stdout and stderr, set up Kubernetes logging, and voilà, all your logging is taken care of. This is typically handled automatically by hosted Kubernetes instances in e.g. Azure.

On deployment tools, yes – you shouldn't use bad tools like Chef and Puppet in general; if you need tools like that, at least get up to date and use Ansible/SaltStack. Docker + Kubernetes are going to make your deployments easier and faster though, once you spend the little effort it takes to get to know them, so I wouldn't say it's a bad thing in any way. A Dockerfile with e.g. a few shell scripts is much easier to follow than a big bunch of Ruby scripts managed by a complicated configuration management tool, and you're much better able to control the desired state.

Networking again is such a random weird argument. Kubernetes takes care of all of the internal load balancing etc. out of the box, easily.

Microservices also keep every piece you're working with a lot simpler, so it's easier to develop on it. Monoliths tend to end up with 20,000 line files where people have no idea what code is dead code, what is important, and what is required by what. With microservices it's much easier to keep your code under control. Something getting too complicated? Think about separating it to another service.

In short: weird invalid complaints made just with the goal to try and sell your product.

Ethan J. Jackson

Hey thanks for the feedback! Glad this post has engendered some debate =)

Couple of specific points:

  • Yep, of course you can use etcd for distributed locks. Etcd is a Raft implementation, which does distributed consensus. The problem doesn't go away; it's just moved.

  • Good point on the issue of monoliths and partial failure. Even monolithic applications will typically have multiple components (databases, load balancers, etc.). I still think the problem is more pronounced in the case of microservices ...

And my response to nearly all of your criticisms can be summed up like this: yes, of course, nearly all of the problems that exist in microservices exist in monoliths as well. The issue is simply that microservices make a lot of those problems worse. I.e., yes, you need log management for a monolith, but it's absolutely crucial once you've got 40 microservices running.

All of that said, I really do genuinely appreciate the criticism. I'm just some guy on the internet, and these are just my opinions – we all learn from the back and forth.