As microservices observability becomes more and more of a challenge for engineers, companies are turning to distributed tracing solutions.
Now, they have to decide whether to buy or build their distributed tracing infrastructure. While building a solution from scratch might sound compelling, and can even be supported by open source tools like Jaeger and OpenTelemetry, bought solutions provide more features and are easier to use.
Let’s see how the two compare and when you should choose each one. But first, let’s remind ourselves about what distributed tracing is.
Logging is a difficult and time-consuming practice that doesn’t always provide the information you need to solve performance issues and regressions. Traces complement logs by presenting the relationships between services and components and tracking requests across them. It’s no surprise, then, that many developers are adding tracing to their workflows, and engineering departments are using it to reduce their mean time to resolution (MTTR).
But we live in a world of microservices, and we need tracing to adapt.
That’s where distributed tracing comes in. Distributed tracing follows a single request as it travels from service to service, recording each hop along the way. It provides observability into the microservices architecture. For example, it shows why an integration between two components failed. These insights are used for faster troubleshooting and accelerating time-to-market.
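To make the idea concrete, here is a minimal, standard-library-only sketch of what a trace records. It is not how Jaeger or OpenTelemetry are implemented; the `Span` class, the `COLLECTED` list (standing in for a tracing backend), and the `checkout` and `payments` services are all hypothetical names made up for illustration.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One unit of work in one service, linked to its parent by ID."""
    trace_id: str               # shared by every span in the same request
    span_id: str                # unique to this unit of work
    parent_id: Optional[str]    # None for the root span
    service: str
    operation: str
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    error: Optional[str] = None


COLLECTED: List[Span] = []  # hypothetical stand-in for a tracing backend


def start_span(service: str, operation: str, parent: Optional[Span] = None) -> Span:
    """Start a span, reusing the parent's trace ID when there is a parent."""
    span = Span(
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        service=service,
        operation=operation,
    )
    COLLECTED.append(span)
    return span


def finish_span(span: Span, error: Optional[str] = None) -> None:
    """Record the end time and, if something went wrong, the error."""
    span.end = time.time()
    span.error = error


def payment_service(parent: Span) -> None:
    # Downstream service: its span becomes a child of the caller's span.
    span = start_span("payments", "charge_card", parent)
    finish_span(span, error="card declined")


def checkout_service() -> Span:
    # Entry point: no parent, so this span starts a new trace.
    root = start_span("checkout", "POST /checkout")
    payment_service(root)
    finish_span(root)
    return root


root = checkout_service()
failed = [s for s in COLLECTED if s.error]
for s in failed:
    print(f"trace {s.trace_id}: {s.service}.{s.operation} failed: {s.error}")
```

Because both spans share a trace ID and the child points at its parent, a backend can rebuild the request path and show that the failure happened in `payments`, not in `checkout`.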
Read more about distributed tracing, its advantages, and when to use it in this blog post.
So you’ve determined your organization needs a distributed tracing solution. Congratulations! Now you need to decide whether you’re going to build one or to buy one.
Building a solution means the product is built internally, from the ground up. This includes researching the problem, designing the tracing solution with open source solutions like Jaeger, using OpenTelemetry to instrument, gather and analyze data, allocating resources to build it, and continuously maintaining it.
One of the most famous examples of building an internal distributed tracing solution is Netflix, whose developers wrote a blog post about the progress they’ve made four years after they started building it.
But even if you’re not Netflix, you can probably still build your own basic tracing solution.
What if you need a tracing solution that can scale and provide lots of added value? Then, you might want to look into buying a solution.
This means purchasing a ready-made, out-of-the-box distributed tracing solution. You can choose either a managed solution, i.e., one that runs open source (like Jaeger) for you, or an unmanaged solution, i.e., a distributed tracing solution from a vendor, with unique features and capabilities.
Whichever you choose, buying a solution enables immediate implementation, observability, and insights.
Let’s dive deeper into how these two types of solutions compare.
You’re on the quest for data. Luckily, you’ve come to the right place. However, maintaining and storing all that data isn’t always easy, or easy on your resources. Scaling your tracing operation is another huge challenge.
Ingesting and storing large amounts of data from many services requires a reliable infrastructure. This infrastructure should let you scale the number of traces you collect, process them, and store them in a database like Elasticsearch.
Building a solution requires constant operation and maintenance as you scale your data collection. With a bought solution, keeping your data available as load and requirements grow is the vendor’s job.
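As a rough illustration of the ingest-and-store step, the sketch below buffers incoming spans and writes them to storage in batches rather than one at a time, which is one common way to reduce load on a database. The `BatchingIngestor` class and its `store` callback are hypothetical; a real pipeline would also need retries, backpressure, and a time-based flush.

```python
from typing import Callable, Dict, List

SpanRecord = Dict[str, str]


class BatchingIngestor:
    """Buffers spans and hands them to storage in fixed-size batches."""

    def __init__(self, store: Callable[[List[SpanRecord]], None], max_batch_size: int = 100):
        if max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        self._store = store            # hypothetical storage writer, e.g. a bulk insert
        self._max_batch_size = max_batch_size
        self._buffer: List[SpanRecord] = []

    def ingest(self, span: SpanRecord) -> None:
        """Accept one span and flush once the buffer is full."""
        self._buffer.append(span)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def flush(self) -> None:
        """Write whatever is buffered, if anything, as a single batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._store(batch)


# Usage: an in-memory list stands in for the database.
written: List[List[SpanRecord]] = []
ingestor = BatchingIngestor(store=written.append, max_batch_size=3)
for i in range(7):
    ingestor.ingest({"trace_id": "t1", "span_id": str(i)})
ingestor.flush()  # write the final partial batch

print([len(batch) for batch in written])
```

Even in this toy version you can see the operational questions a built solution has to answer: how large a batch should be, what happens to buffered spans if the process crashes, and how the storage layer keeps up as traffic grows.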
Built and bought solutions offer different features and capabilities. With built solutions, you can design the system according to your specific needs and requirements. Bought solutions, on the other hand, provide out-of-the-box capabilities that are meant to give you value beyond your basic needs. There is usually a wide variety of them, and they are built on industry best practices and requirements from multiple companies.
Built solutions require many resources for designing, configuring, managing, maintaining, and operating them. While building might seem more cost-effective, the truth is that when you build your own, you know where you start but not where you’ll end up. You have to factor in the time developers need to learn and build the solution, as well as the cost of cloud resources.
Buying a solution, on the other hand, comes with a known price. That price might seem higher, but everything is taken care of. You can also start out with a free trial or free tier to make sure the solution answers your needs.
Security is shifting left, making it more and more the responsibility of developers. When building a distributed tracing solution, you also have to take security controls into consideration. Data privacy, a reduced attack surface, and vulnerability mitigation all have to be designed in. Purchased solutions, on the other hand, do the security heavy lifting. They take security concerns off your plate and offer these controls out of the box.
Sure, your super developer powers can build anything. But should they? When developers build a distributed tracing solution internally, they’re diverted from the core business. Why not let other developers, who work at companies where distributed tracing is the core business, build the solution for you?
When distributed tracing solutions are built internally, the engineers who built them control the knowledge of the product and access to it. Buying a solution enables more democratization, since access and training are available to all.
Unless you are a large company with available resources and a unique architecture, or an individual looking to test the waters of distributed tracing (in which case a hybrid model might also work for you), we believe buying is the way to go.
Here are some of the tools you can choose from for both building and buying a distributed tracing solution:
Jaeger is an open-source, distributed tracing tool that you can use as the basis for building your own distributed tracing solution. You can use Jaeger for transaction monitoring, latency optimization, root cause analysis, dependency analysis, and context propagation. Jaeger can run on any infrastructure and leverage OpenTelemetry regardless of the code language.
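Context propagation is the piece that lets a trace cross service boundaries: the caller passes trace identifiers to the next service, usually in a request header. One widely used format is the W3C Trace Context `traceparent` header, which OpenTelemetry supports. The sketch below parses and re-emits that header with the standard library only; it is not Jaeger’s or OpenTelemetry’s code, and the helper names `parse_traceparent` and `child_traceparent` are made up for this example.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version "00" layout: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Return the context carried by a traceparent header, or None if invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are not valid identifiers.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # The lowest bit of the flags byte is the "sampled" flag.
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(ctx: TraceContext) -> str:
    """Build the header for an outgoing call: same trace ID, new span ID."""
    new_span_id = secrets.token_hex(8)  # 16 hex characters
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{new_span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
outgoing = child_traceparent(ctx)
print(outgoing)
```

The trace ID stays the same from hop to hop while each service contributes its own span ID, which is what allows a backend to stitch the spans back into one request.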
If you’re into Jaeger but not into getting your hands dirty with it, managed solutions will bring the hand sanitizer and do it for you. With such solutions, you get Jaeger’s capabilities, but the vendor packs them up for you in dashboards that already provide you with insights and observability.
It’s also easier to get started this way than to download Jaeger and build your infrastructure from scratch.
Aspecto is a distributed tracing platform with plenty of new features and a unique UI. Aspecto helps developers find, fix, and prevent distributed application issues across the entire development cycle.
It’s the Chrome DevTools for your distributed applications.
Aspecto is OpenTelemetry-based. It allows developers to prevent issues before they reach production by collecting telemetry data to learn the system, then comparing what developers do locally against baseline data from production, staging, or other local environments. You can easily install Aspecto, for free, with a one-liner SDK, or give the Live Playground a spin.
And if you’re already using the OpenTelemetry collector, you can use it to export traces to the Aspecto OTEL collector.
Building your distributed tracing solution might sound like a nice challenge. But unless you’re a huge company with endless resources and a unique architecture, you might be embarking on a very costly and ineffective adventure.
Buying a tracing solution offers you all the capabilities and observability you need right when you get started. You can buy Jaeger-based managed solutions or get more feedback and visibility with tons of additional features through solutions like Aspecto.
We recommend getting started with distributed tracing as soon as you can, no matter your choice. Make your microservices management and troubleshooting a lot more effective with tracing.