The Team @ Redpanda for Redpanda Data

Posted on • Originally published at redpanda.com

Why we built our streaming data platform in C++

We're reinventing and expanding what was previously possible with data streaming by building a platform from the ground up for cloud-native environments and by designing a system that’s easy to use, even for non-experts.

For years, inefficiencies in infrastructure have wasted a significant amount of compute, yet hardware today is fundamentally different from what it was a decade ago, as this article by Avishai Ish-Shalom eloquently explains. Disk speeds, for example, grew by 1,000 times over the past 10 years. Processing capabilities have also increased significantly alongside developments in core processing.

Despite these critical improvements in computing hardware, today's software hasn't caught up: it's still engineered for the computing platform of a decade ago. That sets up a disparity between hardware and software that’s difficult to reconcile.

At Redpanda, we firmly believe that the only true platform is the hardware, so we asked ourselves: if we were to design software for modern hardware, what could we do differently? The answer is Redpanda.

Redpanda differs from other projects of its kind by streamlining the complexity of the program and by presenting a simple interface to the user. We cannot entirely remove complexity from the system, but we can move it around. Because our developers are the experts, it makes more sense for us to own the complexity rather than push it down to the end user. 

We do this by focusing on two core principles to shift that complexity: Redpanda needs to function well without constant human attention, and the results and output need to be predictable. 

The advantages of C++ 

For Redpanda to pull this off, we chose to use a programming language that both allows direct communication with hardware and has predictable latencies. 

We wrote the early Redpanda prototypes in several different programming languages, but only C++ gave us the ability to create a developer and user experience aligned with our goals. It allows Redpanda to extract every ounce of performance from the available hardware while also maintaining predictability. 

Most programmers view performance only in terms of average latency, and we think that's a misleading metric. Latency is best described by percentiles, and an average of percentiles has no statistical meaning. It’s math that doesn’t tell a useful story.

Instead of focusing on the latencies at 99.9%, 99.99%, or even 99.999%, we focus on the entire 0-100% latency distribution. It’s not enough to look at the experience of 99.999% of the transactions – we need to fix the problems that show up at the 100th percentile. When a system is processing millions of messages a second, the difference between 99.999% and 100% matters. C++ provides a high level of tail latency predictability. 
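To illustrate why an average can hide the tail, here is a minimal C++ sketch. It is not Redpanda code: the function names, the nearest-rank percentile method, and the sample workload (9,999 requests at 1 ms plus a single 2,000 ms stall) are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

// Arithmetic mean of the samples, in milliseconds.
double mean_ms(const std::vector<double>& samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Expects 0 < p <= 100.
// Takes the vector by value because it sorts its own copy.
double percentile(std::vector<double> samples, double p) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    auto rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(n)));
    rank = std::clamp<std::size_t>(rank, 1, n);
    return samples[rank - 1];
}

// Hypothetical workload: 9,999 fast requests and one long stall.
std::vector<double> make_sample_latencies_ms() {
    std::vector<double> latencies(9999, 1.0);
    latencies.push_back(2000.0);
    return latencies;
}

// Prints the mean next to a few percentiles so the gap is visible.
void print_latency_summary(const std::vector<double>& latencies) {
    std::cout << "mean  = " << mean_ms(latencies) << " ms\n"
              << "p99   = " << percentile(latencies, 99.0) << " ms\n"
              << "p99.9 = " << percentile(latencies, 99.9) << " ms\n"
              << "p100  = " << percentile(latencies, 100.0) << " ms\n";
}
```

Calling `print_latency_summary(make_sample_latencies_ms())` from `main` reports a mean of about 1.2 ms and a p99 and p99.9 of 1 ms, while the 100th percentile is 2,000 ms. Only the full distribution reveals the request that stalled for two seconds.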

Another benefit of C++ is its stable and mature ecosystem of libraries. Redpanda uses only a few dozen libraries, while other comparably sized projects depend on hundreds. Having so many dependencies weakens the security posture of the software. We reduce our exposure to vulnerabilities by relying on C++ libraries that have worked for decades.

C++ also allows us to control as much as possible from the platform. Through the efficiency of our own code, combined with the amazing Seastar framework and other best-in-class libraries, Redpanda speaks directly to the hardware. It only depends on the Linux kernel to launch the process, after which Redpanda is very deterministic in terms of performance, runtime characteristics, memory utilization, and CPU speed. We own the entire end-to-end experience, which provides safety and allows Redpanda to build impactful products. 

Building the best present and future for streaming data

Redpanda creates new possibilities for developers, much as airplanes did for travelers in the age of passenger liners. Ships are a slower mode of transportation, even if they're reliable, and even today, you can take a passenger ship from New York to London. Transatlantic travel relied on passenger ships for centuries, but when airplanes came into existence, they fundamentally changed the way people traveled. In doing so, air travel created entire industries that people had never thought of before.

That’s the impact of Redpanda on where the streaming industry is headed. 

We discovered that when you give programmers a new infrastructure primitive like Redpanda, something that's fast, predictable, and geared towards zero data loss, it expands the realm of possibilities about what they can do. Although Redpanda was initially designed to be a replacement for Kafka, it has started to transition into operational workloads. 

For us, discovering new ways that developers are using Redpanda is probably the most exciting aspect of our job. For example, a satellite currently in orbit is running Redpanda, and the Alpaca platform uses Redpanda to trade millions of dollars in securities every single day. Redpanda will soon power the process of monitoring both a pregnant mother's heartbeat and her baby's vital signs during labor. 

Redpanda Data is the only company that can cross this chasm and move to the foreground of operational workloads. 

Redpanda expands the toolset for developers and crosses multiple computing paradigms, allowing us to expand what's possible in software development and operational workloads.

While we couldn't have imagined this when we started, we can't wait to hear what developers are going to build tomorrow. Take Redpanda for a test drive today, and introduce yourself to the Redpanda Community in Slack.