Introducing AutoMQ: a cloud-native replacement of Apache Kafka

Author: Xinyu Zhou, AutoMQ CTO

AutoMQ is a Kafka alternative designed with a cloud-first philosophy. It redesigns Apache Kafka's storage layer for the cloud, offloading durability to EBS and S3, which brings roughly a 10x cost reduction and a 100x improvement in elasticity while remaining 100% compatible with Kafka. It also delivers better performance than Apache Kafka: low latency, high throughput, and low cost, all in one easy-to-use package. The community edition of AutoMQ is source-available on GitHub, and you can deploy and test it for free today.

The Growing AutoMQ Community

AutoMQ GitHub Community
The AutoMQ community is a vibrant and diverse group of individuals and organizations committed to the growth and development of AutoMQ. As source-available software on GitHub, AutoMQ has amassed an impressive following: with 2,900+ stargazers and counting, the community's enthusiasm for the project is palpable.

Our community's diversity and engagement are testaments to the broad appeal and applicability of AutoMQ. We're excited to continue fostering this dynamic community, driving innovation, and shaping the future of "Data in Motion" together.

The Evolution of the Streaming World

Streaming World Evolution
The stream storage industry has undergone a significant transformation over the past decade, marked by technical evolution and the emergence of innovative solutions.

  1. Kafka is the Beginning: Apache Kafka, born a decade ago, marked the beginning of a new era in stream storage. Kafka integrated advanced technologies of its era, such as the append-only log and the zero-copy technique, which dramatically enhanced data writing efficiency and throughput.

  2. Commercial Leads Innovation: As the industry matured, commercial opportunities began to surface. Companies like Confluent and Redpanda emerged, driving technical innovation in the Kafka ecosystem. Confluent introduced significant architectural innovations, namely KRaft and Tiered Storage, which streamlined the architecture and substantially reduced storage costs. Redpanda rewrote Kafka in C++ and replaced ISR with the Raft replication protocol to achieve lower tail latency. Both are based on a shared-nothing replication architecture and have adopted tiered storage optimizations.

  3. Cloud Reshapes Architecture: The advent of cloud-native technologies has further reshaped the stream storage industry. WarpStream has rewritten Kafka in Go, with a storage layer built entirely on S3. It achieves a cloud-native elastic architecture by sacrificing latency and is compatible at the Kafka API protocol level. AutoMQ innovatively redesigns and implements the storage layer of Apache Kafka for the cloud. While remaining 100% compatible with Kafka, it achieves a 10x cost reduction and a 100x elasticity improvement by separating persistence to EBS and S3, without sacrificing latency or throughput.

Truly Cloud-Native Architecture of AutoMQ

AutoMQ's Cloud-Native Architecture
The cloud-native architecture of AutoMQ is a result of careful design decisions, innovative approaches, and the strategic use of cloud storage technologies. We aimed to create a system that could leverage the benefits of the cloud while overcoming the limitations of traditional stream storage solutions.

Decoupling Durability to Cloud Storage

The first step in realizing the cloud-native architecture of AutoMQ was to decouple durability to cloud storage. Unlike the typical decoupling of storage, which separates storage into a distributed, replicated storage system, decoupling durability takes things a step further. In the former case you are left with two kinds of clusters to manage, as in Apache Pulsar, where you must operate both the broker cluster and the BookKeeper cluster.

However, AutoMQ has taken a different route, opting to decouple durability to cloud storage, with S3 as the prime example. S3 already offers 99.999999999% durability, making it a reliable choice for this purpose. In the realm of cloud computing, merely decoupling storage is insufficient; durability must also be decoupled to cloud storage.

The essence of the Decoupling Durability architecture lies in its reliance on cloud storage for durability, eliminating the need for replication protocols such as Raft. This approach is gaining traction over the traditional Decoupling Storage architecture. Guided by this philosophy, we developed S3Stream, a stream storage library that combines the advantages of EBS and S3.

Stateless Broker with S3Stream

With S3Stream in place, we replaced the storage layer of the Apache Kafka broker, transforming it from a Shared-Nothing architecture to a Shared-Storage architecture, and in the process, making the Broker stateless. This is a significant shift, as it reduces the complexity of managing the system. In the AutoMQ architecture, the Broker is the only component. Once it becomes stateless, we can even deploy it using cost-effective Spot instances, further enhancing the cost-efficiency of the system.

Automate Everything for Elasticity

The final step in realizing the cloud-native architecture of AutoMQ was to automate everything to achieve an elastic architecture. Once AutoMQ became stateless, it was straightforward to automate various aspects, such as auto-scaling and auto-balancing of traffic.

We have two automated controllers that collect key metrics from the cluster. The auto-scaling controller monitors the load of the cluster and decides whether to scale in or scale out the cluster. The auto-balancing controller minimizes hot-spotting by dynamically reassigning partitions across the entire cluster. This level of automation is integral to the flexibility and scalability of AutoMQ, and it is also the inspiration behind its name.
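
To make the idea concrete, here is a minimal sketch of what such control loops could look like. It is purely illustrative: the interfaces, method names, and thresholds below are assumptions for the example, not AutoMQ's actual controller code.

    // Hypothetical sketch of auto-scaling and auto-balancing control loops.
    // Interfaces, names, and thresholds are illustrative, not AutoMQ's real code.
    import java.util.Collections;
    import java.util.Map;

    public final class ElasticityControllers {

        interface ClusterMetrics {                       // assumed metrics source
            double avgBrokerLoad();                      // normalized 0.0 - 1.0
            Map<Integer, Double> brokerLoads();          // brokerId -> load
        }

        interface ClusterActions {                       // assumed actuator
            void scaleOut(int brokers);
            void scaleIn(int brokers);
            void reassignPartitions(int fromBroker, int toBroker);
        }

        // Auto-scaling: grow or shrink the cluster based on overall load.
        static void autoScaleOnce(ClusterMetrics metrics, ClusterActions actions) {
            double load = metrics.avgBrokerLoad();
            if (load > 0.80) {
                actions.scaleOut(1);                     // stateless brokers join in seconds
            } else if (load < 0.30) {
                actions.scaleIn(1);                      // no data to drain before removal
            }
        }

        // Auto-balancing: move partitions from the hottest broker to the coldest one.
        static void autoBalanceOnce(ClusterMetrics metrics, ClusterActions actions) {
            Map<Integer, Double> loads = metrics.brokerLoads();
            int hottest = Collections.max(loads.entrySet(), Map.Entry.comparingByValue()).getKey();
            int coldest = Collections.min(loads.entrySet(), Map.Entry.comparingByValue()).getKey();
            if (loads.get(hottest) - loads.get(coldest) > 0.20) {
                actions.reassignPartitions(hottest, coldest);  // cheap: no log data is copied
            }
        }
    }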

Moving Toward Multi-Cloud Native Architecture

Multi-Cloud Native Architecture
As we move toward a multi-cloud native architecture, the need for a flexible and adaptable storage solution becomes critical. AutoMQ's shared storage design is an embodiment of this flexibility, designed to integrate seamlessly with a variety of cloud providers.

Shared Storage: WAL Meets Object Storage

At the heart of this design lies S3Stream, a shared stream storage library. It is essentially composed of a shared Write-Ahead Log (WAL) and shared object storage.

Data is first persistently written to the WAL and then uploaded to object storage in near real-time. The WAL does not provide data reading capabilities. Instead, it serves as a recovery mechanism in the event of a failure. Consumers read data directly from S3. To enhance performance, a memory cache is implemented for acceleration, which means that tailing-read consumers do not need to access object storage directly.
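
The following is a minimal sketch of that write path, assuming hypothetical WriteAheadLog, ObjectStorage, and RecordCache interfaces; it is not S3Stream's real API, only an illustration of the ordering: acknowledge once the WAL append is persisted, cache records for tailing reads, and upload to object storage in near real time.

    // Hypothetical sketch of the S3Stream write path; interface names are illustrative only.
    import java.util.concurrent.CompletableFuture;

    final class StreamWriterSketch {

        interface WriteAheadLog { CompletableFuture<Void> append(byte[] record); } // EBS, S3, or replicated
        interface ObjectStorage { void upload(String objectKey, byte[] data); }    // e.g. S3
        interface RecordCache   { void put(long offset, byte[] record); }          // serves tailing reads

        private final WriteAheadLog wal;
        private final ObjectStorage objectStorage;
        private final RecordCache cache;
        private long nextOffset = 0;

        StreamWriterSketch(WriteAheadLog wal, ObjectStorage objectStorage, RecordCache cache) {
            this.wal = wal;
            this.objectStorage = objectStorage;
            this.cache = cache;
        }

        CompletableFuture<Long> write(byte[] record) {
            long offset = nextOffset++;
            // 1. Durability comes from the WAL: the producer is acknowledged once the append is persisted.
            return wal.append(record).thenApply(ignored -> {
                // 2. Tailing-read consumers are served from the in-memory cache, never from S3.
                cache.put(offset, record);
                // 3. Data is uploaded to object storage in near real time (batched in a real system).
                objectStorage.upload("wal-segment-" + offset, record);
                return offset;
            });
        }
    }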

This architecture of S3Stream is highly flexible due to the variety of mediums that can be used for the WAL. For instance, EBS, Regional EBS, S3, or even a combination of these can be used to form a Replication WAL. This flexibility is primarily due to the varying capabilities of cloud storage offered by different cloud providers. The aim is to pursue an architecture that is optimal across multiple cloud providers.

Adapting Architecture to Different Cloud Providers

The architecture of AutoMQ's shared storage model is designed to be adaptable to the specific capabilities of different cloud providers. The choice of architecture depends primarily on the specific features and services offered by each cloud provider.

For instance, Azure, Google Cloud, and Alibaba Cloud all provide regional EBS. Given this feature, the best practice for these cloud providers is to use regional EBS as the WAL. This allows the system to tolerate zone failures, ensuring reliable and consistent performance.

In contrast, AWS does not offer regional EBS. However, AWS provides S3 Express One Zone, which boasts single-digit millisecond latency. Although this service is limited to a single availability zone, AutoMQ can still tolerate zone failures by using a replication WAL. In this setup, data is written to both an S3 Express One Zone bucket and an EBS volume.

In cases where you have access to a low-latency alternative to S3, or your business can tolerate hundreds of milliseconds of latency, it is possible to use S3 itself as the WAL. This means the entire architecture relies solely on S3 for both the WAL and data storage; in other words, AutoMQ can easily provide a WarpStream-like architecture as well.
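
To summarize the options above in code form, the sketch below models the WAL backend choice per provider. The enum values and selection logic are hypothetical and simplified; they only mirror the trade-offs just described.

    // Hypothetical illustration of choosing a WAL backend per cloud provider.
    // The enum values and the selection logic are simplified assumptions.
    enum WalBackend { REGIONAL_EBS, S3_EXPRESS_PLUS_EBS, S3_ONLY }

    final class WalSelector {
        static WalBackend choose(String cloudProvider, boolean latencyTolerant) {
            switch (cloudProvider) {
                case "azure":
                case "gcp":
                case "alibaba":
                    return WalBackend.REGIONAL_EBS;           // regional EBS already tolerates zone failure
                case "aws":
                    return latencyTolerant
                            ? WalBackend.S3_ONLY              // WarpStream-like, hundreds of ms of latency
                            : WalBackend.S3_EXPRESS_PLUS_EBS; // replicated WAL across S3 Express One Zone and EBS
                default:
                    return WalBackend.S3_ONLY;                // object storage as the lowest common denominator
            }
        }
    }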

By understanding and leveraging the unique features of each cloud provider, AutoMQ ensures optimal performance and reliability across a variety of cloud environments. This flexibility and adaptability are key to the success of a multi-cloud native architecture.

Performance Data and Benefits of AutoMQ

AutoMQ's performance
To fully appreciate the capabilities and advantages of AutoMQ, let's take a look at some key benchmark data and performance metrics.

The advantages of AutoMQ compared to Apache Kafka can be summarized as follows:

  • 10x more cost-effective than Apache Kafka: auto-scaling, support for Spot instances, and offloading storage to S3 all combine to make AutoMQ roughly 10x more cost-effective than Apache Kafka.

  • 👍 Easy to operate: No need to manage the cluster's capacity yourself. The stateless Broker can autoscale in seconds. Forget about data skew and hot/cold data contention; self-balancing fixes them all automatically.

  • 🚀 High performance: Single-digit millisecond latency and the same high throughput as Apache Kafka, with much better catch-up read performance.

  • 😄 Easy to migrate: 100% compatible with Apache Kafka, so you don't need to change anything you already have. Point your clients at the new bootstrap server endpoint and you're done.

10x Cost Effective

AutoMQ's innovative architecture brings unprecedented cost savings in the realm of data-intensive software. Its design focuses on optimizing both computational and storage resources, resulting in a cost advantage that's nearly tenfold compared to traditional solutions.

The first major advantage comes from the optimization of EC2 resources. By eliminating data replication, AutoMQ removes the need for extra resources to handle replication traffic. Coupled with the platform's elasticity, which dynamically adjusts the cluster size in response to workload, this results in a dramatic reduction of EC2 resources of up to 90%.

Furthermore, AutoMQ's stateless architecture allows the use of Spot instances. This strategy leads to a significant cost reduction, further enhancing computational resource savings.

On the storage front, AutoMQ also shines. Instead of the traditional three-replica EBS storage, it uses a single-replica object storage model. This approach reduces storage costs by as much as 90%.
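
A rough back-of-the-envelope calculation shows where the saving comes from (approximate AWS list prices; actual prices vary by region and over time):

    3 replicas on gp3 EBS ≈ 3 × $0.08/GB-month = $0.24/GB-month
    1 copy on S3 Standard ≈ $0.023/GB-month
    reduction ≈ 1 - 0.023 / 0.24 ≈ 90%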

Our detailed cost comparison chart, based on real bill comparisons from stress testing on AWS, illustrates these savings. For more in-depth information, we invite you to access the complete report from our website.

Instant Elastic Efficiency

AutoMQ's shared storage architecture greatly enhances operational efficiency. For example, reassigning partitions in AutoMQ no longer involves data replication and can be completed within seconds, unlike in Kafka where it could take up to several hours. Additionally, when it comes to cluster scaling, AutoMQ can balance the traffic of new nodes with the cluster in just about one minute by reassigning partitions in batches. In contrast, this process could take days with Kafka.
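
Because AutoMQ keeps the Kafka protocol, a reassignment is still triggered with the ordinary Apache Kafka AdminClient; only the completion time differs. Below is a minimal example using the upstream client (the endpoint, topic, and broker id are placeholders):

    import java.util.List;
    import java.util.Map;
    import java.util.Optional;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewPartitionReassignment;
    import org.apache.kafka.common.TopicPartition;

    public class ReassignExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "automq-broker:9092"); // placeholder endpoint

            try (Admin admin = Admin.create(props)) {
                // Move partition 0 of "orders" to broker 2. With AutoMQ no log data is copied,
                // so the reassignment completes in seconds rather than hours.
                TopicPartition tp = new TopicPartition("orders", 0);
                admin.alterPartitionReassignments(
                        Map.of(tp, Optional.of(new NewPartitionReassignment(List.of(2))))
                ).all().get();
            }
        }
    }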

100% Compatibility

Perhaps one of the most important aspects of AutoMQ is its compatibility. We've replaced Kafka's storage layer with S3Stream while keeping all the code from the computation layer. This ensures that AutoMQ is fully compatible with Kafka's protocols and features. For instance, newer Apache Kafka features such as compacted topics, the idempotent producer, and transactional messages are fully supported by AutoMQ.
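
For example, an existing producer built on the standard Apache Kafka Java client keeps working unchanged; the only migration step is pointing bootstrap.servers at an AutoMQ endpoint (the address and topic below are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // The only change when migrating: point at the AutoMQ bootstrap endpoint.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "automq-broker:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true"); // idempotent producer works as usual

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("orders", "key", "value"));
                producer.flush();
            }
        }
    }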

Furthermore, we swap in the new storage layer through a very small integration surface around Kafka's LogSegment. This approach makes it easy to stay in sync with upstream Kafka, meaning we can readily merge new Apache Kafka features in the future. This is a significant advantage over solutions like WarpStream, where such compatibility and future-proofing can be a challenge.

In summary, AutoMQ's flexible architecture, cost savings, operational efficiency, and compatibility make it a powerful solution for stream storage in the cloud.

Roadmap: Streaming Data to the Data Lake

AutoMQ's roadmap
In this final section, we outline our vision for the future of streaming data into data lakes, a critical aspect of our roadmap.

The Shift Toward Shared Data

We're witnessing a trend where all data-intensive software eventually stores data on object storage to leverage the benefits of shared storage. However, even with all data stored on object storage, there isn't a straightforward way to share data between different systems. This process typically requires Extract, Transform, Load (ETL) operations and data format conversions.

We believe the transition from shared storage to shared data will be the next critical evolution in modern data technology. Table storage solutions like Delta Lake and Iceberg have unified the data format in the data lake, making this transition feasible.

From Stream to Lake: A Data Journey

In the future, we envision data usage to be a seamless, interconnected process that maximizes data utility and operational efficiency.

The journey begins with data generation. As data is produced in a streaming manner, it is immediately stored in stream storage. This continuous flow of information forms the foundation of our data landscape.

Next, we unlock the real-time value of this data. Tools like Flink Jobs, Spark Jobs, or Kafka consumers dive into the data stream, extracting valuable insights on the fly through the Stream API. This step is crucial in keeping pace with the dynamic nature of the data.

As the data matures and loses its freshness, the built-in Compactor in AutoMQ steps in. Quietly and transparently, it transforms the data into the Iceberg table format. This conversion process ensures the data remains accessible and usable even after it has passed its real-time relevance.

Finally, we arrive at the stage of large-scale analysis. The entire big data technology stack can now access the converted data using a zero-ETL approach, which eliminates the need for additional data processing and allows for direct, efficient analysis.
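
As a sketch of what zero-ETL consumption could look like once the Compactor has produced Iceberg tables, any Iceberg-aware engine can query the data in place. The example below uses Spark SQL from Java; the catalog, database, and table names are hypothetical:

    import org.apache.spark.sql.SparkSession;

    public class LakeQueryExample {
        public static void main(String[] args) {
            // Assumes an Iceberg catalog named "lake" is already configured for this Spark session
            // (e.g. via spark-submit configuration).
            SparkSession spark = SparkSession.builder()
                    .appName("zero-etl-query")
                    .getOrCreate();

            // Query the stream data that the Compactor materialized as an Iceberg table,
            // with no separate ETL job in between.
            spark.sql("SELECT user_id, COUNT(*) FROM lake.events.orders GROUP BY user_id")
                 .show();
        }
    }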

In conclusion, as we continue to innovate and evolve, our goal remains the same: to provide a powerful, efficient, and cost-effective solution for stream storage in the cloud. By streamlining the process of streaming data to data lakes, we aim to further enhance the value and utility of big data for businesses.

Embracing the Future with AutoMQ

AutoMQ, our cloud-native solution, is more than an alternative to existing technologies—it's a leap forward in the realm of data-intensive software. It promises cost savings, operational efficiency, and seamless compatibility.

We envision a future where data effortlessly streams into data lakes, unlocking the potential of real-time generative AI. This approach will enhance the utility of big data, leading to more comprehensive analyses and insights.

Finally, we invite you to join us on this journey and contribute to the evolution of AutoMQ. Visit our website to access the GitHub repository and join our Slack group for communication: https://www.automq.com/. Let's shape the future of data together with AutoMQ.

References

Here are some useful links to deepen your understanding of AutoMQ. Feel free to reach out if you have any queries.

  1. AutoMQ Website: https://www.automq.com/
  2. AutoMQ Repository: https://github.com/AutoMQ/automq
  3. AutoMQ Architecture Overview: https://docs.automq.com/automq/architecture/overview
  4. AutoMQ S3Stream Overview: https://docs.automq.com/automq/architecture/s3stream-shared-streaming-storage/overview
  5. AutoMQ Technical Advantages: https://docs.automq.com/automq/architecture/technical-advantage/overview
  6. The Difference between AutoMQ and Kafka: https://docs.automq.com/automq/what-is-automq/difference-with-apache-kafka
  7. The Difference between AutoMQ and WarpStream: https://docs.automq.com/automq/what-is-automq/difference-with-warpstream
  8. The Difference between AutoMQ and Tiered Storage: https://docs.automq.com/automq/what-is-automq/difference-with-tiered-storage
  9. AutoMQ Customers: https://www.automq.com/customer

Top comments (4)

夏旭晨

Amazing, that's all I can say.

陈一飞

Great! Cloud-first, cost-effective, auto-scaling, auto-balancing, 100% compatible. That's the ideal Kafka replacement I need.

Kris20030907

👍👍

wanshao

The streaming field has been really hot in recent years.
