DEV Community

Franklyne Oluoch


Cross Account MSK Connectivity using AWS PrivateLink

A couple of months ago, I joined a call where someone presented me with an interesting problem. I work mostly in risk and compliance, which means I get exposed to a lot of different things. I love working with AWS services, so when I saw the architecture diagrams on this call, my eyes lit up: this was exactly what I needed. I quickly opened my bookmarked AWS Well-Architected Framework and was ready to roll.

The Problem.

The team was working on a PoC architecture built around Amazon Managed Streaming for Apache Kafka (Amazon MSK). The proposed solution required MSK clusters in three regions, with one region acting as the source for the other two. Sounds simple enough. I had not worked with MSK before, though, and I did not know whether it really was that simple. On top of that, I had to make sure the solution complied with domestic and international privacy laws and compliance frameworks such as GDPR and even FedRAMP. This was an interesting problem. So I ended the call and jumped on another one with a teammate who is a much more experienced security architect; she always has the answers. We brainstormed for a bit and came up with some really interesting ideas. I am sharing this in the hope that someone out there finds it useful someday.

What Exactly is AWS MSK?

“…Apache Kafka is an open-source, distributed event streaming platform commonly used for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. However, Apache Kafka clusters can be challenging to set up, scale, and manage in production. When you run Apache Kafka on your own, you need to provision servers, configure Apache Kafka manually, replace servers when they fail, orchestrate server patches and upgrades, architect the cluster for high availability, ensure data is durably stored and secured, set up monitoring and alarms, and carefully plan scaling events to support load changes…” That is how the AWS Big Data Blog frames the problem that Amazon MSK, a managed Kafka service, is meant to solve. In essence, MSK lets you ingest and process log and event streams, build real-time, centralized, and privately accessible data buses, and generally power event-driven systems.

Our Solution to the challenge.

In the end, we came across a blog post describing how Goldman Sachs had implemented MSK in their environment. Before that, I had thought that simply setting up VPC peering, plus the magic of Lambdas, would solve our problem. My colleague analyzed this approach, however, and concluded that it was bad practice here, since VPC peering is better suited to environments with a high degree of trust between the parties peering their VPCs.
This is mostly because, once a VPC peering connection is established, the peered networks have broad access to and trust in each other, with resources in either VPC able to initiate a connection (this is from the AWS Big Data Blog). This sounds bad, right? A potential recipe for GDPR violations, perhaps? With peering, we would be responsible for implementing fine-grained network access controls with security groups (SGs) so that only the specific resources intended to be reachable are accessible over the VPC peering connection. There were other considerations against peering as well, and we simply had to abandon this approach. Read the AWS blog post to find out more.
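To make the security group overhead a bit more concrete, here is a minimal Python sketch of the kind of rule we would have had to maintain for every client network under peering. It is my own illustration, not something from the AWS post: the helper name `build_broker_ingress` and the CIDR `10.20.0.0/24` are made up, and the code only builds the rule as a plain dictionary without calling AWS. The port numbers are the standard in-VPC MSK broker listener ports.

```python
import ipaddress

# Listener ports that Amazon MSK brokers expose inside the VPC.
MSK_BROKER_PORTS = {
    "plaintext": 9092,
    "tls": 9094,
    "sasl_scram": 9096,
    "iam": 9098,
}


def build_broker_ingress(client_cidr, auth_mode="tls"):
    """Build an ingress rule that lets one CIDR reach one broker port.

    Hypothetical helper: the returned list follows the IpPermissions
    shape used by EC2 security group ingress rules.
    """
    # Raises ValueError if the CIDR is malformed or has host bits set.
    ipaddress.ip_network(client_cidr)
    port = MSK_BROKER_PORTS[auth_mode]
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [
                {
                    "CidrIp": client_cidr,
                    "Description": f"Kafka {auth_mode} clients over peering",
                }
            ],
        }
    ]


# Assumed example: a single peered client subnet using TLS.
rule = build_broker_ingress("10.20.0.0/24")
print(rule)
```

If you wanted to apply it, a list like this could be passed as `IpPermissions` to the boto3 EC2 client's `authorize_security_group_ingress` call, together with the `GroupId` of the brokers' security group. Every new client network means another rule like this to review, which is exactly the overhead we wanted to avoid.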
To eliminate this overhead, the publication on how Goldman Sachs built cross-account connectivity to their Amazon MSK clusters with AWS PrivateLink came in handy. It was extremely useful and gave us an easier way to expose the MSK brokers in a compliant and secure manner. Since then, I have read quite a lot about MSK, and I feel like an MSK expert! I am kidding, I prefer the security side of things! That's it, that's the story, for now.
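For anyone curious what the PrivateLink side roughly looks like, here is a small Python sketch of the two requests involved: the account that owns the MSK cluster publishes a Network Load Balancer as an endpoint service, and the client account creates an interface endpoint pointing at it. Treat it as a sketch under assumptions, not the Goldman Sachs implementation: the function names, ARNs, and IDs are placeholders I made up, and the code only builds the request parameters without calling AWS.

```python
def endpoint_service_request(nlb_arn):
    """Provider account: publish the NLB in front of the brokers."""
    return {
        "NetworkLoadBalancerArns": [nlb_arn],
        # The provider must approve each consumer connection request.
        "AcceptanceRequired": True,
    }


def interface_endpoint_request(vpc_id, service_name, subnet_ids, sg_ids):
    """Consumer account: create an interface endpoint to that service."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(sg_ids),
    }


# Placeholder identifiers, for illustration only.
provider_req = endpoint_service_request(
    "arn:aws:elasticloadbalancing:eu-west-1:111111111111:"
    "loadbalancer/net/msk-nlb/0123456789abcdef"
)
consumer_req = interface_endpoint_request(
    vpc_id="vpc-0aaaaaaaaaaaaaaaa",
    service_name="com.amazonaws.vpce.eu-west-1.vpce-svc-0123456789abcdef0",
    subnet_ids=["subnet-0bbbbbbbbbbbbbbbb"],
    sg_ids=["sg-0cccccccccccccccc"],
)
print(provider_req)
print(consumer_req)
```

With boto3, these dictionaries map onto the EC2 client's `create_vpc_endpoint_service_configuration` and `create_vpc_endpoint` calls respectively. The part that mattered for us is the direction of trust: connections can only be initiated from the consumer side toward the published service, so the two networks never get the broad, two-way reachability that peering gives.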

