Zach A. Thomas for AWS Community Builders

First Look at Amazon DataZone

AWS has released a public preview of an impressive suite of capabilities called Amazon DataZone.

DataZone targets a thorny problem in tapping the potential of business data: data is often trapped within the corporate silo where it was created (Sales, Operations, Marketing, and so on). Even when we succeed in breaking through the silo to get access to some other group's data, we face a host of additional problems: cleaning the data, normalizing data definitions, meeting compliance requirements (encryption, the handling of personally identifying information, retention policy, etc.), and making the data easy to use.

Businesses have tried to solve this by centralizing data access in a new group, sometimes called Data Science. This group is in charge of the data lake and of cleaning and describing all the data sources. This is almost a good idea, but it creates an unfortunate bottleneck: the agility of the business is hampered by the need to get each data source tidied and blessed by an overworked group of people who are not the domain experts in any of the data sources they're made responsible for. The Data Science team is stretched thin, and its capacity doesn't scale with the appetite for business data.

A new, decentralized approach to these problems was described by Zhamak Dehghani from ThoughtWorks. It's called Data Mesh, and it's based on the idea that each data source should be treated as a data product, and the owners of the data source are also the owners of the data product. You can think of a data source as a kind of API, and many groups can contribute to the shared catalog, including providing the documentation and access rules. The final piece of the Data Mesh puzzle is that governance should be federated rather than centralized.
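To make the "data source as a kind of API" idea concrete, here is a minimal sketch of a shared catalog in which each owning team publishes its own data product, writes its documentation, and decides who gets access. Everything in it (`DataProduct`, `Catalog`, the team and product names) is hypothetical and made up for illustration; it is not the DataZone API and not Dehghani's reference design.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A data source published as a product by the team that owns it (hypothetical model)."""
    name: str
    owner_team: str    # the producing team, not a central group
    description: str   # documentation written by the domain experts
    allowed_teams: set[str] = field(default_factory=set)  # access rules kept by the owner


class Catalog:
    """A shared catalog that many teams contribute to (hypothetical model)."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Any team can add its own product to the shared catalog.
        self._products[product.name] = product

    def search(self, term: str) -> list[str]:
        # Case-insensitive match on product name or description.
        term = term.lower()
        return sorted(
            p.name
            for p in self._products.values()
            if term in p.name.lower() or term in p.description.lower()
        )

    def grant_access(self, product_name: str, acting_team: str, consumer_team: str) -> None:
        # Federated governance in miniature: only the owning team may grant access.
        product = self._products[product_name]
        if acting_team != product.owner_team:
            raise PermissionError(f"{acting_team} does not own {product_name}")
        product.allowed_teams.add(consumer_team)

    def can_read(self, product_name: str, team: str) -> bool:
        product = self._products[product_name]
        return team == product.owner_team or team in product.allowed_teams


# Example usage with made-up teams and products.
catalog = Catalog()
catalog.publish(DataProduct("orders", "Sales", "Confirmed customer orders, one row per order"))
catalog.publish(DataProduct("campaigns", "Marketing", "Campaign spend and reach by channel"))
catalog.grant_access("orders", acting_team="Sales", consumer_team="Marketing")
```

The point of the sketch is the division of responsibility: the catalog is shared, but the documentation and the access decision stay with the team that knows the data.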

Amazon appears to be taking this approach seriously. DataZone is a system of systems that connects producers of data with consumers through a hub that is self-service for both kinds of participants. From their product page, the key features are:

  • catalog: Search for published data, request access, and start working with your data in days instead of weeks.
  • projects: Collaborate with teams through data assets, and manage and monitor data assets across projects.
  • portal: Access analytics with a personalized view for data assets through a web-based application or API.

You still need the data specialists, but their role shifts to equipping disparate groups with these new capabilities so that those groups can begin to be autonomous. That is consistent with the concept of a platform team from Team Topologies.

I look forward to getting my hands dirty with Amazon's new offering. I don't know how good their implementation is yet, but I have become convinced that this architecture is a source of significant competitive advantage.

(image credit Rhk111, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons)
