Hatem Hassan 👨‍💻☕️💻🌺😎

Posted on Oct 13, 2019 • Edited on Mar 24, 2022 • Originally published at iammowgoud.com

Designing Data-Intensive Applications - Book Review

#data #database #book #review

Originally posted on my personal blog

This book is definitely a classic. It touches on a wide array of topics related to handling data in a distributed environment, ranging from basic database theory, ACID, replication, and partitioning to more complex (and "modern") topics like stream and batch processing in the cloud.

"Data outlives code."

- Martin Kleppmann

Martin Kleppmann lays out what every engineer needs to know about designing systems that deal with any kind of data.

The first part is about basic database concepts:

Relational vs. NoSQL
Different query languages
How data is actually persisted on storage devices (B-Trees / LSM-Trees)
Encoding and serialization/deserialization

The second part goes deeper and discusses the following concepts (and their issues):

CAP Theorem
Replication
Partitioning
Transactions
Consistency

The final part discusses derived data and aims to tackle the issues raised in the previous part, while also introducing more "modern" concepts like:

Batch and Stream processing (MapReduce/Spark)
Eventual consistency and "Change Data Capture"

Although the book doesn't dive into deep technical or implementation details, it has a very good bibliography and footnotes that lead you to all of the academic papers you need. Overall, the book is an essential read for any software/data engineer in 2019.

