DEV Community

jzfrank

Big Data for Engineers - Chapter 1: Introduction

Magnitude is an interesting concept. We have a good intuition for measuring real-world objects:

  • Our height -> meters
  • Driving a car -> kilometers
  • Size of the Earth -> megameters
  • Distance from the Earth to the Sun -> gigameters
  • Distance from Jupiter to the Sun -> terameters
  • The entire solar system -> petameters

However, we seem to fail to appreciate the magnitude of data: we take 1 GB or 1 TB for granted, without thinking about how large those numbers really are.
In terms of bytes, approximately:

  • kB = 1,000 B
  • MB = 1,000,000 B
  • GB = 1,000,000,000 B
  • TB = 1,000,000,000,000 B
  • PB = 1,000,000,000,000,000 B
  • EB = 1,000,000,000,000,000,000 B
  • ZB = 1,000,000,000,000,000,000,000 B
  • YB = 1,000,000,000,000,000,000,000,000 B
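The table above can be turned into a tiny helper. This is a minimal sketch (not from the original post) that converts a raw byte count into the nearest decimal (SI) unit:

```python
# Decimal (powers-of-1000) units, matching the table above.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: int) -> str:
    """Format a byte count using decimal (SI) units."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop once the value fits under 1000, or we run out of units.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000

print(human_readable(1_500_000_000))  # 1.5 GB
```

Note this uses decimal prefixes (1 kB = 1,000 B), as in the table above, not binary prefixes (1 KiB = 1,024 B).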

kilobyte, megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte, yottabyte. Isn't that impressive?

To understand why learning big data matters, we discuss the three V's.

3 Vs of big data

Volume

The volume of data stored is increasing exponentially. In 2021, the total amount of data stored worldwide was around 100 ZB.

Variety

Data exists in various forms:

  • trees (JSON, XML)
  • unstructured (text, pictures, audio, video)
  • cubes (OLAP data cubes)
  • graphs (neo4j, Oracle PGX)
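To see why JSON is "tree-shaped", here is a small sketch with a hypothetical record (the field names are made up for illustration): nested objects and arrays form a tree rather than a flat table, and we navigate it from parent to child.

```python
import json

# A hypothetical nested record: objects and arrays nest to form a tree.
record = """
{
  "name": "Ada",
  "courses": [
    {"title": "Big Data", "credits": 8},
    {"title": "Databases", "credits": 7}
  ]
}
"""

doc = json.loads(record)           # parse the tree into nested dicts/lists
print(doc["courses"][0]["title"])  # walk root -> courses -> first child
```

A relational table would have to flatten or join this structure; the tree keeps the nesting explicit.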

Velocity

How fast can we process data? Three factors affect velocity: capacity, throughput, and latency.

From 1956 to 2021, storage capacity increased by a factor of about 200,000,000,000, throughput by a factor of about 10,000, and latency improved (decreased) by a factor of only about 150. In other words, throughput and latency cannot keep up with capacity.

However, with parallel or batch processing, we can narrow that gap.
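A back-of-the-envelope calculation makes the gap concrete. The disk capacity and read speed below are assumed illustrative numbers, not figures from the course: the time for a full scan is capacity divided by throughput, so as capacity outgrows throughput, a single-disk scan takes ever longer; reading many disks in parallel divides that time.

```python
# Assumed illustrative numbers for one modern disk.
capacity_tb = 16        # capacity, in TB
throughput_mb_s = 200   # sequential read speed, in MB/s

# Full-scan time for one disk: capacity / throughput.
scan_seconds = capacity_tb * 1_000_000 / throughput_mb_s
print(f"one disk:  {scan_seconds / 3600:.1f} hours")

# Reading 100 disks in parallel divides the scan time by 100.
disks = 100
print(f"{disks} disks: {scan_seconds / disks / 60:.1f} minutes")
```

This is exactly the motivation for the distributed, parallel systems covered in the rest of the course.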
