DEV Community

raviklog

Driven by Data! (Part1)

Everything we do (writing, learning, speaking, eating, playing, traveling, working, and so on) gets translated into data, and the volume keeps multiplying every single day. By one often-quoted estimate (around 2020), we generate close to 2.5 quintillion bytes of data online every day, and that number has very likely grown a lot since. If each byte were a penny, 2.5 quintillion pennies laid out flat would cover the whole Earth five times over. That's quite staggering!

1 quintillion = 1,000,000,000,000,000,000 (10^18)
2.5 quintillion bytes = 2.5 billion gigabytes (taking 1 GB = 10^9 bytes).

According to Netflix_Data, streaming standard definition (SD) video uses about 1 GB of data per hour, high definition (HD) uses 3 GB per hour, and 4K Ultra HD uses up to 7 GB per hour. From these figures you can work out how much streaming 2.5 quintillion bytes amounts to.
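
Here is that calculation as a small Python sketch. It assumes decimal units (1 GB = 10^9 bytes), a 365-day year, and treats the per-hour figures above as exact, so the results are only rough orders of magnitude.

```python
# Back-of-the-envelope: how much video streaming is 2.5 quintillion bytes?
# Assumes decimal units (1 GB = 10**9 bytes) and a 365-day year.

DAILY_BYTES = 2.5 * 10**18      # 2.5 quintillion bytes
BYTES_PER_GB = 10**9
HOURS_PER_YEAR = 24 * 365       # 8,760 hours

# Approximate per-hour usage figures quoted above (GB per hour)
GB_PER_HOUR = {"SD": 1, "HD": 3, "4K": 7}


def streaming_equivalent(total_bytes: float) -> dict[str, tuple[float, float]]:
    """Return {quality: (hours, years)} of streaming that total_bytes would cover."""
    total_gb = total_bytes / BYTES_PER_GB
    result = {}
    for quality, gb_per_hour in GB_PER_HOUR.items():
        hours = total_gb / gb_per_hour
        result[quality] = (hours, hours / HOURS_PER_YEAR)
    return result


if __name__ == "__main__":
    for quality, (hours, years) in streaming_equivalent(DAILY_BYTES).items():
        print(f"{quality}: {hours:,.0f} hours, about {years:,.0f} years of streaming")
```

Under those assumptions, one day's 2.5 quintillion bytes works out to about 2.5 billion hours of SD video, or roughly 285,000 years of non-stop streaming.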

Some stats online to check:

  • For more stats on online data usage, check out FinOnline.
  • For a wide variety of data captured across the world, check out OurWorldinData.

With this flood of data, we need to categorize it to find meaningful information and insights. Some common ways to categorize data are given below.

Structured vs Unstructured:
Structured data is organized in a predefined format, such as a table with rows and columns. Unstructured data does not have a fixed format and may include things like text documents, emails, and social media posts.
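
To make the difference concrete, here is a small Python sketch (the order data and email text are made-up examples). Structured CSV rows can be read directly by column name, while facts buried in free text have to be extracted first, here with a deliberately simple regular expression.

```python
import csv
import io
import re

# Structured: every record follows the same predefined columns,
# so fields can be read directly by name.
structured = io.StringIO(
    "order_id,customer,amount\n"
    "1001,Asha,250.00\n"
    "1002,Ben,99.50\n"
)
orders = list(csv.DictReader(structured))
total = sum(float(row["amount"]) for row in orders)
print(f"{len(orders)} orders, total amount {total:.2f}")

# Unstructured: free text has no fixed schema, so information must be
# extracted from it (here with a simple, illustrative regular expression).
email_text = (
    "Hi team, Asha paid 250.00 for order 1001 yesterday. "
    "Ben's order 1002 came to 99.50."
)
order_ids = re.findall(r"order (\d+)", email_text)
print("Order IDs mentioned in the email:", order_ids)
```

The regex only works because the text happens to say "order" before each ID; real unstructured data usually needs far more robust extraction.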

Quantitative vs Qualitative:
Quantitative data can be measured and expressed in numbers (e.g., age, temperature, price).
Qualitative data describes qualities or characteristics rather than measurements (e.g., color, customer feedback).
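
A quick way to see the difference is in what you can compute. In this Python sketch (the survey responses are hypothetical), averaging makes sense for the quantitative field, while the qualitative field is summarized by counting categories.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses: one numeric field, one descriptive field
responses = [
    {"age": 34, "favorite_color": "blue"},
    {"age": 27, "favorite_color": "green"},
    {"age": 41, "favorite_color": "blue"},
]

# Quantitative data: numbers, so arithmetic such as an average is meaningful
ages = [r["age"] for r in responses]
print("Average age:", mean(ages))

# Qualitative data: categories, so we count or group them instead of averaging
color_counts = Counter(r["favorite_color"] for r in responses)
print("Color counts:", color_counts.most_common())
```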

Continuous vs Discrete:
Continuous data can take any value within a range (e.g., a time measurement such as 12.37 seconds).
Discrete data can take only specific, separate values, usually counts (e.g., the number of children in a class).
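
The two kinds are usually summarized differently. In this Python sketch (the completion times and class sizes are made-up values, and the 5-second bin width is an arbitrary choice), continuous values are grouped into ranges, while discrete counts are tallied value by value.

```python
from collections import Counter

# Continuous data: measured values (e.g., task completion times in seconds)
# can fall anywhere in a range, so we usually summarize them in bins.
times = [12.4, 15.9, 13.1, 19.75, 14.0, 17.3]


def bin_label(value: float, width: float = 5.0) -> str:
    """Map a continuous value to the half-open interval [low, low + width)."""
    low = (value // width) * width
    return f"[{low:g}, {low + width:g})"


time_bins = Counter(bin_label(t) for t in times)
print("Completion times per 5-second bin:", dict(time_bins))

# Discrete data: counts (e.g., number of children per class) take only
# whole-number values, so each distinct value can be tallied directly.
children_per_class = [24, 25, 24, 30, 25, 24]
print("Classes per class size:", dict(Counter(children_per_class)))
```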

Primary vs Secondary:
Primary data is collected directly from the source for a specific purpose (such as data from your own experiments or surveys, which is original and first-hand).
Secondary data has already been collected by another person or organization and comes from external sources (such as Internet sources, books, articles, and blogs).

Public vs Private:
Public data is freely accessible to everyone (e.g., data published online, such as government data and social media posts).
Private data is secured and accessible only to a few authorized people, for security and confidentiality (e.g., an individual's financial records or trade secrets).
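
One simple way a system can keep private fields private is to decide what to return based on who is asking. This Python sketch uses a hypothetical customer record, a hypothetical set of public fields, and a plain `authorized` flag standing in for a real authentication and permission system.

```python
# Hypothetical split between fields anyone may see and fields that stay private
PUBLIC_FIELDS = {"name", "city"}


def view_record(record: dict, authorized: bool) -> dict:
    """Return the full record for authorized users, otherwise only public fields."""
    if authorized:
        return dict(record)
    return {k: v for k, v in record.items() if k in PUBLIC_FIELDS}


customer = {
    "name": "Asha",
    "city": "Chennai",
    "account_balance": 5200.75,   # private: financial record
    "tax_id": "XXXX-1234",        # private: identifying information
}
print(view_record(customer, authorized=False))
print(view_record(customer, authorized=True))
```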

Some other categories:

  1. Live/streaming data: data originating from IoT devices, video telecasts, etc.
  2. Offline/historical data: data from books, articles, and recordings collected over time.
  3. Data type: text, image, geometric, video, or audio data.
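
The practical difference between historical and streaming data is often in how it is processed. In this Python sketch, the historical readings are averaged in one batch, while the "live" readings come from a generator that merely simulates an IoT sensor (a hypothetical stand-in), so the average is updated incrementally as each reading arrives.

```python
from typing import Iterable, Iterator


def running_average(stream: Iterable[float]) -> Iterator[float]:
    """Yield the average so far each time a new reading arrives (streaming style)."""
    total = 0.0
    count = 0
    for reading in stream:
        total += reading
        count += 1
        yield total / count


# Historical (batch): the whole dataset is available up front
historical = [21.0, 22.5, 23.0, 22.0]
print("Batch average:", sum(historical) / len(historical))


# Live/streaming: readings arrive one at a time (simulated here by a generator),
# so results are updated incrementally instead of waiting for all the data.
def simulated_sensor() -> Iterator[float]:
    yield from [21.0, 22.5, 23.0, 22.0]


for avg in running_average(simulated_sensor()):
    print(f"Running average: {avg:.2f}")
```

Both approaches end with the same average here; the streaming version simply has a usable answer after every reading.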

Properties of Data: We need to organize this huge collection of data based on its properties so that it can be accessed, searched, and retrieved through defined systems that we now call "databases", "datastores", "data lakes", and other modern terms.

  1. Accuracy -> Free from errors and inconsistencies.
  2. Validity -> Meets the requirements of the system or application in which it is used.
  3. Timeliness -> Available when it is needed.
  4. Completeness -> Includes all the information required for a given purpose.
  5. Relevance -> Related to the problem or task at hand.
  6. Consistency -> Uniform, following the same rules and standards.
  7. Integrity -> Protected from unauthorized access or modification.
  8. Accessibility -> Can be retrieved or accessed by those who need it.
  9. Understandability -> Easily interpreted and understood by those who need to use it.
  10. Usability -> Can be used effectively for a given purpose.
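
Several of these properties can be checked automatically. The Python sketch below uses a hypothetical record schema, with rules and thresholds chosen only for illustration, to test records for completeness, validity, consistency, and timeliness. Properties such as accuracy, integrity, and accessibility usually need other mechanisms (verification against trusted sources, access control, and so on).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema: the fields every record is expected to have
REQUIRED_FIELDS = ("id", "email", "age", "updated_at")


def quality_issues(record: dict, now: datetime,
                   max_age: timedelta = timedelta(days=30)) -> list[str]:
    """Return the data-quality problems found in one record (empty list if none)."""
    issues = []

    # Completeness: every required field is present and non-empty
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        issues.append("incomplete: missing " + ", ".join(missing))

    # Validity: values meet the rules the application expects
    age = record.get("age")
    if age is not None and not (isinstance(age, int) and 0 <= age <= 130):
        issues.append(f"invalid: age {age!r} is out of range")
    email = record.get("email")
    if email and "@" not in email:  # deliberately simple check, not full email validation
        issues.append(f"invalid: email {email!r}")

    # Consistency: the same kind of value follows one convention (lower-case emails here)
    if email and email != email.lower():
        issues.append("inconsistent: email is not lower-case")

    # Timeliness: the record was updated recently enough to be trusted
    updated_at = record.get("updated_at")
    if updated_at is not None and now - updated_at > max_age:
        issues.append("stale: not updated within the allowed window")

    return issues


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
records = [
    {"id": 1, "email": "asha@example.com", "age": 34,
     "updated_at": datetime(2024, 1, 20, tzinfo=timezone.utc)},
    {"id": 2, "email": "Ben.example.com", "age": 250,
     "updated_at": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"id": 3, "email": "", "age": 28,
     "updated_at": datetime(2024, 1, 30, tzinfo=timezone.utc)},
]
for record in records:
    print(record["id"], quality_issues(record, now) or "OK")
```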

Major database systems are designed to help maintain these properties (and much more) for the benefit of their users.

What Is the Purpose of a Database?

When you need to store and manage large amounts of data: A database is a good way to store and organize large amounts of data so that it can be easily accessed, managed, and updated.
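
As a minimal illustration, here is the store/update/access cycle using Python's built-in `sqlite3` module. SQLite is just a convenient stand-in for "a database" here, and the `products` table is hypothetical.

```python
import sqlite3

# An in-memory SQLite database keeps the example self-contained;
# pass a file path instead of ":memory:" to persist the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Store: insert many rows in one call
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("keyboard", 45.0), ("mouse", 20.0), ("monitor", 180.0)],
)

# Update: change existing data in place
conn.execute("UPDATE products SET price = ? WHERE name = ?", (18.5, "mouse"))
conn.commit()

# Access: retrieve exactly the rows we need
for name, price in conn.execute(
    "SELECT name, price FROM products WHERE price < 50 ORDER BY price"
):
    print(name, price)
```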

When you need to support multiple users: A database can be used to store data that is shared among multiple users, allowing them to access and update the data concurrently.
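
A small taste of how a database coordinates users: in this Python `sqlite3` sketch, two connections to the same (temporary, hypothetical) database file stand in for two users. One user's change stays invisible to the other until it is committed. Client-server databases use more elaborate concurrency control, so treat this only as an illustration of the idea.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shop.db")

    # Two separate connections stand in for two users of the same database file
    user_a = sqlite3.connect(path)
    user_b = sqlite3.connect(path)

    user_a.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    user_a.commit()

    # User A starts a change (an implicit transaction) but has not committed it yet
    user_a.execute("INSERT INTO orders (item) VALUES ('laptop')")
    before = user_b.execute("SELECT COUNT(*) FROM orders").fetchall()[0][0]

    # Once A commits, B sees the new row
    user_a.commit()
    after = user_b.execute("SELECT COUNT(*) FROM orders").fetchall()[0][0]

    print("B sees before A commits:", before)  # the uncommitted row is not visible
    print("B sees after A commits:", after)

    user_a.close()
    user_b.close()
```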

When you need to store data for analysis: A database can be used to store data that needs to be analyzed or queried in various ways.
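
For example, the same stored rows can answer different questions just by changing the query. This Python `sqlite3` sketch uses a hypothetical `sales` table and summarizes it per region with SQL aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0), ("east", 200.0), ("south", 40.0)],
)

# Ask the data a new question without changing how it is stored
summary = conn.execute(
    """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue, AVG(amount) AS avg_order
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC
    """
).fetchall()

for region, orders, revenue, avg_order in summary:
    print(f"{region:<6} orders={orders} revenue={revenue:.2f} avg={avg_order:.2f}")
```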

When you need to store data that needs to be backed up: A database can be used to store data that needs to be backed up regularly to prevent data loss.
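
Many databases ship with backup tooling. As one example, Python's `sqlite3` module exposes SQLite's online backup API through `Connection.backup`. In this sketch both databases are in memory to keep it self-contained; in practice the target would be a file (for instance a hypothetical `backup.db`).

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
source.execute("INSERT INTO notes (body) VALUES ('remember to back up')")
source.commit()

# Copy the whole database into another connection; in practice the target
# would usually be a file, e.g. sqlite3.connect("backup.db") (hypothetical path).
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Simulate data loss in the original and confirm the backup still has the data
source.execute("DELETE FROM notes")
source.commit()
print(backup.execute("SELECT body FROM notes").fetchall())
```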

When you need to enforce data integrity: A database can be used to enforce data integrity rules, ensuring that the data is consistent and accurate.
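
Integrity rules are typically declared once in the schema and then enforced on every write. In this Python `sqlite3` sketch (the `accounts` table and its rules are hypothetical), `NOT NULL`, `UNIQUE`, and `CHECK` constraints reject bad rows with `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,              -- every account needs a distinct email
        balance REAL NOT NULL CHECK (balance >= 0) -- no negative balances
    )
    """
)
conn.execute("INSERT INTO accounts (email, balance) VALUES (?, ?)", ("asha@example.com", 100.0))

bad_rows = [
    ("asha@example.com", 50.0),  # duplicate email violates UNIQUE
    (None, 10.0),                # missing email violates NOT NULL
    ("ben@example.com", -5.0),   # negative balance violates CHECK
]
rejected = 0
for email, balance in bad_rows:
    try:
        conn.execute("INSERT INTO accounts (email, balance) VALUES (?, ?)", (email, balance))
    except sqlite3.IntegrityError as err:
        rejected += 1
        print("Rejected:", err)

print("Rows stored:", conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0])
```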

When you need to store data that needs to be secured: A database can be used to store data that needs to be protected from unauthorized access.

When you need to store data that needs to be accessed quickly: A database can be optimized to allow fast access to the data it stores.
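
Indexes are one of the most common of these optimizations. This Python `sqlite3` sketch (with a hypothetical `readings` table) asks SQLite for its query plan before and after creating an index on the filtered column: without the index the table is scanned, and with it SQLite can search the index for matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings (sensor_id, value) VALUES (?, ?)",
    [(i % 100, i * 0.5) for i in range(10_000)],
)

query = "SELECT * FROM readings WHERE sensor_id = ?"


def plan(sql: str) -> str:
    """Return SQLite's query-plan description (the last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return "; ".join(row[-1] for row in rows)


before = plan(query)
print("Without index:", before)  # expect a full table scan

# An index on the filtered column lets SQLite jump straight to matching rows
conn.execute("CREATE INDEX idx_readings_sensor ON readings (sensor_id)")
after = plan(query)
print("With index:", after)
```

The exact wording of the plan text varies between SQLite versions, so it is worth reading the output rather than relying on a specific string.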

That's a wrap on the data basics. In the next post, we will cover database concepts and the latest trends in detail. Please do share your suggestions and comments.
