DEV Community

Neha Sharma
Neha Sharma

Posted on • Updated on

Introduction to Database

TOC

  • What is a database?
  • Features of Database
  • Why Database?
  • Different types of databases
  • Important concepts
    • ACID
    • Data partition
    • Data sharding
    • BASE

What is Database:

A database is a system to efficiently store, retrieve, and search the data. A database can be imagined as an organised way of storing the data of any application.

Features

  • Organized
  • Availability
  • easy to retrieve, search, read, and write data
  • Low Latency
  • Data recovery
  • ACID*
  • Flexible Schema*

*not all databases provide it.

Why Database

  • to organized the data
  • to store the data
  • to retrieve the data
  • to have a backup of data
  • low latency search

Popular database

  • RDBMS: MySQL, Oracle, etc.
  • NoSQL: MongoDB, DynamoDB

Type of database

  • Table
    • Table based database consists of the rows and columns.
    • Table based database has a primary key to link the data subsets or partition the database
    • Best fit for the vertical scaling
    • Schema-based
  • Document
    • The document-based database consists of the collection, documents, and fields
    • The data is stored as key-value pair in fields.
    • Best fit for the horizontal scaling
    • Schema-less
  • Graph
    • The graph-based database store the data as nodes and relations
    • It is schema-less
    • Easy to scale

RDBMS vs NoSQL

RDBMS NoSQL (Not Only SQL)
Example MySQL, Oracle, Maria, etc. mongo, couch DB, Dynamo DB
Design tables document, key-value, graphs
Scaling vertical horizontal
Ease of scaling hard easy
Flexible not so flexible very flexible
Schema need schema no need for schema
Primary key Yes no
DBA required only if, DB is big
Dynamic (doing any change at any time) no yes
ACID yes no
Sharding no yes
Partition yes no
complexity High Low
Language Query language MQL (mongo query language)
Integrated Caching Needs to implement separately inbuilt
Security Yes not as MySQL
Store procedures Yes No
Usecases Where the ACID properties are required eg: financial banks Where ACID is not the priority
Performance Faster at selecting the number of data Faster in inserting and updating the data. Writes are faster
When to use Best fit for data in suitable for table/rows Best fit for unstructured data
Best suitable for the small data set Best suitable for the large data set
- hight availability during unstable environments
Strong dependency on multi-row transactions -

Important concepts:

1) Transactions

For any database, we communicate with terms of ‘transactions’. How many transactions are happening?

eg: There are A and B accounts with Rs 500 and Rs. 1000. A will transfer the 200 to the B account and increase the balance of B 1200.

Here, we have 3 steps:

  1. Account A transfer 200 Rs
  2. Account B will receive the amount from A
  3. Account B's balance will increase

The above 3 steps are one transaction.

2) ACID

a. Atomicity

Atomicity means that all the transactions should be either 100% successful or fail. There should not be any partial state. Basically All or none.

eg: In the example of transactions, if 1 step will fail then the whole transaction should be failed. For atomicity, the whole transaction should be 100% done

b. Consistency

The database must be consistent before and after the transaction always. Concurrency should not impact the database integrity.

eg: If a user has deposited 100 in an account and it failed. The database should reflect the last value to keep the database consistency.

c. Isolation

Multiple transactions occur independently without interference. In the end, the database should have the data after all the transactions.

eg: Bank balance is 600. User A withdraws the 100 and user B is adding 50. For user A the balance would be 600 and after withdrawing it would be 500. For user B the balance is 500 and on that 50 will get added and the new balance would be 550. It would look like the transactions are running sequentially.

d. Durability

The changes made by a transaction should persist.

eg: A user has done a transaction of Rs 200. If there is an outage still the transaction success/failure should be persisted in the database. This can be done by the WALs (Write-Ahead-logs).

3) Sharding

Data Sharding is a way of distributing a single database across multiple databases which can be stored across multiple machines. It is a way of reducing the load to a single database.

Data sharding is a kind of ‘Horizontal scaling’. While doing sharding, the data is distributed into multiple machines known as ‘physical shards’. Each shard would be ‘autonomous’ - they don’t share any data or compute resources. Sharding can be implemented at the application level or a few databases supports it natively.

Advantages of sharding:

  • Fast search
  • Avoiding SPoF
  • Easy to upgrade horizontal

The disadvantages of sharding:

  • Unbalanced shard
  • Not natively supported by many databases
  • Shard architecture is hard
  • Unsharding is difficult

Techniques for Sharding:

  • Key-based
  • Range-based
  • Directory-based

4) Partition

The data partition is a way of distributing the data into multiple subsets of data under a single instance. MySQL supports the partition. There are different types of partition.

The way to link the different tables together is by the primary key.

Eg: There is data of users. Instead of storing whole data in one table. We can use data-partition to avoid making one table too big, avoid SPoF, and easy maintenance of the database. We can break it into multiple databases.

Benefits:

  • Query performance
  • Data availability
  • Performance
  • Data manageability
  • Improve security

Data Partition methods:

  • Horizontal partition (Database Sharding)
  • Vertical partition
  • Functional partition

Techniques:

  • Range-based
  • Hash-based
  • List-based

Difference between data-partition and data-sharding.

Data sharding and partition goal is the same to distribute the data. The data partition is the backbone of the distributed system architecture. Where Data-partition focuses on distributing the database into multiple data subsets under one single instance. Data sharding is about distributing the data into multiple machines.

5 BASE

BA: Basically Available

S: Soft State

E: Eventually consistent

Use Cases:

RDBMS

  • MySQL will be a great fit for financial transactions where ACID compliance is required
  • Data is structured
  • Horizontal scaling is not the requirement
  • Search is faster

NoSQL

  • Where ACID compliance is not the priority. Eg: blog application, e-commerce (product)
  • Data is unstructured. Eg: products
  • Read and Writes are higher. Eg: Blog where we will be creating blogs and reading them
  • Looking for horizontal scaling of Database.

A big thanks to folks who reviewed this and gave valuable feedback: Ahsan, Tauseef, Naman,Anand,Uddeshya

Liked the blog? Follow me on [Twitter],(https://twitter.com/hellonehha)

Oldest comments (0)