Introduction to Database

Table
- Table based database consists of the rows and columns.
- Table based database has a primary key to link the data subsets or partition the database
- Best fit for the vertical scaling
- Schema-based
Document
- The document-based database consists of the collection, documents, and fields
- The data is stored as key-value pair in fields.
- Best fit for the horizontal scaling
- Schema-less
Graph
- The graph-based database store the data as nodes and relations
- It is schema-less
- Easy to scale

RDBMS vs NoSQL

	RDBMS	NoSQL (Not Only SQL)
Example	MySQL, Oracle, Maria, etc.	mongo, couch DB, Dynamo DB
Design	tables	document, key-value, graphs
Scaling	vertical	horizontal
Ease of scaling	hard	easy
Flexible	not so flexible	very flexible
Schema	need schema	no need for schema
Primary key	Yes	no
DBA	required	only if, DB is big
Dynamic (doing any change at any time)	no	yes
ACID	yes	no
Sharding	no	yes
Partition	yes	no
complexity	High	Low
Language	Query language	MQL (mongo query language)
Integrated Caching	Needs to implement separately	inbuilt
Security	Yes	not as MySQL
Store procedures	Yes	No
Usecases	Where the ACID properties are required eg: financial banks	Where ACID is not the priority
Performance	Faster at selecting the number of data	Faster in inserting and updating the data. Writes are faster
When to use	Best fit for data in suitable for table/rows	Best fit for unstructured data
	Best suitable for the small data set	Best suitable for the large data set
	-	hight availability during unstable environments
	Strong dependency on multi-row transactions	-

Important concepts:

1) Transactions

For any database, we communicate with terms of ‘transactions’. How many transactions are happening?

eg: There are A and B accounts with Rs 500 and Rs. 1000. A will transfer the 200 to the B account and increase the balance of B 1200.

Here, we have 3 steps:

Account A transfer 200 Rs
Account B will receive the amount from A
Account B's balance will increase

The above 3 steps are one transaction.

2) ACID

a. Atomicity

Atomicity means that all the transactions should be either 100% successful or fail. There should not be any partial state. Basically All or none.

eg: In the example of transactions, if 1 step will fail then the whole transaction should be failed. For atomicity, the whole transaction should be 100% done

b. Consistency

The database must be consistent before and after the transaction always. Concurrency should not impact the database integrity.

eg: If a user has deposited 100 in an account and it failed. The database should reflect the last value to keep the database consistency.

c. Isolation

Multiple transactions occur independently without interference. In the end, the database should have the data after all the transactions.

eg: Bank balance is 600. User A withdraws the 100 and user B is adding 50. For user A the balance would be 600 and after withdrawing it would be 500. For user B the balance is 500 and on that 50 will get added and the new balance would be 550. It would look like the transactions are running sequentially.

d. Durability

The changes made by a transaction should persist.

eg: A user has done a transaction of Rs 200. If there is an outage still the transaction success/failure should be persisted in the database. This can be done by the WALs (Write-Ahead-logs).

3) Sharding

Data Sharding is a way of distributing a single database across multiple databases which can be stored across multiple machines. It is a way of reducing the load to a single database.

Data sharding is a kind of ‘Horizontal scaling’. While doing sharding, the data is distributed into multiple machines known as ‘physical shards’. Each shard would be ‘autonomous’ - they don’t share any data or compute resources. Sharding can be implemented at the application level or a few databases supports it natively.

Advantages of sharding:

Fast search
Avoiding SPoF
Easy to upgrade horizontal

The disadvantages of sharding:

Unbalanced shard
Not natively supported by many databases
Shard architecture is hard
Unsharding is difficult

Techniques for Sharding:

Key-based
Range-based
Directory-based

4) Partition

The data partition is a way of distributing the data into multiple subsets of data under a single instance. MySQL supports the partition. There are different types of partition.

The way to link the different tables together is by the primary key.

Eg: There is data of users. Instead of storing whole data in one table. We can use data-partition to avoid making one table too big, avoid SPoF, and easy maintenance of the database. We can break it into multiple databases.

Benefits:

Query performance
Data availability
Performance
Data manageability
Improve security

Data Partition methods:

Horizontal partition (Database Sharding)
Vertical partition
Functional partition

Techniques:

Range-based
Hash-based
List-based

Difference between data-partition and data-sharding.

Data sharding and partition goal is the same to distribute the data. The data partition is the backbone of the distributed system architecture. Where Data-partition focuses on distributing the database into multiple data subsets under one single instance. Data sharding is about distributing the data into multiple machines.

5 BASE

BA: Basically Available

S: Soft State

E: Eventually consistent

Use Cases:

RDBMS

MySQL will be a great fit for financial transactions where ACID compliance is required
Data is structured
Horizontal scaling is not the requirement
Search is faster

NoSQL

Where ACID compliance is not the priority. Eg: blog application, e-commerce (product)
Data is unstructured. Eg: products
Read and Writes are higher. Eg: Blog where we will be creating blogs and reading them
Looking for horizontal scaling of Database.

A big thanks to folks who reviewed this and gave valuable feedback: Ahsan, Tauseef, Naman,Anand,Uddeshya

Liked the blog? Follow me on [Twitter],(https://twitter.com/hellonehha)