DEV Community

Pragya Sapkota

Posted on Nov 16, 2022 • Originally published at pragyasapkota.Medium

Data Indexing, Replication, and Sharding: Basic Concepts

#database #indexing #replication #sharding

A database is a collection of information that is structured for easy access. It typically runs on a computer system and is managed by a database management system (DBMS). Let’s look at three database concepts here: indexing, replication, and sharding.

Indexing

A database can hold a large amount of data, up to millions of records or more. Without an index, retrieving a record means scanning the data row by row until a match is found, and the cost of that scan grows with the size of the table. On a large table, that would be an absolute nightmare. The way out of this complication is an index.

A database index is a data structure that speeds up retrieval of the information held in the database. It is built on one or more columns and kept up to date as rows are stored, so a lookup can go straight to the matching rows instead of checking every row. When the data is too large to search iteratively, we use database indexing. It is a core feature of relational databases and is offered by non-relational databases as well. The trade-off is that an index takes extra storage and adds some work to every write, in exchange for much faster lookups on the indexed columns.
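
To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `users`, the column names, and the index name `idx_users_email` are made up for illustration. It asks SQLite for its query plan before and after creating an index on the column being searched.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Hypothetical table with 10,000 rows, kept in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(10_000)],
)

lookup = "SELECT name FROM users WHERE email = ?"
args = ("user9999@example.com",)

# Without an index on email, SQLite has to scan the whole table.
plan_before = query_plan(conn, lookup, args)

# With an index on email, SQLite can search the index instead.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, args)

result = conn.execute(lookup, args).fetchone()

print("before:", plan_before)
print("after: ", plan_after)
print("row:   ", result)
```

The exact wording of the plan differs between SQLite versions, but the first plan should describe a scan of the table and the second a search that uses the new index. The query returns the same row either way; only the amount of work changes.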

Replication

Replication means making copies of something. In databases, the term comes up when we talk about scaling. We can duplicate our database so that if the main database is overloaded or crashes at some point, a replica takes over the load and we avoid a system failure. This adds redundancy, which helps keep the system highly available.

Replication can be done synchronously or asynchronously. With synchronous replication, the replica is updated in step with every change to the main database. With asynchronous replication, the replica is updated after the fact, for example at a time interval you choose for synchronizing the main database and the replica. In the synchronous case, one more thing to ensure is that if the write to the replica fails, the write to the main database fails too. This falls under atomicity, as discussed in the earlier article on relational databases.
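
The two modes can be sketched with a toy in-memory model. This is not how any particular database implements replication; the `Primary` and `Replica` classes and their method names are hypothetical, and a plain dictionary stands in for stored data.

```python
_MISSING = object()  # marks "key did not exist before this write"


class Replica:
    """A copy of the data that may become unreachable."""

    def __init__(self):
        self.data = {}
        self.healthy = True

    def apply(self, key, value):
        if not self.healthy:
            raise ConnectionError("replica unavailable")
        self.data[key] = value


class Primary:
    """The main database, with one replica attached."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica
        self.pending = []  # changes not yet sent to the replica

    def write_sync(self, key, value):
        # Synchronous: the write succeeds on both copies or on neither.
        previous = self.data.get(key, _MISSING)
        self.data[key] = value
        try:
            self.replica.apply(key, value)
        except ConnectionError:
            # Undo the local write so the two copies stay identical.
            if previous is _MISSING:
                del self.data[key]
            else:
                self.data[key] = previous
            raise

    def write_async(self, key, value):
        # Asynchronous: acknowledge right away, ship the change later.
        self.data[key] = value
        self.pending.append((key, value))

    def flush(self):
        # Would run on a schedule, e.g. at the chosen time interval.
        while self.pending:
            key, value = self.pending[0]
            self.replica.apply(key, value)
            self.pending.pop(0)


replica = Replica()
primary = Primary(replica)

primary.write_sync("a", 1)    # replica has "a" immediately
primary.write_async("b", 2)   # replica lags until flush() runs
lag = "b" not in replica.data
primary.flush()

print(replica.data, "lagged before flush:", lag)
```

The synchronous path pays for consistency with a slower write that can fail when the replica is down. The asynchronous path keeps writes fast but leaves a window in which the replica is behind the main database.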

However, replication runs into a limit when the data is very large. It makes the system more available, but every replica still holds the full data set, so on its own it does little to improve latency and throughput. To address this, we split the data into chunks, which leads us to sharding.

Sharding

Data sharding means breaking a huge database into smaller databases so that latency and throughput hold up even after the database is replicated. You can choose how you want your data to be split. There are two ways to shard your data: horizontal and vertical. In horizontal sharding, the rows of the same table are spread across multiple database nodes, whereas in vertical sharding, different tables or columns are stored in separate databases.
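
Both styles can be sketched in a few lines of Python. This is an illustration under assumptions, not a production scheme: the `user_id` key, the shard count of 4, and the hash-modulo routing rule are all choices made for the example, and real systems often use range-based or consistent-hashing schemes instead.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of database nodes


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a key to a shard number using a stable hash."""
    # hashlib is used because Python's built-in hash() for strings
    # changes between runs, which would break routing.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def horizontal_shard(rows, num_shards=NUM_SHARDS):
    """Split the rows of one table across several shards."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row["user_id"], num_shards)].append(row)
    return shards


def vertical_shard(rows, column_groups):
    """Split the columns of one table into separate stores.

    Each store keeps the key column so the pieces can be matched up again.
    """
    return {
        name: [{col: row[col] for col in ("user_id", *cols)} for row in rows]
        for name, cols in column_groups.items()
    }


# Hypothetical rows for the demo.
rows = [
    {"user_id": f"u{i}", "name": f"User {i}", "card": f"card-{i}"}
    for i in range(100)
]

by_row = horizontal_shard(rows)
by_column = vertical_shard(rows, {"profile": ["name"], "billing": ["card"]})

print([len(shard) for shard in by_row])
print(by_column["profile"][0], by_column["billing"][0])
```

In the horizontal case every shard has the same columns but only some of the rows. In the vertical case every store has all of the rows but only some of the columns.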

pragyaasapkota / System-Design-Concepts

Though the concepts of system design might be tricky, let's go through them one by one, down to their core, to get a better understanding.

System Design

System design defines system elements like modules, architecture, components, and their interfaces and data for a system based on the specified requirements.

This is an index for the concepts of the system.

S.N.	Table of Content
1.	Caching
2.	Network Protocols
3.	Storage: The Underrated Topic
4.	Latency and Throughput
5.	System Availability
6.	Leader Election
7.	Proxies
8.	Load Balancing
9.	Endpoint Protection
10.	HTTPS: Is it better than HTTP?
11.	Polling and Streaming
12.	Long Polling
13.	Hashing
14.	CAP Theorem
15.	PACELC Theorem
16.	Messaging and Pub-Sub
17.	Database: Relational Database; Non-relational Database; Data Indexing, Replication, and Sharding; Database Indexes; Database Federation; Database Replication
18.	Logging, Monitoring, and Alerting
19.	Distributed System
20.	Scaling
21.	Event Driven Architecture (EDA)
22.	CQRS
23.	Message Queue
24.	Architectural Patterns
25.	Enterprise Service Bus (ESB)
26.

…

I hope this article was helpful to you.

Please don’t forget to follow me!!!

Any kind of feedback or comment is welcome!!!

Thank you for your time and support!!!!

Keep Reading!! Keep Learning!!!

Top comments (2)

Prasad Saya • Nov 16 '22

Modern databases (for example, NoSQL databases) allow replication and sharding easily - can be configured on the basic installations.

Pragya Sapkota • Nov 16 '22

yes, exactly!!
