Hargunbeer Singh

Posted on Sep 8, 2021

Data Structures for Dummies

#datascience #algorithms

Introduction

A data structure is a way how data is stored in the memory, it is is really important to store your data in the correct data structure for optimal algorithmic performance. In technical words, a data structure is a data organization, management and storage format that enables efficient access and modification. It is an algebraic structure about data. Using a data structure, makes it easy to retrieve and read data from the memory. They are generally based on the ability of a computer to fetch and store data at any place in its memory. A simple example of a data structure is an array, it is used to store a series of values in memory.

Types of Data Structure

Choosing the right data structure for performing a particular task is a choice that pays you off in the future. The use of right data structures make a computing task a lot more efficient and easy to code. There are a lot of data structures used for different types of tasks, some of the popular data structures are described below.

Array

Arrays are a series of values stored in memory. One-dimensional arrays are the arrays which just have the values stored in a series in them and do not have other nested arrays inside them. One-dimensional arrays are also referred to as vectors. We can point to a value in an array using index, most programming languages start indexes with 0, for example we can reference the first value in an array called hobbies by hobbies[0]. An array is stored in memory in the following way: its consequent values are stored in the memory as corresponding data types, like an array containing 4 values would store the values as their corresponding data types in memory from lets say memory location 1000 to 1003, although they are stored as common data but still the compiler knows that all the values are a part of a single array. An array of values is fixed and cannot be enlarged to add more values. Arrays also need to be stored in order in memory which makes it hard to add a new value(*do not relate to pushing a value into an array)

Strings

A string datatype is also stored as an array in memory, lets suppose the string Hello World, the compiler would store the string by splitting it into individual characters, so H would be stored in memory location 1ooo and the space would be stored in memory location 1005 and so on, the compiler denotes the ending of the string by a null character, giving the compiler a signal that it needs to stop reading the data from the memory, and when the compiler fetches the data from the memory, it merges the individual characters and forms a string.

Matrix

We can use arrays to make one-dimensional lists but sometimes we need multi-dimensional data structures to store data like the value of each pixel on a screen or the data stored in a spreadsheet, a multi-dimensional data structure is called a matrix. It is an array of arrays, so it contains a series of arrays which further contain a series of values. For example: A 3 by 3 matrix would be an array of 3 sub-arrays which are an array of 3 values each. They are stored in memory similar to how arrays are stored in memory. The values of sub-arrays of a matrix are stored as their corresponding data types in the memory and when the compiler reads them, it knows that they are the sub-arrays of a matrix. Matrices are not only limited to two or three dimensions, they can be of any dimensions, so a five dimensional matrix would have 5 layers of arrays, so you can point to a value in a five dimensional array like this hobbies[3][5[8][4][5], higher level matrices are used to store relatively complex and more nested data sets.

Struct

It is a great programming practice to store a set of related variables together, like you should store the data about an employee, like his age, salary and designation in a struct as all these variables are related, objects(as popularly called in programming languages) are a higher level implementation of structs. For example: you can retrieve the age of an employee from the memory from the employee struct by employee.age. We can even make arrays of structs that we define, and structs are automatically bundled up in memory. So in an array of structs, we can retrieve a struct by pointing to its index in the array and then to retrieve a value from the specific struct we would point to the variable name. Structs can be used to make more complex data structures which avoid the problems with arrays, i.e. new values cannot be added in the middle of an array because they are stored in order in memory. These structures are called Nodes

Node and Linked List

It is a type of struct that stores both the variable and a pointer in itself. A pointer is a special variable that points to a location in memory. Using node, we can create a linked list, which is a complex data structure which is flexible and stores many nodes. A linked list stores multiple nodes by pointing each node to the next node in the linked list. A linked list eliminates the problems of an array as it is very flexible in storing nodes, the nodes do not need to be stored in a particular order and two nodes can have other data between them. The pointer in each node comes into play in joining the linked list, so suppose we have a linked list with three nodes, the first node would point to the second node's location in memory and the second node will point to the third node's location in the memory, if the third node points back to the first node's location in memory, that means that the particular linked list is a circular linked list.

If the value of the pointer of the last node is null, then the linked list ends there. We can insert new nodes in the middle of the linked list, unlike arrays, as we just need to change the pointer value of the previous node while adding another node. Developers do not need to worry about the pointer values, that is work for the compiler and the developer can just visualize linked list by a layer of abstraction. Linked lists, due to their flexibility, can also be easily re-ordered, trimmed, split, reversed, etc. Linked lists are really useful for algorithms like sorting algorithms. Because of the flexibility that Linked Lists offer, many more data structures are build on top of linked lasts, the most famous ones are queues and stacks.

Queues

A queue is a data structure that follows a first-in first-out(FIFO) approach. As its name states, FIFO is an approach where the first arriver is served first. It is built upon linked lists. Suppose in a linked list, we have a pointer that points to a bank queue visualized like this:

So the bank queue is a queue data structure where Steve points to Michael, Michael points to Anna and Anna points to Della, Della is the last node in the list as its pointer value is null. So after we have served Steve, we would read Steve's next pointer and dequeue Steve from the queue. To add a new node to the linked list, we would have to traverse down the linked list until we reach the node with pointer value null, and then we add the new node to the end of the linked list and update the pointer values of the former last node of the linked list. Adding a node to the queue is also called enqueueing.

Stacks

Stacks are data structures build upon linked lists. They follow a Last-in First-out(LIFO) approach. As its name states, LIFO is an approach where the last arriver is served the first, and the first arriver is served the last. An analogy to explain that can be a pile of papers, so the paper that has been piled the last would be picked up the first, and the paper that was piled up the first would be picked up the last. Adding nodes to the stack is called pushing data to the stack. Removing data from the stack is called popping data from the stack.

Trees

Trees are data structures built on linked lists; a tree data structure is just a node with multiple pointers. This data structure is used in the implementation of a lot of algorithms. Programmers generally do not consider looking at and evaluating the list of pointers and rather use a layer of abstraction to conceptualize trees like this:

The top most node of the tree is called the "Root". Children nodes are the nodes a particular node points to. A parent node is a node which points to other nodes(children nodes). The nodes which do not have any children nodes in a tree data structure, they are called the leaf nodes. A tree data structure whose nodes can have maximum two children nodes are called binary trees. Their is only one way roots are connected to leaves and there cannot be a tree in which the roots link to leaves and the leaves link to roots.

Graph

Graphs are used to store data that links arbitrarily. This type of data usually uses loops. A branch of mathematics - Graph Theory studies the graph data in detail. A graph is a data structure in which nodes are pointed to each other. In graph data structure too, the programmer doesn't need to care about the pointer values of various nodes; the programmer can usually visualize the graph with a level of abstraction as:

The above image is a real life example of graph data, it represents all the paths a soldier can take to reach a particular place(node). Graph theory studies the paths(node pointers) and finds statistical data from the graph to find efficient techniques to perform various functions. Dijkstra's algorithm is a vital part of graph theory and it is an algorithm to find the shortest paths between nodes in a graph, in this case the shortest distance to reach a particular place. Graphs, other than computer science, are also used in real world scenarios, like finding the shortest distance between two cities or finding the shortest path to attack an enemy trench in a war.

Others

These are the most popular common data structures, on top of these data structures, programmers have build more data structures with specific functionalities like red-black trees and heaps.

Conclusion

You must use data structures for writing efficient code and algorithms. Programmers do not need to implement these data structures from scratch as most programming languages have default packages which have implementations for common data structures.

Top comments (1)

mzhasan2016 • Sep 7 '22

Really excellent article.

DEV Community