DEV Community

Harsh Joshi
Harsh Joshi

Posted on

Data Structures - A complete Reference

CHAPTER 1: AN INTRODUCTION

One of the most fundamental concepts in programming and computer science is the concept of data structures. Data structures refer to the way of organizing data for efficient usage. The data stored on your hard drive, the data on the RAM, data on pen drives, CD ROMs and all other places you can think of, the key underlying concept is of data structures that powers these devices to work efficiently.

NOTE: Data structures not just describe the storage of data piled into a memory unit, rather describe the data management alongside storing the data.

HOTS: It is easier to search a word from a dictionary because the words in the dictionary have been organized for that purpose. This is one of the many uses of data structures.

WHY SO MANY OF DATA STRUCTURES FOR A SIMPLE PURPOSE?:

Different kind of data exists and a different set of organization techniques and tools are adhered to while storing data and hence there required is a need for different types of data structures.

e.g. A key-value pair data cannot be simply stored in a simple static array.

Other factors which govern the choice of data structures is memory management, system utilization, usage, etc.

HOLD ON!! IT’S PERHAPS NOT WHAT YOU THOUGHT:

It was earlier mentioned that the data stored on devices like hard drives and pen drives is powered with the data structures. This statement has high chances of getting misinterpreted because data structures are not physical structures rather logical ones.

“A physical data structure refers to the actual organization of data on a storage device. The logical data structure refers to how the information appears to a program or user. For example, a data file is a collection of information stored together. This is its logical structure”

**DEFINITION: **A data structure is a format for data organization, management, and storage which enables efficient access and updating.

WAYS TO LOOK AT A DATA STRUCTURE:

Whatever is being talked about in this section does not specify the types of data structures rather it talks about the two perspectives data structures should be looked at with.

  1. LOGICAL VIEW
  2. IMPLEMENTATIONAL VIEW

Two eyes, Two perspectives! It’s one for each. Let’s dive into details:


The logical view of data structures defines the associated operations that would be carried out.

So, this perspective of data structure does not do anything, it only defines stuff and hence is never used in the actual implementation but it is handy because it is the backbone of the actual implementation.

Wohaa! That was a long sentence to understand in one go.

In simpler words, logical view describes WHAT but not HOW. **It is because of this reason that these data structures are called Abstract Data Types. **

“Abstract Data type (ADT) is a type (or class) for objects whose behaviour is defined by a set of value and a set of operations.” - GeeksForGeeks

Let’s break it down with an example:

There is something called a Car. The attributes of a car (only to name a few) are:

  • It has four wheels.
  • It provides roof
  • It drives us to place we want to go

Okay! Now, do we actually in practice own something called the car in our daily lives. We actually own an extension of the car. We might own a Maruti Swift Desire or even a Honda Amaze. These are cars too but these are the ones actually used (Implementational View).

The attributes are (Honda Amaze):

  • Mileage: 23.8 kmpl
  • Engine Displ. : 1498 cc
  • Transmission: Automatic
  • Fuel Type: Diesel
  • Boot Space: 420

If you are familiar with the concepts of Aggregation and Association **and is-a relationship** you might be finding it easy. Otherwise, think of it as there are some generic data structures which only define the operations which the structure will carry out without talking about how will it do it. And then come the implemented data structures (concrete data structures) which extend the Abstract data types (generic data structures) to describe how will the operations actually implemented.

One of the example in terms of programming is:

A list is an abstract data type which defines that there is some capacity for the elements to be stored in sequential order here and array extends it to define explicit properties and operations describing the extended properties.

CHAPTER 2: LIST AS AN ABSTRACT DATA TYPE

In general, a collection of objects which are of the same type form a list. A list can be a group of words, a group of car names or anything that follows the above-mentioned definition.

Now that we are aware of what an abstract data type is, Let define list as an abstract data type.

An abstract data type will only define the data to be stored and the associated operations for handling the data.

In view of a list which contains a similar group of objects so the abstract view of it can be summarized as:

  • It contains a fixed number of elements of the same datatype
  • It allows accessing the element present at any location.
  • Write/Modify any element at any position

The abstract view is not limited to only these points. However, if we think of the concrete implementation for such an abstract data type, a familiar name that strikes is an Array.

Array:

An array stores the items in contiguous memory locations. It can be seen as a concrete implementation of the abstract data type list. The contiguously allocated memory makes it easy to access the elements using the base address and indices.

Based on the implementation, arrays can be categorized as static or dynamic. Static arrays have fixed memory already assigned at the time of definition. Dynamic arrays are capable to change the size as per the run time requirements.

Note: In case of the dynamic array, whenever the maximum size of the array is reached, a new array with a larger size is created and the content of the previous array is copied to that array. (Because it is not possible to extend the size of the already created array)

The memory for the previous array is made free. Since the operation to copy content from one array to another is costly in terms of time, every time the array is full, a new array of double the capacity of previous is created.

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

Every element in an array is uniquely identified by the index.

Array Programs:

**PROGRAM 1: **Write a function rotate(ar[], d, n) that rotates arr[] of size n by d elements.

#include <iostream>
using namespace std;
void leftRotateByOne(int arr[],int n);
void leftRotate(int arr[],int k,int n);
void
leftRotate (int arr[], int k, int n)
{
  for (int i = 0; i < k; ++i)
    {
      leftRotateByOne (arr, n);
    }
}

void
leftRotateByOne (int arr[], int n)
{
  int temp = arr[0];
  int i;
  for (i = 0; i < n - 1; ++i)
    {
      arr[i] = arr[i + 1];

    }
  arr[i] = temp;
}

int
main ()
{
  int arr[] = { 1, 2, 3, 4, 5, 6, 7 };
  int n = sizeof (arr) / sizeof (arr[0]);
  leftRotate (arr, 2, n);
  for (int i = 0; i < n; ++i)
    {
      cout << arr[i];

    }
}
Enter fullscreen mode Exit fullscreen mode

Program: Array Reversal

#include <iostream>
using namespace std;

int main(){
    int arr[]={ 1,2,3,4,5,6 }; //Input Array which has to be reversed.
    int start=0;
    int end=sizeof(arr)/sizeof(arr[0])-1;
    while(start<end){
        int temp=arr[start];
        arr[start]=arr[end];
        arr[end]=temp;

        start++;
        end--;
    }
    for(int i=0;i<sizeof(arr)/sizeof(arr[0]);++i){
        cout<<arr[i];

}}
Enter fullscreen mode Exit fullscreen mode

Program: Array Rearrangement such that if found element i should be at an ith position.

#include <stdio.h>
using namespace std;

int main(){
    int arr[]={ 1,2,3,4,5,6 }; //Input Array which has to be reversed.
    int n=sizeof(arr)/sizeof(arr[0]);
    for(int i=0;i<n;++i){
        if(arr[i]>=0 && arr[i] !=i){
            arr[arr[i]]=(arr[arr[i]]+arr[i])-(arr[i]=arr[arr[i]]);
        }
        else{
            i++;
        }
    }
    for(int i=0;i<n;++i){
        cout<<arr[i];
    }
}
Enter fullscreen mode Exit fullscreen mode

Limitations of Array:

Let us assume a fixed section of computer memory.

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

Suppose each segment in the memory is of 1 byte.

Now,

If a programmer types int x; in the program, the equivalent memory (4 Bytes) is allocated in the memory. Similar Happens with the array.

When an array is declared

int arr[5];

This means that 5 integer type elements are to be stored in the array. So, the space that would be required in memory would be 5 * 4 = 20 Bytes. Now, when it comes to memory allocation, **arrays require contiguous memory. **This requires that 20 bytes of memory (20 segments) should be continuously free in memory area else the array can’t be allocated memory.

This is one of the biggest limitations of the array.

CONTIGUOUS MEMORY ALLOCATION AND ARRAY:

The answer to why array requires contiguous memory is, array uses a base addressing for indexing the elements. Since the base address is increment sequentially to retrieve subsequent elements, the addresses in memory need to be contiguous too.

NOTE:

Dynamic Array Memory Allocation:

_As it was mentioned earlier that dynamic array is capable to adjust its size, but the memory management somehow does not support the extension of array size if there is no contiguous memory available. To solve this a new array with the double size is created wherever the contiguous memory is found. If not found, the dynamic array can’t be created. _

INTRODUCTION TO LINKED LIST

Starting off from the problem of memory management in case of arrays, it can be a solution that memory is sought for one individual element of the list, rather than the list (array) as a whole.

In the linked list, the request to memory is made of one element at a time.

Again, if the previous example is considered. When 4 elements of integer types are to be stored, then instead of asking memory for the entire collection to be stored at one place, memory can be asked for separate units of data each linked to each other via some means. This means the need for contiguous memory has been overcome.

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

Suppose there are two elements to be stored {4,5}. For each element, a new request for memory allocation is taken and memory is allocated for each element ( Not the entire list as a whole). It is also not required that the elements 4 and 5 be stored contiguously in the memory area.

But somehow these two elements need to be linked to each other so that the list can be accessed continuously. For this purpose, pointers are used.

Technically, each entry in a linked list is called a node. A node is much more than the raw data to be inserted. A node contains the data and the link to traverse to the previous and next node.

So the node can be represented using the structure as:

struct node {
int data;
node* next;
}
Enter fullscreen mode Exit fullscreen mode

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

In the above-mentioned code example, the second component of the structure is marked as node* which marks that the second field of the node will only store an address.

In the linked list, the first node is also called the** head** node. The address field of the last node of the linked list points to null, which marks the end of the list.

The only information we need to have is the address of the head node. Once we have the address of the head node, we can traverse the entire list. Although it is advantageous in terms of memory, the drawback of using a linked list is that no element of the list is directly accessible which means for accessing the 4th element, one would need to first access 1st, 2nd, and 3rd elements compulsorily.

The time taken to access elements has O(n) complexity. (because each element has to be traversed and in the worst case the element to be searched will be the last element of this list.

CHAPTER 3: ARRAY & LINKED LIST ANALYSIS

Before starting this chapter, it should be clear that it is not suited to say one data structure is better than other. The usage of every data structure comes along different applications of it.

E.g if there’s a requirement to quickly access elements and memory is sufficient, an array can be used but if memory has to managed efficiently and access time is not given priority, Linked List is a good option to use.

The differences between the data structures can be measured as functions or in terms of cost, speed, etc. Some of these parameters are:

Accessing Elements :

Array: Since an array is a contiguous block of memory, it takes constant time to access elements in the array. It as also mentioned earlier that the array utilizes base addressing to access the elements.

Time Complexity: O(1)

**Linked List: **There are independent nodes stored in discontinuous memory locations. Initially, the address of the head node is given and to access the elements in a list we need to traverse sequentially. So it takes linear time.

Time Complexity: O(n)

Memory Usage :

Array: To declare an array we need to know the memory needed before inserting values in it because contiguous memory is required for the array. To achieve the flexibility of insertion, we initially assign a large block of memory to the array is utilize only some part for initially storing the values. So there is a heavy chance of memory wastage in the array.

**Linked List: **Since the linked list does not need to be contiguously allocated and each node is inserted in memory at the time it is created, there is no memory wastage involved in the linked list. Although in a linked list, extra memory is used for pointer variables.

NOTE: Memory usage differs from memory management. When it comes to usage, a linked list also uses the amount of memory which is used by array because of pointer overhead.

Here’s an example:

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

But this case would differ and the linked list would be useful is the data part is very large and the pointer is small.

Cost Of Insertion:

  1. At the Beginning

    ARRAY: O(n)

    LINKED LIST: O(1)

  2. At the end:

    Array: O(1) if array is not full and O(n) if array is full

    **Linked List: **O(n)

  3. At ith Position:

    Array: O(n)

    **Linked List: **O(n)

Chapter 4: Concept Of Pointers in C/C++

Pointers are the variable which holds the address of other variables. Pointers are used to carry references to other variables. Pointer Variables are strongly typed. A pointer variable of a particular data type is needed to store the address of the corresponding variable. For an integer type variable, the pointer has to be an integer, this extends to the fact that a char value can only be pointed by a character pointer and also goes similar for other data types.

When we talk about memory management in programming, we generally deal with the Random Access Memory. Whenever a variable is declared, a certain amount of memory in the RAM is allocated which is based on the data type of the variable being used in the program. This allocation in memory in allocated while the program is in running and the addresses allocated to the variables might be different every time a program runs. A pointer variable is represented using an Asterisk.

EXAMPLE:

A working example of pointers in c++ :

int a;
a=5;
int *p;
p=&a;
cout << a << endl;
cout << p << endl;
cout << *p << endl;
Enter fullscreen mode Exit fullscreen mode

Let us assume a fixed section of computer memory.

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

Suppose each segment in the memory is of 1 byte.

Now,

If a programmer types int x; in the program, the equivalent memory (4 Bytes) is allocated in the memory. When a pointer *p for x is created, the pointer points to integer x and** p* stores the value of a whereas** p** stores the address of a.

WHY DIFFERENT DATA TYPES FOR DIFFERENT POINTERS?

It might strike that why do pointers have to be strongly typed when they only need to store addresses. The answer is pointers do a lot more than simply storing the addresses. Pointers are also used to dereference the addresses so that we can access and modify the values in these addresses.

HOW ARE NUMBERS ACTUALLY STORED?

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

This is the binary representation of 1010

In modern-dayompilers, an integer takes 4 bytes. The memory is byte addressable.

The leftmost bit stores the information whether the integer is positive or negative. This bit is also called the *sign bit. *

int a = 1010;
int *p=&a;
Enter fullscreen mode Exit fullscreen mode

if the value of p is printed, 202 is printed because the address of byte 0 is used to address the integer variable. Now, if we print *p, the machine understands that p is an integer pointer hence it has 4 bytes beginning with address 202. This is because different pointer data types are needed.

*NOTE: The way the information is stored for float, char and different data types is different in memory. *

#include <stdio.h>

int main()
{
    int a=1010;
    int *p;
    p=&a;
    printf("Value of p is %d and value of *p is %d",p,*p);
    char *c;
    c=(char*)p;
    printf("Value of c is %d and value of *c is %d",c,*c);
}

/*OUTPUT:
Value of p is -629538260 and value of *p is 1010 Value of c is -629
538260 and value of *c is -14                                     

...Program finished with exit code 0                              
Press ENTER to exit console.  */
Enter fullscreen mode Exit fullscreen mode

This happens because char is allocated one byte in memory, once the char pointer points to the typecasted integer it only takes the leftmost byte i.e. Least Significant Byte (LSB) and since it is char so does not extend to check memory for 4 next bytes and outputs decimal equivalent of LSB of binary 1010.

Void Pointer:

A void pointer is a generic pointer type which does not correspond to any particular data type and thus is called void. A void pointer can not be dereferenced i.e. * can not be used with a void pointer

e.g:

int a=1010;
void *v;
v= &a;
int *p;
p=&a;
v=p //Correct 
printf("%d", v*); //Incorrect Because Void Pointer can not be dereferenced.
Enter fullscreen mode Exit fullscreen mode

Pointer To Pointer Concept:

It was made clear in the previous text that pointers are strongly typed. A pointer can point to an memory location which contains the corresponding data type.

int a;
int *p;
p=&a;
Enter fullscreen mode Exit fullscreen mode

But there’s an interesting fact that a pointer can point to another pointer variable. It is represented using double aistrick **.

int a;
int *p;
p=&a;
int **q;
q=&p;
Enter fullscreen mode Exit fullscreen mode

USE CASES:

As Function Arguments (Call By Reference):

Memory is reserved for a program in execution and it is split as:

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

These portions in memory describe the type of data they hold during the execution of the program. Out of these Stack,Static/Global and Code(Text) are fixed whereas Heap is resizable during the execution of the program.

Codes stores all the instructions. This includes declarations and statements. The static and global variables are deal with a specific memory assigned. The local variables and methods are stores in the Stack.

Suppose there’s a program,

#include <stdio.h>
void increment(int a)
{
a=a+1;
}
int main()
{
int a;
a=10;
increment(a);
printf("%d",a);
}
Enter fullscreen mode Exit fullscreen mode

The program intends to increase value of a variable named a but it outputs 10 and does not show the incremented value.

This is because:

When the program is in execution, at first main method is triggered so all the information associated with main function is stored in the stack. This information typically includes local variable, returning to which function, etc. So, some part of stack memory is occupied with information about the main function, which is called a stack frame. Each function has a stack frame in stack.

When the main function calls increment function, the execution of main function is stopped for some time and the control is passed to the called function. Now, another stack frame is created for increment function. In this stack frame, local variables related to increment function are created.

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

The real problem was, when the** a=a+1** statement is called, the local variable pertaining to stack frame of function increment gets incremented which only got the value of a from the main function as the argument. The value of** a** was copied to local variable of increment function and the same got incremented and thus did not result in incrementing the local variable of main function because the local variables are not accessible outside the stack frame.

Once the control is passed back to main function, the stack frame created for increment function is cleared. This implies that the lifetime of a local variable is till the time the function is executing.

The above described scenario is called Call By Value.

If we look from a different perspective called Call By Reference, The code would look like,

#include <stdio.h>
int increment( int *p)
{ *p= (*p)+1;
}
int main()
{
int a;
a=10;
increment(&a);
printf("%d", a); //Prints 11
}
Enter fullscreen mode Exit fullscreen mode

While executing the value which is stored in the local variable p ( for function increment) is the address of memory location where a ( of function main) is stored, Hence the value of a is incremented using call By reference.

Concept of Array and Pointers:

When talking about an array, the concept of pointers is very crucial.

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

Suppose the array named A is allocated memory as shown above. Since one integer takes 4 bytes the address of next integer will be 4 bytes more than the previous address.

int a= { 2 , 4 , 6 , 8 , 10 }

To print the address of initial element we can print &a[0] or we can also print the name of the array. To print the initial value** a* can be used and for every subsequent value **(a + 1) *

#include <stdio.h>
int main()
{
    int a[]= {2,4,6,8,10};
    printf("Array is: { 2 , 4 , 6 , 8 , 10 } \n"); 
    printf("First Address Using printing &a[0]: %d \n",&a[0]);
    printf("First Address Using printing a :%d \n",a);
    printf("Value at First Address Using *&a[0] is : %d \n",*&a[0]);
    printf("Value at First Address Using *a is : %d \n",*a);
    printf("Value at First Address Using *(a+1) is : %d \n",*(a+1));
}
Enter fullscreen mode Exit fullscreen mode

Array as Function Arguments

Pointer have a role to play when array is passed as function argument.

e.g. To print the sum of an array, the code can be like

#include <stdio.h>
int calculateSum(int a[], int n){
    int i,sum;
    for(i=0;i<n;++i){
        sum+=a[i];
    }
    return sum;
}
int main()
{
    int a[]= {2,4,6,8,10};
    int n=sizeof(a)/sizeof(a[0]);
    int sum=calculateSum(a,n);
    printf("%d",sum);
}

Enter fullscreen mode Exit fullscreen mode

This code calls calculateSum function from main and passes the array and size of array to it. It calculates the sum and outputs 30.

But if we wanted to calculate the size of array in the function calculateSum(), the code would look like,

#include <stdio.h>
int calculateSum(int a[]){
    int i,sum;
    int n=sizeof(a)/sizeof(a[0]);
    printf("%d \n",n);
    for(i=0;i<n;++i){
        sum+=a[i];
    }
    return sum;
}
int main()
{
    int a[]= {2,4,6,8,10};
    int sum=calculateSum(a);
    printf("%d",sum);
}
Enter fullscreen mode Exit fullscreen mode

But this outputs the incorrect sum

This is because whenever the array is passed as function argument, the compiler understands it as a pointer to the first element of the array. This is done to save memory because arrays can be really large and to create a local array and copy content would require large amount of memory in many cases. So the local memory created for the function calculateSum points to the array in main function using pointer.

Pointers and Dynamic Memory

Memory assigned to a program is divided in 4 segments as mentioned earlier.

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

There 4 segments of memory are used to store various components of a program. The memory assigned for stack,static/global and code segments can’t be changed during run time.When we talk about stack, it is used for the local storage while the program is running.

When a function is called, a stack frame is allocated for the function and the required memory is assigned. The most recent called function sits on top of the stack and after the function is done processing, the stack frame is cleared.

The memory for the stack is assigned by the operating system but the actual allocation of memory for each function (the local variables etc. for the function) happens during the runtime.

If the number of functions are too many or by any other reason the memory assigned to the stack if totally utilized, it is called Stack Overflow.

One of the common example of stack overflow is when the recursion goes infinitely.

The allocation and deallocation of memory in the stack happens by a set rule, When a function is called it is pushed on the top of the stack and when it finishes execution it is popped out from the stack. The scope of variable does not change when the function is in stack.

Example for Demonstration:

#include <stdio.h>
int total;
int Square(int x)
{
return x*x;
}
int SquareOfSum(int x , int y)
{
int z = Square(x+y);
return z;
}
int main()
{
int a= 4, b= 8;
total=SquareOfSum(a,b);
printf("Output = %d" , total);
}
Enter fullscreen mode Exit fullscreen mode

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

Most Recently Called Function Sits on Top of the stack and is also the one to be popped first.

PHASE 1: main() is called and the program begins.

PHASE 2: main() calls SumOfSquares()

**PHASE 3: **SumOfSquares() calls Square()

PHASE 4: Square returns value and memory for Square() is deallocated

**PHASE 5: **Control is passed back to main

**PHASE 6: **Program finishes, main() is popped of stack and stack memory is empty

Heap:

Unlike the stack and other segments, the memory for heap is not fixed. It’s size can change during the run time. Also, there is no set rule for memory allocation and deallocation ( like push and pop in stack).

A programmer can decide the amount of memory to be used and the duration for which the memory has to be used. A heap can grow as long as the system does not run out of memory.

Note: When talking about memory associated with heap or stack, it is always referred to the Random Access Memory.

Heap is also called a free pool of memory. The actual implementation of heap is complex and varies from system to system, but the abstract view of heap can be seen as a free pool of memory.

Heap is also called dynamic memory. Using heap is also called dynamic memory allocation.

NOTE: The term “Heap” is also used to denote a data structure but when talking about memory, it does not refer to the “Heap” as used in data structures. Although the data structure “Stack” is the same in memory as in data structures. The implementation of stack, the data structure is used in memory but in terms of heap in memory, it only refers to a free pool of memory.

Dynamic Memory Allocation In C:

Dynamic Memory allocation in C is carried out using 4 functions

  1. malloc
  2. calloc
  3. realloc
  4. free

Dynamic Memory Allocation in C++ is carried out using 2 operators

  1. new
  2. delete

Note: Since C++ has backward compatibility, the functions of C can also be used in C++.

C Program for Dynamic Memory Allocation:

#include <stdio.h>
#include <stdlib.h>
int main()
{
int a; //Since it is a local variable, it goes to stack
int *p;
p = (int*) malloc (sizeof(int));
/* Malloc returns a void pointer to the starting address of the memory block assigned. We have to typecast because it returns void pointer */ 
*p=10;
// free(p);
p= (int*) malloc(sizeof(int));
*p=20;
/* a new memory block to store 20 is assigned at some other address in heap and the previous is not cleared */
free(p);
p=(int*) malloc (20*sizeof(int));
Enter fullscreen mode Exit fullscreen mode

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

>>>>> gd2md-html alert: inline drawings not supported directly from Docs. You may want to copy the inline drawing to a standalone drawing and export by reference. See Google Drawings by reference for details. The img URL below is a placeholder.
(Back to top)(Next alert)
>>>>>

drawing

IMPORTANT POINTS:

  • Memory in heap is not cleared when the pointer is changed, it needs to be cleared every time when the the value/variable stored is not required
  • The malloc(size) function returns a void pointer to the address of memory in heap so the void pointer has to be converted to integer pointer for storing integer values.
  • To allocate memory for array in heap, the malloc function needs to be given the size of array:
    • For integer: 4 * Number of Elements
    • For Character Array: 1 * Number of Elements
  • Assigning/Changing values in heap is only done using referencing i.e. obtaining a pointer to the address of heap and then assigning the value using aistrick. ( *p =10 )

Dynamic Memory Allocation in C++

In C++ the two operators used are New and Delete. They are used similar to malloc and free functions in C. The difference is they are typesafe which means that the typecasting in not required.

Syntax:

int *p;
p = new int;
*p = 10;
p= new int[20];
delete[] p;
Enter fullscreen mode Exit fullscreen mode

Malloc,Calloc, Realloc and Free in C

Malloc:

** **Malloc is most frequently used function for dynamic memory allocation.

Definition: void* malloc( size_t size)

  • It returns a void pointer which has to be typecasted
  • size_t means only positive number can be entered (unsigned)
  • Assignment can only be done using dereferencing (Using pointers)

Syntax: int *p=(int*) malloc(sizeof(int));

Calloc:

Calloc is similar to malloc but takes in two parameters. The first argument is the number of elements of particular data type and second argument is the size of the data type.

Syntax:

int *p = (int*) calloc(1,sizeof(int));
Enter fullscreen mode Exit fullscreen mode

Another difference between malloc and calloc is that, malloc initializes the assigned memory location with some garbage values while calloc does not initialize the values.

Calloc:

If we already have a dynamic block of memory and we need to change the size of the memory, then realloc is used.

Syntax:

int *p = (int*) realloc(void *ptr, size_t size);
Enter fullscreen mode Exit fullscreen mode

*ptr= Address of initial block.

EXAMPLES:

#include <stdio.h>

int main()
{
  int n;
  scanf("%d",&n);
  int *arr=(int*) malloc(n*sizeof(int));
  arr[0]=1;
  printf("%d",arr[0]);
}
Enter fullscreen mode Exit fullscreen mode
#include <stdio.h>

int main()
{
  int *p;
  p=(int*)malloc(sizeof(int));
  *p=10;
  printf("%d",*p);
}
Enter fullscreen mode Exit fullscreen mode

Function Pointers:

Function Pointers are used to store address of functions. It simpler words they are used to point to the function and the function can be called by dereferencing it.The compiler converts the high level language code to the machine understandable binary code. The execution of a program does not directly happen from the secondary storage (Where the program is stored). The executable instructions are copied to the Code/Text section of the Random Access Memory and then the execution begins.

Normally, the execution of a program is sequential. But it can be non sequential when the instruction asks to go to another instruction at some different address i.e. during the function calls.

So function pointers contain the initial address of the memory location of all the instruction of a function. A function pointer is created using the return type of the function.

Syntax:

#include <stdio.h>
int Add(int a, int b)
{
return a+b;
}
int main(){
int c;
int (*p)(int,int);
p=&Add;
c=(*p)(2,3);
}
Enter fullscreen mode Exit fullscreen mode

CHAPTER 5: IMPLEMENTATION OF LINKED LIST

  • The only identifier for a linked list is the address of the first block of the linked list
  • Implementation Declaration in C:

    struct Node

        {
            int data;
            struct Node* link;
        }

Enter fullscreen mode Exit fullscreen mode
  • Implementation Declaration in C++:

    struct Node
        {
            int data;
            Node* link;
        }
    
Chapter 6:  Asymptotic Analysis
Enter fullscreen mode Exit fullscreen mode

Time Complexity of a Program:

A program to find the total number of prime numbers between a given range can be solved by two approaches. One is to iterate from 1 to to** n-1** and check the satisfying numbers to be prime while the other is to iterate till square root of n, onto the same conditions.

When we talk about time complexity of a program, we generally deal with large input sizes, because the factor of time does not simply imply to smaller input values. In the above given condition to find number of prime numbers till a number n.

\
Let n= 10000;

Suppose each iteration takes 1 ms to complete.

Case 1: **Iterating till **n-1:

Total number of iterations = 99999

Total Time Taken = 99999*10^-3= 99.999= 1 minute 40 Seconds 
Enter fullscreen mode Exit fullscreen mode

Case2: **Iterating till sqrt(n):**

** **Total number of iterations: 315 (approx)

Total Time Taken= 315 ms = 0.315 seconds
Enter fullscreen mode Exit fullscreen mode

The performance of both cases regarding the time taken to execute it very evident with the previous example. This is why Time Complexity plays a crucial role in programming solutions for problems requiring large inputs.

Basics Of Asymptotic Notations:

Time complexity of an algorithm can be determined and expressed mathematically using Asymptotic Notations. When we talk about Time complexity, we always refer to the associated algorithm and not the computer program. Complexity of an algorithm can be measured in space and in time. There are three ways to measure it namely best case, average case and the worst case.

There are different types of asymptotic notations widely used to mathematically represent time complexity. One of the most common one is the big-O **notation. The big-O notation generally deals with the large inputs. It gives complexity in terms of input size, N. The big-O notation is used to test the worst case complexity. big-O is expressed as **O(complexity term).

Big-O Basics:

  • Constants are ignored i.e. O(5n) is same is O(n)
  • Order of complexity is:
    • *O(1) < O(log n) < O(n) < O (nlogn) < O(n^2) < O(2^n) < O(n!) *

**Side Note: https://www.bigocheatsheet.com/ **is a great tool to learn more about big-O

Suppose to solve a problem, two algorithms have been devised. Let the running time of algorithm A is a function f(n) and the running time of other algorithm is g(n). If g(n) **grows faster than f(n)** then there must be some values of n for which g(n) is larger than *f(n). *

*Explanation: * There is a function f(n) and another function g(n). The faster growth of functions means that for different values of the independent variable n, the function return larger values. The larger the return values, faster is the growth!

Now, if g(n) grows faster than f(n) simply means that there must be one independent value of n, let’s say n* **for which and for values larger than n* the value returned by g(n) will always be greater than **f(n).

i.e g(n*)>=f(n*)

This can also be written as** f(n*)<= C.g(n)** where C is a constant.

*Now, we can say that f(n) is contained in O(g(n)). *

Introduction To Asymptotic Analysis:

Each algorithm is compared on this basis of time taken to execute and memory required.

**Big-O: **Describes the worst case running time of an algorithm

Big Omega (Ω): Describes the best case running time of an algorithm

**Big-Theta (θ): **Describes the average running case of an algorithm

Analysis of Algorithms:

When we talk about design and analysis of an algorithm, the analysis of an algorithm deals with the efficiency of an algorithm. Efficiency is generally measured in terms of time and space complexity. Generally, more than space, time complexity is considered a driving factor for measuring the efficiency of an algorithm. This is because time is a limiting parameter. However, space is comparatively more flexible as memory can be added and removed. In technical words, time depends on the processing speed while the space is function of available memory. Instead of measuring the time complexity in terms of concrete time because in different systems the running time of an algorithm can vary as per the variable specs of the machine it is being executed upon. This is why the abstract units of measurements are utilized to measure the time complexity of the algorithm. This ensures that the time complexity of an algorithm is uniform across systems and machines.

The abstract units of measurements is based on the steps taken to successfully execute an algorithm. The step is a kind of basic operation. However, the notion of a step varies as per the programming language used. For example the steps used in assembly language might include to fetch the data from registers to the main memory but when devising algorithms to problems, we use a higher level approach and a programming language like c, c++ or java.

Basically we measure time complexity in terms of input size.So, if the input is of size n, then the time taken will be a function of n.

Example: We have an array of n elements, and we want to sort the array in ascending order. Now any naive algorithm will compare each pair of elements of the array and take time proportional to O(n2) time. [The sequential speed of CPU is about 108 operations per second]

Chapter 7 : Understanding Algorithms for Sorting

When we have a large collection of any sort of data, at times we might need to arrange the data in a particular order. Sorting algorithms if implied over a collection like array, list etc. sort the constituents of the collection in a particular order. Typically, the order is increasing, decreasing or even alphabetical.

There are many sorting algorithms associated with every type of collection. Some of the widely known ones are:

  • Quick Sort
  • Bubble Sort
  • Merge Sort
  • Insertion Sort
  • Selection Sort
  • Heap Sort
  • Radix Sort
  • Bucket Sort

Time complexities for the algorithms can be given as:

**Algorithm**   **Best**        **Average** **Worst**
Enter fullscreen mode Exit fullscreen mode
  • Quick Sort Ω(n log(n)) Θ(n log(n)) O(n^2)
  • Bubble Sort Ω(n) Θ(n^2) O(n^2)
  • Merge Sort Ω(n log(n)) Θ(n log(n)) O(n log(n))
  • Insertion Sort Ω(n) Θ(n^2) O(n^2)
  • Selection Sort Ω(n^2) Θ(n^2) O(n^2)
  • Heap Sort Ω(n log(n)) Θ(n log(n)) O(n log(n))
  • Radix Sort Ω(nk) Θ(nk) O(nk)
  • Bucket Sort Ω(n+k) Θ(n+k) O(n^2)

Classification of Sorting Algorithms:

Sorting algorithms can be classified on the basis of the time complexity or space complexity. Time complexity is a measure of the rate of time taken with respect to the input size while space complexity deals with the space taken in memory. With space complexity, there is associated a concept of in-place or constant memory. Some algorithms take constant space while some take memory as the algorithms proceeds in execution ( Local Variables etc.)

Another important factor is stability. Stability describes the property which preserves of relative order of elements. e.g. if there is a collection of 4 playing cards and 9 Black Diamond preceeded 9 Red Heart, then upon sorting it must be preserved.

Understanding Discrete Mathematics for Programming

A good knowledge of discrete mathematics helps tackle many programming questions easlily.

Factorization:

A factor of a number N is a number d such that d completely divides N i.e. N%d=0;

Character Array and Pointers

In C, Strings are represented using character arrays.

How are strings stored?

Size of character array should be = length of string +1

The extra space of 1 is required to indicate that the string is over. A null character is put on the end of string. So the rule in C is, the strings are null terminated.

include

include

int main()

{

char C[20];

c[0]= ‘H’;

c[1]= ‘A’;

c[2]= ‘R’;

c[3]= ‘S’;

c[4]= ‘H’;

printf(“%s”,C); //Prints something else after “HARSH” because string is not null terminated.

c[5]= ‘\0’;

printf(“%s”,C); //Correctly Prints

int len=strlen(C); //It also counts till null character i.e. len=5

}

But while giving string literal the null termination is implicit. i.e. c[20]= “Harsh”, does not require explicit null termination because it is implicitly done.

Top comments (0)