DEV Community

maxwizard01

How I built my own calculator that groups any given data set into classes using R

I vividly remember that in college one of our lecturers liked to keep us busy with lots of assignments. Once, we were given 20 different data sets. They were large, and the task was to group each of them into classes and calculate the necessary statistical values for each.

I must admit I was a bit lazy then: I didn't like repeating the same work and stressing myself once I already understood the basics. So I thought about ways to make it faster, but I couldn't see any. Luckily, it occurred to me that I could build a calculator to do it for me. That was how I got the idea. lol.

How to group a large data set into classes and calculate the important statistical values

First, if you don't already know how to group a data set, kindly click here so that my code doesn't look like magic. Before I show you the code, let me explain the steps I took to arrive at this simple project.

Steps taken to create a calculator that groups a large data set into classes and calculates statistical values, showing the important workings

Step 1: I need code to group the data into classes of any given class width to form a grouped frequency table.

Step 2: I need to calculate the class boundaries; for whole-number data this is done by subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit of each class.

Step 3: I need code to calculate the class mark (x), which is the midpoint of each class.

Step 4: I need code to multiply each class mark by its frequency (Fx).

Step 5: I need code to calculate the mean as the sum of Fx divided by the sum of the frequencies.

Step 6: I need code to get the deviation of each class mark from the mean, and its square.

Step 7: I have to calculate the variance and standard deviation from the squared deviations.
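
For reference, the formulas behind steps 2 to 7 can be written as below, where $L_i$ and $U_i$ are the lower and upper class limits and $f_i$ is the frequency of class $i$ (symbols introduced here for convenience). The ±0.5 boundaries assume whole-number data, and the variance is shown in the population form that divides by $\sum_i f_i$; some courses divide by $\sum_i f_i - 1$ instead.

```latex
\begin{aligned}
\text{lower boundary}_i &= L_i - 0.5, \qquad \text{upper boundary}_i = U_i + 0.5 \\
x_i &= \frac{L_i + U_i}{2}, \qquad \bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i} \\
d_i &= x_i - \bar{x}, \qquad \sigma^2 = \frac{\sum_i f_i d_i^2}{\sum_i f_i}, \qquad \sigma = \sqrt{\sigma^2}
\end{aligned}
```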

The list above is everything I want my calculator to do for me. I will take the steps one by one.

How to write code that groups any data set into classes using a given class width

I will write the code as a function because I want it to be reusable: everything goes inside a function that I can call whenever I need it. If you don't yet know how to group data, click here to read my article on it, because there is too much to explain here and I won't go into detail. Take a look at the following code:

Codes>>

createGroupTable=function(data,classwidth){ 
minimumValue=(min(data)%/%classwidth)*classwidth # round the minimum down to a multiple of classwidth: the first lower class limit.
MaximumValue=((max(data)%/%classwidth)+1)*classwidth # the first multiple of classwidth above the maximum value.
d=MaximumValue+classwidth # d-1 is the upper limit of the last class.
lowerclass=seq(minimumValue,MaximumValue,classwidth) # to get a sequence of all lower class limits.
upperclass=lowerclass+classwidth-1 # to form a sequence of all upper class limits.
classInterval=paste(lowerclass,'-',upperclass) # the label of each class, e.g. "10 - 19".
alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval)) # cut() uses right-closed intervals (a, b], so each whole number from a lower to an upper limit lands in its class.
mytable=data.frame(alldata) # turn the table into a two-column data frame (Var1, Freq).
mytable
}

The code above only defines the function, so running it produces no output until the function is called. Let us test it with the following data.

Example.

Construct a grouped frequency table for the following scores of 30 students.
24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13

14, 16, 16, 13, 16, 13, 18, 19, 7, 9, 8, 6, 20, 26, 28, 29, 30, 18, 19, 15, 17, 12, 14, 15, 14, 16, 13, 12, 14, 13.

Codes>>>

createGroupTable=function(data,classwidth){ 
minimumValue=(min(data)%/%classwidth)*classwidth # round the minimum down to a multiple of classwidth: the first lower class limit.
MaximumValue=((max(data)%/%classwidth)+1)*classwidth # the first multiple of classwidth above the maximum value.
d=MaximumValue+classwidth # d-1 is the upper limit of the last class.
lowerclass=seq(minimumValue,MaximumValue,classwidth) # to get a sequence of all lower class limits.
upperclass=lowerclass+classwidth-1 # to form a sequence of all upper class limits.
classInterval=paste(lowerclass,'-',upperclass) # the label of each class, e.g. "10 - 19".
alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval)) # cut() uses right-closed intervals (a, b], so each whole number from a lower to an upper limit lands in its class.
mytable=data.frame(alldata) # turn the table into a two-column data frame (Var1, Freq).
mytable
}
# the function's code ends here
score=c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13) 
#now call the function with a class width of 10
createGroupTable(score,10) 

Result>>

   Var1    Freq
1  0 - 9     1
2 10 - 19   13
3 20 - 29    8
4 30 - 39    3
5 40 - 49    4
6 50 - 59    1
7 60 - 69    0
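If you try this with a different data set and your preferred class width, you will notice an extra class at the end that we never need: its frequency is always 0. That is a flaw in the code. MaximumValue is always greater than the largest value in the data, so the last class, which starts at MaximumValue, can never contain any data. So I need to adjust the function by removing the last row.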
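
To see the flaw concretely, here is a small sketch that repeats only the limit calculations from createGroupTable() and checks, for a few class widths, that the last class starts above the largest score. The helper name lastClassIsEmpty() is hypothetical, used just for this illustration. Note that %/% is R's integer (floor) division, so (min(data) %/% classwidth) * classwidth rounds the minimum down to a multiple of the class width.

```r
# Hypothetical helper: repeats the limit calculations from createGroupTable()
lastClassIsEmpty <- function(data, classwidth) {
  minimumValue <- (min(data) %/% classwidth) * classwidth
  MaximumValue <- ((max(data) %/% classwidth) + 1) * classwidth
  lowerclass <- seq(minimumValue, MaximumValue, classwidth)
  # The last class starts at MaximumValue, which is always greater than
  # max(data), so no data value can ever fall into it
  lowerclass[length(lowerclass)] > max(data)
}

score <- c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30,
           18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13)

# Check a few class widths
checks <- sapply(c(5, 10, 15, 20), function(w) lastClassIsEmpty(score, w))
print(checks)
```

All four widths give TRUE for the score data.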

How to remove the last row of a table using R

Removing the last row is never a problem: all I need to do is find the index of the last row, then form a new table without it, like this:

Codes>>

createGroupTable=function(data,classwidth){ 
minimumValue=(min(data)%/%classwidth)*classwidth # round the minimum down to a multiple of classwidth: the first lower class limit.
MaximumValue=((max(data)%/%classwidth)+1)*classwidth # the first multiple of classwidth above the maximum value.
d=MaximumValue+classwidth # d-1 is the upper limit of the last class.
lowerclass=seq(minimumValue,MaximumValue,classwidth) # to get a sequence of all lower class limits.
upperclass=lowerclass+classwidth-1 # to form a sequence of all upper class limits.
classInterval=paste(lowerclass,'-',upperclass) # the label of each class, e.g. "10 - 19".
alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval)) # cut() uses right-closed intervals (a, b], so each whole number from a lower to an upper limit lands in its class.
mytable=data.frame(alldata) # turn the table into a two-column data frame (Var1, Freq).
lastIndex=length(mytable$Freq) # index of the last row (same as nrow(mytable))
newTable=mytable[-lastIndex,] # a negative index keeps every row except that one
newTable
}

# the function's code ends here
score=c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13) 
#now call the function
createGroupTable(score,10) 

Result>>

  Var1     Freq
1  0 - 9     1
2 10 - 19   13
3 20 - 29    8
4 30 - 39    3
5 40 - 49    4
6 50 - 59    1
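
The negative-index trick inside the function can be seen in isolation on a small stand-in table (its contents are made up for illustration). nrow() gives the same index as length(mytable$Freq), and head() with a negative n is another base R way to drop the last row; this is a sketch of equivalent options, not a change to the function above.

```r
# A made-up stand-in for the grouped table, with an empty last class
mytable <- data.frame(Var1 = c("0 - 9", "10 - 19", "20 - 29"),
                      Freq = c(1, 13, 0))

lastIndex <- nrow(mytable)            # same value as length(mytable$Freq)
withoutLast <- mytable[-lastIndex, ]  # negative index: every row except lastIndex
alsoWithoutLast <- head(mytable, -1)  # negative n: all rows except the last one

print(withoutLast)
print(alsoWithoutLast)
```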

As you can see, the empty last class (60 - 69) is gone. You can call the function as many times as you want for any large data set: just pass in the variable name of the data and your class width.
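
Back to my assignment with 20 data sets: instead of calling the function once per data set, you can keep the data sets in a list and let lapply() call it for each one. This is a minimal sketch, assuming the version of createGroupTable() that drops the empty last class; the list name datasets and its element names are hypothetical, and the second data set is the second list of 30 numbers printed under the example.

```r
# createGroupTable() as above, the version that drops the empty last class
createGroupTable <- function(data, classwidth) {
  minimumValue <- (min(data) %/% classwidth) * classwidth
  MaximumValue <- ((max(data) %/% classwidth) + 1) * classwidth
  d <- MaximumValue + classwidth
  lowerclass <- seq(minimumValue, MaximumValue, classwidth)
  upperclass <- lowerclass + classwidth - 1
  classInterval <- paste(lowerclass, "-", upperclass)
  alldata <- table(cut(data, seq(minimumValue - 1, d - 1, classwidth),
                       labels = classInterval))
  mytable <- data.frame(alldata)
  mytable[-nrow(mytable), ]
}

# Hypothetical list of data sets; it could hold all 20 assignment data sets
datasets <- list(
  scores = c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30,
             18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13),
  second = c(14, 16, 16, 13, 16, 13, 18, 19, 7, 9, 8, 6, 20, 26, 28, 29, 30,
             18, 19, 15, 17, 12, 14, 15, 14, 16, 13, 12, 14, 13)
)

# Apply the same function, with the same class width, to every data set
tables <- lapply(datasets, createGroupTable, classwidth = 10)
print(tables)
```

The result is a named list with one grouped frequency table per data set, so tables$second is the table for the second list.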

Now I need to deal with the other steps. These won't take long.

How to create a grouped frequency table with class boundaries using R

Now that we can construct class intervals with their frequencies, we need to subtract 0.5 from every lower class limit and add 0.5 to every upper class limit. Study the following code:

createGroupTable=function(data,classwidth){ 
minimumValue=(min(data)%/%classwidth)*classwidth
MaximumValue=((max(data)%/%classwidth)+1)*classwidth
d=MaximumValue+classwidth
lowerclass=seq(minimumValue,MaximumValue,classwidth)
upperclass=lowerclass+classwidth-1
classInterval=paste(lowerclass,'-',upperclass)
lowerclassBound=lowerclass-0.5 # lower class boundaries
upperclassBound=upperclass+0.5 # upper class boundaries
classBoundary=paste(lowerclassBound,'-', upperclassBound) # labels such as "9.5 - 19.5"
alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval))
mytable=data.frame(alldata)
mytable$classBound=classBoundary # add the boundaries as a third column
pureTable=mytable[-nrow(mytable),] # drop the always-empty last class; empty classes in the middle are kept
pureTable
}
# the function's code ends here
score=c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13) 
#now call the function
createGroupTable(score,10) 

Result

  Var1     Freq  classBound
1   0 - 9    1  -0.5 - 9.5
2 10 - 19   13  9.5 - 19.5
3 20 - 29    8 19.5 - 29.5
4 30 - 39    3 29.5 - 39.5
5 40 - 49    4 39.5 - 49.5
6 50 - 59    1 49.5 - 59.5
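
A useful property of class boundaries is that each upper boundary equals the next class's lower boundary, so the classes leave no gaps. The sketch below checks this for the six classes in the result above and then uses the boundaries directly as breaks in base R's hist() with plot = FALSE, which only counts values, as an independent cross-check of the frequencies. The ±0.5 rule assumes whole-number data, where class limits are one unit apart; for data recorded to one decimal place the boundaries would be ±0.05 instead.

```r
classwidth <- 10
lowerclass <- seq(0, 50, classwidth)        # lower limits of the six classes above
upperclass <- lowerclass + classwidth - 1   # upper limits 9, 19, ..., 59
lowerclassBound <- lowerclass - 0.5
upperclassBound <- upperclass + 0.5

# Each upper boundary equals the next class's lower boundary: no gaps
contiguous <- all(upperclassBound[-length(upperclassBound)] == lowerclassBound[-1])

# So the boundaries can serve directly as breaks; plot = FALSE only counts
score <- c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30,
           18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13)
breaks <- c(lowerclassBound, upperclassBound[length(upperclassBound)])
counts <- hist(score, breaks = breaks, plot = FALSE)$counts

print(contiguous)
print(counts)
```

This prints TRUE and the counts 1, 13, 8, 3, 4, 1, matching the Freq column.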

Now we should include the code to calculate the class mark (x) and Fx.
See the code below.

createGroupTable=function(data,classwidth){ 
minimumValue=(min(data)%/%classwidth)*classwidth
MaximumValue=((max(data)%/%classwidth)+1)*classwidth
d=MaximumValue+classwidth
lowerclass=seq(minimumValue,MaximumValue,classwidth)
upperclass=lowerclass+classwidth-1
classInterval=paste(lowerclass,'-',upperclass)
lowerclassBound=lowerclass-0.5
upperclassBound=upperclass+0.5
classBoundary=paste(lowerclassBound,'-', upperclassBound)
classMark=(lowerclass+upperclass)/2 # class mark (x): the midpoint of each class
alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval))
mytable=data.frame(alldata)
Freq=mytable$Freq
Fx=Freq*classMark # frequency times class mark, class by class
mytable$classBound=classBoundary
mytable$classMark=classMark # class mark (x) column
mytable$Fx=Fx
pureTable=mytable[-nrow(mytable),] # drop the always-empty last class; empty classes in the middle are kept

pureTable
}
# the function's code ends here
score=c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13) 
#now call the function
createGroupTable(score,10) 

Result>>

     Var1 Freq  classBound classMark    Fx
1   0 - 9    1  -0.5 - 9.5       4.5   4.5
2 10 - 19   13  9.5 - 19.5      14.5 188.5
3 20 - 29    8 19.5 - 29.5      24.5 196.0
4 30 - 39    3 29.5 - 39.5      34.5 103.5
5 40 - 49    4 39.5 - 49.5      44.5 178.0
6 50 - 59    1 49.5 - 59.5      54.5  54.5

As you can see, we just included two more columns. Now we can go ahead and calculate the mean using ΣFx/ΣF, i.e. the sum of the Fx column divided by the sum of the frequency column.
To do that we only need to add a few lines of code.
I shall show you that in my next article, along with the standard deviation.
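
Until then, here is a minimal sketch of steps 5 to 7 (not the author's follow-up code), using the Freq and classMark values copied from the table above. It assumes the population form of the variance, which divides by ΣF; if your course uses the sample variance, divide by sum(Freq) - 1 instead.

```r
# Class marks and frequencies copied from the table printed above
classMark <- c(4.5, 14.5, 24.5, 34.5, 44.5, 54.5)
Freq <- c(1, 13, 8, 3, 4, 1)

# Step 5: mean = sum of Fx / sum of F
Fx <- Freq * classMark
meanGrouped <- sum(Fx) / sum(Freq)

# Step 6: deviation of each class mark from the mean, squared and weighted
deviation <- classMark - meanGrouped
FdSquared <- Freq * deviation^2

# Step 7: population variance (assumption) and standard deviation
varianceGrouped <- sum(FdSquared) / sum(Freq)
sdGrouped <- sqrt(varianceGrouped)

cat("mean =", meanGrouped, "\n")
cat("variance =", varianceGrouped, "\n")
cat("standard deviation =", sdGrouped, "\n")
```

For this data ΣFx = 725 and ΣF = 30, so the mean is 725 / 30 ≈ 24.17.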

Consider following me so that you don't miss any of my articles.
Happy coding!🖐️🖐️
