Linear regression analysis is a statistical process used to predict the value of a variable based on the values of other variables. We refer to the variable we want to predict as the dependent variable and the variables used for the prediction as the independent variables.
For our demonstration we are going to use the mexico population history to train our model.
public class MexicoPopulationLineChart {
private final double[][] mexicoPopulation = new double[][]{
{1955, 31452141},
{1960, 36268055},
{1965, 42737991},
{1970, 50289306},
{1975, 58691882},
{1980, 67705186},
{1985, 74872006},
{1990, 81720428},
{1995, 89969572},
{2000, 97873442},
{2005, 105442402},
{2010, 112532401},
{2015, 120149897},
{2020, 125998302},
};
}
We visualize the data using a line chart ending with this result.
Now, we will attempt to predict the population for the next decades to see what we can obtain. For this we are going to use SimpleRegression
class from Apache Commons Math.
SimpleRegression regression = new SimpleRegression();
regression.addData(mexicoPopulation);
NumberFormat formatter = NumberFormat.getNumberInstance();
formatter.setMaximumFractionDigits(0);
for(var i = 2025; i < 2050; i = i + 5) {
System.out.println(i + " - " +
formatter.format(regression.predict(i)));
}
giving us this output
1955 - 29,184,914
1960 - 36,735,619
1965 - 44,286,325
1970 - 51,837,031
1975 - 59,387,737
1980 - 66,938,442
1985 - 74,489,148
1990 - 82,039,854
1995 - 89,590,559
2000 - 97,141,265
2005 - 104,691,971
2010 - 112,242,676
2015 - 119,793,382
2020 - 127,344,088
2025 - 134,894,794
2030 - 142,445,499
2035 - 149,996,205
2040 - 157,546,911
2045 - 165,097,616
2050 - 172,648,322
Finally, let's incorporate the predictions into our chart so that we can visually represent the data.
Top comments (0)