DEV Community

Priscilla Parodi for Elastic


Elastic Data Frame - Regression Analysis

| Menu | Next Post: Elastic Data Frame - Classification Analysis |

Unlike the Anomaly Detection models, this is a multivariate analysis, which enables a better understanding of complex behaviors that are described by many features. For this kind of analysis there are three models, each with its own algorithm and learning type (Outlier, Regression and Classification). In this post we'll talk about Regression Analysis.

Regression predicts numerical values in your data after it has learned the relationships among your data points from examples whose values are already known (supervised ML).

For example, suppose we are interested in the relationship between apartment size and monthly rent in a city. To find it, we look for the relationship between one or more features (here, size) and a target variable (here, rent) to learn whether the target variable can or can't be explained by the feature(s).


This example is a one-dimensional regression problem, because we only have one feature variable (size), but we could easily add more features.
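To make the apartment example concrete, here is a minimal sketch of a one-feature fit using ordinary least squares. The sizes and rents are hypothetical numbers chosen to lie on a straight line, and the helper `fit_line` is illustrative only; it shows what "finding the relationship" between one feature and the target means, not how Elastic's regression works internally.

```python
# Minimal sketch: ordinary least-squares line for rent as a function of size.
# The data below is hypothetical, made up for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the line passes through the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


sizes = [30.0, 45.0, 60.0, 75.0, 90.0]            # apartment size in square meters (hypothetical)
rents = [800.0, 1050.0, 1300.0, 1550.0, 1800.0]   # monthly rent (hypothetical)

slope, intercept = fit_line(sizes, rents)
print(f"rent ~ {slope:.2f} * size + {intercept:.2f}")
print(f"predicted rent for 70 square meters: {slope * 70 + intercept:.2f}")
```

Adding more features (for example, number of rooms) turns this into a multi-dimensional problem, where a straight line becomes a hyperplane or, with tree-based models like the one described below, a non-linear function.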

Evaluation of the Regression analysis

For regression analysis we use a variation of the XGBoost algorithm, which combines decision trees with gradient boosting.
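The core idea of gradient boosting can be sketched in a few lines. This is a simplified illustration, not Elastic's or XGBoost's implementation: it assumes squared-error loss, a single feature, and depth-1 trees (stumps), and the helpers `fit_stump` and `gradient_boost` are hypothetical. Each new tree is fit to the residuals left by the trees before it, and its output is shrunk by a learning rate before being added to the ensemble.

```python
# Simplified gradient boosting with decision stumps on squared-error loss.
# Illustration only: real implementations grow deeper trees over many features
# and add regularization.

def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    # Use every distinct value except the largest, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Return a prediction function built from n_trees boosted stumps."""
    base = sum(ys) / len(ys)  # start from the mean of the target
    trees = []
    predictions = [base] * len(ys)
    for _ in range(n_trees):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


sizes = [30.0, 45.0, 60.0, 75.0, 90.0]            # hypothetical feature values
rents = [800.0, 1050.0, 1300.0, 1550.0, 1800.0]   # hypothetical targets
model = gradient_boost(sizes, rents)
print([round(model(x)) for x in sizes])
```

The small learning rate means each tree corrects only part of the remaining error, so more trees are needed, but the ensemble tends to generalize better than one that fits the residuals in a single greedy step.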

The two measures that we can use to evaluate regression in the stack are R² (R squared) and MSE:


  • R Squared (usually between 0 and 1 | the higher the better): measures goodness of fit, that is, whether the target variable can or can't be explained by the feature variables; 1 is a perfect fit. A value of 0 means the model does no better than always predicting the mean of the target, and the value can even be negative for a model that does worse than that.

  • Mean Squared Error (0 or greater, expressed in squared units of the target | the lower the better): measures the average squared error between the actual data points and the predicted data points, telling you how close a regression line is to a set of points. It does this by taking the distances from the points (purple dots) to the regression line (blue line), which are the "errors" (red lines), squaring them, and averaging the results.
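Both measures are straightforward to compute by hand. The sketch below uses hypothetical actual and predicted rents; the function names are illustrative and not part of any Elastic API. It also shows the edge cases of R²: always predicting the mean gives 0, and a model worse than that goes negative.

```python
# Minimal sketch of the two regression metrics on hypothetical data.

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [800.0, 1050.0, 1300.0, 1550.0, 1800.0]
predicted = [850.0, 1000.0, 1300.0, 1600.0, 1750.0]

print(mean_squared_error(actual, predicted))   # in squared rent units
print(r_squared(actual, predicted))            # close to 1: a good fit

# Always predicting the mean scores 0; predicting the reverse order scores below 0.
print(r_squared(actual, [1300.0] * 5))
print(r_squared(actual, list(reversed(actual))))
```

Because MSE is in squared units, it has no fixed upper bound and is easiest to interpret when compared across models trained on the same target.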


We can therefore measure how well our model is performing by computing the Mean Squared Error: the average of the squared differences between the true and predicted values.

When you view the regression results in Kibana, you get information about the analysis, the model evaluation metrics, the total feature importance values, and a scatterplot matrix.
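The same evaluation metrics can also be requested outside Kibana through the evaluate data frame analytics API (`POST _ml/data_frame/_evaluate`). The sketch below only builds and prints a request body; it is based on our understanding of that API and should be checked against the documentation for your Elasticsearch version. The index name `apartments-regression` and the dependent variable `rent` are hypothetical, and `ml.rent_prediction` assumes the default results field (`ml`) and default prediction field name (`<dependent_variable>_prediction`).

```python
import json

# Hypothetical names: "apartments-regression" is the destination index of the
# regression job and "rent" is its dependent variable. Verify field names and
# available metrics against the docs for your Elasticsearch version.
request_body = {
    "index": "apartments-regression",
    # Evaluate only documents that were NOT used for training.
    "query": {"term": {"ml.is_training": False}},
    "evaluation": {
        "regression": {
            "actual_field": "rent",
            "predicted_field": "ml.rent_prediction",
            "metrics": {"r_squared": {}, "mse": {}},
        }
    },
}

print("POST _ml/data_frame/_evaluate")
print(json.dumps(request_body, indent=2))
```

Filtering on `ml.is_training` keeps the evaluation on held-out data, which gives a more honest estimate than scoring the documents the model was trained on.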


This post is part of a series that covers Artificial Intelligence with a focus on the Machine Learning solution from Elastic (the creators of Elasticsearch), aiming to introduce and illustrate the possibilities and options available, in addition to addressing context and usability.
