DEV Community

Priscilla Parodi for Elastic

Data preparation for Data Frame Analysis with Transforms

| Menu | Next Post: Trained Models for Supervised Learning |

When you work with Data Frames (multivariate analysis), Transforms can help in the data preparation step.

A transform converts existing Elasticsearch indexes into summary indexes. You define a pivot, which is a set of features that reshapes the index into a different, more digestible format, opening up opportunities for new insights and analysis.

Under the hood, a transform runs search aggregations on the source index and indexes the results into the destination index. A transform therefore never takes less time or uses fewer resources than the aggregation and indexing it performs.

You can decide whether you want the transform to run once or continuously.
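To make the pieces above concrete, here is a minimal sketch of a request body for Elasticsearch's create-transform API (`PUT _transform/<transform_id>`), built as a Python dictionary. The index names `reviews` and `reviews-result` match the example in this post. The `timestamp` field, the `60s` delay and the `5m` frequency are assumptions made for illustration, and the helper `build_transform_body` is a hypothetical name, not part of any Elastic client.

```python
import json


def build_transform_body(continuous: bool = False) -> dict:
    """Build a request body for PUT _transform/<transform_id> (illustrative sketch)."""
    body = {
        "source": {"index": "reviews"},
        "dest": {"index": "reviews-result"},
        "pivot": {
            # One destination document per distinct user-id.
            "group_by": {"user-id": {"terms": {"field": "user-id"}}},
            # Aggregations computed for each group.
            "aggregations": {
                "num_reviews": {"value_count": {"field": "review"}},
                "avg_review": {"avg": {"field": "review"}},
            },
        },
    }
    if continuous:
        # Assumption: source documents carry a date field named "timestamp".
        # Without a "sync" section the transform runs once (batch mode).
        body["sync"] = {"time": {"field": "timestamp", "delay": "60s"}}
        body["frequency"] = "5m"
    return body


if __name__ == "__main__":
    print(json.dumps(build_transform_body(continuous=True), indent=2))
```

The only difference between the two modes in this sketch is the `sync` section: leave it out for a one-off run, include it for a continuous transform.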

In this example we have 3 documents from a source index that stores reviews, with these fields: user-id, vendor and review.

Source Index (reviews)
{
...
user-id: 123,
vendor: "abc",
review: 4
},
{
...
user-id: 123,
vendor: "def",
review: 3
},
{
...
user-id: 123,
vendor: "ghi",
review: 5
}

With Transforms we can have a Destination Index grouped by user-id, for example, with the number of reviews per user (3 reviews in this case) and a simple average of the reviews: (4+3+5)/3 = 4.

Destination Index (reviews-result)
{
...
user-id: 123,
num_reviews(count): 3,
avg_review: 4
}
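
The grouping and arithmetic behind this destination document can be checked with a few lines of plain Python. This is only a local sketch of what the pivot computes, not how Elasticsearch runs it. The function name `pivot_by_user` is hypothetical, and the elided fields (`...`) of the source documents are left out.

```python
from collections import defaultdict

# The three source documents from the example (elided fields omitted).
reviews = [
    {"user-id": 123, "vendor": "abc", "review": 4},
    {"user-id": 123, "vendor": "def", "review": 3},
    {"user-id": 123, "vendor": "ghi", "review": 5},
]


def pivot_by_user(docs):
    """Group documents by user-id, then count and average the review scores."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["user-id"]].append(doc["review"])
    return [
        {
            "user-id": user_id,
            "num_reviews": len(scores),
            "avg_review": sum(scores) / len(scores),
        }
        for user_id, scores in grouped.items()
    ]


print(pivot_by_user(reviews))
# → [{'user-id': 123, 'num_reviews': 3, 'avg_review': 4.0}]
```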

If the transform runs continuously, the destination index is updated as new reviews arrive. The same approach works with other aggregations, e.g., sum, max, or cardinality, so we can shape the data in the way the analysis needs.
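
As a rough illustration of the other aggregations and of a summary changing when new data arrives, the sketch below recomputes sum, max and vendor cardinality before and after one extra review is added. The fourth review is a hypothetical document invented for this example, and `summarize` is a hypothetical helper. A real continuous transform works with checkpoints and only updates the entities that changed, and Elasticsearch's cardinality aggregation is approximate, whereas this sketch recomputes everything and counts exactly.

```python
def summarize(docs):
    """Compute sum, max and distinct-vendor count over a user's reviews."""
    scores = [doc["review"] for doc in docs]
    return {
        "sum_review": sum(scores),
        "max_review": max(scores),
        "vendor_cardinality": len({doc["vendor"] for doc in docs}),
    }


reviews = [
    {"user-id": 123, "vendor": "abc", "review": 4},
    {"user-id": 123, "vendor": "def", "review": 3},
    {"user-id": 123, "vendor": "ghi", "review": 5},
]

before = summarize(reviews)

# Hypothetical new review for a vendor the user has already rated.
reviews.append({"user-id": 123, "vendor": "abc", "review": 2})
after = summarize(reviews)

print(before)  # → {'sum_review': 12, 'max_review': 5, 'vendor_cardinality': 3}
print(after)   # → {'sum_review': 14, 'max_review': 5, 'vendor_cardinality': 3}
```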

This post is part of a series that covers Artificial Intelligence with a focus on Elastic's (Creators of Elasticsearch) Machine Learning solution, aiming to introduce and exemplify the possibilities and options available, in addition to addressing the context and usability.