Hana Wang

FeatInsight: Leveraging OpenMLDB for Highly Efficient Feature Management and Orchestration

The OpenMLDB community has recently released FeatInsight (https://github.com/4paradigm/FeatInsight), a new open-source feature platform.

FeatInsight is a sophisticated feature store service, leveraging OpenMLDB for efficient feature computation, management, and orchestration.

FeatInsight provides a user-friendly interface that covers the entire feature engineering process for machine learning: data import, viewing, and updating; feature generation and storage; and online deployment. For offline scenarios, users can select features to generate training samples for ML training; for online scenarios, users can deploy online feature services for real-time feature computation.

Key Features

FeatInsight aims to address common challenges in machine learning development: making feature extraction, transformation, combination, and selection quick and easy; managing feature lineage; enabling feature reuse and sharing; version-controlling feature services; and keeping the feature data used in training and inference consistent and reliable. Application scenarios include the following:

  • Online Feature Service Deployment: Provides high-performance feature storage and online feature computation functions for localized deployment.

  • MLOps Platform: Establishes an MLOps workflow on top of OpenMLDB's online-offline consistent computation.

  • FeatureStore Platform: Provides comprehensive feature extraction, deletion, online deployment, and lineage management functionality to achieve low-cost local FeatureStore services.

  • Open-Source Feature Solution Reuse: Lets users reuse open-source feature solutions locally, enabling feature reuse and sharing.

  • Business Component for Machine Learning: Provides a one-stop feature engineering solution for machine learning models in recommendation systems, natural language processing, finance, healthcare, and other areas of machine learning implementation.

For more details, refer to the FeatInsight documentation.

QuickStart

We will use a simple example to show how to perform feature engineering with FeatInsight. The process has four steps: data import, feature creation, offline sample export, and online feature service.

1. Data Import

First, create the database test_db and the table test_table. You can do this with SQL:

CREATE DATABASE test_db;
CREATE TABLE test_db.test_table (id STRING, trx_time DATE);

Alternatively, you can create them in the UI under “Data Import”.

For easier testing, we prepare a CSV file and save it to /tmp/test_table.csv. Note that this path is local to the machine that runs the OpenMLDB TaskManager, which is usually also the machine that runs FeatInsight. You will need access to that machine to create or edit the file.

id,trx_time
user1,2024-01-01
user2,2024-01-02
user3,2024-01-03
user4,2024-01-04
user5,2024-01-05
user6,2024-01-06
user7,2024-01-07
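
If you would rather script this step than edit the file by hand, a minimal Python sketch like the one below writes the same seven rows. The helper name write_sample_csv is made up for illustration and is not part of FeatInsight or OpenMLDB; run it on the machine that hosts the TaskManager so that the path resolves there.

```python
import csv

# Sample rows from the walkthrough: (id, trx_time)
SAMPLE_ROWS = [(f"user{i}", f"2024-01-{i:02d}") for i in range(1, 8)]


def write_sample_csv(path="/tmp/test_table.csv"):
    """Write the header plus the seven sample rows to `path` (hypothetical helper)."""
    with open(path, "w", newline="") as f:
        # Use plain "\n" line endings instead of the csv module's default "\r\n".
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["id", "trx_time"])
        writer.writerows(SAMPLE_ROWS)
    return path
```

Calling write_sample_csv() with no arguments writes to /tmp/test_table.csv, the path used in the rest of this walkthrough.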

For online scenarios, you can use the LOAD DATA or INSERT command. Here we use “Import from CSV”.

The imported data can be previewed.

For offline scenarios, you can also use LOAD DATA or “Import from CSV”.

Wait about half a minute for the task to finish. You can also check its status and log.
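
For readers who prefer the SQL route mentioned above over the UI, the statements could be assembled roughly as follows. This is only a sketch: the helper names are hypothetical, and the OPTIONS keys (format, header, mode) and the file:// prefix reflect our reading of the OpenMLDB LOAD DATA syntax, so check them against the documentation for your version before running the output.

```python
def insert_statements(rows, table="test_db.test_table"):
    """Build one INSERT per (id, trx_time) row, quoting both values as string literals."""
    return [f"INSERT INTO {table} VALUES ('{rid}', '{day}');" for rid, day in rows]


def load_data_statement(path="/tmp/test_table.csv", table="test_db.test_table"):
    """Build a LOAD DATA statement for a local CSV file with a header row."""
    # The OPTIONS below are assumptions; verify them for your OpenMLDB version.
    return (
        f"LOAD DATA INFILE 'file://{path}' INTO TABLE {table} "
        "OPTIONS (format='csv', header=true, mode='append');"
    )


if __name__ == "__main__":
    print(load_data_statement())
    for stmt in insert_statements([("user1", "2024-01-01"), ("user2", "2024-01-02")]):
        print(stmt)
```

The strings are built by plain interpolation, which is fine for fixed test data like this but should not be used with untrusted input.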

2. Feature Creation

Once the data is imported, we can create features. Here we use SQL to create two basic features.

SELECT id, dayofweek(trx_time) as trx_day FROM test_table
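
To get a feel for the values to expect in the feature preview, the same computation can be sketched in plain Python. This assumes that dayofweek follows the MySQL-style convention (1 = Sunday through 7 = Saturday); confirm this against the OpenMLDB function reference for your version. The function name trx_day simply mirrors the column alias.

```python
from datetime import date


def trx_day(d: date) -> int:
    """Day of week with 1 = Sunday ... 7 = Saturday (assumed convention)."""
    # isoweekday() returns 1 = Monday ... 7 = Sunday, so shift by one day.
    return d.isoweekday() % 7 + 1


# The seven sample rows from the CSV file, keyed by id.
sample = {f"user{i}": date(2024, 1, i) for i in range(1, 8)}
expected = {uid: trx_day(d) for uid, d in sample.items()}
print(expected)
# {'user1': 2, 'user2': 3, 'user3': 4, 'user4': 5, 'user5': 6, 'user6': 7, 'user7': 1}
```

Under that assumption, 2024-01-01 (a Monday) maps to 2 and 2024-01-07 (a Sunday) maps to 1.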

On the “Features” page, click the button beside “All Features” to create new features, then fill in the form accordingly.

After successful creation, you can check the features. Click on the name to go into details. You can check the basic information, as well as preview feature values.

3. Offline Samples Export

In “Offline Scenario”, you can export offline samples. Choose the features to export and specify the export path. Under “More Options”, you can specify the file format and other advanced parameters.

Wait for about half a minute and you can check the status at “Offline Samples”.

You can check the content of the exported samples. To verify the online-offline consistency that FeatInsight provides, record the result and compare it with the online feature computation results.

4. Online Feature Service

On the “Feature Services” page, click the button beside “All Feature Services” to create a new feature service. Choose the features to deploy, and fill in the service name and version accordingly.

After successful creation, you can check service details, including the feature list, dependent tables, and lineage.

Lastly, on the “Request Feature Service” page, we can enter test data to perform online feature computation and compare the output with the offline computation results.
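
The comparison itself can be done by eye for seven rows, but a small script scales better. The sketch below assumes you have already loaded the offline export and the online responses into lists of dictionaries keyed by id; the function name find_mismatches and the row layout are illustrative and not part of FeatInsight.

```python
def find_mismatches(offline_rows, online_rows, key="id"):
    """Return {key: (offline_row, online_row)} for rows that differ or exist on one side only."""
    offline = {row[key]: row for row in offline_rows}
    online = {row[key]: row for row in online_rows}
    return {
        k: (offline.get(k), online.get(k))
        for k in sorted(offline.keys() | online.keys())
        if offline.get(k) != online.get(k)
    }


if __name__ == "__main__":
    offline = [{"id": "user1", "trx_day": 2}, {"id": "user2", "trx_day": 3}]
    online = [{"id": "user1", "trx_day": 2}, {"id": "user2", "trx_day": 3}]
    # An empty dict means the two sides agree on every row.
    print(find_mismatches(offline, online))
```

Values read from an exported CSV file arrive as strings, so convert both sides to the same types before comparing.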

Summary

This example demonstrates the complete process of using FeatInsight. By writing simple SQL statements, users can define features for both online and offline scenarios. By selecting different features or combining feature sets, users can quickly reuse and deploy feature services. Lastly, the consistency of feature computation can be validated by comparing offline and online calculation results.

To learn more about how to use FeatInsight and its application scenarios, refer to Application Scenarios.

Appendix: Advanced Functions

Beyond basic feature engineering, FeatInsight also provides advanced functions that make feature development easier:

  • SQL Playground: Offers debugging and execution capabilities for OpenMLDB SQL statements, allowing users to execute arbitrary SQL operations and debug SQL statements for feature extraction.

  • Computed Features: Enables the direct storage of feature values obtained through external batch computation or stream processing into OpenMLDB online tables. Users can then access and manipulate feature data in online tables.

This post is a re-post from OpenMLDB Blogs.