DEV Community

leo


The future of OpenGauss-AI is here

The two-day training camp was eye-opening and refreshing. OpenGauss benefits greatly from its extreme multi-core performance, full-link business and data security, AI-based tuning, and efficient operation and maintenance. What OpenGauss can do in terms of AI autonomy was the real surprise: self-tuning, self-diagnosis, self-healing, and self-assembly are the major forms of autonomy it offers DBAs. I believe that for many DBAs, tuning is a process that costs a lot of hair. Picture this scene.

Interviewer: So-and-so, your resume says that you can tune databases. How do you tune one?

So-and-so: Add an index.

Interviewer: Any more?

So-and-so: No more.

Interviewer: Do you know where our company's door is? Will you leave by yourself, or shall I see you out?

I imagined this interview scene, but doesn't it feel real? Whenever a database appears on a resume, there is a line near the end claiming that the candidate can optimize it. But here comes the question: as soon as the interviewer asks about database tuning, everyone says "add indexes." Do you know anything besides adding indexes? Do you even know all the index-related points: clustered, non-clustered, ordinary, and unique indexes, the change buffer, table locks, row locks, gap locks, and the maximum TPS under concurrent row locking? And why does the wrong index sometimes get selected? Do you know all of this?

So tuning has never been a simple matter. Whether it is optimizing the hardware architecture, the database parameters, or the SQL statements, none of it is achieved overnight. OpenGauss, on the other hand, simplifies tuning. Historical information collection and visualization, SQL rewriting, slow SQL diagnosis and suggestions, index recommendation, parameter recommendation, parameter performance estimation, trend prediction (AI load and performance prediction), capacity expansion prompts (AI storage space prediction), health index, risk warnings, comprehensive reports, DB system fault detection and early warning, business portrait intrusion threat detection, and so on: these are just one corner of the autonomy of OpenGauss AI. The following are the OpenGauss AI capabilities that I have experienced.

AI in DBMS: Introduction to Parameter Tuning and Diagnosis Capability
Tuning parameter list: preset for different scenarios; users can also configure it based on their own experience.
Tuning method: combines deep reinforcement learning with global optimization algorithms for fine-grained tuning of different types of parameters.
Tuning effect evaluation: observe the benchmark results. The preset benchmarks are rich, simple, and extensible.
Overview of offline parameter tuning process:

  1. Use a priori rules distilled from long-term parameter tuning experience to diagnose the parameter configuration and generate a database workload report;
  2. Recommend an initial parameter configuration based on the system workload and environment information, including recommended values plus recommended maximum and minimum values (to ensure stability, and so that users can choose based on their own experience);
  3. Tune with the trained reinforcement learning model, or use the global optimization algorithm to search within the given parameter space;
  4. Evaluate the tuning effect. The conventional way is to run benchmarks to get feedback. The tuning framework not only supports conventional benchmarks such as TPC-C and TPC-H, but also provides a framework for custom benchmarks that users can adapt with a small amount of work. Parameter tuning based on a Performance Model is still evolving, and it will further accelerate tuning.
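Steps 3 and 4 above can be sketched as a search loop that proposes a configuration, runs a benchmark, and keeps the best result. This is only a minimal illustration: it uses plain random search as a stand-in for the global optimization algorithm, and the parameter names, ranges, and the synthetic `run_benchmark` function are assumptions made for the example, not OpenGauss's actual tuning tool or API.

```python
import random

# Hypothetical parameter space: name -> (min, max). Names and ranges are illustrative only.
PARAM_SPACE = {
    "shared_buffers_mb": (128, 8192),
    "work_mem_mb": (4, 512),
}


def run_benchmark(config):
    """Stand-in for a real benchmark run such as TPC-C; higher score is better.

    This synthetic score peaks at shared_buffers_mb=4096 and work_mem_mb=64.
    """
    return (-((config["shared_buffers_mb"] - 4096) / 4096) ** 2
            - ((config["work_mem_mb"] - 64) / 512) ** 2)


def random_search(space, benchmark, initial, trials=200, seed=0):
    """Sample configurations inside the given bounds and keep the best one seen."""
    rng = random.Random(seed)
    best_config, best_score = dict(initial), benchmark(initial)
    for _ in range(trials):
        candidate = {name: rng.randint(lo, hi) for name, (lo, hi) in space.items()}
        score = benchmark(candidate)
        if score > best_score:
            best_config, best_score = candidate, score
    return best_config, best_score


# Start from a (hypothetical) recommended initial configuration, as in step 2.
initial_config = {"shared_buffers_mb": 128, "work_mem_mb": 4}
best_config, best_score = random_search(PARAM_SPACE, run_benchmark, initial_config)
print(best_config, best_score)
```

Because every candidate stays inside the recommended minimum and maximum, the search cannot wander into unstable settings, which is the role the bounds play in step 2.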

AI in DBMS: Introduction to Slow SQL Discovery Capability

  1. Pre-detection of online services: before launching a batch of new services, use the SQL diagnostic function to estimate their execution time, so that users can decide whether the services should be modified.
  2. Workload analysis: analyzes the existing workload, automatically divides it into several categories, and analyzes the SQL statement execution cost of each category in turn, as well as the similarity between categories;
  3. SQL perspective: visualizes the workload, conveying the similarity between SQL statements through colors and distances, so that users can intuitively analyze the characteristics of SQL statements.
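To make the workload analysis idea concrete, here is a small sketch of grouping statements into categories and measuring the similarity between categories. It assumes a very simple approach (literals replaced by placeholders, Jaccard similarity over tokens); the function names are hypothetical and the source does not say which method OpenGauss actually uses.

```python
import re
from collections import defaultdict


def normalize(sql):
    """Replace literals with '?' so statements differing only in constants match."""
    sql = sql.strip().rstrip(";").lower()
    sql = re.sub(r"'[^']*'", "?", sql)           # string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql)


def group_by_template(statements):
    """Map each normalized template to the original statements that share it."""
    groups = defaultdict(list)
    for statement in statements:
        groups[normalize(statement)].append(statement)
    return dict(groups)


def jaccard(template_a, template_b):
    """Token-set similarity between two templates, between 0.0 and 1.0."""
    tokens_a, tokens_b = set(template_a.split()), set(template_b.split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


workload = [
    "SELECT * FROM orders WHERE id = 1;",
    "SELECT * FROM orders WHERE id = 42;",
    "SELECT name FROM users WHERE city = 'Paris'",
]
groups = group_by_template(workload)
templates = sorted(groups)
similarity = jaccard(templates[0], templates[1])
print(len(groups), round(similarity, 3))
```

The two `orders` lookups collapse into one category, and the similarity score is the kind of distance a visualization could map to colors or positions.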

AI in Kernel: Query performance prediction based on online learning by AI optimizer
Data collection -> feature extraction -> model training/inference -> multi-task learning
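The first three stages of this pipeline can be sketched with a tiny online model. This is a hedged illustration only: the plan features (`rows_k`, `joins`), the synthetic latency formula, and the linear model trained by stochastic gradient descent are assumptions chosen for brevity, not the model the AI optimizer actually uses, and multi-task learning is left out.

```python
class OnlineLatencyModel:
    """Linear model updated one executed query at a time with SGD."""

    def __init__(self, n_features, lr=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, actual):
        """One gradient step on the squared error for a single observation."""
        err = self.predict(x) - actual
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return err


def extract_features(plan):
    # Hypothetical plan summary: rows scanned (in thousands) and number of joins.
    return [plan["rows_k"], plan["joins"]]


def observed_latency(plan):
    # Synthetic "ground truth" used only to generate training data for the sketch.
    return 2.0 * plan["rows_k"] + 5.0 * plan["joins"] + 1.0


# Data collection: a stream of executed plans, simulated by replaying a small set.
plans = [
    {"rows_k": 1, "joins": 0}, {"rows_k": 2, "joins": 1}, {"rows_k": 3, "joins": 0},
    {"rows_k": 4, "joins": 2}, {"rows_k": 5, "joins": 1}, {"rows_k": 1, "joins": 3},
]
model = OnlineLatencyModel(n_features=2)
for _ in range(2000):
    for plan in plans:
        model.update(extract_features(plan), observed_latency(plan))

# Inference: predict the latency of a plan before running it.
predicted = model.predict(extract_features({"rows_k": 3, "joins": 1}))
print(round(predicted, 2))
```

The point of the online form is that each finished query improves the next prediction without retraining from scratch.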

AI in DBMS: Database Monitoring and Anomaly Detection
Database metrics are an important measure of database health and user behavior. Abnormal behavior in the database may show up as abnormal metrics, so the metrics need to be monitored effectively.
Database status monitoring means all-round, real-time monitoring of database operating metrics. The system can discover and identify database anomalies and potential performance problems and report them to users in a timely manner. Statistical analysis reports on the various operating metrics help administrators, operation and maintenance personnel, and decision makers understand the operating status of the database from multiple perspectives, so that they can better respond to the database's needs and plan for it.
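As a rough illustration of metric-based anomaly detection, the sketch below flags a sample when it deviates strongly from the recent history of the same metric. The trailing-window z-score rule, the window size, the threshold, and the sample CPU values are all assumptions for the example; the source does not describe the detection algorithm OpenGauss uses.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value is more than `threshold` standard deviations
    away from the mean of the previous `window` samples."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical CPU usage samples (percent); index 6 is a sudden spike.
cpu_usage = [40, 42, 41, 39, 40, 41, 95, 40, 42, 41]
anomalies = detect_anomalies(cpu_usage)
print(anomalies)  # [6]
```

Note that the spike itself enters the history afterwards and inflates the standard deviation for the next few samples, one reason real monitoring systems use more robust statistics.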

AI in Kernel: Introduction to the index recommendation capability at the single Query/Workload level
Based on the overall information about the user's workload, it recommends indexes for the user to create.
The core method of single-query index recommendation:
Applying established theory on index design and optimization, it analyzes the clauses and predicates of the query statement using the native lexical and syntactic analysis, and then combines field selectivity, aggregation conditions, multi-table join relationships, and so on to output the final recommendations.
The method of index performance verification:
The corresponding data structures of the optimizer are modified so that the optimizer itself evaluates how the execution plan would change once the index exists. This process does not need to actually create the index. This is the so-called "hypothetical index", a method widely used in the industry.
The core method of workload-level index recommendation:
Using workload information supplied by the user (or collected automatically) and a preset model, it further evaluates the impact of creating each index on the overall workload, so as to filter the core indexes out of the candidate indexes. This extends recommendation from a single query to the whole workload.
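The hypothetical-index and workload-level ideas can be sketched together: estimate the cost of each query with and without a candidate index, never building the index, and greedily keep the candidates that lower the total workload cost the most. Everything here is an assumption made for illustration, including the toy cost formula, the workload tuples, and the greedy strategy; in the real system the estimates come from the optimizer rather than a formula.

```python
import math

# Hypothetical workload: (table, filter_column, table_rows, selectivity, frequency).
WORKLOAD = [
    ("orders", "customer_id", 1_000_000, 0.001, 50),
    ("orders", "status", 1_000_000, 0.5, 2),
    ("users", "email", 100_000, 0.00001, 30),
]


def query_cost(rows, selectivity, indexed):
    """Toy cost model: full scan without an index, lookup plus matching rows with one."""
    if indexed:
        return math.log2(rows) + rows * selectivity
    return float(rows)


def workload_cost(workload, indexes):
    """Frequency-weighted cost, treating `indexes` as if they existed (hypothetical)."""
    return sum(freq * query_cost(rows, sel, (table, col) in indexes)
               for table, col, rows, sel, freq in workload)


def recommend_indexes(workload, max_indexes=2):
    """Greedily add the candidate index that reduces total workload cost the most."""
    candidates = {(table, col) for table, col, *_ in workload}
    chosen = set()
    while len(chosen) < max_indexes:
        base = workload_cost(workload, chosen)
        best = min(candidates - chosen,
                   key=lambda idx: workload_cost(workload, chosen | {idx}),
                   default=None)
        if best is None or workload_cost(workload, chosen | {best}) >= base:
            break
        chosen.add(best)
    return chosen


recommended = recommend_indexes(WORKLOAD)
print(sorted(recommended))
```

With this workload the low-selectivity, rarely run `status` filter loses out to the two selective, frequent lookups, which shows why a per-query recommendation and a workload-level recommendation can differ.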

AI in SQL: AI in the whole process, building an end-to-end DB4AI system in the database, without manual participation
Key components description:

Automatic Feature Engineering: Automatically extracts useful and meaningful features from the relevant data tables, reducing the time required for feature engineering. Processing that used to take weeks is reduced to days.
Optimal model selection: Among multiple models, selects the model that best suits the data.
Hyperparameter optimization: After a suitable model is selected, sets its optimal parameters.
Optimization algorithm selection: An optimization algorithm (such as SGD, L-BFGS, or GD) is automatically selected to achieve a balance between efficiency and accuracy.
Deep Learning Support: Supports mainstream deep learning frameworks.
Model Management: Records model accuracy and other related information. Adds model life cycle management and supports incremental training, model updates, and model import and export.
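The "optimal model selection" component can be illustrated with a minimal holdout comparison: fit several candidate models on training data and keep the one with the lowest validation error. The two candidate models, the data, and the function names below are assumptions for the sketch and say nothing about how the DB4AI system chooses models internally.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Least-squares line through the training points."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_model(candidates, train, valid):
    """Fit every candidate on `train` and return (name, error) of the best on `valid`."""
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical data with a roughly linear relationship.
train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
valid = ([5, 6], [10.1, 11.9])
best_name, best_error = select_model({"mean": fit_mean, "linear": fit_linear}, train, valid)
print(best_name, round(best_error, 4))
```

Hyperparameter optimization follows the same pattern, with the candidates being different settings of one model rather than different models.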
All of the above reflects the development and innovation of OpenGauss in the direction of AI autonomy. Times are progressing, and database technology changes with each passing day. I believe that OpenGauss will definitely hold a place in this field!
