To help users minimize related costs, the 5.0.0 GA version also includes many API-level optimizations. Community feedback indicated that the Data Sharding API was too complex and difficult to understand. After discussing it with the community, we decided to provide a brand-new Data Sharding API in the new GA version.
As Apache ShardingSphere's positioning changed from database middleware to a distributed database ecosystem, we needed to provide transparent data sharding. Specifically, 5.0.0 GA introduces the Auto Sharding Strategy: users no longer need to worry about the details of databases and tables, because they can simply specify the number of shards. The new pluggable architecture and enhanced functions such as shadow database stress testing also required corresponding adjustments to the kernel function APIs. This section introduces the adjustments made to each API.
Data Sharding API
After the 4.x versions were released, users often told us in the community that the data sharding API was too complex and hard to use. The code block below shows the data sharding configuration in the 4.1.1 GA version. The old version had five sharding strategies, namely `standard`, `complex`, `inline`, `hint`, and `none`. Users found it difficult to understand and use the different parameters of the different sharding strategies.
shardingRule:
  tables:
    t_order:
      databaseStrategy:
        standard:
          shardingColumn: order_id
          preciseAlgorithmClassName: xxx
          rangeAlgorithmClassName: xxx
        complex:
          shardingColumns: year, month
          algorithmClassName: xxx
        hint:
          algorithmClassName: xxx
        inline:
          shardingColumn: order_id
          algorithmExpression: ds_${order_id % 2}
        none:
      tableStrategy:
        ...
In the 5.0.0 GA version, we simplify the sharding strategies in the Data Sharding API. First, the original `inline` strategy is removed, and we retain the remaining four sharding strategies: `standard`, `complex`, `hint`, and `none`.
At the same time, the Sharding Algorithm is extracted from the Sharding Strategy. Users now define algorithms under the `shardingAlgorithms` property and reference them from a Sharding Strategy via `shardingAlgorithmName`.
- !SHARDING
  tables:
    t_order:
      databaseStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: database_inline
        complex:
          shardingColumns: year, month
          shardingAlgorithmName: database_complex
        hint:
          shardingAlgorithmName: database_hint
        none:
      tableStrategy:
        ...
  shardingAlgorithms:
    database_inline:
      type: INLINE
      props:
        algorithm-expression: ds_${order_id % 2}
    database_complex:
      type: CLASS_BASED
      props:
        strategy: COMPLEX
        algorithmClassName: xxx
    database_hint:
      type: CLASS_BASED
      props:
        strategy: HINT
        algorithmClassName: xxx
The code block above is the new configuration, which differs from the Sharding configuration in the 4.1.1 GA version. The new sharding API is more concise and clear.
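To illustrate how a strategy refers to an algorithm by name, here is a minimal Python sketch. It is not ShardingSphere code: the dictionary layout and the `route_database` helper are hypothetical, and the inline expression `ds_${order_id % 2}` is modelled as a plain Python lambda rather than evaluated by the real expression engine.

```python
# Hypothetical model of the 5.0.0 layout: algorithms are defined once,
# and strategies point at them through shardingAlgorithmName.
sharding_algorithms = {
    # Stands in for type INLINE with algorithm-expression ds_${order_id % 2}
    "database_inline": lambda order_id: f"ds_{order_id % 2}",
}

database_strategy = {
    "standard": {
        "shardingColumn": "order_id",
        "shardingAlgorithmName": "database_inline",
    }
}


def route_database(strategy, algorithms, row):
    """Look up the named algorithm and apply it to the sharding column value."""
    standard = strategy["standard"]
    algorithm = algorithms[standard["shardingAlgorithmName"]]
    return algorithm(row[standard["shardingColumn"]])


print(route_database(database_strategy, sharding_algorithms, {"order_id": 7}))
```

With `order_id` 7 the sketch prints `ds_1`. Because the strategy only holds a name, the same algorithm entry could be reused by several tables.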
To help users reduce configuration workload, Apache ShardingSphere provides many built-in sharding algorithms; users can also plug in custom implementations via the `CLASS_BASED` sharding algorithm.
To implement transparent data sharding, we add Automated Sharding Strategy into the 5.0.0 GA version. The code block below shows you the difference between Automated Sharding Strategy configuration and manual sharding strategy configuration:
rules:
- !SHARDING
  autoTables:
    # Automated Sharding Strategy
    t_order:
      actualDataSources: ds_0, ds_1
      shardingStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: auto_mod
      keyGenerateStrategy:
        column: order_id
        keyGeneratorName: snowflake
  shardingAlgorithms:
    auto_mod:
      type: MOD
      props:
        sharding-count: 4
  tables:
    # Manual Sharding Strategy
    t_order:
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      tableStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: table_inline
      databaseStrategy:
        standard:
          shardingColumn: user_id
          shardingAlgorithmName: database_inline
The Automated Sharding Strategy must be configured under the `autoTables` attribute. Users only need to specify the data sources used for storage and the number of shards via an automated sharding algorithm. They no longer need to set the data distribution manually through `actualDataNodes`, or to configure separate database and table sharding strategies, because Apache ShardingSphere manages data sharding automatically.
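The idea behind the `MOD` configuration above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, and both the round-robin placement of shards over `ds_0` and `ds_1` and the `t_order_<index>` naming are assumptions about how the shards might be laid out, not a description of ShardingSphere internals.

```python
def auto_table_nodes(logic_table, data_sources, sharding_count):
    """Spread sharding_count actual tables over the data sources (assumed round-robin)."""
    return [
        f"{data_sources[i % len(data_sources)]}.{logic_table}_{i}"
        for i in range(sharding_count)
    ]


def route_mod(logic_table, data_sources, sharding_count, sharding_value):
    """Pick the shard whose index equals sharding_value modulo sharding_count."""
    nodes = auto_table_nodes(logic_table, data_sources, sharding_count)
    return nodes[sharding_value % sharding_count]


# Mirrors actualDataSources: ds_0, ds_1 and sharding-count: 4
print(auto_table_nodes("t_order", ["ds_0", "ds_1"], 4))
print(route_mod("t_order", ["ds_0", "ds_1"], 4, 10))
```

Under these assumptions, an `order_id` of 10 lands in shard index 2. The user supplies only the data sources and the shard count; the node list is derived rather than written out by hand.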
We also remove `defaultDataSourceName` from the Data Sharding API. As we have repeatedly highlighted, Apache ShardingSphere is now a distributed database ecosystem: users consume the services it provides directly, as if they were using a traditional database, without having to be aware of the underlying database storage. Apache ShardingSphere's built-in `SingleTableRule` manages single tables that are not sharded, automatically loading and routing them.
Additionally, to further simplify configuration, `defaultShardingColumn` is added as the default sharding key, alongside the `defaultDatabaseStrategy` and `defaultTableStrategy` sharding strategies in the Data Sharding API.
When multiple tables share the same sharding key, users only need to configure `defaultShardingColumn` once rather than repeating `shardingColumn` in each strategy. In the configuration below, the sharding strategy of the t_order table relies on `defaultShardingColumn`.
rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      tableStrategy:
        standard:
          shardingAlgorithmName: table_inline
  defaultShardingColumn: order_id
  defaultDatabaseStrategy:
    standard:
      shardingAlgorithmName: database_inline
  defaultTableStrategy:
    none:
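The fallback behaviour of `defaultShardingColumn` can be pictured with a small Python sketch. The `resolve_sharding_column` helper and the dictionary shape are hypothetical and only mirror the YAML above; they are not part of ShardingSphere.

```python
def resolve_sharding_column(strategy, default_sharding_column):
    """Return the strategy's own shardingColumn, or the rule-wide default."""
    return strategy.get("shardingColumn") or default_sharding_column


# Hypothetical in-memory mirror of the configuration above
rule = {
    "defaultShardingColumn": "order_id",
    "tables": {
        "t_order": {
            "tableStrategy": {"standard": {"shardingAlgorithmName": "table_inline"}},
        },
    },
}

table_standard = rule["tables"]["t_order"]["tableStrategy"]["standard"]
print(resolve_sharding_column(table_standard, rule["defaultShardingColumn"]))
```

Here the t_order strategy declares no `shardingColumn`, so the sketch falls back to `order_id`. A strategy that sets its own column would keep it.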
Read/Write Splitting API
We didn't make many changes to the Read/Write Splitting API in the 5.0.0 GA version. We only renamed `MasterSlave` to `ReadWriteSplitting`; other usage is unchanged. The following code block shows the differences between the Read/Write Splitting API of the 4.1.1 GA version and that of the 5.0.0 GA version.
# 4.1.1 GA Read/Write Splitting API
masterSlaveRule:
  name: ms_ds
  masterDataSourceName: master_ds
  slaveDataSourceNames:
    - slave_ds_0
    - slave_ds_1
# 5.0.0 GA Read/Write Splitting API
rules:
- !READWRITE_SPLITTING
  dataSources:
    pr_ds:
      writeDataSourceName: write_ds
      readDataSourceNames:
        - read_ds_0
        - read_ds_1
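As a rough picture of what the 5.0.0 configuration expresses, the following Python sketch sends writes to the write data source and rotates reads over the read data sources. The `ReadWriteRouter` class is hypothetical, the round-robin choice is an assumption, and the sketch ignores concerns such as transactions and hints.

```python
import itertools


class ReadWriteRouter:
    """Toy router: writes go to the write data source, reads rotate over replicas."""

    def __init__(self, write_data_source, read_data_sources):
        self.write_data_source = write_data_source
        self._reads = itertools.cycle(read_data_sources)

    def route(self, sql):
        # Treat only statements starting with SELECT as reads; everything else is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._reads) if is_read else self.write_data_source


router = ReadWriteRouter("write_ds", ["read_ds_0", "read_ds_1"])
print(router.route("INSERT INTO t_order VALUES (1)"))
print(router.route("SELECT * FROM t_order"))
print(router.route("SELECT * FROM t_order"))
```

The three calls print `write_ds`, `read_ds_0`, and `read_ds_1` in turn.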
Additionally, combining Read/Write Splitting with the High Availability function developed on the pluggable architecture enables automatic switching between master and slave, yielding a highly available form of read/write splitting. If you are interested in the high availability function, keep an eye on our GitHub repo or social channels. We will soon publish related documents and technical blogs.
Encryption & Decryption API
We add a `queryWithCipherColumn` property at the `table` level of the Encryption & Decryption API, making it convenient for users to switch between plaintext and ciphertext queries for the encrypted fields of a table. There are no other changes in the 5.0.0 version API.
- !ENCRYPT
  encryptors:
    aes_encryptor:
      type: AES
      props:
        aes-key-value: 123456abc
    md5_encryptor:
      type: MD5
  tables:
    t_encrypt:
      columns:
        user_id:
          plainColumn: user_plain
          cipherColumn: user_cipher
          encryptorName: aes_encryptor
        order_id:
          cipherColumn: order_cipher
          encryptorName: md5_encryptor
      queryWithCipherColumn: true
  queryWithCipherColumn: false
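The configuration above sets `queryWithCipherColumn` twice, once for the t_encrypt table and once for the whole rule. The Python sketch below shows one plausible reading of how the two interact. It assumes that the table-level value takes precedence over the rule-level one, and that a column without a `plainColumn` is always read from its cipher column. Both are assumptions, and the helper names are hypothetical.

```python
def query_with_cipher_column(rule, table_name):
    """Table-level setting wins when present; otherwise use the rule-level value."""
    table_setting = rule["tables"][table_name].get("queryWithCipherColumn")
    return rule["queryWithCipherColumn"] if table_setting is None else table_setting


def column_for_query(rule, table_name, logic_column):
    """Choose the physical column a query on logic_column would read from."""
    column = rule["tables"][table_name]["columns"][logic_column]
    plain_column = column.get("plainColumn")
    # Assumption: without a plain column there is nothing else to read from.
    if query_with_cipher_column(rule, table_name) or plain_column is None:
        return column["cipherColumn"]
    return plain_column


encrypt_rule = {
    "queryWithCipherColumn": False,  # rule-level value
    "tables": {
        "t_encrypt": {
            "queryWithCipherColumn": True,  # table-level value
            "columns": {
                "user_id": {"plainColumn": "user_plain", "cipherColumn": "user_cipher"},
                "order_id": {"cipherColumn": "order_cipher"},
            },
        }
    },
}

print(column_for_query(encrypt_rule, "t_encrypt", "user_id"))
```

With the table-level value set to true, the sketch reads `user_id` from `user_cipher`. Removing the table-level entry would make it fall back to the rule-level false and read `user_plain` instead.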
Shadow Database Stress Testing API
We completely reworked the Shadow Database Stress Testing API in version 5.0.0 GA. The first adjustment is the removal of the logical column from Shadow Database and the introduction of Shadow Database Matching Algorithms, which let users control routing flexibly.
The code block below is the Shadow Database Stress Testing API of the old 4.1.1 GA version. The function is quite simple: the value of the logical column determines whether shadow database stress testing is enabled.
shadowRule:
  column: shadow
  shadowMappings:
    ds: shadow_ds
In the 5.0.0 GA version, the Shadow Database Stress Testing API is much more powerful. Users can switch the test on or off via the `enable` attribute. At the same time, it implements fine-grained control over production tables.
The new API also supports a variety of matching algorithms, such as column value matching algorithm, column regular expression matching algorithm, and SQL comment matching algorithm.
rules:
- !SHADOW
  enable: true
  dataSources:
    shadowDataSource:
      sourceDataSourceName: ds
      shadowDataSourceName: shadow_ds
  tables:
    t_order:
      dataSourceNames:
        - shadowDataSource
      shadowAlgorithmNames:
        - user-id-insert-match-algorithm
        - simple-hint-algorithm
  shadowAlgorithms:
    user-id-insert-match-algorithm:
      type: COLUMN_REGEX_MATCH
      props:
        operation: insert
        column: user_id
        regex: "[1]"
    simple-hint-algorithm:
      type: SIMPLE_NOTE
      props:
        shadow: true
        foo: bar
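To make the column regular expression matching algorithm more concrete, here is a small Python sketch of the `user-id-insert-match-algorithm` entry above. The helper names are hypothetical, and the sketch assumes that the whole column value must match the regular expression. It covers only the regex algorithm and does not model the SQL comment algorithm.

```python
import re


def column_regex_match(props, operation, column_values):
    """True when the operation matches and the column value fully matches the regex."""
    if operation.lower() != props["operation"]:
        return False
    value = column_values.get(props["column"])
    if value is None:
        return False
    return re.fullmatch(props["regex"], str(value)) is not None


def route_shadow(enabled, source, shadow, matched):
    """Use the shadow data source only when the rule is enabled and an algorithm matched."""
    return shadow if enabled and matched else source


props = {"operation": "insert", "column": "user_id", "regex": "[1]"}
matched = column_regex_match(props, "insert", {"user_id": 1})
print(route_shadow(True, "ds", "shadow_ds", matched))
```

An insert with `user_id` 1 is routed to `shadow_ds` in this sketch, while any other `user_id`, any other operation, or `enable: false` keeps the statement on `ds`.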
Due to length limits, we cannot introduce the shadow database stress testing function in detail here, but we will share more related technical content soon. If you're interested in shadow database matching algorithms, please read the Shadow Algorithm documentation.
Conclusion
In the future, we will continue to develop new features of the pluggable kernel to expand the Apache ShardingSphere ecosystem, and we will keep optimizing Apache ShardingSphere. **As always, you're welcome to join us in developing the Apache ShardingSphere project.**
Open Source Project Links:
ShardingSphere GitHub: https://github.com/apache/shardingsphere
Twitter: https://twitter.com/ShardingSphere
ShardingSphere Slack Channel: https://join.slack.com/t/apacheshardingsphere/shared_invite/zt-sbdde7ie-SjDqo9~I4rYcR18bq0SYTg
GitHub Issues: https://github.com/apache/shardingsphere/issues
Contributor Guide: https://shardingsphere.apache.org/community/cn/contribute/