Compute Node is a type of backend nodes in Apache Doris that is designed for remote federated query workloads, such as those in data lake analytics. Unlike normal backend nodes, the Compute Nodes are stateless and do not store any data, making them flexible and easy enough to join a cluster for scaling.
How to enable Compute Nodes in Apache Doris
Firstly, follow the doc to install Apache Doris.
Note:
FE configuration
Add the following configurations to fe.conf
:
prefer_compute_node_for_external_table=true
min_backend_num_for_external_table=1
prefer_compute_node_for_external_table=true
: This means that external table queries will be preferentially assigned to the Compute Nodes. If this is set tofalse
, external table queries will be assigned to any backend nodes. The maximum number of Compute Nodes is determined bymin_backend_num_for_external_table
.min_backend_num_for_external_table
: This is only effective whenprefer_compute_node_for_external_table
is true. If there are less Compute Nodes in the cluster than the value ofmin_backend_num_for_external_table
, some external table queries will be executed on mixed nodes; otherwise, all external table queries will be executed on Compute Nodes. The default value of this parameter is 3.
BE configuration
Add the following configurations to be.conf
:
be_node_role=computation
By default, this parameter is set to mix
, which means the normal mixed backend nodes.
If you have added nodes to your cluster and started the nodes, you can check information of backend nodes. mix
in the NodeRole
field means it is a mixed node while computation
means it is a Compute Node.
In the following example, 192.168.0.128
and 192.168.0.129
are set to be Compute Nodes.
mysql> show backends\G;
************************* 1. row *************************
BackendId: 11007
Cluster: default_cluster
IP: 192.168.0.114
HostName: 192.168.0.114
HeartbeatPort: 9050
BePort: 9060
HttpPort: 8040
BrpcPort: 8060
LastStartTime: 2023-06-03 21:51:24
LastHeartbeat: 2023-06-03 21:51:40
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
TabletNum: 21
DataUsedCapacity: 0.000
AvailCapacity: 177.323 GB
TotalCapacity: 196.735 GB
UsedPct: 9.87 %
MaxDiskUsedPct: 9.87 %
RemoteUsedCapacity: 0.000
Tag: {"location" : "default"}
ErrMsg:
Version: doris-2.0.0-alpha-a925ec9
Status: {"lastSuccessReportTabletsTime":"2023-06-03 21:51:26","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
HeartbeatFailureCounter: 0
NodeRole: mix
*************************** 2. row ***************************
BackendId: 11026
Cluster: default_cluster
IP: 192.168.0.128
HostName: 192.168.0.128
HeartbeatPort: 9050
BePort: 9060
HttpPort: 8040
BrpcPort: 8060
LastStartTime: 2023-06-03 21:50:34
LastHeartbeat: 2023-06-03 21:51:40
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
TabletNum: 0
DataUsedCapacity: 0.000
AvailCapacity: 177.323 GB
TotalCapacity: 196.735 GB
UsedPct: 9.87 %
MaxDiskUsedPct: 9.87 %
RemoteUsedCapacity: 0.000
Tag: {"location" : "default"}
ErrMsg:
Version: doris-2.0.0-alpha-a925ec9
Status: {"lastSuccessReportTabletsTime":"2023-06-03 21:51:38","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
HeartbeatFailureCounter: 0
NodeRole: computation
************************* 3. row *************************
BackendId: 11045
Cluster: default_cluster
IP: 192.168.0.129
HostName: 192.168.0.129
HeartbeatPort: 9050
BePort: 9060
HttpPort: 8040
BrpcPort: 8060
LastStartTime: 2023-06-03 21:49:52
LastHeartbeat: 2023-06-03 21:51:40
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
TabletNum: 0
DataUsedCapacity: 0.000
AvailCapacity: 177.319 GB
TotalCapacity: 196.735 GB
UsedPct: 9.87 %
MaxDiskUsedPct: 9.87 %
RemoteUsedCapacity: 0.000
Tag: {"location" : "default"}
ErrMsg:
Version: doris-2.0.0-alpha-a925ec9
Status: {"lastSuccessReportTabletsTime":"2023-06-03 21:51:02","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
HeartbeatFailureCounter: 0
NodeRole: computation
3 rows in set (0.00 sec)
Test
The following uses MySQL external tables as an example.
Create MySQL Catalog:
CREATE CATALOG mysql properties (
"type"="jdbc",
"jdbc.user"="root",
"jdbc.password"="NewPass4321!",
"jdbc.jdbc_url"="jdbc:mysql://192.168.0.250:3306/test",
"jdbc.driver_url"="mysql-connector-java-8.0.25.jar",
"jdbc.driver_class"="com.mysql.cj.jdbc.Driver"
)
Query external tables from the Catalog:
mysql> set enable_profile = true;
Query OK, 0 rows affected (0.00 sec)
mysql> select date,user_src,new_order,payed_order from mysql.test.order_analysis limit 2;
+--------------------+------------+-----------+------------+
| date | user_src | new_order | payed_order|
+--------------------+------------+-----------+------------+
| 2015-10-12 00:00:00| QR code | 15253 | 13210 |
| 2015-10-14 00:00:00| H5 page | 17134 | 11270 |
+--------------------+------------+-----------+------------+
2 rows in set (0.03 sec)
Check FE WebUI QueryProfile
From the SQL Profile, you can see that the query on this MySQL Catalog external table is executed by a Compute Node, instead of a mixed node.
Ingest data from the external table into Doris:
mysql> create table test_01 as select * from mysql.test.order_analysis;
ERROR 1105 (HY000): Unexpected exception: errCode = 2, detailMessage = Failed to execute CTAS Reason: errCode = 2, detailMessage = Failed to find 3 backend(s) for policy: cluster=default_cluster | query=false | load=false | schedule=true | tags=[{"location" : "default"}] | medium=HDD
mysql>
From the above, you can tell that the create table as select
operation fails. This is because in this case, we have only one mixed node while the table is created with 3 replicas by default. This is also a demonstration of the fact that Compute Nodes are stateless and do not store any tablet replicas.
You can fix the above problem by specifying the tablet number to be 1 upon table creation. Again, in this case, we have two Compute Nodes and one mixed node (192.168.0.114
).
mysql> create table test_01 PROPERTIES("replication_num" = "1") as select * from mysql.test.order_analysis;
Query OK, 5061 rows affected (0.29 sec)
{'label':'insert_9c013d7ccf064a16_a7ca128d72869a35', 'status':'VISIBLE', 'txnId':'1'}
mysql> select count(*) from test_01;
+----------+
| count(*) |
+----------+
| 5061 |
+----------+
1 row in set (0.07 sec)
mysql>
Then, execute the query on Doris internal table. It can be told from the WebUI that the query is executed by a mixed node:
Offlining the Compute Nodes
Taking a Compute Node offline is the same as doing that to a mixed node, except it is faster. This is because Compute Node do not store any data, so there won't be any tablet balancing processes.
alter system DECOMMISSION backend "192.168.0.128:9050";
See? The Compute Nodes in Apache Doris are quick and easy to use. If you need any help, come join the Apache Doris community on Slack
Top comments (0)