DEV Community

Apache SeaTunnel

Major Release! SeaTunnel 2.3.0-beta supports the self-developed SeaTunnel Engine and more connectors!

Apache SeaTunnel (Incubating) 2.3.0-beta was officially released recently. In this version, the long-awaited SeaTunnel Engine, SeaTunnel's self-developed data synchronization engine, makes its debut. The new version also supports more connectors and fixes bugs in previously supported connectors.
This article introduces the updates in Apache SeaTunnel (Incubating) 2.3.0-beta in detail.

SeaTunnel Engine released!

SeaTunnel 2.3.0-beta introduces SeaTunnel Engine, a community-developed engine designed specifically for data synchronization scenarios. As the default engine of SeaTunnel, it supports high-throughput, low-latency, and strongly consistent synchronization jobs, and it aims to be faster, more stable, more resource-efficient, and easier to use.

The overall design of SeaTunnel Engine pursues the following goals:

  1. Faster: SeaTunnel Engine’s execution plan optimizer aims to reduce network transmission of data, which in turn reduces the performance lost to data serialization and deserialization and lets users complete synchronization jobs faster. A speed limit is also supported, so that data can be synchronized at a reasonable rate.
  2. More stable: SeaTunnel Engine uses the Pipeline as the minimum granularity of checkpointing and fault tolerance for data synchronization tasks. The failure of a task only affects its upstream and downstream tasks, so a single task failure does not cause the entire job to fail or roll back. SeaTunnel Engine also supports data caching for scenarios where the source data has a limited retention time. When the cache is enabled, data read from the source is automatically cached, then read by the downstream task and written to the target. In this case, even if the data cannot be written because the target has failed, the source can still be read as usual, which prevents source data from being deleted on expiry before it has been read.
  3. More resource-efficient: SeaTunnel Engine uses Dynamic Thread Sharing internally. In real-time synchronization scenarios with a large number of tables that each hold only a small amount of data, SeaTunnel Engine runs these synchronization tasks in shared threads to avoid unnecessary thread creation and save system resources. On both the read and write sides, the design goal of SeaTunnel Engine is to minimize the number of JDBC connections; in CDC scenarios, SeaTunnel Engine reuses log reading and parsing resources.
  4. Simple and easy to use: SeaTunnel Engine reduces the dependence on third-party services and implements cluster management, snapshot storage, and cluster HA without relying on big data components such as Zookeeper and HDFS. This is very useful for users who do not yet have a big data platform, or who do not want to depend on one for data synchronization.
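
To make the thread-sharing idea in point 3 more concrete, here is a minimal Python sketch. It is not SeaTunnel Engine code: the function names, the pool size, and the fake tables are all assumptions made for illustration. It only shows the general technique of running many small synchronization tasks on a few shared threads instead of creating one thread per table.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def sync_table(table_name, rows):
    """Hypothetical task: 'copy' one small table and report which thread did the work."""
    copied = list(rows)  # stand-in for reading from the source and writing to the target
    return table_name, len(copied), threading.current_thread().name


def run_with_shared_threads(tables, max_threads=4):
    """Run one sync task per table on a small shared pool (assumed size: 4 threads)."""
    with ThreadPoolExecutor(max_workers=max_threads,
                            thread_name_prefix="shared-sync") as pool:
        futures = [pool.submit(sync_table, name, rows) for name, rows in tables.items()]
        return [f.result() for f in futures]


# 100 tiny tables (0 to 4 rows each), all served by at most 4 threads.
tables = {f"table_{i}": range(i % 5) for i in range(100)}
results = run_with_shared_threads(tables)
threads_used = {thread for _, _, thread in results}
total_rows = sum(count for _, count, _ in results)
```

With one thread per table this workload would need 100 threads; here `threads_used` never contains more than four names, which is the saving the design goal describes.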

In the future, SeaTunnel Engine will be further optimized to support full and incremental synchronization for offline batch synchronization, real-time synchronization, and CDC.

New features

【Basic functions of SeaTunnel Engine】

2.3.0-beta is the first release of SeaTunnel Engine and implements its basic functions. For details, see: https://github.com/apache/incubator-seatunnel/issues/2272

【Cluster Management】

  • Supports stand-alone operation;
  • Supports cluster operation;
  • Supports autonomous (decentralized) clusters: users do not need to specify a master node for the SeaTunnel Engine cluster, because the cluster selects a master node by itself at runtime and automatically chooses a new one when the master node fails;
  • Supports automatic node discovery in autonomous clusters: nodes with the same cluster_name automatically form a cluster.
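
The behaviour described in the last two items can be pictured with a toy model. The Python sketch below is purely illustrative: the class, the function, and the election rule (the lowest node id wins) are assumptions, and the article does not say how SeaTunnel Engine actually discovers nodes or elects a master. Only the `cluster_name` grouping key comes from the text above.

```python
class ToyCluster:
    """Hypothetical in-memory model of an autonomous cluster."""

    def __init__(self, cluster_name):
        self.cluster_name = cluster_name
        self.members = set()
        self.master = None

    def join(self, node_id):
        self.members.add(node_id)
        if self.master is None:  # nobody has to configure a master by hand
            self._elect()

    def fail(self, node_id):
        self.members.discard(node_id)
        if node_id == self.master:  # losing the master triggers a new election
            self._elect()

    def _elect(self):
        # Assumed rule for this sketch only: the lowest node id becomes master.
        self.master = min(self.members) if self.members else None


def discover(nodes):
    """Group (node_id, cluster_name) pairs: the same cluster_name means the same cluster."""
    clusters = {}
    for node_id, cluster_name in nodes:
        if cluster_name not in clusters:
            clusters[cluster_name] = ToyCluster(cluster_name)
        clusters[cluster_name].join(node_id)
    return clusters


clusters = discover([("node-b", "prod"), ("node-a", "prod"), ("node-c", "test")])
prod = clusters["prod"]
first_master = prod.master   # the first node to join, "node-b"
prod.fail(first_master)      # simulate a master failure
new_master = prod.master     # the remaining node takes over
```

The point of the sketch is the shape of the behaviour: membership follows from a shared name, and the master role is assigned and reassigned without user intervention.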

【Core functions】

  • Supports running jobs in local mode, where the cluster is automatically destroyed once the job completes;
  • Supports running jobs in cluster mode (single machine or cluster), where jobs are submitted to the SeaTunnel Engine service through the SeaTunnel Client, and the service keeps running after a job completes and waits for the next submission;
  • Supports offline batch synchronization;
  • Supports real-time synchronization;
  • Supports batch-stream integration: all SeaTunnel V2 connectors can run in SeaTunnel Engine;
  • Supports a distributed snapshot algorithm and, together with SeaTunnel V2 connectors, two-phase commit, ensuring that data is processed exactly once;
  • Supports job scheduling at the Pipeline level, so that a job can start even when resources are limited;
  • Supports fault tolerance at the Pipeline level: a task failure only affects the Pipeline it belongs to, and only the tasks in that Pipeline need to be rolled back;
  • Supports dynamic thread sharing to synchronize a large number of small data sets in real time.
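
As a rough illustration of how two-phase commit combined with checkpoints can give exactly-once results, consider the Python sketch below. The class is hypothetical and is not the SeaTunnel V2 connector API; the method names and the checkpoint ids are assumptions chosen for readability. Rows are staged when a checkpoint is taken (phase 1) and become visible only when that checkpoint is confirmed (phase 2), and a repeated commit for the same checkpoint is ignored.

```python
class ToyTwoPhaseSink:
    """Hypothetical sink used only to illustrate two-phase commit."""

    def __init__(self):
        self.buffer = []     # rows written since the last checkpoint
        self.pending = {}    # checkpoint_id -> rows staged but not yet visible
        self.committed = []  # rows visible in the target
        self.done = set()    # checkpoint ids that were already committed

    def write(self, row):
        self.buffer.append(row)

    def prepare_commit(self, checkpoint_id):
        # Phase 1: stage everything written so far under this checkpoint.
        self.pending[checkpoint_id] = self.buffer
        self.buffer = []

    def commit(self, checkpoint_id):
        # Phase 2: publish the staged rows; a retried commit is a no-op.
        if checkpoint_id in self.done:
            return
        self.committed.extend(self.pending.pop(checkpoint_id, []))
        self.done.add(checkpoint_id)

    def abort(self, checkpoint_id):
        # A failed checkpoint discards its staged rows; the source would replay them.
        self.pending.pop(checkpoint_id, None)


sink = ToyTwoPhaseSink()
sink.write("a")
sink.write("b")
sink.prepare_commit(1)
sink.commit(1)
sink.commit(1)          # retried commit must not duplicate rows
sink.write("c")
sink.prepare_commit(2)
sink.abort(2)           # checkpoint 2 failed, so "c" never becomes visible
```

After this run `sink.committed` holds each row of checkpoint 1 exactly once, and the row from the aborted checkpoint is absent until it is replayed and committed under a later checkpoint.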

Connector update

Newly added connectors

With the joint efforts of the community, the 2.3.0-beta version has introduced 10 more connectors, including:

Image description

Connector optimization

  • [Source] [Fake]
      • [Improve] Supports direct definition of data values(row) (2839)
      • [Improve] Improve fake source connector: (2944)
          • Support user-defined map size
          • Support user-defined array size
          • Support user-defined string length
          • Support user-defined bytes length
      • [Improve] Support multiple splits for fake source connector (2974)
      • [Improve] Supports setting the number of splits per parallelism and the reading interval between two splits (3098)
  • [Source] [ClickHouse]
      • [Improve] ClickHouse Source random use host when config multi-host (3108)
  • [Source] [FtpFile]
      • [Improve] Support to extract partition from SeaTunnelRow fields (3085)
      • [Improve] Support parse field from the file path (2985)
  • [Source] [HDFSFile]
      • [Improve] Support to extract partition from SeaTunnelRow fields (3085)
      • [Improve] Support parse field from the file path (2985)
  • [Source] [LocalFile]
      • [Improve] Support to extract partition from SeaTunnelRow fields (3085)
      • [Improve] Support parse field from the file path (2985)
  • [Source] [OSSFile]
      • [Improve] Support to extract partition from SeaTunnelRow fields (3085)
      • [Improve] Support parse field from the file path (2985)
  • [Source] [IoTDB]
      • [Improve] Improve IoTDB Source Connector (2917)
          • Support extract timestamp, device, measurement from SeaTunnelRow
          • Support TINYINT, SMALLINT
          • Support flush cache to the database before prepareCommit
  • [Sink] [Assert]
      • [Improve] 1. Support check the number of rows (2844) (3031):
          • check rows that are not empty
          • check the minimum number of rows
          • check the maximum number of rows
      • [Improve] 2. Support direct define of data values(row) (2844) (3031)
      • [Improve] 3. Support setting parallelism as 1 (2844) (3031)
  • [Sink] [ClickHouse]
      • [Improve] ClickHouse Support Int128, Int256 Type (3067)
  • [Sink] [Console]
      • [Improve] Console sink support print subtask index (3000)
  • [Sink] [IoTDB]
      • [Improve] Improve IoTDB Sink Connector (2917)
          • Support align by SQL syntax
          • Support SQL split ignore case
          • Support restore split offset to at-least-once
          • Support read timestamp from RowRecord
  • [Sink] [Kudu]
      • [Improve] Kudu Sink Connector Support to upsert row (2881)

Connector Bug fixes

  • [Source] [FtpFile]
      • [BugFix] Fix the bug of the incorrect path in the Windows environment (2980)
  • [Source] [HDFSFile]
      • [BugFix] Fix the bug of the incorrect path in the Windows environment (2980)
  • [Source] [LocalFile]
      • [BugFix] Fix the bug of the incorrect path in the Windows environment (2980)
  • [Source] [OSSFile]
      • [BugFix] Fix the bug of the incorrect path in the Windows environment (2980)
  • [Sink] [Enterprise-WeChat]
      • [BugFix] Fix Enterprise-WeChat Sink data serialization (2856)
  • [Sink] [FtpFile]
      • [BugFix] Fix the bug of the incorrect path in the Windows environment (2980)
      • [BugFix] Fix filesystem get error (3117)
      • [BugFix] Fix the bug where ‘\t’ could not be parsed as a delimiter from the config file (3083)
  • [Sink] [HDFSFile]
      • [BugFix] Fix the bug of the incorrect path in the Windows environment (2980)
      • [BugFix] Fix filesystem get error (3117)
      • [BugFix] Fix the bug where ‘\t’ could not be parsed as a delimiter from the config file (3083)
  • [Sink] [LocalFile]
      • [BugFix] Fix the bug of the incorrect path in the Windows environment (2980)
      • [BugFix] Fix filesystem get error (3117)
      • [BugFix] Fix the bug where ‘\t’ could not be parsed as a delimiter from the config file (3083)
  • [Sink] [OSSFile]
      • [BugFix] Fix the bug of the incorrect path in the Windows environment (2980)
      • [BugFix] Fix filesystem get error (3117)
      • [BugFix] Fix the bug where ‘\t’ could not be parsed as a delimiter from the config file (3083)
  • [Sink] [IoTDB]
      • [BugFix] Fix IoTDB connector sink NPE (3080)
  • [Sink] [JDBC]
      • [BugFix] Fix JDBC split exception (2904)

Connector V1 update

  • [Sink] [Spark-Hbase]
      • [BugFix] Handling null values (3099)

Other updates

Feature optimization and update

  • [Improve] [Sink] Support define parallelism for sink connector (2941)
  • [Improve] [all] change Log to @Slf4j (3001)
  • [Improve] [format] [text] Support read & write SeaTunnelRow type (2969)
  • [Improve] [api] [flink] extraction unified method (2862)
  • [Feature] [deploy] Add Helm charts (2903)
  • [Feature] seatunnel-text-format

Bug fixes

  • [BugFix] Fix the Assert connector name error in the config/plugin_config file (3127)
  • [BugFix] [starter] Fix connector-v2 flink & spark dockerfile (3007)
  • [BugFix] [core] Fix spark engine parallelism parameter does not work (2965)
  • [BugFix] [build] Fix the invalidation of the checkstyle suppression file on Windows 10 (2986)
  • [BugFix] [format] [json] Fix Jackson package conflict with spark (2934)
  • [BugFix] [seatunnel-translation-base] Fix Source restore state NPE (2878)

Documentation update

  • Add the coding guide (2995)

Acknowledgment

Thanks to all the community members who participated in the 2.3.0-beta release. Your efforts make SeaTunnel more and more powerful! Here is the list of contributors to this release (alphabetically by GitHub ID):

Image description
