Apache SeaTunnel

Posted on Jun 21, 2022

New release! Support for Kubernetes, multiple connectors added, SeaTunnel 2.1.2 is here!

#bigdata #opensource #apache

In the month or so since the release of Apache SeaTunnel (Incubating) 2.1.1, the community has accepted nearly 100 PRs from teams and individuals around the world to bring you version 2.1.2. This release improves stability and also brings new features, documentation, examples, and other optimizations.

This article will introduce the Apache SeaTunnel (Incubating) 2.1.2 update in detail.

Release Note:

https://github.com/apache/incubator-seatunnel/blob/2.1.2/release-note.md

Download at: https://seatunnel.apache.org/download

01 Major feature updates

Two connectors, Webhook and Http, have been added to enhance Http-related data handling capabilities.

Special thanks to tmljob for his contribution.

01 Webhook

This connector allows users to implement a variety of useful functions such as task scheduling, event scheduling, and data pushing, as long as the side that outputs the data supports Http.

See https://seatunnel.apache.org/docs/2.1.2/connector/source/Webhook for details.
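
As a rough illustration of the sending side, the sketch below builds the kind of Http POST request an upstream system might push to a webhook endpoint. It uses only the Python standard library. The URL, port, and payload fields are hypothetical placeholders, not values taken from the SeaTunnel documentation.

```python
import json
import urllib.request


def build_webhook_request(url: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a webhook endpoint."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and payload, for illustration only.
request = build_webhook_request(
    "http://localhost:9999/webhook",
    {"event": "order_created", "order_id": 42},
)
print(request.get_method(), request.full_url)
# Sending is a separate step: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the payload easy to inspect before anything goes over the network.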

02 Http

The new version supports reading data from Http interfaces, so upstream systems can hand data to SeaTunnel over Http for further processing. Http is a common standard interface through which you can access a wide variety of business systems. It is used as shown below.

Http {
  url = "http://date.jsontest.com/"
  result_table_name = "response_body"
}
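
To make the idea of "reading Http interface data into a table" more concrete, here is a minimal Python sketch of how a JSON response body could be turned into rows. It is only a conceptual illustration and not SeaTunnel's internal implementation. The function name `body_to_rows` and the sample response body are made up for this example.

```python
import json


def body_to_rows(body: str) -> list:
    """Turn a JSON response body into a list of row dictionaries.

    A JSON object becomes one row; a JSON array of objects becomes many rows.
    """
    parsed = json.loads(body)
    if isinstance(parsed, dict):
        return [parsed]
    if isinstance(parsed, list) and all(isinstance(item, dict) for item in parsed):
        return parsed
    raise ValueError("expected a JSON object or an array of JSON objects")


# Hypothetical response body, made up for illustration.
sample_body = '{"date": "06-21-2022", "time": "08:00:00 AM"}'
rows = body_to_rows(sample_body)
print(rows)
```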

The Kafka and ElasticSearch connectors have been added to the FlinkSQL module, so SeaTunnel can now use SQL to read data from and write data to these data sources.

UUID and Replace transforms have been added, allowing more flexibility for simple data processing. Support for registering custom functions has also been added to help users implement their own business logic.
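
The effect of these two transforms on a single record can be sketched in plain Python. This is an assumption-laden illustration of the idea only: the function names and parameters below (`add_uuid`, `replace_field`, `prefix`, `is_regex`) are hypothetical and are not the SeaTunnel configuration options.

```python
import re
import uuid


def add_uuid(row: dict, field: str = "uuid", prefix: str = "") -> dict:
    """Return a copy of the row with a random UUID stored under `field`."""
    return {**row, field: prefix + str(uuid.uuid4())}


def replace_field(row: dict, source: str, target: str, pattern: str,
                  replacement: str, is_regex: bool = False) -> dict:
    """Return a copy of the row where `target` holds `source` with `pattern` replaced."""
    value = str(row[source])
    if is_regex:
        new_value = re.sub(pattern, replacement, value)
    else:
        new_value = value.replace(pattern, replacement)
    return {**row, target: new_value}


row = {"name": "sea tunnel"}
row = replace_field(row, "name", "name_clean", " ", "-")
row = add_uuid(row, prefix="id-")
print(row["name_clean"])  # sea-tunnel
```

Both helpers return a new dictionary instead of mutating the input, which mirrors how a transform step produces a new record from the one it receives.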

03 Support for running SeaTunnel on Kubernetes

As Kubernetes has become a must-have component in the cloud-native era, SeaTunnel naturally needs to provide the corresponding support.

The official tutorial for running SeaTunnel on Kubernetes can be found at
https://seatunnel.apache.org/docs/2.1.2/start/kubernetes

02 Specific updates

01 [Connector]

Added support for Spark webhook connector
Optimized the Jar package structure of Connector
Added Spark Replace transform component
Added Spark Uuid transform component
Added Oracle adaptation to Flink's JDBC source
Added support for Flink HTTP connector
Added Flink registration of custom functions
Flink SQL module adds support for Kafka and ElasticSearch connectors

02 [Core]

Add support for Flink application runtime mode
Support for dynamic addition of Flink configuration

03 [Bug Fix]

Fix some type conversion issues with the Clickhouse Sink component
Fix the problem that the Spark runtime script fails the first time it runs in some cases
Fix the problem that Spark on yarn cluster mode cannot get the configuration file in some cases
Fix the problem that Spark extraJavaOptions cannot be empty
Fix the problem that internal files cannot be decompressed in Spark standalone cluster mode
Fix the problem that Clickhouse Sink cannot handle multi-node configuration properly
Fix the error in Flink SQL configuration parsing
Fix the incomplete matching of Flink JDBC Mysql types
Fix the problem that variables cannot be set in Flink mode
Fix the problem that the configuration of SeaTunnel cannot be checked in Flink mode

04 [Optimization]

Upgrade Jackson version to 2.12.6
Add wizard for deploying SeaTunnel to Kubernetes
Tweak some generic type code
Added Flink SQL e2e module
Flink JDBC connector added pre SQL and post SQL features
Use @AutoService to generate SPI files
Flink FakeSourceStream support for mock data
Support reading Hive data via Flink JDBC connector
ClickhouseFile support for ReplicatedMergeTree engine
Hive sink support saving ORC format data
Support for Spark Redis sink with custom expiration times
Add Spark JDBC transaction isolation level configuration
Replace Fastjson in code with Jackson
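
The pre SQL and post SQL features listed above can be pictured with a small, self-contained sketch. It assumes the common meaning of these hooks: one statement runs before the main write and one runs after it. The example uses Python's built-in `sqlite3` module rather than Flink, and the helper `write_with_hooks`, the table, and the statements are all hypothetical.

```python
import sqlite3


def write_with_hooks(conn, insert_sql, rows, pre_sql=None, post_sql=None):
    """Run optional pre SQL, then the main batch insert, then optional post SQL."""
    cur = conn.cursor()
    if pre_sql:
        cur.execute(pre_sql)            # e.g. clear out stale data
    cur.executemany(insert_sql, rows)   # the main write
    if post_sql:
        cur.execute(post_sql)           # e.g. mark the batch as loaded
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, loaded INTEGER DEFAULT 0)")
conn.execute("INSERT INTO target (id) VALUES (99)")  # stale row from an earlier run

write_with_hooks(
    conn,
    "INSERT INTO target (id) VALUES (?)",
    [(1,), (2,)],
    pre_sql="DELETE FROM target",
    post_sql="UPDATE target SET loaded = 1",
)
result = conn.execute("SELECT id, loaded FROM target ORDER BY id").fetchall()
print(result)  # [(1, 1), (2, 1)]
```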

03 Acknowledgements

Thanks to the dedication and hard work of the following contributors (GitHub IDs, in no particular order), we were able to get this release out quickly. We welcome more people to join and contribute to the Apache SeaTunnel (Incubating) community.

v-wx-v, GezimSejdiu, zhongjiajie, CalvinKirs, ruanwenjun, tmljob, Hisoka-X, 1996fanrui, wuchunfu, legendtkl, mans2singh, whb-bigdata, xpleaf, wuzhenhua01, chang-wd, quanzhian, taokelu, gleiyu, chenhu, dijiekstra, tobezhou33, LingangJiang, mosence, asdf2014, waywtdcc, Emor-nj, dik111, forecasted

About SeaTunnel

SeaTunnel (formerly Waterdrop) is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of data records per day in a stable and efficient manner.

Why do we need SeaTunnel?

SeaTunnel does everything it can to solve the problems you may encounter in synchronizing massive amounts of data.

Data loss and duplication
Task buildup and latency
Low throughput
Long application-to-production cycle time
Lack of application status monitoring

SeaTunnel Usage Scenarios

Data synchronization
Massive data integration
ETL of large volumes of data
Massive data aggregation
Multi-source data processing

Features of SeaTunnel

Rich components
High scalability
Easy to use
Mature and stable

How to get started with SeaTunnel quickly?

Want to experience SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.

https://seatunnel.apache.org/docs/2.1.0/developement/setup

How can I contribute?

We invite everyone interested in bringing local open source to the world to join the SeaTunnel contributor family and grow open source together!

Submit an issue:

https://github.com/apache/incubator-seatunnel/issues

Contribute code to:

https://github.com/apache/incubator-seatunnel/pulls

Subscribe to the community development mailing list:

dev-subscribe@seatunnel.apache.org

Development mailing list:

dev@seatunnel.apache.org

Join Slack:

https://join.slack.com/t/apacheseatunnel/shared_invite/zt-10u1eujlc-g4E~ppbinD0oKpGeoo_dAw

Follow Twitter:

https://twitter.com/ASFSeaTunnel

Come and join us!
