Meet Apache SeaTunnel, a new Apache Top-Level Project!

#seatun #opensource #datascience #bigdata

Introduction

Apache SeaTunnel, a cloud-native, high-performance massive data integration tool, has recently graduated from the Apache Incubator to become a Top-Level Project (TLP) of the Apache Software Foundation (ASF). Created in 2002, the Incubator helps projects entering the ASF (called ‘podlings’) adopt the Apache style of governance and operation and guides them to the ASF services available to Apache projects. Podlings also benefit from designated mentors who help them navigate the ASF teams and facilitate their growth and operation.

We spoke with Eric Gao, a SeaTunnel PMC member, to learn more about the project, its technology, community goals, and future plans.

Q: Congratulations on graduating to Top-Level Project status! Briefly, how would you describe Apache SeaTunnel, and who is its audience?

A: Apache SeaTunnel is a high-performance, distributed, massive data integration tool that provides an all-in-one solution for heterogeneous data integration and data synchronization. It is an open-source project of the Apache Software Foundation. Its audience includes data scientists, data engineers, and data analysts who combine data from different sources into a single, unified data store.

Q: What are the key features of Apache SeaTunnel?

A: Some of the key features of Apache SeaTunnel include:

Support for multiple synchronization scenarios, including batch synchronization, real-time synchronization, and CDC synchronization;
Support for more than 100 data sources, including transactional databases, big data databases, cloud databases, SaaS, and Binlog, among others;
Support for multiple data computing engines, including SeaTunnel Zeta, Flink, and Spark; and
High performance and ease of use for deployment and monitoring.

Q: What are the advantages of Apache SeaTunnel?

A: Apache SeaTunnel stands out in the following ways:

It supports hundreds of data sources, with fast transmission speed and high accuracy;
To reduce complexity, its API-based connectors are compatible with offline synchronization, real-time synchronization, full synchronization, incremental synchronization, CDC real-time synchronization, and other scenarios;
It is simple and easy to use: it provides a drag-and-drop and SQL language interface that saves developers time, along with visual job management, scheduling, operation, and monitoring capabilities, which accelerates the integration of low-code and no-code tools;
It is easy to maintain and supports single-node or cluster deployment. If you select SeaTunnel Zeta engine deployment, you do not need to rely on big data components such as Spark and Flink.

Q: What is the history and origin story of Apache SeaTunnel?

A: SeaTunnel was started in 2017 as Waterdrop and was renamed SeaTunnel in 2021. Its mission is to spread its heterogeneous data synchronization capability worldwide and to lower the threshold for using Spark, Flink, and other technologies for data integration. The project has released over 40 versions, attracted over 180 contributors from around the world, and plays a key role in the production practices of hundreds of companies.

Q: How has the SeaTunnel community grown, and what are its future plans?

A: Growing the community has been a tough but interesting challenge. The community started as just a group of people and has now grown to 180+ contributors from around the globe. People from China, the U.S., South Korea, India, Europe, the Philippines, Singapore, Australia, and elsewhere are drawn together by the same goal.

The SeaTunnel community plans to optimize the indicator monitoring system, dirty data collection, and flow rate control, and to add automatic table creation and schema change monitoring. Currently, SeaTunnel supports 180+ connectors, and the community hopes to grow this number more rapidly in the future.

We are now kicking off the “SeaTunnel Ambassador” program, in which ambassadors represent the SeaTunnel community in whatever form suits them: facilitating communication, spreading the SeaTunnel project to more people around the globe, and encouraging people to contribute to the project's iteration.

Also, the SeaTunnel community regularly organizes virtual and offline meetups to bring together SeaTunnel users and fans. We have held more than 30 meetups to share use cases, project updates, and personal growth stories from community members. We have met in person in Chengdu and Shanghai, and we will meet with the community in other regions around the world.

We are working to make SeaTunnel a world-class open-source data integration platform. Anyone who is interested in the project is welcome to join us on Slack or follow our YouTube channel: https://www.youtube.com/channel/UCwvRajMzc8aAT4LU-jCwsSA.

Q: How can people learn more about Apache SeaTunnel and try it out?

A: To learn more about SeaTunnel, visit the project’s website, GitHub repo, Twitter, Slack channel, or YouTube channel.