DEV Community

mrboogiej

Open Source ETL Project For Startups

To start your data-driven journey, you can either build everything from scratch or build on top of third-party tools. The second approach is much more efficient and cost-effective, which makes it a good choice for most startups.

Building on Luigi and Metabase is the recommended way to get stable ETL and fast BI, and you can set up this data solution in about 15 minutes.

📚【GitHub - TLDR】
https://github.com/alibabacloud-howto/opensource_with_apsaradb/tree/main/luigi_metabase

💡【About Luigi】
https://github.com/spotify/luigi
Luigi has been developed at Spotify since 2012. It is open source and is mainly used to produce data insights such as recommendations, toplists, A/B test analysis, external reports, and internal dashboards.

Luigi is a Python package (tested on 3.6, 3.7, 3.8, and 3.9) that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command-line integration, and much more.

The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes. You want to chain many tasks, automate them, and failures will happen. These tasks can be anything, but are typically long running things like Hadoop jobs, dumping data to/from databases, running machine learning algorithms, or anything else.

Reference: https://luigi.readthedocs.io/en/stable/

💡【About Metabase】
https://www.metabase.com/docs/latest/users-guide/01-what-is-metabase.html
https://github.com/metabase/metabase
Metabase is an open source business intelligence tool that lets you create charts and dashboards using data from a variety of databases and data sources. It was developed within venture studio Expa and spun out as an easy way for people to interact with data sets. It lets you ask questions about your data, and displays answers in formats that make sense, whether that’s a bar graph or a detailed table.

By default, Metabase stores its application data in an embedded H2 database when you first start using it. To improve the availability of the database behind the Metabase BI server, it is better to use Alibaba Cloud RDS PostgreSQL as the backend database. Metabase supports both PostgreSQL and MySQL as its backend database.

If you are about to launch on Alibaba Cloud, your options are the following:
RDS MySQL
RDS PostgreSQL
PolarDB for MySQL/PostgreSQL
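Switching Metabase from H2 to an external backend database is typically done through its `MB_DB_*` environment variables. The sketch below builds those variables in Python and passes them to the Metabase JAR. The helper names (`metabase_db_env`, `launch_metabase`), the JAR path, and the host, user, and password values are hypothetical placeholders; substitute the connection details of your own RDS or PolarDB instance.

```python
import os
import subprocess


def metabase_db_env(host, port, dbname, user, password, db_type="postgres"):
    """Build the environment variables Metabase reads for its application database."""
    if db_type not in ("postgres", "mysql"):
        raise ValueError("db_type must be 'postgres' or 'mysql'")
    return {
        "MB_DB_TYPE": db_type,
        "MB_DB_HOST": host,
        "MB_DB_PORT": str(port),
        "MB_DB_DBNAME": dbname,
        "MB_DB_USER": user,
        "MB_DB_PASS": password,
    }


def launch_metabase(jar_path, db_env):
    """Start Metabase with the given database settings (requires Java and the JAR)."""
    env = {**os.environ, **db_env}
    return subprocess.Popen(["java", "-jar", jar_path], env=env)


# Placeholder values only; replace them with your own instance details.
example_env = metabase_db_env(
    host="your-rds-host.example.com",
    port=5432,
    dbname="metabase",
    user="metabase_user",
    password="change-me",
)
```

With these values in place, `launch_metabase("metabase.jar", example_env)` would start Metabase against the PostgreSQL instance instead of the embedded H2 file. In a real deployment the password should come from a secret store rather than from source code.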

🎁 Learn more about these cloud databases and join our free trial program.
https://www.alibabacloud.com/product/databases#J_2463051000
