Overview
Data management can be complex, especially when dealing with high-volume data pipelines. Whether you're moving data between databases, from file systems to databases, or vice versa, the Extract & Load (EL) approach can streamline these processes. Enter Slingdata, a powerful, free CLI tool built with Go that offers a straightforward solution for creating and maintaining robust data pipelines.
What is Slingdata?
Slingdata is a passion project turned into a practical tool that simplifies data movement across various systems. Its key focus is on the Extract & Load (EL) process, enabling efficient data transfer between:
- Database to Database
- File System to Database
- Database to File System
Key Features of Slingdata
- Single Binary deployment (built with Go). See the installation page.
- Use Custom SQL as a stream.
- Manage / View / Test / Discover your connections with the `sling conns` sub-command.
- Use Environment Variables for connections.
- Provide YAML or JSON configurations (perfect for git version control).
- Powerful Replication logic, to replicate many tables with a wildcard (my_schema.*).
- Reads your existing DBT connections.
- Use environment variables in your YAML / JSON config (`SELECT * from my_table where date = '{date}'`).
- Convenient Transformations, such as the `flatten` option, which auto-creates columns from your nested fields.
- Run Pre & Post SQL commands.
- And many more!
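The replication logic mentioned above is driven by a YAML file. Below is a minimal sketch of a wildcard replication, following Sling's replication layout; the connection names `MY_SOURCE` and `MY_TARGET` are placeholders:

```yaml
source: MY_SOURCE
target: MY_TARGET

defaults:
  mode: full-refresh

streams:
  # replicate every table in my_schema to the target
  my_schema.*:
```

Because the streams key accepts wildcards, one short file can replicate an entire schema without listing each table.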
Sample EL Process: ClickHouse to PostgreSQL
To showcase Slingdata’s capabilities, let's walk through a sample Extract & Load process moving data from ClickHouse to PostgreSQL.
Prerequisites
Ensure you have:
- ClickHouse installed and running.
- PostgreSQL installed and running.
- Slingdata installed.
Step 1: Configure the Connections
Our source database is ClickHouse and our target is PostgreSQL. For each connection, we can either export an environment variable or add the connection to Sling's env.yaml file. Let's start with the PostgreSQL target:

```shell
export MARKETING_URL="postgresql://postgres:postgres@localhost:5432/postgres?sslmode=disable"
```
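If you prefer the env.yaml route instead of environment variables, the same connection can be declared there. A minimal sketch, assuming Sling's `connections` / `url` layout for env.yaml (the credentials are the same ones used above):

```yaml
connections:
  MARKETING_URL:
    url: postgresql://postgres:postgres@localhost:5432/postgres?sslmode=disable
```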
To test this connection:

```shell
sling conns test MARKETING_URL
```

You should see the result:

```
INF success!
```
We will do the same for the ClickHouse source:

```shell
export DATAWAREHOUSE_URL="http://sling_user:sling_pass@clickhouse:8123/marketing"
```

To test this connection:

```shell
sling conns test DATAWAREHOUSE_URL
```

You should see the result:

```
INF success!
```
Now, let's create a database and a table in ClickHouse:

```sql
CREATE DATABASE marketing;

CREATE TABLE marketing.users (
    id UInt64,
    name String,
    family String,
    age UInt8
) ENGINE = MergeTree()
ORDER BY id;
```
And in PostgreSQL, create a matching table (note that `age` is an integer here, matching ClickHouse's `UInt8`):

```sql
CREATE TABLE users (
    id serial NOT NULL PRIMARY KEY,
    name text NOT NULL,
    family text,
    age integer NOT NULL
);
```
Insert data into ClickHouse:

```sql
INSERT INTO marketing.users VALUES (1, 'Shahab', 'Ranjbary', 28);
INSERT INTO marketing.users VALUES (2, 'Sepher', 'Ranjbary', 18);
INSERT INTO marketing.users VALUES (3, 'Mostafa', 'Fekri', 38);
```
Now, to move data to PostgreSQL, you can either create a replication file or run the following command:

```shell
sling run --src-conn DATAWAREHOUSE_URL --src-stream marketing.users --tgt-conn MARKETING_URL --tgt-object public.users
```
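The same one-liner can also be captured in a replication file, which is handy for version control. A sketch using the connections defined above (the file name `replication.yaml` is arbitrary):

```yaml
source: DATAWAREHOUSE_URL
target: MARKETING_URL

defaults:
  mode: full-refresh

streams:
  marketing.users:
    object: public.users
```

You can then execute it with `sling run -r replication.yaml`.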
Conclusion
Slingdata offers a powerful and flexible way to manage data pipelines with ease. Its extensive features, combined with simple configurations and robust performance, make it an excellent tool for anyone dealing with high-volume data transfers.