Rahul-262001

Posted on May 6, 2024

A Comprehensive Guide to Extracting Data from MySQL Using Singer ETL

#sql #opensource #etl #singer

In this guide, we'll walk through extracting data from MySQL with tap-mysql and writing it to JSON Lines files with target-jsonl. Both are Singer components: the tap reads rows from the database and emits them as JSON messages, and the target consumes those messages and writes them to disk.

Step 1: Enable Python Virtual Environment (venv)

Even without administrative access, you can set up a Python virtual environment. Here's how:

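Create a new virtual environment (on Windows the command may be `python` rather than `python3`):
```
python3 -m venv <env_name>
```
Navigate to the Scripts directory inside the newly created environment folder and activate the virtual environment. This is the Windows form; on Linux or macOS, run `source <env_name>/bin/activate` instead:
```
.\activate.bat
```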

Now, your virtual environment is active and ready for use.
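To confirm that activation worked, a quick check from inside Python can help. The snippet below is a minimal sketch; the helper name `in_virtualenv` is made up for this guide. It relies on the fact that, inside a venv, `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

If this prints `False` right after activation, the shell is most likely still picking up the system-wide interpreter.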

Step 2: Install `tap-mysql` and `target-jsonl`

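With the virtual environment active, use pip to install both packages. If pip reports conflicting dependencies, install the tap and the target in separate virtual environments, which is what the Singer documentation recommends.

```
pip install tap-mysql target-jsonl
```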
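To verify that both packages landed in the active environment, one option is to ask Python's standard `importlib.metadata` module for their versions. This is only a sketch; `installed_version` is a hypothetical helper name, not part of either package.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the two packages this guide depends on.
for name in ("tap-mysql", "target-jsonl"):
    print(name, "->", installed_version(name) or "not installed")
```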

Step 3: Prepare Configuration Files

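tap-mysql requires two input files: config.json, which holds the connection settings, and properties.json, a catalog that describes the tables and columns to extract. The catalog does not need to be written by hand; Step 4 shows how to generate it.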

`config.json`:

```
{
    "host": "127.0.0.1",
    "port": "3306",
    "user": "root",
    "password": "root"
}
```
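Hard-coding a password in config.json is fine for a local test database, but for anything shared it may be preferable to generate the file from environment variables. The sketch below assumes hypothetical variable names (`MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_USER`, `MYSQL_PASSWORD`) and falls back to the values from the example above; `build_config` and `write_config` are illustrative helpers, not part of tap-mysql.

```python
import json
import os


def build_config(env=os.environ) -> dict:
    """Assemble the tap-mysql connection settings from environment variables."""
    # The MYSQL_* variable names are hypothetical; defaults mirror the example config.
    return {
        "host": env.get("MYSQL_HOST", "127.0.0.1"),
        "port": env.get("MYSQL_PORT", "3306"),
        "user": env.get("MYSQL_USER", "root"),
        "password": env.get("MYSQL_PASSWORD", "root"),
    }


def write_config(path: str, config: dict) -> None:
    """Write the settings as JSON so the file can be passed via --config."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
```

Calling `write_config("config.json", build_config())` would then produce a file with the same four keys as the hand-written version.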

`properties.json`:

```
{
    "streams": [
        {
            "tap_stream_id": "sakila-actor_info",
            "table_name": "actor_info",
            "schema": {
                "properties": {
                    "actor_id": {
                        "inclusion": "available",
                        "minimum": 0,
                        "maximum": 65535,
                        "type": [
                            "null",
                            "integer"
                        ]
                    },
                    "first_name": {
                        "inclusion": "available",
                        "maxLength": 45,
                        "type": [
                            "null",
                            "string"
                        ]
                    },
                    "last_name": {
                        "inclusion": "available",
                        "maxLength": 45,
                        "type": [
                            "null",
                            "string"
                        ]
                    },
                    "film_info": {
                        "inclusion": "available",
                        "maxLength": 65535,
                        "type": [
                            "null",
                            "string"
                        ]
                    }
                },
                "type": "object"
            },
            "stream": "actor_info",
            "metadata": [
                {
                    "breadcrumb": [],
                    "metadata": {
                        "selected": true,
                        "replication-method": "FULL_TABLE",
                        "selected-by-default": false,
                        "database-name": "sakila",
                        "is-view": true
                    }
                },
                {
                    "breadcrumb": [
                        "properties",
                        "actor_id"
                    ],
                    "metadata": {
                        "selected-by-default": true,
                        "sql-datatype": "smallint unsigned"
                    }
                },
                {
                    "breadcrumb": [
                        "properties",
                        "first_name"
                    ],
                    "metadata": {
                        "selected-by-default": true,
                        "sql-datatype": "varchar(45)"
                    }
                },
                {
                    "breadcrumb": [
                        "properties",
                        "last_name"
                    ],
                    "metadata": {
                        "selected-by-default": true,
                        "sql-datatype": "varchar(45)"
                    }
                },
                {
                    "breadcrumb": [
                        "properties",
                        "film_info"
                    ],
                    "metadata": {
                        "selected-by-default": true,
                        "sql-datatype": "text"
                    }
                }
            ]
        }
    ]
}
```
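The catalog is easier to read once you know where to look: the metadata entry with an empty `breadcrumb` describes the table itself, and entries whose breadcrumb is `["properties", "<column>"]` describe individual columns. The following sketch walks that structure; the helper names `selected_streams` and `column_types` are invented for illustration, and the sample dictionary is a trimmed copy of the example above.

```python
def selected_streams(catalog: dict) -> list:
    """Return the tap_stream_id of every stream marked as selected."""
    selected = []
    for stream in catalog.get("streams", []):
        for entry in stream.get("metadata", []):
            # The entry with an empty breadcrumb describes the table itself.
            if entry.get("breadcrumb") == [] and entry.get("metadata", {}).get("selected"):
                selected.append(stream["tap_stream_id"])
    return selected


def column_types(stream: dict) -> dict:
    """Map each column name to the SQL type recorded in its metadata entry."""
    types = {}
    for entry in stream.get("metadata", []):
        breadcrumb = entry.get("breadcrumb", [])
        if len(breadcrumb) == 2 and breadcrumb[0] == "properties":
            types[breadcrumb[1]] = entry["metadata"].get("sql-datatype")
    return types


# Trimmed copy of the catalog shown above, used only to exercise the helpers.
example = {
    "streams": [
        {
            "tap_stream_id": "sakila-actor_info",
            "metadata": [
                {"breadcrumb": [], "metadata": {"selected": True}},
                {"breadcrumb": ["properties", "actor_id"],
                 "metadata": {"sql-datatype": "smallint unsigned"}},
            ],
        }
    ]
}
print(selected_streams(example))
print(column_types(example["streams"][0]))
```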

Step 4: Generate `properties.json`

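Run tap-mysql in discover mode to write a catalog of every table it can see to catalog.json:

```
tap-mysql --config config.json --discover > catalog.json
```

Locate the JSON object for the desired table in the streams array of catalog.json and copy it into a new file. Let's name this file selected_table.json; it plays the role of properties.json from Step 3.

In selected_table.json, wrap the copied object in a top-level streams array. Then make sure the metadata entry with the empty breadcrumb contains "selected": true and a "replication-method", as in the Step 3 example, because discover mode does not mark any table as selected:

```
{
    "streams": [
        <paste the copied stream object here>
    ]
}
```

Because the file contains only this one stream, only the selected table is extracted.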
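The copy-and-edit procedure above can also be scripted. The sketch below is one possible way to do it, assuming the catalog layout shown in Step 3; `select_table` and `write_selection` are hypothetical helpers written for this guide, not functions provided by tap-mysql.

```python
import copy
import json


def select_table(catalog: dict, table_name: str, replication_method: str = "FULL_TABLE") -> dict:
    """Return a new catalog holding only the named table, marked as selected."""
    for stream in catalog.get("streams", []):
        if stream.get("table_name") == table_name:
            chosen = copy.deepcopy(stream)  # leave the original catalog untouched
            for entry in chosen.get("metadata", []):
                # Only the table-level entry (empty breadcrumb) carries the selection.
                if entry.get("breadcrumb") == []:
                    entry["metadata"]["selected"] = True
                    entry["metadata"]["replication-method"] = replication_method
            return {"streams": [chosen]}
    raise ValueError(f"table {table_name!r} not found in catalog")


def write_selection(catalog_path: str, table_name: str, out_path: str) -> None:
    """Read a discovered catalog and write the single-table selection file."""
    with open(catalog_path, "r", encoding="utf-8") as f:
        catalog = json.load(f)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(select_table(catalog, table_name), f, indent=4)
```

With the file names used in this guide, the call would be `write_selection("catalog.json", "actor_info", "selected_table.json")`.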

Step 5: Run `tap-mysql`

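Execute the following command:

```
tap-mysql --config config.json --catalog selected_table.json
```

The tap prints Singer messages (such as SCHEMA and RECORD) to standard output, one JSON object per line. Congratulations! You've successfully extracted data from MySQL using tap-mysql.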
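The tap's output is a stream of Singer messages, one JSON object per line, each carrying a `type` field. A small sketch for sanity-checking that output is to count the messages by type. The function name `count_message_types` and the sample lines below are illustrative only, not real tap output.

```python
import json
from collections import Counter


def count_message_types(lines) -> Counter:
    """Count Singer messages by their "type" field (for example SCHEMA, RECORD, STATE)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        counts[json.loads(line)["type"]] += 1
    return counts


# Hand-written sample messages in the Singer format, for demonstration only.
sample = [
    '{"type": "SCHEMA", "stream": "actor_info", "schema": {"properties": {}}, "key_properties": []}',
    '{"type": "RECORD", "stream": "actor_info", "record": {"actor_id": 1}}',
    '{"type": "RECORD", "stream": "actor_info", "record": {"actor_id": 2}}',
    '{"type": "STATE", "value": {}}',
]
print(count_message_types(sample))
```

To run it against real output, redirect the tap into a file first and pass the open file object to `count_message_types`.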

Step 6: Send Data to `jsonl` Target

Pipe the tap's output into target-jsonl:

```
tap-mysql --config config.json --catalog selected_table.json | target-jsonl
```

target-jsonl writes the records to a .jsonl file named after the stream, for example actor_info.jsonl.
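To make the tap-to-target handoff less of a black box, here is a deliberately simplified illustration of what a JSONL target does with the messages it receives. It is not target-jsonl's actual implementation; `write_records` is a hypothetical function that keeps only the RECORD messages and appends each row to a file named after its stream.

```python
import json
import os


def write_records(lines, destination: str = ".") -> dict:
    """Append each RECORD message to <destination>/<stream>.jsonl; return per-stream counts."""
    counts = {}
    handles = {}
    try:
        for line in lines:
            line = line.strip()
            if not line:
                continue
            message = json.loads(line)
            if message.get("type") != "RECORD":
                continue  # SCHEMA and STATE messages carry no rows
            stream = message["stream"]
            if stream not in handles:
                path = os.path.join(destination, f"{stream}.jsonl")
                handles[stream] = open(path, "a", encoding="utf-8")
            # One JSON object per line is what makes the file "JSON Lines".
            handles[stream].write(json.dumps(message["record"]) + "\n")
            counts[stream] = counts.get(stream, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

The real target handles more than this sketch does, so it is meant for understanding the data flow rather than as a replacement.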

Step 7: Convert Output to a DataFrame

Here's an example of how to convert the output to a DataFrame using Python:

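```
import json

import pandas as pd

# Each line of the .jsonl file is one JSON object, i.e. one extracted row.
data = []
with open("<file_name>.jsonl", "r") as f:
    for line in f:
        data.append(json.loads(line))

# Build a DataFrame with one column per field and list the column names.
# pd.read_json("<file_name>.jsonl", lines=True) is a one-line alternative.
df = pd.DataFrame(data)
print(df.columns)
```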

This step allows for further analysis and manipulation of the extracted data.
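If the output file might contain blank or damaged lines (for instance after an interrupted run), a slightly more defensive reader can be useful. The sketch below uses only the standard library; `iter_jsonl` is a hypothetical helper that skips empty lines and reports the line number of any line that is not valid JSON.

```python
import json


def iter_jsonl(path: str):
    """Yield one dict per non-empty line, naming the line number if a line is malformed."""
    with open(path, "r", encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}: line {number} is not valid JSON") from exc
```

The rows can then be handed to pandas with `pd.DataFrame(list(iter_jsonl("<file_name>.jsonl")))`.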

By following these steps, you've extracted data from MySQL with Singer and loaded it into a DataFrame, ready for analysis.

DEV Community
