DEV Community

Cover image for Storing and Handling Confidential Data in ClickHouse
Shahab Ranjbary
Shahab Ranjbary

Posted on

Storing and Handling Confidential Data in ClickHouse

Introduction

Handling sensitive data securely is critical, especially when migrating from one database to another. In this article, we'll explore how to manage and store confidential data in ClickHouse, focusing on a scenario where we migrate data from MySQL daily. We will use encryption to ensure data security and implement views to control access.

Scenario

We have a MySQL table containing users' general information, including confidential fields like phone numbers and national codes. Our goal is to migrate this data to ClickHouse every day while ensuring the sensitive fields are encrypted.

MySQL Table Setup

First, let's set up our MySQL database and table:

CREATE DATABASE users;

CREATE TABLE users.general_info (
    id INT,
    first_name VARCHAR(255),
    last_name VARCHAR(255),
    phone_number VARCHAR(255),
    national_code VARCHAR(255)
);

INSERT INTO users.general_info VALUES
(1, 'Shahab', 'Ranjbary', '0912345678', '123456'),
(2, 'Sepehr', 'Ranjbary', '0935999999', '982345'),
(3, 'Mostafa', 'Fekri', '09192222222', '3244442'),
(4, 'Saber', 'Farshbaf', '0912333435', '454554');
Enter fullscreen mode Exit fullscreen mode

ClickHouse Table Setup

Next, we create a similar table in ClickHouse with an additional column for the initialization vector (IV) used in encryption:

CREATE DATABASE users;

CREATE TABLE users.general_info (
    id INT,
    first_name VARCHAR(255),
    last_name VARCHAR(255),
    phone_number VARCHAR(255),
    national_code VARCHAR(255),
    iv VARCHAR(255)
)
ENGINE = MergeTree
PRIMARY KEY id
ORDER BY id
SETTINGS index_granularity = 8192;
Enter fullscreen mode Exit fullscreen mode

Data Migration and Encryption

We'll use the encrypt function to insert data into ClickHouse. This function will encrypt the confidential fields using AES-256-GCM encryption. You can automate this process using Airflow or other scheduling tools.

INSERT INTO users.general_info
SELECT 
    id, 
    encrypt('aes-256-gcm', first_name, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS first_name,
    encrypt('aes-256-gcm', last_name, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS last_name,
    encrypt('aes-256-gcm', phone_number, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS phone_number,
    encrypt('aes-256-gcm', national_code, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS national_code,
    'iv1' AS iv
FROM mysql('mysql:3306','users', 'general_info', 'root','123456');
Enter fullscreen mode Exit fullscreen mode

Real-Time Data Handling

If you use Kafka, NATS, or another streaming service, you can use a materialized view in ClickHouse to encrypt data upon ingestion.

CREATE MATERIALIZED VIEW users.general_info_mv TO users.general_info AS
SELECT 
    id, 
    encrypt('aes-256-gcm', first_name, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS first_name,
    encrypt('aes-256-gcm', last_name, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS last_name,
    encrypt('aes-256-gcm', phone_number, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS phone_number,
    encrypt('aes-256-gcm', national_code, 'keykeykeykeykeykeykeykeykeykey01', 'iv1') AS national_code,
    'iv1' AS iv
FROM kafka('kafka_broker:9092', 'topic', 'group', 'format');
Enter fullscreen mode Exit fullscreen mode

Decryption and Secure Access

For users who need access to decrypted data, we create a view that decrypts the confidential fields on the fly using the tryDecrypt function.

CREATE VIEW users.general_info_decryption AS
SELECT 
    id,
    tryDecrypt('aes-256-gcm', first_name, 'keykeykeykeykeykeykeykeykeykey01', iv) AS first_name,
    tryDecrypt('aes-256-gcm', last_name, 'keykeykeykeykeykeykeykeykeykey01', iv) AS last_name,
    tryDecrypt('aes-256-gcm', phone_number, 'keykeykeykeykeykeykeykeykeykey01', iv) AS phone_number,
    tryDecrypt('aes-256-gcm', national_code, 'keykeykeykeykeykeykeykeykeykey01', iv) AS national_code
FROM users.general_info;
Enter fullscreen mode Exit fullscreen mode

Managing Permissions

We can control data access by granting permissions selectively to users:

-- Grant access to the main table for all users
GRANT SELECT ON users.general_info TO public;

-- Grant access to the decrypted view only to specific users
GRANT SELECT ON users.general_info_decryption TO admin;
Enter fullscreen mode Exit fullscreen mode

Conclusion

We can securely handle confidential data by leveraging ClickHouse's encryption functions and carefully managing user access. This approach ensures that sensitive information remains protected during storage and transit, while still being accessible to authorized users.

Additional Tips

  • Key Management: Use a secure key management system (KMS) to store and rotate encryption keys. Avoid hardcoding keys in your scripts or configurations.
  • Auditing: Implement logging and monitoring to audit access to encrypted and decrypted data.
  • Performance: Be mindful of the performance implications of encryption and decryption. Test your setup to ensure it meets your performance requirements.

By following these best practices, you can enhance the security and compliance of your data handling processes in ClickHouse.

For more information on ClickHouse's encryption functions and best practices, refer to the ClickHouse documentation.

Top comments (2)

Collapse
 
kafeih profile image
kafeih

nice

Collapse
 
kafeih profile image
kafeih

nice